
Literature Notes 29: The Mitochondrial Genome of Leucaena trichandra

Author: 用户7010445
Published: 2020-08-17 16:29:01
Article title

PacBio-Based Mitochondrial Genome Assembly of Leucaena trichandra (Leguminosae) and an Intrageneric Assessment of Mitochondrial RNA Editing

Journal, institutions, and year

Journal: Genome Biology and Evolution (GBE). Accepted: August 17, 2018. Institutions: New Mexico State University; Department of Systematic and Evolutionary Botany, University of Zurich, Switzerland. Local filename of the paper: evy179.pdf

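For now my main interest is the method for assembling a complete mitochondrial genome. The paper's data are public, and the authors also released the shell script used for the assembly, so I will try to reproduce the assembly process.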

DNA Extraction, and Sequencing

Vocabulary: sapling = 树苗 (a young tree); polysaccharide = 多糖.

Protocol: Aquagenomic DNA extraction protocol.

For each extraction 10 mg of fresh young leaf material was obtained from a L. trichandra sapling that had been kept in the dark for 24 h to reduce polysaccharide concentration. DNA with an average fragment size of 21 kbp was submitted for sequencing.

Sequencing chemistry: PacBio P6-C4.

Genome Assembly

The assembly followed an iterative approach that begins with the assembly of highly conserved regions and extends from that starting point. The pipeline involved:

  • using BLASR to map raw reads against the reference
  • filtering hits by a minimum aligned length (500 bp)
  • recovering the qualifying reads to a new fastq file using seqtk
  • assembling reads with Canu.

The L. trichandra PacBio reads provided sufficient long read data to also assemble the mitochondrial genome. Nonetheless, when we identified likely mt-genome contigs recovered from assemblies derived from all the available reads (which includes mitochondrial, nuclear, and plastid data in large computationally intensive analyses), the mitochondrial portion was moderately fragmented (> 7 contigs).

Computing resources: The project primarily employed an AMD7252 32 core server with 256 GB of RAM.

After changing the paths and substituting my own data, I ran the script and hit an error:

[Pomgroup@localhost Pome_Mito_practice]$ bash Iternative_assembly_Pome_Mito.sh 
Iternative_assembly_Pome_Mito.sh: line 2: $'\r': command not found
Iternative_assembly_Pome_Mito.sh: line 4: syntax error near unexpected token `$'\r''
'ternative_assembly_Pome_Mito.sh: line 4: `

Fix

https://hacpai.com/article/1488765818607

sed -i 's/\r$//' Iternative_assembly_Pome_Mito.sh
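
As a quick check before and after the fix, the sketch below builds a tiny script with Windows (CRLF) line endings, counts the affected lines, and strips the carriage returns. It uses tr -d '\r' as a portable stand-in for the GNU sed command above (note that tr removes every carriage return, not only those at line ends). The file names demo_crlf.sh and demo_lf.sh and the helper count_cr_lines are made up for the illustration.

```shell
# Hypothetical demo files; substitute the real script name when using this.
workdir=$(mktemp -d)
crlf_script="$workdir/demo_crlf.sh"
lf_script="$workdir/demo_lf.sh"

# Write a two-line script whose lines end in \r\n, as a Windows editor would.
printf 'msg=hello\r\necho "$msg"\r\n' > "$crlf_script"

# Count the lines of a file that contain a carriage return.
cr=$(printf '\r')
count_cr_lines() {
    # grep -c prints the number of matching lines; it exits 1 when that is 0.
    grep -c "$cr" "$1" || true
}

echo "lines with CR before: $(count_cr_lines "$crlf_script")"

# Strip every carriage return (stand-in for: sed -i 's/\r$//' FILE).
tr -d '\r' < "$crlf_script" > "$lf_script"

echo "lines with CR after: $(count_cr_lines "$lf_script")"

# The cleaned copy now runs without the "$'\r': command not found" error.
sh "$lf_script"
```

A non-zero count from count_cr_lines on a downloaded script is a hint that it needs this conversion before bash can run it.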

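Explanation of the cause (the script was saved with Windows-style CRLF line endings, so bash sees a stray carriage return, \r, at the end of each line):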

https://blog.csdn.net/Lnho2015/article/details/51322289

There is still a lot of basic Linux that I need to study carefully!

Link to the script

https://github.com/cdb3ny/Mitochondrial-Genome-Scripts/blob/master/Iternative_assembly_script.sh

Line-by-line explanation of the commands used in the script
  • First comes the blasr alignment. The usage is:
blasr nanopore.fastq reference.fasta --nproc 16 > blasr.out

The format of blasr.out seems to match the -m 1 format described on this page: https://github.com/PacificBiosciences/blasr/wiki/Blasr-Output-Format

  • Processing the output file blasr.out
awk '{a=$8-$7;print $0,a;}' blasr.out

This subtracts column 7 from column 8, assigns the result to a, and appends a as the last column of every line.
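
To see the effect, here is a small sketch on one made-up alignment line. It assumes a 13-column, space-separated layout in which columns 7 and 8 hold the start and end coordinates of the alignment; the read name, reference name, and all numbers are invented.

```shell
# One invented hit with 13 space-separated fields (values are illustrative only).
hit='read1 ref1 0 0 -3000 90.5 1000 1800 50000 20 830 900 12345'

# Append (column 8 - column 7) as a new last column, as in the script.
with_len=$(printf '%s\n' "$hit" | awk '{a=$8-$7; print $0, a}')

echo "$with_len"

# The new value is field 14 because the input line had 13 fields.
echo "$with_len" | awk '{print "fields:", NF, "aligned length:", $14}'
```

This is why the later commands sort and filter on column 14: it is the column that awk has just added.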

awk '{a=$8-$7;print $0,a;}' blasr.out | sort -n -r -k14,14

Sort the lines by column 14 in descending numeric order.

awk '{a=$8-$7;print $0,a;}' blasr.out | sort -n -r -k14,14 | awk '$14>500'

Keep only the lines whose column 14 is greater than 500.

awk '{a=$8-$7;print $0,a;}' blasr.out | sort -n -r -k14,14 | awk '$14>500' | cut -d ' ' -f1,1

Split each line on spaces and extract the first column. This gives the IDs of the fastq reads whose aligned length is greater than 500.
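
The four stages above can also be folded into one function. The sketch below is a condensed variant, not the paper's script: it applies the length filter in a single awk pass and adds sort -u, because a read with several qualifying hits would otherwise be listed more than once. The function name long_hit_ids and the sample hits are hypothetical, and the column layout is the same assumption as before.

```shell
# long_hit_ids FILE [MIN_LEN]
# Print the unique read IDs (column 1) of hits whose column 8 - column 7
# exceeds MIN_LEN (default 500). Assumes space-separated blasr output.
long_hit_ids() {
    awk -v min="${2:-500}" '($8 - $7) > min {print $1}' "$1" | sort -u
}

# Invented hits: read1 qualifies twice, read2 is too short, read3 qualifies.
workdir=$(mktemp -d)
hits="$workdir/blasr.out"
cat > "$hits" <<'EOF'
read1 ref1 0 0 -3000 90.5 1000 1800 50000 20 830 900 12345
read1 ref1 0 0 -2500 89.0 7000 7600 50000 10 620 900 12345
read2 ref1 0 0 -900 88.0 200 500 50000 5 310 400 2345
read3 ref1 0 1 -4100 91.2 30000 31200 50000 0 1210 1300 23456
EOF

long_hit_ids "$hits" > "$workdir/ids.txt"
cat "$workdir/ids.txt"
```

Passing a second argument changes the cutoff, for example long_hit_ids "$hits" 1000 keeps only read3 in this sample.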

grep -F -x -v -f

I don't yet know what this command does.
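
For reference, going by the grep manual, the options mean: -F treats patterns as fixed strings, -x requires a pattern to match the whole line, -v inverts the match, and -f FILE reads the patterns from FILE. Together they print the lines of the input that are not identical to any line of the pattern file. A plausible use in an iterative pipeline is to list read IDs that an earlier round has not already collected, but that purpose is a guess; the file names below are invented.

```shell
workdir=$(mktemp -d)

# IDs collected in a previous round (the pattern file given to -f).
printf '%s\n' read1 read3 > "$workdir/old_ids.txt"

# IDs found in the current round.
printf '%s\n' read1 read2 read3 read30 > "$workdir/new_ids.txt"

# Print lines of new_ids.txt that are not exactly equal to any old ID.
# Without -x, read30 would be dropped too, because it contains "read3".
grep -F -x -v -f "$workdir/old_ids.txt" "$workdir/new_ids.txt"
```

On this sample the command prints read2 and read30, the two IDs that are absent from old_ids.txt.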

Extract sequences (fastq) by ID
seqtk subseq nanopore.fastq ids.txt > aligned.fastq
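
If seqtk is not at hand, the same selection can be approximated with awk. This is a substitute technique, not what the paper's script does, and it assumes that every FASTQ record occupies exactly four lines and that the read ID is the first whitespace-delimited word after the @. The function name fastq_by_id and the sample reads are made up.

```shell
# fastq_by_id IDS_FILE FASTQ_FILE
# Print the 4-line FASTQ records whose ID is listed in IDS_FILE (one per line).
fastq_by_id() {
    awk '
        FILENAME == ARGV[1] { want[$1] = 1; next }  # first file: wanted IDs
        FNR % 4 == 1 {                              # header line of a record
            id = substr($1, 2)                      # drop the leading "@"
            keep = (id in want)
        }
        keep                                        # print lines of kept records
    ' "$1" "$2"
}

workdir=$(mktemp -d)
printf '%s\n' read1 read3 > "$workdir/ids.txt"

# Three invented reads; the quality line of read2 starts with "@" on purpose,
# which is why headers are found by position rather than by their first character.
cat > "$workdir/reads.fastq" <<'EOF'
@read1 extra description
ACGT
+
IIII
@read2
GGCC
+
@@@@
@read3
TTAA
+
IIII
EOF

fastq_by_id "$workdir/ids.txt" "$workdir/reads.fastq" > "$workdir/aligned.fastq"
cat "$workdir/aligned.fastq"
```

For real data seqtk remains the better choice, since it also handles FASTA input and multi-line records.
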
Assembly with canu
canu -p hehuan -d hehuan-oxford genomeSize=2000k -nanopore-raw aligned.fastq

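Finally, the canu assembly is used as the new reference sequence and the whole process is repeated. The for i in 1:10 in the paper's script amounts to repeating this process 10 times.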
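
The repetition can be sketched as below. This is a dry-run outline, not the authors' script: it only prints the commands each round would run, reusing the blasr, seqtk, and canu invocations quoted above with a condensed form of the awk filter, so it can be read and tested without the tools installed. The function name plan_rounds, the round_N naming, and the assumption that canu writes its contigs to <prefix>.contigs.fasta inside the -d directory are all illustrative. A counter loop is used because, written literally in bash, for i in 1:10 would run once with i set to the string 1:10.

```shell
# plan_rounds READS REFERENCE ROUNDS
# Print (not run) the commands for each round; every round after the first
# uses the previous round's contigs as its reference.
plan_rounds() {
    reads=$1
    ref=$2
    rounds=$3
    i=1
    while [ "$i" -le "$rounds" ]; do
        dir="round_$i"
        printf '%s\n' "blasr $reads $ref --nproc 16 > $dir.blasr.out"
        printf '%s\n' "awk '(\$8 - \$7) > 500 {print \$1}' $dir.blasr.out | sort -u > $dir.ids.txt"
        printf '%s\n' "seqtk subseq $reads $dir.ids.txt > $dir.aligned.fastq"
        printf '%s\n' "canu -p hehuan -d $dir genomeSize=2000k -nanopore-raw $dir.aligned.fastq"
        # Assumed canu output location: <dir>/<prefix>.contigs.fasta
        ref="$dir/hehuan.contigs.fasta"
        i=$((i + 1))
    done
}

# Show the plan for two rounds, starting from the initial reference.
plan_rounds nanopore.fastq reference.fasta 2
```

Reading the printed plan makes the feedback step visible: the blasr line of round 2 maps the same reads against round_1/hehuan.contigs.fasta instead of reference.fasta.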

That is as far as I have read in this paper for now.

Originally published: 2020-08-12.

Shared from the WeChat official account 小明的数据分析笔记本.
