In papers I have read, BLAST is used to extract chloroplast reads from whole-genome resequencing data, and I kept wondering how to actually do that. It turns out BLAST has a tool built for exactly this job: Magic-BLAST.

Magic-BLAST is a tool for mapping large next-generation RNA or DNA sequencing runs against a whole genome or transcriptome. The reference genome or transcriptome can be given as a BLAST database or a FASTA file. It is preferable to use a BLAST database for large genomes, such as human, or for transcript collections. The full list of options is shown when you use the -help option.

Paper title, journal, and publication date
Magic-BLAST, an accurate DNA and RNA-seq aligner for long and short reads
It does not seem to have been published yet; I found the paper on bioRxiv, first posted online Aug. 13, 2018, doi: http://dx.doi.org/10.1101/390013
makeblastdb -in Malus_baccata.fasta -dbtype nucl -parse_seqids -out Malus_baccata
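-in: the reference sequences (FASTA file)
-dbtype: the sequence type, nucl (nucleotide) or prot (protein)
-parse_seqids: I have not fully worked this one out yet; according to the Magic-BLAST documentation it keeps the original sequence identifiers from the FASTA file, which makeblastdb would otherwise replace with its own
-out: the name of the database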
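Since -parse_seqids is about the identifiers in the FASTA deflines, it can help to look at those identifiers before building the database. The sketch below is my own addition and not part of the BLAST tools: it writes a tiny made-up file (toy_ref.fasta, a placeholder name) and uses plain grep, awk, sort, and uniq to count the sequences and list any identifier that occurs more than once. My assumption is that makeblastdb complains about repeated identifiers when -parse_seqids is set, but I have not checked this for every BLAST+ version.

```shell
# Toy reference file, made up for illustration (not real Malus baccata data)
cat > toy_ref.fasta <<'EOF'
>chr1 toy sequence
ACGTACGTAC
>chr2 toy sequence
GGGTTTAAAC
>chr1 duplicated identifier
TTTTACGT
EOF

# Number of sequences = number of defline lines, which start with ">"
n_seqs=$(grep -c '^>' toy_ref.fasta)

# The identifier is the first word after ">"; print identifiers seen more than once
dup_ids=$(awk '/^>/ {print substr($1, 2)}' toy_ref.fasta | sort | uniq -d)

echo "sequences: $n_seqs"
echo "duplicated ids: $dup_ids"
```

For a real reference, replace toy_ref.fasta with the file passed to -in; an empty "duplicated ids" line means every identifier is unique.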
# The default input format is FASTA
# A single FASTA file
magicblast -query reads.fasta -db Malus_baccata
# Two FASTA files (paired reads)
magicblast -query reads.fasta -query_mate mates.fasta -db Malus_baccata
# If the input file is in FASTQ format
magicblast -query reads.fastq -db Malus_baccata -infmt fastq
# Paired-end data
magicblast -query reads_R1.fastq -query_mate reads_R2.fastq -db Malus_baccata -infmt fastq
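Magic-BLAST writes SAM by default, so the reads that hit the reference can be pulled out of the output with ordinary text tools. This is a minimal sketch of that filtering step, assuming the output was saved to a SAM file. The records below are made up, and toy.sam and mapped_ids.txt are placeholder names. In SAM, bit 0x4 of the FLAG column (column 2) marks a read as unmapped.

```shell
# Made-up SAM records for illustration: one header line, two mapped reads, one unmapped read
printf '@SQ\tSN:chr1\tLN:1000\n' > toy.sam
printf 'read1\t0\tchr1\t101\t255\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n' >> toy.sam
printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGTTTAAAC\tIIIIIIIIII\n' >> toy.sam
printf 'read3\t16\tchr1\t205\t255\t10M\t*\t0\t0\tTTTTACGTAA\tIIIIIIIIII\n' >> toy.sam

# Skip header lines (starting with "@") and keep records whose FLAG has bit 0x4 unset.
# int(flag / 4) % 2 equals 1 exactly when the "unmapped" bit is set (plain awk, no gawk and()).
awk -F '\t' '!/^@/ && int($2 / 4) % 2 == 0 {print $1}' toy.sam | sort -u > mapped_ids.txt

# Names of the reads that aligned to the reference
cat mapped_ids.txt
```

If samtools is installed, samtools view -F 4 applies the same filter; the awk version is shown only because it needs no extra software.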
The -num_threads option sets the number of threads magicblast uses:
magicblast -query reads.fasta -db genome -num_threads 10
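Instead of hard-coding the thread count, it can be read from the machine. This is a small sketch, assuming a system where getconf _NPROCESSORS_ONLN reports the number of online CPUs (Linux and macOS do). The magicblast command is only printed, not run, and reads.fasta and genome are the same placeholder names as in the -num_threads example.

```shell
# Number of online CPU cores; fall back to 1 if getconf cannot report it
threads=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Print the command that would be run (remove the leading echo to actually run it)
echo magicblast -query reads.fasta -db genome -num_threads "$threads"
```

On a shared server it is probably kinder to pass a smaller number than the full core count.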