前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >终于看到了一个完整的mutect2使用脚本

终于看到了一个完整的mutect2使用脚本

作者头像
生信技能树
发布2018-07-27 09:51:46
1.8K0
发布2018-07-27 09:51:46
举报
文章被收录于专栏:生信技能树生信技能树

因为嫌麻烦,所以一直使用的是简化版mutect2流程,其实就一个命令:

代码语言:javascript
复制
time $GATK  --java-options "-Xmx10G -Djava.io.tmpdir=./"  Mutect2 -R $reference \
-I $tumor_bam  -tumor $(basename "$tumor_bam" _recal.bam) \
-I $normal_bam -normal $(basename "$normal_bam" _recal.bam) \
-O ${sample}_mutect2.vcf
$GATK  FilterMutectCalls -V ${sample}_mutect2.vcf -O ${sample}_somatic.vcf

但是很明显,这其实不符合官网教程的理念

https://gatkforums.broadinstitute.org/gatk/discussion/9183/how-to-call-somatic-snvs-and-indels-using-mutect2 https://software.broadinstitute.org/gatk/documentation/article?id=9183

但是官网教程的确太繁琐,里面涉及到6个步骤,而我只是运行了最后一个!

因为种种原因,没能抽出时间细致的探索mutect2的用法,但是无意中搜索到脚本一个: https://figshare.com/articles/scripts_sh/4542397

可以说是非常良心了:

代码语言:javascript
复制
#!/bin/sh

##GATK bundle download
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/ucsc.hg19.fasta.gz -O /path/to/GATK/bundle/ucsc.hg19.fasta.gz
wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/dbsnp_138.hg19.vcf.gz -O /path/to/GATK/bundle/dbsnp_138.hg19.vcf.gz
wget https://ndownloader.figshare.com/files/7354246 -O /path/to/GATK/bundle/TP53.sorted.bed
wget https://ndownloader.figshare.com/files/7354213 -O /path/to/GATK/bundle/CosmicCodingMuts.chr.sort.head.vcf

##ANNOVAR database files download
export PATH=$PATH:/path/to/annovar

annotate_variation.pl -buildver hg19 -downdb -webfrom annovar refGene humandb/
annotate_variation.pl -buildver hg19 -downdb cytoBand humandb/
annotate_variation.pl -buildver hg19 -downdb genomicSuperDups humandb/ 
annotate_variation.pl -buildver hg19 -downdb -webfrom annovar esp6500siv2_all humandb/
annotate_variation.pl -buildver hg19 -downdb -webfrom annovar 1000g2015aug humandb/
annotate_variation.pl -buildver hg19 -downdb -webfrom annovar snp138 humandb/
annotate_variation.pl -buildver hg19 -downdb -webfrom annovar dbnsfp30a humandb/
annotate_variation.pl -buildver hg19 -downdb -webfrom annovar cosmic70 humandb/
annotate_variation.pl -buildver hg19 -downdb -webfrom annovar exac03 humandb/ 
annotate_variation.pl -buildver hg19 -downdb -webfrom annovar clinvar_20160302 humandb/



#define the reference file path
GATK=/path/to/GATK/GenomeAnalysisTK.jar
REF=/path/to/GATK/bundle/ucsc.hg19.fasta
DBSNP=/path/to/GATK/bundle/dbsnp_138.hg19.vcf
COSMIC=/path/to/GATK/bundle/CosmicCodingMuts.chr.sort.head.vcf
BED=/path/to/GATK/bundle/TP53.sorted.bed

#create a Panel of Normals (PoN) vcf file from the 4 low-grade tumour samples
for pathandfile in /path/to/EGA/normal/*.clean.recal.TP53.bam ; do
    basewithpath=${pathandfile%.clean.recal.TP53.*}
    basenopath=$(basename $basewithpath)

java -jar $GATK \
    -T MuTect2 \
    -R $REF \
    -I:tumor $(echo $basewithpath).clean.recal.TP53.bam \
    --dbsnp $DBSNP \
    --cosmic $COSMIC \
    --artifact_detection_mode \
    -L $BED \
    -o $(echo $basewithpath).clean.recal.TP53.normal.vcf
done


java -jar $GATK \
    -T CombineVariants \
    -R $REF \
    -V /path/to/EGA/normal/GB544-10_S18.clean.recal.TP53.normal.vcf -V /path/to/EGA/normal/GB624-11_S81.clean.recal.TP53.normal.vcf -V /path/to/EGA/normal/GB730-12_S41.clean.recal.TP53.normal.vcf -V /path/to/EGA/normal/GB909-13_S90.clean.recal.TP53.normal.vcf \
    -minN 2 \
    --setKey "null" \
    --filteredAreUncalled \
    --filteredrecordsmergetype KEEP_IF_ANY_UNFILTERED \
    -L $BED \
    -o /path/to/EGA/normal/MuTect2_PON.vcf
    
    
    

#call somatic variants
for pathandfile in /path/to/EGA/tumor/*.clean.recal.TP53.bam ; do
    basewithpath=${pathandfile%.clean.recal.TP53.*}
    basenopath=$(basename $basewithpath)
    
java -jar $GATK \
    -T MuTect2 \
    -R $REF \
    --dbsnp $DBSNP \
    --cosmic $COSMIC \
    -dt NONE \
    --input_file:tumor $(echo $basewithpath).clean.recal.TP53.bam \
    --intervals $BED \
    -PON /path/to/EGA/normal/MuTect2_PON.vcf \
    -o $(echo $basewithpath).clean.recal.TP53.vcf

done




#ANNOVAR annotation
mkdir /path/to/EGA/tumor/ANNOVAR
cp /path/to/EGA/tumor/*.vcf /path/to/EGA/tumor/ANNOVAR

##convert file format
for pathandfile in /path/to/EGA/tumor/ANNOVAR/*.vcf ; do
    filename=${pathandfile%.*} 
    convert2annovar.pl --format vcf4 --includeinfo --withzyg $pathandfile > $(echo $filename).avinput
done

##add sample information
for pathandfile in /path/to/EGA/tumor/ANNOVAR/*.avinput ; do 
    filename=$(basename $pathandfile)
    mainfilename=${filename%.clean.recal.TP53.avinput}
    pathandmain=${pathandfile%.*}
    awk -v var1="$mainfilename" '{OFS = "\t" ; print $0, var1}' $pathandfile > $(echo $pathandmain).avinputs
done

##merge file
cat /path/to/EGA/tumor/ANNOVAR/*.avinputs > /path/to/EGA/tumor/ANNOVAR/TP53.Tonly.avinput
    
##annotate
table_annovar.pl /path/to/EGA/tumor/ANNOVAR/TP53.Tonly.avinput /path/to/annovar/humandb/ -buildver hg19 -out /path/to/EGA/tumor/ANNOVAR/TP53.Tonly -remove -protocol refGene,cytoBand,genomicSuperDups,esp6500siv2_all,1000g2015aug_all,snp138,dbnsfp30a,cosmic70,exac03,clinvar_20160302 -operation g,r,r,f,f,f,f,f,f,f --otherinfo

虽然这个流程是基于 hg19 参考基因组的,但是很容易就能改写为 hg38版本的!

还有一个问题是他那个流程的项目背景是,得到的肿瘤样品是没有配对normal的,而是 create a Panel of Normals (PoN) vcf file from the 4 low-grade tumour samples

而且,这个流程是基于 GATK3成熟版本的,并不是GATK4哦。

如果大家有GATK4的MUTECT2经验,可以交流一下哈。

本文参与 腾讯云自媒体分享计划,分享自微信公众号。
原始发表:2018-06-23,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 生信技能树 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档