前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >开箱即用版本 满分室间质评之GATK Somatic SNV+Indel+CNV+SV(2024-04-30更新)

开箱即用版本 满分室间质评之GATK Somatic SNV+Indel+CNV+SV(2024-04-30更新)

原创
作者头像
SliverWorkspace
修改2024-05-01 15:46:24
8200
修改2024-05-01 15:46:24
举报

本次更新内容:开箱即用的pipeline,能够根据样本version_reference自动选择参考基因组版本,根据project_bed文件选择项目bed,自动初始化环境、安装所需软件、下载ref文件和数据库的版本

最近准备为sliverworkspace 图形化生信平台开发报告设计器,需要一个较为复杂的pipeline作为测试数据,就想起来把之前的 [[满分室间质评之GATK Somatic SNV+Indel+CNV+SV(下)性能优化]]翻出来用一下。跑了一遍发现还是各种问题,于是想把pipeline改造成免部署、首次运行初始化环境的版本,以便需要时候能够直接运行起来,于是有了本文。

为了让pipeline能够直接运行,无需部署,这里使用docker容器保证运行环境的一致性:见文章:[[基于docker的生信基础环境镜像构建]],这里采用的方案是带ssh服务的docker+conda环境,整个pipeline在一个通用容器中运行。

本文代码较长,可能略复杂,想直接运行的可以下载 workflow文件,导入sliverworkspace图形化生信平台直接运行,获取并查看图形化分析报告

相关代码和workflow文件可以访问笔者github项目地址导入操作

导入操作

分析流程整体概览

docker 镜像拉取及部署配置

代码语言:javascript
复制
 # 拉取docker镜像
 docker     pull     doujiangbaozi/sliverworkspace:1.10

docker-compose.yml配置文件

代码语言:javascript
复制
 version: "3"
 services:
   GATK:
     image: doujiangbaozi/sliverworkspace:1.10
     container_name: GATK
     volumes:
       - /home/sliver/Data/data:/opt/data:rw                                #挂载输入数据目录
       - /home/sliver/Manufacture/gatk/envs:/root/mambaforge-pypy3/envs     #挂载envs目录
       - /home/sliver/Manufacture/sliver/ref:/opt/ref:rw                    #挂载reference目录
       - /home/sliver/Manufacture/gatk/result:/opt/result:rw                #挂载中间文件和结果目录
     environment:
       - TZ=Asia/Shanghai                                                   #设置时区
       - PS=20191124                                                        #设置ssh密码
       - PT=9024                                                            #设置ssh连接端口

docker 镜像部署运行

代码语言:javascript
复制
 # 在docker-compose.yml文件同级目录下运行
 docker-compose up -d
 ​
 # 或者docker-compose -f docker-compose.yml所在目录
 docker-compose -f somedir/docker-compose.yml up -d
 ​
 # 可以通过ssh连接到docker 运行pipeline命令了,连接端口和密码见docker-compose.yml配置文件相关字段
 ssh root@127.0.0.1 -p9024

测试数据

测试数据来自2017年卫计委室间质评提供的bed文件(pipeline会自动下载)和测试数据,修改命名以匹配pipeline输入端,也可以替换为自己的数据文件,因为室间质评目前参考基因组还停留在hg19版本,所以本流程仍然使用hg19(GRCH37),如果要切换到hg38,可以将version_reference变量值设置为hg38,project_bed设置为Illumina_pt2_hg38.bed。pipeline会使用hg38(GRCH38)版本和对应的bed文件。

文件名(按照需要有调整)

文件大小

MD5

B1701_R1.fq.gz

4.85G

07d3cdccee41dbb3adf5d2e04ab28e5b

B1701_R2.fq.gz

4.77G

c2aa4a8ab784c77423e821b9f7fb00a7

B1701NC_R1.fq.gz

3.04G

4fc21ad05f9ca8dc93d2749b8369891b

B1701NC_R2.fq.gz

3.11G

bc64784f2591a27ceede1727136888b9

变量名称

变量名

变量值

备注

sn

1701

样本编号

pn

GS03

pipeline 代号 GATK Somatic 03版本

project_bed

Illumina_pt2.bed

参考基因组hg19下的bed文件

version_reference

hg19

人参考基因组版本hg19 / hg38

version_fastqc

0.12.1

fastqc 版本

version_multiqc

1.21

multiqc 版本

version_cnvkit

0.9.10

cnvkit 版本

version_manta

1.6.0

manta版本

version_gatk

4.3.0.0

gatk 版本

version_sambamba

1.0.1

sambamba 版本

version_samtools

1.17

samtools 版本

version_bwa

0.7.17

bwa 版本

version_fastp

0.23.2

fastp 版本

version_vep

108

vep 版本

envs

/root/mambaforge-pypy3/envs

mamba envs 目录

threads

32

最大线程数

memory

32G

内存占用

scatter

8

Mutect2 分拆并行运行interval list 份数

event

2

gatk FilterMutectCalls --max-events-in-region 数值

snv_tlod

16.00

snv 过滤参数 tload 值

snv_vaf

0.01

snv 过滤参数 丰度/突变频率

snv_depth

500

snv 过滤参数 支持reads数/depth 测序深度

cnv_dep

1000

cnv 过滤参数 支持reads数/depth 测序深度

cnv_min

-0.5

cnv 过滤参数 log2最小值

cnv_max

0.5

cnv 过滤参数 log2 最大值

sv_score

200

sv 过滤参数 score

Pipeline/workflow 具体步骤:

1. fastp 默认参数过滤,看下原始数据质量,clean data

代码语言:javascript
复制
 #conda检测环境是否存在,首次运行不存在创建该环境并安装软件
 if [ ! -d "${envs}/${pn}.fastp" ]; then
     echo "Creating the environment ${pn}.fastp"
     mamba create -n ${pn}.fastp -y fastp=${version_fastp} fastqc=${version_fastqc} multiqc=${version_multiqc}
 fi
 ​
 mamba   activate ${pn}.fastp
 ​
 mkdir -p ${result}/${sn}/trimmed
 mkdir -p ${result}/${sn}/qc
 ​
 fastqc -t ${threads}\
     ${data}/GS03/${sn}_tumor_R1.fq.gz \
     ${data}/GS03/${sn}_tumor_R2.fq.gz \
     -o ${result}/${sn}/qc &
 ​
 fastqc -t ${threads}\
     ${data}/GS03/${sn}_normal_R1.fq.gz \
     ${data}/GS03/${sn}_normal_R2.fq.gz \
     -o ${result}/${sn}/qc &
 ​
 fastp -w 16 \
     -i ${data}/GS03/${sn}_tumor_R1.fq.gz  \
     -I ${data}/GS03/${sn}_tumor_R2.fq.gz  \
     -o ${result}/${sn}/trimmed/${sn}_tumor_R1_trimmed.fq.gz \
     -O ${result}/${sn}/trimmed/${sn}_tumor_R2_trimmed.fq.gz \
     -h ${result}/${sn}/trimmed/${sn}_tumor_fastp.html \
     -j ${result}/${sn}/trimmed/${sn}_tumor_fastp.json &
 ​
 fastp -w 16 \
     -i ${data}/GS03/${sn}_normal_R1.fq.gz  \
     -I ${data}/GS03/${sn}_normal_R2.fq.gz  \
     -o ${result}/${sn}/trimmed/${sn}_normal_R1_trimmed.fq.gz \
     -O ${result}/${sn}/trimmed/${sn}_normal_R2_trimmed.fq.gz \
     -h ${result}/${sn}/trimmed/${sn}_normal_fastp.html \
     -j ${result}/${sn}/trimmed/${sn}_normal_fastp.json &
 wait
 ​
 fastqc -t ${threads}\
     ${result}/${sn}/trimmed/${sn}_tumor_R1_trimmed.fq.gz \
     ${result}/${sn}/trimmed/${sn}_tumor_R2_trimmed.fq.gz \
     -o ${result}/${sn}/qc &
 ​
 fastqc -t ${threads}\
     ${result}/${sn}/trimmed/${sn}_normal_R1_trimmed.fq.gz \
     ${result}/${sn}/trimmed/${sn}_normal_R2_trimmed.fq.gz \
     -o ${result}/${sn}/qc &
 ​
 wait
 ​
 mamba   deactivate

2. normal文件fastq比对到参考基因组,sort 排序,mark duplicate 得到 marked.bam

代码语言:javascript
复制
 #conda检测环境是否存在,首次运行不存在创建该环境并安装软件
 if [ ! -d "${envs}/${pn}.align" ]; then
     mamba create -n ${pn}.align -y bwa=${version_bwa} samtools=${version_samtools} 
 fi
 ​
 #从github下载sambamba static 比 mamba 安装的版本速度快1倍以上.这是个很诡异的地方
 if [ ! -f "${envs}/${pn}.align/bin/sambamba" ]; then
     aria2c https://github.com/biod/sambamba/releases/download/v${version_sambamba}/sambamba-${version_sambamba}-linux-amd64-static.gz -d ${envs}/${pn}.align/bin
     gzip -cdf ${envs}/${pn}.align/bin/sambamba-${version_sambamba}-linux-amd64-static.gz  >  ${envs}/${pn}.align/bin/sambamba 
     chmod a+x ${envs}/${pn}.align/bin/sambamba
 fi
 ​
 mamba   activate ${pn}.align
 ​
 if [ "${version_reference}" == hg19 ]; then
     echo    "USE reference Version : ${version_reference}"
     mkdir   -p /opt/ref/hg19
     #如果没有检测到参考基因组序列,则下载序列并使用bwa创建索引
     if [ ! -f "/opt/ref/hg19/hg19.fasta" ]; then
         aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/ucsc.hg19.fasta.gz -d /opt/ref/hg19 -o hg19.fasta.gz
         cd /opt/ref/hg19 && gzip -d /opt/ref/hg19/hg19.fasta.gz
     else
         if [ -f "/opt/ref/hg19/ucsc.hg19.fasta.gz.aria2" ]; then
             aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/ucsc.hg19.fasta.gz -c -d /opt/ref/hg19 -o hg19.fasta.gz
             cd /opt/ref/hg19 && gzip -d /opt/ref/hg19/hg19.fasta.gz
         fi
     fi
     
     if  [ ! -f /opt/ref/hg19/hg19.fasta.amb ] ||
         [ ! -f /opt/ref/hg19/hg19.fasta.ann ] ||
         [ ! -f /opt/ref/hg19/hg19.fasta.bwt ] ||
         [ ! -f /opt/ref/hg19/hg19.fasta.pac ] ||
         [ ! -f /opt/ref/hg19/hg19.fasta.sa ]; then
         bwa index /opt/ref/hg19/hg19.fasta
     fi
     #检测samtools索引是否存在,如不存在则使用samtools创建参考基因组索引
     if [ ! -f "/opt/ref/hg19/hg19.fasta.fai" ]; then
         samtools faidx /opt/ref/hg19/hg19.fasta
     fi
 elif [ "${version_reference}" == "hg38" ]; then
     echo    "USE reference Version : ${version_reference}"
     mkdir   -p /opt/ref/hg38
 #如果没有检测到参考基因组序列,则下载序列并使用bwa创建索引
     if [ ! -f "/opt/ref/hg38/hg38.fasta" ]; then
         aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Homo_sapiens_assembly38.fasta.gz -d /opt/ref/hg38 -o hg38.fasta.gz
         cd /opt/ref/hg38 && gzip -d /opt/ref/hg38/hg38.fasta.gz
     else
         if [ -f "/opt/ref/hg38/hg38.fasta.gz.aria2" ]; then
             aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Homo_sapiens_assembly38.fasta.gz -d /opt/ref/hg38 -d /opt/ref/hg38 -o hg38.fasta.gz
             cd /opt/ref/hg38 && gzip -d /opt/ref/hg38/hg38.fasta.gz
         fi
     fi
     
     if  [ ! -f /opt/ref/hg38/hg38.fasta.amb ] ||
         [ ! -f /opt/ref/hg38/hg38.fasta.ann ] ||
         [ ! -f /opt/ref/hg38/hg38.fasta.bwt ] ||
         [ ! -f /opt/ref/hg38/hg38.fasta.pac ] ||
         [ ! -f /opt/ref/hg38/hg38.fasta.sa ]; then
         bwa index /opt/ref/hg38/hg38.fasta
     fi
     #检测samtools索引是否存在,如不存在则使用samtools创建参考基因组索引
     if [ ! -f "/opt/ref/hg38/hg38.fasta.fai" ]; then
         samtools faidx /opt/ref/hg38/hg38.fasta
     fi
 fi
 ​
 ​
 ​
 ​
 mkdir -p ${result}/${sn}/aligned
 #比对基因组管道输出成bam文件,管道输出排序
 bwa mem \
     -t ${threads} -M \
     -R "@RG\\tID:${sn}_normal\\tLB:${sn}_normal\\tPL:Illumina\\tPU:Miseq\\tSM:${sn}_normal" \
     /opt/ref/${version_reference}/${version_reference}.fasta  ${result}/${sn}/trimmed/${sn}_normal_R1_trimmed.fq.gz ${result}/${sn}/trimmed/${sn}_normal_R2_trimmed.fq.gz \
     | sambamba view -S -f bam -l 0 /dev/stdin \
     | sambamba sort -t ${threads} -m 2G --tmpdir=${result}/${sn}/aligned -o ${result}/${sn}/aligned/${sn}_normal_sorted.bam /dev/stdin
 ​
 #防止linux打开文件句柄数超过限制,报错退出
 ulimit -n 10240
 ​
 #使用sambamba对sorted bam文件标记重复
 sambamba markdup \
     --tmpdir ${result}/${sn}/aligned \
     -t ${threads} ${result}/${sn}/aligned/${sn}_normal_sorted.bam ${result}/${sn}/aligned/${sn}_normal_marked.bam 
 ​
 #修改marked bam文件索引名,gatk和sambamba索引文件名需要保持一致
 mv  ${result}/${sn}/aligned/${sn}_normal_marked.bam.bai ${result}/${sn}/aligned/${sn}_normal_marked.bai
 #删除sorted bam文件
 rm -f ${result}/${sn}/aligned/${sn}_normal_sorted.bam*
 ​
 mamba   deactivate

3. tumor文件fastq比对到参考基因组,排序,mark duplicate 得到 marked.bam,与第2步类似

代码语言:javascript
复制
 if [ ! -d "${envs}/${pn}.align" ]; then
     mamba create -n ${pn}.align -y bwa=${version_bwa} samtools=${version_samtools} 
 fi
 ​
 #从github下载sambamba static 比 mamba 安装的版本速度快1倍以上.
 if [ ! -f "${envs}/${pn}.align/bin/sambamba" ]; then
     aria2c https://github.com/biod/sambamba/releases/download/v${version_sambamba}/sambamba-${version_sambamba}-linux-amd64-static.gz -d ${envs}/${pn}.align/bin
     gzip -cdf ${envs}/${pn}.align/bin/sambamba-${version_sambamba}-linux-amd64-static.gz  >  ${envs}/${pn}.align/bin/sambamba 
     chmod a+x ${envs}/${pn}.align/bin/sambamba
 fi
 ​
 mamba   activate ${pn}.align
 ​
 if [ "${version_reference}" == hg19 ]; then
     echo    "USE reference Version : ${version_reference}"
     mkdir   -p /opt/ref/hg19
     #如果没有检测到参考基因组序列,则下载序列并使用bwa创建索引
     if [ ! -f "/opt/ref/hg19/hg19.fasta" ]; then
         aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/ucsc.hg19.fasta.gz -d /opt/ref/hg19 -o hg19.fasta.gz
         cd /opt/ref/hg19 && gzip -d /opt/ref/hg19/hg19.fasta.gz
     else
         if [ -f "/opt/ref/hg19/ucsc.hg19.fasta.gz.aria2" ]; then
             aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/ucsc.hg19.fasta.gz -c -d /opt/ref/hg19 -o hg19.fasta.gz
             cd /opt/ref/hg19 && gzip -d /opt/ref/hg19/hg19.fasta.gz
         fi
     fi
     
     if  [ ! -f /opt/ref/hg19/hg19.fasta.amb ] ||
         [ ! -f /opt/ref/hg19/hg19.fasta.ann ] ||
         [ ! -f /opt/ref/hg19/hg19.fasta.bwt ] ||
         [ ! -f /opt/ref/hg19/hg19.fasta.pac ] ||
         [ ! -f /opt/ref/hg19/hg19.fasta.sa ]; then
         bwa index /opt/ref/hg19/hg19.fasta
     fi
     #检测samtools索引是否存在,如不存在则使用samtools创建参考基因组索引
     if [ ! -f "/opt/ref/hg19/hg19.fasta.fai" ]; then
         samtools faidx /opt/ref/hg19/hg19.fasta
     fi
 elif [ "${version_reference}" == "hg38" ]; then
     echo    "USE reference Version : ${version_reference}"
     mkdir   -p /opt/ref/hg38
     #如果没有检测到参考基因组序列,则下载序列并使用bwa创建索引
     if [ ! -f "/opt/ref/hg38/hg38.fasta" ]; then
         aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Homo_sapiens_assembly38.fasta.gz -d /opt/ref/hg38 -o hg38.fasta.gz
         cd /opt/ref/hg38 && gzip -d /opt/ref/hg38/hg38.fasta.gz
     else
         if [ -f "/opt/ref/hg38/hg38.fasta.gz.aria2" ]; then
             aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Homo_sapiens_assembly38.fasta.gz -d /opt/ref/hg38 -d /opt/ref/hg38 -o hg38.fasta.gz
             cd /opt/ref/hg38 && gzip -d /opt/ref/hg38/hg38.fasta.gz
         fi
     fi
     
     if  [ ! -f /opt/ref/hg38/hg38.fasta.amb ] ||
         [ ! -f /opt/ref/hg38/hg38.fasta.ann ] ||
         [ ! -f /opt/ref/hg38/hg38.fasta.bwt ] ||
         [ ! -f /opt/ref/hg38/hg38.fasta.pac ] ||
         [ ! -f /opt/ref/hg38/hg38.fasta.sa ]; then
         bwa index /opt/ref/hg38/hg38.fasta
     fi
     #检测samtools索引是否存在,如不存在则使用samtools创建参考基因组索引
     if [ ! -f "/opt/ref/hg38/hg38.fasta.fai" ]; then
         samtools faidx /opt/ref/hg38/hg38.fasta
     fi
 fi
 ​
 mkdir   -p ${result}/${sn}/aligned
 ​
 bwa mem \
     -t ${threads} -M \
     -R "@RG\\tID:${sn}_tumor\\tLB:${sn}_tumor\\tPL:Illumina\\tPU:Miseq\\tSM:${sn}_tumor" \
     /opt/ref/${version_reference}/${version_reference}.fasta  ${result}/${sn}/trimmed/${sn}_tumor_R1_trimmed.fq.gz ${result}/${sn}/trimmed/${sn}_tumor_R2_trimmed.fq.gz \
     | sambamba view -S -f bam -l 0 /dev/stdin \
     | sambamba sort -t ${threads} -m 2G --tmpdir=${result}/${sn}/aligned -o ${result}/${sn}/aligned/${sn}_tumor_sorted.bam /dev/stdin
 ulimit -n 10240
 sambamba  markdup \
     --tmpdir ${result}/${sn}/aligned \
     -t ${threads} ${result}/${sn}/aligned/${sn}_tumor_sorted.bam ${result}/${sn}/aligned/${sn}_tumor_marked.bam
     mv  ${result}/${sn}/aligned/${sn}_tumor_marked.bam.bai ${result}/${sn}/aligned/${sn}_tumor_marked.bai
     rm  -f ${result}/${sn}/aligned/${sn}_tumor_sorted.bam*
 ​
 mamba   deactivate

4. 对上述bam文件生成重新校准表,为后续BQSR使用;Generates recalibration table for Base Quality Score Recalibration (BQSR)

代码语言:javascript
复制
 #conda检测环境是否存在,首次运行不存在创建该环境并安装软件
 if [ ! -f "${envs}/${pn}.gatk/bin/gatk" ]; then
     mamba create -n ${pn}.gatk -y gatk4=${version_gatk} \
         r-base=3.6.2 \
         r-data.table=1.12.8 \
         r-dplyr=0.8.5 \
         r-getopt=1.20.3 \
         r-ggplot2=3.3.0 \
         r-gplots=3.0.3 \
         r-gsalib=2.1 \
         r-optparse=1.6.4 \
         r-backports=1.1.10 \
         tabix
 fi
 ​
 mamba activate ${pn}.gatk
 ​
 if [ "${version_reference}" == hg19 ]; then
     echo    "USE reference Version : ${version_reference}"
     #这里有个巨坑,broadinstitute ftp站点bundle目录提供的hg19版本参考文件,默认格式运行会报错,提示没有索引,使用gatk创建索引仍然报错,其实是gz格式需要使用bgzip重新压缩并且使用tabix创建索引才行
     if [ ! -f "/opt/ref/hg19/dbsnp_138.hg19.vcf.gz" ]; then
         aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/dbsnp_138.hg19.vcf.gz -d /opt/ref/hg19
     else 
         if [ -f "/opt/ref/hg19/dbsnp_138.hg19.vcf.gz.aria2" ]; then
             echo 'download continue...'
             aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/dbsnp_138.hg19.vcf.gz -c -d /opt/ref/hg19
         fi
     fi
     
     if [ ! -f "/opt/ref/hg19/dbsnp_138.hg19.vcf.gz.tbi" ]; then
         gzip -k -f -d /opt/ref/hg19/dbsnp_138.hg19.vcf.gz > /opt/ref/hg19/dbsnp_138.hg19.vcf
         bgzip -f --threads ${threads} /opt/ref/hg19/dbsnp_138.hg19.vcf > /opt/ref/hg19/dbsnp_138.hg19.vcf.gz
         tabix -f /opt/ref/hg19/dbsnp_138.hg19.vcf.gz
     fi
 ​
     if [ ! -f "/opt/ref/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz" ]; then
         aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz -d /opt/ref/hg19
     else 
         if [ -f "/opt/ref/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz.aria2" ]; then
             echo 'download continue...'
             aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz -c -d /opt/ref/hg19
         fi
     fi
     
     if [ ! -f "/opt/ref/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz.tbi" ]; then
         gzip -k -f -d /opt/ref/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz > /opt/ref/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf
         bgzip -f --threads ${threads} /opt/ref/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf > /opt/ref/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz
         tabix -f /opt/ref/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz
     fi
 ​
     if [ ! -f "/opt/ref/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf.gz" ]; then
         aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf.gz -d /opt/ref/hg19
     else 
         if [ -f "/opt/ref/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf.gz.aria2" ]; then
             echo 'download continue...'
             aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf.gz -c -d /opt/ref/hg19
         fi
     fi
     
     if [ ! -f "/opt/ref/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf.gz.tbi" ]; then
         gzip -k -f -d /opt/ref/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf.gz > /opt/ref/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf
         bgzip -f --threads ${threads} /opt/ref/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf > /opt/ref/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf.gz
         tabix -f /opt/ref/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf.gz
     fi
 ​
     if [ ! -f "/opt/ref/hg19/1000G_phase1.indels.hg19.sites.vcf.gz" ]; then
         aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/1000G_phase1.indels.hg19.sites.vcf.gz -d /opt/ref/
     else 
         if [ -f "/opt/ref/hg19/1000G_phase1.indels.hg19.sites.vcf.gz.aria2" ]; then
             echo 'download continue...'
             aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/1000G_phase1.indels.hg19.sites.vcf.gz -c -d /opt/ref/hg19
         fi
     fi
     
     if [ ! -f "/opt/ref/hg19/1000G_phase1.indels.hg19.sites.vcf.gz.tbi" ]; then
         gzip -k -f -d /opt/ref/hg19/1000G_phase1.indels.hg19.sites.vcf.gz > /opt/ref/hg19/1000G_phase1.indels.hg19.sites.vcf
         bgzip -f --threads ${threads} /opt/ref/hg19/1000G_phase1.indels.hg19.sites.vcf > /opt/ref/hg19/1000G_phase1.indels.hg19.sites.vcf.gz
         tabix -f /opt/ref/hg19/1000G_phase1.indels.hg19.sites.vcf.gz
     fi
     #创建参考序列hg19的dict字典文件
     if [ ! -f "/opt/ref/hg19/hg19.dict" ]; then
         gatk CreateSequenceDictionary -R /opt/ref/hg19/hg19.fasta -O /opt/ref/hg19/hg19.dict
     fi
     
     if [ ! -f "/opt/ref/${version_reference}/${project_bed}" ]; then
         #根据下载的Illumina_pt2.bed 文件创建interval list文件,坐标转换,起始坐标0修改为1
         #sed 's/chr//; s/\t/ /g' /opt/ref/hg19/Illumina_pt2.bed > /opt/ref/hg19/Illumina_pt2.processed.bed
         mkdir  -p /opt/ref/${version_reference}
         aria2c https://raw.githubusercontent.com/doujiangbaozi/sliverworkspace-util/main/somatic/projects/${project_bed} -d /opt/ref/hg19
         if [ ! -f "/opt/ref/hg19/${project_bed}.interval_list" ]; then
             gatk BedToIntervalList \
                 -I  /opt/ref/hg19/${project_bed} \
                 -SD /opt/ref/hg19/hg19.dict \
                 -O  /opt/ref/hg19/${project_bed}.interval_list
         fi
     fi
     
 ​
 ​
     mkdir -p ${result}/${sn}/recal
 ​
     gatk BaseRecalibrator \
             --known-sites /opt/ref/hg19/dbsnp_138.hg19.vcf.gz \
             --known-sites /opt/ref/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz \
             --known-sites /opt/ref/hg19/1000G_phase1.indels.hg19.sites.vcf.gz \
             -L /opt/ref/hg19/${project_bed}.interval_list \
             -R /opt/ref/hg19/hg19.fasta \
             -I ${result}/${sn}/aligned/${sn}_tumor_marked.bam \
             -O ${result}/${sn}/recal/${sn}_tumor_recal.table &
 ​
     gatk BaseRecalibrator \
             --known-sites /opt/ref/hg19/dbsnp_138.hg19.vcf.gz \
             --known-sites /opt/ref/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz \
             --known-sites /opt/ref/hg19/1000G_phase1.indels.hg19.sites.vcf.gz \
             -L /opt/ref/hg19/${project_bed}.interval_list \
             -R /opt/ref/hg19/hg19.fasta \
             -I ${result}/${sn}/aligned/${sn}_normal_marked.bam \
             -O ${result}/${sn}/recal/${sn}_normal_recal.table &
 ​
     wait
 elif [ "${version_reference}" == "hg38" ]; then
     echo    "USE reference Version : ${version_reference}"
     if [ ! -f "/opt/ref/hg38/dbsnp_144.hg38.vcf.gz" ]; then
         aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/dbsnp_144.hg38.vcf.gz -d /opt/ref/hg38
     else 
         if [ -f "/opt/ref/hg38/dbsnp_144.hg38.vcf.gz.aria2" ]; then
             echo 'download continue...'
             aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/dbsnp_144.hg38.vcf.gz -c -d /opt/ref/hg38
         fi
     fi
     if [ ! -f "/opt/ref/hg38/dbsnp_144.hg38.vcf.gz.tbi" ]; then
         aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/dbsnp_144.hg38.vcf.gz.tbi -d /opt/ref/hg38
     else 
         if [ -f "/opt/ref/hg38/dbsnp_144.hg38.vcf.gz.tbi.aria2" ]; then
             echo 'download continue...'
             aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/dbsnp_144.hg38.vcf.gz.tbi -c -d /opt/ref/hg38
         fi
     fi
 ​
     if [ ! -f "/opt/ref/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz" ]; then
         aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz -d /opt/ref/hg38
     else 
         if [ -f "/opt/ref/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.aria2" ]; then
             echo 'download continue...'
             aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz -c -d /opt/ref/hg38
         fi
     fi
     if [ ! -f "/opt/ref/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi" ]; then
         aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi -d /opt/ref/hg38
     else 
         if [ -f "/opt/ref/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi.aria2" ]; then
             echo 'download continue...'
             aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi -c -d /opt/ref/hg38
         fi
     fi
 ​
     if [ ! -f "/opt/ref/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz" ]; then
         aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz -d /opt/ref/hg38
     else 
         if [ -f "/opt/ref/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz.aria2" ]; then
             echo 'download continue...'
             aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz -c -d /opt/ref/hg38
         fi
     fi
     if [ ! -f "/opt/ref/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi" ]; then
         aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi -d /opt/ref/hg38
     else 
         if [ -f "/opt/ref/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi.aria2" ]; then
             echo 'download continue...'
             aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi -c -d /opt/ref/hg38
         fi
     fi
     #创建参考序列hg38的dict字典文件
     if [ ! -f "/opt/ref/hg38/hg38.dict" ]; then
         gatk CreateSequenceDictionary -R /opt/ref/hg38/hg38.fasta -O /opt/ref/hg38/hg38.dict
     fi
     
     if [ ! -f "/opt/ref/${version_reference}/${project_bed}" ]; then
         #根据下载的Illumina_pt2.bed 文件创建interval list文件,坐标转换,起始坐标0修改为1
         #sed 's/chr//; s/\t/ /g' /opt/ref/hg38/Illumina_pt2.bed > /opt/ref/hg38/Illumina_pt2.processed.bed
         mkdir  -p /opt/ref/hg38
         aria2c https://raw.githubusercontent.com/doujiangbaozi/sliverworkspace-util/main/somatic/projects/${project_bed} -d /opt/ref/hg38 -o ${project_bed}
         if [ ! -f "/opt/ref/hg38/${project_bed}.interval_list" ]; then
             gatk BedToIntervalList \
                 -I  /opt/ref/hg38/${project_bed} \
                 -SD /opt/ref/hg38/hg38.dict \
                 -O  /opt/ref/hg38/${project_bed}.interval_list
         fi
     fi
     
     mkdir -p ${result}/${sn}/recal
 ​
     gatk BaseRecalibrator \
             --known-sites /opt/ref/hg38/dbsnp_144.hg38.vcf.gz \
             --known-sites /opt/ref/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
             --known-sites /opt/ref/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz \
             -L /opt/ref/hg38/${project_bed}.interval_list \
             -R /opt/ref/hg38/hg38.fasta \
             -I ${result}/${sn}/aligned/${sn}_tumor_marked.bam \
             -O ${result}/${sn}/recal/${sn}_tumor_recal.table &
 ​
     gatk BaseRecalibrator \
             --known-sites /opt/ref/hg38/dbsnp_144.hg38.vcf.gz \
             --known-sites /opt/ref/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
             --known-sites /opt/ref/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz \
             -L /opt/ref/hg38/${project_bed}.interval_list \
             -R /opt/ref/hg38/hg38.fasta \
             -I ${result}/${sn}/aligned/${sn}_normal_marked.bam \
             -O ${result}/${sn}/recal/${sn}_normal_recal.table &
 ​
     wait
 fi
 ​
 ​
 •    
 mamba deactivate

5. 使用校准表对bam碱基质量校准,因为这一步gatk效率感人,所以同时计算insertsize,拆分interval list(后续mutect2并行运行需要),运行cnvkit batch,运行samtools depth计算测序深度,samtools flagstat 统计mapping比例及质量

代码语言:javascript
复制
 mkdir -p ${result}/${sn}/bqsr
 mkdir -p ${result}/${sn}/stat
 mkdir -p ${result}/${sn}/cnv
 mkdir -p ${result}/${sn}/interval
 ​
 mamba activate ${pn}.gatk
 ​
 gatk ApplyBQSR \
     --bqsr-recal-file ${result}/${sn}/recal/${sn}_tumor_recal.table \
     -L /opt/ref/${version_reference}/${project_bed}.interval_list \
     -R /opt/ref/${version_reference}/${version_reference}.fasta \
     -I ${result}/${sn}/aligned/${sn}_tumor_marked.bam \
     -O ${result}/${sn}/bqsr/${sn}_tumor_bqsr.bam &
 ​
 ​
 •    
 gatk ApplyBQSR \
     --bqsr-recal-file ${result}/${sn}/recal/${sn}_normal_recal.table \
     -L /opt/ref/${version_reference}/${project_bed}.interval_list \
     -R /opt/ref/${version_reference}/${version_reference}.fasta \
     -I ${result}/${sn}/aligned/${sn}_normal_marked.bam \
     -O ${result}/${sn}/bqsr/${sn}_normal_bqsr.bam &
 ​
 ​
 •    
 gatk CollectInsertSizeMetrics \
     -I ${result}/${sn}/aligned/${sn}_tumor_marked.bam \
     -O ${result}/${sn}/stat/${sn}_tumor_insertsize_metrics.txt \
     -H ${result}/${sn}/stat/${sn}_tumor_insertsize_histogram.pdf &
 ​
 ​
 •    
 gatk CollectInsertSizeMetrics \
     -I ${result}/${sn}/aligned/${sn}_normal_marked.bam \
     -O ${result}/${sn}/stat/${sn}_normal_insertsize_metrics.txt \
     -H ${result}/${sn}/stat/${sn}_normal_insertsize_histogram.pdf &
 ​
 rm -f ${result}/${sn}/interval/*.interval_list
 gatk SplitIntervals \
     -L /opt/ref/${version_reference}/${project_bed}.interval_list \
     -R /opt/ref/${version_reference}/${version_reference}.fasta \
     -O ${result}/${sn}/interval \
     --scatter-count ${scatter} &
 ​
 mamba deactivate
 ​
 if [ ! -d "${envs}/${pn}.cnvkit" ]; then
     mamba create -n ${pn}.cnvkit -y cnvkit=${version_cnvkit}
 fi
 ​
 if [ ! -f "/opt/ref/${version_reference}/refFlat.txt" ]; then
     aria2c -x 16 -s 16 http://hgdownload.soe.ucsc.edu/goldenPath/${version_reference}/database/refFlat.txt.gz -d /opt/ref/${version_reference}
     cd /opt/ref/${version_reference} && gzip -d refFlat.txt.gz
 fi
 ​
 mamba activate ${pn}.cnvkit
 ​
 rm -f ${result}/${sn}/cnv/${sn}_reference.cnn
 ​
 cnvkit.py batch \
     ${result}/${sn}/aligned/${sn}_tumor_marked.bam \
     --normal ${result}/${sn}/aligned/${sn}_normal_marked.bam \
     --method hybrid \
     --fasta  /opt/ref/${version_reference}/${version_reference}.fasta \
     --targets /opt/ref/${version_reference}/${project_bed} \
     --annotate /opt/ref/${version_reference}/refFlat.txt \
     --output-reference ${result}/${sn}/cnv/${sn}_reference.cnn \
     --output-dir ${result}/${sn}/cnv/ \
     --diagram \
     -p ${threads} &
 ​
 mamba deactivate
 ​
 mamba activate ${pn}.align
 ​
 samtools depth -a -b /opt/ref/${version_reference}/Illumina_pt2.bed  --threads ${threads} \
     ${result}/${sn}/aligned/${sn}_tumor_marked.bam > \
     ${result}/${sn}/stat/${sn}_tumor_marked.depth  &
 ​
 samtools depth -a -b /opt/ref/${version_reference}/Illumina_pt2.bed  --threads ${threads} \
     ${result}/${sn}/aligned/${sn}_normal_marked.bam > \
     ${result}/${sn}/stat/${sn}_normal_marked.depth &
 ​
 samtools flagstat --threads ${threads} \
     ${result}/${sn}/aligned/${sn}_tumor_marked.bam  > \
     ${result}/${sn}/stat/${sn}_tumor_marked.flagstat   &
 ​
 samtools flagstat --threads ${threads} \
     ${result}/${sn}/aligned/${sn}_normal_marked.bam > \
     ${result}/${sn}/stat/${sn}_normal_marked.flagstat &
     
 mamba deactivate
 ​
 wait

6. 计算堆叠数据( pileup metrics )以便后续评估污染,也可以根据拆分的interval list并行处理,处理之后合并。

代码语言:javascript
复制
 #官方巨坑,默认提供的small_exac_common_3_b37.vcf.gz默认染色体坐标不是以chr开头而是数字
 mamba activate ${pn}.gatk
 ​
 if [ "${version_reference}" == hg19 ]; then
     echo    "USE reference Version : ${version_reference}"
     #这里有个巨坑,从broadinstitute ftp 站点bundle Mutect2目录下载的参考文件,与同样下载的参考序列基因组坐标系不一致,参考基因组参考序列是chr1这种格式,这个af-only-gnomad是1,2,3这种格式,需要编写脚本处理
     if [ ! -f "/opt/ref/hg19/small_exac_common_3_b37.processed.vcf.gz" ]; then
         aria2c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/Mutect2/GetPileupSummaries/small_exac_common_3_b37.vcf.gz -d /opt/ref/hg19
     else
         if [ -f "/opt/ref/hg19/small_exac_common_3_b37.vcf.gz.aria2" ]; then
             aria2c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/Mutect2/GetPileupSummaries/small_exac_common_3_b37.vcf.gz -c -d /opt/ref/hg19
         fi
     fi
     if [ ! -f "/opt/ref/${version_reference}/small_exac_common_3_b37.processed.vcf.gz" ]; then
		if [ ! -f "${envs}/VcfProcessUtil.py" ]; then
			aria2c https://raw.fgit.cf/doujiangbaozi/sliverworkspace-util/main/somatic/VcfProcessUtil.py -d ${envs}/
			#aria2c https://raw.githubusercontent.com/doujiangbaozi/sliverworkspace-util/main/somatic/VcfProcessUtil.py -d ${envs}/
			chmod a+x ${envs}/VcfProcessUtil.py
		fi
		mamba activate ${pn}.cnvkit
		${envs}/VcfProcessUtil.py \
			-f /opt/ref/${version_reference}/small_exac_common_3_b37.vcf.gz \
			-o /opt/ref/${version_reference}/small_exac_common_3_b37.processed.vcf
		mamba deactivate
		mamba activate ${pn}.gatk
		cd /opt/ref/${version_reference}
		bgzip -f --threads ${threads} small_exac_common_3_b37.processed.vcf
		tabix -f small_exac_common_3_b37.processed.vcf.gz
		mamba deactivate
     fi
	 mamba activate ${pn}.gatk
     
     for i in `ls ${result}/${sn}/interval/*.interval_list`;
     do
         echo $i
         gatk GetPileupSummaries \
             -R /opt/ref/${version_reference}/${version_reference}.fasta \
             -I ${result}/${sn}/bqsr/${sn}_tumor_bqsr.bam \
             -O ${i%.*}-tumor-pileups.table \
             -V /opt/ref/${version_reference}/small_exac_common_3_b37.processed.vcf.gz \
             -L $i \
             --interval-set-rule INTERSECTION &
 ​
         gatk GetPileupSummaries \
             -R /opt/ref/${version_reference}/${version_reference}.fasta \
             -I ${result}/${sn}/bqsr/${sn}_normal_bqsr.bam \
             -O ${i%.*}-normal-pileups.table \
             -V /opt/ref/${version_reference}/small_exac_common_3_b37.processed.vcf.gz \
             -L $i \
             --interval-set-rule INTERSECTION &
 ​
     done
     wait
     mamba deactivate
 elif [ "${version_reference}" == "hg38" ]; then
     echo    "USE reference Version : ${version_reference}"
     if [ ! -f "/opt/ref/${version_reference}/small_exac_common_3.hg38.vcf.gz" ]; then
         aria2c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/Mutect2/GetPileupSummaries/small_exac_common_3.hg38.vcf.gz -d /opt/ref/${version_reference}
     else
         if [ -f "/opt/ref/hg19/small_exac_common_3_b37.vcf.gz.aria2" ]; then
             aria2c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/Mutect2/GetPileupSummaries/small_exac_common_3.hg38.vcf.gz -c -d /opt/ref/${version_reference}
         fi
     fi
     if [ ! -f "/opt/ref/${version_reference}/small_exac_common_3.hg38.vcf.gz.tbi" ]; then
         aria2c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/Mutect2/GetPileupSummaries/small_exac_common_3.hg38.vcf.gz.tbi -d /opt/ref/${version_reference}
     else
         if [ -f "/opt/ref/hg19/small_exac_common_3_b37.vcf.gz.tbi.aria2" ]; then
             aria2c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/Mutect2/GetPileupSummaries/small_exac_common_3.hg38.vcf.gz.tbi -c -d /opt/ref/${version_reference}
         fi
     fi
     mamba activate ${pn}.gatk
     for i in `ls ${result}/${sn}/interval/*.interval_list`;
     do
         echo $i
         gatk GetPileupSummaries \
             -R /opt/ref/${version_reference}/${version_reference}.fasta \
             -I ${result}/${sn}/bqsr/${sn}_tumor_bqsr.bam \
             -O ${i%.*}-tumor-pileups.table \
             -V /opt/ref/${version_reference}/small_exac_common_3.hg38.vcf.gz \
             -L $i \
             --interval-set-rule INTERSECTION &
 ​
         gatk GetPileupSummaries \
             -R /opt/ref/${version_reference}/${version_reference}.fasta \
             -I ${result}/${sn}/bqsr/${sn}_normal_bqsr.bam \
             -O ${i%.*}-normal-pileups.table \
             -V /opt/ref/${version_reference}/small_exac_common_3.hg38.vcf.gz \
             -L $i \
             --interval-set-rule INTERSECTION &
 ​
     done
     wait
     mamba deactivate
 fi
 ​
 ​
 mamba activate ${pn}.gatk
 tables=
 for i in `ls ${result}/${sn}/interval/*-tumor-pileups.table`;
 do
     tables="$tables -I $i"
 done
 ​
 gatk GatherPileupSummaries \
     --sequence-dictionary /opt/ref/${version_reference}/${version_reference}.dict \
     $tables \
     -O ${result}/${sn}/stat/${sn}_tumor_pileups.table
 ​
 nctables=
 for i in `ls ${result}/${sn}/interval/*-normal-pileups.table`;
 do
     nctables="$nctables -I $i"
 done
 ​
 gatk GatherPileupSummaries \
     --sequence-dictionary /opt/ref/${version_reference}/${version_reference}.dict \
     $nctables \
     -O ${result}/${sn}/stat/${sn}_normal_pileups.table
     
 mamba deactivate

7. 使用GetPileupSummaries计算结果评估跨样本污染,结果用于后面 FilterMutectCall 过滤Mutect2输出结果

代码语言:javascript
复制
 mamba activate ${pn}.gatk
 ​
 gatk CalculateContamination \
     -matched ${result}/${sn}/stat/${sn}_normal_pileups.table \
     -I ${result}/${sn}/stat/${sn}_tumor_pileups.table \
     -O ${result}/${sn}/stat/${sn}_contamination.table
     
 mamba deactivate

8. Mutect2 call 突变,使用拆分的interval list,结束后将结果合并;同时并行运行manta call sv突变

代码语言:javascript
复制
 mkdir -p ${result}/${sn}/sv
 mkdir -p ${result}/${sn}/snv
 ​
 mamba activate ${pn}.gatk
 ​
 if [ "${version_reference}" == hg19 ]; then
     echo    "USE reference Version : ${version_reference}"
     #这里有个巨坑,从broadinstitute ftp 站点bundle Mutect2目录下载的参考文件,与同样下载的参考序列基因组坐标系不一致,参考基因组参考序列是chr1这种格式,这个af-only-gnomad是1,2,3这种格式,需要编写脚本处理;hg38貌似没有这个问题,hg19的数据都不维护了么?
     if [ ! -f "/opt/ref/hg19/af-only-gnomad.raw.sites.b37.vcf.gz" ]; then
         aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/Mutect2/af-only-gnomad.raw.sites.b37.vcf.gz -d /opt/ref/hg19
     else
         if [ -f "/opt/ref/hg19/af-only-gnomad.raw.sites.b37.vcf.gz.aria2" ]; then
             aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/Mutect2/af-only-gnomad.raw.sites.b37.vcf.gz -c -d /opt/ref/hg19
         fi
         if [ ! -f "${envs}/VcfProcessUtil.py" ]; then
             aria2c https://raw.fgit.cf/doujiangbaozi/sliverworkspace-util/main/somatic/VcfProcessUtil.py -d ${envs}/
             #aria2c https://raw.githubusercontent.com/doujiangbaozi/sliverworkspace-util/main/somatic/VcfProcessUtil.py -d ${envs}/
             chmod a+x ${envs}/VcfProcessUtil.py
         fi
         mamba activate ${pn}.cnvkit
         ${envs}/VcfProcessUtil.py \
             -f /opt/ref/hg19/af-only-gnomad.raw.sites.b37.vcf.gz \
             -o /opt/ref/hg19/af-only-gnomad.raw.sites.b37.processed.vcf
         mamba deactivate
         mamba activate ${pn}.gatk
         cd /opt/ref/hg19
         bgzip -f --threads ${threads} af-only-gnomad.raw.sites.b37.processed.vcf
         tabix -f af-only-gnomad.raw.sites.b37.processed.vcf.gz
         mamba deactivate
     fi
 elif [ "${version_reference}" == "hg38" ]; then
     echo    "USE reference Version : ${version_reference}"
     if [ ! -f "/opt/ref/${version_reference}/af-only-gnomad.hg38.vcf.gz" ]; then
         aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/Mutect2/af-only-gnomad.hg38.vcf.gz -d /opt/ref/${version_reference}
     else
         if [ -f "/opt/ref/${version_reference}/af-only-gnomad.hg38.vcf.gz.aria2" ]; then
             aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/Mutect2/af-only-gnomad.hg38.vcf.gz -c -d /opt/ref/${version_reference}
         fi
     fi
     if [ ! -f "/opt/ref/${version_reference}/af-only-gnomad.hg38.vcf.gz.tbi" ]; then
         aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/Mutect2/af-only-gnomad.hg38.vcf.gz.tbi -d /opt/ref/${version_reference}
     else
         if [ -f "/opt/ref/${version_reference}/af-only-gnomad.hg38.vcf.gz.tbi.aria2" ]; then
             aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/Mutect2/af-only-gnomad.hg38.vcf.gz.tbi -c -d /opt/ref/${version_reference}
         fi
     fi
 fi
 ​
 ​
 •    
 mamba activate ${pn}.gatk
 if [ ! -f "/opt/ref/${version_reference}/${project_bed}.gz" ]; then
     bgzip -f -c /opt/ref/${version_reference}/${project_bed} > /opt/ref/hg19/${project_bed}.gz
     tabix -f -p bed /opt/ref/${version_reference}/${project_bed}.gz
 else
     if [ ! -f "/opt/ref/${version_reference}/${project_bed}.gz.tbi" ]; then
         tabix -f -p bed /opt/ref/${version_reference}/${project_bed}.gz
     fi
 fi
 mamba deactivate
 ​
 if [ ! -d "${envs}/${pn}.manta" ]; then
     mamba create -n ${pn}.manta -y manta=1.6.0
 fi
 ​
 mamba activate ${pn}.manta
 ​
 rm -f ${result}/${sn}/sv/runWorkflow.py*
 configManta.py  \
     --normalBam ${result}/${sn}/bqsr/${sn}_normal_bqsr.bam \
     --tumorBam  ${result}/${sn}/bqsr/${sn}_tumor_bqsr.bam \
     --referenceFasta /opt/ref/${version_reference}/${version_reference}.fasta \
     --exome \
     --callRegions /opt/ref/${version_reference}/${project_bed}.gz \
     --runDir ${result}/${sn}/sv
 ​
 rm -rf ${result}/${sn}/sv/workspace
 python ${result}/${sn}/sv/runWorkflow.py -m local -j ${threads} &
 ​
 mamba deactivate
 ​
 mamba activate ${pn}.gatk
 ​
 if [ "${version_reference}" == hg19 ]; then
     echo    "USE reference Version : ${version_reference}"
     rm -f ${result}/${sn}/snv/vcf-file.list
     touch ${result}/${sn}/snv/vcf-file.list
     for i in `ls ${result}/${sn}/interval/*.interval_list`;
     do
         rm -f ${i%.*}_bqsr.vcf.gz
         gatk Mutect2 \
             -R /opt/ref/${version_reference}/${version_reference}.fasta \
             -I ${result}/${sn}/bqsr/${sn}_tumor_bqsr.bam  -tumor  ${sn}_tumor  \
             -I ${result}/${sn}/bqsr/${sn}_normal_bqsr.bam -normal ${sn}_normal \
             -L $i \
             -O ${i%.*}_bqsr.vcf.gz \
             --max-mnp-distance 10 \
             --germline-resource /opt/ref/${version_reference}/af-only-gnomad.raw.sites.b37.processed.vcf.gz \
             --native-pair-hmm-threads ${threads} &
         echo ${i%.*}_bqsr.vcf.gz >> ${result}/${sn}/snv/vcf-file.list
     done
 elif [ "${version_reference}" == "hg38" ]; then
     echo    "USE reference Version : ${version_reference}"
     rm -f ${result}/${sn}/snv/vcf-file.list
     touch ${result}/${sn}/snv/vcf-file.list
     for i in `ls ${result}/${sn}/interval/*.interval_list`;
     do
         rm -f ${i%.*}_bqsr.vcf.gz
         gatk Mutect2 \
             -R /opt/ref/${version_reference}/${version_reference}.fasta \
             -I ${result}/${sn}/bqsr/${sn}_tumor_bqsr.bam  -tumor  ${sn}_tumor  \
             -I ${result}/${sn}/bqsr/${sn}_normal_bqsr.bam -normal ${sn}_normal \
             -L $i \
             -O ${i%.*}_bqsr.vcf.gz \
             --max-mnp-distance 10 \
             --germline-resource /opt/ref/${version_reference}/af-only-gnomad.hg38.vcf.gz \
             --native-pair-hmm-threads ${threads} &
         echo ${i%.*}_bqsr.vcf.gz >> ${result}/${sn}/snv/vcf-file.list
     done
 fi
 wait
 ​
 •    
 rm -f ${result}/${sn}/snv/${sn}_bqsr.vcf.gz.stats
 stats=
 for z in `ls ${result}/${sn}/interval/*_bqsr.vcf.gz.stats`;
 do
     stats="$stats -stats $z"
 done
 ​
 gatk MergeMutectStats $stats \
     -O ${result}/${sn}/snv/${sn}_bqsr.vcf.gz.stats
 ​
 gatk MergeVcfs \
     -I ${result}/${sn}/snv/vcf-file.list \
     -O ${result}/${sn}/snv/${sn}_bqsr.vcf.gz
     
 mamba deactivate

9. FilterMutectCalls 对Mutect结果突变过滤

代码语言:javascript
复制
 mamba activate ${pn}.gatk
 ​
 gatk FilterMutectCalls \
     --max-events-in-region ${event} \
     --contamination-table ${result}/${sn}/stat/${sn}_contamination.table \
     -R /opt/ref/${version_reference}/${version_reference}.fasta \
     -V ${result}/${sn}/snv/${sn}_bqsr.vcf.gz \
     -O ${result}/${sn}/snv/${sn}_filtered.vcf.gz
     
 mamba deactivate

10. 使用Vep注释过滤结果

代码语言:javascript
复制
 #conda检测环境是否存在,首次运行不存在创建该环境并安装软件
 if [ ! -d "${envs}/${pn}.vep" ]; then
     echo "Creating the environment ${pn}.vep"
     mamba create -n ${pn}.vep -y ensembl-vep=${version_vep}
 fi
 ​
 mkdir -p /opt/result/${sn}/vcf
 ​
 if [ "${version_reference}" == hg19 ]; then
 echo    "USE reference Version : ${version_reference}"
     #检测vep注释数据库是否存在如果不存在则先下载
     if [ ! -d "/opt/ref/vep-cache/homo_sapiens/${version_vep}_GRCh37" ]; then
         aria2c -x 16 -s 48 https://ftp.ensembl.org/pub/release-${version_vep}/variation/indexed_vep_cache/homo_sapiens_vep_${version_vep}_GRCh37.tar.gz -d /opt/ref/
         tar -zxvf /opt/ref/homo_sapiens_vep_${version_vep}_GRCh37.tar.gz -C /opt/ref/vep-cache/
     elif [ -f "/opt/ref/homo_sapiens_vep_${version_vep}_GRCh37.tar.gz.aria2" ]; then
         aria2c -x 16 -s 48 https://ftp.ensembl.org/pub/release-${version_vep}/variation/indexed_vep_cache/homo_sapiens_vep_${version_vep}_GRCh37.tar.gz -c -d /opt/ref/
         tar -zxvf /opt/ref/homo_sapiens_vep_${version_vep}_GRCh37.tar.gz -C /opt/ref/vep-cache/
     fi
 ​
     if [ ! -d "/opt/ref/vep-cache/homo_sapiens_refseq/${version_vep}_GRCh37" ]; then
         aria2c -x 16 -s 48 http://ftp.ensembl.org/pub/release-${version_vep}/variation/vep/homo_sapiens_refseq_vep_${version_vep}_GRCh37.tar.gz -d /opt/ref/
         tar -zxvf /opt/ref/homo_sapiens_refseq_vep_${version_vep}_GRCh37.tar.gz -C /opt/ref/vep-cache/
     elif [ -f "/opt/ref/homo_sapiens_refseq_vep_${version_vep}_GRCh37.tar.gz.aria2" ]; then
         aria2c -x 16 -s 48 http://ftp.ensembl.org/pub/release-${version_vep}/variation/vep/homo_sapiens_refseq_vep_${version_vep}_GRCh37.tar.gz -c -d /opt/ref/
         tar -zxvf /opt/ref/homo_sapiens_refseq_vep_${version_vep}_GRCh37.tar.gz -C /opt/ref/vep-cache/
     fi
     
     mamba activate ${pn}.vep
 ​
     mkdir -p ${result}/${sn}/annotation
     vep \
         -i ${result}/${sn}/snv/${sn}_filtered.vcf.gz  \
         -o ${result}/${sn}/annotation/${sn}_filtered_vep.tsv \
         --offline \
         --cache \
         --cache_version ${version_vep} \
         --everything \
         --dir_cache /opt/ref/vep-cache/ \
         --dir_plugins /opt/ref/vep-cache/Plugins \
         --species homo_sapiens \
         --assembly GRCh37 \
         --fasta /opt/ref/${version_reference}/${version_reference}.fasta \
         --refseq \
         --force_overwrite \
         --format vcf \
         --tab \
         --shift_3prime 1  \
         --offline
 ​
     mamba deactivate
     
 elif [ "${version_reference}" == "hg38" ]; then
     echo    "USE reference Version : ${version_reference}"
     #检测vep注释数据库是否存在如果不存在则先下载
     if [ ! -d "/opt/ref/vep-cache/homo_sapiens/${version_vep}_GRCh38" ]; then
         aria2c -x 16 -s 48 https://ftp.ensembl.org/pub/release-${version_vep}/variation/indexed_vep_cache/homo_sapiens_vep_${version_vep}_GRCh38.tar.gz -d /opt/ref/
         tar -zxvf /opt/ref/homo_sapiens_vep_${version_vep}_GRCh38.tar.gz -C /opt/ref/vep-cache/
     elif [ -f "/opt/ref/homo_sapiens_vep_${version_vep}_GRCh38.tar.gz.aria2" ]; then
         aria2c -x 16 -s 48 https://ftp.ensembl.org/pub/release-${version_vep}/variation/indexed_vep_cache/homo_sapiens_vep_${version_vep}_GRCh38.tar.gz -c -d /opt/ref/
         tar -zxvf /opt/ref/homo_sapiens_vep_${version_vep}_GRCh38.tar.gz -C /opt/ref/vep-cache/
     fi
 ​
     if [ ! -d "/opt/ref/vep-cache/homo_sapiens_refseq/${version_vep}_GRCh38" ]; then
         aria2c -x 16 -s 48 http://ftp.ensembl.org/pub/release-${version_vep}/variation/vep/homo_sapiens_refseq_vep_${version_vep}_GRCh38.tar.gz -d /opt/ref/
         tar -zxvf /opt/ref/homo_sapiens_refseq_vep_${version_vep}_GRCh38.tar.gz -C /opt/ref/vep-cache/
     elif [ -f "/opt/ref/homo_sapiens_refseq_vep_${version_vep}_GRCh38.tar.gz.aria2" ]; then
         aria2c -x 16 -s 48 http://ftp.ensembl.org/pub/release-${version_vep}/variation/vep/homo_sapiens_refseq_vep_${version_vep}_GRCh38.tar.gz -c -d /opt/ref/
         tar -zxvf /opt/ref/homo_sapiens_refseq_vep_${version_vep}_GRCh38.tar.gz -C /opt/ref/vep-cache/
     fi
     
     mamba activate ${pn}.vep
 ​
     mkdir -p ${result}/${sn}/annotation
     vep \
         -i ${result}/${sn}/snv/${sn}_filtered.vcf.gz  \
         -o ${result}/${sn}/annotation/${sn}_filtered_vep.tsv \
         --offline \
         --cache \
         --cache_version ${version_vep} \
         --everything \
         --dir_cache /opt/ref/vep-cache/ \
         --dir_plugins /opt/ref/vep-cache/Plugins \
         --species homo_sapiens \
         --assembly GRCh37 \
         --fasta /opt/ref/${version_reference}/${version_reference}.fasta \
         --refseq \
         --force_overwrite \
         --format vcf \
         --tab \
         --shift_3prime 1  \
         --offline
 ​
     mamba deactivate
 fi

11. 使用脚本处理注释结果和过滤vcf结果,输出和室间质评要求格式的数据表格

代码语言:javascript
复制
 mamba activate ${pn}.cnvkit
 ​
 if [ ! -f "${envs}/MatchedSnvVepAnnotationFilter.py" ]; then
     aria2c https://raw.githubusercontent.com/doujiangbaozi/sliverworkspace-util/main/somatic/MatchedSnvVepAnnotationFilter.py -d ${envs}/
     chmod a+x ${envs}/MatchedSnvVepAnnotationFilter.py
 fi
 ​
 ${envs}/MatchedSnvVepAnnotationFilter.py \
     -e normal_artifact   \
     -e germline   \
     -i strand_bias   \
     -i clustered_events   \
     --min-vaf=${snv_vaf} \
     --min-tlod=${snv_tlod} \
     --min-depth=${snv_depth} \
     -v ${result}/${sn}/snv/${sn}_filtered.vcf.gz   \
     -a ${result}/${sn}/annotation/${sn}_filtered_vep.tsv   \
     -o ${result}/${sn}/annotation/${sn}.result.SNV.tsv
 ​
 mamba deactivate

12. 使用cnvkit提供工具输出分布图和热图

代码语言:javascript
复制
 mamba activate ${pn}.cnvkit
     
     cnvkit.py scatter ${result}/${sn}/cnv/${sn}_tumor_marked.cnr \
     -s ${result}/${sn}/cnv/${sn}_tumor_marked.cns \
     -i ' ' \
     -n ${sn}_normal \
     -o ${result}/${sn}/cnv/${sn}_cnv_scatter.png -t  &&
 ​
 cnvkit.py heatmap ${result}/${sn}/cnv/${sn}_tumor_marked.cns \
 -o ${result}/${sn}/cnv/${sn}_cnv_heatmap.png
 ​
 mamba deactivate

13. 使用cnvkit call 根据cnvkit batch输出结果推算拷贝数

代码语言:javascript
复制
 mamba activate ${pn}.cnvkit
 ​
 cnvkit.py call ${result}/${sn}/cnv/${sn}_tumor_marked.cns \
     -o ${result}/${sn}/cnv/${sn}_tumor_marked.call.cns
 ​
 mamba deactivate

14. 编写脚本处理cnvkit输出,计算cnv基因,exon位置,gain/lost,cn数

代码语言:javascript
复制
 mamba activate ${pn}.cnvkit
 ​
 if [ ! -f "${envs}/CnvAnnotationFilter.py" ]; then
     aria2c https://raw.githubusercontent.com/doujiangbaozi/sliverworkspace-util/main/somatic/CnvAnnotationFilter.py -d ${envs}/
     chmod a+x ${envs}/CnvAnnotationFilter.py
 fi
 ​
 if [ ! -f "/opt/ref/${version_reference}/refGene.txt" ]; then
     aria2c -x 16 -s 16 http://hgdownload.cse.ucsc.edu/goldenPath/${version_reference}/database/refGene.txt.gz -d /opt/ref/${version_reference} -o refGene.txt.gz
     cd /opt/ref/${version_reference} && gzip -d refGene.txt.gz
 fi
 ​
 python ${envs}/CnvAnnotationFilter.py  \
     -r /opt/ref/${version_reference}/refGene.txt \
     -i ${cnv_min} \
     -x ${cnv_max} \
     -D ${cnv_dep} \
     -f ${result}/${sn}/cnv/${sn}_tumor_marked.call.cns \
     -o ${result}/${sn}/cnv/${sn}.result.CNV.tsv
 ​
 mamba deactivate

15. 编写脚本处理manta的输出,获取最终sv输出结果,起始位置,基因、频率等

代码语言:javascript
复制
 mamba activate ${pn}.cnvkit
 ​
 if [ ! -f "${envs}/SvAnnotationFilter.py" ]; then
     aria2c https://raw.githubusercontent.com/doujiangbaozi/sliverworkspace-util/main/somatic/SvAnnotationFilter.py -d ${envs}/
     chmod a+x ${envs}/SvAnnotationFilter.py
 fi
 ​
 if [ ! -f "/opt/ref/${version_reference}/refGene.txt" ]; then
     aria2c -x 16 -s 16 http://hgdownload.cse.ucsc.edu/goldenPath/${version_reference}/database/refGene.txt.gz -d /opt/ref/${version_reference} -o refGene.txt.gz
     cd /opt/ref/${version_reference} && gzip -d refGene.txt.gz
 fi
 ​
 ${envs}/SvAnnotationFilter.py \
     -r /opt/ref/${version_reference}/refGene.txt \
     -s ${sv_score} \
     -f ${result}/${sn}/sv/results/variants/somaticSV.vcf.gz \
     -o ${result}/${sn}/sv/${sn}.result.SV.tsv
 ​
 mamba deactivate

16. 根据之前fastp,samtools depth,samtools flagstat,gatk CollectInsertSizeMetrics等输出,给出综合 QC数据

代码语言:javascript
复制
 mamba activate ${pn}.cnvkit
 ​
 if [ ! -f "${envs}/MatchedQcProcessor.py" ]; then
     aria2c https://raw.githubusercontent.com/doujiangbaozi/sliverworkspace-util/main/somatic/MatchedQcProcessor.py -d ${envs}/
     chmod a+x ${envs}/MatchedQcProcessor.py
 fi
 ​
 ${envs}/MatchedQcProcessor.py  --bed /opt/ref/${version_reference}/${project_bed} \
     --out ${result}/${sn}/stat/${sn}.result.QC.tsv \
     --sample-fastp=${result}/${sn}/trimmed/${sn}_tumor_fastp.json \
     --sample-depth=${result}/${sn}/stat/${sn}_tumor_marked.depth \
     --sample-flagstat=${result}/${sn}/stat/${sn}_tumor_marked.flagstat \
     --sample-insertsize=${result}/${sn}/stat/${sn}_tumor_insertsize_metrics.txt \
     --normal-fastp=${result}/${sn}/trimmed/${sn}_normal_fastp.json \
     --normal-depth=${result}/${sn}/stat/${sn}_normal_marked.depth \
     --normal-flagstat=${result}/${sn}/stat/${sn}_normal_marked.flagstat  \
     --normal-insertsize=${result}/${sn}/stat/${sn}_normal_insertsize_metrics.txt
 ​
 mamba deactivate
 ​
 mamba activate ${pn}.fastp
 ​
 multiqc ${result}/${sn}/ -f -o ${result}/${sn}/qc
 ​
 mamba deactivate
 ​

最终输出

文件名

备注

report.pdf

由sliverworkspace设计的图形化分析报告打印输出为PDF

1701/qc/multiqc_report.html

multiqc聚合报告

1701/1701.result.SNV.tsv

SNV最终突变结果

1701/1701/cnv/1701_cnv_heatmap.png

CNV结果热图

1701/cnv/1701_cnv_scatter.png

CNV结果分布图

1701/cnv/1701.result.CNV.tsv

CNV最终结果

1701.result.SV.tsv

SV最终结果

1701.result.QC.tsv

最终质控结果

1701/qc/multiqc_report.html

multiqc报告

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 本次更新内容:开箱即用的pipeline,能够根据样本version_reference自动选择参考基因组版本,根据project_bed文件选择项目bed,自动初始化环境、安装所需软件、下载ref文件和数据库的版本
  • 最近准备为sliverworkspace 图形化生信平台开发报告设计器,需要一个较为复杂的pipeline作为测试数据,就想起来把之前的 [[满分室间质评之GATK Somatic SNV+Indel+CNV+SV(下)性能优化]]翻出来用一下。跑了一遍发现还是各种问题,于是想把pipeline改造成免部署、首次运行初始化环境的版本,以便需要时候能够直接运行起来,于是有了本文。
    • 为了让pipeline能够直接运行,无需部署,这里使用docker容器保证运行环境的一致性:见文章:[[基于docker的生信基础环境镜像构建]],这里采用的方案是带ssh服务的docker+conda环境,整个pipeline在一个通用容器中运行。
      • 本文代码较长,可能略复杂,想直接运行的可以下载 workflow文件,导入sliverworkspace图形化生信平台直接运行,获取并查看图形化分析报告
        • 分析流程整体概览
          • docker 镜像拉取及部署配置
            • docker-compose.yml配置文件
              • docker 镜像部署运行
            • 测试数据
              • 变量名称:
              • Pipeline/workflow 具体步骤:
                • 1. fastp 默认参数过滤,看下原始数据质量,clean data
                  • 2. normal文件fastq比对到参考基因组,sort 排序,mark duplicate 得到 marked.bam
                    • 3. tumor文件fastq比对到参考基因组,排序,mark duplicate 得到 marked.bam,与第2步类似
                      • 4. 对上述bam文件生成重新校准表,为后续BQSR使用;Generates recalibration table for Base Quality Score Recalibration (BQSR)
                        • 5. 使用校准表对bam碱基质量校准,因为这一步gatk效率感人,所以同时计算insertsize,拆分interval list(后续mutect2并行运行需要),运行cnvkit batch,运行samtools depth计算测序深度,samtools flagstat 统计mapping比例及质量
                          • 6. 计算堆叠数据( pileup metrics )以便后续评估污染,也可以根据拆分的interval list并行处理,处理之后合并。
                            • 7. 使用GetPileupSummaries计算结果评估跨样本污染,结果用于后面 FilterMutectCall 过滤Mutect2输出结果
                              • 8. Mutect2 call 突变,使用拆分的interval list,结束后将结果合并;同时并行运行manta call sv突变
                                • 9. FilterMutectCalls 对Mutect结果突变过滤
                                  • 10. 使用Vep注释过滤结果
                                    • 11. 使用脚本处理注释结果和过滤vcf结果,输出和室间质评要求格式的数据表格
                                      • 12. 使用cnvkit提供工具输出分布图和热图
                                        • 13. 使用cnvkit call 根据cnvkit batch输出结果推算拷贝数
                                          • 14. 编写脚本处理cnvkit输出,计算cnv基因,exon位置,gain/lost,cn数
                                            • 15. 编写脚本处理manta的输出,获取最终sv输出结果,起始位置,基因、频率等
                                              • 16. 根据之前fastp,samtools depth,samtools flagstat,gatk CollectInsertSizeMetrics等输出,给出综合 QC数据
                                                • 最终输出
                                                相关产品与服务
                                                容器服务
                                                腾讯云容器服务(Tencent Kubernetes Engine, TKE)基于原生 kubernetes 提供以容器为核心的、高度可扩展的高性能容器管理服务,覆盖 Serverless、边缘计算、分布式云等多种业务部署场景,业内首创单个集群兼容多种计算节点的容器资源管理模式。同时产品作为云原生 Finops 领先布道者,主导开源项目Crane,全面助力客户实现资源优化、成本控制。
                                                领券
                                                问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档