首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >专栏 >使用 Manta 检测结构变异--肿瘤基因组测序数据分析专栏

使用 Manta 检测结构变异--肿瘤基因组测序数据分析专栏

作者头像
生信菜鸟团
发布2022-05-24 16:30:14
发布2022-05-24 16:30:14
5.2K10
代码可运行
举报
文章被收录于专栏:生信菜鸟团生信菜鸟团
运行总次数:0
代码可运行

简介

Manta 是 illumina 公司开发的一款用于检测结构变异 Structural Variant 和 Indel 的软件,Manta 检测 SV 和 Indel 分为两个主要步骤:(1)扫描基因组以找到SV相关区域,以及(2)分析,评分和输出在这些区域发现的SV。其输出的结果可以作为 strelka 的输入,以提高 strelka 对 indel 检测的准确性。 值得注意的事,该软件既可以用于 germline SV 的检测(家系样本),也可以用于 Somatic SV 的检测(肿瘤样本)

下载安装

该软件支持多种安装方式,如果要从源码安装,则需要解决环境问题如:python 2.6+、gcc 4.8+ 等,不推荐。开发团队提供了二进制版本,下载解压后即可使用,不过是面向 Centos 系统的(感兴趣的读者可以尝试在Ubuntu上使用这种方式安装)

代码语言:javascript
代码运行次数:0
运行
复制
wget -c https://github.com/Illumina/manta/releases/download/v1.6.0/manta-1.6.0.centos6_x86_64.tar.bz2
tar -jxvf manta-1.6.0.centos6_x86_64.tar.bz2

Ubuntu 系统的软件的安装方法是(注意,编译的时候要在python 2的环境下,另外需要修改--prefix 参数指定安装路径):

代码语言:javascript
代码运行次数:0
运行
复制
wget -c https://github.com/Illumina/manta/releases/download/v1.6.0/manta-1.6.0.release_src.tar.bz2
tar -xjf manta-1.6.0.release_src.tar.bz2
cd manta-1.6.0.release_src/
mkdir build && cd build
# Ensure that CC and CXX are updated to target compiler if needed, e.g.:
#     export CC=/path/to/cc
#     export CXX=/path/to/c++
../manta-1.6.0.release_src/configure --jobs=4 --prefix=${HOME}/biosoft/manta
make -j4 install

由于依赖的问题,安装比较麻烦,经常安装失败。或者可以考虑用conda安装。安装好了之后,可以使用下面命令检查一下(注意是python2):

代码语言:javascript
代码运行次数:0
运行
复制
python ${MANTA_INSTALL_PATH}/bin/runMantaWorkflowDemo.py

实际运行

运行过程分为两步:(1)配置 config (2)运行脚本: 对于 germline SV 的示例:

代码语言:javascript
代码运行次数:0
运行
复制
${MANTA_INSTALL_PATH}/bin/configManta.py \
--bam sample.bam \
--referenceFasta Homo_sapiens_assembly38.fasta \
--runDir ${MANTA_ANALYSIS_PATH}

${MANTA_ANALYSIS_PATH}/runWorkflow.py

如果是家系样本,可以同时输入多个bam文件:

代码语言:javascript
代码运行次数:0
运行
复制
${MANTA_INSTALL_PATH}/bin/configManta.py \
--bam sample1.bam \
--bam sample2.bam \
--bam sample3.bam \
--referenceFasta Homo_sapiens_assembly38.fasta \
--runDir ${MANTA_ANALYSIS_PATH}

${MANTA_ANALYSIS_PATH}/runWorkflow.py

对于 Somatic (也支持非配对的肿瘤样本,但是运行方法不一样):

代码语言:javascript
代码运行次数:0
运行
复制
${MANTA_INSTALL_PATH}/bin/configManta.py \
--normalBam normal.bam \
--tumorBam  tumor.bam  \
--referenceFasta Homo_sapiens_assembly38.fasta \
--runDir ${MANTA_ANALYSIS_PATH}

${MANTA_ANALYSIS_PATH}/runWorkflow.py

对于外显子测序,还可以使用 --callRegions指定 bed 文件,--exome 指定外显子测序。实际运行用到了下面的脚本:

代码语言:javascript
代码运行次数:0
运行
复制
#!/bin/sh 
normal=$1
tumor=$2

mkdir 6.manta/${tumor}
${MANTA_INSTALL_PATH}/bin/configManta.py \
--tumorBam   5.gatk/${tumor}_bqsr.bam \
--normalBam 5.gatk/${normal}_bqsr.bam \
--referenceFasta ${GENOME}            \
--callRegions ${bed}                  \
--exome                               \
--runDir 6.manta/${tumor}           \
1>6.manta/${tumor}/${tumor}_manta.log 2>&1

python2 6.manta/${tumor}/runWorkflow.py -m local -j 4 1>6.manta/${tumor}/${tumor}_manta.log 2>&1

这样的话,每一个样本生成的文件会保存到单独的目录下,有:

代码语言:javascript
代码运行次数:0
运行
复制
$ ls 6.manta/*/
case1_biorep_A_techrep_1/:
case1_biorep_A_techrep_1_manta.log  results/  runWorkflow.py*  runWorkflow.py.config.pickle  workflow.error.log.txt  workflow.exitcode.txt  workflow.warning.log.txt  workspace/

case1_biorep_B/:
case1_biorep_B_manta.log  results/  runWorkflow.py*  runWorkflow.py.config.pickle  workflow.error.log.txt  workflow.exitcode.txt  workflow.warning.log.txt  workspace/

case1_biorep_C/:
case1_biorep_C_manta.log  results/  runWorkflow.py*  runWorkflow.py.config.pickle  workflow.error.log.txt  workflow.exitcode.txt  workflow.warning.log.txt  workspace/

case1_techrep_2/:
case1_techrep_2_manta.log  results/  runWorkflow.py*  runWorkflow.py.config.pickle  workflow.error.log.txt  workflow.exitcode.txt  workflow.warning.log.txt  workspace/

case2_biorep_A/:
case2_biorep_A_manta.log  results/  runWorkflow.py*  runWorkflow.py.config.pickle  workflow.error.log.txt  workflow.exitcode.txt  workflow.warning.log.txt  workspace/

case2_biorep_B_techrep_1/:
case2_biorep_B_techrep_1_manta.log  results/  runWorkflow.py*  runWorkflow.py.config.pickle  workflow.error.log.txt  workflow.exitcode.txt  workflow.warning.log.txt  workspace/

case2_biorep_C/:
case2_biorep_C_manta.log  results/  runWorkflow.py*  runWorkflow.py.config.pickle  workflow.error.log.txt  workflow.exitcode.txt  workflow.warning.log.txt  workspace/

case2_techrep_2/:
case2_techrep_2_manta.log  results/  runWorkflow.py*  runWorkflow.py.config.pickle  workflow.error.log.txt  workflow.exitcode.txt  workflow.warning.log.txt  workspace/

主要的结果是在 results 中,如:

代码语言:javascript
代码运行次数:0
运行
复制
$ ls case1_biorep_A_techrep_1/results/*
case1_biorep_A_techrep_1/results/evidence:

case1_biorep_A_techrep_1/results/stats:
alignmentStatsSummary.txt  svCandidateGenerationStats.tsv  svCandidateGenerationStats.xml  svLocusGraphStats.tsv

case1_biorep_A_techrep_1/results/variants:
candidateSmallIndels.vcf.gz  candidateSmallIndels.vcf.gz.tbi  candidateSV.vcf.gz  candidateSV.vcf.gz.tbi  diploidSV.vcf.gz  diploidSV.vcf.gz.tbi  somaticSV.vcf.gz  somaticSV.vcf.gz.tbi

输出结果

其中,主要是几个 vcf 文件:

  • diploidSV.vcf.gz
  • SVs and indels scored and genotyped under a diploid model for the set of samples in a joint diploid sample analysis or for the normal sample in a tumor/normal subtraction analysis. In the case of a tumor/normal subtraction, the scores in this file do not reflect any information from the tumor sample.
  • somaticSV.vcf.gz
  • SVs and indels scored under a somatic variant model. This file will only be produced if a tumor sample alignment file is supplied during configuration
  • candidateSV.vcf.gz
  • Unscored SV and indel candidates. Only a minimal amount of supporting evidence is required for an SV to be entered as a candidate in this file. An SV or indel must be a candidate to be considered for scoring, therefore an SV cannot appear in the other VCF outputs if it is not present in this file. Note that by default this file includes indels of size 8 and larger. The smallest indels in this set are intended to be passed on to a small variant caller without scoring by manta itself (by default manta scoring starts at size 50).
  • candidateSmallIndels.vcf.gz
  • Subset of the candidateSV.vcf.gz file containing only simple insertion and deletion variants less than the minimum scored variant size (50 by default). Passing this file to a small variant caller will provide continuous coverage over all indel sizes when the small variant caller and manta outputs are evaluated together. Alternate small indel candidate sets can be parsed out of the candidateSV.vcf.gz file if this candidate set is not appropriate.

对于 somatic SV ,得到的结果会保存在 somaticSV.vcf.gz 中:

代码语言:javascript
代码运行次数:0
运行
复制
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  case1_germline  case1_biorep_A_techrep_1
chr1    196910455       MantaDEL:2669:0:2:1:0:0 G       <DEL>   .       PASS    END=196914442;SVTYPE=DEL;SVLEN=-3987;CIPOS=0,4;CIEND=0,4;HOMLEN=4;HOMSEQ=GTTG;SOMATIC;SOMATICSCORE=85   PR:SR   26,0:61,0       72,10:128,40

不过,对于外显子数据,检测 SV 并不准确。于是我又用 WGS 的数据,跑了一遍 germline SV 的检测,diploidSV.vcf.gz 是:

代码语言:javascript
代码运行次数:0
运行
复制
#CHROM    POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  A_V300028149
chr1    778705  MantaINS:11:0:0:0:0:0   CA  CGCCCTTGTGACGTCACGGAAGGCGCGCGCTTGCGACGTCACGGAAGGCGCGCCCTTGTGACGTCACGGAAGGCGCT   430 PASS    END=778706;SVTYPE=INS;SVLEN=76;CIGAR=1M76I1D    GT:FT:GQ:PL:PR:SR   0/1:PASS:249:480,0,246:0,0:17,13
chr1    789487  MantaDUP:TANDEM:10:601:606:0:0:0    G   <DUP:TANDEM>    18  MinQUAL END=224014523;SVTYPE=DUP;SVLEN=223225036;IMPRECISE;CIPOS=-98,98;CIEND=-147,148  GT:FT:GQ:PL:PR  0/1:PASS:18:68,0,461:31,9
chr1    1068747 MantaDUP:TANDEM:36:1:2:0:0:0    C   <DUP:TANDEM>    651 PASS    END=1068825;SVTYPE=DUP;SVLEN=78;CIPOS=0,9;CIEND=0,9;HOMLEN=9;HOMSEQ=CCACGCGGG   GT:FT:GQ:PL:PR:SR   0/1:PASS:84:701,0,81:4,0:23,27
chr1    1068824 MantaINS:36:2:2:0:0:0   G   GGCCACGCGGGCTGTGCAGATGCAGGTGCGGCGGGGCGGGGCCACGCGGGCTGTGAAGGTGCAGGTGCGGCGGGGCAGA 999 PASS    END=1068824;SVTYPE=INS;SVLEN=78;CIGAR=1M78I;CIPOS=0,10;HOMLEN=10;HOMSEQ=GCCACGCGGG  GT:FT:GQ:PL:PR:SR   1/1:PASS:108:999,111,0:0,0:0,44
chr1    1207339 MantaDEL:49:0:1:0:0:0   GGAGACTGTCCTATGTCTTTCTGAGCCTCAGTTTCCCCTGTGGGCACCGAGGGGTTCTGGGACCCTGCCTCCACCAGGAAGCCTCCCTGGATTGCCCAGCCCTGCTTCTGCGCCGTCCAGCACAGGTGGAGACCCCCATGAATGCTGGGGGTGGGGGCTCTCGGGAACGTGAGCGTGGATGTGGTTCAACACCCTTTTGAGACCTGCAGCCACCGCCTCACCCCGTAAGGCGGTTCCTCCTTTTCCAAGGTAAATGACAGGAATTAGCTGTTTGTGACACCCCGGAGTTCTCAAATCCAAGATGTAGGAGCCTGCCTTGGAGAGGCAGCCCTCAGACACTGCAGAGAAGGAAGGGGTCTCTGCAGCTCCAGGCCGCCCCGACGCTCGGAAGGAAAGGGGTGGGGCCAGCTGGGCCTGGGGGC  G   352 PASS    END=1207760;SVTYPE=DEL;SVLEN=-421;CIGAR=1M421GT:FT:GQ:PL:PR:SR  0/1:PASS:352:402,0,643:21,6:41,12
chr1    15503064    MantaBND:929:0:1:0:0:0:1    C   ]chr7:148936234]C   48  PASS    SVTYPE=BND;MATEID=MantaBND:929:0:1:0:0:0:0;IMPRECISE;CIPOS=-64,65;BND_DEPTH=45;MATE_BND_DEPTH=42    GT:FT:GQ:PL:PR  0/1:PASS:48:98,0,228:16,8
chr7    148936234   MantaBND:929:0:1:0:0:0:0    T   T[chr1:15503064[    48  PASS    SVTYPE=BND;MATEID=MantaBND:929:0:1:0:0:0:1;IMPRECISE;CIPOS=-132,133;BND_DEPTH=42;MATE_BND_DEPTH=45  GT:FT:GQ:PL:PR  0/1:PASS:48:98,0,228:16,8
......

Manta 得到的结构变异的 vcf 文件,和普通的 SNVs/INDELs 突变格式大致相同,但结构变异的 vcf 文件的第3列记录发生结构变异的类型:INS、DEL、DUP、BND 其中不同类型的变异记录方式略有不同,可以通过 vcf 文件第 3 列来判断发生变异的类型:

  • INDEL:如果检测到变异为 INDEL,片段长度一般不超过 1000 bp,且在 INFO 列还会记录 CIGAR值用以描述插入或者缺失的情况
  • BND:如果是 BND 事件,通常会有两个断点,为对应关系,记录在 INFO 列的 MATEID tag标签,即两条记录具有相同的 MATEID
  • Inversions:如果发生的变异是倒置,则会有 4 条记录,它们的 INFO 列中有共同的 EVENT标记,如以下示例:
代码语言:javascript
代码运行次数:0
运行
复制
chr1    205209478   MantaBND:9264:0:0:0:0:0:0   C   CACAAC]chr1:205209646]  999 PASS    SVTYPE=BND;MATEID=MantaBND:9264:0:0:0:0:0:1;SVINSLEN=5;SVINSSEQ=ACAAC;EVENT=MantaBND:9264:0:0:0:0:0:0;JUNCTION_QUAL=999;BND_DEPTH=46;MATE_BND_DEPTH=36  GT:FT:GQ:PL:PR:SR   1/1:PASS:174:999,177,0:0,12:0,32
chr1    205209503   MantaBND:9264:0:0:1:0:0:0   C   [chr1:205209707[C   999 PASS    SVTYPE=BND;MATEID=MantaBND:9264:0:0:1:0:0:1;CIPOS=0,41;HOMLEN=41;HOMSEQ=CAGAAATGGTGATAAAAATAAGATACTAAGATCATGACAGG;EVENT=MantaBND:9264:0:0:0:0:0:0;JUNCTION_QUAL=999;BND_DEPTH=42;MATE_BND_DEPTH=31  GT:FT:GQ:PL:PR:SR   1/1:PASS:174:999,177,0:0,11:0,20
chr1    205209646   MantaBND:9264:0:0:0:0:0:1   A   AGTTGT]chr1:205209478]  999 PASS    SVTYPE=BND;MATEID=MantaBND:9264:0:0:0:0:0:0;SVINSLEN=5;SVINSSEQ=GTTGT;EVENT=MantaBND:9264:0:0:0:0:0:0;JUNCTION_QUAL=999;BND_DEPTH=36;MATE_BND_DEPTH=46  GT:FT:GQ:PL:PR:SR   1/1:PASS:174:999,177,0:0,12:0,32
chr1    205209666   MantaBND:9264:0:0:1:0:0:1   C   [chr1:205209544[C   999 PASS    SVTYPE=BND;MATEID=MantaBND:9264:0:0:1:0:0:0;CIPOS=0,41;HOMLEN=41;HOMSEQ=TGTCATGATCTTAGTATCTTATTTTTATCACCATTTCTGGA;EVENT=MantaBND:9264:0:0:0:0:0:0;JUNCTION_QUAL=999;BND_DEPTH=31;MATE_BND_DEPTH=42  GT:FT:GQ:PL:PR:SR   1/1:PASS:174:999,177,0:0,11:0,20

另外,vcf 文件中的 INFO 列有很多 tag,不同的结构变异事件包含的列不同,每一个 tag 的信息是:

INFO TAG

Description

IMPRECISE

Flag indicating that the structural variation is imprecise, i.e. the exact breakpoint location is not found

SVTYPE

Type of structural variant

SVLEN

Difference in length between REF and ALT alleles

END

End position of the variant described in this record

CIPOS

Confidence interval around POS

CIEND

Confidence interval around END

CIGAR

CIGAR alignment for each alternate indel allele

MATEID

ID of mate breakend

EVENT

ID of event associated to breakend

HOMLEN

Length of base pair identical homology at event breakpoints

HOMSEQ

Sequence of base pair identical homology at event breakpoints

SVINSLEN

Length of insertion

SVINSSEQ

Sequence of insertion

LEFT_SVINSSEQ

Known left side of insertion for an insertion of unknown length

RIGHT_SVINSSEQ

Known right side of insertion for an insertion of unknown length

BND_DEPTH

Read depth at local translocation breakend

MATE_BND_DEPTH

Read depth at remote translocation mate breakend

JUNCTION_QUAL

If the SV junction is part of an EVENT (ie. a multi-adjacency variant), this field provides the QUAL value for the adjacency in question only

SOMATIC

Flag indicating a somatic variant

SOMATICSCORE

Somatic variant quality score

JUNCTION_SOMATICSCORE

If the SV junction is part of an EVENT (ie. a multi-adjacency variant), this field provides the SOMATICSCORE value for the adjacency in question only

CONTIG

Assembled contig sequence, if the variant is not imprecise (with --outputContig)

参考:https://github.com/Illumina/manta

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2022-05-02,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 生信菜鸟团 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 简介
  • 下载安装
  • 实际运行
  • 输出结果
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档