前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >5 一步法找基因变异流程

5 一步法找基因变异流程

作者头像
Y大宽
发布2019-06-03 08:53:03
1.5K0
发布2019-06-03 08:53:03
举报
文章被收录于专栏:Y大宽Y大宽Y大宽

1 先查看sam文件

随机选择3个

$ samtools mpileup SRR8517854.bam |head -95|tail -3
[mpileup] 1 samples in 1 input files
chr1    10105   N       8       AAAAcAAA        kuuu>6uK
chr1    10106   N       10      CCCCcCCCC^$c    uuuukAuuAK
chr1    10107   N       10      CCCCcCCCCc      upukaFfuKF

再看另外一个:

$ samtools mpileup SRR8517854.bam |head -168944|tail -3
[mpileup] 1 samples in 1 input files
chr1    909515  N       1       c       K
chr1    909516  N       1       c       K
chr1    909517  N       1       g       K

关于samtools mpileup的具体用法和结果解释请看

2 最简单的找变异流程

align文件夹

ref=/mnt/f/kelly/bioTree/server/wesproject/hg38/Homo_sapiens_assembly38.fasta
time samtools mpileup -ugf $ref *.bam|bcftools call -vmO z -o wxs_out.vcf.gz 
less -S wxs_out.vcf.gz

3载入IGV查看

3.1先构建索引

批量构建

ls *.bam|xargs -i samtools index {}

如果是服务器需要下载到本地查看

4 去除PCR重复

需要samtools的4个命令

align文件夹

samtools markdup -r SRR7696207.bam SRR7696207.rm.bam
[markdup] error: no ms score tag. Please run samtools fixmate on file first.
[markdup] error: no ms score tag. Please run samtools fixmate on file first.

提示先fixmate

$ samtools fixmate SRR7696207.bam SRR7696207.fixmate.bam
[bam_mating_core] ERROR: Coordinate sorted, require grouped/sorted by queryname.

提示先要sort,并且要以queryname进行sort

$ samtools sort -n -o SRR7696207.sort.bam SRR7696207.bam
$ samtools sort -n -o SRR7696207.namesort.bam SRR7696207.bam
[bam_sort_core] merging from 25 files and 1 in-memory blocks...

接下来就可以逆回去了,也就是是正确的顺序(本篇最下方有说明)

$ samtools sort -n -o SRR7696207.namesort.bam SRR7696207.bam
$ samtools fixmate -m SRR7696207.namesort.bam SRR7696207.fixmate.bam
$ samtools sort -o SRR7696207.positionsort.fixmat.bam SRR7696207.fixmate.bam
$ samtools markdup -r SRR7696207.positionsort.fixmat.bam SRR7696207.rm.bam

看下以上的文件结构和大小

├── [ 136]  1
├── [ 136]  2
├── [   0]  config
├── [ 42K]  markdup.bam
├── [ 22G]  SRR7696207.bam
├── [5.4G]  SRR7696207.fixmate.bam
├── [5.2G]  SRR7696207.namesort.bam
├── [4.0G]  SRR7696207.positionsort.fixmat.bam
├── [3.8G]  SRR7696207.rm.bam
├── [2.6G]  SRR8517853.bam
├── [3.4G]  SRR8517854.bam
└── [6.9G]  SRR8517856.bam

最后查看下rm.bam

samtools view SRR7696207.rm.bam |less -S
SRR7696207.27107225     2193    chr1    10024   0       86M64H  chr5    10324   0       CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
SRR7696207.27728977     99      chr1    10025   17      91M18S  =       10025   91      TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA
SRR7696207.27728977     147     chr1    10025   17      91M18S  =       10025   -91     TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA
SRR7696207.4339191      163     chr1    10027   0       69M     =       10039   123     ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC
SRR7696207.12487651     163     chr1    10028   0       86M     =       10028   81      CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
SRR7696207.9370306      83      chr1    10028   0       116M    =       10016   -128    CCCTAACCCTAACCCTAAACCTAACCCTAACCCTAACC
SRR7696207.12487651     83      chr1    10028   0       69S81M  =       10028   -81     TTTCTAGCAGTGGACTGCATACGTGTTCGCATACTGTG
SRR7696207.18904916     99      chr1    10030   0       81M69S  =       10030   79      CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
SRR7696207.18904916     147     chr1    10030   0       79M     =       10030   -79     CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT
SRR7696207.4339191      83      chr1    10039   0       111M7S  =       10027   -123    ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC
SRR7696207.20389847     99      chr1    10040   0       74M     =       10040   74      CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
SRR7696207.20389847     147     chr1    10040   0       76S74M  =       10040   -74     CATATTTTTGTTTTTTTTAATGTTACGGCGACCACCGA

看rm后的是否有差异

samtools flagstat SRR7696207.bam
samtools flagstat SRR7696207.rm.bam

结果如下

$ samtools flagstat SRR7696207.bam
55398860 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
372636 + 0 supplementary
0 + 0 duplicates
55374129 + 0 mapped (99.96% : N/A)
55026224 + 0 paired in sequencing
27513112 + 0 read1
27513112 + 0 read2
54512924 + 0 properly paired (99.07% : N/A)
54978908 + 0 with itself and mate mapped
22585 + 0 singletons (0.04% : N/A)
330146 + 0 with mate mapped to a different chr
252082 + 0 with mate mapped to a different chr (mapQ>=5)
$ samtools flagstat SRR7696207.rm.bam
52114377 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
372636 + 0 supplementary
0 + 0 duplicates
52089646 + 0 mapped (99.95% : N/A)
51741741 + 0 paired in sequencing
25868532 + 0 read1
25873209 + 0 read2
51246880 + 0 properly paired (99.04% : N/A)
51699203 + 0 with itself and mate mapped
17807 + 0 singletons (0.03% : N/A)
319884 + 0 with mate mapped to a different chr
242709 + 0 with mate mapped to a different chr (mapQ>=5)

可以再试试-S参数

samtools markdup -S SRR7696207.sam SRR7696207.mk.bam

补充:samtoolsmarkdup操作的正确顺序

  1. The first sort can be omitted if the file is already name ordered
samtools sort -n -o namesort.bam example.bam
  1. Add ms and MC tags for markdup to use later
samtools fixmate -m namesort.bam fixmate.bam
  1. Markdup needs position order
samtools sort -o positionsort.bam fixmate.bam
  1. Finally mark duplicates
samtools markdup positionsort.bam markdup.bam

以上是简单的找变异流程,并不完善。

本文参与 腾讯云自媒体分享计划,分享自作者个人站点/博客。
原始发表:2019.06.02 ,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 作者个人站点/博客 前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 1 先查看sam文件
  • 2 最简单的找变异流程
  • 3载入IGV查看
    • 3.1先构建索引
    • 4 去除PCR重复
    • 接下来就可以逆回去了,也就是是正确的顺序(本篇最下方有说明)
    • 看下以上的文件结构和大小
    • 最后查看下rm.bam
    • 看rm后的是否有差异
    • 补充:samtoolsmarkdup操作的正确顺序
    领券
    问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档