【直播】我的基因组46:SNV突变(96种)频谱的制作

昨天我们学习了正常情况下,6种SNV(C>A, C>G, C>T, T>A, T>C, T>G)突变频谱的制作,但是如果考虑到突变的上下文,就可以变成96种(如下图所示)!(如果你还不了解mutation siganures,请自行复制http://www.bio-info-trainee.com/1619.html或查看原文)

The mutational spectrum of a set of SNVs was determined by classifying all SNVs contained in the set by their type of mutation (C . A, C . G, C . T, T . A, T . C, T . G) and the sequence context (i.e., the preceding and the following base). The resulting count matrix with dimensions 4 · 4 · 6 (with each cell representing a mutation of one base triplet into another) was then normalized for the observed frequency of each source base triplet in the genome that the calls were made against. An additional conversion into percentage was performed to allow for comparison of SNV sets with different sizes.

这里我们可以自己小脚本来做,也可以直接使用程序,我这里还是用号称可以替代生物学工程师的强大的bedtools软件。

http://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html

简单的阅读该软件的说明书就知道了,需要把vcf文件转为3列的bed格式,就染色体号,起始终止坐标即可!

cat realign.vcf |grep -v INDEL |grep -v "^#" |cut -f 1-5|grep -v "," |perl -alne '{print join("\t",$F[0],$F[1]-2,$F[1]+1,"$F[3]:$F[4]")}' >vcf.bed

注意VCF文件的坐标不仅仅是上下文3个碱基,起始坐标应该左移,这是bed文件的特性,从0开始的!

还有第4列是突变形式,在下面的bedtools里面会用得到!

然后调用bedtools即可,代码如下:

~/biosoft/bedtools/bedtools2/bin/bedtools getfasta -fi ~/reference/genome/human_g1k_v37/human_g1k_v37.fasta -bed vcf.bed -tab -name -fo vcf.fasta

结果显示共4131526行数据!

这个结果可以用一个网页工具来检查一下:http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr1:10248,10251

接下来我们就是要对上面的四百多万行数据进行统计咯,左边一列是突变形式,右边是上下文,我们还是跟上一讲一样,突变形式只考虑6种!【直播】我的基因组 45:SNV突变(6种)频谱的制作

代码如下:

perl -alne '{$tmp=$_;s/A:C/T:G/; s/A:T/T:A/; s/A:G/T:C/; s/G:A/C:T/; s/G:C/C:G/; s/G:T/C:A/; print "$tmp\t$_"}' vcf.fasta |cut -f 3,4 |sort |uniq -c >96context.counts

结果如下:

可视化如下,其实应该有更好的展现方式的,而且我的代码稍微有点复杂了:

dat=read.table('96context.counts')
colnames(dat)=c('counts','mut','context')
dat$percent = 100*dat$counts/sum(dat$counts)
library(ggplot2)
p=ggplot(dat,aes( x=1:nrow(dat),y=percent,fill=mut))+geom_bar(stat="identity")
p=p+ylab('Mutation type probabilty')+ xlab('context')+ggtitle("Mutation signature")
p=p+scale_x_continuous(breaks=1:192,labels = dat$context, expand = c(0, 0))+scale_y_continuous(expand = c(0, 0))
p=p+theme_set(theme_set(theme_bw(base_size=20)))
p=p+theme(text=element_text(face='bold'),
axis.text.x=element_text(angle=30,hjust=1,size =6),
plot.title = element_text(hjust = 0.5) ,
panel.grid = element_blank(),
#panel.border = element_blank()
)
print(p)

http://www.cookbook-r.com/Graphs/Axes_(ggplot2)/

http://stackoverflow.com/questions/40675778/center-plot-title-in-ggplot2

http://stackoverflow.com/questions/22945651/how-to-remove-space-between-axis-area-plot-in-ggplot2

如果要自己写脚本,请参考生信技能树论坛,我发的帖子:

http://www.biotrainee.com/thread-666-1-1.html

http://www.bio-info-trainee.com/1623.html

http://cancer.sanger.ac.uk/cosmic/signatures

文:Jimmy

图文编辑:吃瓜群众

原文发布于微信公众号 - 生信技能树(biotrainee)

原文发表时间:2017-01-22

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

发表于

我来说两句

0 条评论
登录 后参与评论

相关文章

来自专栏我和未来有约会

Kit 3D 更新

Kit3D is a 3D graphics engine written for Microsoft Silverlight. Kit3D was inita...

2506
来自专栏hbbliyong

WPF Trigger for IsSelected in a DataTemplate for ListBox items

<DataTemplate DataType="{x:Type vm:HeaderSlugViewModel}"> <vw:HeaderSlug...

4054
来自专栏Ceph对象存储方案

Luminous版本PG 分布调优

Luminous版本开始新增的balancer模块在PG分布优化方面效果非常明显,操作也非常简便,强烈推荐各位在集群上线之前进行这一操作,能够极大的提升整个集群...

3095
来自专栏一个会写诗的程序员的博客

Spring Reactor 项目核心库Reactor Core

Non-Blocking Reactive Streams Foundation for the JVM both implementing a Reactiv...

2132
来自专栏杨龙飞前端

scrollto 到指定位置

2494
来自专栏C#

DotNet加密方式解析--非对称加密

    新年新气象,也希望新年可以挣大钱。不管今年年底会不会跟去年一样,满怀抱负却又壮志未酬。(不过没事,我已为各位卜上一卦,卦象显示各位都能挣钱...)...

4828
来自专栏陈仁松博客

ASP.NET Core 'Microsoft.Win32.Registry' 错误修复

今天在发布Asp.net Core应用到Azure的时候出现错误InvalidOperationException: Cannot find compilati...

4818
来自专栏张善友的专栏

LINQ via C# 系列文章

LINQ via C# Recently I am giving a series of talk on LINQ. the name “LINQ via C...

2625
来自专栏转载gongluck的CSDN博客

cocos2dx 打灰机

#include "GamePlane.h" #include "PlaneSprite.h" #include "BulletNode.h" #include...

5346
来自专栏张善友的专栏

Miguel de Icaza 细说 Mix 07大会上的Silverlight和DLR

Mono之父Miguel de Icaza 详细报道微软Mix 07大会上的Silverlight和DLR ,上面还谈到了Mono and Silverligh...

2677

扫码关注云+社区