前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >利用 Timescape 做肿瘤进化鱼图

利用 Timescape 做肿瘤进化鱼图

作者头像
生信菜鸟团
发布2020-04-30 18:17:51
4.4K1
发布2020-04-30 18:17:51
举报
文章被收录于专栏:生信菜鸟团生信菜鸟团

写在前面

前面我们使用 pyclone 分析了肿瘤样本的 clusters 结构,接下来我们进一步分析肿瘤进化,画一个鱼图,需要用到的工具是 citup 和 Timescape

参考:

  • https://github.com/sfu-compbio/citup
  • https://shahlab.ca/projects/citup/
  • https://www.cnblogs.com/xlij1205/p/10848917.html
  • https://bioconductor.org/packages/release/bioc/vignettes/timescape/inst/doc/timescape_vignette.html

citup 软件的安装和使用

CITUP 全称 Conality Inference in Tumors Using Phylogeny,是一种使用系统发育论进行肿瘤的克隆推断生物信息学工具,可用于从单个患者获得的多个样本来推断肿瘤异质性。

给定每个样本的突变频率,CITUP 使用基于优化的算法来找到最能解释数据的进化树。但是,前提条件是,CITUP 也是针对于深度测序数据的,这次勉强使用,为的是学习一下这个工具。

官网推荐我们使用 conda 来安装

代码语言:javascript
复制
# 添加channels
conda config --add channels http://conda.anaconda.org/dranew
conda config --add channels https://conda.anaconda.org/IBMDecisionOptimization/linux-64

# 创建小环境,然后安装,其实可以安装在前面的 pyclone 小环境里
conda create -n citup
conda activate citup
conda install -y citup h5py

# 安装完成后可以调用软件的帮助文档
run_citup_iter.py -h
run_citup_qip.py  -h

实际运行需要准备两个文件,一个是突变位点在不同样本的突变频率,行是突变位点,列是样本。另一个是突变的 cluster,只有单列,记录每个突变位点所在的 cluster 。两个文件的突变位点顺序要一致。具体可以参考:https://shahlab.ca/projects/citup/

在这里,我用前面 pyclone 结果进行处理:

代码语言:javascript
复制
for case in case{1..6}
do
cat ./9.pyclone/${case}_pyclone_analysis/tables/loci.tsv | cut -f 6 | sed '1d' | paste - - - -  >./9.pyclone/${case}_pyclone_analysis/freq.txt
cat ./9.pyclone/${case}_pyclone_analysis/tables/loci.tsv | cut -f 3 | sed '1d' | paste - - - - |cut -f 1 >./9.pyclone/${case}_pyclone_analysis/cluster.txt
## 获取 sample_id 信息,后面画图要用到
cat ./9.pyclone/${case}_pyclone_analysis/tables/loci.tsv | cut -f 2 | sed '1d' | head -4 >./9.pyclone/${case}_pyclone_analysis/sample_id
done

之后得到的文件结构就是:

代码语言:javascript
复制
$ ls ./9.pyclone/*
./9.pyclone/case1_pyclone_analysis:
cluster.txt  config.yaml  freq.txt  plots  sample_id  tables  trace  yaml

./9.pyclone/case2_pyclone_analysis:
cluster.txt  config.yaml  freq.txt  plots  sample_id  tables  trace  yaml

./9.pyclone/case3_pyclone_analysis:
cluster.txt  config.yaml  freq.txt  plots  sample_id  tables  trace  yaml

./9.pyclone/case4_pyclone_analysis:
cluster.txt  config.yaml  freq.txt  plots  sample_id  tables  trace  yaml

./9.pyclone/case5_pyclone_analysis:
cluster.txt  config.yaml  freq.txt  plots  sample_id  tables  trace  yaml

./9.pyclone/case6_pyclone_analysis:
cluster.txt  config.yaml  freq.txt  plots  sample_id  tables  trace  yaml

每个样本的 3 个文件是(以 case1 为例):

代码语言:javascript
复制
$ cat ./9.pyclone/case1_pyclone_analysis/freq.txt 
0.33636363636363636    0.41732283464566927 0.35135135135135137 0.12162162162162163
0.2191780821917808    0.28440366972477066 0.2125  0.11904761904761904
0.31137724550898205    0.2519480519480519  0.2246153846153846  0.13624678663239073
......

$ cat ./9.pyclone/case1_pyclone_analysis/cluster.txt 
1
0
0
0
......

$ cat ./9.pyclone/case1_pyclone_analysis/sample_id
case1_techrep_2
case1_biorep_A_techrep
case1_biorep_B
case1_biorep_C

然后就是运行,输入文件是 freq.txtcluster.txt,结果输出到 reasults.h5

代码语言:javascript
复制
for case in case{1..6}
do
run_citup_qip.py --submit local \
./9.pyclone/${case}_pyclone_analysis/freq.txt \
./9.pyclone/${case}_pyclone_analysis/cluster.txt \
./9.pyclone/${case}_pyclone_analysis/results.h5
done

接下来需要用 python 处理一下结果,因为得到的结果results.h5是一个hdf5格式,不是普通的文本文件,处理起来比较麻烦一些,用 R 语言或者 python 都可以处理,这里我尝试用 python 处理一下(代码有点丑):

然后就是从上述结果中取最佳拟合树、进化树节点,克隆组成等信息,我把处理的过程写成一个 python 脚本 citup.py

代码语言:javascript
复制
import sys
import h5py
hf=h5py.File(sys.argv[1],'r')
hf.keys()
opnum=hf["results/optimal/index"][0]
cellfreq=hf["trees/" + str(opnum) + "/clone_freq/block0_values"][:]
tree=hf["trees/" + str(opnum) + "/adjacency_list/block0_values"][:]
print cellfreq
print tree

然后就是批量处理:

代码语言:javascript
复制
for case in case{1..6}
do
python citup.py ./9.pyclone/${case}_pyclone_analysis/results.h5 | sed 's/^ \[//;s/\[//g;s/\]//g' | tr ' ' '\t'| grep '\.' >./9.pyclone/${case}_pyclone_analysis/cellfreq.txt
python citup.py ./9.pyclone/${case}_pyclone_analysis/results.h5 | sed 's/^ \[//;s/\[//g;s/\]//g' | tr ' ' '\t'| grep -v '\.' >./9.pyclone/${case}_pyclone_analysis/tree.txt
done

得到的 cellfreq.txttree.txt 这两个文件,前者行表示样本,列表示克隆 clusters,每个值代表克隆频率。后者表示的是进化树的克隆分支关系

代码语言:javascript
复制
$ cat ./9.pyclone/case1_pyclone_analysis/cellfreq.txt 
0.35833     0.076023    0.235088    0.330559        
0.348454    0.0699815   0.235574    0.34599         
0.401151    0.101147    0.217818    0.279883    
0.751012    0.00999059  0.120444    0.118553

$ cat ./9.pyclone/case1_pyclone_analysis/tree.txt 
0    1
1    2
0    3

Timescape 可视化

得到上面的结果,就可以进一步处理成 Timescape 的 input 了,处理的方法是:

The adjacency list can be written as a TSV with the column names source, target to be input into E-scape, and the clone frequencies should be reshaped such that each row represents a clonal frequency in a specific sample for a specific clone, with the columns representing the time or space ID, the clone ID, and the clonal prevalence.

也就是 tree.txt 不用做修改,cellfreq.txt 需要被处理为3列 time or space ID, the clone ID, and the clonal prevalence,R代码部分如下:

代码语言:javascript
复制
BiocManager::install("timescape")
library(timescape)
options(stringsAsFactors = F)
example("timescape")
browseVignettes("timescape") 
library(plotly)
library(htmlwidgets)
library(webshot)
for (i in 1:6) {
  # i = 1
  # tree 
  tree_edges = read.table(paste0("9.pyclone/case", i, "_pyclone_analysis/tree.txt"))
  colnames(tree_edges) = c("source","target")
  # clonal prevalences
  cellfreq = read.table(paste0("9.pyclone/case", i, "_pyclone_analysis/cellfreq.txt"))
  colnames(cellfreq) = 0:(length(cellfreq)-1)
  sample_id = read.table(paste0("9.pyclone/case", i, "_pyclone_analysis/sample_id"))
  cellfreq$timepoint = sample_id[ , 1]
  library(tidyr)
  clonal_prev = gather(cellfreq, key="clone_id", value = "clonal_prev", -timepoint)
  clonal_prev = clonal_prev[order(clonal_prev$timepoint),]
  # targeted mutations
  # mutations <- read.csv(system.file("extdata", "AML_mutations.csv", package = "timescape"))
  p = timescape(clonal_prev = clonal_prev, tree_edges = tree_edges,height=260)
  saveWidget(p, paste0("case", i,"_timescape", ".html"))
  }

因为 Timescape 的输出格式比较少见,上面我只能保存为网页文件,原文件是可交互的,找不到合适的转为图片的方法,所以下面我就直接放了截图。

可以看到,每个样本的克隆进化关系如图所示,当然结果并不是很理想,毕竟数据不是很完美,不过这并不影响我们对工具的学习:


生信技能树目前已经公开了三个生信知识库,记得来关注哦~

每周文献分享

https://www.yuque.com/biotrainee/weeklypaper

肿瘤外显子分析指南

https://www.yuque.com/biotrainee/wes

生物统计从理论到实践

https://www.yuque.com/biotrainee/biostat

本文参与 腾讯云自媒体分享计划,分享自微信公众号。
原始发表:2020-04-27,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 生信菜鸟团 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 写在前面
  • citup 软件的安装和使用
  • Timescape 可视化
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档