免疫浸润一直是生信数据挖掘的重点,免疫浸润的各种算法也是大家学习生信数据挖掘必学的知识点。
前几天看到还有单独介绍各种免疫浸润方法实现的文章,其实现在不用那么麻烦了,已经有人帮我们写好了整合的R包,它就是:IOBR
。
这个包整合了常见的10种免疫浸润方法,只需要1行代码即可实现,而且输出的格式统一,方便进行可视化和数据整合操作!
完全不需要你自己分别进行操作,极大地简化了进行免疫浸润分析的步骤。
下面就给大家演示如何使用1行代码实现8种免疫浸润方法!
首先是安装R包。
先安装依赖包,我发现还有很多人被困在R包安装这一步,实在是不应该,我专门做了视频版教程帮大家解决R包安装的问题,建议R包安装还有问题的赶紧去看:可能是最好用的R包安装教程
最近(2023.05.09)R更新到了4.3.0版本,bioconductor也已经更新到了3.17版本,版本不匹配就会安装失败哈,初学者要注意!
# options("repos"= c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))
# options(BioC_mirror="http://mirrors.tuna.tsinghua.edu.cn/bioconductor/")
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
depens<-c('tibble', 'survival', 'survminer', 'sva', 'limma', "DESeq2","devtools",
'limSolve', 'GSVA', 'e1071', 'preprocessCore', 'ggplot2', "biomaRt",
'ggpubr', "devtools", "tidyHeatmap", "caret", "glmnet", "ppcor", "timeROC","pracma")
for(i in 1:length(depens)){
depen<-depens[i]
if (!requireNamespace(depen, quietly = TRUE))
BiocManager::install(depen,update = FALSE)
}
再安装IOBR
包:
if (!requireNamespace("IOBR", quietly = TRUE))
devtools::install_github("IOBR/IOBR")
这个R包不仅可以进行各种免疫浸润分析,还有很多使用的分析,这些我们放到以后再说,今天就主要介绍它的免疫浸润分析功能。
library(IOBR)
我们就用easyTCGA
包下载TCGA-COAD
的数据进行演示,这个包专门为小白解决TCGA数据下载和整理而写,可以参考:easyTCGA:让初学者也能享受“征服”TCGA的喜悦
library(easyTCGA)
getmrnaexpr("TCGA-COAD")
下载数据超级简单,如果你的网络通畅,只需要几分钟就可以得到6种表达矩阵和临床信息,且同时保存为rdata
和csv
两种格式。
我们使用tpm
数据进行演示,首先进行log2
转换,然后去掉一些低质量的基因。
load(file = "output_mRNA_lncRNA_expr/TCGA-COAD_mrna_expr_tpm.rdata")
expr_coad <- log2(mrna_expr_tpm+0.1)
expr_coad <- expr_coad[apply(expr_coad,1,sd)>0.5,]
expr_coad[1:4,1:4]
## TCGA-D5-6540-01A-11R-1723-07 TCGA-AA-3525-11A-01R-A32Z-07
## MT-CO2 14.47089 14.06391
## MT-CO3 14.42028 13.94628
## MT-ND4 14.35429 13.24382
## MT-CO1 14.63867 13.99546
## TCGA-AA-3525-01A-02R-0826-07 TCGA-AA-3815-01A-01R-1022-07
## MT-CO2 13.98376 14.84026
## MT-CO3 14.17448 14.66966
## MT-ND4 13.45308 14.51663
## MT-CO1 13.60652 14.53353
dim(expr_coad)
## [1] 17170 524
这样我们的表达矩阵就整理好了,下面就开始演示1行代码实现8种免疫浸润分析。
在此之前,先看看支持哪8种方法:
tme_deconvolution_methods
## MCPcounter EPIC xCell CIBERSORT
## "mcpcounter" "epic" "xcell" "cibersort"
## CIBERSORT Absolute IPS ESTIMATE SVR
## "cibersort_abs" "ips" "estimate" "svr"
## lsei TIMER quanTIseq
## "lsei" "timer" "quantiseq"
大家常见的方法全都囊括了,再也不用费功夫去找各种代码了!
并且每种方法都给出了参考文献和开源证书:
大家可能发现怎么没有ssGSEA
啊?这么优秀的算法,肯定有的,下面会给大家介绍的!
所有方法只要通过deconvo_tme()
函数即可完成,只要在method
中选择不同的方法即可!也不需要自己再去寻找各种方法对应的免疫细胞集,都是内置好的!
# MCPcounter
im_mcpcounter <- deconvo_tme(eset = expr_coad,
method = "mcpcounter"
)
##
## >>> Running MCP-counter
# EPIC
im_epic <- deconvo_tme(eset = expr_coad,
method = "epic",
arrays = F
)
##
## >>> Running EPIC
# xCell
im_xcell <- deconvo_tme(eset = expr_coad,
method = "xcell",
arrays = F
)
##
## >>> Running xCell
## [1] "Num. of genes: 9854"
## Estimating ssGSEA scores for 489 gene sets.
## [1] "Calculating ranks..."
## [1] "Calculating absolute values from ranks..."
|
|======================================================================| 100%
# CIBERSORT
im_cibersort <- deconvo_tme(eset = expr_coad,
method = "cibersort",
arrays = F,
perm = 1000
)
##
## >>> Running CIBERSORT
# IPS
im_ips <- deconvo_tme(eset = expr_coad,
method = "ips",
plot = F
)
##
## >>> Running Immunophenoscore
# quanTIseq
im_quantiseq <- deconvo_tme(eset = expr_coad,
method = "quantiseq",
scale_mrna = T
)
##
## Running quanTIseq deconvolution module
## Gene expression normalization and re-annotation (arrays: FALSE)
## Removing 17 noisy genes
## Removing 15 genes with high expression in tumors
## Signature genes found in data set: 134/138 (97.1%)
## Mixture deconvolution (method: lsei)
## Deconvolution sucessful!
# ESTIMATE
im_estimate <- deconvo_tme(eset = expr_coad,
method = "estimate"
)
##
## >>> Running ESTIMATE
## [1] "Merged dataset includes 9232 genes (1180 mismatched)."
## [1] "1 gene set: StromalSignature overlap= 136"
## [1] "2 gene set: ImmuneSignature overlap= 139"
# TIMER
im_timer <- deconvo_tme(eset = expr_coad
,method = "timer"
,group_list = rep("coad",dim(expr_coad)[2])
)
## ## Enter batch mode
## ## Loading immune gene expression
## [1] "Outlier genes: ACTB ACTG1 ACTG2 CLCA1 COL1A1 COL3A1 CXCL8 DEFA5 DEFA6 FTL GAPDH H4C3 HLA-C HLA-DRA JCHAIN LCN2 LGALS1 MT-ATP6 MT-ATP8 MT-CO1 MT-CO2 MT-CO3 MT-CYB MT-ND1 MT-ND2 MT-ND3 MT-ND4 MT-ND4L MT-ND5 MT-ND6 OLFM4 PAEP PI3 PIGR PLA2G2A PPBP PRSS2 REG1A REG1B REG3A RPL8 RPS11 RPS12 RPS18 RPS2 RPS21 RPS6 S100A6 S100P SLC25A6 SPINK4 TFF1 TFF3 TMSB10 TMSB4X"
## ## Removing the batch effect of C:\Users\liyue\AppData\Local\Temp\Rtmpw79DY6\file108c19af34b0
## Found2batches
## Adjusting for0covariate(s) or covariate level(s)
## Standardizing Data across genes
## Fitting L/S model and finding priors
## Finding parametric adjustments
## Adjusting the Data
# 需要提供reference,暂不演示!
#im_svr <- deconvo_tme(eset = expr_coad
# ,method = "svr"
# ,arrays = F
# ,reference =
# )
#im_lsei <- deconvo_tme(eset = expr_coad
# ,method = "lsei"
# ,arrays = F
# )
这样8种免疫浸润方法就做好了!是不是太easy了!
而且作者对结果进行了整理,都是统一的格式,方便你直接进行合并操作!随便给大家放几个看看:
dim(im_cibersort)
## [1] 524 26
im_cibersort[1:4,1:4]
## # A tibble: 4 × 4
## ID B_cells_naive_CIBERSORT B_cells_memory_…¹ Plasm…²
## <chr> <dbl> <dbl> <dbl>
## 1 TCGA-D5-6540-01A-11R-1723-07 0.0504 0 0
## 2 TCGA-AA-3525-11A-01R-A32Z-07 0.116 0 0.183
## 3 TCGA-AA-3525-01A-02R-0826-07 0.0655 0 0.149
## 4 TCGA-AA-3815-01A-01R-1022-07 0.0390 0 0
## # … with abbreviated variable names ¹B_cells_memory_CIBERSORT,
## # ²Plasma_cells_CIBERSORT
dim(im_xcell)
## [1] 524 68
im_xcell[1:4,1:4]
## # A tibble: 4 × 4
## ID aDC_xCell Adipocytes_xCell Astrocytes_xCell
## <chr> <dbl> <dbl> <dbl>
## 1 TCGA-D5-6540-01A-11R-1723-07 0.119 1.32e-18 2.03e- 2
## 2 TCGA-AA-3525-11A-01R-A32Z-07 0.451 0 1.13e-17
## 3 TCGA-AA-3525-01A-02R-0826-07 0.0792 6.63e-19 2.08e-19
## 4 TCGA-AA-3815-01A-01R-1022-07 0.418 4.95e-19 0
看到了吗?所有的结果都是:行是样本,列是细胞,行名是样本名,列名是细胞名,并且每种细胞名后面都有相应的方法名字的后缀,方便标识!
而且提供了方便的可视化函数,比如可视化cibersort
的结果:
library(tidyr)
# 取前12个样本做演示
res<-cell_bar_plot(input = im_cibersort[1:12,], title = "CIBERSORT Cell Fraction")
## There are seven categories you can choose: box, continue2, continue, random, heatmap, heatmap3, tidyheatmap
## >>>>=== Palette option for random: 1: palette1; 2: palette2; 3: palette3; 4: palette4
plot of chunk unnamed-chunk-9
关于免疫浸润的可视化,我们以后再专门介绍!
有了这样的数据方便大家直接进行合并操作!比如:
tme_combine <- im_mcpcounter %>%
inner_join(im_epic, by="ID") %>%
inner_join(im_xcell, by="ID") %>%
inner_join(im_cibersort, by="ID") %>%
inner_join(im_ips, by= "ID") %>%
inner_join(im_quantiseq, by="ID") %>%
inner_join(im_estimate, by= "ID") %>%
inner_join(im_timer, by= "ID")
tme_combine[1:4,1:4]
## # A tibble: 4 × 4
## ID T_cells_MCPcounter CD8_T_cells_MCPcounter Cytot…¹
## <chr> <dbl> <dbl> <dbl>
## 1 TCGA-D5-6540-01A-11R-1723-07 0.876 1.47 0.323
## 2 TCGA-AA-3525-11A-01R-A32Z-07 2.37 1.90 0.655
## 3 TCGA-AA-3525-01A-02R-0826-07 0.519 -1.00 -0.269
## 4 TCGA-AA-3815-01A-01R-1022-07 2.78 1.98 0.851
## # … with abbreviated variable name ¹Cytotoxic_lymphocytes_MCPcounter
dim(tme_combine)
## [1] 524 138
看看这统一的语法和格式,真是令人赏心悦目啊!
那ssGSEA
实现免疫浸润该怎么做呢?其实这个是该包的另一个重要的函数calculate_sig_score
实现的。
因为ssGSEA
是计算富集分数的算法,所以作者把它放到了这里,大家常见的这种方法对应的可能是28个免疫细胞的基因集,但是这个算法的牛逼之处就在于,你给它不同的基因集,它就可以计算不同的富集分数,并不是局限于28种细胞的那个基因集(上面8种方法都是局限于1种基因集的)。
除ssGSEA
之外,calculate_sig_score
还提供了pca/zscore/intergration
一共4种方法计算分数,真的是非常强!
## 基于ssGSEA计算免疫浸润分数
load(file = "../000files/ssGSEA28.Rdata")
im_ssgsea <- calculate_sig_score(eset = expr_coad
, signature = cellMarker # 这个28种细胞的文件需要自己准备
, method = "ssgsea" # 选这个就好了
)
##
## >>> Calculating signature score using ssGSEA method
## >>> log2 transformation is not necessary
## Estimating ssGSEA scores for 28 gene sets.
## [1] "Calculating ranks..."
## [1] "Calculating absolute values from ranks..."
##
|
|======================================================================| 100%
##
## [1] "Normalizing..."
im_ssgsea[1:4,1:4]
## # A tibble: 4 × 4
## ID `Activated B cell` `Activated CD4 T cell` Activ…¹
## <chr> <dbl> <dbl> <dbl>
## 1 TCGA-3L-AA1B-01A-11R-A37K-07 -0.192 0.0120 0.170
## 2 TCGA-4N-A93T-01A-11R-A37K-07 -0.249 0.00395 0.121
## 3 TCGA-4T-AA8H-01A-11R-A41B-07 -0.358 0.0552 0.0838
## 4 TCGA-5M-AAT4-01A-11R-A41B-07 -0.399 0.0909 0.107
## # … with abbreviated variable name ¹`Activated CD8 T cell`
这个28种细胞的文件需要自己准备,如果你需要,直接在本号后台回复ssgsea28
即可得到这个文件。
我已经在作者的github
提了issue,不知道会不会通过...
这个calculate_sig_score
函数得到的结果也是和上面一模一样的格式!也可以直接进行合并操作!
tme_combine <- tme_combine %>%
inner_join(im_ssgsea, by = "ID")
真的是太好用了!
前面我们说过,calculate_sig_score
可以根据不同的基因集计算不同的分数,作者为了方便大家,直接内置了已经公开发表的255种基因集,包括肿瘤微环境相关的、代谢相关的、m6A、外泌体、铁死亡和错配修复等。
可通过以下函数查看,大家可以自己探索一下:
# 总的
signature_collection
# 代谢相关
signature_metabolism
# 微环境相关
signature_tme
# 肿瘤相关
signature_tme
每一个signature
都是list结构:
signature_metabolism[1:4]
## $Cardiolipin_Metabolism
## [1] "CKMT1A" "CKMT1B" "CYCS" "NME4"
## [5] "TAZ" "TMEM256-PLSCR3"
##
## $Cardiolipin_Biosynthesis
## [1] "CRLS1" "PGS1" "PTPMT1" "TAMM41"
##
## $Cholesterol_Biosynthesis
## [1] "ACAT2" "CYP51A1" "DHCR24" "DHCR7" "EBP" "FDFT1" "FDPS"
## [8] "GGPS1" "HMGCR" "HMGCS1" "HSD17B7" "IDI1" "LBR" "LIPA"
## [15] "LSS" "MSMO1" "MVD" "MVK" "NSDHL" "PMVK" "SC5D"
## [22] "SOAT1" "SQLE" "TM7SF2"
##
## $Citric_Acid_Cycle
## [1] "ACLY" "ACO1" "ACO2" "CS" "DLAT" "DLD" "DLST" "FH"
## [9] "IDH1" "IDH2" "IDH3A" "IDH3B" "IDH3G" "MDH1" "MDH2" "MPC1"
## [17] "OGDH" "OGDHL" "PC" "PCK1" "PCK2" "PDHA1" "PDHA2" "PDHB"
## [25] "SDHA" "SDHB" "SDHC" "SDHD" "SUCLA2" "SUCLG1" "SUCLG2" "PDHX"
signature_metabolism[[1]]
## [1] "CKMT1A" "CKMT1B" "CYCS" "NME4"
## [5] "TAZ" "TMEM256-PLSCR3"
每种signature
还给出了参考文献,方便大家引用,绝对好用!
signature_collection_citation[1:2,]
## # A tibble: 2 × 6
## Signatures `Published year` Journal Title PMID DOI
## <chr> <dbl> <chr> <chr> <chr> <chr>
## 1 CD_8_T_effector 2018 Nature TGFβ attenuates tumour r… 2944… 10.1…
## 2 DDR 2018 Nature TGFβ attenuates tumour r… 2944… 10.1…
这么优秀的R包,大家快去用起来,使用时别忘记引用:
Zeng D, Ye Z, Shen R, Yu G, Wu J, Xiong Y,…, Liao W (2021) IOBR: Multi-Omics Immuno-Oncology Biological Research to Decode Tumor Microenvironment and Signatures. Frontiers in Immunology. 12:687975. doi: 10.3389/fimmu.2021.687975