Pathway and Gene Set Overdispersion Analysis

生信编程日常
Published 2020-04-09 14:45:32

require(devtools)
devtools::install_version('flexmix', '2.3-13')
devtools::install_github('hms-dbmi/scde', build_vignettes = FALSE)
library(scde)
data(pollen)
# remove poor cells and genes
cd <- clean.counts(pollen)
# check the final dimensions of the read count matrix
dim(cd)
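
As a rough illustration of what this kind of filtering does, the base-R sketch below drops cells with few detected genes, then genes with few reads or few expressing cells. The helper name `filter.counts` and its thresholds are hypothetical and chosen for a toy matrix; this is not `clean.counts()` itself, nor its defaults.

```r
# Toy count matrix: rows are genes, columns are cells (values are made up)
counts <- matrix(c( 5, 0, 3,
                    0, 0, 1,
                   10, 8, 0,
                    0, 0, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

# Hypothetical helper (not part of scde): illustrative thresholds only
filter.counts <- function(m, min.genes = 1, min.reads = 2, min.detected = 1) {
  # keep cells in which more than min.genes genes are detected
  m <- m[, colSums(m > 0) > min.genes, drop = FALSE]
  # keep genes with more than min.reads reads in total
  m <- m[rowSums(m) > min.reads, , drop = FALSE]
  # keep genes detected in more than min.detected cells
  m <- m[rowSums(m > 0) > min.detected, , drop = FALSE]
  m
}

filtered <- filter.counts(counts)
dim(filtered)
```

Using `drop = FALSE` keeps the result a matrix even when only one gene or cell survives the filter.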

Convert the sample and group labels to colors:

x <- gsub("^Hi_(.*)_.*", "\\1", colnames(cd))
l2cols <- c("coral4", "olivedrab3", "skyblue2", "slateblue3")[as.integer(factor(x, levels = c("NPC", "GW16", "GW21", "GW21+3")))]
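
To see how the regular expression and the factor lookup work together, here is a small sketch on made-up cell names. The names in `cell.names` are hypothetical and only assumed to follow the `Hi_<group>_<index>` pattern; the last one shows that a group missing from the level list ends up with an `NA` color.

```r
# Hypothetical cell names following the "Hi_<group>_<index>" pattern
cell.names <- c("Hi_NPC_1", "Hi_GW16_2", "Hi_GW21_3", "Hi_GW21+3_4", "Hi_K562_5")

# Extract the group label between "Hi_" and the last underscore
groups <- gsub("^Hi_(.*)_.*", "\\1", cell.names)

# Map each group to a color by its position in the level list
palette.cols <- c("coral4", "olivedrab3", "skyblue2", "slateblue3")
group.levels <- c("NPC", "GW16", "GW21", "GW21+3")
cols <- palette.cols[as.integer(factor(groups, levels = group.levels))]

data.frame(cell = cell.names, group = groups, color = cols)
```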

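Build an error model for each cell, using the k-nearest-neighbor model fitting procedure implemented by knn.error.models(). The fit for this test dataset has already been computed, so it is not run here and can be loaded directly.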

# EVALUATION NOT NEEDED
#knn <- knn.error.models(cd, k = ncol(cd)/4, n.cores = 1, min.count.threshold = 2, min.nonfailed = 5, max.model.plots = 10)
data(knn)

Normalizing variance

varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = TRUE)
# list top overdispersed genes
sort(varinfo$arv, decreasing = TRUE)[1:10]

Controlling for sequencing depth

varinfo <- pagoda.subtract.aspect(varinfo, colSums(cd[, rownames(knn)]>0))
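
The second argument in the call above is the number of genes detected in each modelled cell, which serves as a proxy for sequencing depth. A toy sketch of that quantity follows; the matrix `toy` and the vector `model.cells` (a stand-in for `rownames(knn)`) are made up for illustration.

```r
# Toy count matrix: rows are genes, columns are cells (values are made up)
toy <- matrix(c(3, 0, 1,
                0, 0, 2,
                7, 1, 4),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3)))

# Stand-in for rownames(knn): the cells that have an error model
model.cells <- c("cell1", "cell3")

# Number of genes with a non-zero count in each of those cells
depth <- colSums(toy[, model.cells] > 0)
depth
```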

Evaluate overdispersion of pre-defined gene sets

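To detect pathways that vary significantly within a single population of cells, pagoda identifies pathways and gene sets that show statistically significant coordinated variability. For each gene set, it tests whether the variance explained by the first principal component significantly exceeds the background expectation. Both pre-defined gene sets and 'de novo' gene sets can be tested.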
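
The idea behind the first-principal-component test can be sketched with plain, unweighted PCA on simulated data. This is only an illustration under stated assumptions: pagoda itself uses weighted PCA and a proper background model, and every name and number below is made up. A gene set driven by one shared factor puts far more variance on its first component than a set of unrelated genes of the same size.

```r
set.seed(1)
n.cells <- 50
n.genes <- 10

# Gene set with coordinated variation: every gene follows one shared factor
shared <- rnorm(n.cells)
coordinated <- sapply(seq_len(n.genes), function(i) 2 * shared + rnorm(n.cells))

# Background gene set of the same size: independent noise only
background <- matrix(rnorm(n.cells * n.genes), nrow = n.cells)

# Variance captured by the first principal component (rows are cells)
pc1.variance <- function(m) prcomp(m, center = TRUE)$sdev[1]^2

pc1.coordinated <- pc1.variance(coordinated)
pc1.background <- pc1.variance(background)
c(coordinated = pc1.coordinated, background = pc1.background)
```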

library(org.Hs.eg.db)
# translate gene names to ids
ids <- unlist(lapply(mget(rownames(cd), org.Hs.egALIAS2EG, ifnotfound = NA), function(x) x[1]))
rids <- names(ids); names(rids) <- ids 
# convert GO lists from ids to gene names
gos.interest <- unique(c(ls(org.Hs.egGO2ALLEGS)[1:100],"GO:0022008","GO:0048699", "GO:0000280")) 
go.env <- lapply(mget(gos.interest, org.Hs.egGO2ALLEGS), function(x) as.character(na.omit(rids[x]))) 
go.env <- clean.gos(go.env) # remove GOs with too few or too many genes
go.env <- list2env(go.env) # convert to an environment
#Now, we can calculate weighted first principal component magnitudes for each GO gene set in the provided environment.

pwpca <- pagoda.pathway.wPCA(varinfo, go.env, n.components = 1, n.cores = 1)
# evaluate the statistical significance of the observed overdispersion for each GO gene set.

df <- pagoda.top.aspects(pwpca, return.table = TRUE, plot = TRUE, z.score = 1.96)

Evaluate overdispersion of 'de novo' gene sets

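Gene sets whose expression is highly correlated within the given dataset ('de novo' gene sets) can also be tested. The following step determines 'de novo' gene clusters in the data and builds a background model for the expected magnitude of the gene set weighted principal components. The trim argument limits the influence of outlier cells.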
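
As a loose illustration of what a 'de novo' gene cluster is, the sketch below groups simulated genes by correlation distance with base-R hierarchical clustering. It is not the procedure used inside `pagoda.gene.clusters()`; the data, the average linkage, and the choice of two clusters are all assumptions made for the example.

```r
set.seed(2)
n.cells <- 40

# Two independent hidden factors, each driving a module of five genes
f1 <- rnorm(n.cells)
f2 <- rnorm(n.cells)
module1 <- sapply(1:5, function(i) f1 + 0.3 * rnorm(n.cells))
module2 <- sapply(1:5, function(i) f2 + 0.3 * rnorm(n.cells))

# Expression matrix: rows are cells, columns are genes
expr <- cbind(module1, module2)
colnames(expr) <- paste0("gene", 1:10)

# Cluster genes by correlation distance and cut the tree into two groups
gene.dist <- as.dist(1 - cor(expr))
gene.hc <- hclust(gene.dist, method = "average")
clusters <- cutree(gene.hc, k = 2)
split(names(clusters), clusters)
```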

clpca <- pagoda.gene.clusters(varinfo, trim = 7.1/ncol(varinfo$mat), n.clusters = 50, n.cores = 1, plot = TRUE)

df <- pagoda.top.aspects(pwpca, clpca, return.table = TRUE, plot = TRUE, z.score = 1.96)
head(df)

Visualize significant aspects of heterogeneity

# get full info on the top aspects
tam <- pagoda.top.aspects(pwpca, clpca, n.cells = NULL, z.score = qnorm(0.01/2, lower.tail = FALSE))
# determine overall cell clustering
hc <- pagoda.cluster.cells(tam, varinfo)
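
Assuming the `z.score` cutoff is read as a standard normal z-score, the values used in this walkthrough correspond to two-sided significance levels of roughly 0.05 (1.96) and 0.01 (the `qnorm()` expression above). A quick check in base R:

```r
# Two-sided cutoffs for significance levels 0.05 and 0.01
z.05 <- qnorm(0.05 / 2, lower.tail = FALSE)
z.01 <- qnorm(0.01 / 2, lower.tail = FALSE)
round(c(z.05 = z.05, z.01 = z.01), 3)

# Going the other way: two-sided tail probability for z = 1.96
p.196 <- 2 * pnorm(1.96, lower.tail = FALSE)
p.196
```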
# combine pathways that are driven by the same sets of genes:
tamr <- pagoda.reduce.loading.redundancy(tam, pwpca, clpca)
#combine aspects that show similar patterns
tamr2 <- pagoda.reduce.redundancy(tamr, distance.threshold = 0.9, plot = TRUE, cell.clustering = hc, labRow = NA, labCol = NA, box = TRUE, margins = c(0.5, 0.5), trim = 0)

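In the plot produced by this step, columns are cells and rows are clusters of similar pathways; the green-to-orange color scale runs from low to high weighted PCA scores.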

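# While each row here represents a cluster of pathways, the row names are assigned to be the top overdispersed aspect in each cluster.
# Column colors from cutting the cell dendrogram into 3 groups (computed here, but not passed to the call below)
col.cols <- rbind(groups = cutree(hc, 3))
# View the aspects, coloring the columns by the sample-derived colors in l2cols
pagoda.view.aspects(tamr2, cell.clustering = hc, box = TRUE, labCol = NA, margins = c(0.5, 20), col.cols = rbind(l2cols))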
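
The `cutree()` step can be tried on a toy dendrogram. The sketch assumes that `hc` behaves like a standard `hclust` object; the six one-dimensional points below are made up so that three groups are obvious.

```r
# Six toy "cells" on a line, forming three well-separated pairs
pts <- matrix(c(0, 0.1, 5, 5.1, 10, 10.1), ncol = 1,
              dimnames = list(paste0("cell", 1:6), NULL))

# Hierarchical clustering, then cut the tree into three groups
toy.hc <- hclust(dist(pts))
groups <- cutree(toy.hc, 3)

# One-row matrix of group labels, one column per cell
toy.cols <- rbind(groups = groups)
toy.cols
```

Passing a one-row matrix like this as `col.cols` instead of `rbind(l2cols)` would color the columns by cluster membership rather than by sample label.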

References:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4772672/

https://hms-dbmi.github.io/scde/pagoda.html

https://www.ncbi.nlm.nih.gov/pubmed/?term=Characterizing+transcriptional+heterogeneity+through+pathway+and+gene+set+overdispersion+analysis.
