前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >NC单细胞文章复现(七):Gene expression signatures(2)

NC单细胞文章复现(七):Gene expression signatures(2)

作者头像
生信技能树jimmy
发布2021-07-02 18:19:23
7230
发布2021-07-02 18:19:23
举报
文章被收录于专栏:单细胞天地单细胞天地

分享是一种态度

我们今天继续探索这3个gene signatures,首先看它在不同clusters的细胞之间的表达分布。

代码语言:javascript
复制
clust_avg_prognosis <- matrix(NA, nrow = length(unique(HSMM_allepith_clustering$Cluster)), ncol = ncol(prognosis_sig))
#列明是clusters1-5
rownames(clust_avg_prognosis) <- paste("clust", c(1:length(unique(HSMM_allepith_clustering$Cluster))), sep = "")
#行名是mammaprint、zenawerb、artega和Cluster 
colnames(clust_avg_prognosis) <- colnames(prognosis_sig)
#每个clusters的三个gene signatures的平均值
for (c in 1:length(unique(HSMM_allepith_clustering$Cluster))) {
  clust_avg_prognosis[c,] <- apply(prognosis_sig[which(HSMM_allepith_clustering$Cluster == c),], 2, mean)}

prognosis_sig <- as.data.frame(prognosis_sig)
all.equal(rownames(prognosis_sig), colnames(HSMM_allepith_clustering))
prognosis_sig$Cluster <- as.numeric(HSMM_allepith_clustering$Cluster)
prognosis_melt <- melt(prognosis_sig, id.vars = c("Cluster"))
prognosis_melt$value <- as.numeric(prognosis_melt$value)
prognosis_melt <- prognosis_melt %>%
  filter(Cluster %in% c(2,3,4))

#画fig4b
p <- ggplot(prognosis_melt, aes(factor(Cluster), value, fill = factor(Cluster))) +
  scale_fill_manual(values = c("1" = "#ee204d", "2" = "#17806d", "3" = "#b2ec5d", "4" = "#cda4de", "5" = "#1974d2"))
p + geom_violin(adjust = .5) + facet_wrap(~variable) + stat_summary(fun.y = mean, geom = "point", shape = 22, size = 3)

代码语言:javascript
复制
#画figS16
venn(list("PS" = mammaprint, "MBS" = zenawerb, "RTS" = artega))

可以看到这3个gene signatures没有重叠的基因,并且它们来源不同,但这3个 gene signatures均在clusters 2亚群中都高表达。

今天这一部分,作者主要是想跟临床预后联系到一起,于是选取了3个有代表性的数据集gene signature,对868个上皮细胞的5个clusters进行探索,结果提示clusters2具有与其他clusters不同的特征,接着对clusters2进行进一步探索。在之前的第五篇笔记tSNE中,已经用differentialGeneTest对每一个上皮细胞cluster与全部的上皮细胞进行差异分析后,以FDR=0.1为阈值,选取前10个有显著差异变化的基因在METABRIC 数据库中进行生存分析,作者在补充材料里已经描述得很详细。

结果显示高表达clusters2的肿瘤与OS显著负相关相关性。相反,但是在三个gene signatures中的预后则无统计学意义。此外,clusters1、3、4和5的基因无法预测临床结果。这些发现揭示了单细胞测序分析能揭示与临床预后相关细胞状态的能力,但这些细胞状态还没有在肿瘤真实世界样本中没被发现。

我们继续探索。在fig3a中,我们已经对868个上皮细胞进行tSNE,接着我们将这个tSNE结果,进行进一步注释可并可视化。

代码语言:javascript
复制
## 可视化normal signatures,figS10b
plot_cell_clusters(HSMM_allepith_clustering, 1, 2, color = "assignments_normsig_both", cell_size = 3)

代码语言:javascript
复制
#可视化TNBCtype4 Lehman signatures
#去掉immunomodulatory和mesenchymal_stem_like上皮细胞
lehman_avg_both_epithelial_new <- lehman_avg_both_epithelial[-which(rownames(lehman_avg_both_epithelial) %in% c("immunomodulatory", "mesenchymal_stem_like")),]
#给868个上皮细胞进行分类(basal_like_1,basal_like_2,mesenchymal,luminal_ar)
assignments_lehman_both_epithelial_new <- apply(lehman_avg_both_epithelial_new, 2, function(x){rownames(lehman_avg_both_epithelial_new)[which.max(x)]})
all.equal(colnames(HSMM_allepith_clustering), names(assignments_lehman_both_epithelial_new))
pData(HSMM_allepith_clustering)$assignments_lehman_both_new <- assignments_lehman_both_epithelial_new
#画figS10c
plot_cell_clusters(HSMM_allepith_clustering, 1, 2, color = "assignments_lehman_both_new", cell_size = 3)
代码语言:javascript
复制
#可视化mammaprint signature (PS),figS10d
plot_cell_clusters(HSMM_allepith_clustering, 1, 2, color = "mammaprint_avg_exprs", cell_size = 3) +
  scale_color_continuous(low = "yellow", high = "blue")

然后,通过研究在cluster2中与其他上皮细胞群的差异表达的基因中富集的通路变化,来确定cluster2亚群背后的功能机制。我们发现,最富集的通路与鞘糖脂的生物合成和溶酶体的周转有关,这两种途径影响了同样富集到的先天免疫系统的细胞因子途径。接着可视化每个样本的鞘糖脂的生物合成和先天免疫系统的细胞因子两条途径的热图。

糖鞘糖脂最近被认为是乳腺癌中许多促进肿瘤特性的介质,包括改变的生长因子信号、EMT和干样行为。该途径中的多个关键基因在cluster2亚群中高表达,包括糖脂转移蛋白基因GLTP和鞘脂生物合成关键亚单位基因SPTLC1。鞘磷脂也被认为是天然免疫系统和获得性免疫系统的重要调节剂,并与多种上皮组织炎症相关癌变有关。

代码语言:javascript
复制
#path1是糖鞘糖脂的关键基因
path1 <- c("ST3GAL4", "ST3GAL6", "ST8SIA1", "FUT1", "FUT2", "FUT3", "FUT4", "FUT6", "FUT9", "GLTP", "SPTLC1", "KDSR", "SPTLC2", "CERK", "ARSA")
idx_path1 <- match(path1, rownames(HSMM_allepith_clustering))
path1是先天免疫系统的细胞因子途径的关键基因
path2 <- c("CCL20", "CCL22", "CCL4", "CCR6", "IL11", "IL12RB1", "IL6R", "CSF2RA", "BMP7", "GLMN", "GPI", "INHBA", "TNF", "TNFSF15", "GHR", "LEPR", "TLR1", "TLR2", "TLR5", "TLR7", "TLR10", "F11R")
idx_path2 <- match(path2, rownames(HSMM_allepith_clustering))

path3 <- c("ERBB2", "AKT1", "AKT3", "PIK3R3", "PIK3R4", "RPS6KB2", "TRIB3", "BTK", "GRB10", "GRB2", "ILK", "PAK1", "PRKCZ", "CSNK2A1",
           "IRS1", "IRAK1", "MYD88", "MAP2K1", "MAPK8", "MAPK1", "PTPN11", "EIF4E", "EIF4EBP1", "EIF4G1", "FKBP1A", "PDK1", "RHEB", "RPS6KB1")
idx_path3 <- match(path3, rownames(HSMM_allepith_clustering))

all_idx_paths <- c(idx_path1, idx_path2)
#命名
names_path <- c(rep("glycosphigolipid metabolism", length(idx_path1)), rep("innate immunity", length(idx_path2)))
#上色
anno_cols_path <- c("glycosphigolipid metabolism" = "#ff9baa", "innate immunity" = "#17806d")
ha_path_rows <- HeatmapAnnotation(df = data.frame(pathway = names_path),
                                  annotation_legend_param = list(pathway = list(ncol = 1, title = "pathway", title_position = "topcenter")),
                                  which = "row", col = list("pathway" = anno_cols_path), annotation_width = unit(3, "mm"))
#分离每个cluster的矩阵,并提取关键基因。
mat_epith_allpats <- exprs(HSMM_allepith_clustering)
mats_epith_patient <- list()
pds_epith_patient <- list()
mats_epith_patient_clusters <- list()
for (i in 1:length(patients_now)) {
  pds_epith_patient[[i]] <- pData(HSMM_allepith_clustering)[which(pData(HSMM_allepith_clustering)$patient == patients_now[i]), ]
  mats_epith_patient[[i]] <- mat_epith_allpats[all_idx_paths, which(pData(HSMM_allepith_clustering)$patient == patients_now[i])]
  mats_epith_patient_clusters[[i]] <- list()
  for (c in 1:length(unique(HSMM_allepith_clustering$Cluster))) {
    mats_epith_patient_clusters[[i]][[c]] <- mats_epith_patient[[i]][, which(pds_epith_patient[[i]]$Cluster == c)]
  }
  names(mats_epith_patient_clusters[[i]]) <- paste0("clust", c(1:5))
}
names(mats_epith_patient) <- patients_now
names(pds_epith_patient) <- patients_now

# 接下来就是画图了。
ht_paths <-
  ha_path_rows + 
  Heatmap(mats_epith_patient_clusters[[1]][[1]], 
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE, 
          show_column_dend = TRUE, show_column_names = FALSE,
          name = "cluster1", column_title = "cluster1",
          row_names_gp = gpar(fontsize = 9, col = anno_cols_path),
          split = names_path) +
  Heatmap(mats_epith_patient_clusters[[1]][[2]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE,
          name = "cluster2", column_title = "cluster2",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[1]][[3]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE, 
          name = "cluster3", column_title = "cluster3",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[1]][[4]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE, 
          name = "cluster4", column_title = "cluster4",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[1]][[5]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE, 
          name = "cluster5", column_title = "cluster5",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[2]][[1]], 
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE, 
          show_column_dend = TRUE, show_column_names = FALSE,
          name = "cluster1_2", column_title = "cluster1",
          row_names_gp = gpar(fontsize = 9, col = anno_cols_path),
          split = names_path) +
  Heatmap(mats_epith_patient_clusters[[2]][[2]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE,
          name = "cluster2_2", column_title = "cluster2",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[2]][[3]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE, 
          name = "cluster3_2", column_title = "cluster3",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[2]][[4]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE, 
          name = "cluster4_2", column_title = "cluster4",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[2]][[5]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE, 
          name = "cluster5_2", column_title = "cluster5",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) + 
  Heatmap(mats_epith_patient_clusters[[3]][[1]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE,
          name = "cluster1_3", column_title = "cluster1",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[3]][[2]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE,
          name = "cluster2_3", column_title = "cluster2",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[3]][[3]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE, 
          name = "cluster3_3", column_title = "cluster3",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[3]][[4]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE, 
          name = "cluster4_3", column_title = "cluster4",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[3]][[5]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE, 
          name = "cluster5_3", column_title = "cluster5",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[4]][[1]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE,
          name = "cluster1_4", column_title = "cluster1",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[4]][[2]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE,
          name = "cluster2_4", column_title = "cluster2",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[4]][[3]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE, 
          name = "cluster3_4", column_title = "cluster3",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[4]][[4]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE, 
          name = "cluster4_4", column_title = "cluster4",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[4]][[5]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE, 
          name = "cluster5_4", column_title = "cluster5",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[5]][[1]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE,
          name = "cluster1_5", column_title = "cluster1",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[5]][[2]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE,
          name = "cluster2_5", column_title = "cluster2",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[5]][[3]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE, 
          name = "cluster3_5", column_title = "cluster3",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[5]][[4]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE, 
          name = "cluster4_5", column_title = "cluster4",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[5]][[5]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE, 
          name = "cluster5_5", column_title = "cluster5",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[6]][[1]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE,
          name = "cluster1_6", column_title = "cluster1",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[6]][[2]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE,
          name = "cluster2_6", column_title = "cluster2",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[6]][[3]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE, 
          name = "cluster3_6", column_title = "cluster3",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[6]][[4]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE, 
          name = "cluster4_6", column_title = "cluster4",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE) +
  Heatmap(mats_epith_patient_clusters[[6]][[5]],
          col = colorRamp2(c(-0.5, 1, 3), c("blue", "white", "red")),
          cluster_rows = TRUE, show_row_dend = FALSE, 
          name = "cluster5_6", column_title = "cluster5",
          show_row_names = FALSE,
          show_column_dend = TRUE, show_column_names = FALSE)

画得有点丑,画图参数得调整需要参考ComplexHeatmap包的说明书,继续美化一下。

到了这里,这篇文献的单细胞部分就学习完了,花了一个月的时间断断续续把这个系列共7篇笔记写完。作者把这1000多个细胞不断分群,不断探索,最后集中在了上皮细胞群,进一步细分,然后结合临床预后情况,继续探索。这是很标准的单细胞分析流程。当然,文章中也加了一些免疫组化的实验方法,加以验证。还有一部分外显子测序,这又是另外一个需要学习的领域了,在单细胞学习的路上,感恩得到了“单细胞天地”公众号团队的答疑解惑。2021年,希望能学到更多的东西,写更多的笔记,相信一切的努力都不会白白付出。

往期回顾

多组学分析肺结核队列的记忆T细胞状态

单细胞混样测序至少可以区分性别

构建seurat对象之初就应该是把基因名字转换好

OSCA单细胞数据分析笔记9—Clustering




如果你对单细胞转录组研究感兴趣,但又不知道如何入门,也许你可以关注一下下面的课程

本文参与 腾讯云自媒体分享计划,分享自微信公众号。
原始发表:2021-06-09,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 单细胞天地 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档