这也能画？

生信技能树

发布于 2021-11-04 14:34:07

5810

发布于 2021-11-04 14:34:07

文章被收录于专栏：生信技能树

朋友圈看到了小伙伴转发了Twitter的一幅有意思的图片，如下所示：

有意思的图片

其实就是一个单细胞的降维聚类分群，特殊之处在于它出现了一个能被人类想象力丰富起来的造型，所以就有了左边他们全体实验室自己摆pose并且着装不同颜色衣服的模拟。

非常的形象，理论上这样的单细胞的降维聚类分群如果我们没有原始数据，是不可能重复出来它的原图。但是这些单细胞亚群其实就是大脑里面的，所以我们找一下所有神经退行性疾病的单细胞测序数据集即可，这里我们以PNAS杂志的一个关于AD的单细胞的数据集， https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE157827 为例子，它有 21个10x样品哦，来作为例子。

我把每个样品都独立处理后看其umap图，确实没有类似的造型，但是NC11这个样品比较容易被展示，https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4775575，下载这个10x单细胞转录组样品的3个文件，如下所示：

GSM4775575_NC11_barcodes.tsv.gz 19.0 Kb  
GSM4775575_NC11_features.tsv.gz 297.6 Kb 
GSM4775575_NC11_matrix.mtx.gz 12.7 Mb

把这些文件的前缀删除后，存放在一个文件夹里面，就开始读取这个文件夹，进行降维聚类分群，代码如下所示：

 library(scRNAstat) 
library(Seurat)
library(ggplot2)
library(clustree)
library(cowplot)
library(dplyr)
 x='check_nc11'
dir.create( x ) 
# 保证 NC11这个文件夹存在，并且NC11文件夹里面有3个文件，就是前面下载后并且改名的  
sce=CreateSeuratObject(counts = Read10X('NC11/') )
sce = basic_qc(sce=sce,org='human',
               dir = x)  
#sce = basic_filter(sce)  
sce = basic_workflow(sce,dir = x)   
markers_figures <- basic_markers(sce,
                                 org='human',
                                 group='seurat_clusters',
                                 dir = x)
p_umap = DimPlot(sce,reduction = 'umap',  
                 group.by = 'seurat_clusters',
                 label.box = T,  label = T,repel = T)
p_umap+markers_figures[[1]]

ggsave(paste0('umap_markers_for_',x,'.pdf'),width = 12)

因为它是脑部数据集，我们内置的标记基因在这里无法发挥作用，所以我们根据文献列出全新的基因，重新可视化：

astrocytes = c("AQP4", "ADGRV1", "GPC5", "RYR3") 
endothelial = c("CLDN5", "ABCB1", "EBF1") 
excitatory = c("CAMK2A", "CBLN2", "LDB2") 
inhibitory = c("GAD1", "LHFPL3", "PCDH15") 
microglia = c("C3", "LRMDA", "DOCK8") 
oligodendrocytes = c("MBP", "PLP1", "ST18") 
gene_list = list(
  Astro = astrocytes,
  Endo = endothelial,
  Excit = excitatory,
  Inhib = inhibitory,
  Mic = microglia,
  Oligo = oligodendrocytes
)

p_umap+ DotPlot(sce, assay = "RNA", features = gene_list, 
                group.by = 'seurat_clusters') + theme(axis.text.x = element_text(angle = 45, 
                                                                                 vjust = 0.5, hjust=0.5))
ggsave(paste0('umap_brain_markers_for_',x,'.pdf'),width = 15)

如下所示：

原始的降维聚类分群

这个时候需要根据各个细胞亚群高表达量基因列表，对原始的降维聚类分群后的 0-15个细胞亚群进行生物学命名，代码如下所示：

(n=length(unique(sce@meta.data$seurat_clusters))) #获取亚群个数
celltype=data.frame(ClusterID=0:(n-1),
                    celltype='Excit')  #构建数据框 
## 判断亚群ID属于那类细胞
celltype[celltype$ClusterID %in% c(10,14),2]='Inhib'
celltype[celltype$ClusterID %in% c(8),2]='Mic'
celltype[celltype$ClusterID %in% c(6,7),2]='Oligo' 
celltype[celltype$ClusterID %in% c(13),2]='Endo' 
celltype[celltype$ClusterID %in% c(9,11),2]='Astro'
## 重新赋值
sce@meta.data$celltype = "NA"
for(i in 1:nrow(celltype)){
  sce@meta.data[which(sce@meta.data$seurat_clusters == celltype$ClusterID[i]),'celltype'] <- celltype$celltype[i]}
table(sce@meta.data$celltype) 
cl=c('#1da95c','#f9ab2f','#c84c50','#9b6ca2','#2d89c8','#111312')
p1=DimPlot(sce, reduction = 'umap', group.by = 'celltype',
           cols = cl, # repel = T, label.box = T,    label = TRUE, 
     pt.size = 0.5) + NoLegend()
p2=DotPlot(sce, assay = "RNA", features = gene_list, 
        group.by = 'celltype') + theme(axis.text.x = element_text(angle = 45, 
                                                                         vjust = 0.5, hjust=0.5))
library(patchwork)
p1+p2
ggsave(paste0('umap_brain_markers_for_',x,'_by_cellType.pdf'),width = 15)

最后出图如下所示：