这两个疾病的患者的PBMC跟正常人的在单细胞转录组水平差异很大吗

生信技能树jimmy

发布于 2023-12-19 19:51:27

3160

发布于 2023-12-19 19:51:27

文章被收录于专栏：单细胞天地

众所周知，2型糖尿病是一种复杂的代谢紊乱，会引起各种并发症，包括牙周炎。所以研究者们就选取了这两个疾病（ Type 2 diabetes mellitus (DM)和 periodontitis (PD) ）的患者的PBMC公共数据集做研究：

GSE165816 includes the transcriptional profiles of PBMCs obtained from patients with DM and healthy donors.
GSE164241 is a scRNA‐seq dataset of gingival tissues from patients with PD and from healthy controls.

两个单细胞转录组数据集汇总后是：

11 healthy controls,
10 patients with PD without DM,
six patients with PDDM

值得一提的是这个数据挖掘是韩国人做的哦，标题是：《Immunological link between periodontitis and type 2 diabetes deciphered by single‐cell RNA analysis》

首先是降维聚类分群和分组后看比例变化

如下所示：

看比例变化

然后是各个单细胞亚群在不同分组的各种差异分析和富集分析

如下所示：

各种差异分析和富集分析

高级分析

主要是针对具体的某个细胞亚群看功能变化，比如这个文章就是针对 CD8T and NK cells

打分：The cytotoxicity, exhaustion and activity scores
拟时序
细胞通讯

首先看看GSE164241

是2021发表在CELL杂志的《Human oral mucosa cell atlas reveals a stromal- neutrophil axis regulating tissue immunity》，文章的第一层次降维聚类分群是比较简单的：

endothelial (ACKR1, RAMP2, SELE, VWF, PECAM1),
fibroblast (LUM, COL3A1, DCN, COL1A1, CFD),
immune (CD69, CD52, CXCR4, PTPRC, HCST),
epithelial (KRT14, KRT5, S100A2, CTSA, SPRR1B)

第一层次降维聚类分群是比较简单的

是标准的10x数据集，所以很容易导入到R里面进行分析

###### step1:导入数据 ######    
samples=list.files('GSE164241_RAW/outputs/')
samples 
sceList = lapply(samples,function(pro){ 
  # pro=samples[1]
  folder=file.path('GSE164241_RAW/outputs/',pro) 
  print(pro)
  print(folder)
  print(list.files(folder))
  sce=CreateSeuratObject(counts = Read10X(folder),
                         project =  pro ,
                         min.cells = 5,
                         min.features = 300)
  
  return(sce)
})
names(sceList)  
samples  
names(sceList)  = samples
sce.all <- merge(sceList[[1]], y= sceList[ -1 ] ,add.cell.ids =  samples) 

as.data.frame(sce.all@assays$RNA@counts[1:10, 1:2])
head(sce.all@meta.data, 10)
table(sce.all@meta.data$orig.ident) 

library(stringr)  
sce.all@meta.data$group = substring(sce.all@meta.data$orig.ident,1,2) 
table(sce.all@meta.data$group )
head(sce.all@meta.data)

然后看GSE165816

是文章： Single cell transcriptomic landscape of diabetic foot ulcers. Nat Commun 2022 ，它的第一层次降维聚类分群要复杂一点，一步到位了：

第一层次降维聚类分群要复杂一点

本质上也是读取文件即可，代码是：

library(data.table)
dir='GSE165816_RAW/' 
samples=list.files( dir )
samples 

library(data.table)
sceList = lapply(samples,function(pro){ 
  # pro=samples[1] 
  print(pro)
  ct=fread(file.path( dir ,pro),data.table = F)
  ct[1:4,1:4]
  rownames(ct)=ct[,1]
  ct=ct[,-1]
  sce=CreateSeuratObject(counts =  ct ,
                         project =  gsub('counts.csv.gz','',
                                         strsplit(pro,'_')[[1]][2]),
                         min.cells = 5,
                         min.features = 300,)
  
  return(sce)
})
names(sceList)  
 
samples =  gsub('counts.csv.gz','',samples)
sce.all=merge(x=sceList[[1]],
              y=sceList[ -1 ],
              add.cell.ids = samples)

as.data.frame(sce.all@assays$RNA@counts[1:10, 1:2])
head(sce.all@meta.data, 10)
table(sce.all$orig.ident)

我的问题是？

这两个疾病的患者的PBMC跟正常人的在单细胞转录组水平差异很大吗？有必要这样做吗？

现在呢，基本上每个疾病都是有公开的单细胞数据集，而且很多疾病都是多个数据集，是不是可以做各种各样的联合分析了呢？

本文参与腾讯云自媒体同步曝光计划，分享自微信公众号。

原始发表：2023-12-18，如有侵权请联系 cloudcommunity@tencent.com 删除

数据挖掘

data