前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >Seurat新版教程:Guided Clustering Tutorial-(下)

Seurat新版教程:Guided Clustering Tutorial-(下)

作者头像
生信技能树jimmy
发布2020-03-30 10:29:07
3.2K0
发布2020-03-30 10:29:07
举报
文章被收录于专栏:单细胞天地单细胞天地

回顾

Seurat新版教程:Guided Clustering Tutorial-(上)

好了,最重要的一步来了,聚类分析。Seurat采用的是graph-based聚类方法,k-means方法在V3中已经不存在了。

聚类

代码语言:javascript
复制
# Cluster the cells 
#Identify clusters of cells by a shared nearest neighbor (SNN) modularity optimization based clustering algorithm. First calculate k-nearest neighbors and construct the SNN graph. Then optimize the modularity function to determine clusters. For a full description of the algorithms, see Waltman and van Eck (2013) The European Physical Journal B. Thanks to Nigel Delaney (evolvedmicrobe@github) for the rewrite of the Java modularity optimizer code in Rcpp!
pbmc <- FindNeighbors(pbmc, dims = 1:10)
pbmc <- FindClusters(pbmc, resolution = 0.5)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 2638
Number of edges: 96033

Running Louvain algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.8720
Number of communities: 9
Elapsed time: 0 seconds

> table(pbmc@active.ident) # 查看每一类有多少个细胞

  0   1   2   3   4   5   6   7   8 
697 483 480 344 271 162 155  32  14 
 # 提取某一类细胞。
> head(subset(as.data.frame(pbmc@active.ident),pbmc@active.ident=="2"))

               pbmc@active.ident
AAACCGTGCTTCCG                 2
AAAGAGACGCGAGA                 2
AAAGCAGATATCGG                 2
AAAGTTTGTAGCGT                 2
AAATGTTGAACGAA                 2
AAATGTTGTGGCAT                 2

提取某一cluster细胞。

代码语言:javascript
复制
> pbmc
An object of class Seurat 
13714 features across 2638 samples within 1 assay 
Active assay: RNA (13714 features)
 1 dimensional reduction calculated: pca
> subpbmc<-subset(x = pbmc,idents="2")
> subpbmc
An object of class Seurat 
13714 features across 480 samples within 1 assay 
Active assay: RNA (13714 features)
 1 dimensional reduction calculated: pca

?WhichCells
head(WhichCells(pbmc,idents="2"))

[1] "AAACCGTGCTTCCG" "AAAGAGACGCGAGA" "AAAGCAGATATCGG" "AAAGTTTGTAGCGT" "AAATGTTGAACGAA" "AAATGTTGTGGCAT"
# Look at cluster IDs of the first 5 cells
head(Idents(pbmc), 5)

AAACATACAACCAC AAACATTGAGCTAC AAACATTGATCAGC AAACCGTGCTTCCG AAACCGTGTATGCG 
             1              3              1              2              6 
Levels: 0 1 2 3 4 5 6 7 8

#提取部分细胞
> pbmc
An object of class Seurat 
32738 features across 2700 samples within 1 assay 
Active assay: RNA (32738 features)
> head(colnames(pbmc@assays$RNA@counts)[1:30])
[1] "AAACATACAACCAC" "AAACATTGAGCTAC" "AAACATTGATCAGC" "AAACCGTGCTTCCG" "AAACCGTGTATGCG" "AAACGCACTGGTAC"
> subbset<-subset(x=pbmc,cells=colnames(pbmc@assays$RNA@counts)[1:30])
> subbset
An object of class Seurat 
32738 features across 30 samples within 1 assay 
Active assay: RNA (32738 features)

当然,我们用的基本都是默认参数,建议?FindClusters一下,看看具体的参数设置,比如虽然是图聚类,但是却有不同的算法,这个要看相应的文献了。

Algorithm for modularity optimization (1 = original Louvain algorithm; 2 = Louvain algorithm with multilevel refinement; 3 = SLM algorithm; 4 = Leiden algorithm). Leiden requires the leidenalg python.

系统发育分析(Phylogenetic Analysis of Identity Classes)

代码语言:javascript
复制
#Constructs a phylogenetic tree relating the 'average' cell from each identity class. 
# Tree is estimated based on a distance matrix constructed in either gene expression space or PCA spac

?BuildClusterTree
pbmc<-BuildClusterTree(pbmc)
Tool(object = pbmc, slot = 'BuildClusterTree')


Phylogenetic tree with 9 tips and 8 internal nodes.

Tip labels:
    0, 1, 2, 3, 4, 5, ...

Rooted; includes branch lengths.

PlotClusterTree(pbmc)

识别低质量样本。

代码语言:javascript
复制
#Calculate the Barcode Distribution Inflection
?CalculateBarcodeInflections
pbmc<-CalculateBarcodeInflections(pbmc)
SubsetByBarcodeInflections(pbmc)

可视化降维

非线性降维——这个目的是为了可视化,而不是特征提取(PCA),虽然它也可以用来做特征提取。

  • UMAP
代码语言:javascript
复制
#Run non-linear dimensional reduction (UMAP/tSNE)
# If you haven't installed UMAP, you can do so via reticulate::py_install(packages =
# 'umap-learn')
pbmc <- RunUMAP(pbmc, dims = 1:10)

UMAP(a=None, angular_rp_forest=False, b=None, init='spectral',
   learning_rate=1.0, local_connectivity=1, metric='correlation',
   metric_kwds=None, min_dist=0.3, n_components=2, n_epochs=None,
   n_neighbors=30, negative_sample_rate=5, random_state=None,
   repulsion_strength=1.0, set_op_mix_ratio=1.0, spread=1.0,
   target_metric='categorical', target_metric_kwds=None,
   target_n_neighbors=-1, target_weight=0.5, transform_queue_size=4.0,
   transform_seed=42, verbose=True)
Construct fuzzy simplicial set
Construct embedding
    completed  0  /  500 epochs
    completed  50  /  500 epochs
    completed  100  /  500 epochs
    completed  150  /  500 epochs
    completed  200  /  500 epochs
    completed  250  /  500 epochs
    completed  300  /  500 epochs
    completed  350  /  500 epochs
    completed  400  /  500 epochs
    completed  450  /  500 epochs
> head(pbmc@reductions$umap@cell.embeddings) # 提取UMAP坐标值。
                    UMAP_1    UMAP_2
AAACATACAACCAC   1.7449461 -2.770269
AAACATTGAGCTAC   4.6293025 12.092374
AAACATTGATCAGC   5.2499804 -2.011440
AAACCGTGCTTCCG -11.9875174 -1.568554
AAACCGTGTATGCG   0.1626114 -8.743275
AAACGCACTGGTAC   2.9192858 -1.487868
  • t-SNE
代码语言:javascript
复制
pbmc <- RunTSNE(pbmc, dims = 1:10)
 head(pbmc@reductions$tsne@cell.embeddings)
                   tSNE_1     tSNE_2
AAACATACAACCAC -11.701490   7.120466
AAACATTGAGCTAC -21.981401 -21.330793
AAACATTGATCAGC  -1.292978  23.822324
AAACCGTGCTTCCG  30.877776  -9.926240
AAACCGTGTATGCG -34.619197   8.773753
AAACGCACTGGTAC  -3.046157  13.013471

比较一下两个可视化的结果。

代码语言:javascript
复制
# note that you can set `label = TRUE` or use the LabelClusters function to help label
# individual clusters
plot1<-DimPlot(pbmc, reduction = "umap",label = TRUE)+scale_color_npg()
plot2<-DimPlot(pbmc, reduction = "tsne",label = TRUE)+scale_color_npg()
CombinePlots(plots = list(plot1, plot2),legend="bottom")

可以看出,两者的降维可视化的结构是一致的,UMAP方法更加紧凑——在降维图上,同一cluster离得更近,不同cluster离得更远,作为一种后来的算法有一定的优点,但是t-SNE结构也能很好地反映cluster的空间结构。

差异分析

代码语言:javascript
复制
# Finding differentially expressed features (cluster biomarkers) 
# find all markers of cluster 1
cluster1.markers <- FindMarkers(pbmc, ident.1 = 1, min.pct = 0.25)
head(cluster1.markers, n = 5)

            p_val avg_logFC pct.1 pct.2    p_val_adj
IL32 1.894810e-92 0.8373872 0.948 0.464 2.598542e-88
LTB  7.953303e-89 0.8921170 0.981 0.642 1.090716e-84
CD3D 1.655937e-70 0.6436286 0.919 0.431 2.270951e-66
IL7R 3.688893e-68 0.8147082 0.747 0.325 5.058947e-64
LDHB 2.292819e-67 0.6253110 0.950 0.613 3.144372e-63

这是一种one-others的差异分析方法,就是cluster1与其余的cluster来做比较,当然这个是可以指定的,参数就是ident.2。

代码语言:javascript
复制
# find all markers distinguishing cluster 5 from clusters 0 and 3
cluster5.markers <- FindMarkers(pbmc, ident.1 = 5, ident.2 = c(0, 3), min.pct = 0.25)
head(cluster5.markers, n = 5)
                     p_val avg_logFC pct.1 pct.2     p_val_adj
FCGR3A        7.583625e-209  2.963144 0.975 0.037 1.040018e-204
IFITM3        2.500844e-199  2.698187 0.975 0.046 3.429657e-195
CFD           1.763722e-195  2.362381 0.938 0.037 2.418768e-191
CD68          4.612171e-192  2.087366 0.926 0.036 6.325132e-188
RP11-290F20.3 1.846215e-188  1.886288 0.840 0.016 2.531900e-184

看看输出结果都是什么。

代码语言:javascript
复制
?FindMarkers 
data.frame with a ranked list of putative markers as rows, 
and associated statistics as columns (p-values, ROC score, etc., depending on the test used (test.use)). 
The following columns are always present:

avg_logFC: log fold-chage of the average expression between the two groups. Positive values indicate that the gene is more highly expressed in the first group
pct.1: The percentage of cells where the gene is detected in the first group
pct.2: The percentage of cells where the gene is detected in the second group
p_val_adj: Adjusted p-value, based on bonferroni correction using all genes in the dataset

我们还可以输出一个总表。

代码语言:javascript
复制
# find markers for every cluster compared to all remaining cells, report only the positive ones
> pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
Calculating cluster 0
   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 08s
Calculating cluster 1
   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 06s
Calculating cluster 2
   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 21s
Calculating cluster 3
   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 07s
Calculating cluster 4
   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 08s
Calculating cluster 5
   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 27s
Calculating cluster 6
   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 21s
Calculating cluster 7
   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 24s
Calculating cluster 8
   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 19s

> head(pbmc.markers)
              p_val avg_logFC pct.1 pct.2     p_val_adj cluster  gene
RPS12 2.008629e-140 0.5029988 1.000 0.991 2.754633e-136       0 RPS12
RPS27 2.624075e-140 0.5020359 0.999 0.992 3.598656e-136       0 RPS27
RPS6  1.280169e-138 0.4673635 1.000 0.995 1.755623e-134       0  RPS6
RPL32 4.358823e-135 0.4242773 0.999 0.995 5.977689e-131       0 RPL32
RPS14 3.618793e-128 0.4283480 1.000 0.994 4.962812e-124       0 RPS14
RPS25 5.612007e-124 0.5203411 0.997 0.975 7.696306e-120       0 RPS25

> pbmc.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC)
# A tibble: 18 x 7
# Groups:   cluster [9]
       p_val avg_logFC pct.1 pct.2 p_val_adj cluster gene    
       <dbl>     <dbl> <dbl> <dbl>     <dbl> <fct>   <chr>   
 1 1.96e-107     0.730 0.901 0.594 2.69e-103 0       LDHB    
 2 1.61e- 82     0.922 0.436 0.11  2.20e- 78 0       CCR7    
 3 7.95e- 89     0.892 0.981 0.642 1.09e- 84 1       LTB     
 4 1.85e- 60     0.859 0.422 0.11  2.54e- 56 1       AQP3    
 5 0.            3.86  0.996 0.215 0.        2       S100A9  
 6 0.            3.80  0.975 0.121 0.        2       S100A8  
 7 0.            2.99  0.936 0.041 0.        3       CD79A   
 8 9.48e-271     2.49  0.622 0.022 1.30e-266 3       TCL1A   
 9 2.96e-189     2.12  0.985 0.24  4.06e-185 4       CCL5    
10 2.57e-158     2.05  0.587 0.059 3.52e-154 4       GZMK    
11 3.51e-184     2.30  0.975 0.134 4.82e-180 5       FCGR3A  
12 2.03e-125     2.14  1     0.315 2.78e-121 5       LST1    
13 7.95e-269     3.35  0.961 0.068 1.09e-264 6       GZMB    
14 3.13e-191     3.69  0.961 0.131 4.30e-187 6       GNLY    
15 1.48e-220     2.68  0.812 0.011 2.03e-216 7       FCER1A  
16 1.67e- 21     1.99  1     0.513 2.28e- 17 7       HLA-DPB1
17 7.73e-200     5.02  1     0.01  1.06e-195 8       PF4     
18 3.68e-110     5.94  1     0.024 5.05e-106 8       PPBP    

这里可以注意一下only.pos 参数,可以指定返回positive markers 基因。test.use可以指定检验方法,可选择的有:wilcox,bimod,roc,t,negbinom,poisson,LR,MAST,DESeq2。 cluster间保守conserved marker基因.

代码语言:javascript
复制
#Finds markers that are conserved between the groups
 head(FindConservedMarkers(pbmc, ident.1 = 0, ident.2 = 1, grouping.var = "groups"))
Testing group g2: (0) vs (1)
   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 04s
Testing group g1: (0) vs (1)
   |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 05s
            g2_p_val g2_avg_logFC g2_pct.1 g2_pct.2 g2_p_val_adj     g1_p_val g1_avg_logFC g1_pct.1 g1_pct.2
S100A4  6.687228e-33   -0.8410733    0.678    0.959 9.170864e-29 2.583278e-35   -0.9569347    0.687    0.945
MALAT1  2.380860e-16    0.2957331    1.000    1.000 3.265112e-12 7.501490e-20    0.3201783    1.000    1.000
VIM     1.192054e-14   -0.4921326    0.816    0.951 1.634783e-10 1.193930e-19   -0.4945881    0.798    0.945
ANXA2   3.115304e-13   -0.6406856    0.169    0.461 4.272327e-09 2.186333e-18   -0.7695240    0.164    0.504
ANXA1   5.226901e-18   -0.7544607    0.451    0.800 7.168173e-14 1.413468e-17   -0.6660324    0.507    0.803
S100A11 1.832303e-16   -0.6955104    0.288    0.665 2.512820e-12 1.208765e-17   -0.7757913    0.256    0.584
        g1_p_val_adj     max_pval minimump_p_val
S100A4  3.542707e-31 6.687228e-33   5.166555e-35
MALAT1  1.028754e-15 2.380860e-16   1.500298e-19
VIM     1.637356e-15 1.192054e-14   2.387860e-19
ANXA2   2.998337e-14 3.115304e-13   4.372665e-18
ANXA1   1.938430e-13 1.413468e-17   1.045380e-17
S100A11 1.657700e-13 1.832303e-16   2.417530e-17

绘制基因小提琴图,其中右边的图使用原始的count取log绘制的,其实趋势还是蛮一致的。

代码语言:javascript
复制
plot1<-VlnPlot(pbmc, features = c("MS4A1", "CD79A"))+scale_color_npg()
# you can plot raw counts as well
plot2<- VlnPlot(pbmc, features = c("MS4A1", "CD79A"),ncol=1, same.y.lims=T,slot = "counts", log = TRUE)+scale_color_npg()
CombinePlots(plots = list(plot1, plot2))
代码语言:javascript
复制
plot1<-FeaturePlot(pbmc, features = c("MS4A1", "GNLY", "CD3E", "CD14", "FCER1A", "FCGR3A", "LYZ", "PPBP", 
                               "CD8A"),min.cutoff = 0, max.cutoff = 4)
top10 <- pbmc.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)
plot2<-DoHeatmap(pbmc, features = top10$gene) + NoLegend()+scale_color_npg()
#CombinePlots(plots = list(plot1, plot2))

library(gridExtra) 
grid.arrange(plot1,plot2,ncol = 2, nrow = 1)

gridExtra 也是可以用来组合seurat绘制的图片的,不足为奇,seurat本身用的ggplot2语法。

细胞周期分析

与细胞周期相关的基因。

代码语言:javascript
复制
> cc.genes
$`s.genes`
 [1] "MCM5"     "PCNA"     "TYMS"     "FEN1"     "MCM2"     "MCM4"     "RRM1"     "UNG"      "GINS2"    "MCM6"    
[11] "CDCA7"    "DTL"      "PRIM1"    "UHRF1"    "MLF1IP"   "HELLS"    "RFC2"     "RPA2"     "NASP"     "RAD51AP1"
[21] "GMNN"     "WDR76"    "SLBP"     "CCNE2"    "UBR7"     "POLD3"    "MSH2"     "ATAD2"    "RAD51"    "RRM2"    
[31] "CDC45"    "CDC6"     "EXO1"     "TIPIN"    "DSCC1"    "BLM"      "CASP8AP2" "USP1"     "CLSPN"    "POLA1"   
[41] "CHAF1B"   "BRIP1"    "E2F8"    

$g2m.genes
 [1] "HMGB2"   "CDK1"    "NUSAP1"  "UBE2C"   "BIRC5"   "TPX2"    "TOP2A"   "NDC80"   "CKS2"    "NUF2"    "CKS1B"  
[12] "MKI67"   "TMPO"    "CENPF"   "TACC3"   "FAM64A"  "SMC4"    "CCNB2"   "CKAP2L"  "CKAP2"   "AURKB"   "BUB1"   
[23] "KIF11"   "ANP32E"  "TUBB4B"  "GTSE1"   "KIF20B"  "HJURP"   "CDCA3"   "HN1"     "CDC20"   "TTK"     "CDC25C" 
[34] "KIF2C"   "RANGAP1" "NCAPD2"  "DLGAP5"  "CDCA2"   "CDCA8"   "ECT2"    "KIF23"   "HMMR"    "AURKA"   "PSRC1"  
[45] "ANLN"    "LBR"     "CKAP5"   "CENPE"   "CTCF"    "NEK2"    "G2E3"    "GAS2L3"  "CBX5"    "CENPA"  

#

代码语言:javascript
复制
?CellCycleScoring
## Not run: 
# pbmc_small doesn't have any cell-cycle genes
# To run CellCycleScoring, please use a dataset with cell-cycle genes
# An example is available at http://satijalab.org/seurat/cell_cycle_vignette.html
pbmc <- CellCycleScoring(
  object = pbmc,
  g2m.features = cc.genes$g2m.genes,
  s.features = cc.genes$s.genes
)
head(x = pbmc@meta.data)[,c(7,8,9,10,11)] # 我是为了好看取了几列来看,你大可不必。

               seurat_clusters groups     S.Score   G2M.Score Phase
AAACATACAACCAC               1     g1  0.10502807 -0.04507596     S
AAACATTGAGCTAC               3     g1 -0.02680010 -0.04986215    G1
AAACATTGATCAGC               1     g2 -0.01387173  0.07792135   G2M
AAACCGTGCTTCCG               2     g2  0.04595348  0.01140204     S
AAACCGTGTATGCG               6     g1 -0.02602771  0.04413069   G2M
AAACGCACTGGTAC               1     g2 -0.03692587 -0.08341568    G1

VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt","percent.HB","G2M.Score","S.Score"), ncol = 6)#+scale_color_npg() 

在UMAP空间中绘制细胞周期信息。

代码语言:javascript
复制
umapem<-pbmc@reductions$umap@cell.embeddings
metada= pbmc@meta.data
dim(umapem);dim(metada)

metada$bar<-rownames(metada)
umapem$bar<-rownames(umapem)
ccdata<-merge(umapem,metada,by="bar")
head(ccdata)
library(ggplot2)
plot<-ggplot(ccdata, aes(UMAP_1, UMAP_2,label=Phase))+geom_point(aes(colour = factor(Phase)))+
  #plot<-plot+scale_colour_manual(values=c("#CC33FF","Peru","#660000","#660099","#990033","black","red", "#666600", "green","#6699CC","#339900","#0000FF","#FFFF00","#808080"))+
  labs("@yunlai",x = "", y="") 
plot=plot+scale_color_aaas()  +
  theme_bw()+theme(panel.grid=element_blank(),legend.title=element_blank(),legend.text = element_text(color="black", size = 10, face = "bold"))
plot<-plot+guides(colour = guide_legend(override.aes = list(size=5))) +theme(plot.title = element_text(hjust = 0.5))

plot

探索

我们可以用SingleR或者celaref来定义每一个细胞的类型,而不只是cluster某某某了。其中SingleR与seurat是无缝衔接的,seurat对象可以读到这个里面。这里先不写,假定我们已经知道了各个类群:

代码语言:javascript
复制
# singleR 
#Assigning cell type identity to clusters
new.cluster.ids <- c("Naive CD4 T", "Memory CD4 T", "CD14+ Mono", "B", "CD8 T", "FCGR3A+ Mono", 
                     "NK", "DC", "Platelet")
names(new.cluster.ids) <- levels(pbmc)
pbmc <- RenameIdents(pbmc, new.cluster.ids)
plot1<-DimPlot(pbmc, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()
plot2<-DimPlot(pbmc, reduction = "tsne", label = TRUE, pt.size = 0.5) + NoLegend()
grid.arrange(plot1,plot2,ncol = 2, nrow = 1)

平均表达谱函数AverageExpression

代码语言:javascript
复制
?AverageExpression

AverageExp<-AverageExpression(pbmc,features=unique(top10$gene))
Finished averaging RNA for cluster Naive CD4 T
Finished averaging RNA for cluster Memory CD4 T
Finished averaging RNA for cluster CD14+ Mono
Finished averaging RNA for cluster B
Finished averaging RNA for cluster CD8 T
Finished averaging RNA for cluster FCGR3A+ Mono
Finished averaging RNA for cluster NK
Finished averaging RNA for cluster DC
Finished averaging RNA for cluster Platelet
> typeof(AverageExp)
[1] "list"
> head(AverageExp$RNA)
      Naive CD4 T Memory CD4 T CD14+ Mono          B      CD8 T FCGR3A+ Mono         NK         DC  Platelet
RPS3A   80.173486    61.905672 24.1161656 65.4588054 53.2413307   26.3218430 38.9542301 39.4926725 8.2843982
LDHB    13.697489    13.663320  2.5790368  2.8923463  7.1405019    1.3868494  4.4117033  3.1508971 2.0904628
CCR7     2.984692     1.293789  0.1020109  1.0788038  0.1631841    0.1413904  0.1196927  0.1473077 0.0000000
CD3D    10.724878    11.342980  0.4632153  0.3310177 13.9581981    0.2767605  1.1144081  0.2506665 0.0000000
CD3E     7.320622     7.807142  0.4356602  0.5741410  7.6701063    0.4992164  3.5389591  0.4322447 1.6960081
NOSIP    5.912196     5.196203  1.2965789  0.8373659  2.4063675    2.0503487  2.1314856  1.0916285 0.0804829

library(psych)
library(pheatmap)
coorda<-corr.test(AverageExp$RNA,AverageExp$RNA,method="spearman")
pheatmap(coorda$r)

记得保存数据以便下次使用。

代码语言:javascript
复制
saveRDS(pbmc, file = "D:\\Users\\Administrator\\Desktop\\RStudio\\single_cell\\filtered_gene_bc_matrices\\hg19pbmc_tutorial.rds")

富集分析

有了基因列表其实就可以做富集分析了借助enrichplot等R包可以做出很多好的探索。

代码语言:javascript
复制
require(org.Hs.eg.db)
library(topGO)
library(DOSE)
x=as.list(org.Hs.egALIAS2EG)
geneList<-rep(0,nrow(pbmc))
names(geneList)<-row.names(pbmc)
geneList<-geneList[intersect(names(geneList),names(x))]
newwallgenes=names(geneList)

for (ii in 1:length(geneList)){
  names(geneList)[ii]<-x[[names(geneList)[ii]]][1]

}


gene_erichment_results=list()
for (c1 in as.character(unique(levels(pbmc.markers$cluster)))){
  print(paste0("RUN ", c1))
  testgenes<-subset(pbmc.markers,cluster==c1)$gene
  gene_erichment_results[[c1]]=list()
  testgeneList=geneList
  testgeneList[which(newwallgenes %in% testgenes)]= 1
  #gene_erichment_results=list()
  tab1=c()
  for(ont in c("BP","MF","CC")){
    sampleGOdata<-suppressMessages(new("topGOdata",description="Simple session",ontology=ont,allGenes=as.factor(testgeneList),
                                       nodeSize=10,annot=annFUN.org,mapping="org.Hs.eg.db",ID="entrez"))
    resultTopGO.elim<-suppressMessages(runTest(sampleGOdata,algorithm="elim",statistic="Fisher"))

    resultTopGO.classic<-suppressMessages(runTest(sampleGOdata,algorithm="classic",statistic="Fisher"))
    tab1<-rbind(tab1,GenTable(sampleGOdata,Fisher.elim=resultTopGO.elim,Fisher.classic=resultTopGO.classic,orderBy="Fisher.elim",
                              topNodes=200))
  }
  gene_erichment_results[[c1]][["topGO"]]=tab1
  x<-suppressMessages(enrichDO(gene=names(testgeneList)[testgeneList==1],ont="DO",pvalueCutoff=1,pAdjustMethod="BH",universe=names(testgeneList),
                                minGSSize=5,maxGSSize=500,readable=T))
  gene_erichment_results[[c1]][["DO"]]=x
  dgn<-suppressMessages(enrichDGN(names(testgeneList)[testgeneList==1]))
  gene_erichment_results[[c1]][["DGN"]]=dgn
}


gene_erichment_results[["8"]][["topGO"]][1:5,]

gene_erichment_results[["1"]][["topGO"]][1:5,]
       GO.ID                                        Term Annotated Significant Expected Rank in Fisher.classic Fisher.elim Fisher.classic
1 GO:0019221         cytokine-mediated signaling pathway       516          22     4.15                      3     4.5e-05        1.4e-10
2 GO:0045059            positive thymic T cell selection        11           3     0.09                     55     7.9e-05        8.7e-05
3 GO:0050850 positive regulation of calcium-mediated ...        30           4     0.24                     61     9.1e-05        0.00010
4 GO:0033209 tumor necrosis factor-mediated signaling...        98           6     0.79                     70     0.00013        0.00016
5 GO:0002250                    adaptive immune response       301          11     2.42                     45     0.00040        3.8e-05

可视化

代码语言:javascript
复制
library(enrichplot)
dotplot(gene_erichment_results[[1]][["DGN"]], showCategory=30) 
代码语言:javascript
复制
## categorySize can be scaled by 'pvalue' or 'geneNum'
p1<-cnetplot(gene_erichment_results[[1]][["DGN"]], categorySize="pvalue", foldChange=geneList)
p2<-cnetplot(gene_erichment_results[[1]][["DGN"]], foldChange=geneList, circular = TRUE, colorEdge = TRUE)

plot_grid(p1, p2, ncol=2)
本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2019-08-02,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 单细胞天地 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 聚类
  • 可视化降维
  • 差异分析
  • 细胞周期分析
  • 探索
  • 富集分析
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档