SCP, an End-to-End Single-Cell Pipeline: The Standard Workflow

生信技能树jimmy
Published 2023-10-30 15:32:17
Included in the column: 单细胞天地

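Sharing is an attitude.

Pare away the superfluous, like a tree in late autumn; stand out with something new, like a flower in early spring.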

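This chapter introduces SCP's standard workflow for processing single-cell data. It suits single-sample data, multi-sample data without batch effects, and other exploratory analyses.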

  • Main function: Standard_SCP
  • SCP version: 0.5.3; Seurat version: v4.4.0

The Standard_SCP function

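Standard_SCP is the standard workflow for processing single-cell data. It is modeled mainly on the Seurat standard workflow and covers normalization, highly variable feature (HVF) detection, linear and nonlinear dimensionality reduction, and cell clustering.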

The workflow has the following features:

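  1. Simplified parameters: the direct arguments are the main parameters of each step, and the remaining parameters can be passed in as a list. See the Standard_SCP function documentation[1] for details.
  2. Automation: for example, it automatically checks the data type and whether each step needs to run, estimates the intrinsic dimension of the linear reduction space, and reorders the cluster numbering.
  3. Combined analysis with multiple linear (pca, ica, nmf, mds, glmpca) and nonlinear (umap, tsne, dm, phate, pacmap, trimap, largevis, fr) dimensionality reduction methods.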
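
For orientation, the steps listed above can be approximated with plain Seurat calls. The sketch below is only an assumption about how such a chain might look, not SCP's actual implementation: the helper name standard_workflow_sketch and its arguments are hypothetical, and it is run here on pbmc_small, the small toy object bundled with SeuratObject, rather than on the article's data.

```r
library(Seurat)

# Hypothetical helper: an approximation of the steps that Standard_SCP chains
# together, written with plain Seurat calls. This is NOT SCP's own code.
standard_workflow_sketch <- function(srt, n_hvf = 2000, n_pcs = 50,
                                     dims_use = 1:n_pcs, resolution = 0.6) {
  # Log-normalize the raw counts
  srt <- NormalizeData(srt, normalization.method = "LogNormalize", verbose = FALSE)
  # Detect highly variable features (HVF)
  srt <- FindVariableFeatures(srt, nfeatures = n_hvf, verbose = FALSE)
  # Scale the HVF and run the linear reduction (PCA)
  srt <- ScaleData(srt, verbose = FALSE)
  srt <- RunPCA(srt, npcs = n_pcs, verbose = FALSE)
  # Build the neighbor graph on the selected PCs and cluster the cells
  srt <- FindNeighbors(srt, dims = dims_use, verbose = FALSE)
  srt <- FindClusters(srt, resolution = resolution, verbose = FALSE)
  # Nonlinear reduction (UMAP) on the same PCs
  srt <- RunUMAP(srt, dims = dims_use, verbose = FALSE)
  srt
}

# Toy run on the 80-cell object shipped with SeuratObject (small settings on purpose)
toy <- standard_workflow_sketch(SeuratObject::pbmc_small, n_hvf = 100, n_pcs = 10)
Reductions(toy)
```

Unlike this sketch, Standard_SCP also checks the data type, estimates the number of dimensions to use, and renames its outputs with a prefix, as described in the sections that follow.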

Standard workflow example

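The example below uses downsampled single-cell data from mouse embryonic day 15.5 (E15.5) pancreatic epithelium. Run ?pancreas_sub in R to see details about this example dataset.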

library(SCP)
library(Seurat)
data("pancreas_sub")
pancreas_sub
#> An object of class Seurat 
#> 47874 features across 1000 samples within 3 assays 
#> Active assay: RNA (15958 features, 3467 variable features)
#>  2 other assays present: spliced, unspliced
#>  2 dimensional reductions calculated: PCA, UMAP

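With default parameters, Standard_SCP uses 2000 HVFs, selects PCA for linear dimensionality reduction, estimates the intrinsic dimension with intrinsicDimension::maxLikGlobalDimEst, and then runs UMAP nonlinear dimensionality reduction and cell clustering: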

pancreas_sub <- Standard_SCP(srt = pancreas_sub)
#> [2023-10-27 06:36:02] Start Standard_SCP
#> [2023-10-27 06:36:02] Checking srtList... ...
#> Data 1/1 of the srtList is raw_counts. Perform NormalizeData(LogNormalize) on the data ...
#> Perform FindVariableFeatures on the data 1/1 of the srtList...
#> Use the separate HVF from srtList...
#> [2023-10-27 06:36:03] Finished checking.
#> [2023-10-27 06:36:03] Perform ScaleData on the data...
#> [2023-10-27 06:36:03] Perform linear dimension reduction (pca) on the data...
#> [2023-10-27 06:36:04] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:36:04] Reorder clusters...
#> [2023-10-27 06:36:05] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:36:12] Standard_SCP done
#> Elapsed time: 9.52 secs
CellDimPlot(pancreas_sub, group.by = c("SubCellType", "Standardclusters"))
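
To give a feel for what "estimating the intrinsic dimension" means, here is a from-scratch base R sketch of a maximum-likelihood estimator based on nearest-neighbor distances. It is an illustration of the general idea only: the function mle_intrinsic_dim is hypothetical, and it does not reproduce the code or the defaults of intrinsicDimension::maxLikGlobalDimEst. The toy data are 3-dimensional points embedded linearly in 10 dimensions, so the estimate should land near 3.

```r
# Hypothetical helper: maximum-likelihood intrinsic dimension estimate from
# the distances to each point's k nearest neighbors, averaged over all points.
mle_intrinsic_dim <- function(x, k = 20) {
  d <- as.matrix(dist(x))
  per_point <- apply(d, 1, function(row) {
    nn <- sort(row)[2:(k + 1)]  # drop the zero self-distance
    (k - 1) / sum(log(nn[k] / nn[1:(k - 1)]))
  })
  mean(per_point)
}

set.seed(11)
latent <- matrix(rnorm(500 * 3), ncol = 3)   # true dimension: 3
rotation <- matrix(rnorm(3 * 10), nrow = 3)  # random linear map into 10-D
embedded <- latent %*% rotation              # 500 points x 10 coordinates

est <- mle_intrinsic_dim(embedded)
est
```

The estimate depends on k and on sampling density, so it is best read as a rough guide to how many linear dimensions carry structure.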

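The returned Seurat object contains the processed data matrices. With default parameters the assay analyzed is RNA, so the modified data is mainly in pancreas_sub[["RNA"]]. The graphs and reductions produced during the analysis are added as well; by default, nonlinear dimensionality reduction returns cell embedding coordinates in both 2D and 3D space. Cell clusters are added to meta.data. All new graphs, reductions, and clusters are named with the default prefix Standard. Intermediate reductions carry the names of both the linear (lowercase) and nonlinear (uppercase) methods, while the final reduction keeps only the nonlinear method name: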

Graphs(pancreas_sub)
#> [1] "Standardpca_KNN" "Standardpca_SNN"
Reductions(pancreas_sub)
#> [1] "PCA"               "UMAP"              "Standardpca"      
#> [4] "StandardpcaUMAP2D" "StandardpcaUMAP3D" "StandardUMAP2D"   
#> [7] "StandardUMAP3D"
colnames(pancreas_sub@meta.data)
#>  [1] "orig.ident"              "nCount_RNA"             
#>  [3] "nFeature_RNA"            "S_score"                
#>  [5] "G2M_score"               "nCount_spliced"         
#>  [7] "nFeature_spliced"        "nCount_unspliced"       
#>  [9] "nFeature_unspliced"      "CellType"               
#> [11] "SubCellType"             "Phase"                  
#> [13] "Standardpca_SNN_res.0.6" "ident"                  
#> [15] "Standardpcaclusters"     "Standardclusters"
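
The naming pattern visible in this output can be written down as a few small base R helpers. These functions are hypothetical and exist only to make the convention explicit; they are not part of SCP.

```r
# Hypothetical helpers that mirror the names seen in the output above.
reduction_name <- function(prefix, nonlinear, ndim = 2, linear = "") {
  # prefix + linear method (lowercase) + nonlinear method (uppercase) + "2D"/"3D"
  paste0(prefix, linear, toupper(nonlinear), ndim, "D")
}
cluster_name <- function(prefix, linear = "") {
  paste0(prefix, linear, "clusters")
}
graph_name <- function(prefix, linear, type = "SNN") {
  paste0(prefix, linear, "_", type)
}

reduction_name("Standard", "umap", 2, linear = "pca")  # "StandardpcaUMAP2D"
reduction_name("Standard", "umap", 3)                  # "StandardUMAP3D"
cluster_name("Standard")                               # "Standardclusters"
cluster_name("Standard", "pca")                        # "Standardpcaclusters"
graph_name("Standard", "pca")                          # "Standardpca_SNN"
```

Composing names this way is handy when looping over several methods, as the plotting code later in the article does with paste0.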

In addition, CellDimPlot by default plots the reduction returned by DefaultReduction, which is updated each time Standard_SCP runs.

names(pancreas_sub@reductions)
#> [1] "PCA"               "UMAP"              "Standardpca"      
#> [4] "StandardpcaUMAP2D" "StandardpcaUMAP3D" "StandardUMAP2D"   
#> [7] "StandardUMAP3D"
DefaultReduction(pancreas_sub)
#> [1] "StandardUMAP2D"

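You can also switch the assay and change the prefix as needed, so that earlier results are not overwritten. Note that specifying an assay changes the default assay of the Seurat object. Because the following steps continue to use RNA rather than unspliced, it needs to be changed back: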

pancreas_sub <- Standard_SCP(srt = pancreas_sub, assay = "unspliced", prefix = "unspliced")
#> [2023-10-27 06:36:13] Start Standard_SCP
#> [2023-10-27 06:36:13] Checking srtList... ...
#> Data 1/1 of the srtList is raw_counts. Perform NormalizeData(LogNormalize) on the data ...
#> Perform FindVariableFeatures on the data 1/1 of the srtList...
#> Use the separate HVF from srtList...
#> [2023-10-27 06:36:14] Finished checking.
#> [2023-10-27 06:36:14] Perform ScaleData on the data...
#> [2023-10-27 06:36:14] Perform linear dimension reduction (pca) on the data...
#> [2023-10-27 06:36:15] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:36:15] Reorder clusters...
#> [2023-10-27 06:36:15] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:36:27] Standard_SCP done
#> Elapsed time: 14.59 secs
DefaultAssay(pancreas_sub)
#> [1] "unspliced"
DefaultAssay(pancreas_sub) <- "RNA"
CellDimPlot(pancreas_sub, group.by = c("SubCellType", "unsplicedclusters"))

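The number of linear dimensions to use is often adjusted manually during analysis. For example, compute 50 PCs and use the first 30 for nonlinear dimensionality reduction and clustering: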

pancreas_sub <- Standard_SCP(
  srt = pancreas_sub, prefix = "PC30",
  linear_reduction = "pca",
  linear_reduction_dims = 50,
  linear_reduction_dims_use = 1:30
)
#> [2023-10-27 06:36:29] Start Standard_SCP
#> [2023-10-27 06:36:29] Checking srtList... ...
#> Data 1/1 of the srtList has been log-normalized.
#> Perform FindVariableFeatures on the data 1/1 of the srtList...
#> Use the separate HVF from srtList...
#> [2023-10-27 06:36:30] Finished checking.
#> [2023-10-27 06:36:30] Perform ScaleData on the data...
#> [2023-10-27 06:36:31] Perform linear dimension reduction (pca) on the data...
#> [2023-10-27 06:36:31] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:36:32] Reorder clusters...
#> [2023-10-27 06:36:32] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:36:40] Standard_SCP done
#> Elapsed time: 10.53 secs
CellDimPlot(pancreas_sub, group.by = c("SubCellType", "PC30clusters"))
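
The difference between the number of PCs computed and the PCs actually used can be illustrated with base R alone. The sketch below is an assumption-laden toy: the random matrix stands in for a scaled expression matrix, and the object names are hypothetical. It shows the idea behind linear_reduction_dims and linear_reduction_dims_use, not how SCP implements them.

```r
set.seed(11)
# Toy stand-in for a scaled expression matrix: 200 cells x 100 features
expr <- matrix(rnorm(200 * 100), nrow = 200)

# Compute 50 principal components (analogous to linear_reduction_dims = 50)
pca <- prcomp(expr, rank. = 50)
all_pcs <- pca$x

# Keep only the first 30 for downstream steps
# (analogous to linear_reduction_dims_use = 1:30)
used_pcs <- all_pcs[, 1:30]

# Downstream steps such as neighbor search work on the reduced matrix
cell_dist <- dist(used_pcs)

dim(all_pcs)
dim(used_pcs)
```

All 50 components stay available in the stored reduction, so a different subset can be tried later without recomputing the PCA.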

If the Seurat object already contains a linear dimensionality reduction result, you can specify it to skip that part of the computation:

pancreas_sub <- Standard_SCP(
  srt = pancreas_sub, prefix = "SKIP",
  linear_reduction = "Standardpca"
)
#> [2023-10-27 06:36:41] Start Standard_SCP
#> [2023-10-27 06:36:41] Checking srtList... ...
#> Data 1/1 of the srtList has been log-normalized.
#> Perform FindVariableFeatures on the data 1/1 of the srtList...
#> Use the separate HVF from srtList...
#> [2023-10-27 06:36:42] Finished checking.
#> [2023-10-27 06:36:42] Perform ScaleData on the data...
#> [2023-10-27 06:36:42] Perform linear dimension reduction (Standardpca) on the data...
#> [2023-10-27 06:36:43] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:36:43] Reorder clusters...
#> [2023-10-27 06:36:43] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:36:54] Standard_SCP done
#> Elapsed time: 13.11 secs
CellDimPlot(pancreas_sub, group.by = c("SubCellType", "SKIPclusters"))

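The choice of linear and nonlinear dimensionality reduction methods directly affects the embedding and the cell clustering. Standard_SCP can run several method combinations in a single call. To avoid computing too many combinations, the examples below use the following two sets: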

1. Different linear dimensionality reduction methods + umap:

linear_reductions <- c("pca", "ica", "nmf", "mds", "glmpca")
pancreas_sub <- Standard_SCP(
  srt = pancreas_sub,
  linear_reduction = linear_reductions,
  nonlinear_reduction = "umap"
)
#> [2023-10-27 06:36:55] Start Standard_SCP
#> [2023-10-27 06:36:55] Checking srtList... ...
#> Data 1/1 of the srtList has been log-normalized.
#> Perform FindVariableFeatures on the data 1/1 of the srtList...
#> Use the separate HVF from srtList...
#> [2023-10-27 06:36:56] Finished checking.
#> [2023-10-27 06:36:56] Perform ScaleData on the data...
#> [2023-10-27 06:36:56] Perform linear dimension reduction (pca) on the data...
#> [2023-10-27 06:36:58] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:36:58] Reorder clusters...
#> [2023-10-27 06:36:58] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:37:06] Perform linear dimension reduction (ica) on the data...
#> [2023-10-27 06:37:09] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:37:09] Reorder clusters...
#> [2023-10-27 06:37:09] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:37:16] Perform linear dimension reduction (nmf) on the data...
#> [2023-10-27 06:37:30] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:37:30] Reorder clusters...
#> [2023-10-27 06:37:31] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:37:39] Perform linear dimension reduction (mds) on the data...
#> [2023-10-27 06:37:42] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:37:42] Reorder clusters...
#> [2023-10-27 06:37:43] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:37:57] Perform linear dimension reduction (glmpca) on the data...
#> [2023-10-27 06:40:20] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:40:20] Reorder clusters...
#> [2023-10-27 06:40:21] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:40:32] Standard_SCP done
#> Elapsed time: 3.61 mins
plist1 <- lapply(linear_reductions, function(lr) {
  CellDimPlot(pancreas_sub,
    group.by = "SubCellType",
    reduction = paste0("Standard", lr, "UMAP2D"),
    xlab = "", ylab = "", title = lr,
    legend.position = "none",
    theme_use = "theme_blank"
  )
})
patchwork::wrap_plots(plotlist = plist1)

2. pca + different nonlinear dimensionality reduction methods:

nonlinear_reductions <- c("umap", "tsne", "dm", "phate", "pacmap", "trimap", "largevis", "fr")
pancreas_sub <- Standard_SCP(
  srt = pancreas_sub,
  linear_reduction = "pca",
  nonlinear_reduction = nonlinear_reductions
)
#> [2023-10-27 06:40:33] Start Standard_SCP
#> [2023-10-27 06:40:33] Checking srtList... ...
#> Data 1/1 of the srtList has been log-normalized.
#> Perform FindVariableFeatures on the data 1/1 of the srtList...
#> Use the separate HVF from srtList...
#> [2023-10-27 06:40:35] Finished checking.
#> [2023-10-27 06:40:35] Perform ScaleData on the data...
#> [2023-10-27 06:40:35] Perform linear dimension reduction (pca) on the data...
#> [2023-10-27 06:40:37] Perform FindClusters (louvain) on the data...
#> [2023-10-27 06:40:37] Reorder clusters...
#> [2023-10-27 06:40:38] Perform nonlinear dimension reduction (umap) on the data...
#> [2023-10-27 06:40:51] Perform nonlinear dimension reduction (tsne) on the data...
#> [2023-10-27 06:41:36] Perform nonlinear dimension reduction (dm) on the data...
#> [2023-10-27 06:41:38] Perform nonlinear dimension reduction (phate) on the data...
#> [2023-10-27 06:42:06] Perform nonlinear dimension reduction (pacmap) on the data...
#> [2023-10-27 06:42:23] Perform nonlinear dimension reduction (trimap) on the data...
#> [2023-10-27 06:42:49] Perform nonlinear dimension reduction (largevis) on the data...
#> [2023-10-27 06:47:58] Perform nonlinear dimension reduction (fr) on the data...
#> [2023-10-27 06:48:02] Standard_SCP done
#> Elapsed time: 7.49 mins
plist2 <- lapply(nonlinear_reductions, function(nr) {
  CellDimPlot(pancreas_sub,
    group.by = "SubCellType",
    reduction = paste0("Standardpca", toupper(nr), "2D"),
    xlab = "", ylab = "", title = nr,
    legend.position = "none",
    theme_use = "theme_blank"
  )
})
patchwork::wrap_plots(plotlist = plist2)

References

[1]

Standard_SCP function documentation: https://zhanghao-njmu.github.io/SCP/reference/Standard_SCP.html

Originally published on 2023-10-27 via the WeChat official account 单细胞天地.
