Seurat3 Tutorial: Custom Dimensionality Reduction with MDS

生信技能树jimmy
Published 2020-07-01 14:53:52
Included in the column: 单细胞天地

Sharing is an attitude.

Seurat - Dimensional Reduction Vignette

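As we know, single-cell transcriptome data are characteristically sparse and high-dimensional. Seurat therefore provides several dimensionality reduction methods, mainly PCA, t-SNE, and UMAP, although many more dimensionality reduction methods exist.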

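So what should we do if we want to apply a different dimensionality reduction method to our data? Today we walk through multidimensional scaling (MDS) on a Seurat object. Requiring that the distances between samples in the original space be preserved in the low-dimensional space leads to multidimensional scaling (MDS). Using MDS as an example, we look at the general procedure for adding a dimensionality reduction and take a closer look at Seurat's data structures.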
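Here is the classical (metric) MDS that R's cmdscale performs, which this tutorial uses later. It starts from the n × n matrix of pairwise distances d_ij between cells and produces k-dimensional coordinates X_k. V_k holds the eigenvectors that belong to the k largest eigenvalues Λ_k of the double-centered matrix B:

```latex
\begin{aligned}
D^{(2)}_{ij} &= d_{ij}^{2}, \qquad J = I_n - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top},\\
B &= -\tfrac{1}{2}\, J\, D^{(2)}\, J = V \Lambda V^{\top}, \qquad \lambda_1 \ge \lambda_2 \ge \dots,\\
X_k &= V_k\, \Lambda_k^{1/2} \in \mathbb{R}^{n \times k}.
\end{aligned}
```

When the d_ij are Euclidean distances between centered data points, B is exactly the Gram matrix of the data. In that case X_k coincides with the first k principal component scores, up to the sign of each axis.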

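What? You haven't figured out PCA, t-SNE, and UMAP yet? What does MDS even mean? Have a look at 运来哥's earlier notes:

Quantitative Ecology Notes || Unconstrained Ordination | NMDS (数量生态学笔记||非约束排序|NMDS)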

Dimensional reduction structure in Seurat3

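In Seurat v3.0, storing and interacting with dimensional reduction information has been generalized and formalized into the DimReduc object. Each dimensional reduction procedure is stored as a DimReduc object in the object@reductions slot, as an element of a named list. You access a reduction with the [[ operator and the name of the reduction you want. For example, after running a principal component analysis with RunPCA, object[['pca']] will contain the results of the PCA. Users can add extra, custom dimensional reductions by adding new elements to this list. Each stored dimensional reduction contains the following slots: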

  • cell.embeddings: stores the coordinates of each cell in low-dimensional space.
  • feature.loadings: stores the weight of each feature along each dimension of the embedding.
  • feature.loadings.projected: Seurat typically calculates the dimensional reduction on a subset of genes (for example, high-variance genes) and then projects that structure onto the entire dataset (all genes). The results of that projection (calculated with ProjectDim) are stored in this slot. Note that the cell embeddings remain unchanged after projection, but there are now feature loadings for all features.
  • stdev: the standard deviation of each dimension. It is most often used with PCA, where it stores the square roots of the eigenvalues of the covariance matrix. It is useful for seeing how the amount of variance explained drops off with each successive dimension.
  • key: sets the column names of the cell.embeddings and feature.loadings matrices. For example, for PCA the column names are PC_1, PC_2, etc., so the key is "PC_".
  • jackstraw: stores the results of the jackstraw procedure run using this dimensional reduction technique. Currently supported only for PCA.
  • misc: a bonus slot for storing any other information you might want.
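A quick way to see these slots on a real object is to inspect the PCA stored in pbmc_small, the example dataset that ships with Seurat. This sketch assumes Seurat v3 or later is installed. slotNames() is base R's S4 introspection, and Key() is Seurat's accessor for the key slot.

```r
library(Seurat)

# The PCA reduction stored in Seurat's bundled example object
pca <- pbmc_small[["pca"]]

# Names of the S4 slots that make up a DimReduc object
slotNames(pca)

# The key is the prefix of the embedding/loading column names
Key(pca)                        # "PC_"
colnames(Embeddings(pca))[1:3]  # "PC_1" "PC_2" "PC_3"

# Embeddings are cells x dimensions; loadings are features x dimensions
dim(Embeddings(pca))
dim(Loadings(pca))
```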

To access these slots, Seurat provides the Embeddings, Loadings, and Stdev functions. Printing a reduction gives a short summary:

```r
library(Seurat)
pbmc_small[["pca"]]

A dimensional reduction object with key PC_ 
 Number of dimensions: 19 
 Projected dimensional reduction calculated:  TRUE 
 Jackstraw run: TRUE 
 Computed using assay: RNA
```

Let's inspect the slots with the corresponding accessor functions:

```r
> head(Embeddings(pbmc_small, reduction = "pca")[, 1:5])  # cell coordinates in PCA space
                      PC_1       PC_2       PC_3      PC_4       PC_5
ATGCCAGAACGACT -0.77403708 -0.8996461 -0.2493078 0.5585948  0.4650838
CATGGCCTGTGCAT -0.02602702 -0.3466795  0.6651668 0.4182900  0.5853204
GAACCTGATGAACC -0.45650250  0.1795811  1.3175907 2.0137210 -0.4818851
TGACTGGATTCTCA -0.81163243 -1.3795340 -1.0019320 0.1390503 -1.5982232
AGTCAGACTGCACA -0.77403708 -0.8996461 -0.2493078 0.5585948  0.4650838
TCTGATACACGTGT -0.77403708 -0.8996461 -0.2493078 0.5585948  0.4650838
> head(Loadings(pbmc_small, reduction = "pca")[, 1:5])  # gene loadings on each principal component
              PC_1        PC_2        PC_3        PC_4         PC_5
PPBP    0.33832535  0.04095778  0.02926261  0.03111034 -0.090420744
IGLL5  -0.03504289  0.05815335 -0.29906272  0.54744454  0.214603428
VDAC3   0.11990482 -0.10994433 -0.02386025  0.06015126 -0.809207588
CD1C   -0.04690284  0.19835522 -0.35090617 -0.51112169 -0.130306281
AKR1C3 -0.03894635 -0.42880452  0.08845847 -0.27274386  0.087791646
PF4     0.34392057  0.02474860 -0.02519515 -0.01231411 -0.006725932
> head(Stdev(pbmc_small, reduction = "pca"))  # standard deviation of each PC
[1] 2.7868782 1.6145733 1.3162945 1.1241143 1.0347596 0.9876531
```
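The stdev slot is what makes "variance drop-off" plots possible. As a sketch (assuming Seurat is installed), you can square the stored standard deviations to get each PC's variance and normalize them. The fractions are relative to the computed PCs only, not to the total variance in the data. Seurat's ElbowPlot() draws the standard deviations directly.

```r
library(Seurat)

# Standard deviation of each stored principal component
sdev <- Stdev(pbmc_small, reduction = "pca")

# Variance of each PC, and its share of the variance captured by the stored PCs
# (relative to the computed PCs only, not to the total variance of the data)
pc.var <- sdev^2
pc.frac <- pc.var / sum(pc.var)
round(head(pc.frac), 3)
```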

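Seurat provides RunPCA (pca) and RunTSNE (tsne), which implement dimensional reduction techniques commonly applied to scRNA-seq data. When these functions are used, all slots are filled automatically.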

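Users can also add the results of a custom dimensional reduction technique computed separately, for example multidimensional scaling (MDS) or zero-inflated factor analysis. All you need is a matrix containing the coordinates of each cell in the low-dimensional space, as shown below.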
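The required layout is one row per cell, named with the same barcodes as the object, and one column per dimension, named key + index. The sketch below shows that layout in base R on a small simulated matrix, not on pbmc_small. The names expr, coords, and key are illustrative only.

```r
set.seed(1)
# Toy expression matrix: 50 genes (rows) x 10 cells (columns), cell barcodes as column names
expr <- matrix(rnorm(50 * 10), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("cell", 1:10)))

# Any method producing one row of coordinates per cell will do; here a 2-D classical MDS
coords <- cmdscale(dist(t(expr)), k = 2)

# Required layout: rows = cells (same names as in the object), columns = dimensions
# named <key><index>, e.g. "MDS_1", "MDS_2" for the key "MDS_"
key <- "MDS_"
colnames(coords) <- paste0(key, seq_len(ncol(coords)))
stopifnot(identical(rownames(coords), colnames(expr)))
head(coords, 3)
```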

Storing a custom dimensional reduction calculation

Classical (Metric) Multidimensional Scaling, as described in the documentation of R's cmdscale: classical multidimensional scaling (MDS) of a data matrix, also known as principal coordinates analysis (Gower, 1966).
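To make "principal coordinates analysis" concrete, here is a from-scratch sketch of classical MDS in base R. It squares the distances, double-centers them, takes the top eigenvectors, and scales them by the square roots of their eigenvalues. It is compared against cmdscale on random data. The function name classical_mds is our own, not part of any package.

```r
# Classical MDS (principal coordinates analysis) from a distance matrix.
# classical_mds is a hypothetical helper written for illustration.
classical_mds <- function(d, k = 2) {
  D2 <- as.matrix(d)^2                      # squared distances
  n <- nrow(D2)
  J <- diag(n) - matrix(1 / n, n, n)        # centering matrix
  B <- -0.5 * J %*% D2 %*% J                # double-centered Gram matrix
  e <- eigen(B, symmetric = TRUE)           # eigenvalues in decreasing order
  lambda <- pmax(e$values[seq_len(k)], 0)   # guard against tiny negative eigenvalues
  X <- e$vectors[, seq_len(k), drop = FALSE] %*% diag(sqrt(lambda), k)
  rownames(X) <- rownames(D2)
  X
}

set.seed(42)
x <- matrix(rnorm(30 * 5), nrow = 30)       # 30 points in 5 dimensions
d <- dist(x)
mine <- classical_mds(d, k = 2)
ref  <- cmdscale(d, k = 2)

# Eigenvectors are only defined up to sign, so compare absolute values
all.equal(abs(unname(mine)), abs(unname(ref)))
```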

MDS is not part of the Seurat package, but it is easy to run in R. If you want to run MDS and store the output in your Seurat object:

```r
# Before running MDS, we first calculate a distance matrix between all pairs of cells. Here we
# use a simple Euclidean distance metric on all genes, using scale.data as input
d <- dist(t(GetAssayData(pbmc_small, slot = "scale.data")))
# Run the MDS procedure; k determines the number of dimensions
mds <- cmdscale(d = d, k = 2)

head(mds)
                     [,1]       [,2]
ATGCCAGAACGACT 0.77403708 -0.8996461
CATGGCCTGTGCAT 0.02602702 -0.3466795
GAACCTGATGAACC 0.45650250  0.1795811
TGACTGGATTCTCA 0.81163243 -1.3795340
AGTCAGACTGCACA 0.77403708 -0.8996461
TCTGATACACGTGT 0.77403708 -0.8996461
```
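These coordinates match PC_1 and PC_2 from the Embeddings() output earlier, with the sign of the first axis flipped. This is consistent with a general fact: classical MDS on Euclidean distances between centered data (scale.data is centered per gene) reproduces principal component scores up to sign. The base-R check below uses simulated data rather than pbmc_small.

```r
set.seed(7)
# Simulated "scale.data": 20 genes x 40 cells, centered and scaled per gene
# (scale() works on columns, so transpose, scale, and transpose back)
scaled <- t(scale(t(matrix(rnorm(20 * 40), nrow = 20))))

# MDS on Euclidean distances between cells (cells are columns, so transpose)
mds <- cmdscale(dist(t(scaled)), k = 2)

# PCA scores of the same cells; the data are already centered, so no extra centering
pca <- prcomp(t(scaled), center = FALSE)$x[, 1:2]

# Identical up to the sign of each axis
all.equal(abs(unname(mds)), abs(unname(pca)))
cor(mds[, 1], pca[, 1])   # +1 or -1
```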
```r
# cmdscale returns the cell embeddings; we first label the columns to ensure downstream
# consistency
colnames(mds) <- paste0("MDS_", 1:2)
# We now store this as a custom dimensional reduction called 'mds'
pbmc_small[["mds"]] <- CreateDimReducObject(embeddings = mds, key = "MDS_", assay = DefaultAssay(pbmc_small))

pbmc_small
An object of class Seurat 
230 features across 80 samples within 1 assay 
Active assay: RNA (230 features)
 3 dimensional reductions calculated: pca, tsne, mds
```
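The mds reduction above stores only embeddings, so its stdev slot is empty. If you also want per-dimension standard deviations, one option is to request the eigenvalues from cmdscale with eig = TRUE. For classical MDS on Euclidean distances, each coordinate column has mean zero and a sum of squares equal to its eigenvalue. The standard deviation is therefore sqrt(λ / (n − 1)), which as far as we can tell is the same convention Seurat uses for PCA. The vector could then be passed through the stdev argument of CreateDimReducObject. The sketch below only checks the arithmetic on simulated data.

```r
set.seed(3)
# Simulated centered data: 60 cells x 15 genes
x <- scale(matrix(rnorm(60 * 15), nrow = 60), center = TRUE, scale = FALSE)
n <- nrow(x)

fit <- cmdscale(dist(x), k = 2, eig = TRUE)   # eig = TRUE also returns the eigenvalues

# Standard deviation of each MDS axis derived from its eigenvalue
mds.stdev <- sqrt(fit$eig[1:2] / (n - 1))

# ...which matches the empirical standard deviation of the coordinates
all.equal(mds.stdev, unname(apply(fit$points, 2, sd)))

# With the tutorial's mds matrix (columns named MDS_1, MDS_2) and a stdev vector
# computed the same way, one could then write, e.g.:
# pbmc_small[["mds"]] <- CreateDimReducObject(embeddings = mds, stdev = mds.stdev,
#                                             key = "MDS_", assay = DefaultAssay(pbmc_small))
```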

Our object now contains an mds reduction, so we can visualize it just as we would pca, tsne, or umap:

```r
# We can now use this as you would any other dimensional reduction in all downstream functions
DimPlot(pbmc_small, reduction = "mds", pt.size = 0.5)
```
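MDS aims to preserve pairwise distances, so it can be worth checking how much of the original distance structure survives in two dimensions before reading too much into the plot. One simple check is the correlation between original and embedded distances. It is sketched here on simulated data; the tutorial's d and mds could be substituted. With as many dimensions as the rank of the data, classical MDS reproduces Euclidean distances exactly.

```r
set.seed(11)
x <- matrix(rnorm(40 * 6), nrow = 40)   # 40 points in 6 dimensions
d <- dist(x)

# Correlation between original distances and distances in a 2-D embedding
mds2 <- cmdscale(d, k = 2)
cor(as.vector(d), as.vector(dist(mds2)))

# With k equal to the rank of the (centered) data, distances are recovered exactly
mds6 <- cmdscale(d, k = 6)
max(abs(as.vector(d) - as.vector(dist(mds6))))   # numerically close to 0
```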
```r
# Project the MDS structure onto all genes in scale.data (fills feature.loadings.projected).
# ProjectDim prints dimensions 1:5 by default (dims.print), hence the warning for a 2-D reduction.
pbmc_small <- ProjectDim(pbmc_small, reduction = "mds")
MDS_ 1 
Positive:  HLA-DPB1, HLA-DQA1, S100A9, S100A8, GNLY, RP11-290F20.3, CD1C, AKR1C3, IGLL5, VDAC3 
       PARVB, RUFY1, PGRMC1, MYL9, TREML1, CA2, TUBB1, PPBP, PF4, SDPR 
Negative:  SDPR, PF4, PPBP, TUBB1, CA2, TREML1, MYL9, PGRMC1, RUFY1, PARVB 
       VDAC3, IGLL5, AKR1C3, CD1C, RP11-290F20.3, GNLY, S100A8, S100A9, HLA-DQA1, HLA-DPB1 
MDS_ 2 
Positive:  HLA-DPB1, HLA-DQA1, S100A8, S100A9, CD1C, RP11-290F20.3, PARVB, IGLL5, MYL9, SDPR 
       PPBP, CA2, RUFY1, TREML1, PF4, TUBB1, PGRMC1, VDAC3, AKR1C3, GNLY 
Negative:  GNLY, AKR1C3, VDAC3, PGRMC1, TUBB1, PF4, TREML1, RUFY1, CA2, PPBP 
       SDPR, MYL9, IGLL5, PARVB, RP11-290F20.3, CD1C, S100A9, S100A8, HLA-DQA1, HLA-DPB1 
Warning message:
In print.DimReduc(x = redeuc, dims = dims.print, nfeatures = nfeatures.print,  :
  Only 2 dimensions have been computed.
```
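As far as we can tell from the Seurat v3 source, ProjectDim computes the projected loadings by multiplying the scale.data matrix (genes × cells) by the cell embeddings (cells × dimensions), with no extra centering by default (do.center = FALSE). The top positive and negative genes printed above are the extremes of each resulting column. Below is a base-R sketch of that computation on simulated data. The gene and cell names are made up, and this is not Seurat's own code.

```r
set.seed(5)
genes <- paste0("gene", 1:30)
cells <- paste0("cell", 1:25)
# Simulated scale.data: genes x cells, centered and scaled per gene
scale.data <- t(scale(t(matrix(rnorm(30 * 25), nrow = 30, dimnames = list(genes, cells)))))

# Cell embeddings from classical MDS (cells x dimensions)
emb <- cmdscale(dist(t(scale.data)), k = 2)
colnames(emb) <- paste0("MDS_", 1:2)

# Projected feature loadings: (genes x cells) %*% (cells x dims) = genes x dims
proj <- scale.data %*% emb

# Genes with the most positive / most negative projected loadings on MDS_1
head(names(sort(proj[, "MDS_1"], decreasing = TRUE)), 5)
head(names(sort(proj[, "MDS_1"])), 5)
```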
```r
# Display the results as a heatmap
DimHeatmap(pbmc_small, reduction = "mds", dims = 1, cells = 500, projected = TRUE, balanced = TRUE)
```
```r
VlnPlot(pbmc_small, features = "MDS_1")
```

Check how the MDS_1 dimension correlates with the PC_1 dimension:

```r
# See how the first MDS dimension is correlated with the first PC dimension
FeatureScatter(pbmc_small, feature1 = "MDS_1", feature2 = "PC_1")
```
```r
FeatureScatter(pbmc_small, feature1 = "MDS_1", feature2 = "tSNE_1")
```

References

[1] 数量生态学笔记||非约束排序|NMDS (Quantitative Ecology Notes || Unconstrained Ordination | NMDS): https://www.jianshu.com/p/39021ec7d1dd
[2] Seurat Dimensional Reduction Vignette: https://satijalab.org/seurat/v3.0/dim_reduction_vignette.html

This article is shared from the WeChat official account 单细胞天地. Originally published 2020-06-29.
