Title: Deep generative model embedding of single-cell RNA-Seq profiles on hyperspheres and hyperbolic spaces
Authors: Jiarui Ding & Aviv Regev (the corresponding author needs no introduction), Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Link: https://www.nature.com/articles/s41467-021-22851-4
Published in: Nature Communications
Date: 2021-05-05
Project repository: https://github.com/klarman-cell-observatory/scPhere
The paper defines the tool as follows:
scPhere, a scalable deep generative model to embed cells into low-dimensional hyperspherical or hyperbolic spaces to accurately represent scRNA-seq data
“small” datasets were: (1) a blood cell dataset with only 10 erythroid cell profiles and 2293 CD14+ monocytes; (2) 3314 human lung cells; (3) 1378 mouse white adipose tissue stromal cells; and (4) 1755 human splenic natural killer cells spanning four subtypes
“large” datasets were: (1) 35,699 retinal ganglion cells in 45 cell subsets; and (2) 599,926 cells spanning 102 subsets across 59 human tissues in the Human Cell Landscape
Test dataset: 301,749 cells we previously profiled in a complex experimental design from the colon mucosa of 18 patients with ulcerative colitis (UC), a major type of inflammatory bowel disease (IBD), and 12 healthy individuals
Besides patient identity as a batch factor, the design also includes:
individuals were either healthy or with UC
cells were collected separately from the epithelial and lamina propria fractions of each biopsy
two replicate biopsies for each healthy individual
samples were collected at two time periods, separated by over a year
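To make the idea of several simultaneous batch factors concrete, here is a minimal sketch of one way such factors could be encoded per cell: each factor is one-hot encoded and the encodings are concatenated into a single vector. This is an illustrative assumption, not scPhere's actual input format; the function names (`one_hot`, `encode_batch_factors`) and the toy metadata are hypothetical.

```python
def one_hot(values):
    """Return (categories, rows): one one-hot row per input value."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = [
        [1 if index[v] == j else 0 for j in range(len(categories))]
        for v in values
    ]
    return categories, rows


def encode_batch_factors(cells, factors):
    """Concatenate the one-hot encodings of several batch factors per cell."""
    encoded = [[] for _ in cells]
    for factor in factors:
        _, rows = one_hot([cell[factor] for cell in cells])
        for vec, row in zip(encoded, rows):
            vec.extend(row)
    return encoded


# Hypothetical toy metadata: three cells, three batch factors.
cells = [
    {"patient": "P1", "health": "healthy", "fraction": "epithelial"},
    {"patient": "P2", "health": "UC", "fraction": "lamina_propria"},
    {"patient": "P2", "health": "UC", "fraction": "epithelial"},
]
batch_vectors = encode_batch_factors(cells, ["patient", "health", "fraction"])
for vec in batch_vectors:
    print(vec)
```

A tool limited to a single batch factor would see only one of these blocks (for example, the patient columns), which is why the comparison below uses patient as the batch for such tools.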
Tools compared: Harmony, LIGER, and Seurat3 CCA (since the latter two can handle only a single batch factor, patient was used as the batch)
Conclusion: scPhere’s batch correction on this complex dataset (30 patients with disease and location factors) performed better than Harmony, Seurat3 CCA, and LIGER based on classification accuracies of cell types for stromal, epithelial, and immune cells
Panels n and o both show scPhere results in two forms, the embedding itself and its Equal Earth map projection, reducing the dimensionality of 300,000 stromal, epithelial, and immune cells. Disease status, disease type, and disease location were also included as batch information.
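For readers unfamiliar with the Equal Earth projection, the sketch below flattens points on a 3D sphere to a 2D map using the published Equal Earth polynomial (Šavrič, Patterson and Jenny, 2018). It is a standalone illustration, not scPhere's plotting code; the function names and the choice of the z axis as the "north pole" are assumptions.

```python
import math

# Polynomial coefficients of the Equal Earth projection.
A1, A2, A3, A4 = 1.340264, -0.081106, 0.000893, 0.003796


def sphere_to_lonlat(x, y, z):
    """Convert a 3D point to (longitude, latitude) in radians.

    The point is normalized first, so it need not have unit length.
    """
    r = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(y, x)
    lat = math.asin(max(-1.0, min(1.0, z / r)))
    return lon, lat


def equal_earth(lon, lat):
    """Project longitude/latitude (radians) to Equal Earth map coordinates."""
    theta = math.asin(math.sqrt(3) / 2 * math.sin(lat))
    t2 = theta * theta
    t6 = t2 ** 3
    t8 = t6 * t2
    denom = 3 * (9 * A4 * t8 + 7 * A3 * t6 + 3 * A2 * t2 + A1)
    px = 2 * math.sqrt(3) * lon * math.cos(theta) / denom
    py = theta * (A4 * t8 + A3 * t6 + A2 * t2 + A1)
    return px, py


# Hypothetical embedding coordinates for three cells on the unit sphere.
embedding = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 0.0)]
projected = [equal_earth(*sphere_to_lonlat(*p)) for p in embedding]
for point in projected:
    print(point)
```

Because the projection is equal-area, regions of the sphere keep their relative size on the map, which helps when comparing how much of the embedding each cell population occupies.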
when we quantify time continuity, by comparing the k-nearest neighbor time point classification accuracies, accuracies from scPhere (in 2D) were higher than those from t-SNE, UMAP, and PHATE
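As a rough illustration of the kind of metric quoted above, the sketch below computes a leave-one-out k-nearest-neighbor classification accuracy on a toy 2D embedding. The leave-one-out scheme, the Euclidean distance, the function name `knn_accuracy`, and the toy data are all assumptions made for illustration; the paper's exact evaluation protocol may differ.

```python
import math
from collections import Counter


def knn_accuracy(points, labels, k=3):
    """Leave-one-out k-nearest-neighbor classification accuracy.

    Each point is classified by majority vote among its k nearest
    neighbors (Euclidean distance), excluding the point itself.
    """
    correct = 0
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        votes = Counter(labels[j] for _, j in dists[:k])
        predicted = votes.most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / len(points)


# Hypothetical embedding: two well-separated time points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["day0", "day0", "day0", "day7", "day7", "day7"]
accuracy = knn_accuracy(points, labels, k=3)
print(accuracy)
```

A higher accuracy means that cells from the same time point (or the same cell type, for the batch-correction comparison) sit close together in the embedding.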
Summary
Key features:
Accounting for multilevel complex batch effects: ScPhere’s ability to handle complex batch factors is an advantage over previous methods for batch correction (e.g., SAUCIE, scVI, LIGER, Seurat3 CCA, fastMNN, Scanorama, and Conos), which handle only one batch vector.
Especially useful for analyzing large scRNA-seq datasets: does not suffer from “cell-crowding” even with large numbers of input cells; better preserves hierarchical, global structures; forms a reference to annotate newly profiled cells from future studies (this is very helpful for large projects, such as the Human Cell Atlas for healthy populations and the Human Tumor Atlas Network for disease populations, both of which need to build a reference map)
modifying it to spatially map cells
Future extensions:
include semi-supervised learning to annotate cell types
imputing missing counts in scRNA-seq data and removing ambient RNA contamination
integrative analysis of multimodal data (e.g. spatial transcriptomics, single-cell ATAC-seq)
learn discrete hierarchical trees for better interpreting developmental trajectories