前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >整合单细胞和空转数据多种方法之R包semla

整合单细胞和空转数据多种方法之R包semla

作者头像
生信菜鸟团
发布2024-04-25 18:26:40
810
发布2024-04-25 18:26:40
举报
文章被收录于专栏:生信菜鸟团生信菜鸟团

我注意到这个R包也提供了一个整合单细胞和空转数据的算法NNLS(https://ludvigla.github.io/semla/articles/cell_type_mapping_with_NNLS.html)。因此,本文对这个算法进行测试。

一 . NNLS代码实战

Step0.R包安装

github在https://github.com/ludvigla/semla

官方教程在:https://ludvigla.github.io/semla/articles/cell_type_mapping_with_NNLS.html

安装:

代码语言:javascript
复制
remotes::install_github("ludvigla/semla")
Step1.加载示例数据
代码语言:javascript
复制
library(readr)
library(semla)
library(tibble)
library(dplyr)
library(Seurat)
library(purrr)
library(patchwork)
library(ggplot2)
#install.packages('RcppML')  
library(RcppML)
代码语言:javascript
复制
# 空转
brain_st_cortex <- read_rds("./Rawdata/brain_st_cortex.rds")

# 单细胞
brain_sc <- read_rds("./Rawdata/brain_sc.rds")

## Rename the cells/spots with syntactically valid names
brain_st_cortex <- RenameCells(brain_st_cortex, new.names=make.names(Cells(brain_st_cortex)))
brain_sc <- RenameCells(brain_sc, new.names=make.names(Cells(brain_sc)))

可视化空转数据:

代码语言:javascript
复制
## Visualize the ST data
SpatialDimPlot(brain_st_cortex)

image-20231105144510038

可视化单细胞数据:

代码语言:javascript
复制
## Visualize the scRNA-seq data
DimPlot(brain_sc, label = T, label.size = 4.5, group.by = "cell_type")

image-20231105144526583

Step2. 标准流程

这里,我们将对10x Visium数据和单细胞数据应用相同的对数归一化处理流程。这里将高变基因的数量设置得非常高,因为稍后我们将对单细胞数据中的高变基因与10x Visium数据中的高变基因取交集。

由于NNLS算法运行非常快,因此不需要像常规处理那样只取2000个高变基因。相反,我们可以使用在单细胞和10x Visium数据中存在交集的所有基因。

代码语言:javascript
复制
# Normalize data and find variable features for Visium data
brain_st_cortex <- brain_st_cortex |>
  NormalizeData() |>
  FindVariableFeatures(nfeatures = 10000)

# Normalize data and run vanilla analysis to create UMAP embedding
brain_sc <- brain_sc |>
  NormalizeData() |>
  FindVariableFeatures() |> 
  ScaleData() |> 
  RunPCA() |> 
  RunUMAP(reduction = "pca", dims = 1:30)

# Rerun FindVariableFeatures to increase the number before cell type deconvolution
brain_sc <- brain_sc |> 
  FindVariableFeatures(nfeatures = 10000)
Step3. Run NNLS

使用RunNNLS()函数,输入单细胞数据和空转数据,指定单细胞细胞类型的标签名称,即可运行NNLS反卷积:

代码语言:javascript
复制
DefaultAssay(brain_st_cortex) <- "Spatial"

ti <- Sys.time()
brain_st_cortex <- RunNNLS(object = brain_st_cortex, 
                            singlecell_object = brain_sc, 
                            groups = "cell_type",
                           minCells_per_celltype = 10)
# ── Predicting cell type proportions ──
# 
# ℹ Fetching data from Seurat objects
# →   Filtering out features that are only present in one data set
# →   Kept 2961 features for deconvolution
# ℹ Preparing data for NNLS
# →   Downsampling scRNA-seq data to include a maximum of 50 cells per cell type
# →   Kept 15 cell types after filtering
# →   Calculating cell type expression profiles
# ℹ Predicting cell type proportions with NNLS for 15 cell types
# ℹ Returning results in a new 'Assay' named 'celltypeprops'
# ℹ Setting default assay to 'celltypeprops'
# ✔ Finished
sprintf("RunNNLS completed in %s seconds", round(Sys.time() - ti, digits = 2))
#"RunNNLS completed in 0.91 seconds"

1秒钟不到就运行结束了。

注意:这个函数默认会过滤掉小于10个细胞的细胞类型,我们可以指定minCells_per_celltype = 参数进行设置。

代码语言:javascript
复制
# Check available cell types
rownames(brain_st_cortex)
# [1] "Astro"   "Endo"    "L2/3 IT" "L4"      "L5 IT"   "L5 PT"   "L6 CT"   "L6 IT"   "L6b"     "Lamp5"   "NP"      "Pvalb"  
# [13] "Sncg"    "Sst"     "Vip" 
Step4. 可视化
代码语言:javascript
复制
# Plot selected cell types
DefaultAssay(se_brain_spatial) <- "celltypeprops"

selected_celltypes <- c('L6 CT', 'L5 IT', 'L6b', 'L2/3 IT', 'L4', 'Astro')
plots <- lapply(seq_along(selected_celltypes), function(i) {
  SpatialFeaturePlot(brain_st_cortex,features = selected_celltypes[1]) +
    scale_fill_gradientn(colours = RColorBrewer::brewer.pal(n = 9, name = "Spectral") %>% rev())+
  plot_layout(guides = "collect") & 
  theme(legend.position = "right", legend.margin = margin(b = 50),
        legend.text = element_text(angle = 0),
        plot.title = element_blank())
  }) %>%  setNames(nm = selected_celltypes)

wrap_plots(plots, ncol=4)

image-20240419182522039

这里我们对比一下cell2location的结果【整合单细胞和空转数据多种方法之Cell2location】:

image-20231228230033246

就肉眼来看,结果还是挺一致的。

这里使用NNLS和cell2location的结果做一个基于spot的相关性分析:

代码语言:javascript
复制
library(ggpubr)
cell2locat_res = read.csv("./Rawdata/st_cell2location_res.csv", row.names = 1, check.names = F)
head(cell2locat_res)
table(row.names(cell2locat_res) == colnames(brain_st_cortex))
# TRUE 
# 1075 
  • 例如L6 CT细胞:
代码语言:javascript
复制
input.plot = data.frame(L6CT_cell2locat = cell2locat_res$`q05cell_abundance_w_sf_L6 CT`,
                        L6CT_NNSL = as.numeric(brain_st_cortex@assays$celltypeprops["L6 CT",]*100))

ggscatter(input.plot, cor.method = "p",
          x = "L6CT_cell2locat", y = "L6CT_NNSL",
          title = "L6 CT cells",
          add = "reg.line", conf.int = TRUE, 
          add.params = list(fill = "lightgray"))+
  stat_cor(method = "p",
           label.sep = "\n",
           label.y.npc = "top",
           label.x = 10, 
           label.y = 1)

image-20240419203539900

  • L2/3 IT细胞:
代码语言:javascript
复制
input.plot = data.frame(L23ICT_cell2locat = cell2locat_res$`q05cell_abundance_w_sf_L2/3 IT`,
                        L23ICT_NNSL = as.numeric(brain_st_cortex@assays$celltypeprops["L2/3 IT",]* 100))

ggscatter(input.plot, cor.method = "p",
          x = "L23ICT_cell2locat", y = "L23ICT_NNSL",
          title = "L2/3 IT cells",
          add = "reg.line", conf.int = TRUE, 
          add.params = list(fill = "lightgray"))+
  stat_cor(method = "p",
           label.sep = "\n",
           label.y.npc = "top",
           label.x = 20, 
           label.y = 2)

image-20240419203624188

  • L4细胞:
代码语言:javascript
复制
input.plot = data.frame(L4_cell2locat = cell2locat_res$q05cell_abundance_w_sf_L4,
                        L4_NNSL = as.numeric(brain_st_cortex@assays$celltypeprops["L4",]*100))

ggscatter(input.plot, cor.method = "p",
          x = "L4_cell2locat", y = "L4_NNSL",
          title = "L4 cells",
          add = "reg.line", conf.int = TRUE, 
          add.params = list(fill = "lightgray"))+
  stat_cor(method = "p",
           label.sep = "\n",
           label.y.npc = "top",
           label.x = 16, 
           label.y = 1)

image-20240419203656828

  • .丰度较低的星形胶质细胞:
代码语言:javascript
复制
library(ggpubr)
input.plot = data.frame(Astro_cell2locat = cell2locat_res$q05cell_abundance_w_sf_Astro,
                        Astro_NNSL = as.numeric(brain_st_cortex@assays$celltypeprops["Astro",]*100))

ggscatter(input.plot, cor.method = "p",
          x = "Astro_cell2locat", y = "Astro_NNSL",
          title = "Astro cells",
          add = "reg.line", conf.int = TRUE, 
          add.params = list(fill = "lightgray"))+
  stat_cor(method = "p",
           label.sep = "\n",
           label.y.npc = "top",
           label.x = 7, 
           label.y = 1)

image-20240419203731108

尽管两种算法的相关性非常好,但是可以看到NNSL计算得到的每种细胞类型的结果区间(区间比较大)均大于cell2location算法的结果区间。

我们还可以使用semla的内置函数MapMultipleFeatures()在同一张slide里可视化多种细胞类型的分布情况(cell2location也有类似功能),但是需要先将seurat对象转换为semla需要的对象,这点我在【空转可视化R包semla(一)入门介绍】介绍过:

代码语言:javascript
复制
semla.data <- UpdateSeuratForSemla(brain_st_cortex)
# Plot multiple features
MapMultipleFeatures(semla.data, 
                    pt_size = 2, max_cutoff = 0.99,
                    override_plot_dims = TRUE, 
                    colors = c("#332288", "#88CCEE", "#44AA99", "#117733", "#DDCC77", "#CC6677"),
                    features = selected_celltypes) +
  plot_layout(guides = "collect")

image-20240419183205407

semla还提供了细胞类型的共定位分析可视化函数。通过计算不同细胞类型之间的成对相关性,我们可以了解哪些细胞类型经常出现在相同的spot上:

代码语言:javascript
复制
cor_matrix <- FetchData(semla.data, selected_celltypes) |> 
  mutate_all(~ if_else(.x<0.1, 0, .x)) |>  # Filter lowest values (-> set as 0)
  cor()

diag(cor_matrix) <- NA
max_val <- max(cor_matrix, na.rm = T)
cols <- RColorBrewer::brewer.pal(7, "RdYlBu") |> rev(); cols[4] <- "white"
pheatmap::pheatmap(cor_matrix, 
                   breaks = seq(-max_val, max_val, length.out = 100),
                   color=colorRampPalette(cols)(100),
                   cellwidth = 14, cellheight = 14, 
                   treeheight_col = 10, treeheight_row = 10, 
                   main = "Cell type correlation\nwithin spots")

image-20240419183405325

三. 总结

总的来说,NNLS解卷积算法运行速度非常快。R包semla还提供了一些额外的可视化方案,但NNLS解卷积算法的精确度尚待进一步探讨(特别是需要和其他解卷积算法进行benchmark分析)。

本文参与 腾讯云自媒体分享计划,分享自微信公众号。
原始发表:2024-04-19,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 生信菜鸟团 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 一 . NNLS代码实战
    • Step0.R包安装
      • Step1.加载示例数据
        • Step2. 标准流程
          • Step3. Run NNLS
            • Step4. 可视化
            • 三. 总结
            领券
            问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档