
How to Determine the Number of PCs for Cell Clustering

生信技能树jimmy
Published 2020-03-30 14:33:30
Included in the column: 单细胞天地

Author | 柠檬不酸, editor at 单细胞天地

Preparation

How the official Seurat tutorial determines the number of PCs (https://satijalab.org/seurat/v3.1/pbmc3k_tutorial.html)

library(Seurat)

load(file = 'Cluster_seurat.Rdata')  # loads the object data.filt into the workspace
seurat_data <- data.filt

Method 1: the DimHeatmap function

# Explore heatmap of PCs
DimHeatmap(seurat_data, dims = 1:6, cells = 500, balanced = TRUE)
DimHeatmap(seurat_data, dims = 7:12, cells = 500, balanced = TRUE)

Method 2: the ElbowPlot function

# Elbow plot: standard deviation of each PC against its rank
ElbowPlot(object = seurat_data, ndims = 30)

Method 3: the JackStrawPlot function

# This step is slow: JackStraw repeatedly permutes a subset of the data and reruns PCA
seurat_data <- JackStraw(object = seurat_data, dims = 50)
seurat_data <- ScoreJackStraw(seurat_data, dims = 1:50)
JackStrawPlot(object = seurat_data, dims = 1:50)

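The three methods above only give a rough range for the number of PCs. Because clustering results differ considerably depending on how many PCs are chosen, a more specific number is needed. The author proposes three criteria for setting the PC cutoff: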

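  • The cumulative contribution of the PCs exceeds 90%
  • The PC itself contributes less than 5% of the variation
  • The difference between two consecutive PCs is less than 0.1%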
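Read against the code in this post, the three criteria can be written out a little more precisely. The notation below is introduced here for illustration and does not appear in the original: $\sigma_i$ is the standard deviation of PC $i$, $p_i$ its percentage share, and $c_i$ the cumulative share. Note that the post's code computes the shares from standard deviations rather than variances, and that the third criterion is applied as "the PC after the last drop larger than 0.1".

```latex
\begin{align}
p_i &= 100 \cdot \frac{\sigma_i}{\sum_{j} \sigma_j}, &
c_i &= \sum_{j \le i} p_j, \\
co_1 &= \min \{\, i : c_i > 90 \ \text{and}\ p_i < 5 \,\}, &
co_2 &= 1 + \max \{\, i : p_i - p_{i+1} > 0.1 \,\}, \\
\mathrm{pcs} &= \min(co_1, co_2).
\end{align}
```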
# Percentage share of each PC
# (note: computed from standard deviations, not variances)
pca_sd <- seurat_data[["pca"]]@stdev
pct <- pca_sd / sum(pca_sd) * 100

# Cumulative percentage across PCs
cumu <- cumsum(pct)

# First PC whose cumulative percentage exceeds 90% and whose own share is below 5%
co1 <- which(cumu > 90 & pct < 5)[1]
co1

# Last position where the drop between consecutive PCs exceeds 0.1 percentage points;
# the cutoff is the PC right after that drop
co2 <- sort(which((pct[1:(length(pct) - 1)] - pct[2:length(pct)]) > 0.1), decreasing = TRUE)[1] + 1
co2

# Use the smaller of the two cutoffs
pcs <- min(co1, co2)
pcs

# ggplot2 must be attached for the plot below
library(ggplot2)

# Data frame with the per-PC percentage, cumulative percentage and rank
plot_df <- data.frame(pct = pct, cumu = cumu, rank = 1:length(pct))

# Elbow plot: PCs beyond the chosen cutoff are drawn in a different color
ggplot(plot_df, aes(cumu, pct, label = rank, color = rank > pcs)) +
  geom_text() +
  geom_vline(xintercept = 90, color = "grey") +
  geom_hline(yintercept = min(pct[pct > 5]), color = "grey") +
  theme_bw()
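To try the cutoff logic without a Seurat object, the same steps can be wrapped in a small function and run on a toy vector. This is only a sketch: `choose_pcs`, its argument names, and the toy standard deviations are made up for illustration and are not part of Seurat or of the original post.

```r
# Hypothetical helper (not part of Seurat): applies the three criteria
# to a numeric vector of PC standard deviations.
choose_pcs <- function(stdev, cum_cutoff = 90, pct_cutoff = 5, diff_cutoff = 0.1) {
  stopifnot(is.numeric(stdev), length(stdev) >= 2)
  pct  <- stdev / sum(stdev) * 100   # percentage share of each PC
  cumu <- cumsum(pct)                # cumulative share

  # First PC with cumulative share above cum_cutoff and own share below pct_cutoff
  co1 <- which(cumu > cum_cutoff & pct < pct_cutoff)[1]

  # Last drop between consecutive PCs larger than diff_cutoff, plus one
  drops <- pct[-length(pct)] - pct[-1]
  idx   <- which(drops > diff_cutoff)
  co2   <- if (length(idx) > 0) max(idx) + 1 else NA_integer_

  list(co1 = co1, co2 = co2, pcs = min(co1, co2, na.rm = TRUE))
}

# Toy standard deviations (made-up values that sum to 100, so pct equals the input)
toy_sd <- c(30, 20, 12, 8, 6, 4, 4, 4, 4, 4, 4)
res <- choose_pcs(toy_sd)
print(res)  # co1 = 9, co2 = 6, pcs = 6
```

With a real object, the input would be the PCA standard deviations, for example `seurat_data[["pca"]]@stdev` as used in this post.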

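Inspect the highly variable genes associated with each PC. If known marker genes of a rare cell type show up in a particular PC, we can keep all PCs from 1 up to and including that PC.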

# Print the top genes driving each PC
print(x = seurat_data[["pca"]], dims = 1:25, nfeatures = 5)

Output:

PC_ 1 
Positive:  NEIL1, LTB, KLF2, TP53INP1, CD27 
Negative:  TYMS, MKI67, PCLAF, RRM2, NUSAP1 
PC_ 2 
Positive:  GZMA, ARL4C, PRF1, CST7, GZMM 
Negative:  SLC35E3, ID3, PRDX1, TOP2B, RPLP0 
PC_ 3 
Positive:  HBA2, HBB, HBA1, AHSP, HBD 
Negative:  RPS18, RPL18A, RPS2, RPSA, RPL37A 
PC_ 4 
Positive:  IGLL1, SLC35E3, PCDH9, CD38, F13A1 
Negative:  CCL17, HMBS, BLVRB, AQP1, CD36 
PC_ 5 
Positive:  GYPC, RPS18, RPS2, C1QTNF4, RPL18A 
Negative:  MNDA, LYZ, S100A9, S100A8, FCN1 
PC_ 6 
Positive:  PLK1, CDC20, CENPA, HMMR, CENPE 
Negative:  GINS2, MCM6, HELLS, MCM4, MCM3 
PC_ 7 
Positive:  GYPC, C1QTNF4, LIMS1, NRIP1, S100A9 
Negative:  SPIB, TAGLN2, MS4A1, IGLC6, PTPRC 
PC_ 8 
Positive:  FCGR3A, GZMB, SPON2, KLRF1, MYOM2 
Negative:  CCR7, CD3G, CD3D, IL7R, GPR183 
PC_ 9 
Positive:  CCL17, LTB, TMEM154, CCND2, HSPA12B 
Negative:  ACTG1, LGALS1, IGLL1, CCDC81, TOP2B 
PC_ 10 
Positive:  AHNAK, VIM, EMP1, LMNA, CD27 
Negative:  MT1X, CCL17, FTL, HSP90B1, NSMCE1 
PC_ 11 
Positive:  NEIL1, LTB, FTH1, CFD, CST3 
Negative:  LCN2, RETN, S100A8, LTF, CAMP 
PC_ 12 
Positive:  RPS12, RPLP1, RPL18A, EEF1B2, RPS5 
Negative:  HNRNPU, NCL, AHNAK, AC245060.5, EMP1 
PC_ 13 
Positive:  CD3D, TRAC, CD3G, IGLC6, CD27 
Negative:  MARCH1, MS4A1, BANK1, ADAM28, LINC02397 
PC_ 14 
Positive:  SCIMP, SRGN, GUSB, SHISA2, MARCH1 
Negative:  MS4A1, ZNF608, ENAM, CCND2, CCL17 
PC_ 15 
Positive:  ATF5, HSPA5, PSAT1, PHGDH, MARCH1 
Negative:  NT5E, GIMAP4, TP53INP1, SHISA2, DBI 
PC_ 16 
Positive:  ACSM3, IGLC6, SHISA2, REXO2, MT1X 
Negative:  CD82, GCHFR, PRDX1, UBASH3B, PTGDR 
PC_ 17 
Positive:  MARCKSL1, FTH1, S100A1, CRIP2, EMP2 
Negative:  HSP90B1, HSPA5, UBASH3B, PPIB, FKBP5 
PC_ 18 
Positive:  MARCH1, H3F3A, CALM2, ACTB, PRDX1 
Negative:  HSP90B1, ATF5, HSPA5, MT-ND6, CANX 
PC_ 19 
Positive:  TRGC2, LGALS1, KLRG1, CCL5, PTMS 
Negative:  CCR7, TXK, FCER1G, CD7, TCF7 
PC_ 20 
Positive:  PIM1, SOCS3, ADGRE5, RGCC, EPHA4 
Negative:  LRMP, BANK1, MS4A1, CLEC4E, NME1 
PC_ 21 
Positive:  CCR7, CMTM2, S100A11, LRMP, TXK 
Negative:  TRGC2, RPS12, KLRG1, LCN6, RPS18 
PC_ 22 
Positive:  CTGF, PMAIP1, FOS, KLF6, FOSB 
Negative:  FUT7, SLC9A3R2, LCN6, PPP1R14A, EMP3 
PC_ 23 
Positive:  ATF5, PSAT1, HSP90B1, PHGDH, HSPA5 
Negative:  CTHRC1, NSMCE1, MAP1A, IGLL1, BTNL9 
PC_ 24 
Positive:  SERINC2, LST1, NAMPT, MT1X, SLC25A37 
Negative:  SHISA2, DEPP1, GADD45A, PSTPIP2, CD33 
PC_ 25 
Positive:  CDKN1C, RHOB, BATF3, CX3CR1, SERPINA1 
Negative:  FOS, ALDH2, MGST1, MPO, FOSB 
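Scanning a printout like the one above for a marker gene by eye is error-prone, so the lookup can also be done programmatically. The sketch below assumes a genes-by-PCs loadings matrix; `pcs_with_marker` and the toy matrix are hypothetical and made up for illustration. With a Seurat object, such a matrix could be obtained with `Loadings(seurat_data, reduction = "pca")`.

```r
# Hypothetical helper (not part of Seurat): returns the indices of the PCs
# in which `gene` is among the top `nfeatures` positive or negative loadings.
pcs_with_marker <- function(loadings, gene, nfeatures = 5) {
  stopifnot(is.matrix(loadings), gene %in% rownames(loadings))
  hits <- vapply(seq_len(ncol(loadings)), function(j) {
    ord     <- order(loadings[, j], decreasing = TRUE)
    top_pos <- rownames(loadings)[head(ord, nfeatures)]
    top_neg <- rownames(loadings)[tail(ord, nfeatures)]
    gene %in% c(top_pos, top_neg)
  }, logical(1))
  which(hits)
}

# Toy loadings matrix with made-up values: 6 genes x 3 PCs (filled column by column)
toy <- matrix(
  c( 0.10,  0.90, -0.80,  0.20, -0.10,  0.00,   # PC_1
     0.05,  0.10,  0.70, -0.90,  0.20, -0.30,   # PC_2
     0.95, -0.20,  0.10,  0.00, -0.60,  0.30),  # PC_3
  nrow = 6,
  dimnames = list(c("HBB", "LYZ", "MS4A1", "CD3D", "GZMB", "MKI67"),
                  c("PC_1", "PC_2", "PC_3"))
)

hit_pcs <- pcs_with_marker(toy, "HBB", nfeatures = 1)
hit_pcs        # 3: HBB is the top positive gene of PC_3 only
max(hit_pcs)   # 3: keep PCs 1 through 3 so the marker's PC is included
```

Taking the maximum follows the rule stated in the text: keep every PC from 1 up to the last one in which the marker appears.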
Originally published 2019-11-01 on the 单细胞天地 WeChat official account. In case of infringement, contact cloudcommunity@tencent.com for removal.
