一、细胞周期
A cell cycle is a series of events that takes place in a cell as it grows and divides.即描述细胞生长、分裂整个过程中细胞变化过程。最重要的两个特点就是DNA复制、分裂成两个一样的子细胞。
如下图,一般分成4个阶段
scran与Seurat包鉴定细胞周期的方法介绍与演示。在单细胞周期分析时,通常只考虑三个阶段:G1、S、G2M。(即把G2和M当做一个phase)
http://bioconductor.org/books/release/OSCA/cell-cycle-assignment.html#motivation-12
scRNAseq包的LunSpikeInData。选这个数据集是因为It is known to contain actively cycling cells after oncogene induction.sce对象的操作,具体之前OSCA的学习笔记,不再细述了)library(scRNAseq)
sce.416b <- LunSpikeInData(which="416b")
sce.416b$block <- factor(sce.416b$block)
#--- gene-annotation 基因ID转换---#
library(AnnotationHub)
ens.mm.v97 <- AnnotationHub()[["AH73905"]]
rowData(sce.416b)$ENSEMBL <- rownames(sce.416b)
rowData(sce.416b)$SYMBOL <- mapIds(ens.mm.v97, keys=rownames(sce.416b),
keytype="GENEID", column="SYMBOL")
#--- normalization 标准化---#
library(scater)
sce.416b <- logNormCounts(sce.416b)
#--- variance-modelling 挑选高变基因---#
dec.416b <- modelGeneVarWithSpikes(sce.416b, "ERCC", block=sce.416b$block)
chosen.hvgs <- getTopHVGs(dec.416b, prop=0.1)
#--- dimensionality-reduction PCA降维---#
sce.416b <- runPCA(sce.416b, ncomponents=10, subset_row=chosen.hvgs)
head(head(colData(sce.416b)))
relabel <- c("onco", "WT")[factor(sce.416b$phenotype)]
plotPCA(sce.416b, colour_by=I(relabel), shape_by=I(relabel))
WT代表wild type phenotype组;onco代表induced CBFB-MYH11 oncogene expression组
cyclone注释cell cyclehttp://bioconductor.org/packages/release/bioc/manuals/scran/man/scran.pdf
cyclone函数是利用‘marker基因对’表达来对细胞所在周期阶段进行预测的方法Scialdone (2015)library(scran)
# hs.pairs <- readRDS(system.file("exdata", "human_cycle_markers.rds", package="scran"))
mm.pairs <- readRDS(system.file("exdata", "mouse_cycle_markers.rds", package="scran"))
str(mm.pairs)
head(mm.pairs$G1)
ENSMUSG00000000001基因表达值是否大于ENSMUSG00000001785基因表达值。cyclone则是通过计算score,即对于某个细胞,符合上述比较关系的marker基因对数占全部marker基因对数的比值。For each cell, cyclone calculates the proportion of all marker pairs where the expression of the first gene is greater than the second in the new data x (pairs with the same expression are ignored). A high proportion suggests that the cell is likely to belong in G1 phase, as the expression ranking in the new data is consistent with that in the training data.
library(scran)
mm.pairs <- readRDS(system.file("exdata", "mouse_cycle_markers.rds", package="scran"))
assignments <- cyclone(sce.416b, pairs=mm.pairs)
#返回的是含有三个元素的list
head(assignments$scores)
注意一点就是默认提供marker基因对是ensemble格式。如果sce提供的是其它类基因ID需要转换一下。
plot(assignments$score$G1, assignments$score$G2M,
xlab="G1 score", ylab="G2/M score", pch=16)
abline(h = 0.5, col='red')
abline(v = 0.5, col='red')
cycle返回list的phases即根据上述规则判断的结果table(assignments$phases)
# G1 G2M S
#105 65 22
test <- t(assignments$scores)
colnames(test)=c(1:192)
library(pheatmap)
pheatmap(test,
show_colnames = F,
annotation_col=data.frame(phase=assignments$phases,
row.names = c(1:192)))
plotPCA(sce.416b, colour_by=I(relabel), shape_by=I(assignments$phases))
CellCycleScoring注释cell cycleCellCycleScoring较scran包cyclone函数最主要的区别是直接根据每个cycle,一组marker基因表达值判断。library(Seurat)
str(cc.genes)
#List of 2
# $ s.genes : chr [1:43] "MCM5" "PCNA" "TYMS" "FEN1" ...
# $ g2m.genes: chr [1:54] "HMGB2" "CDK1" "NUSAP1" "UBE2C" ...
如上是Seurat包提供的人的细胞中分别与S期、G2M期直接相关的marker基因
CellCycleScoring即根据此,对每个细胞的S期、G2M期可能性进行打分;具体如何计算的,暂时在Seurat官方文档中没有提及。在satijalab的github(https://github.com/satijalab/seurat/issues/728)中,作者这样回复类似的提问:As we say in the vignette, the scores are computed using an algorithm developed by Itay Tirosh, when he was a postdoc in the Regev Lab (Science et al., 2016). The gene sets are also taken from his work. You can read the methods section of https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4944528/ to see how the scoring works, but essentially the method scores cells based on the expression of each gene in a signature set - after controlling for the expected expression of genes with similar abundance. (但是PMC4944528链接好像并没有methods section ........)
Thanks. We don't provide different capitalizations because this gene list was developed on a human dataset, and we don't want to create ambiguity by suggesting its created from a mouse reference dataset. In practice however, we've found it works quite well for mouse also, and recommend the solution above.
biomaRt包转换一下convertHumanGeneList <- function(x){
require("biomaRt")
human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
mouse = useMart("ensembl", dataset = "mmusculus_gene_ensembl")
genesV2 = getLDS(attributes = c("hgnc_symbol"), filters = "hgnc_symbol", values = x , mart = human, attributesL = c("mgi_symbol"), martL = mouse, uniqueRows=T)
humanx <- unique(genesV2[, 2])
# Print the first 6 genes found to the screen
print(head(humanx))
return(humanx)
}
m.s.genes <- convertHumanGeneList(cc.genes$s.genes)
m.g2m.genes <- convertHumanGeneList(cc.genes$g2m.genes)
但是
biomaRt包网络不是很稳定,有人也直接提供了转换后的结果(https://github.com/satijalab/seurat/issues/462),直接下载导入到R里即可,下面演示的代码就是用的该结果。
table(is.na(rowData(sce.416b)$SYMBOL))
#FALSE TRUE
#46041 563
rownames(sce.416b) <- uniquifyFeatureNames(rowData(sce.416b)$ENSEMBL,
rowData(sce.416b)$SYMBOL)
#若SYMBOL为NA值,则用对应的ENSEMBL替换
library(Seurat)
seurat.416b <- as.Seurat(sce.416b, counts = "counts", data = "logcounts")
str(mouse_cell_cycle_genes)
#List of 2
# $ s.genes : chr [1:42] "Mcm4" "Exo1" "Slbp" "Gmnn" ...
# $ g2m.genes: chr [1:52] "Nuf2" "Psrc1" "Ncapd2" "Ccnb2" ...
CellCycleScoring周期打分seurat.416b <- CellCycleScoring(seurat.416b,
s.features = mouse_cell_cycle_genes$s.genes,
g2m.features = mouse_cell_cycle_genes$s.genes)
seurat.416b
head(seurat.416b@meta.data[,10:12])
plot(seurat.416b$S.Score,seurat.416b$G2M.Score,
col=factor(seurat.416b$Phase),
main="CellCycleScoring")
legend("topleft",inset=.05,
title = "cell cycle",
c("G1","S","G2M"), pch = c(1),col=c("black","green","red"))
DimPlot(seurat.416b, reduction = "PCA",
group.by = "phenotype",
shape.by = "Phase")
compare1 <- data.frame(scran=assignments$phases,seurat=seurat.416b$Phase)
ct.km <- table(compare1$scran,compare1$seurat)
ct.km
# G1 G2M S
# G1 84 13 9
# G2M 9 29 27
# S 3 3 15
library(flexclust)
randIndex(ct.km)
# ARI
0.3578223
flexclust包的randIndex用于评价两个分类结果的相似性,返回值在-1~1之间。从本例来看,只能说相似性不是很好。