首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >使用R包genefu来根据基因集进行表达谱分类

使用R包genefu来根据基因集进行表达谱分类

作者头像
生信技能树
发布2018-07-27 14:21:51
2.3K0
发布2018-07-27 14:21:51
举报
文章被收录于专栏:生信技能树生信技能树

学习使用genefu这个包,首先需要安装它!

source("http://bioconductor.org/biocLite.R")
options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/")
biocLite("genefu")

教程略微有点复杂:https://rdrr.io/bioc/genefu/f/inst/doc/genefu.pdf

这个包自带了5个乳腺癌芯片数据集,需要了解他们,可以看我在生信技能树发的教程:https://vip.biotrainee.com/d/689-5

包里面自带的数据集也非常多,可以在https://rdrr.io/bioc/genefu/man/ 查看列表,比较重要的如下:

gene70

Function to compute the 70 genes prognosis profile (GENE70)…

gene76

Function to compute the Relapse Score as published by Wang et…

mod1

Gene modules published in Desmedt et al. 2008

mod2

Gene modules published in Wirapati et al. 2008

scmgene.robust

Subtype Clustering Model using only ESR1, ERBB2 and AURKA…

scmod1.robust

Subtype Clustering Model using ESR1, ERBB2 and AURKA modules…

scmod2.robust

Subtype Clustering Model using ESR1, ERBB2 and AURKA modules…

sig.endoPredict

Signature used to compute the endoPredict signature as…

sig.gene70

Signature used to compute the 70 genes prognosis profile…

sig.gene76

Signature used to compute the Relapse Score (GENE76) as…

sig.genius

Gene Expression progNostic Index Using Subtypes (GENIUS) as…

sig.ggi

Gene expression Grade Index (GGI) as published in Sotiriou et…

sig.oncotypedx

Signature used to compute the OncotypeDX signature as…

pam50

PAM50 classifier for identification of breast cancer…

pik3cags

Function to compute the PIK3CA gene signature (PIK3CA-GS)

上面列出的这些数据集都是可以打开看的:

library(genefu)
data(pam50)
data(pam50.scale)
data(pam50.robust)
data(scmod2.robust)
pam50
str(scmod2.robust, max.level=1)

最重要的功能就是根据已知基因集来对乳腺癌进行分子分型

所有的分型都是用molecular.subtyping函数,预背了很多可以进行乳腺癌进行分子分型的基因集,比如大名鼎鼎的PAM50,下面是演示:

rm(list = ls())
library(breastCancerMAINZ)
library(breastCancerTRANSBIG)
library(breastCancerUPP)
library(breastCancerUNT)
library(breastCancerNKI)
data(transbig)
dd=transbig
cinfo <- colnames(pData(dd))
cinfo
ddata <- t(exprs(dd))
dannot <- featureData(dd)@data
ddemo <- phenoData(dd)@data

library(genefu)
SubtypePredictions<-molecular.subtyping(sbt.model = "scmod2",data = ddata,
                                        annot = dannot,do.mapping = TRUE)
table(SubtypePredictions$subtype)
ddemo$SCMOD2<-SubtypePredictions$subtype

PAM50Preds<-molecular.subtyping(sbt.model = "pam50",data=ddata,
                                annot=dannot,do.mapping=TRUE)

table(PAM50Preds$subtype)
ddemo$PAM50<-PAM50Preds$subtype

分类结果如下:

> table(SubtypePredictions$subtype)
ER+/HER2- High Prolif  ER+/HER2- Low Prolif             ER-/HER2-                 HER2+ 
                   50                    77                    49                    22 

> table(PAM50Preds$subtype)
 Basal   Her2   LumB   LumA Normal 
    45     26     45     78      4 

需要懂这个数据集才能理解,还需要有一点乳腺癌研究背景知识。

成功分类后的信息,就可以用来做生存分析

# http://www.inside-r.org/r-doc/survival/survfit.coxph
library(survival) 
data.for.survival.SCMOD2 <- ddemo[,c("e.os", "t.os", "SCMOD2","age")]
data.for.survival.PAM50 <- ddemo[,c("e.os", "t.os", "PAM50","age")]
# Remove patients with missing survival information
data.for.survival.SCMOD2 <- data.for.survival.SCMOD2[complete.cases(data.for.survival.SCMOD2),]
data.for.survival.PAM50 <- data.for.survival.PAM50[complete.cases(data.for.survival.PAM50),]
days.per.month <- 30.4368
days.per.year <- 365.242
data.for.survival.PAM50$months_to_death <- data.for.survival.PAM50$t.os / days.per.month
data.for.survival.PAM50$vital_status <- data.for.survival.PAM50$e.os == "1"
surv.obj.PAM50 <- survfit(Surv(data.for.survival.PAM50$months_to_death,
                               data.for.survival.PAM50$vital_status) ~ data.for.survival.PAM50$PAM50)
data.for.survival.SCMOD2$months_to_death <- data.for.survival.SCMOD2$t.os / days.per.month
data.for.survival.SCMOD2$vital_status <- data.for.survival.SCMOD2$e.os == "1"
surv.obj.SCMOD2 <- survfit(Surv(
  data.for.survival.SCMOD2$months_to_death,
  data.for.survival.SCMOD2$vital_status) ~ data.for.survival.SCMOD2$SCMOD2)
message("KAPLAN-MEIR CURVE - USING PAM50")
plot(main = "Surival Curves SCMOD2", surv.obj.SCMOD2,
     col =c("#006d2c", "#8856a7","#a50f15", "#08519c"),lty = 1,lwd = 3,
     xlab = "Time (months)",ylab = "Probability of Survival") 
legend("topright",fill = c("#006d2c", "#8856a7","#a50f15", "#08519c"), 
       legend = c("Basal","Her2","LumA","LumB"),bty = "n")

plot(main = "Surival Curves PAM50", surv.obj.PAM50,
     col =c("#006d2c", "#8856a7","#a50f15", "#08519c", "#000000"),lty = 1,lwd = 3,

     xlab = "Time (months)",ylab = "Probability of Survival")
legend("topright",
       fill = c("#006d2c", "#8856a7","#a50f15", "#08519c", "#000000"),
       legend = c("Basal","Her2","LumA","LumB","Normal"),bty = "n")

也可以把这两个分类画在一张图上,或者添加统计学显著指标,你们自己尝试一下哦,我在生信技能树论坛也多次提到过,就不再赘述咯。

提示一下,可以使用 Cross-validated Partial Likelihood (cvpl) 模型来检验两个分类方法的预后判断情况。

更多分类标准

前面我们提到过,这个包最大的优点就是内置了一系列分类指标,如下;

Subtype Clustering Model using just the AURKA gene: scmgene.robust()
Subtype Clustering Model using just the ESR1 gene: scmgene.robust() Subtype Clustering Model using just the ERBB2 gene: scmgene.robust() 
NPI: npi()
GGI: ggi()
GENIUS: genius() 
EndoPredict: endoPredict() 
OncotypeDx: oncotypedx() 
TamR: tamr()
GENE70: gene70() 
PIK3CA: pik3cags()
rorS: rorS()

基本上是,可以想怎么分类,就怎么分类了。

本文参与 腾讯云自媒体分享计划,分享自微信公众号。
原始发表:2018-05-16,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 生信技能树 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 最重要的功能就是根据已知基因集来对乳腺癌进行分子分型
  • 更多分类标准
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档