Deploying an RNA-seq downstream analysis web tool on our Shiny server

生信技能树

Published 2019-12-23 16:42:56

Collected in the column: 生信技能树

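There is no shortage of web tools for downstream analysis of RNA-seq data. Here is one of the newest: Sundararajan Z, Knoll R, Hombach P, et al. Shiny-Seq: advanced guided transcriptome analysis[J]. BMC research notes, 2019, 12(1): 432. It is a fairly ordinary journal; papers that are purely about a tool generally end up this way.

The source code is public at https://github.com/szenitha/Shiny-Seq, so I decided to install it on our own server to make it convenient for our followers in China. That earned me a new nickname: 宠粉狂魔 (the fan-spoiling maniac).

Some R packages need to be installed

You will almost certainly need to set up mirrors; see http://www.bio-info-trainee.com/3727.html. Because I installed everything on my own Ubuntu server, it was actually quite hard: there were all kinds of errors, and each one took some wrestling to resolve.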

rm(list = ls())
# Show the current mirror settings
options()$repos
options()$BioC_mirror
# Point Bioconductor and CRAN at mirrors inside China
options(BioC_mirror="https://mirrors.ustc.edu.cn/bioc/")
options("repos" = c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))
# Confirm that the new settings took effect
options()$repos
options()$BioC_mirror

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

# Bioconductor packages
BiocManager::install(c("rhdf5","tximport",'DESeq2','clusterProfiler',"org.Hs.eg.db","org.Mm.eg.db","org.Mmu.eg.db","sva","limma","geneplotter",'biomaRt',"pcaGoPromoter","pcaGoPromoter.Mm.mm9","pcaGoPromoter.Hs.hg19","pathview"))

# CRAN packages (BiocManager::install() can install these as well)
BiocManager::install(c("shiny","shinyBS","shinyjs",'RColorBrewer',"stringr",'formula.tools','data.table','fdrtool',"VennDiagram",'colorspace',"xlsx",'svglite',"visNetwork","V8","ggrepel","ReporteRs","ReporteRsjars"))

install.packages("gplots",dependencies = TRUE)
install.packages("plotly")
install.packages("devtools")
# WGCNA depends on Bioconductor packages (e.g. impute, preprocessCore);
# if this call fails on dependencies, try BiocManager::install("WGCNA") instead
install.packages("WGCNA")
devtools::install_github("rstudio/crosstalk",force=TRUE)

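Download and install the Shiny app and start it

If you do not know Shiny, you can skip this part; it is slightly complicated.

The log directory is /var/log/shiny-server. Only users in the shiny group can access it, and you only need to look there when debugging code.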
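When the app fails to start, the newest file in the log directory is usually the one worth reading. The following is a minimal sketch, assuming the log files end in `.log`; the function name `latest_shiny_log` is made up for illustration and is not part of Shiny Server.

```shell
# Print the most recently modified *.log file in a directory.
# The directory defaults to /var/log/shiny-server (the path given above).
# Returns a non-zero status when no log file is found.
latest_shiny_log() (
  dir="${1:-/var/log/shiny-server}"
  newest=""
  for f in "$dir"/*.log; do
    [ -e "$f" ] || continue            # glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"                      # keep the newer of the two files
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
)
```

For example, `sudo tail -n 50 "$(latest_shiny_log)"` would show the end of the most recent log; `sudo` is only needed if your user is not in the shiny group.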

Apps are stored under /srv/shiny-server by default, and this web tool goes there too.

The main thing is to get the source code:

mkdir -p /srv/shiny-server/paper
cd /srv/shiny-server/paper
git clone https://github.com/szenitha/Shiny-Seq.git

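No compilation is needed; it starts directly:

http://49.235.27.111:3838/paper/Shiny-Seq/App/

A first installation will almost certainly throw errors that need tracking down. For example, this code base often mixes up upper and lower case in file names, probably because it was moved between operating systems: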

 for f in *.R; do mv -- "$f" "${f%.R}.r"; done
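Before renaming anything, it can help to see which file names the code actually asks for. Below is a rough sketch, assuming the scripts load each other with `source("...")` calls whose paths are relative to the app directory; that is an assumption about how the repository is laid out, and `missing_sources` is a hypothetical helper name.

```shell
# List files that R scripts try to source() but that do not exist
# under exactly that name (case-sensitive) in the given directory.
# Only simple calls of the form source("file") or source('file') are found.
missing_sources() (
  dir="${1:-.}"
  grep -rhoE --include='*.[rR]' "source\([\"'][^\"']+[\"']" "$dir" \
    | sed -E "s/^source\([\"']//; s/[\"']\$//" \
    | sort -u \
    | while IFS= read -r name; do
        [ -e "$dir/$name" ] || printf '%s\n' "$name"
      done
)
```

Running `missing_sources /srv/shiny-server/paper/Shiny-Seq/App` on a case-sensitive file system prints one name per line for every sourced file that cannot be found, which shows whether a rename from `.R` to `.r` (or the reverse) is needed.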

When something stubborn comes up, restart the service; a restart solves about 99% of problems.

 sudo systemctl restart shiny-server

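How to use the web tool

Oddly, the paper does not explain usage in detail, and the help documentation on GitHub is also sparse, so I read through the source code to work out the requirements for the input data.

First, the expression matrix: it must be a tab-delimited .txt file. The first column holds the gene IDs, and the second column onward holds the expression values for each sample.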
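A quick way to confirm that a file really is tab-delimited with one value per sample on every row is to count fields per line. This is a minimal sketch; `check_tsv` is a hypothetical helper name, and it assumes the header row carries a label for the gene ID column.

```shell
# Report lines whose number of tab-separated fields differs from the header.
# Exit status is 0 when the file looks consistent, non-zero otherwise.
check_tsv() {
  awk -F'\t' '
    { sub(/\r$/, "") }                 # tolerate Windows line endings
    NR == 1 {
      n = NF
      if (n < 2) { print "header has fewer than 2 tab-separated fields"; bad = 1 }
      next
    }
    NF != n { print "line " NR ": " NF " fields, expected " n; bad = 1 }
    END { exit bad + 0 }
  ' "$1"
}
```

Calling `check_tsv counts.txt` prints nothing for a well-formed matrix; a comma-separated file, for instance, is flagged at the header.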

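(PS: the example expression matrix meets these requirements, but it hides a small surprise: the sample names contain hyphens, which R forcibly converts to dots when they are used as column names.)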
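One way to avoid the surprise is to replace the hyphens yourself before uploading, in both files, so the names stay in sync. The sketch below handles hyphens only; R's name cleaning also changes other things (for example, names starting with a digit get an `X` prefix), which this does not cover. The function names are hypothetical.

```shell
# Replace "-" with "." in the header line of the expression matrix only.
dots_in_matrix_header() {
  sed '1s/-/./g' "$1"
}

# Replace "-" with "." in the first column (sample names) of the clinical
# file, leaving the header line and all other columns untouched.
dots_in_pheno_samples() {
  awk -F'\t' -v OFS='\t' 'NR > 1 { gsub(/-/, ".", $1) } { print }' "$1"
}
```

Both functions write to standard output, so a typical call would be `dots_in_matrix_header counts.txt > counts.fixed.txt`, leaving the original file untouched.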

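Next, the clinical information file: its first column must contain the sample names, and they must match the first row (the header) of the expression matrix exactly. The remaining columns can hold any other information.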

sample         group      stage   grade  passages
SCBO.5_orgP10  organoids  T1+CIS  Hg     10
SCBO.5_orgP13  organoids  T1+CIS  Hg     13
SCBO.1_orgP7   organoids  Ta      Hg     7
SCBO.3_orgP4   organoids  T1      Lg/Hg  4
SCBO.6_orgP4   organoids  T1+CIS  Hg     4
SCBO.2_orgP5   organoids  T2      Lg/Hg  5
SCBO.4_orgP10  organoids  T2      Hg     10
SCBO.5_orgP3   organoids  T1+CIS  Hg     3

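Note once more that the first column of the clinical information file must contain the sample names, matching the header row of the expression matrix exactly. In the example shown here, I replaced every hyphen with a dot using select-all and replace.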
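The matching rule can be checked before uploading. The following is a minimal sketch, assuming both files are tab-delimited, the expression matrix header has a label in its first field, and the clinical file has a header line; `check_samples` is a hypothetical helper name. It reports mismatches in both directions.

```shell
# Compare sample names in the expression matrix header (fields 2..N of
# line 1) with the first column of the clinical file (header skipped).
# Usage: check_samples expression.txt clinical.txt
check_samples() {
  awk -F'\t' '
    { sub(/\r$/, "") }                           # tolerate Windows line endings
    NR == FNR {                                  # first file: expression matrix
      if (FNR == 1) { for (i = 2; i <= NF; i++) in_expr[$i] = 1 }
      next
    }
    FNR > 1 { in_pheno[$1] = 1 }                 # second file: clinical info
    END {
      for (s in in_expr)  if (!(s in in_pheno)) { print "only in expression matrix: " s; bad = 1 }
      for (s in in_pheno) if (!(s in in_expr))  { print "only in clinical file: " s; bad = 1 }
      exit bad + 0
    }
  ' "$1" "$2"
}
```

An empty output with exit status 0 means the two sets of names agree. A leftover hyphen in one file shows up as a pair of lines, one for each spelling.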

I only worked out these rules by reading through the author's "ugly" code:

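filepath='~/Downloads/2018-bladder-organoids-counts.txt'
input_exp<-read.csv(filepath, header = TRUE,sep = "\t",check.names = FALSE,quote = "\"")
# drop the first column (gene IDs); the rest are the per-sample values
data<-input_exp[,-1]

# convert entries to numeric (double)
data=as.matrix(data)
storage.mode(data)="double"
# data.frame() defaults to check.names = TRUE, so hyphens in sample names become dots here
data= data.frame(data)
rownames(data)=input_exp[,1]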

get_pheno<-function(data,pheno)
{
  #Expression table and annotation table should not be null
  if(!is.null(data) && !is.null(pheno))
  {

    #Assuming that the first column of annotation table is sample id ,extract all sample IDs
    sample_id = pheno[,1]
    #print(sample_id)
    #Get all sample IDs from expression table(sample ID refer to column names of expression table)
    exp_sample_id = colnames(data)
    #print('all')
    # print(exp_sample_id)
    #Check if all sample ID in expression table are present in the annotation table
    if (all(exp_sample_id %in% sample_id))
    {
      #print("Yay")

      #set all variables of annotation table as factors
      col<-1:ncol(pheno)
      for (i in col)
      {
        pheno[,i]<-as.factor(pheno[,i])
      }
      #Remove from the annotation table those sample IDs that are absent from the expression table
      idx<-NULL
      if(!all(sample_id %in% exp_sample_id))
      {
        idx<-which(!(sample_id %in% exp_sample_id))
        pheno <- pheno[-idx, ]
      }

      pheno<-pheno

    }
    else 
    {
      pheno<-NULL
      #print('nay')

    }
    return(pheno)
  }
}


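# filepath2 is not defined in the original snippet; the path below is a placeholder,
# set it to your own clinical information file
filepath2='path/to/clinical-info.txt'
input_p<-read.csv(filepath2, header = TRUE,sep = "\t",check.names = FALSE,quote = "\"")
pheno<-input_p#[,-1]
print("line")
print(pheno)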

v=list()
v$data_wgcna<-data
v$pheno_wgcna<-get_pheno(v$data_wgcna,pheno)

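If I had not needed to verify that my web service works, I would not have bothered reading this code.

Some results

The results are quite good: at the very least, you do not have to write code yourself. You do need to spend a fair amount of time figuring out how to use the web tool, though, and you lose much of the fun of customization.

Other similar tools

The authors also mention several tools: shinyngs, START, Degust, Explore DEG, and DEBrowser. I have collected a few others as well (between you and me, I have deployed every one of them, and we will introduce them one by one):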

START: Nelson JW, Sklenar J, Barnes AP, Minnier J. (2016) "The START App: A Web-Based RNAseq Analysis and Visualization Resource." Bioinformatics. doi: 10.1093/bioinformatics/btw624. The app is hosted on Shinyapps.io: https://kcvi.shinyapps.io/START/
ThreeDRNAseq: a Shiny app for differential expression and differential alternative splicing analysis, https://github.com/wyguo/ThreeDRNAseq
GENAVi (RNA-seq Shiny app), https://github.com/alpreyes/GENAVi
TCC-GUI: a Shiny-based application for differential expression analysis of RNA-Seq count data, https://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-019-4179-2 The code is at https://github.com/swsoyee/TCC-GUI (accessed 9 Jan 2019).

Surprisingly, there is even one web tool that comes with an introduction in Chinese: