
Which microRNA target gene database is the best?

生信技能树
Published 2020-04-21 14:47:13

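microRNAs stopped being a research hotspot long ago, but they left behind a lot of data, and they are one layer of the multi-omics data in the TCGA project. Adding a miRNA angle to your own research is well worth it. People usually have four needs:

  • Finding the target genes of one or more miRNAs of interest
  • Finding which miRNAs regulate one or more genes of interest
  • Finding which diseases or drugs are associated with one or more miRNAs of interest
  • Finding whether one or more miRNAs of interest regulate one or more genes of interest

If you have any of these needs, here is an R package for you, published in Nucleic Acids Res. 2014 Sep: The multiMiR R package and database: integration of microRNA–target interactions along with their disease and drug associations

I won't say much about downloading and installing the R package: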

options(BioC_mirror = "https://mirrors.tuna.tsinghua.edu.cn/bioconductor/")
# Set ONE CRAN mirror; a second options(repos = ...) call would overwrite the first
options(repos = c(CRAN = "https://mirrors.aliyun.com/CRAN/"))
options(download.file.method = "libcurl")
options(url.method = "libcurl")
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("multiMiR", ask = FALSE, update = FALSE)

After installing and loading multiMiR, you can look at its update history:

> library(multiMiR)
> db.ver = multimir_dbInfoVersions()
> db.ver[,1:3]
  VERSION    UPDATED                      RDA
1   2.3.0 2020-04-15 multimir_cutoffs_2.3.rda
2   2.2.0 2017-08-08 multimir_cutoffs_2.2.rda
3   2.1.0 2016-12-22 multimir_cutoffs_2.1.rda
4   2.0.0 2015-05-01     multimir_cutoffs.rda

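That is exactly why I recommend it. First, it is R-based, so there is no need to deal with annoying web tools. Second, its most recent update was on 2020-04-15; keeping the updates coming in the middle of such a serious pandemic deserves encouragement!

Of course, you need a basic grounding in R programming to follow how this package is used.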

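miRWalk is a collection of 12 web tools

If you really don't like R and don't want to learn it, you can of course use a web tool instead:

A paper from June 2018 used miRWalk and kept, as its final result, the miRNA-mRNA interactions that were predicted by 7 of the tools. The paper is titled FABP4 as a key determinant of metastatic potential of ovarian cancer, and it describes the web tool as follows: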

miRWalk2.0 not only documents miRNA binding sites within the complete sequence of a gene, but also combines this information with a comparison of binding sites resulting from 12 existing miRNA-target prediction programs (DIANA-microTv4.0, DIANA-microT-CDS, miRanda-rel2010, mirBridge, miRDB4.0, miRmap, miRNAMap, doRiNA i.e., PicTar2, PITA, RNA22v2, RNAhybrid2.1 and Targetscan6.2) to build novel comparative platforms of binding sites for the promoter (4 prediction datasets), cds (5 prediction datasets), 5’- (5 prediction datasets) and 3’-UTR (13 prediction datasets) regions. It also documents experimentally verified miRNA-target interaction information collected via an automated text-mining search and data from existing resources (miRTarBase, PhenomiR, miR2Disease and HMDD) offer such information.
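The "predicted by 7 tools" filter used in that paper can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the long-format table `predictions`, the placeholder genes `GENE_A` and `GENE_B`, the pairing of tools with genes, and the reading of the threshold as "at least 7" are all hypothetical, not taken from the paper or from miRWalk output.

```r
# Toy input (hypothetical): one row per miRNA-target pair per predicting tool.
predictions <- data.frame(
  mirna  = c(rep("hsa-miR-18a-3p", 8), rep("hsa-miR-34a", 3)),
  target = c(rep("GENE_A", 8), rep("GENE_B", 3)),
  tool   = c("miRanda", "mirBridge", "miRDB", "miRmap",
             "PITA", "RNA22", "RNAhybrid", "Targetscan",
             "miRanda", "PITA", "Targetscan"),
  stringsAsFactors = FALSE
)

min_tools <- 7  # assumed reading of "predicted by 7 tools": at least 7

# Count the distinct tools supporting each miRNA-target pair
support <- aggregate(tool ~ mirna + target, data = unique(predictions),
                     FUN = function(x) length(unique(x)))
names(support)[3] <- "n_tools"

# Keep only the pairs that reach the consensus threshold
consensus <- support[support$n_tools >= min_tools, ]
print(consensus)
```

Raising or lowering `min_tools` trades sensitivity against specificity; the value is a choice made for each study, not something the tool fixes.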

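There is also miRSystem, which integrates other prediction programs (DIANA, miRanda, miRBridge, PicTar, PITA, rna22 and TargetScan) and includes validated data from TarBase and miRecords.

Of course, take whatever suits you; getting your research done is what matters!

But multiMiR, the package we are recommending, draws on 14 source databases.

Data sources behind multiMiR

They come from http://multimir.org/, and the detailed database URLs are: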

                                                                                source_url
1           http://diana.imis.athena-innovation.gr/DianaTools/index.php?r=microT_CDS/index
2                                  http://www.mirz.unibas.ch/miRNAtargetPredictionBulk.php
3                http://www.ebi.ac.uk/enright-srv/microcosm/cgi-bin/targets/v5/download.pl
4                                                               http://www.mir2disease.org
5                                         http://www.microrna.org/microrna/getDownloads.do
6                                                                         http://mirdb.org
7                                                http://mirecords.biolead.org/download.php
8                                       http://mirtarbase.mbc.nctu.edu.tw/php/download.php
9                                       http://www.pharmaco-mir.org/home/download_VERSE_db
10                                             http://mips.helmholtz-muenchen.de/phenomir/
11                                                             http://dorina.mdc-berlin.de
12                                  http://genie.weizmann.ac.il/pubs/mir07/mir07_data.html
13 http://carolina.imis.athena-innovation.gr/diana_tools/web/index.php?r=tarbasev8%2Findex
14               http://www.targetscan.org/cgi-bin/targetscan/data_download.cgi?db=vert_61

It covers miRNA data for human and for the common model organisms mouse and rat

> db.count <- multimir_dbCount()
> db.count
       map_name human_count mouse_count rat_count total_count
1  diana_microt     7664602     3747171         0    11411773
2         elmmo     3959112     1449133    547191     5955436
3     microcosm      762987      534735    353378     1651100
4   mir2disease        2875           0         0        2875
5       miranda     5429955     2379881    247368     8057204
6         mirdb     1990425     1091263    199250     3280938
7     mirecords        2425         449       171        3045
8    mirtarbase      544588       50673       652      595913
9  pharmaco_mir         308           5         0         313
10     phenomir       15138         491         0       15629
11       pictar      404066      302236         0      706302
12         pita     7710936     5163153         0    12874089
13      tarbase      433048      209831      1307      644186
14   targetscan    13906497    10442093         0    24348590
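The 14 sources are not all the same kind of data. As far as I understand the package documentation, they fall into the three groups that the `table` argument of `get_multimir()` selects: validated, predicted, and disease.drug. The sketch below copies the total counts from the table above and tallies them per group; the grouping vector is my assumption and should be checked against the package help.

```r
# Total record counts copied from the db.count table above
db_count <- data.frame(
  map_name = c("diana_microt", "elmmo", "microcosm", "mir2disease", "miranda",
               "mirdb", "mirecords", "mirtarbase", "pharmaco_mir", "phenomir",
               "pictar", "pita", "tarbase", "targetscan"),
  total_count = c(11411773, 5955436, 1651100, 2875, 8057204,
                  3280938, 3045, 595913, 313, 15629,
                  706302, 12874089, 644186, 24348590),
  stringsAsFactors = FALSE
)

# Assumed grouping of the sources; anything not listed is treated as predicted
category <- c(mirecords = "validated", mirtarbase = "validated",
              tarbase = "validated", mir2disease = "disease.drug",
              pharmaco_mir = "disease.drug", phenomir = "disease.drug")
db_count$category <- ifelse(db_count$map_name %in% names(category),
                            category[db_count$map_name], "predicted")

# Sum the records in each group
totals <- tapply(db_count$total_count, db_count$category, sum)
print(totals)
```

The point of the tally is scale: predicted records outnumber validated ones by well over an order of magnitude, so the choice of `table` strongly affects how many rows a query returns.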

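From miRNA to mRNA

Finding the target genes of a miRNA of interest

Note that miRNA IDs follow rules here. A mature miRNA is abbreviated as miR, prefixed with a species code and followed by an Arabic numeral reflecting the order of discovery, e.g. hsa-miR-122. Highly homologous miRNAs get a lowercase letter (a, b, c, ...) after the number, e.g. hsa-miR-34a, hsa-miR-34b and hsa-miR-34c. A miRNA precursor is usually about 70-80 nt long, and its two arms may each produce a mature miRNA; in that case -5p/-3p is appended to the name, e.g. hsa-miR-122-5p.

So the example miRNA ID used in the code below, hsa-miR-18a-3p, should now make sense!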
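The naming rules above can be checked mechanically with a regular expression. The helper `parse_mirna_id()` is hypothetical (it is not part of multiMiR), and the pattern is a simplification that covers only the forms described here; real IDs with extra numeric suffixes or the let-7 family would need a broader pattern.

```r
# Hypothetical helper: split a mature miRNA ID into its naming components.
parse_mirna_id <- function(ids) {
  # species code, "miR", number, optional family letter, optional arm
  pattern <- "^([a-z]+)-miR-([0-9]+)([a-z]?)(-([35]p))?$"
  parts <- regmatches(ids, regexec(pattern, ids))
  rows <- lapply(parts, function(p) {
    if (length(p) == 0) return(rep(NA_character_, 4))  # ID does not fit the pattern
    p[c(2, 3, 4, 6)]
  })
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- c("species", "number", "family_letter", "arm")
  data.frame(id = ids, out, stringsAsFactors = FALSE)
}

ids <- c("hsa-miR-122", "hsa-miR-34a", "hsa-miR-18a-3p", "hsa-miR-122-5p")
parsed <- parse_mirna_id(ids)
print(parsed)
```

Empty strings in `family_letter` or `arm` mean that part is absent from the ID, while a row of `NA` flags an ID that does not fit the simplified pattern at all, which is a cheap sanity check before sending a long vector of IDs to a query.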

# The default is to search validated interactions in human
example1 <- get_multimir(mirna = 'hsa-miR-18a-3p', summary = TRUE)
names(example1)
# Check which types of associations were returned
table(example1@data$type)
# Detailed information of the validated miRNA-target interaction
head(example1@data)
dim(example1@data)
# Which interactions are supported by Luciferase assay?
example1@data[grep("Luciferase", example1@data[, "experiment"]), ]
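# Summary rows for one target of interest, here KRAS
example1@summary[example1@summary[,"target_symbol"] == "KRAS",]

Since one miRNA can be queried, several can of course be queried in a batch. Sample code follows, where top_miRNAs is a vector of miRNA IDs selected after differential expression analysis: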

multimir_results <- get_multimir(org     = 'mmu',
                                 mirna   = top_miRNAs,
                                 table   = 'validated',
                                 summary = TRUE)

From mRNA to miRNA

To find which miRNAs regulate one or more genes of interest, the code for each case is as follows:

example3 <- get_multimir(org     = "mmu",
                         target  = "Gnb1",
                         table   = "predicted",
                         summary = TRUE,
                         predicted.cutoff      = 35,
                         predicted.cutoff.type = "p",
                         predicted.site        = "all")
names(example3)
table(example3@data$type)
head(example3@data)
head(example3@summary)


# For each database column of the summary, count the rows with at least one prediction
apply(example3@summary[, 6:13], 2, function(x) sum(x > 0))


example4 <- get_multimir(org     = 'hsa',
                         target  = c('AKT2', 'CERS6', 'S1PR3', 'SULF2'),
                         table   = 'predicted',
                         summary = TRUE,
                         predicted.cutoff.type = 'n',
                         predicted.cutoff      = 500000)

example4.counts <- addmargins(table(example4@summary[, 2:3]))
example4.counts <- example4.counts[-nrow(example4.counts), ]
example4.counts <- example4.counts[order(example4.counts[, 5], decreasing = TRUE), ]
head(example4.counts)

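The queried datasets do record miRNA-mRNA relationships, but there are many filtering thresholds to choose from, so you need to know the underlying source databases well.

From miRNA to diseases or drugs

This mostly comes down to what the databases have on record: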
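The two queries above use different threshold styles: `predicted.cutoff.type = "p"` with a cutoff of 35, and `predicted.cutoff.type = "n"` with a cutoff of 500000. As I understand the package documentation, "p" keeps the top percentage of predictions by score and "n" keeps the top number of predictions. The toy sketch below shows only that difference; the scores are random, the helper names are hypothetical, and how multiMiR ranks records inside each source database is not reproduced here.

```r
set.seed(1)

# Toy prediction scores (hypothetical), sorted from strongest to weakest
scores <- data.frame(pair = paste0("pair_", 1:200), score = runif(200),
                     stringsAsFactors = FALSE)
scores <- scores[order(scores$score, decreasing = TRUE), ]

# "p"-style cutoff: keep the top p percent of records
top_percent <- function(df, p) head(df, ceiling(nrow(df) * p / 100))

# "n"-style cutoff: keep the top n records (or all of them if fewer exist)
top_n_records <- function(df, n) head(df, min(n, nrow(df)))

kept_p <- top_percent(scores, 35)    # 35% of 200 records
kept_n <- top_n_records(scores, 50)  # the 50 best records
c(percent_cutoff = nrow(kept_p), number_cutoff = nrow(kept_n))
```

A percentage cutoff scales with the size of each source database, whereas a fixed number gives small and large databases the same ceiling, which matters given how much the record counts differ between sources.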

example2 <- get_multimir(disease.drug = 'cisplatin', table = 'disease.drug')
names(example2)
nrow(example2@data)
table(example2@data$type)
head(example2@data)

Does a set of miRNAs regulate a set of mRNAs?

load(url("http://multimir.org/bladder.rda"))

# Search all tables and keep the top 10% of predictions
example5 <- get_multimir(org     = "hsa",
                         mirna   = DE.miRNA.up,
                         target  = DE.entrez.dn,
                         table   = "all",
                         summary = TRUE,
                         predicted.cutoff.type = "p",
                         predicted.cutoff      = 10,
                         use.tibble = TRUE)

table(example5@data$type)
result <- select(example5, keytype = "type", keys = "validated", columns = columns(example5))
unique_pairs <- 
  result[!duplicated(result[, c("mature_mirna_id", "target_entrez")]), ]

result

# Pull out the disease/drug associations that mention bladder
mykeytype <- "disease_drug"

mykeys <- keys(example5, keytype = mykeytype)
mykeys <- mykeys[grep("bladder", mykeys, ignore.case = TRUE)]

result <- select(example5, keytype = "disease_drug", keys = mykeys,
                 columns = columns(example5))
result

# Unique miRNA-disease/drug pairs, ignoring case
unique_pairs <- 
  result[!duplicated(apply(result[, c("mature_mirna_id", "disease_drug")], 2,
                           tolower)), ]

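A worked example

Below, the edgeR package is used to run a differential expression analysis on an ordinary counts expression matrix (miRNA) and obtain a set of miRNAs of interest: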

library(edgeR)
library(multiMiR)

# Load data
counts_file  <- system.file("extdata", "counts_table.Rds", package = "multiMiR")
strains_file <- system.file("extdata", "strains_factor.Rds", package = "multiMiR")
counts_table   <- readRDS(counts_file)
strains_factor <- readRDS(strains_file)
table(strains_factor)

# Standard edgeR differential expression analysis
design <- model.matrix(~ strains_factor)

# Using trended dispersions
dge <- DGEList(counts = counts_table)
dge <- calcNormFactors(dge)
dge$samples$strains <- strains_factor
dge <- estimateGLMCommonDisp(dge, design)
dge <- estimateGLMTrendedDisp(dge, design)
dge <- estimateGLMTagwiseDisp(dge, design)

# Fit GLM model for strain effect
fit <- glmFit(dge, design)
lrt <- glmLRT(fit)

# Table of unadjusted p-values (PValue) and FDR values
p_val_DE_edgeR <- topTags(lrt, adjust.method = 'BH', n = Inf)

# Getting top differentially expressed miRNA's
top_miRNAs <- rownames(p_val_DE_edgeR$table)[1:10]

With the set of miRNAs of interest in hand, you can query their target genes

library(multiMiR)
# Plug miRNA's into multiMiR and getting validated targets
multimir_results <- get_multimir(org     = 'mmu',
                                 mirna   = top_miRNAs,
                                 table   = 'validated',
                                 summary = TRUE)
head(multimir_results@data)
table(multimir_results@data$mature_mirna_id)
dim(multimir_results@data)
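One caveat when reading `table(multimir_results@data$mature_mirna_id)`: it counts records, and the same miRNA-target pair can be reported by more than one validated database. A minimal sketch of the difference, using a made-up stand-in for `multimir_results@data` (the miRNA IDs and gene symbols are placeholders; only the column names follow the real output):

```r
# Toy stand-in for multimir_results@data (all IDs and symbols are made up)
toy <- data.frame(
  database        = c("mirtarbase", "tarbase", "mirecords", "mirtarbase", "tarbase"),
  mature_mirna_id = c("mmu-miR-X", "mmu-miR-X", "mmu-miR-X", "mmu-miR-Y", "mmu-miR-Y"),
  target_symbol   = c("GeneA", "GeneA", "GeneB", "GeneA", "GeneC"),
  stringsAsFactors = FALSE
)

# Records per miRNA: a pair reported by two databases is counted twice
records_per_mirna <- table(toy$mature_mirna_id)

# Distinct targets per miRNA: drop repeated miRNA-target pairs first
unique_pairs <- toy[!duplicated(toy[, c("mature_mirna_id", "target_symbol")]), ]
targets_per_mirna <- table(unique_pairs$mature_mirna_id)

print(records_per_mirna)
print(targets_per_mirna)
```

The same `!duplicated()` idiom applied to the real `@data` slot gives the number of distinct targets per miRNA.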

Isn't that convenient, once you have the multiMiR package!

Originally published 2020-04-19 on the 生信技能树 WeChat official account.
