Commercial Companies May Do a Better Job at Data Curation: A Complete Catalog of Human lncRNAs

Author: 生信技能树
Published: 2018-12-14 10:26:22
Column: 生信技能树

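A VIP student asked us how to compile human lncRNA information for data mining.

I happened to come across a commercial microarray, the Arraystar Human LncRNA Array V4.0, whose product description reads: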

Arraystar Human LncRNA Array V4.0 has a total of 40,173 lncRNAs in two major lncRNA collections, 7,506 Gold Standard LncRNAs and 32,667 Reliable LncRNAs, from more than 47 Tb worth of RNA-seq data and all major public databases and repositories, such as RefSeq, UCSC Known Genes, GENCODE, lincRNA catalogs, lncRNAdb, T-UCRs, RNAdb, NRED, and scientific publications.

https://www.arraystar.com/human-lncrna-expression-array-v4-0/

The way this collection was put together is well worth learning from.

https://www.arraystar.com/assets/1/6/Arraystar_Human_LncRNA_Array_V4.0.pdf

LncRNA Transcript Collection - Arraystar Human LncRNA Microarray V4.0

The set of LncRNAs covered by the Arraystar Human LncRNA Microarray V4.0 is carefully constructed using the most highly-respected public transcriptome databases (RefSeq, UCSC Known Genes, GENCODE, etc.), as well as landmark publications [1-17]. Our LncRNA database is continually being updated to ensure that all the latest annotated LncRNAs are included on the array.

1. RefSeq (Updated Aug 2015)

The Reference Sequence (RefSeq) database maintained by NCBI (http://www.ncbi.nlm.nih.gov/projects/RefSeq/) is a comprehensive and well-annotated collection of genome, transcript, and protein sequences [1]. There are 4,927 human LncRNAs in RefSeq as of August 2015, all of which are included on the Arraystar Human LncRNA Microarray V4.0.

2. UCSC Known genes dataset (Known Genes 7)

The UCSC Known genes dataset (http://genome.ucsc.edu/cgi-bin/hgTables) contains predicted genes based on data from RefSeq, Genbank, CCDS and UniProt [2]. After removing small RNAs and other unrelated transcripts, the Arraystar Human LncRNA Microarray V4.0 covers 3,521 LncRNAs from UCSC Known genes.
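The description above does not say how small RNAs were removed. One common approach is a length cutoff, so here is a minimal sketch assuming the conventional 200 nt threshold for lncRNAs and a hypothetical transcript record (a dict with an `exons` list of 0-based, half-open intervals). Neither the record layout nor the cutoff comes from the Arraystar document.

```python
def transcript_length(exons):
    """Spliced length: sum of exon lengths for (start, end) half-open intervals."""
    return sum(end - start for start, end in exons)


def drop_small_rnas(transcripts, min_length=200):
    """Keep transcripts whose spliced length is at least min_length nt.

    min_length=200 is an assumed, conventional lncRNA cutoff, not a value
    taken from the Arraystar description.
    """
    return [tx for tx in transcripts if transcript_length(tx["exons"]) >= min_length]


# Toy records (hypothetical IDs and coordinates)
toy = [
    {"id": "tx_short", "exons": [(100, 180)]},             # 80 nt, dropped
    {"id": "tx_long", "exons": [(100, 250), (400, 520)]},  # 150 + 120 = 270 nt, kept
]
kept = drop_small_rnas(toy)
print([tx["id"] for tx in kept])
```

A length filter alone will not remove "other unrelated transcripts" such as coding genes; that needs biotype or coding-potential information as well.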

3. GENCODE version 19

The GENCODE project is a database of annotations of all human protein-coding and noncoding genes using evidence-based gene features [3]. After analyzing the noncoding sequences and removing those unrelated to LncRNAs, we have designed probes to detect 13,332 LncRNAs from GENCODE.
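If you want to reproduce this kind of selection yourself, a GENCODE GTF file can be filtered by transcript biotype. The sketch below is one possible way to do it, not Arraystar's actual pipeline. The biotype list is my assumption of the lncRNA-related labels used around GENCODE v19 and should be checked against the release you download.

```python
import re

# Assumed lncRNA-related biotypes (GENCODE v19-era labels); verify for your release.
LNCRNA_BIOTYPES = {
    "lincRNA", "antisense", "processed_transcript",
    "sense_intronic", "sense_overlapping", "3prime_overlapping_ncrna",
}

# Matches quoted GTF attributes such as: transcript_type "lincRNA"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the quoted key/value pairs of a GTF attribute column into a dict.

    For keys that occur more than once (e.g. tag), the last value wins.
    """
    return dict(ATTR_RE.findall(field))


def lncrna_transcripts(gtf_lines, biotypes=LNCRNA_BIOTYPES):
    """Yield (transcript_id, gene_name, transcript_type) for lncRNA transcript rows."""
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # only look at transcript-level records
        attrs = parse_attributes(cols[8])
        if attrs.get("transcript_type") in biotypes:
            yield attrs.get("transcript_id"), attrs.get("gene_name"), attrs["transcript_type"]
```

To use it on a real annotation, pass an open file handle, for example `with open(path_to_gtf) as fh: hits = list(lncrna_transcripts(fh))`, where `path_to_gtf` points to your own downloaded GTF.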

4. LncRNAdb

LncRNAdb (http://lncrnadb.com/) is a database of functional LncRNAs that are connected in one way or another with eukaryotic biological function, including expression patterns, subcellular localization, etc. [6]. 122 human LncRNAs in this database are represented on the Arraystar Human LncRNA Microarray V4.0.

5. NRED

The Noncoding RNA Expression Database (NRED) (http://jsm-research.imb.uq.edu.au/nred/cgi-bin/ncrnadb.pl) includes human and mouse LncRNAs with experimental expression and ancillary information [5]. 645 human LncRNAs from NRED are covered by the Arraystar Human LncRNA Microarray V4.0.

6. RNAdb 2.0

The RNAdb (http://research.imb.uq.edu.au/rnadb/) database is archived by the Mattick group at the Institute for Molecular Bioscience (IMB) [4]. This database contains the legacy sequences and annotations for thousands of non-coding RNAs. 1,318 human LncRNAs from RNAdb are represented on the Arraystar Human LncRNA Microarray V4.0.

7. LincRNAs identified by Khalil et al.

Khalil et al. identified and characterized 3,289 large intergenic noncoding RNAs (lincRNAs) by searching for regions of chromatin methylation (H3K4me3 and H3K36me3) outside of known protein-coding loci [7]. After mapping these chromatin state data to transcriptome databases, eliminating all annotated non-lincRNA transcripts (e.g., annotated protein-coding genes, rRNAs and tRNAs), and evaluating coding potential, 2,193 of the lincRNAs described by Khalil et al. were included on the Arraystar Human LncRNA Microarray V4.0.

8. LincRNAs identified by Cabili et al.

Cabili et al. defined a reference catalog of more than 8,000 human lincRNA genes using their RNA sequencing results and public database information [8]. 14,353 transcripts expressed from 4,662 stringently-defined human lincRNA genes were identified. 6,969 out of these lincRNA transcripts are covered by the Arraystar Human LncRNA Microarray V4.0.

9. LincRNAs identified by Iyer et al. & Clark et al.

Clark et al. used CaptureSeq to greatly improve RNA-seq coverage and support the identification of 16,453 lncRNA transcripts in 78 tissue samples. Iyer et al. integrated 7,256 RNA-seq libraries from 25 independent studies, including TCGA, ENCODE and others, to derive 58,648 LncRNAs [17]. 20,142 of these LncRNAs are covered by the Arraystar Human LncRNA Microarray V4.0.

10. Ultraconserved regions encoding LncRNAs (T-UCRs)

Ultraconserved regions (UCRs) are intra- and intergenic sequences greater than 200 nt in length that are 100% identical among humans, mice, and rats. Bejerano et al. identified 481 human UCRs [9]. A large fraction of UCRs are transcribed into a subset of LncRNAs, known as T-UCRs, that are aberrantly expressed in several human cancers. All T-UCRs are represented on the Arraystar Human LncRNA Microarray V4.0. To help discover potential non-coding transcripts from these regions, we also designed 962 probes to target both strands of these UCRs (http://users.soe.ucsc.edu/~jill/ultra.html).
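The probe count above follows from targeting each of the 481 UCRs on both strands (481 × 2 = 962). A minimal sketch of that expansion is shown below. The `Region` and `ProbeTarget` types and the placeholder coordinates are hypothetical and only illustrate the bookkeeping, not Arraystar's probe design.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


class ProbeTarget(NamedTuple):
    region: str
    chrom: str
    start: int
    end: int
    strand: str


def both_strand_targets(regions: List[Region]) -> List[ProbeTarget]:
    """Emit one target per strand ('+' and '-') for every region."""
    return [
        ProbeTarget(r.name, r.chrom, r.start, r.end, strand)
        for r in regions
        for strand in ("+", "-")
    ]


# 481 placeholder regions (names and coordinates are made up for illustration)
ucrs = [Region(f"ucr_{i}", "chr1", i * 1000, i * 1000 + 250) for i in range(1, 482)]
targets = both_strand_targets(ucrs)
print(len(ucrs), len(targets))
```

Designing strand-specific targets for every region is what lets the array pick up transcription from either direction when the transcribed strand is not known in advance.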

11. HOX loci LncRNAs (HOX LncRNAs)

HOX cluster genes are fundamental regulators of pattern and axis formation during animal development. Rinn et al. identified 407 transcribed regions within the four HOX loci in humans (101 HOX gene exons, 75 introns and 231 intergenic ncRNA transcripts) [10]. All of these distinct transcribed regions are targeted by probes on the Arraystar Human LncRNA Microarray V4.0. Furthermore, 68 potential LncRNAs, whose transcript units (TUs) overlap HOX cluster genes on the same or antisense genomic strand, are covered by the Arraystar Human LncRNA Microarray V4.0.

12. LncRNAs with Enhancer-like Function (LncRNA-a)

Using the human GENCODE annotation, Orom et al. identified 3,019 human LncRNAs with enhancer-like function expressed from 2,286 unique genes [11]. LncRNAs with enhancer-like function are included on the Arraystar Human LncRNA Microarray V4.0.
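The per-source counts in sections 1 to 9 alone add up to 53,169, more than the 40,173 lncRNAs on the array, which suggests that many transcripts are reported by more than one source. The document does not say how such overlaps were resolved. The sketch below shows one simple way to merge catalogs for your own data mining, assuming that two records are the same transcript when chromosome, strand and exon structure match exactly. The record layout and source names are hypothetical.

```python
from collections import defaultdict


def structure_key(tx):
    """Key a transcript by chromosome, strand and its sorted exon intervals."""
    return (tx["chrom"], tx["strand"], tuple(sorted(tx["exons"])))


def merge_catalogs(catalogs):
    """Merge transcripts from several sources, collapsing identical structures.

    catalogs: dict mapping source name -> list of transcript dicts.
    Returns a dict mapping structure key -> sorted list of sources reporting it.
    """
    merged = defaultdict(set)
    for source, transcripts in catalogs.items():
        for tx in transcripts:
            merged[structure_key(tx)].add(source)
    return {key: sorted(sources) for key, sources in merged.items()}


# Toy input: the same two-exon transcript appears in two sources
toy_catalogs = {
    "RefSeq": [{"chrom": "chr1", "strand": "+", "exons": [(100, 200), (300, 400)]}],
    "GENCODE": [
        {"chrom": "chr1", "strand": "+", "exons": [(300, 400), (100, 200)]},
        {"chrom": "chr2", "strand": "-", "exons": [(50, 500)]},
    ],
}
merged = merge_catalogs(toy_catalogs)
for key, sources in merged.items():
    print(key, sources)
```

Exact structure matching is deliberately strict. Real catalogs often differ by a few bases at transcript ends, so a production merge would likely need a more tolerant comparison.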

References:

  1. Pruitt, K.D., et al., RefSeq: an update on mammalian reference sequences. Nucleic Acids Res, 2014. 42(Database issue): p. D756-63.
  2. Hsu, F., et al., The UCSC Known Genes. Bioinformatics, 2006. 22(9): p. 1036-46.
  3. Harrow, J., et al., GENCODE: producing a reference annotation for ENCODE. Genome Biol, 2006. 7 Suppl 1: p. S4 1-9.
  4. Pang, K.C., et al., RNAdb 2.0--an expanded database of mammalian non-coding RNAs. Nucleic Acids Res, 2007. 35(Database issue): p. D178-82.
  5. Dinger, M.E., et al., NRED: a database of long noncoding RNA expression. Nucleic Acids Res, 2009. 37(Database issue): p. D122-6.
  6. Quek, X.C., et al., lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs. Nucleic Acids Res, 2015. 43(Database issue): p. D168-73.
  7. Khalil, A.M., et al., Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc Natl Acad Sci U S A, 2009. 106(28): p. 11667-72.
  8. Cabili, M.N., et al., Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev, 2011. 25(18): p. 1915-27.
  9. Bejerano, G., et al., Ultraconserved elements in the human genome. Science, 2004. 304(5675): p. 1321-5.
  10. Rinn, J.L., et al., Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell, 2007. 129(7): p. 1311-23.
  11. Orom, U.A., et al., Long noncoding RNAs with enhancer-like function in human cells. Cell, 2010. 143(1): p. 46-58.
  12. Pang, K.C., et al., RNAdb--a comprehensive mammalian noncoding RNA database. Nucleic Acids Res, 2005. 33(Database issue): p. D125-30.
  13. Mercer, T.R., et al., Specific expression of long noncoding RNAs in the mouse brain. Proc Natl Acad Sci U S A, 2008. 105(2): p. 716-21.
  14. Guttman, M., et al., Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature, 2009. 458(7235): p. 223-7.
  15. Benson, D.A., et al., GenBank: update. Nucleic Acids Res, 2004. 32(Database issue): p. D23-6.
  16. Clark, et al., Quantitative gene profiling of long noncoding RNAs with targeted RNA sequencing. Nat Methods, 2015. 12(4): p. 339-42.
  17. Iyer, et al., The landscape of long noncoding RNAs in the human transcriptome. Nat Genet, 2015. 47(3): p. 199-208.
Originally published on 2018-11-17 on the 生信技能树 WeChat public account.
