Microbiome Discovery: An Introductory Metagenomics Course

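I stumbled on Dan Knights's Microbiome Discovery introductory metagenomics course on YouTube. After skimming through it, I found that it moves from the basics to more advanced material and covers both theory and practice very well. I really wish I had found it sooner QAQ. This course alone should be entirely enough to get started with metagenomics~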

Course playlist: https://www.youtube.com/playlist?list=PLOPiWVjg6aTzsA53N19YqJQeZpSCH9QPc

RMarkdown sample data and practice code: https://github.com/danknights/mice8992-2016

Video contents

1. Intro to the Microbiome

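• Introduction to the microbiome
• How microbiome research is done
• Some of the challenges (microbiome data are relatively unstable; biomarker discovery)

Link: https://youtu.be/6564K4-_DBI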

2. How microbiome data are generated

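• How the data are generated
• Pros and cons of the two sequencing approaches
• Metagenomic (shotgun) sequencing
• Amplicon sequencing

Link: https://youtu.be/FWT1HBzlWOE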

3. 16S Variable Regions

• Why the 16S region is chosen; the structure and function of 16S rRNA
• Where OTUs come from

Link: https://youtu.be/8Aa_mnyXm70

4. QIIME

• Overview of the QIIME analysis workflow

Link: https://youtu.be/iy0JWgzmM_A

4.5. (Optional) UNIX Command Line

• Introduction to UNIX commands and using Git

Link: https://youtu.be/u2IQQUMeWy8

5. Picking OTUs

• OTU clustering methods
• closed reference
• de novo
• UCLUST
• CD-HIT
• SUMACLUST
• mothur
• SWARM
• open reference

Link: https://youtu.be/Ok5h24KZbAE
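The difference between closed-reference and de novo picking can be sketched in a few lines of Python. This is only a toy illustration, not how UCLUST or any of the tools above actually work: it assumes reads are already aligned and of equal length, so identity is a simple per-position match rate, and the function names and the 0.97 default threshold are assumptions made for this example.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("this toy identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_de_novo(reads, threshold=0.97):
    """De novo picking: a read joins the first centroid it matches, else it seeds a new OTU."""
    centroids, otus = [], []
    for read in reads:
        for i, centroid in enumerate(centroids):
            if identity(read, centroid) >= threshold:
                otus[i].append(read)
                break
        else:
            # No existing centroid is similar enough, so this read starts a new OTU.
            centroids.append(read)
            otus.append([read])
    return otus


def closed_reference(reads, reference, threshold=0.97):
    """Closed-reference picking: keep only reads that hit a reference sequence."""
    assigned = {name: [] for name in reference}
    unassigned = []
    for read in reads:
        best_name, best_id = None, 0.0
        for name, seq in reference.items():
            score = identity(read, seq)
            if score > best_id:
                best_name, best_id = name, score
        if best_name is not None and best_id >= threshold:
            assigned[best_name].append(read)
        else:
            unassigned.append(read)
    return assigned, unassigned


# Hypothetical 10-bp reads; a 0.9 threshold tolerates one mismatch.
reads = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "CCCCCCCCCG", "GGGGGGGGGG"]
reference = {"refA": "AAAAAAAAAA", "refC": "CCCCCCCCCC"}
de_novo_otus = greedy_de_novo(reads, threshold=0.9)
hits, misses = closed_reference(reads, reference, threshold=0.9)
```

Under these assumptions, open-reference picking amounts to running `closed_reference` first and then passing the unassigned reads to `greedy_de_novo`.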

6. Assigning Taxonomy

• How to assign taxonomy to the community
• The Random Forests classifier seems to work better
• Nearest neighbor using optimal gapped alignment with large reference databases will probably win eventually

Link: https://youtu.be/HkwFdzFLZ0I

7. Alpha Diversity

• Alpha diversity measures diversity within communities
• Beta diversity measures diversity between communities
• Rarefaction determines saturation
• There is room for experimental validation
• Different methods of calculating alpha diversity
• Species count
• Phylogenetic diversity (PD)
• Chao1 estimator

Link: https://youtu.be/9ZvoR89HYP8
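Two of the measures listed above, species count and Chao1, along with rarefaction, are simple enough to sketch in plain Python. This is a minimal sketch rather than the QIIME implementation: the function names are made up for this example, and the Chao1 formula used is the classic form S_obs + F1^2 / (2 * F2), with the bias-corrected fallback S_obs + F1 * (F1 - 1) / 2 when there are no doubletons.

```python
import random
from collections import Counter


def observed_species(counts):
    """Species count: number of taxa seen at least once in the sample."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Chao1 richness estimate from singleton (F1) and doubleton (F2) counts."""
    s_obs = observed_species(counts)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # No doubletons: use the bias-corrected form to avoid dividing by zero.
    return s_obs + f1 * (f1 - 1) / 2


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement and return the new count vector."""
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    picked = Counter(random.Random(seed).sample(pool, depth))
    return [picked.get(taxon, 0) for taxon in range(len(counts))]


# Hypothetical OTU counts for one sample (the last taxon was not observed).
sample = [10, 5, 2, 2, 1, 1, 1, 0]
richness = observed_species(sample)   # 7 observed taxa
estimate = chao1(sample)              # 7 + 3*3 / (2*2) = 9.25
subsample = rarefy(sample, depth=10)
```

Repeating `rarefy` at increasing depths and plotting `observed_species` against depth gives a rarefaction curve; a curve that flattens suggests the sample is close to saturation.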

8. Beta Diversity

• Beta diversity measures diversity between communities
• Different ways of calculating beta diversity
• Euclidean distance
• Chi-square distance; chi-square is usually best for gradients
• Bray-Curtis
• Most people use Bray-Curtis or UniFrac
• Visualization with PCoA

Link: https://youtu.be/lcbp6EecDg4
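The three non-phylogenetic distances named above can be written out directly. The sketch below is plain Python with hypothetical function names. The chi-square distance follows the usual correspondence-analysis definition (differences between row profiles, weighted by the inverse of each column's share of the whole table), which is an assumption about which variant the lecture means.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: 0 for identical samples, 1 for no shared taxa."""
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(x, y)) / total


def chi_square(table, i, j):
    """Chi-square distance between samples (rows) i and j of a samples-by-taxa table."""
    grand_total = sum(sum(row) for row in table)
    n_taxa = len(table[0])
    col_mass = [sum(row[k] for row in table) / grand_total for k in range(n_taxa)]
    row_i, row_j = table[i], table[j]
    if sum(row_i) == 0 or sum(row_j) == 0:
        raise ValueError("chi-square distance is undefined for empty samples")
    profile_i = [v / sum(row_i) for v in row_i]
    profile_j = [v / sum(row_j) for v in row_j]
    return math.sqrt(
        sum((a - b) ** 2 / m for a, b, m in zip(profile_i, profile_j, col_mass) if m > 0)
    )


# Hypothetical table: three samples (rows) by three taxa (columns).
table = [
    [10, 0, 5],
    [0, 10, 5],
    [5, 5, 5],
]
bc_matrix = [[bray_curtis(a, b) for b in table] for a in table]
```

A full pairwise matrix such as `bc_matrix` is the input that PCoA works from.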

9. UniFrac

• Beta diversity using UniFrac

Link: https://youtu.be/M8ylvsS0MHg
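Unweighted UniFrac is the fraction of branch length in the tree that leads to only one of the two samples. A minimal sketch follows, assuming a rooted tree stored as a hypothetical `child -> (parent, branch_length)` dictionary and samples given as sets of tip names. Real implementations work from Newick trees and OTU tables, and this is not their code.

```python
def branches_covered(tree, tips):
    """Return the set of branches (named by their child node) on the paths from tips to the root."""
    covered = set()
    for tip in tips:
        node = tip
        # The root has no entry in `tree`, so the walk stops there.
        while node in tree and node not in covered:
            covered.add(node)
            node = tree[node][0]
    return covered


def unweighted_unifrac(tree, sample_a, sample_b):
    """Branch length unique to one sample divided by branch length covered by either sample."""
    cov_a = branches_covered(tree, sample_a)
    cov_b = branches_covered(tree, sample_b)
    total = sum(tree[node][1] for node in cov_a | cov_b)
    if total == 0:
        return 0.0
    unique = sum(tree[node][1] for node in cov_a ^ cov_b)
    return unique / total


# Hypothetical tree: root -> n1 -> (t1, t2) and root -> n2 -> (t3, t4), all branches of length 1.
tree = {
    "n1": ("root", 1.0),
    "n2": ("root", 1.0),
    "t1": ("n1", 1.0),
    "t2": ("n1", 1.0),
    "t3": ("n2", 1.0),
    "t4": ("n2", 1.0),
}
separate_clades = unweighted_unifrac(tree, {"t1", "t2"}, {"t3", "t4"})
mixed_clades = unweighted_unifrac(tree, {"t1", "t3"}, {"t2", "t4"})
```

Two samples drawn from entirely different clades share no branches and score 1.0, while samples that each contain one tip from every clade share the internal branches and score lower (4 of 6 branch units are unique here). Count-based measures such as Bray-Curtis cannot make this distinction because they ignore the tree.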

10. Statistical testing part 1

• Statistics basics
• Linear models are not always appropriate
• Non-parametric tests (no distribution assumptions)
• Generalized linear models (better underlying distributions)

Link: https://youtu.be/_uDv7LRUUsY

11. Statistical testing part 2

• Statistics basics
• t-test: compare 2 groups
• ANOVA: compare three or more groups
• Correlation: compare to a continuous variable (e.g. age)
• Linear regression: similar to correlation, but you can regress on multiple variables at the same time
• NOTE: all of these assume normal distributions!
• When linear regression tests do not have normally distributed residuals, use a generalized linear model with the negative binomial distribution. This is in the edgeR package in R.
• Use false discovery rate (FDR) to correct for multiple hypothesis testing.
• If you don't need to control for confounders, non-parametric tests are very safe (although lower power than linear models or generalized linear models).
• Two-category test: Mann-Whitney U (Wilcoxon) test (like a t-test)
• Multi-category test: Kruskal-Wallis (like ANOVA)
• Continuous test: Spearman correlation (like Pearson correlation)

Link: https://youtu.be/tNxfYqa5Rtc
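As a rough illustration of two items from the list above, the sketch below computes a Mann-Whitney U statistic and FDR-adjusted p-values. The list does not say which FDR procedure is used; Benjamini-Hochberg is assumed here because it is the most common choice. The function names are hypothetical, and in practice you would use tested implementations (for example `wilcox.test` and `p.adjust` in R).

```python
def mann_whitney_u(x, y):
    """U statistic for x: number of (x_i, y_j) pairs with x_i > y_j, ties counting one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is capped by the ones above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from testing four taxa between two groups.
raw_p = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(raw_p)
significant = [q <= 0.05 for q in q_values]
```

With one test per taxon, hundreds of p-values are typical, so it is the q-values that should be compared against the significance threshold, not the raw p-values.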

12. Visualizing Microbiome Diversity, Ordination

• Visualization with R or QIIME
• PCA
• PCoA
• NMDS

Link: https://youtu.be/H-u2iyiTzj0

13. Detrending and detecting gradients

• Detrending with QIIME
• Detrending does not have strong statistical foundations
• Use detrending for visualizing a primary gradient
• Use detrending to test whether your ordination recovered the primary gradient in axis 1

Link: https://youtu.be/aNLPzdfivkM

14. Constrained Ordination

• CCA does direct gradient analysis
• Never use more than 3-4 variates
• More will simply overfit the data
• Measure success by the ratio of constrained variance explained to unconstrained variance explained
• Canonical correspondence analysis == constrained correspondence analysis
• Not to be confused with canonical correlation analysis

Link: https://youtu.be/wHSECEI2tnQ

15. Clustering

• Use caution with supervised ordination - need to assess significance carefully
• Prediction strength >0.9 or Silhouette index >0.5
• Clusters can be useful ways to analyze high-dimensional data
• However, direct analysis is generally better when you have known gradients/groups
• Diagnostics based on direct supervised analysis are generally better

Link: https://youtu.be/ORX968xJqiA
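The silhouette index mentioned above can be computed from any pairwise distance matrix and a set of cluster labels. A minimal sketch, with a hypothetical function name and singleton clusters scored as 0 by convention:

```python
def silhouette_index(dist, labels):
    """Mean silhouette width: (b - a) / max(a, b) averaged over all samples."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the sample's own cluster.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 1-D samples: two tight groups far apart.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
good = silhouette_index(dist, [0, 0, 1, 1])
bad = silhouette_index(dist, [0, 1, 0, 1])
```

Here the natural grouping scores well above the 0.5 rule of thumb from the list, while the shuffled labels score below zero. The `dist` matrix could just as well be a Bray-Curtis or UniFrac matrix.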

16. Supervised Learning Background

• Supervised learning tries to learn a model that will predict outcomes for novel samples
• Example: classify cancer patients to determine treatment path
• Models have to balance low complexity (underfitting) and high complexity (overfitting)
• Model accuracy should be assessed in separate test data that it has never seen
• 10-fold cross validation is standard

Link: https://youtu.be/-eXnrA_3xzA
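The idea of 10-fold cross-validation, where every sample is predicted by a model that never saw it during training, can be sketched as follows. The nearest-centroid classifier is only a stand-in chosen to keep the example short and dependency-free; it is not the classifier used in the course, and all names are hypothetical.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Toy model: the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for d, value in enumerate(row):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(model, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(row, model[label])))


def cross_validate(X, y, k=10, seed=0):
    """Fraction of samples predicted correctly when each fold is held out in turn."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Hypothetical data: two well-separated classes with 20 samples each.
X = [[i * 0.1, 0.0] for i in range(20)] + [[5 + i * 0.1, 5.0] for i in range(20)]
y = ["healthy"] * 20 + ["disease"] * 20
accuracy = cross_validate(X, y, k=10)
```

The point of the design is that `fit_centroids` only ever sees the training indices, so `accuracy` estimates performance on novel samples instead of rewarding an overfit model.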

17. Supervised Learning Applications

• Random forest classification with QIIME

Link: https://youtu.be/ecz5SzP6Z_U

18. Source Tracking

• Introduction to how source tracking works and how to use SourceTracker
• Microbial source tracking can be done at the community-wide level
• SourceTracker uses Bayesian methods to deconvolute mixtures of communities
• Can identify contributions of individual species from each source environment
• Does not model changes after mixing (temporal dynamics)
• SourceTracker: github.com/danknights/sourcetracker/releases

Link: https://youtu.be/sDevHMuYJ28

19. Compositionality

• Compositionality can cause spurious and even opposite conclusions
• Dominant bugs can skew the relative abundance of minor bugs
• Correlation is hard to infer
• See SparCC, SPIEC-EASI
• Best to do analysis with absolute abundances when possible
• Spike-ins of foreign bugs and/or qPCR can circumvent this

Link: https://youtu.be/X60nFYpLWRs
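A small numeric example makes the compositionality problem concrete. In the sketch below all numbers and names are made up: only the dominant taxon grows in absolute terms, yet the minor taxa appear to shrink in relative abundance, and a spike-in of known size lets us rescale read counts back to absolute abundances.

```python
def relative_abundance(counts):
    """Convert a taxon -> count mapping into fractions that sum to 1."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()}


def spike_in_scale(read_counts, spike_name, spike_cells_added):
    """Rescale read counts to absolute units using a spike-in of known size."""
    factor = spike_cells_added / read_counts[spike_name]
    return {t: c * factor for t, c in read_counts.items() if t != spike_name}


# Hypothetical absolute abundances: only the dominant taxon changes.
before = {"dominant": 1000, "minor_a": 100, "minor_b": 100}
after = {"dominant": 4000, "minor_a": 100, "minor_b": 100}

rel_before = relative_abundance(before)  # minor_a = 100 / 1200
rel_after = relative_abundance(after)    # minor_a = 100 / 4200

# Hypothetical sequencing run: 50 spike-in cells were added and half of everything was sequenced.
reads = {"dominant": 2000, "minor_a": 50, "minor_b": 50, "spike": 25}
recovered = spike_in_scale(reads, "spike", spike_cells_added=50)
```

Going by relative abundance alone, `minor_a` looks as if it dropped by more than a factor of three even though its absolute abundance never changed. The spike-in rescaling recovers the true counts because the spike experiences the same sampling fraction as every other taxon.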

20. PICRUSt and predicting functions

• Shotgun metagenomics can describe the full functional repertoire of a metagenome, but it is expensive
• PICRUSt can produce 80-85% accurate metagenomes from 16S data sets
• Useful for mining published data
• Can be used to select a subset of 16S samples for shotgun sequencing
• Be sure to treat the results as "suggestive only" in publications
• Mostly useful on human gut samples

Link: https://youtu.be/mPQCl_cHCsM

21. Shotgun Taxonomy

• Shotgun metagenomics can be used for identifying species
• Far superior to 16S
• Approaches to shotgun taxonomy
• MetaPhlAn and MetaPhlAn2
• Pre-identify a set of marker genes
• Genes that are conserved within a species but not elsewhere
• Requires alignment, but uses a small database
• Kraken, others
• Use all unique k-mers as markers
• Ultrafast, but large database

Link: https://youtu.be/DlQTXdb2rhg
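The "use all unique k-mers as markers" idea can be sketched with a dictionary. This is a toy illustration with hypothetical names and made-up sequences, not Kraken's actual algorithm. It keeps only k-mers that occur in exactly one genome and classifies a read by majority vote over the marker k-mers it contains.

```python
from collections import Counter


def kmers(seq, k):
    """Set of all substrings of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_unique_kmer_index(genomes, k):
    """Map each k-mer found in exactly one genome to that genome's name."""
    owner, shared = {}, set()
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            if kmer in shared:
                continue
            if kmer in owner and owner[kmer] != name:
                # Seen in a second genome: no longer a species-specific marker.
                del owner[kmer]
                shared.add(kmer)
            else:
                owner[kmer] = name
    return owner


def classify_read(read, index, k):
    """Return the genome with the most marker k-mer hits, or None if there are no hits."""
    votes = Counter(index[kmer] for kmer in kmers(read, k) if kmer in index)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical mini-genomes sharing the prefix ACGTACG.
genomes = {"sp_A": "ACGTACGTTTGACCA", "sp_B": "ACGTACGAAAGGCTT"}
index = build_unique_kmer_index(genomes, k=5)
call_a = classify_read("GTTTGACC", index, k=5)
call_b = classify_read("GAAAGGC", index, k=5)
call_shared = classify_read("ACGTACG", index, k=5)
```

Classification is only dictionary lookups, which is why this style of method is fast, and the index has to hold every marker k-mer from every genome, which is why the database gets large. A read made entirely of shared sequence, like the last one, gets no call at all.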


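If you have read this far, congratulations on finding the hidden bonus~ I have mirrored the complete series for everyone.

Link: https://pan.baidu.com/s/194r0zs5WbcNFQKQrV0Nnkg Password: 0rjr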


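生信技能树 has now made three bioinformatics knowledge bases public, so remember to follow them~

Weekly literature sharing

https://www.yuque.com/biotrainee/weeklypaper

A guide to tumor exome analysis

https://www.yuque.com/biotrainee/wes

Biostatistics from theory to practice

https://www.yuque.com/biotrainee/biostat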

This article was shared from the WeChat public account 生信菜鸟团 (bio_123456789). Author: 鲍志炜

Originally published: 2020-04-12