LDM及permanovaFL的使用

Listenlii-生物信息知识分享

发布于 2021-07-30 15:10:07

6650

发布于 2021-07-30 15:10:07

文章被收录于专栏：Listenlii的生物信息笔记

背景：

Bioinformatics：线性分解模型LDM检验微生物差异

Microbiome: 组内PERMANOVA和LDM提高了微生物组数据分析的效率

本文对ldm和permanovaFL函数做简要介绍。

# https://github.com/yijuanhu/LDM
# 从github上下载后本地安装

library(LDM)

# LDM ---------------------------------------------------------------------
ldm(
  formula, 
  data = .GlobalEnv, # 协变量,也即环境因子或者分组
  tree = NULL, # 用于计算基于树的距离矩阵
  dist.method = "bray", # 计算距离矩阵
  dist = NULL, # 可直接输入距离矩阵
  cluster.id = NULL,
  strata = NULL, # 分组
  how = NULL, # permute的一个参数
  perm.within.type = "free", # 随机置换方法, "free", "none", "series", or "grid".
  perm.between.type = "none",# 同上
  perm.within.ncol = 0, # 一个正整数，仅在perm.within.type="grid"时使用。
  perm.within.nrow = 0, # 同上
  n.perm.max = NULL, # 置换次数。最大5000，NULL会自动设置。
  n.rej.stop = 100, # 置换统计量超过观察到的统计量的次数
  seed = NULL, # 随机种子
  test.global = TRUE, # 是否进行global test.
  test.otu = TRUE,# 是否进行OTU-specific test.
  fdr.nominal = 0.1, # FDR value
  square.dist = TRUE, # 距离矩阵平方
  center.dist = TRUE, # 距离矩阵Gower中心化
  scale.otu.table = TRUE, # 对OTU的行标准化，计算频率。
  center.otu.table = TRUE,# 对OTU的行中心化，如果距离矩阵做了这个OTU也要做。
  freq.scale.only = FALSE, # 是否只分析频率数据
  binary = FALSE, # 数据是否做0-1转化
  n.rarefy = 0 # 稀释次数
)

# formular格式：
otu.table ~ (first set of covariates) + (second set of covariates) ... + (last set of covariates)

# or

otu.table | confounders ~ (first set of covariates) + (second set of covariates) ... + (last set of covariates)

# 其中OTU表行为样本，列为OTU

# 如给定OTU表y和包含4个协变量a、b、c、d:
Y ~ a + b + c + d ### 没有混杂因子, 4个子模型(即协变量)
Y ~ (a+b) + (c+d) ###没有混杂因子，2个子模型每个有2个协变量
Y | b ~ (a+c) + d ### b是混杂因子模型，子模型1是(a+c)，子模型2是d
Y | b+c ~ a*d     ###有2个混杂因子b和c;有一个子模型a、d和a:d(交互)组成。这个例子等价于
y | b+c ~ (a+d+a:d)  

y | as.factor(b) ~ (a+d) + a:d  ###混杂因子b将被视为一个因子变量，子模型1将具有主效应a和d，子模型2将只有a和d之间的交互作用
y | as.factor(b) ~ (a) + (d) + (a:d)  ###有3个子模型a、d和a:d。

# 结果实在是太多了，共有53项。这里放一部分：
x：正交矩阵

dist：距离矩阵

mean.freq：OTU平均相对丰度（列均值）

y.freq：转为频率的OTU表  

beta：每个特征对应每个OTU的效应量

VE.global.freq.confounders：混杂因子解释的变异

VE.global.freq.submodels：子模型解释的变异  

VE.otu.freq.confounders：每个OTU通过混杂因子解释的变异  

VE.otu.freq.submodel：每个OTU通过子模型解释的变异  

VE.global.tran.confounders：基于arcsin-root-transformed频率数据的混杂因子效应  

VE.global.tran.submodels：基于arcsin-root-transformed频率数据的子模型效应  

VE.otu.tran.confounders：基于arcsin-root-transformed频率数据的每个OTU通过混杂因子解释的变异

VE.otu.tran.submodels：基于arcsin-root-transformed频率数据的每个OTU通过子模型解释的变异

F.global.freq：每个子模型的F统计量

F.global.tran：基于arcsin-root-transformed频率数据每个子模型的F统计量

F.otu.freq：每个子模型每个OTU的F统计量  

F.otu.tran：基于arcsin-root-transformed频率数据每个子模型每个OTU的F统计量  

p.global.freq：每个协变量global test 的p值

p.global.tran：基于arcsin-root-transformed频率数据每个协变量global test 的p值

p.global.omni：基于omnibus statistics每个协变量global test 的p值

p.otu.freq：OTU-specific tests的p值

p.otu.tran：基于arcsin-root-transformed频率数据  OTU-specific tests的p值

p.otu.omni：基于omnibus statisticsOTU-specific tests的p值

q.otu.freq：OTU-specific tests的q-values (FDR-adjusted p-values)

q.otu.tran：基于arcsin-root-transformed频率数据  OTU-specific tests的q-values (FDR-adjusted p-values)

q.otu.omni：基于omnibus statisticsOTU-specific tests的q-values (FDR-adjusted p-values)

p.global.pa  ：基于0-1数据每个协变量global test的p值

p.otu.pa  ：基于0-1数据每个协变量OTU-specific test的p值

q.otu.pa：基于0-1数据OTU-specific test的q-values

# 例子
#-----------------------------------------------
data(throat.meta) # 环境因子
data(throat.otu.tab) # OTU 表
#-----------------------------------------------
# Testing presence-absence associations: LDM-A (recommended)
#-----------------------------------------------
res.ldmA <- ldm(formula=throat.otu.tab | (Sex+AntibioticUse) ~ SmokingStatus+PackYears+Age, 
                data=throat.meta, seed=2021, n.perm.max=5,
                n.rarefy="all") # parameter for requesting LDM-A

# permanovaFL -------------------------------------------------------------

# 参数说明同ldm
permanovaFL(
  formula,
  data = .GlobalEnv,
  tree = NULL,
  dist.method = "bray",
  cluster.id = NULL,
  strata = NULL,
  how = NULL,
  perm.within.type = "free",
  perm.between.type = "none",
  perm.within.ncol = 0,
  perm.within.nrow = 0,
  n.perm.max = 5000,
  n.rej.stop = 100,
  seed = NULL,
  square.dist = TRUE,
  center.dist = TRUE,
  scale.otu.table = TRUE,
  binary = FALSE,
  n.rarefy = 0
)

#结果
F.statistics：每个协变量的F值

p.permanova：每个协变量的P值

n.perm.completed：完成的置换次数

permanova.stopped：停止规则

seed：随机种子  

# 例子
#----------------------------------------------------------------
# Testing presence-absence associations: PERMANOVA-F(5)
#----------------------------------------------------------------
res.perm.F <- permanovaFL(throat.otu.tab | (Sex+AntibioticUse) ~ SmokingStatus+PackYears, 
                          data=throat.meta, seed=2021, 
                          dist.method="jaccard", binary=TRUE,  # parameters 
                          n.rarefy=5)   # for requesting PERMANOVA-F(5)

# rarefy 5次，结果为5列。共两个协变量，结果为2行
$F.statistics
[,1]     [,2]     [,3]     [,4]     [,5]
[1,] 1.6377339 3.249570 4.856997 6.482301 8.185227
[2,] 0.7769379 1.564397 2.366726 3.098582 3.987804

$p.permanova
[,1]       [,2]       [,3]       [,4]       [,5]
[1,] 0.00339932 0.00359928 0.00419916 0.00319936 0.00279944
[2,] 0.94940000 0.95100000 0.94460000 0.96040000 0.93940000

本文参与腾讯云自媒体同步曝光计划，分享自微信公众号。

原始发表：2021-07-15，如有侵权请联系 cloudcommunity@tencent.com 删除

数据分析