Adonis结果P值小于0.05，一定代表两组样品物种构成差异显著吗？

生信宝典

发布于 2022-01-18 21:17:52

2K0

发布于 2022-01-18 21:17:52

文章被收录于专栏：生信宝典

前情回顾

方差分析基本概念：方差分析中的“元”和“因素”是什么？

PERMANOVA原理解释：这个统计检验可用于判断PCA/PCoA等的分群效果是否显著！

实战1：画一个带统计检验的PCoA分析结果

配对检验：画一个带统计检验的PcOA分析结果（再进一步，配对比较）

你的adonis用对了吗？不同因素的顺序竟然对结果有很大影响

为PERMANOVA/Adonis分析保驾护航，检验数据离散度

非参数检验也不是什么都不需要关注，比如上面提到的因素顺序和方差加和方式是一个需要注意的点。除此之外，非参数多元方差分析应用时还有下面这些注意事项：

PERMANOVA检验没有考虑变量之间的共线性关系，因此也不能够用于探索这种关系。
嵌套或分层设计 (Nested or hierarchical designs)时需要考虑合适的置换策略。需要明确哪些样品是真正可以交换的 (exchangeable)。
PERMANOVA有个假设是balanced designs (不同分组的样本数相等), 非平衡设计也能处理。
如果不同组的样品在检测指标构成的空间的中心点没有差别，但每个组内检测指标离散度较大，也会导致获得显著性的P值。在解释结果时，需要同时评估数据离散度的影响。

vegdist评估数据离散度，再解释adonis的结果

前面我们用下面的代码检验了Managment对物种组成差异影响的显著程度，获得P-value=0.002 < 0.05，表示管理方式对物种组成有显著影响。但这一影响是否受到每个分组里面数据离散程度的影响呢？

library(vegan)
data(dune)
data(dune.env)
# 基于bray-curtis距离进行计算
set.seed(1)
dune.div <- adonis2(dune ~ Management, data = dune.env, permutations = 999, method="bray")

dune.div

## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = dune ~ Management, data = dune.env, permutations = 999, method = "bray")
##            Df SumOfSqs      R2      F Pr(>F)   
## Management  3   1.4686 0.34161 2.7672  0.002 **
## Residual   16   2.8304 0.65839                 
## Total      19   4.2990 1.00000                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

我们还需要利用betadisper评估下每组样本物种组成的多元一致性 (Multivariate homogeneity of groups dispersions (variances))。如下代码计算出P=0.168表示不同分组样品检测指标的离散度(方差)没有显著差异。那么，adonis检测出的差异就是因为每组数据在空间的中心点不同造成的，进一步说明Management对物种组成有显著影响。

# 计算加权bray-curtis距离
dune.dist <- vegdist(dune, method="bray", binary=F)

# One measure of multivariate dispersion (variance) for a group of samples 
# is to calculate the average distance of group members to the group centroid 
# or spatial median in multivariate space. 
# To test if the dispersions (variances) of one or more groups are different, 
# the distances of group members to the group centroid are subject to ANOVA. 
# This is a multivariate analogue of Levene's test for homogeneity of variances 
# if the distances between group members and 
# group centroids is the Euclidean distance.
dispersion <- betadisper(dune.dist, group=dune.env$Management)
permutest(dispersion)

## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
## 
## Response: Distances
##           Df  Sum Sq  Mean Sq      F N.Perm Pr(>F)
## Groups     3 0.13831 0.046104 1.9506    999  0.159
## Residuals 16 0.37816 0.023635

从下面的图上也可以看出，4种管理方式下样品在空间的中心点相距较远。（也可以参考前面如何美化这个图）

plot(dispersion, hull=FALSE, ellipse=TRUE) ##sd ellipse

Q: When running adonis (vegan package) I got an r2 = 0.45, andp = 0.001. When I ran the betadisper and ran a subsequent permutation test I got an F = 1 and p = 0.3. A: A non-significant result in betadisper is not necessarily related to a significant/non-significant result in adonis. The two tests are testing different hypothesis. The former testshomogeneity of dispersion among groups (regions in your case), which is a condition (assumption) for adonis. The latter tests no difference in ‘location’, that is, tests whether composition among groups is similar or not. You may have the centroids of two groups in NMS at a very similar position in the ordination space, but if theirdispersions are quite different, adonis will give you a significant p-value, thus, the result is heavily influenced not by thedifference in composition between groups but bydifferences in composition within groups (heterogeneous dispersion, and thus a measure of beta diversity). In short, your results are fine, you are meeting the ‘one assumption’ for adonis (homogeneous dispersion) and thus you are certain that results from adonis are ‘real’ and not an artifact of heterogeneous dispersions. For more information you can read Anderson (2006) Biometrics 62(1):245-253 and Anderson (2006) Ecology Letters 9(6):683-693. Hope this helps! https://stats.stackexchange.com/questions/212137/betadisper-and-adonis-in-r-am-i-interpreting-my-output-correctly

数据离散度不同而中心点一致，adonis也可能显著

下面我们看一个模拟的例子，模拟出3套群体的物种丰度表，群体1、群体2、群体3的物种空间的中心点一致，而物种丰度的离散度依次变小，PERMANOVA检验显著，betadisper结果也显著，这时解释数据时就要小心。这个导致显著的原因是什么。

set.seed(1)
num <- 30
# 控制每个物种的均值
mean <- seq(10,120,by=10)
# 控制离散度
disp <- c(5,50,200)

# 模拟3组样品的数据；直接是转置后的物种丰度表
sites.a <- as.data.frame(mapply(rnbinom, n=num, size=disp[1], mu=mean))
rownames(sites.a) <- paste('site.a', 1:num, sep=".")
colnames(sites.a) <- paste('Species',letters[1:length(mean)], sep=".")

sites.b <- as.data.frame(mapply(rnbinom, n=num, size=disp[1:2], mu=mean))
rownames(sites.b) <- paste('site.b', 1:num, sep=".")
colnames(sites.b) <- paste('Species',letters[1:length(mean)], sep=".")

sites.c <- as.data.frame(mapply(rnbinom, n=num, size=disp, mu=mean))
rownames(sites.c) <- paste('site.c', 1:num, sep=".")
colnames(sites.c) <- paste('Species',letters[1:length(mean)], sep=".")

otu_table_t <- rbind(sites.a,sites.b,sites.c)
otu_table_t[sample(1:90,5),]

##           Species.a Species.b Species.c Species.d Species.e Species.f Species.g Species.h Species.i Species.j
## site.c.22        13        15        43        29        49        72        24       102        75        96
## site.a.26         8        23        46        29        25        15        91        49        58        54
## site.a.13        14        30        47        56        18        77       111       128        90        53
## site.a.14         5        15        17        56        37        75        81        59        63        58
## site.b.21        15        24         8        33        28        42       108        74        76        64
##           Species.k Species.l
## site.c.22       139       142
## site.a.26        87       129
## site.a.13        33        47
## site.a.14       164       183
## site.b.21        52       103

生成Metadata数据，包含样品的分组信息。目的就是检验不同组的物种构成是否有显著差异。

metadata <- data.frame(Sample=rownames(otu_table_t), Group=rep(c("A","B","C"), each=num))
rownames(metadata) <- metadata$Sample
metadata[sample(1:90,5),,drop=F]

##              Sample Group
## site.a.28 site.a.28     A
## site.b.12 site.b.12     B
## site.a.20 site.a.20     A
## site.b.10 site.b.10     B
## site.a.10 site.a.10     A

PCoA和NMDS分析可视化不同组样品物种组成的差异度

统计分析前，先直观的看一下不同组样本在物种定义的空间上的分布。

为什么要画个图：参考 - 什么是安斯库姆四重奏？为什么统计分析之前必须要作图？

# 计算加权bray-curtis距离
otu_dist <- vegdist(otu_table_t, method="bray", binary=F)

otu_pcoa <- cmdscale(otu_dist, k=3, eig=T)

otu_pcoa_points <- as.data.frame(otu_pcoa$points)
sum_eig <- sum(otu_pcoa$eig)
eig_percent <- round(otu_pcoa$eig/sum_eig*100,1)

colnames(otu_pcoa_points) <- paste0("PCoA", 1:3)

otu_pcoa_result <- cbind(otu_pcoa_points, metadata)

从PCoA的结果上来看，A,B,C三个组在第一、第二、第三主坐标轴没有明显的区分开。

library(ggplot2)
library(patchwork)

ggplot(otu_pcoa_result, aes(x=PCoA1, y=PCoA2, color=Group)) +
  labs(x=paste("PCoA 1 (", eig_percent[1], "%)", sep=""),
       y=paste("PCoA 2 (", eig_percent[2], "%)", sep="")) +
  geom_point(size=4) + stat_ellipse(level=0.9) +
  theme_classic() + coord_fixed() +
  ggplot(otu_pcoa_result, aes(x=PCoA1, y=PCoA3, color=Group)) +
  labs(x=paste("PCoA 1 (", eig_percent[1], "%)", sep=""),
       y=paste("PCoA 3 (", eig_percent[3], "%)", sep="")) +
  geom_point(size=4) + stat_ellipse(level=0.9) +
  theme_classic() + coord_fixed()

从NMDS结果看，A,B,C三组也区分不开。

otu_mds <- metaMDS(otu_table_t, k=5)  #using all the defaults

## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1131245 
## Run 1 stress 0.1131233 
## ... New best solution
## ... Procrustes: rmse 0.0003155417  max resid 0.001341899 
## ... Similar to previous best
## Run 2 stress 0.1131243 
## ... Procrustes: rmse 0.0009154324  max resid 0.00352237 
## ... Similar to previous best
## Run 3 stress 0.1131238 
## ... Procrustes: rmse 0.0002307456  max resid 0.001378836 
## ... Similar to previous best
## Run 4 stress 0.1131239 
## ... Procrustes: rmse 0.0002008885  max resid 0.0008441584 
## ... Similar to previous best
## Run 5 stress 0.1131233 
## ... Procrustes: rmse 0.0004594988  max resid 0.00248363 
## ... Similar to previous best
## Run 6 stress 0.1136538 
## Run 7 stress 0.1131231 
## ... New best solution
## ... Procrustes: rmse 6.187922e-05  max resid 0.0002788433 
## ... Similar to previous best
## Run 8 stress 0.1131234 
## ... Procrustes: rmse 0.000457399  max resid 0.002017475 
## ... Similar to previous best
## Run 9 stress 0.1131243 
## ... Procrustes: rmse 0.0003620819  max resid 0.001329571 
## ... Similar to previous best
## Run 10 stress 0.1131235 
## ... Procrustes: rmse 0.0001788438  max resid 0.0008840311 
## ... Similar to previous best
## Run 11 stress 0.1131248 
## ... Procrustes: rmse 0.0004674201  max resid 0.001960981 
## ... Similar to previous best
## Run 12 stress 0.1131231 
## ... New best solution
## ... Procrustes: rmse 0.0003807188  max resid 0.001578129 
## ... Similar to previous best
## Run 13 stress 0.1131238 
## ... Procrustes: rmse 0.0004016239  max resid 0.002178598 
## ... Similar to previous best
## Run 14 stress 0.113123 
## ... New best solution
## ... Procrustes: rmse 0.0001931854  max resid 0.0007886561 
## ... Similar to previous best
## Run 15 stress 0.1176584 
## Run 16 stress 0.1131244 
## ... Procrustes: rmse 0.000621146  max resid 0.002339344 
## ... Similar to previous best
## Run 17 stress 0.1131237 
## ... Procrustes: rmse 0.0004553297  max resid 0.0019548 
## ... Similar to previous best
## Run 18 stress 0.1131236 
## ... Procrustes: rmse 0.000454603  max resid 0.001894929 
## ... Similar to previous best
## Run 19 stress 0.1131241 
## ... Procrustes: rmse 0.0005855289  max resid 0.002455173 
## ... Similar to previous best
## Run 20 stress 0.113124 
## ... Procrustes: rmse 0.0005247607  max resid 0.001899271 
## ... Similar to previous best
## *** Solution reached

otu_mds_scores <- as.data.frame(scores(otu_mds))  

otu_mds_scores <- cbind(otu_mds_scores, metadata)

library(ggplot2)
ggplot(data=otu_mds_scores, aes(x=NMDS1,y=NMDS2,colour=Group)) + 
  geom_point(size=4) + 
  stat_ellipse(level = 0.9) +
  theme_classic()

进行Adonis检验和数据离散度评估

adonis结果显示Pr(>F)<0.05，统计显著；不同组之间的物种组成存在显著差异。这与PCoA和NMDS的结果还是有些不一致的。那这个统计差异是怎么来的呢？

library(vegan)
adon.results<-adonis(otu_dist ~ Group, data=metadata, perm=999)
print(adon.results)

## 
## Call:
## adonis(formula = otu_dist ~ Group, data = metadata, permutations = 999) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##           Df SumsOfSqs  MeanSqs F.Model      R2 Pr(>F)    
## Group      2   0.10752 0.053760  2.4707 0.05375  0.001 ***
## Residuals 87   1.89300 0.021759         0.94625           
## Total     89   2.00052                  1.00000           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

betadisper检验Pr(>F)<0.05表明不同组的数据在空间分布的离散度显著不同。这是导致adonis结果显著的主要原因。不同分组之间物种的构成的显著不同不是体现在物种空间中心点的变化，而是物种空间离散度的变化。

mod <- betadisper(otu_dist, metadata$Group)
permutest(mod)

## 
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 999
## 
## Response: Distances
##           Df   Sum Sq  Mean Sq      F N.Perm Pr(>F)    
## Groups     2 0.157498 0.078749 80.188    999  0.001 ***
## Residuals 87 0.085439 0.000982                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

用一组可视化来展示这个差异的成因

把每组样本抽提出来，分别绘制下PCoA的样品分布，可以看出，每组样品在PCoA定义的空间上中心点是很相近的，而样品分散程度不同。也就是说分组内样品的多样性反应到了不同分组的物种构成差异上了，这个“显著”的差异是不是我们关注的，需要自己来判断了。

# extract the centroids and the site points in multivariate space.  
centroids<-data.frame(grps=rownames(mod$centroids),data.frame(mod$centroids))
vectors<-data.frame(group=mod$group,data.frame(mod$vectors))

# to create the lines from the centroids to each point we will put it in a format that ggplot can handle
seg.data<-cbind(vectors[,1:3],centroids[rep(1:nrow(centroids),as.data.frame(table(vectors$group))$Freq),2:3])
names(seg.data)<-c("group","v.PCoA1","v.PCoA2","PCoA1","PCoA2")

# create the convex hulls of the outermost points
grp1.hull<-seg.data[seg.data$group=="A",1:3][chull(seg.data[seg.data$group=="A",2:3]),]
grp2.hull<-seg.data[seg.data$group=="B",1:3][chull(seg.data[seg.data$group=="B",2:3]),]
grp3.hull<-seg.data[seg.data$group=="C",1:3][chull(seg.data[seg.data$group=="C",2:3]),]
all.hull<-rbind(grp1.hull,grp2.hull,grp3.hull)

library(gridExtra)

panel.a<-ggplot() +
  geom_polygon(data=all.hull[all.hull=="A",],aes(x=v.PCoA1,y=v.PCoA2),colour="black",alpha=0,linetype="dashed") +
  geom_segment(data=seg.data[1:30,],aes(x=v.PCoA1,xend=PCoA1,y=v.PCoA2,yend=PCoA2),alpha=0.30) + 
  geom_point(data=centroids[1,1:3], aes(x=PCoA1,y=PCoA2),size=4,colour="red",shape=16) + 
  geom_point(data=seg.data[1:30,], aes(x=v.PCoA1,y=v.PCoA2),size=2,shape=16) +
  labs(title="A",x="",y="") +
  coord_cartesian(xlim = c(-0.2,0.2), ylim = c(-0.25,0.2)) +
  theme_classic() + 
  theme(legend.position="none")

panel.b<-ggplot() + 
  geom_polygon(data=all.hull[all.hull=="B",],aes(x=v.PCoA1,y=v.PCoA2),colour="black",alpha=0,linetype="dashed") +
  geom_segment(data=seg.data[31:60,],aes(x=v.PCoA1,xend=PCoA1,y=v.PCoA2,yend=PCoA2),alpha=0.30) + 
  geom_point(data=centroids[2,1:3], aes(x=PCoA1,y=PCoA2),size=4,colour="red",shape=17) + 
  geom_point(data=seg.data[31:60,], aes(x=v.PCoA1,y=v.PCoA2),size=2,shape=17) +
  labs(title="B",x="",y="") +
  coord_cartesian(xlim = c(-0.2,0.2), ylim = c(-0.25,0.2)) +
  theme_classic() + 
  theme(legend.position="none")

panel.c<-ggplot() + 
  geom_polygon(data=all.hull[all.hull=="C",],aes(x=v.PCoA1,y=v.PCoA2),colour="black",alpha=0,linetype="dashed") +
  geom_segment(data=seg.data[61:90,],aes(x=v.PCoA1,xend=PCoA1,y=v.PCoA2,yend=PCoA2),alpha=0.30) +
  geom_point(data=centroids[3,1:3], aes(x=PCoA1,y=PCoA2),size=4,colour="red",shape=15) + 
  geom_point(data=seg.data[61:90,], aes(x=v.PCoA1,y=v.PCoA2),size=2,shape=15) + 
  labs(title="C",x="",y="") +
  coord_cartesian(xlim = c(-0.2,0.2), ylim = c(-0.25,0.2)) +
  theme_classic() + 
  theme(legend.position="none")

panel.d<-ggplot() + 
  geom_polygon(data=all.hull,aes(x=v.PCoA1,y=v.PCoA2),colour="black",alpha=0,linetype="dashed") +
  geom_segment(data=seg.data,aes(x=v.PCoA1,xend=PCoA1,y=v.PCoA2,yend=PCoA2),alpha=0.30) + 
  geom_point(data=centroids[,1:3], aes(x=PCoA1,y=PCoA2,shape=grps),size=4,colour="red") + 
  geom_point(data=seg.data, aes(x=v.PCoA1,y=v.PCoA2,shape=group),size=2) + 
  labs(title="All",x="",y="") +
  coord_cartesian(xlim = c(-0.2,0.2), ylim = c(-0.25,0.2)) +
  theme_classic() + 
  theme(legend.position="none")

grid.arrange(panel.a,panel.b,panel.c,panel.d,nrow=1)

`PERMANOVA`的作者对这个问题的看法

Marti Anderson: “[…] Although there is also no explicit assumption regarding the homogeneity of spread within each group, PERMANOVA, like ANOSIM (Clarke 1993), will be sensitive to differences in spread (variability) among groups. Thus, if a significant difference between groups is detected using PERMANOVA, then this could be due to differences in location, differences in spread, or a combinationof the two. Perhaps the best approach is to perform a separate test for homogeneity (e.g., using the program PERMDISP) including pair-wise comparisons, as well as examining the average within and between-group distances and associated MDS plots. This will help to determine the nature of the difference between any pair of groups, whether it be due to location, spread, or a combination of the two. […]”

参考

https://www.scribbr.com/frequently-asked-questions/one-way-vs-two-way-anova/
MANOVA的前提假设 https://www.real-statistics.com/multivariate-statistics/multivariate-analysis-of-variance-manova/manova-assumptions/ https://www.statology.org/manova-assumptions/
https://statistics.laerd.com/statistical-guides/one-way-anova-statistical-guide.php
https://www.yunbios.net/h-nd-570.html
https://mp.weixin.qq.com/s/v_k4Yhe9rBWM9y9A3P3wQw
https://mp.weixin.qq.com/s?__biz=MzUzMjA4Njc1MA==&mid=2247484678&idx=1&sn=f95418a311e639704e9848545efc7fd7&scene=21#wechat_redirect
https://chrischizinski.github.io/rstats/vegan-ggplot2/
https://chrischizinski.github.io/rstats/adonis/
https://chrischizinski.github.io/rstats/ordisurf/
https://www.rdocumentation.org/packages/vegan/versions/1.11-0/topics/adonis
https://www.jianshu.com/p/dfa689f7cafd
https://stats.stackexchange.com/questions/312302/adonis-in-vegan-order-of-variables-non-nested-with-one-degree-of-freedom-for
https://stats.stackexchange.com/questions/188519/adonis-in-vegan-order-of-variables-or-use-of-strata?noredirect=1
https://github.com/vegandevs/vegan/issues/229
https://stats.stackexchange.com/questions/476256/adonis-vs-adonis2
清晰解释Type I, Type II, Type III https://mcfromnz.wordpress.com/2011/03/02/anova-type-iiiiii-ss-explained/
清晰解释Type I, Type II, Type III https://stats.stackexchange.com/questions/60362/choice-between-type-i-type-ii-or-type-iii-anova
https://thebiobucket.blogspot.com/2011/08/two-way-permanova-adonis-with-custom.html#more
adonis的前提条件 https://thebiobucket.blogspot.com/2011/04/assumptions-for-permanova-with-adonis.html#more
作者的论文 https://static1.squarespace.com/static/580e3c475016e191c523a0e2/t/5813ba8b5016e1a5b61f454a/1477687949842/Anderson_et_al-2013-ANOSIM+vs.+PERMANOVA.pdf
离散度 adonis https://chrischizinski.github.io/rstats/adonis/