In this paper:
Logares, R., Deutschmann, I.M., Junger, P.C. et al. Disentangling the mechanisms shaping the surface ocean microbiota. Microbiome 8, 55 (2020). https://doi.org/10.1186/s40168-020-00827-8
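the authors used the maximal information coefficient (MIC) to measure the association between environmental factors and OTUs.
Additional file 13: Figure S7. For each environmental factor, the proportion of all OTUs, by number and by abundance, that are significantly associated with it (MIC > 0.4).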
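MIC is a bivariate measure of association designed for rapid exploration of high-dimensional datasets. It belongs to the maximal information-based nonparametric exploration (MINE) family of statistics, which can be used to identify and characterize important relationships in a dataset.
The method was first published in Science (Reshef et al., 2011). I have not read through the algorithm itself, and I probably would not understand it if I did.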
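Introduction to MIC: http://www.exploredata.net/
The R package minerva computes the various MINE statistics. The main function is mine. It is simple to use, so if you want to try it, the help page (?mine) is enough.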
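As a quick illustration of why MIC is useful for exploration, here is a minimal sketch comparing it with the Pearson correlation on simulated data. The data, the seed, and the helper name compare are made up for this example and have nothing to do with the paper. It assumes only that minerva is installed and that mine(x, y) returns a list with a MIC element.

```r
library(minerva)

set.seed(1)
x = seq(-1, 1, length.out = 200)
y_linear = 2 * x + rnorm(200, sd = 0.1)  # noisy linear relationship
y_parab  = x^2                           # noiseless, symmetric parabola
y_noise  = rnorm(200)                    # unrelated to x

# Hypothetical helper: Pearson r and MIC for one pair of vectors
compare = function(x, y) {
  c(pearson = cor(x, y), MIC = mine(x, y)$MIC)
}

res = rbind(linear   = compare(x, y_linear),
            parabola = compare(x, y_parab),
            noise    = compare(x, y_noise))
round(res, 2)
```

The parabola is the interesting row: its Pearson correlation is essentially zero because the relationship is symmetric around x = 0, while its MIC should stay high. That kind of non-linear dependence is what a MIC > 0.4 filter can pick up and a correlation filter would miss.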
Modeled on the left half of Figure S7, here is some simple code:
library(minerva) # ?minerva
library(ggplot2)
library(reshape2)
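# OTU table: rows are OTUs, columns are samples
# check.names=FALSE keeps sample names exactly as written in the file
otu = read.table(file="otu.txt",sep="\t",header=T,row.names=1,check.names=FALSE)
# Environmental factors: rows are samples, columns are factors
env = read.table("env.factor.txt",sep="\t",header=T,row.names=1)
# Keep only the samples present in both tables, in the same order
common = intersect(colnames(otu),rownames(env))
otu = otu[,common,drop=FALSE]
env = env[common,,drop=FALSE]
# Drop OTUs with no reads in the retained samples
otu = otu[rowSums(otu)>0,]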
# MIC matrix: one row per OTU, one column per environmental factor
mic = matrix(NA_real_,nrow=nrow(otu),ncol=ncol(env),
             dimnames=list(rownames(otu),colnames(env)))
for (i in seq_len(nrow(otu))){
  x = as.numeric(otu[i,])
  for (j in seq_len(ncol(env))){
    mic[i,j] = mine(x,env[,j])$MIC
  }
}
head(mic)
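# Use 0.4 as the cutoff: MIC > 0.4 is treated as a significant association
# Percentage of OTUs above the cutoff for each factor, stored as a 1-row matrix.
# colMeans() on the logical matrix also handles factors with no (or only) hits,
# where indexing table(...)[2] would return NA.
per = matrix(colMeans(mic>0.4)*100,nrow=1,
             dimnames=list("percent",colnames(mic)))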
# Reshape the 1-row matrix to long format: one row per environmental factor.
# For a matrix, melt() takes varnames rather than id.vars/measure.vars.
per.gg = melt(per,varnames=c("stat","env"),value.name="Percentage")
per.gg
p = ggplot(per.gg)+
  geom_col(aes(x=env,y=Percentage))+
  coord_flip()+
  labs(title="OTUs (MIC>0.4)",x="Factors",y="Percentage")+
  theme_bw()
p
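The code above only covers the share of OTUs by number. The Figure S7 caption also mentions the share by abundance, so here is a minimal sketch of one way to compute it. It assumes that "abundance share" means the reads belonging to OTUs with MIC > 0.4 divided by all reads. I have not checked this against the paper's own definition. The toy matrices otu_toy and mic_toy are invented so the block runs on its own.

```r
# Hypothetical toy data (not from the paper): read counts, 4 OTUs x 3 samples
otu_toy = matrix(c(50, 30, 20,
                   10,  0,  5,
                    5, 10, 10,
                   35, 60, 65),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("OTU", 1:4), paste0("S", 1:3)))

# Hypothetical MIC values for the same 4 OTUs against 2 made-up factors
mic_toy = matrix(c(0.8, 0.2,
                   0.5, 0.1,
                   0.3, 0.6,
                   0.1, 0.3),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(rownames(otu_toy), c("Temp", "Salinity")))

cutoff = 0.4
hit = mic_toy > cutoff          # logical matrix: OTU x factor
otu_total = rowSums(otu_toy)    # total reads per OTU

# Share of OTUs (by number) above the cutoff, per factor
pct_otus = colMeans(hit) * 100
# Share of reads belonging to OTUs above the cutoff, per factor
pct_abund = colSums(hit * otu_total) / sum(otu_total) * 100

rbind(pct_otus, pct_abund)
```

With real data, otu_toy and mic_toy would be replaced by the otu table and the mic matrix from the main script, provided their rows are in the same OTU order.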