Goal: investigate whether fertilization promotes plant growth (plant growth is measured by tree height). The experiment is:
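ANOVA requires the data to satisfy certain assumptions, including normality and homogeneity of variance. Normally, once you have the data, you would test whether it is normally distributed and whether the variances are equal across groups. How to run those two tests is skipped for today; assuming the data meet the assumptions, we go straight to a one-way ANOVA.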
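Although the two checks are skipped in this post, a minimal sketch of them might look like the following. It assumes SciPy is available and uses `scipy.stats.shapiro` (Shapiro-Wilk normality test) and `scipy.stats.levene` (Levene's test for equal variances); the `groups` dictionary and the seed are hypothetical stand-ins for real data.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in data: five groups of 100 values each.
rng = np.random.default_rng(0)  # seed chosen arbitrarily for illustration
groups = {
    "ctl": rng.normal(10, 5, 100),
    "treat1": rng.normal(15, 5, 100),
    "treat2": rng.normal(20, 5, 100),
    "treat3": rng.normal(30, 5, 100),
    "treat4": rng.normal(31, 5, 100),
}

# Shapiro-Wilk per group. H0: the sample comes from a normal distribution.
normality_p = {name: stats.shapiro(values).pvalue for name, values in groups.items()}

# Levene's test across all groups. H0: the groups have equal variances.
levene_stat, levene_p = stats.levene(*groups.values())

for name, p in normality_p.items():
    print(f"{name}: Shapiro-Wilk p = {p:.3f}")
print(f"Levene p = {levene_p:.3f}")
```

A p-value above the chosen significance level (commonly 0.05) means the corresponding assumption is not rejected.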
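Use the numpy module to simulate 5 groups of 100 normally distributed values each. The arguments of np.random.normal are, in order, the mean, the standard deviation, and the number of values.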
import numpy as np
import pandas as pd
# No random seed is set, so the simulated values differ on every run
data = {'ctl': list(np.random.normal(10, 5, 100)),
        'treat1': list(np.random.normal(15, 5, 100)),
        'treat2': list(np.random.normal(20, 5, 100)),
        'treat3': list(np.random.normal(30, 5, 100)),
        'treat4': list(np.random.normal(31, 5, 100))}
# Combine the groups into a DataFrame, one column per group
df = pd.DataFrame(data)
df.head()
        ctl     treat1     treat2     treat3     treat4
0  9.614605  15.719777  17.068697  23.842793  32.206690
1  7.617131  20.481499  14.880172  29.685766  29.372065
2  5.078861  13.683188  20.780142  25.123814  29.500179
3  4.749667  13.209488  15.390307  37.757911  27.912748
4  5.167490  20.374576  18.669367  33.772163  34.394511
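Because the simulation is unseeded, rerunning it will not reproduce the numbers shown above. A minimal sketch of a reproducible variant, assuming NumPy's `default_rng` generator; the names `means` and `sim` and the seed 42 are hypothetical choices for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed so every run gives the same values

# Group means taken from the simulation above; the standard deviation is 5 for all groups.
means = {"ctl": 10, "treat1": 15, "treat2": 20, "treat3": 30, "treat4": 31}

# loc = mean, scale = standard deviation, size = number of values
sim = pd.DataFrame(
    {name: rng.normal(loc=mu, scale=5, size=100) for name, mu in means.items()}
)

# Sample means and standard deviations should land near the requested parameters.
print(sim.describe().loc[["mean", "std"]])
```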
import matplotlib.pyplot as plt
df.boxplot(grid=False)
plt.show()
[Figure 1.png: box plot of the five groups]
Reshape the data into long format: one column for the treatment and one column for the value.
df_melt = df.melt()
df_melt.head()
  variable     value
0      ctl  9.614605
1      ctl  7.617131
2      ctl  5.078861
3      ctl  4.749667
4      ctl  5.167490
df_melt.columns = ['Treat','Value']
df_melt.head()
  Treat     Value
0   ctl  9.614605
1   ctl  7.617131
2   ctl  5.078861
3   ctl  4.749667
4   ctl  5.167490
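The reshape and the renaming can also be done in one step, since `DataFrame.melt` accepts `var_name` and `value_name`. A small sketch on toy values (the frame `wide` and its numbers are made up for illustration):

```python
import pandas as pd

# Toy wide-format data: one column per treatment (values are illustrative only).
wide = pd.DataFrame({"ctl": [9.6, 7.6], "treat1": [15.7, 20.5]})

# Melt to long format and name the two resulting columns directly.
long_df = wide.melt(var_name="Treat", value_name="Value")
print(long_df)
```

Each original column name becomes a label in `Treat`, and the cell values are stacked into `Value`.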
import seaborn as sns
sns.boxplot(x='Treat', y='Value', data=df_melt)
plt.show()
[Figure 2.jpg: box plot of Value by Treat]
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
model = ols('Value~C(Treat)',data=df_melt).fit()
anova_table = anova_lm(model, typ = 2)
print(anova_table)
                sum_sq     df           F         PR(>F)
C(Treat)  34622.433013    4.0  351.230458  4.926641e-143
Residual  12198.617718  495.0         NaN            NaN
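For the ANOVA result we look at the P value. In this example P = 4.926641e-143, which is less than 0.05, so there are significant differences among the treatments. Which specific treatments differ from each other has to be determined with a multiple comparison test.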
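To see where the F and P values come from, the sketch below recomputes them from the sums of squares and degrees of freedom printed in the ANOVA table. It assumes SciPy is installed and uses the survival function of the F distribution, `scipy.stats.f.sf`; the variable names are illustrative.

```python
from scipy import stats

# Values copied from the ANOVA table above.
ss_treat, df_treat = 34622.433013, 4    # between-treatment sum of squares and df
ss_resid, df_resid = 12198.617718, 495  # residual sum of squares and df

# Mean squares are the sums of squares divided by their degrees of freedom.
ms_treat = ss_treat / df_treat
ms_resid = ss_resid / df_resid

# The F statistic is the ratio of the two mean squares.
f_stat = ms_treat / ms_resid

# The P value is the upper-tail probability of F(df_treat, df_resid) at f_stat.
p_value = stats.f.sf(f_stat, df_treat, df_resid)

print(f"F = {f_stat:.6f}, p = {p_value:.6e}")
```

The recomputed F agrees with the 351.230458 reported by `anova_lm`.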
A commonly used multiple comparison method is Tukey's HSD (honestly significant difference) test.
from statsmodels.stats.multicomp import MultiComparison
mc = MultiComparison(df_melt['Value'],df_melt['Treat'])
tukey_result = mc.tukeyhsd(alpha = 0.5)  # alpha=0.5 gives the FWER=0.50 output shown here; the conventional level is alpha=0.05
print(tukey_result)
Multiple Comparison of Means - Tukey HSD,FWER=0.50
=============================================
group1 group2 meandiff lower   upper   reject
---------------------------------------------
ctl    treat1 4.4997   3.379   5.6204  True
ctl    treat2 9.498    8.3773  10.6186 True
ctl    treat3 19.831   18.7103 20.9517 True
ctl    treat4 21.1355  20.0148 22.2562 True
treat1 treat2 4.9983   3.8776  6.1189  True
treat1 treat3 15.3313  14.2106 16.452  True
treat1 treat4 16.6358  15.5151 17.7565 True
treat2 treat3 10.333   9.2124  11.4537 True
treat2 treat4 11.6375  10.5168 12.7582 True
treat3 treat4 1.3045   0.1838  2.4252  True
---------------------------------------------
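The multiple comparison results show a significant difference between every pair of groups (True in the reject column means the two treatments differ). Note that these intervals were computed at a family-wise error rate of 0.50. At the conventional 0.05 level the intervals would be wider, so the small treat3 versus treat4 difference (1.3045) may no longer be significant.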
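As a sketch of how the comparison could be rerun at the conventional 0.05 level, the example below uses `pairwise_tukeyhsd` from statsmodels on freshly simulated data. The seed and the names `means` and `long_df` are hypothetical, and because the data are regenerated the numbers will not match the table above.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only
means = {"ctl": 10, "treat1": 15, "treat2": 20, "treat3": 30, "treat4": 31}

# Build long-format data directly: 100 values per treatment, standard deviation 5.
long_df = pd.DataFrame({
    "Treat": np.repeat(list(means), 100),
    "Value": np.concatenate([rng.normal(mu, 5, 100) for mu in means.values()]),
})

# Tukey HSD with the family-wise error rate controlled at 0.05.
result = pairwise_tukeyhsd(endog=long_df["Value"], groups=long_df["Treat"], alpha=0.05)
print(result.summary())

# result.reject holds one boolean per pair, in the same order as the summary rows.
print(result.reject)
```

With true means of 30 and 31 and a standard deviation of 5, the treat3 versus treat4 pair is the one most likely to come out as not rejected.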