
How the t-Test Works and How to Implement It in Python

The t-test is perhaps one of the most widely used statistical hypothesis tests.

Student's t-Test

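Student's t-test is a statistical hypothesis test that checks whether two samples come from the same population, as would be expected under the null hypothesis.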

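There are two main versions of the t-test:

• Independent samples: the two samples are unrelated.
• Dependent (paired) samples: the two samples are related, such as repeated measurements on the same subjects.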

In Python, the independent and dependent t-tests are available through SciPy's ttest_ind() and ttest_rel() functions, respectively.

t = observed difference between sample means / standard error of the difference between the means

t = (mean(X1) - mean(X2)) / sed

sed = sqrt(se1^2 + se2^2)

se = std / sqrt(n)

# calculate means
mean1, mean2 = mean(data1), mean(data2)

# calculate sample standard deviations
std1, std2 = std(data1, ddof=1), std(data2, ddof=1)

# calculate standard errors
n1, n2 = len(data1), len(data2)
se1, se2 = std1/sqrt(n1), std2/sqrt(n2)

# alternatively, calculate the standard errors with SciPy's sem()
se1, se2 = sem(data1), sem(data2)

# standard error on the difference between the samples
sed = sqrt(se1**2.0 + se2**2.0)

# calculate the t statistic
t_stat = (mean1 - mean2) / sed

# degrees of freedom
df = n1 + n2 - 2

# calculate the critical value
# note: 1.0 - alpha gives the one-sided critical value;
# a two-sided test at the same alpha would use 1.0 - alpha / 2.0
alpha = 0.05
cv = t.ppf(1.0 - alpha, df)
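The critical value depends on whether the test is one-sided or two-sided. The short sketch below compares the two for the setting used in this article (alpha = 0.05, two samples of 100 observations, so 198 degrees of freedom). The variable names cv_one_sided and cv_two_sided are illustrative only.

```python
from scipy.stats import t

alpha = 0.05
df = 198  # n1 + n2 - 2 for two samples of 100 observations each

# one-sided critical value: all of alpha sits in a single tail
cv_one_sided = t.ppf(1.0 - alpha, df)

# two-sided critical value: alpha is split evenly between both tails
cv_two_sided = t.ppf(1.0 - alpha / 2.0, df)

print('one-sided cv=%.3f, two-sided cv=%.3f' % (cv_one_sided, cv_two_sided))
```

Because the p-value in this article is doubled to cover both tails, comparing abs(t_stat) against the two-sided critical value is the choice that agrees with it.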

The p-value can be calculated using the cumulative distribution function of the t-distribution, which is also available in SciPy.

# calculate the p-value
p = (1 - t.cdf(abs(t_stat), df)) * 2
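As a small aside, the same two-sided p-value can be obtained from the survival function t.sf(), which equals 1 - cdf and tends to be more accurate for very large statistics. The sketch below is illustrative: the t statistic is the rounded value from this article's example output, and the degrees of freedom assume two samples of 100 observations.

```python
from scipy.stats import t

t_stat = -2.262  # rounded statistic from the article's example output
df = 198         # n1 + n2 - 2 for two samples of 100 observations each

# two-sided p-value via the cumulative distribution function
p_cdf = (1.0 - t.cdf(abs(t_stat), df)) * 2.0

# the same quantity via the survival function (1 - cdf)
p_sf = t.sf(abs(t_stat), df) * 2.0

print('p (cdf)=%.3f, p (sf)=%.3f' % (p_cdf, p_sf))
```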

# function for calculating the t-test for two independent samples
def independent_ttest(data1, data2, alpha):
    # calculate means
    mean1, mean2 = mean(data1), mean(data2)
    # calculate standard errors
    se1, se2 = sem(data1), sem(data2)
    # standard error on the difference between the samples
    sed = sqrt(se1**2.0 + se2**2.0)
    # calculate the t statistic
    t_stat = (mean1 - mean2) / sed
    # degrees of freedom
    df = len(data1) + len(data2) - 2
    # calculate the critical value (one-sided; use 1.0 - alpha / 2.0 for a two-sided test)
    cv = t.ppf(1.0 - alpha, df)
    # calculate the two-sided p-value
    p = (1.0 - t.cdf(abs(t_stat), df)) * 2.0
    # return everything
    return t_stat, df, cv, p

# seed the random number generator
seed(1)
# generate two independent samples
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 51

# Student's t-test for independent samples
from numpy.random import seed
from numpy.random import randn
from scipy.stats import ttest_ind
# seed the random number generator
seed(1)
# generate two independent samples
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 51
# compare samples
stat, p = ttest_ind(data1, data2)
print('t=%.3f, p=%.3f' % (stat, p))

t=-2.262, p=0.025

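# interpret via critical value
if abs(t_stat) <= cv:
    print('Accept null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')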

# interpret via p-value
if p > alpha:
    print('Accept null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')

# t-test for independent samples
from math import sqrt
from numpy.random import seed
from numpy.random import randn
from numpy import mean
from scipy.stats import sem
from scipy.stats import t

# function for calculating the t-test for two independent samples
def independent_ttest(data1, data2, alpha):
    # calculate means
    mean1, mean2 = mean(data1), mean(data2)
    # calculate standard errors
    se1, se2 = sem(data1), sem(data2)
    # standard error on the difference between the samples
    sed = sqrt(se1**2.0 + se2**2.0)
    # calculate the t statistic
    t_stat = (mean1 - mean2) / sed
    # degrees of freedom
    df = len(data1) + len(data2) - 2
    # calculate the critical value (one-sided; use 1.0 - alpha / 2.0 for a two-sided test)
    cv = t.ppf(1.0 - alpha, df)
    # calculate the two-sided p-value
    p = (1.0 - t.cdf(abs(t_stat), df)) * 2.0
    # return everything
    return t_stat, df, cv, p

# seed the random number generator
seed(1)
# generate two independent samples
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 51
# calculate the t test
alpha = 0.05
t_stat, df, cv, p = independent_ttest(data1, data2, alpha)
print('t=%.3f, df=%d, cv=%.3f, p=%.3f' % (t_stat, df, cv, p))
# interpret via critical value
if abs(t_stat) <= cv:
    print('Accept null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')
# interpret via p-value
if p > alpha:
    print('Accept null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')
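One detail worth noting about the manual calculation: sed = sqrt(se1^2 + se2^2) is the unpooled standard error. When both samples have the same size, it coincides with the pooled standard error that ttest_ind() uses by default, which is why the manual result matches SciPy in this article. With unequal sample sizes, the manual statistic instead matches ttest_ind(..., equal_var=False), known as Welch's t-test. The sketch below illustrates this; the sample sizes and spreads are arbitrary values chosen for the illustration, not taken from the article.

```python
from math import sqrt
from numpy import mean
from numpy.random import seed, randn
from scipy.stats import sem, ttest_ind

seed(1)
# two samples with unequal sizes and spreads (illustrative values)
data1 = 5 * randn(100) + 50
data2 = 8 * randn(40) + 51

# manual statistic using the unpooled standard error of the difference
sed = sqrt(sem(data1)**2.0 + sem(data2)**2.0)
t_manual = (mean(data1) - mean(data2)) / sed

# SciPy: default pooled-variance test versus Welch's test
t_pooled, _ = ttest_ind(data1, data2)
t_welch, _ = ttest_ind(data1, data2, equal_var=False)

print('manual=%.3f, pooled=%.3f, welch=%.3f' % (t_manual, t_pooled, t_welch))
```

Note that Welch's test also uses a different degrees-of-freedom formula than n1 + n2 - 2, so only the statistic, not the p-value, carries over directly from the manual function.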

t = (mean(X1) - mean(X2)) / sed

sed = sd / sqrt(n)

d1 = sum (X1[i] - X2[i])^2 for i in n

d2 = sum (X1[i] - X2[i]) for i in n

sd = sqrt((d1 - (d2**2 / n)) / (n - 1))

# calculate means
mean1, mean2 = mean(data1), mean(data2)

# number of paired samples
n = len(data1)

# sum squared difference between observations
d1 = sum([(data1[i]-data2[i])**2 for i in range(n)])
# sum difference between observations
d2 = sum([data1[i]-data2[i] for i in range(n)])

# standard deviation of the difference between means
sd = sqrt((d1 - (d2**2 / n)) / (n - 1))

# standard error of the difference between the means
sed = sd / sqrt(n)

# calculate the t statistic
t_stat = (mean1 - mean2) / sed

# degrees of freedom
df = n - 1
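The quantity sd computed from d1 and d2 is the sample standard deviation of the per-pair differences, written in its computational form. A minimal sketch, assuming the same synthetic data used elsewhere in this article, checks this against NumPy's std() with ddof=1. The names diff, sd_formula, and sd_direct are introduced here for illustration.

```python
from math import sqrt
from numpy import mean, std
from numpy.random import seed, randn

seed(1)
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 51
n = len(data1)

# per-pair differences
diff = data1 - data2

# computational formula: sum of squared differences and sum of differences
d1 = sum(diff**2)
d2 = sum(diff)
sd_formula = sqrt((d1 - (d2**2 / n)) / (n - 1))

# the same quantity as the sample standard deviation of the differences
sd_direct = std(diff, ddof=1)

# t statistic from the mean difference and its standard error
t_stat = mean(diff) / (sd_direct / sqrt(n))

print('sd (formula)=%.3f, sd (direct)=%.3f, t=%.3f' % (sd_formula, sd_direct, t_stat))
```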

# function for calculating the t-test for two dependent samples
def dependent_ttest(data1, data2, alpha):
    # calculate means
    mean1, mean2 = mean(data1), mean(data2)
    # number of paired samples
    n = len(data1)
    # sum squared difference between observations
    d1 = sum([(data1[i]-data2[i])**2 for i in range(n)])
    # sum difference between observations
    d2 = sum([data1[i]-data2[i] for i in range(n)])
    # standard deviation of the difference between means
    sd = sqrt((d1 - (d2**2 / n)) / (n - 1))
    # standard error of the difference between the means
    sed = sd / sqrt(n)
    # calculate the t statistic
    t_stat = (mean1 - mean2) / sed
    # degrees of freedom
    df = n - 1
    # calculate the critical value (one-sided; use 1.0 - alpha / 2.0 for a two-sided test)
    cv = t.ppf(1.0 - alpha, df)
    # calculate the two-sided p-value
    p = (1.0 - t.cdf(abs(t_stat), df)) * 2.0
    # return everything
    return t_stat, df, cv, p

# seed the random number generator
seed(1)
# generate two independent samples
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 51

# Paired Student's t-test
from numpy.random import seed
from numpy.random import randn
from scipy.stats import ttest_rel
# seed the random number generator
seed(1)
# generate two independent samples
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 51
# compare samples
stat, p = ttest_rel(data1, data2)
print('Statistics=%.3f, p=%.3f' % (stat, p))

Statistics=-2.372, p=0.020
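One way to read the paired test is as a one-sample t-test on the per-pair differences against a mean of zero. The sketch below checks that equivalence with scipy.stats.ttest_1samp(), a function not otherwise used in this article, on the same synthetic data.

```python
from numpy.random import seed, randn
from scipy.stats import ttest_rel, ttest_1samp

seed(1)
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 51

# paired t-test on the two samples
stat_rel, p_rel = ttest_rel(data1, data2)

# one-sample t-test on the differences, null hypothesis: mean difference is 0
stat_one, p_one = ttest_1samp(data1 - data2, 0.0)

print('paired:     t=%.3f, p=%.3f' % (stat_rel, p_rel))
print('one-sample: t=%.3f, p=%.3f' % (stat_one, p_one))
```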

# t-test for dependent samples
from math import sqrt
from numpy.random import seed
from numpy.random import randn
from numpy import mean
from scipy.stats import t

# function for calculating the t-test for two dependent samples
def dependent_ttest(data1, data2, alpha):
    # calculate means
    mean1, mean2 = mean(data1), mean(data2)
    # number of paired samples
    n = len(data1)
    # sum squared difference between observations
    d1 = sum([(data1[i]-data2[i])**2 for i in range(n)])
    # sum difference between observations
    d2 = sum([data1[i]-data2[i] for i in range(n)])
    # standard deviation of the difference between means
    sd = sqrt((d1 - (d2**2 / n)) / (n - 1))
    # standard error of the difference between the means
    sed = sd / sqrt(n)
    # calculate the t statistic
    t_stat = (mean1 - mean2) / sed
    # degrees of freedom
    df = n - 1
    # calculate the critical value (one-sided; use 1.0 - alpha / 2.0 for a two-sided test)
    cv = t.ppf(1.0 - alpha, df)
    # calculate the two-sided p-value
    p = (1.0 - t.cdf(abs(t_stat), df)) * 2.0
    # return everything
    return t_stat, df, cv, p

# seed the random number generator
seed(1)
# generate two independent samples (pretend they are dependent)
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 51
# calculate the t test
alpha = 0.05
t_stat, df, cv, p = dependent_ttest(data1, data2, alpha)
print('t=%.3f, df=%d, cv=%.3f, p=%.3f' % (t_stat, df, cv, p))
# interpret via critical value
if abs(t_stat) <= cv:
    print('Accept null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')
# interpret via p-value
if p > alpha:
    print('Accept null hypothesis that the means are equal.')
else:
    print('Reject the null hypothesis that the means are equal.')

t=-2.372, df=99, cv=1.660, p=0.020
Reject the null hypothesis that the means are equal.
Reject the null hypothesis that the means are equal.

Statistics in Plain English, Third Edition, 2010.

API

scipy.stats.ttest_ind API: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

scipy.stats.ttest_rel API: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_rel.html

scipy.stats.sem API: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.sem.html

scipy.stats.t API: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.t.html

Original link: https://kuaibao.qq.com/s/20180811B0HALD00?refer=cp_1026