
Duke@coursera Data Analysis and Statistical Inference, Unit 4: inference for numerical variables

Author: 统计学家 (Statistician)
Published: 2019-04-10 16:48:22

inference for numerical variables

I. hypothesis testing for paired data

hypotheses for paired means:
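A paired test reduces the two measurements to one difference per pair and then tests the mean difference, usually H0: μ_diff = 0 versus HA: μ_diff ≠ 0. The sketch below assumes at least 30 independent pairs, so the normal (CLT-based) approximation applies as with the other large-sample methods in this unit. The function name `paired_z_test` and the after-minus-before ordering are illustrative choices, not taken from the course.

```python
import math
from statistics import NormalDist, fmean, stdev


def paired_z_test(before, after, null_diff=0.0):
    """Two-sided test of H0: mu_diff = null_diff vs HA: mu_diff != null_diff.

    Assumes n_diff >= 30 independent pairs, so the normal approximation is used.
    """
    if len(before) != len(after):
        raise ValueError("paired data needs one 'after' value per 'before' value")
    diffs = [a - b for b, a in zip(before, after)]  # one difference per pair
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)                # SE = s_diff / sqrt(n_diff)
    z = (fmean(diffs) - null_diff) / se             # Z = (point estimate - null) / SE
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided tail area
    return z, p_value
```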

II. confidence intervals for paired data

estimating the difference between paired means:
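The interval has the usual "point estimate ± margin of error" form, x̄_diff ± z* · s_diff / √n_diff. A minimal sketch, again assuming n_diff ≥ 30 so that the normal critical value z* is appropriate; `paired_mean_ci` is a hypothetical name.

```python
import math
from statistics import NormalDist, fmean, stdev


def paired_mean_ci(diffs, confidence=0.95):
    """CI for mu_diff: x_bar_diff +/- z* * s_diff / sqrt(n_diff).

    `diffs` holds one difference per pair; assumes n_diff >= 30.
    """
    n = len(diffs)
    x_bar = fmean(diffs)
    se = stdev(diffs) / math.sqrt(n)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z_star * se
    return x_bar - margin, x_bar + margin
```

Raising `confidence` widens the interval, since z* grows while the standard error stays the same.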

III. comparing independent means

Conditions for inference for comparing two independent means:

1. Independence:

✓ within groups: sampled observations must be independent

‣ random sample/assignment

‣ if sampling without replacement, n < 10% of population

✓ between groups: the two groups must be independent of each other (non-paired)

2. Sample size/skew: Each sample size must be at least 30 (n1 ≥ 30 and n2 ≥ 30), larger if the population distributions are very skewed.

testing for a difference between independent means

‣ null hypothesis: no difference

‣ alternative hypothesis: some difference

‣ same conditions and SE as the confidence interval
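For two independent groups the point estimate is x̄1 − x̄2, and the interval and the test share one standard error, SE = √(s1²/n1 + s2²/n2). A sketch assuming the conditions listed earlier hold (independence within and between groups, n1 ≥ 30 and n2 ≥ 30); the function name and return layout are illustrative.

```python
import math
from statistics import NormalDist, fmean, stdev


def independent_means(x1, x2, confidence=0.95, null_diff=0.0):
    """CI and two-sided Z test for mu1 - mu2 from two independent samples.

    Assumes n1 >= 30 and n2 >= 30, so the normal approximation is used.
    """
    n1, n2 = len(x1), len(x2)
    point_est = fmean(x1) - fmean(x2)
    # One standard error serves both the interval and the test.
    se = math.sqrt(stdev(x1) ** 2 / n1 + stdev(x2) ** 2 / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (point_est - z_star * se, point_est + z_star * se)
    z = (point_est - null_diff) / se                  # null: no difference by default
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return ci, z, p_value
```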

summary

IV. bootstrapping

‣ An alternative approach to constructing confidence intervals is bootstrapping.

‣ This term comes from the phrase “pulling oneself up by one’s bootstraps”, which is a metaphor for accomplishing an impossible task without any outside help.

‣ In this case, the (seemingly) impossible task is estimating a population parameter, and we’ll accomplish it using data from only the given sample.

bootstrapping scheme

(1) take a bootstrap sample: a random sample taken with replacement from the original sample, of the same size as the original sample

(2) calculate the bootstrap statistic: a statistic such as the mean, median, or proportion, computed on the bootstrap sample

(3) repeat steps (1) and (2) many times to create a bootstrap distribution, i.e. a distribution of bootstrap statistics
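The three steps translate almost line for line into code. A minimal sketch using only the standard library; it reads the interval off the bootstrap distribution with the percentile method (one common choice, assumed here), and `bootstrap_ci` with its parameters is a hypothetical helper.

```python
import random
from statistics import fmean


def bootstrap_ci(sample, stat=fmean, reps=10_000, confidence=0.95, seed=None):
    """Percentile bootstrap interval for `stat` computed on one sample."""
    rng = random.Random(seed)
    n = len(sample)
    boot_stats = []
    for _ in range(reps):
        # (1) bootstrap sample: n draws *with replacement* from the original sample
        resample = rng.choices(sample, k=n)
        # (2) bootstrap statistic computed on that resample
        boot_stats.append(stat(resample))
    # (3) the collected statistics form the bootstrap distribution;
    #     the percentile method keeps its middle `confidence` share
    boot_stats.sort()
    tail = (1 - confidence) / 2
    lo = boot_stats[round(tail * (reps - 1))]
    hi = boot_stats[round((1 - tail) * (reps - 1))]
    return lo, hi
```

Passing a different `stat` (for example a median function) bootstraps that statistic instead; fixing `seed` makes the resampling reproducible.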

bootstrapping limitations

‣ Conditions are not as rigid as for CLT-based methods.

‣ However, if the bootstrap distribution is extremely skewed or sparse, the bootstrap interval might be unreliable.

‣ A representative sample is required for generalizability. If the sample is biased, the estimates resulting from this sample will also be biased.

bootstrap vs. sampling distribution

‣ Sampling distribution: created using sampling (with replacement) from the population.

‣ Bootstrap distribution: created using sampling (with replacement) from the sample.

‣ Both are distributions of sample statistics.
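A small simulation makes the distinction concrete. It assumes a synthetic population (in a real study the population is unknown, which is why the bootstrap is needed) and builds both distributions of the sample mean; the helper name and parameters are illustrative.

```python
import random
from statistics import fmean, stdev


def compare_distributions(population, n, reps=2000, seed=0):
    """Summarize a sampling distribution vs. a bootstrap distribution of the mean."""
    rng = random.Random(seed)
    # Sampling distribution: many samples drawn (with replacement) from the population.
    sampling = [fmean(rng.choices(population, k=n)) for _ in range(reps)]
    # In practice only ONE sample is available...
    sample = rng.choices(population, k=n)
    # ...so the bootstrap distribution resamples (with replacement) from that sample.
    bootstrap = [fmean(rng.choices(sample, k=n)) for _ in range(reps)]
    return {
        "population_mean": fmean(population),
        "sample_mean": fmean(sample),
        "sampling_center": fmean(sampling),
        "bootstrap_center": fmean(bootstrap),
        "sampling_sd": stdev(sampling),
        "bootstrap_sd": stdev(bootstrap),
    }
```

The sampling distribution centers on the population mean, while the bootstrap distribution centers on the mean of the one sample in hand; their spreads are typically similar, which is what lets the bootstrap stand in for the unknown sampling variability.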

V. t distribution

‣ When n is small and σ is unknown (almost always the case), use the t distribution to address the uncertainty of the standard error estimate.

‣ bell shaped, but with thicker tails than the normal

‣ observations are more likely to fall beyond 2 SDs from the mean

‣ the extra-thick tails help mitigate the effect of a less reliable estimate of the standard error of the sampling distribution

‣ always centered at 0 (like the standard normal)

‣ has one parameter, degrees of freedom (df), which determines the thickness of the tails

‣ remember, the normal distribution has two parameters: mean and SD
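The thicker tails can be checked numerically by comparing P(|X| > 2) under the standard normal with the same tail area under t distributions with a few df values. The standard library has no t distribution, so this sketch integrates the t density with Simpson's rule; that is an implementation choice for illustration, not a library API.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    steps = 2 * max(1000, int(100 * a))  # even number of subintervals
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def tail_beyond(k, df=None):
    """P(|X| > k) for the standard normal (df=None) or a t distribution."""
    if df is None:
        return 2 * (1 - NormalDist().cdf(k))
    return 2 * (1 - t_cdf(k, df))


for df in (2, 5, 30, None):
    label = "normal" if df is None else f"t, df={df}"
    print(f"{label:>10}: P(|X| > 2) = {tail_beyond(2, df):.4f}")
```

The tail area shrinks toward the normal value as df grows, matching the point that df controls the thickness of the tails.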

t statistic
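The t statistic is usually written in the same form as the Z statistic, with the standard error estimated from the sample; for a single mean the degrees of freedom are n − 1:

```latex
T = \frac{\text{point estimate} - \text{null value}}{SE},
\qquad
\text{single mean: } T = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \quad df = n - 1
```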

VI. inference for a small sample mean
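For a small sample the recipe matches the large-sample one, but p-values and critical values come from the t distribution with df = n − 1. A self-contained sketch using only the standard library, so it carries its own numerical t CDF (Simpson's rule) and a bisection-based quantile; `one_sample_t` and its helpers are illustrative names, not a library API.

```python
import math
from statistics import fmean, stdev


def t_pdf(x, df):
    """Density of the t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    steps = 2 * max(1000, int(100 * a))  # even number of subintervals
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df, lo=-100.0, hi=100.0):
    """Invert t_cdf by bisection; assumes the quantile lies in [lo, hi]."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def one_sample_t(data, mu0, confidence=0.95):
    """Two-sided t test of H0: mu = mu0, plus a t confidence interval."""
    n = len(data)
    df = n - 1
    x_bar = fmean(data)
    se = stdev(data) / math.sqrt(n)          # SE = s / sqrt(n)
    t = (x_bar - mu0) / se
    p_value = 2 * (1 - t_cdf(abs(t), df))
    t_star = t_quantile(1 - (1 - confidence) / 2, df)
    return t, df, p_value, (x_bar - t_star * se, x_bar + t_star * se)
```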

VII. inference for comparing two small sample means
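With two small independent samples, the standard error keeps the form √(s1²/n1 + s2²/n2) and the reference distribution becomes t. This sketch assumes the conservative df = min(n1 − 1, n2 − 1) often used in introductory courses; statistical software usually uses the Welch–Satterthwaite df instead. The helpers repeat a numerical t CDF so the block runs on its own, and all names are illustrative.

```python
import math
from statistics import fmean, stdev


def t_pdf(x, df):
    """Density of the t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    steps = 2 * max(1000, int(100 * a))  # even number of subintervals
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df, lo=-100.0, hi=100.0):
    """Invert t_cdf by bisection; assumes the quantile lies in [lo, hi]."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def two_sample_t(x1, x2, confidence=0.95, null_diff=0.0):
    """t test and CI for mu1 - mu2 with the conservative df = min(n1-1, n2-1)."""
    n1, n2 = len(x1), len(x2)
    df = min(n1 - 1, n2 - 1)
    point_est = fmean(x1) - fmean(x2)
    se = math.sqrt(stdev(x1) ** 2 / n1 + stdev(x2) ** 2 / n2)
    t = (point_est - null_diff) / se
    p_value = 2 * (1 - t_cdf(abs(t), df))
    t_star = t_quantile(1 - (1 - confidence) / 2, df)
    return t, df, p_value, (point_est - t_star * se, point_est + t_star * se)
```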

VIII. comparing more than two means
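With more than two groups, one-way ANOVA compares the variability between group means with the variability within groups through F = MSG / MSE, with df_G = k − 1 and df_E = n − k. A minimal sketch that computes only the F statistic and its degrees of freedom (the p-value would come from the F distribution, which the standard library does not provide); `one_way_anova_f` is a hypothetical name.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_G, df_E) for a one-way ANOVA on a list of groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    group_means = [fmean(g) for g in groups]
    # Between-group sum of squares: group size times squared deviation of its mean
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group (error) sum of squares: deviations from each group's own mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_g, df_e = k - 1, n - k
    msg, mse = ssg / df_g, sse / df_e  # mean squares
    return msg / mse, df_g, df_e
```

A large F means the group means differ by more than the within-group noise would suggest.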

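This article was shared from the WeChat official account 机器学习与统计学 (Machine Learning and Statistics) as part of the Tencent Cloud self-media sharing program.
Originally published: 2015-05-07. For copyright infringement concerns, contact cloudcommunity@tencent.com to request removal.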
