一、hypothesis testing for paired data
hypotheses for paired means:
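The hypotheses themselves are not reproduced in the text after this heading. As a sketch of the usual setup (an assumption, not taken from the source), the parameter of interest is the mean of the paired differences, written here as $\mu_{diff}$, with the sample mean, standard deviation and number of differences written as $\bar{x}_{diff}$, $s_{diff}$ and $n_{diff}$:

```latex
\begin{aligned}
H_0 &: \mu_{diff} = 0 \quad \text{(no difference, on average, within pairs)} \\
H_A &: \mu_{diff} \neq 0 \quad \text{(some difference, on average, within pairs)} \\
Z &= \frac{\bar{x}_{diff} - 0}{s_{diff} / \sqrt{n_{diff}}}
\end{aligned}
```

A one-sided alternative ($\mu_{diff} > 0$ or $\mu_{diff} < 0$) may be more appropriate when the research question is directional.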
二、confidence intervals for paired data
estimating the difference between paired means:
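A minimal sketch of such an interval, assuming the large-sample normal approximation (point estimate ± z* × SE) applied to the per-pair differences. The function name `paired_mean_ci` and the example scores are hypothetical and exist only for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_mean_ci(before, after, confidence=0.95):
    """Normal-approximation confidence interval for the mean paired difference."""
    if len(before) != len(after):
        raise ValueError("paired data must have the same length")
    # One difference per pair: the two columns collapse into a single sample.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    x_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)  # critical value
    return x_bar - z_star * se, x_bar + z_star * se


# Hypothetical data: 39 pairs whose differences cycle through 1, 2, 3.
before = list(range(39))
after = [b + 2 + (i % 3 - 1) for i, b in enumerate(before)]
low, high = paired_mean_ci(before, after)
print(f"95% CI for the mean difference: ({low:.3f}, {high:.3f})")
```

If the interval excludes 0, the data suggest a difference between the paired measurements at the corresponding significance level.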
三、comparing independent means
Conditions for inference for comparing two independent means:
1. Independence:
✓ within groups: sampled observations must be independent
‣ random sample/assignment
‣ if sampling without replacement, n < 10% of population
✓ between groups: the two groups must be independent of each other (non-paired)
2. Sample size/skew: Each sample size must be at least 30 (n1 ≥ 30 and n2 ≥ 30), larger if the population distributions are very skewed.
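When these conditions hold, a confidence interval for the difference of the two means can be sketched as below. This assumes the usual standard error for independent groups, sqrt(s1²/n1 + s2²/n2), and a normal critical value; the function name `independent_means_ci` and the data are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def independent_means_ci(sample1, sample2, confidence=0.95):
    """Normal-approximation CI for mu1 - mu2 from two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    if n1 < 30 or n2 < 30:
        raise ValueError("this sketch assumes n1 >= 30 and n2 >= 30")
    point_estimate = mean(sample1) - mean(sample2)
    # Variances add because the two groups are independent of each other.
    se = math.sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    return point_estimate - z_star * se, point_estimate + z_star * se


# Hypothetical groups with sample means 12 and 11.
group1 = [10 + (i % 5) for i in range(50)]
group2 = [9 + (i % 5) for i in range(40)]
low, high = independent_means_ci(group1, group2)
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
```

The size check only covers the n ≥ 30 rule; independence within and between groups depends on the study design and cannot be verified from the numbers alone.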
testing for a difference between independent means
‣ null hypothesis: no difference
‣ alternative hypothesis: some difference
‣ same conditions and SE as the confidence interval
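The test can be sketched as follows, assuming a two-sided alternative, the same standard error sqrt(s1²/n1 + s2²/n2) used for the confidence interval, and a normal approximation for the p-value. The function name `independent_means_test` and the data are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def independent_means_test(sample1, sample2, null_value=0.0):
    """Two-sided z test of H0: mu1 - mu2 = null_value."""
    n1, n2 = len(sample1), len(sample2)
    se = math.sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    # Distance between the observed difference and the null value, in SE units.
    z = (mean(sample1) - mean(sample2) - null_value) / se
    # Two-sided p-value: area in both tails beyond |z|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical groups with sample means 12 and 11.
group1 = [10 + (i % 5) for i in range(50)]
group2 = [9 + (i % 5) for i in range(40)]
z, p_value = independent_means_test(group1, group2)
print(f"z = {z:.3f}, p-value = {p_value:.5f}")
```

A small p-value is evidence against the null hypothesis of no difference; for a one-sided alternative, only one tail would be used.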
‣ An alternative approach to constructing confidence intervals is bootstrapping.
‣ This term comes from the phrase “pulling oneself up by one’s bootstraps”, which is a metaphor for accomplishing an impossible task without any outside help.
‣ In this case the impossible task is estimating a population parameter, and we’ll accomplish it using data from only the given sample.
(1) take a bootstrap sample - a random sample taken with replacement from the original sample, of the same size as the original sample
(2) calculate the bootstrap statistic - a statistic such as mean, median, proportion, etc. computed on the bootstrap sample
(3) repeat steps (1) and (2) many times to create a bootstrap distribution - a distribution of bootstrap statistics
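The three steps above can be sketched with the standard library alone. Turning the bootstrap distribution into an interval by taking its middle 95% (the percentile method) is an assumption here, since the text does not say which interval construction is used. The function names and the sample values are hypothetical.

```python
import random
from statistics import median


def bootstrap_distribution(sample, statistic, n_boot=10_000, seed=None):
    """Return n_boot bootstrap statistics computed from resamples of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    boot_stats = []
    for _ in range(n_boot):
        # (1) resample with replacement, same size as the original sample
        boot_sample = rng.choices(sample, k=n)
        # (2) compute the statistic on this bootstrap sample
        boot_stats.append(statistic(boot_sample))
    # (3) the collected statistics form the bootstrap distribution
    return boot_stats


def percentile_interval(boot_stats, confidence=0.95):
    """Middle `confidence` share of the bootstrap distribution."""
    ordered = sorted(boot_stats)
    alpha = 1 - confidence
    lower_idx = max(0, round(len(ordered) * alpha / 2))
    upper_idx = min(len(ordered) - 1, round(len(ordered) * (1 - alpha / 2)) - 1)
    return ordered[lower_idx], ordered[upper_idx]


# Hypothetical sample of 20 observations (for example, monthly rents).
sample = [775, 625, 733, 929, 1055, 810, 1129, 1480, 1200, 650,
          890, 1500, 980, 720, 1035, 2050, 860, 945, 1100, 1300]
boot_medians = bootstrap_distribution(sample, median, n_boot=5_000, seed=1)
low, high = percentile_interval(boot_medians)
print(f"95% bootstrap interval for the median: ({low}, {high})")
```

Passing a different `statistic` (for example `statistics.mean`) reuses the same resampling loop for another parameter.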
‣ Conditions are not as rigid as for CLT-based methods.
‣ However, if the bootstrap distribution is extremely skewed or sparse, the bootstrap interval might be unreliable.
‣ A representative sample is required for generalizability. If the sample is biased, the estimates resulting from this sample will also be biased.
bootstrap vs. sampling distribution
‣ Sampling distribution created using sampling (with replacement) from the population.
‣ Bootstrap distribution created using sampling (with replacement) from the sample.
‣ Both are distributions of sample statistics
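The contrast can be made concrete with a small simulation. The population below is hypothetical (a right-skewed exponential with mean 50), and in practice the population is not available, which is why only the bootstrap distribution can be built from real data.

```python
import random
from statistics import mean

rng = random.Random(42)

# Hypothetical population; an assumption made purely for illustration.
population = [rng.expovariate(1 / 50) for _ in range(100_000)]
n = 50

# Sampling distribution: each statistic comes from a fresh sample of the population.
sampling_dist = [mean(rng.choices(population, k=n)) for _ in range(2_000)]

# Bootstrap distribution: each statistic comes from a resample of ONE observed sample.
observed = rng.choices(population, k=n)
bootstrap_dist = [mean(rng.choices(observed, k=n)) for _ in range(2_000)]

print("population mean:          ", round(mean(population), 2))
print("center of sampling dist:  ", round(mean(sampling_dist), 2))
print("observed sample mean:     ", round(mean(observed), 2))
print("center of bootstrap dist: ", round(mean(bootstrap_dist), 2))
```

The sampling distribution centers near the population parameter, while the bootstrap distribution centers near the statistic of the one observed sample.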
‣ when n is small and σ is unknown (almost always the case), use the t distribution to address the uncertainty of the standard error estimate
‣ bell shaped but with thicker tails than the normal
‣ observations are more likely to fall beyond 2 SDs from the mean
‣ the extra thick tails help mitigate the effect of a less reliable estimate for the standard error of the sampling distribution
‣ always centered at 0 (like the standard normal)
‣ has one parameter: degrees of freedom (df) - determines the thickness of the tails
‣ remember, the normal distribution has two parameters: mean and SD
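To see the thicker tails numerically, the sketch below compares P(|T| > 2) for several degrees of freedom with the same probability under the standard normal. The t probabilities are obtained by numerically integrating the t density with Simpson's rule so that only the standard library is needed; the helper names `t_pdf` and `t_prob_beyond` are hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_prob_beyond(cutoff, df, steps=20_000):
    """P(|T| > cutoff), using Simpson's rule on [0, cutoff] (steps must be even)."""
    h = cutoff / steps
    total = t_pdf(0, df) + t_pdf(cutoff, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_cutoff = total * h / 3
    # The density is symmetric around 0, so double the one-sided area.
    return 1 - 2 * area_0_to_cutoff


normal_beyond = 2 * (1 - NormalDist().cdf(2))
for df in (2, 5, 10, 30, 100):
    print(f"df = {df:>3}: P(|T| > 2) = {t_prob_beyond(2, df):.4f}")
print(f"normal  : P(|Z| > 2) = {normal_beyond:.4f}")
```

The tail probability shrinks toward the normal value as df grows, which is the sense in which df determines the thickness of the tails.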
六、inference for a small sample mean
七、inference for comparing two small sample means
八、comparing more than two means
This article was shared from the WeChat official account 机器学习与统计学 (tjxj666).