# inference for numerical variables

hypotheses for paired means:

estimating the difference between paired means:

Conditions for inference for comparing two independent means:

1. Independence:

✓ within groups: sampled observations must be independent

‣ random sample/assignment

‣ if sampling without replacement, n < 10% of population

✓ between groups: the two groups must be independent of each other (non-paired)

2. Sample size/skew: Each sample size must be at least 30 (n1 ≥ 30 and n2 ≥ 30), larger if the population distributions are very skewed.

testing for a difference between independent means

‣ null hypothesis: no difference

‣ alternative hypothesis: some difference

‣ same conditions and SE as the confidence interval
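The test above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the notes do not spell out the standard error, so the usual formula for independent means, SE = sqrt(s1²/n1 + s2²/n2), is assumed here, and the p-value uses the normal model on the grounds that both samples have at least 30 observations (the course may use a t distribution instead). The function name `two_sample_z_test` and the data are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def two_sample_z_test(sample1, sample2):
    """Test H0: mu1 - mu2 = 0 against HA: mu1 - mu2 != 0 (two-sided)."""
    n1, n2 = len(sample1), len(sample2)
    diff = mean(sample1) - mean(sample2)
    # Assumed SE of the difference between two independent sample means.
    se = math.sqrt(stdev(sample1) ** 2 / n1 + stdev(sample2) ** 2 / n2)
    z = diff / se
    # Two-sided p-value from the standard normal (assumes n1, n2 >= 30).
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, se, z, p_value


# Hypothetical data: two independent groups, each with at least 30 observations.
rng = random.Random(1)
group_a = [rng.gauss(100, 15) for _ in range(40)]
group_b = [rng.gauss(108, 15) for _ in range(45)]

diff, se, z, p_value = two_sample_z_test(group_a, group_b)
print(f"difference = {diff:.2f}, SE = {se:.2f}, z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value would count as evidence against the null hypothesis of no difference; the same `diff` and `se` could be reused to build the confidence interval.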

summary

‣ An alternative approach to constructing confidence intervals is bootstrapping.

‣ This term comes from the phrase “pulling oneself up by one’s bootstraps”, which is a metaphor for accomplishing an impossible task without any outside help.

‣ In this case the im/possible task is estimating a population parameter, and we’ll accomplish it using data from only the given sample.

bootstrapping scheme

(1) take a bootstrap sample - a random sample taken with replacement from the original sample, of the same size as the original sample

(2) calculate the bootstrap statistic - a statistic such as mean, median, proportion, etc. computed on the bootstrap samples

(3) repeat steps (1) and (2) many times to create a bootstrap distribution - a distribution of bootstrap statistics
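The three steps map directly onto code. The sketch below is a minimal Python version using only the standard library; the function names, the sample values, and the number of resamples are hypothetical. The percentile interval at the end is one common way to turn a bootstrap distribution into a confidence interval and is an assumption here, since the notes do not say which interval method is used.

```python
import random
from statistics import mean


def bootstrap_distribution(sample, statistic=mean, n_boot=5000, seed=0):
    """Return a list of bootstrap statistics computed from `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    stats = []
    for _ in range(n_boot):
        # (1) bootstrap sample: same size as the original, drawn with replacement
        resample = rng.choices(sample, k=n)
        # (2) bootstrap statistic computed on that resample
        stats.append(statistic(resample))
    # (3) the collected statistics form the bootstrap distribution
    return stats


def percentile_interval(boot_stats, level=0.95):
    """Percentile method (assumed): cut off equal tails of the bootstrap distribution."""
    ordered = sorted(boot_stats)
    alpha = (1 - level) / 2
    lower = ordered[round(alpha * len(ordered))]
    upper = ordered[round((1 - alpha) * len(ordered)) - 1]
    return lower, upper


sample = [23, 19, 31, 27, 22, 35, 18, 26, 29, 24]  # hypothetical observations
boot = bootstrap_distribution(sample)
low, high = percentile_interval(boot)
print(f"bootstrap 95% interval for the mean: ({low:.1f}, {high:.1f})")
```

Passing `statistic=median` (from the `statistics` module) would give a bootstrap distribution of medians with no other change.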

bootstrapping limitations

‣ Conditions are not as rigid as for CLT-based methods.

‣ However, if the bootstrap distribution is extremely skewed or sparse, the bootstrap interval might be unreliable.

‣ A representative sample is required for generalizability. If the sample is biased, the estimates resulting from this sample will also be biased.

bootstrap vs. sampling distribution

‣ The sampling distribution is created by sampling (with replacement) from the population.

‣ The bootstrap distribution is created by sampling (with replacement) from the sample.

‣ Both are distributions of sample statistics.
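The contrast can be made concrete with a small simulation. In the sketch below everything is hypothetical: a made-up skewed population is generated so that the sampling distribution can actually be simulated, which is rarely possible in practice. The only difference between the two distributions is what gets resampled: the population in one case, a single observed sample in the other.

```python
import random
from statistics import mean

rng = random.Random(42)

# Hypothetical right-skewed population with mean near 50.
population = [rng.expovariate(1 / 50) for _ in range(10_000)]
n = 40          # size of each sample
n_reps = 2000   # number of repeated samples / resamples

# Sampling distribution: many samples drawn (with replacement) from the population.
sampling_means = [mean(rng.choices(population, k=n)) for _ in range(n_reps)]

# Bootstrap distribution: many resamples drawn (with replacement) from ONE sample.
observed = rng.choices(population, k=n)
bootstrap_means = [mean(rng.choices(observed, k=n)) for _ in range(n_reps)]

print(f"population mean       : {mean(population):.2f}")
print(f"center of sampling    : {mean(sampling_means):.2f}")
print(f"observed sample mean  : {mean(observed):.2f}")
print(f"center of bootstrap   : {mean(bootstrap_means):.2f}")
```

The sampling distribution centers on the population mean, while the bootstrap distribution centers on the mean of the one observed sample, which is why a biased sample leads to biased bootstrap estimates.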

‣ when n is small and σ is unknown (almost always), use the t distribution to address the uncertainty of the standard error estimate

‣ bell shaped but thicker tails than the normal

‣ observations more likely to fall beyond 2 SDs from the mean

‣ extra thick tails helpful for mitigating the effect of a less reliable estimate for the standard error of the sampling distribution

‣ always centered at 0 (like the standard normal)

‣ has one parameter: degrees of freedom (df) - determines thickness of tails

‣ remember, the normal distribution has two parameters: mean and SD
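The thicker tails can be checked with a quick simulation. The sketch below relies on a standard construction rather than anything stated in these notes: a t variable with df degrees of freedom can be generated as Z / sqrt(V / df), where Z is standard normal and V is the sum of df squared standard normals. The function names and the simulation size are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_t(df, rng):
    """Draw one value from a t distribution with `df` degrees of freedom."""
    z = rng.gauss(0, 1)
    v = sum(rng.gauss(0, 1) ** 2 for _ in range(df))  # chi-square with df degrees of freedom
    return z / math.sqrt(v / df)


def tail_beyond_2(df, n_sim=10_000, seed=0):
    """Estimate P(|T| > 2) by simulation."""
    rng = random.Random(seed)
    hits = sum(abs(simulate_t(df, rng)) > 2 for _ in range(n_sim))
    return hits / n_sim


normal_tail = 2 * (1 - NormalDist().cdf(2))  # about 0.0455 for the standard normal
for df in (2, 5, 30):
    print(f"df = {df:2d}: P(|T| > 2) ~ {tail_beyond_2(df):.3f}  (normal: {normal_tail:.3f})")
```

Smaller df should give a larger share of values beyond 2, and as df grows the estimate should approach the normal value.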

t statistic
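The formula itself is missing from these notes, so the sketch below assumes the usual form for inference on a single mean: T = (sample mean - null value) / SE with SE = s / sqrt(n) and df = n - 1. The function name `t_statistic` and the data are hypothetical.

```python
import math
from statistics import mean, stdev


def t_statistic(sample, null_value):
    """Return (T, df) for a single mean, assuming SE = s / sqrt(n) and df = n - 1."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)  # s is the sample standard deviation
    t = (mean(sample) - null_value) / se
    return t, n - 1


sample = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.0, 5.2]  # hypothetical measurements
t, df = t_statistic(sample, null_value=5.0)
print(f"T = {t:.3f} with df = {df}")
```

The resulting T would then be compared against a t distribution with the given df to obtain a p-value.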
