一、sampling variability & CLT
sampling distribution
二、confidence interval
A plausible range of values for the population parameter is called a confidence interval.
‣ If we report a point estimate, we probably won’t hit the exact population parameter.
‣ If we report a range of plausible values, we have a good shot at capturing the parameter.
Confidence interval for a population mean: Computed as the sample mean plus/minus a margin of error (the critical value corresponding to the middle XX% of the normal distribution times the standard error of the sampling distribution).
Conditions for this confidence interval:
1. Independence: Sampled observations must be independent.
‣ random sample/assignment
‣ if sampling without replacement, n < 10% of population
2. Sample size/skew: n ≥ 30, larger if the population distribution is very skewed.
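The interval described above (sample mean ± critical value × standard error) can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `mean_confidence_interval` and the example data are hypothetical, and the sketch assumes the conditions listed above are met.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) for the population mean via the normal approximation."""
    n = len(sample)
    x_bar = mean(sample)
    se = stdev(sample) / sqrt(n)  # standard error of the sample mean
    # Critical value z*: bounds the middle `confidence` share of the standard normal.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * se          # margin of error
    return x_bar - margin, x_bar + margin

# Hypothetical data: 40 observations, enough to satisfy n >= 30.
example_sample = list(range(1, 41))
lower, upper = mean_confidence_interval(example_sample, confidence=0.95)
```

Raising `confidence` widens the interval, because the critical value grows while the standard error stays the same.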
三、accuracy vs. precision
‣ Suppose we took many samples and built a 95% confidence interval from each sample (sample mean ± margin of error).
‣ Then about 95% of those intervals would contain the true population mean (μ).
‣ Commonly used confidence levels in practice are 90%, 95%, 98%, and 99%.
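The "many samples, many intervals" interpretation can be checked with a small simulation. The sketch below assumes a normal population with made-up parameters (`mu`, `sigma`) and counts how often the interval built from each sample captures the true mean; the function name `coverage_rate` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def coverage_rate(mu=10.0, sigma=2.0, n=50, n_samples=2000, confidence=0.95, seed=1):
    """Fraction of simulated confidence intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        margin = z_star * stdev(sample) / sqrt(n)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / n_samples

rate = coverage_rate()
```

The returned fraction should land close to the chosen confidence level, which is what the confidence level describes: the long-run capture rate of the procedure, not the probability for any single interval.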
四、required sample size for ME
backtracking to n for a given ME
Given a target margin of error, confidence level, and information on the variability of the sample (or the population), we can determine the required sample size to achieve the desired margin of error.
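Starting from ME = z* × s / √n and solving for n gives n = (z* × s / ME)², rounded up to the next whole observation. A minimal sketch, where the function name and the example numbers (standard deviation 18, target margin 4, 90% confidence) are hypothetical:

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(target_me, sd, confidence=0.95):
    """Smallest n whose margin of error is at most target_me."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up: rounding down would leave the margin slightly too wide.
    return ceil((z_star * sd / target_me) ** 2)

n_needed = required_sample_size(target_me=4, sd=18, confidence=0.90)
n_half_me = required_sample_size(target_me=2, sd=18, confidence=0.90)
```

Because n grows with the square of 1/ME, halving the target margin of error requires roughly four times as many observations.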
五、another introduction to inference
two competing claims…
1. “There is nothing going on.”
Promotion and gender are independent, no gender discrimination, observed difference in proportions is simply due to chance. → Null hypothesis
2. “There is something going on.”
Promotion and gender are dependent, there is gender discrimination, observed difference in proportions is not due to chance. → Alternative hypothesis
Since it was quite unlikely to obtain results like the actual data or something more extreme in the simulations (male promotions being 30% or more higher than female promotions), we decided to reject the null hypothesis in favor of the alternative.
recap: hypothesis testing framework
‣ We start with a null hypothesis (H0) that represents the status quo.
‣ We also have an alternative hypothesis (HA) that represents our research question, i.e. what we’re testing for.
‣ We conduct a hypothesis test under the assumption that the null hypothesis is true, either via simulation (end of Unit 1) or theoretical methods, i.e. methods that rely on the CLT (in this Unit).
‣ If the test results suggest that the data do not provide convincing evidence for the alternative hypothesis, we stick with the null hypothesis. If they do, then we reject the null hypothesis in favor of the alternative.
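A simulation of this kind can be sketched as a randomization test: shuffle the promotion outcomes across the two groups many times and see how often chance alone produces a difference at least as large as the observed one. The counts used below (21 of 24 male files and 14 of 24 female files promoted) are an assumption for illustration, since these notes do not state the underlying counts; the function name is also hypothetical.

```python
import random

def randomization_p_value(promoted_m, n_m, promoted_f, n_f, n_sims=10_000, seed=1):
    """Share of shuffles whose male-minus-female promotion rate is >= the observed one."""
    rng = random.Random(seed)
    observed = promoted_m / n_m - promoted_f / n_f
    total_promoted = promoted_m + promoted_f
    # One card per file: 1 = promoted, 0 = not promoted.
    cards = [1] * total_promoted + [0] * (n_m + n_f - total_promoted)
    extreme = 0
    for _ in range(n_sims):
        rng.shuffle(cards)                 # under H0, labels are exchangeable
        sim_m = sum(cards[:n_m])           # promotions dealt to the "male" pile
        sim_f = total_promoted - sim_m     # the rest go to the "female" pile
        diff = sim_m / n_m - sim_f / n_f
        if diff >= observed - 1e-12:       # small tolerance for float rounding
            extreme += 1
    return extreme / n_sims

# Assumed counts, for illustration only.
p_sim = randomization_p_value(promoted_m=21, n_m=24, promoted_f=14, n_f=24)
```

A small returned proportion means that differences this large rarely arise when promotion and gender are independent, which is the reasoning behind rejecting the null hypothesis.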
六、hypothesis testing
hypotheses
null - H0: Often either a skeptical perspective or a claim to be tested.
alternative - HA: Represents an alternative claim under consideration and is often represented by a range of possible parameter values.
The skeptic will not abandon the H0 unless the evidence in favor of the HA is so strong that she rejects H0 in favor of HA.
p-value
P(observed or more extreme outcome | H0 true)
decision based on the p-value
‣ We used the test statistic to calculate the p-value, the probability of observing data at least as favorable to the alternative hypothesis as our current data set, if the null hypothesis were true.
‣ If the p-value is low (lower than the significance level, α, which is usually 5%) we say that it would be very unlikely to observe the data if the null hypothesis were true, and hence reject H0.
‣ If the p-value is high (higher than α) we say that it is likely to observe the data even if the null hypothesis were true, and hence do not reject H0.
interpreting the p-value
‣ If in fact college students have been in 3 exclusive relationships on average, there is a 21% chance that a random sample of 50 college students would yield a sample mean of 3.2 or higher.
‣ This is a pretty high probability, so we think that a sample mean of 3.2 or more exclusive relationships is likely to happen simply by chance.
making a decision
‣ Since the p-value is high (higher than 5%) we fail to reject H0.
‣ The data do not provide convincing evidence that college students have been in more than 3 relationships on average.
‣ The difference between the null value of 3 relationships and the observed sample mean of 3.2 relationships is plausibly due to chance or sampling variability.
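The calculation behind this example can be sketched as a one-sided z-test. The notes give the null value (3), the sample mean (3.2), and the sample size (50), but not the sample standard deviation; the value `s=1.74` below is an assumption, chosen because it is consistent with the stated 21% p-value.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_p_value(x_bar, null_value, s, n):
    """P(sample mean >= x_bar | H0 true), using the normal approximation."""
    se = s / sqrt(n)                 # standard error of the sample mean
    z = (x_bar - null_value) / se    # test statistic
    return 1 - NormalDist().cdf(z)   # upper-tail area

# s = 1.74 is an assumed value, not given in the notes.
p = one_sided_p_value(x_bar=3.2, null_value=3, s=1.74, n=50)
alpha = 0.05
decision = "reject H0" if p < alpha else "fail to reject H0"
```

With these inputs the p-value is well above α, so the decision matches the one stated above: fail to reject H0.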
two-sided tests
‣ Often instead of looking for a divergence from the null in a specific direction, we might be interested in divergence in any direction.
‣ We call such hypothesis tests two-sided (or two-tailed).
‣ The definition of a p-value is the same regardless of doing a one- or two-sided test; however, the calculation is slightly different since we need to consider “at least as extreme as the observed outcome” in both directions.
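The difference in calculation can be made concrete with a small helper that turns a z statistic into a p-value for each kind of alternative. This is an illustrative sketch; the function name and the `alternative` labels are hypothetical choices.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative="two-sided"):
    """p-value for a z statistic under a one- or two-sided alternative."""
    std_normal = NormalDist()
    if alternative == "greater":      # HA: parameter > null value
        return 1 - std_normal.cdf(z)
    if alternative == "less":         # HA: parameter < null value
        return std_normal.cdf(z)
    if alternative == "two-sided":    # HA: parameter != null value
        # Count outcomes at least as extreme in both tails.
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

one_sided = p_value_from_z(0.81, "greater")
two_sided = p_value_from_z(0.81, "two-sided")
```

For a positive z, the two-sided p-value is exactly double the upper-tail p-value, because the normal distribution is symmetric.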
七、inference for other estimators
nearly normal sampling distributions
unbiased estimator
An important assumption about point estimates is that they are unbiased, i.e. the sampling distribution of the estimate is centered at the true population parameter it estimates.
‣ That is, an unbiased estimate does not naturally over- or underestimate the parameter; it provides a “good” estimate.
‣ The sample mean is an example of an unbiased point estimate, as well as others we just listed.
confidence intervals for nearly normal point estimates
hypothesis testing for nearly normal point estimates
八、decision errors
hypothesis test as a trial
If we again think of a hypothesis test as a criminal trial then it makes sense to frame the verdict in terms of the null and alternative hypotheses:
H0 : Defendant is innocent
HA : Defendant is guilty
type 1 error rate
‣ We reject H0 when the p-value is less than 0.05 (α = 0.05).
‣ This means that, for those cases where H0 is actually true, we do not want to incorrectly reject it more than 5% of those times.
‣ In other words, when using a 5% significance level there is about a 5% chance of making a Type 1 error if the null hypothesis is true.
P(Type 1 error | H0 true) = α
‣ This is why we prefer small values of α: increasing α increases the Type 1 error rate.
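The statement P(Type 1 error | H0 true) = α can be illustrated by simulating many tests in a world where H0 really is true and counting how often it is rejected anyway. The population parameters below are made up, the population standard deviation is treated as known for simplicity, and the function name is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def type1_error_rate(alpha=0.05, mu0=0.0, sigma=1.0, n=40, n_tests=5000, seed=1):
    """Fraction of two-sided z-tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    se = sigma / sqrt(n)  # sigma assumed known in this sketch
    rejections = 0
    for _ in range(n_tests):
        # Draw the sample from a population whose mean equals the null value.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (mean(sample) - mu0) / se
        p = 2 * (1 - std_normal.cdf(abs(z)))
        if p < alpha:
            rejections += 1  # a false rejection, i.e. a Type 1 error
    return rejections / n_tests

rate_05 = type1_error_rate(alpha=0.05)
rate_10 = type1_error_rate(alpha=0.10)
```

The simulated rejection rate should sit near whichever α is used, so moving from α = 0.05 to α = 0.10 roughly doubles the share of false rejections.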
choosing α
type 2 error rate
If the alternative hypothesis is actually true, what is the chance that we make a Type 2 Error, i.e. we fail to reject the null hypothesis even when we should reject it?
‣ The answer is not obvious.
‣ If the true population average is very close to the null value, it will be difficult to detect a difference (and reject H0).
‣ If the true population average is very different from the null value, it will be easier to detect a difference.
‣ Clearly, β (the Type 2 error rate) depends on the effect size (δ), the difference between the point estimate and the null value.
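The dependence of β on the effect size can be sketched for a one-sided z-test (HA: μ > null value). Under a true mean that exceeds the null value by δ, the test statistic is centered at δ/SE instead of 0, so β = Φ(z_crit − δ/SE), where z_crit is the rejection cutoff. This sketch takes δ as the gap between the true population mean and the null value, and the numbers used (standard deviation 1, n = 50) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def type2_error_rate(delta, sd, n, alpha=0.05):
    """P(fail to reject H0) for a one-sided z-test when the true mean is null + delta."""
    std_normal = NormalDist()
    se = sd / sqrt(n)
    z_crit = std_normal.inv_cdf(1 - alpha)  # reject H0 when z > z_crit
    # Under the true mean, z is normal with mean delta / se and standard deviation 1.
    return std_normal.cdf(z_crit - delta / se)

# Hypothetical effect sizes: beta shrinks as the true mean moves away from the null.
betas = {delta: type2_error_rate(delta, sd=1.0, n=50) for delta in (0.1, 0.3, 0.5)}
```

The complement 1 − β is the power of the test, so larger effect sizes (or larger samples, through a smaller SE) make a real difference easier to detect.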
九、significance vs. confidence level
agreement of CI and HT
‣ A two-sided hypothesis test with significance level α is equivalent to a confidence interval with CL = 1 − α.
‣ A one-sided hypothesis test with significance level α is equivalent to a confidence interval with CL = 1 − (2 × α).
‣ If H0 is rejected, a confidence interval that agrees with the result of the hypothesis test should not include the null value.
‣ If we fail to reject H0, a confidence interval that agrees with the result of the hypothesis test should include the null value.
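The agreement between a two-sided test at level α and a confidence interval with CL = 1 − α can be checked directly: both are built from the same standard error and the same critical value. In the sketch below the function name is hypothetical and the inputs (including the standard deviation 1.74) are assumed values for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_test_and_ci(x_bar, s, n, null_value, alpha=0.05):
    """Return (reject_h0, null_in_ci, ci) for a two-sided z-test and its matching CI."""
    std_normal = NormalDist()
    se = s / sqrt(n)
    z = (x_bar - null_value) / se
    p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))
    z_star = std_normal.inv_cdf(1 - alpha / 2)     # critical value for CL = 1 - alpha
    ci = (x_bar - z_star * se, x_bar + z_star * se)
    reject_h0 = p_two_sided < alpha
    null_in_ci = ci[0] <= null_value <= ci[1]
    return reject_h0, null_in_ci, ci

# Assumed inputs: one sample mean near the null value, one far from it.
near = two_sided_test_and_ci(x_bar=3.2, s=1.74, n=50, null_value=3)
far = two_sided_test_and_ci(x_bar=3.8, s=1.74, n=50, null_value=3)
```

In both cases exactly one of `reject_h0` and `null_in_ci` is true: the test rejects precisely when the interval excludes the null value.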
十、statistical vs. practical significance
‣ Real differences between the point estimate and null value are easier to detect with larger samples.
‣ However, very large samples will result in statistical significance even for tiny differences between the sample mean and the null value (effect size), even when the difference is not practically significant.
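The effect of sample size on significance can be seen by holding a tiny difference fixed and letting n grow. The numbers below (a difference of 0.02 with standard deviation 1) are hypothetical and chosen only to make the pattern visible.

```python
from math import sqrt
from statistics import NormalDist

def p_value_for_fixed_difference(diff, sd, n):
    """One-sided p-value when the sample mean exceeds the null value by `diff`."""
    z = diff / (sd / sqrt(n))  # the standard error shrinks as n grows
    return 1 - NormalDist().cdf(z)

# Same tiny difference, increasingly large samples.
p_values = {n: p_value_for_fixed_difference(diff=0.02, sd=1.0, n=n)
            for n in (100, 10_000, 1_000_000)}
```

The difference itself never changes, yet the p-value drops below 5% once the sample is large enough. Whether a difference of that size matters in practice is a separate question that the p-value does not answer.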