In a random process we know what outcomes could happen, but we don't know which particular outcome will happen.
law of large numbers
the law of large numbers states that as more observations are collected, the proportion of occurrences with a particular outcome converges to the probability of that outcome.
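The law of large numbers can be illustrated with a short simulation (a sketch in Python; the seed and toss counts are arbitrary choices, not from the notes):

```python
import random

random.seed(42)  # arbitrary seed so the run is reproducible

# Proportion of heads in n simulated fair coin tosses; by the law of
# large numbers this converges to the true probability, 0.5.
def proportion_of_heads(n_tosses):
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

for n in (10, 1_000, 100_000):
    print(n, proportion_of_heads(n))
```

With more tosses, the printed proportion drifts closer and closer to 0.5.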
disjoint events + general addition rule
‣ disjoint events
‣ the general addition rule
‣ sample space
‣ probability distributions
‣ complementary events
disjoint (mutually exclusive)
disjoint (mutually exclusive) events cannot happen at the same time.
‣ the outcome of a single coin toss cannot be a head and a tail.
‣ a student can't both fail and pass a class.
‣ a single card drawn from a deck cannot be an ace and a queen.
non-disjoint events can happen at the same time.
‣ a student can get an A in Stats and A inEcon in the same semester.
union of disjoint events
For disjoint events A and B, P(A or B) = P(A) + P(B)
union of non-disjoint events
For non-disjoint events A and B, P(A or B) = P(A) + P(B) - P(A and B)
General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
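As a worked example of the general addition rule (a sketch; the card-draw events are chosen for illustration):

```python
from fractions import Fraction

# Single card drawn from a standard 52-card deck.
# A = ace, B = heart: non-disjoint, since the ace of hearts is both.
p_ace = Fraction(4, 52)
p_heart = Fraction(13, 52)
p_ace_and_heart = Fraction(1, 52)

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_ace_or_heart = p_ace + p_heart - p_ace_and_heart
print(p_ace_or_heart)  # 4/13, i.e. 16/52
```

Subtracting P(A and B) keeps the ace of hearts from being counted twice.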
a sample space is a collection of all possible outcomes of a trial.
a probability distribution lists all possible outcomes in the sample space, and the probabilities with which they occur.
rules:
1. the events listed must be disjoint
2. each probability must be between 0 and 1
3. the probabilities must total 1
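Rules 2 and 3 can be checked mechanically (a minimal sketch; the function name and example distributions are made up for illustration):

```python
import math

def is_valid_distribution(probs):
    """Each probability must lie in [0, 1] and the total must equal 1."""
    return all(0 <= p <= 1 for p in probs) and math.isclose(sum(probs), 1.0)

print(is_valid_distribution([0.2, 0.5, 0.3]))   # True
print(is_valid_distribution([0.6, 0.6, -0.2]))  # False: -0.2 is not a probability
```

Rule 1 (disjoint events) cannot be checked from the numbers alone; it is a property of how the outcomes are defined.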
complementary events are two mutually exclusive events whose probabilities add up to 1.
disjoint vs. complementary
Does the sum of probabilities of two disjoint outcomes always add up to 1?
Not necessarily; there may be more than 2 outcomes in the sample space.
Does the sum of probabilities of two complementary outcomes always add up to 1?
Yes, that's the definition of complementary events.
‣ independent events
‣ assessing independence
‣ multiplication rule for independent events
two processes are independent if knowing the outcome of one provides no useful information about the outcome of the other
Checking for independence: if P(A | B) = P(A), then A and B are independent
Product rule for independent events: If A and B are independent, P(A and B) = P(A) × P(B)
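Both the independence check and the product rule can be verified by enumerating a small sample space (a sketch; the two-dice events are chosen for illustration):

```python
from fractions import Fraction
from itertools import product

# Enumerate two independent die rolls, then check P(A | B) = P(A)
# and P(A and B) = P(A) * P(B) for:
#   A = "first die is even", B = "second die is at least 5"
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] % 2 == 0
B = lambda o: o[1] >= 5

p_a = prob(A)                              # 1/2
p_b = prob(B)                              # 1/3
p_a_and_b = prob(lambda o: A(o) and B(o))  # 1/6
p_a_given_b = p_a_and_b / p_b              # 1/2

print(p_a_given_b == p_a)          # True: knowing B tells us nothing about A
print(p_a_and_b == p_a * p_b)      # True: product rule holds
```

Exact fractions avoid any floating-point noise in the comparison.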
‣ marginal, joint, conditional probability
‣ Bayes’ theorem
‣ general product rule
independence and conditional probabilities
Generically, if P(A | B) = P(A) then the events A and B are said to be independent
‣ Conceptually: Knowing B doesn't tell us anything about A
‣ Mathematically: If events A and B are independent, P(A and B) = P(A) × P(B). Then, P(A | B) = P(A and B) / P(B) = P(A) × P(B) / P(B) = P(A)
‣ The probability we just calculated is also called the posterior probability.
P(H1: good die on the Right | you rolled ≥ 4 with the die on the Right)
‣ Posterior probability is generally defined as P(hypothesis | data).
‣ It tells us the probability of a hypothesis we set forth, given the data we just observed.
‣ It depends on both the prior probability we set and the observed data.
‣ This is different from what we calculated at the end of the randomization test on gender discrimination: the probability of observed or more extreme data given the null hypothesis being true, i.e. P(data | hypothesis), also called a p-value.
‣ In the Bayesian approach, we evaluate claims iteratively as we collect more data.
‣ In the next iteration (roll) we get to take advantage of what we learned from the data.
‣ In other words, we update our prior with our posterior probability from the previous iteration.
‣ Take advantage of prior information, like a previously published study or a physical model.
‣ Naturally integrate data as you collect it, and update your priors.
‣ Avoid the counter-intuitive definition of a p-value:
P(observed or more extreme outcome | H0 is true)
‣ Instead base decisions on the posterior probability:
P(hypothesis is true | observed data)
‣ A good prior helps, a bad prior hurts, but the prior matters less the more data you have.
‣ More advanced Bayesian techniques offerflexibility not present in
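The iterative updating described above can be sketched as follows (the prior and the likelihoods are assumed numbers for illustration, not the course's actual die probabilities):

```python
# Bayesian update for P(H1: good die on the Right | rolled >= 4).
# Assumed for this sketch: prior P(H1) = 0.5, P(roll >= 4 | good die) = 0.75,
# P(roll >= 4 | other die) = 0.5.
def posterior(prior, p_data_given_h1, p_data_given_h2):
    # Bayes' theorem: P(H1 | data) = P(data | H1) P(H1) / P(data)
    p_data = p_data_given_h1 * prior + p_data_given_h2 * (1 - prior)
    return p_data_given_h1 * prior / p_data

p = 0.5  # starting prior
for rolled_ge_4 in (True, True, False):  # three hypothetical rolls
    if rolled_ge_4:
        p = posterior(p, 0.75, 0.5)
    else:  # rolling < 4 has the complementary likelihoods
        p = posterior(p, 0.25, 0.5)
    print(round(p, 3))
```

Each iteration's posterior becomes the next iteration's prior, exactly as the bullets above describe.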
‣ normal distribution
‣ 68-95-99.7% rule
‣ standardized scores
‣ probabilities and percentiles
‣ unimodal and symmetric
‣ bell curve
‣ follows very strict guidelines about how variably the data are distributed around the mean
‣ many variables are nearly normal, but none are exactly normal
68 - 95 - 99.7% rule
standardizing with Z scores
‣ standardized (Z) score of an observation is the number of standard deviations it falls above or below the mean
‣ Z score of mean = 0
‣ unusual observation: |Z| > 2
‣ defined for distributions of any shape
‣ when the distribution is normal, Z scores can be used to calculate percentiles
‣ percentile is the percentage of observations that fall below a given data point
‣ graphically, percentile is the area below the probability distribution curve to the left of that observation.
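A Z score and the corresponding percentile can be computed with the standard library (a sketch; the mean, SD, and observation are made-up values):

```python
from statistics import NormalDist

# Z score: number of standard deviations an observation falls from the mean.
mean, sd, x = 70, 5, 80  # assumed values for illustration
z = (x - mean) / sd
print(z)  # 2.0 -> an unusual observation, since |Z| > 2

# For a normal distribution, the percentile is the area to the left of x,
# i.e. the standard normal CDF evaluated at the Z score.
percentile = NormalDist().cdf(z)
print(round(percentile, 4))  # 0.9772
```

Note that the Z score itself is defined for any distribution; only the CDF-to-percentile step requires normality.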
normal probability plot
anatomy of a normal probability plot
‣ Data are plotted on the y-axis of a normal probability plot, and theoretical quantiles (following a normal distribution) on the x-axis.
‣ If there is a one-to-one relationship between the data and the theoretical quantiles, then the data follow a nearly normal distribution.
‣ Since a one-to-one relationship would appear as a straight line on a scatter plot, the closer the points are to a perfect straight line, the more confident we can be that the data follow the normal model.
‣ Constructing a normal probability plot requires calculating percentiles and corresponding Z scores for each observation, which is tedious. Therefore we generally rely on software when making these plots.
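For intuition, the by-hand construction looks like this (a sketch with a made-up sample and a common plotting-position formula; in practice software such as R's qqnorm does this):

```python
from statistics import NormalDist

# Sort the data, assign each observation a plotting position (its
# approximate percentile), and look up the matching theoretical
# quantile of the standard normal.
data = sorted([3.1, 2.5, 3.8, 2.9, 3.3, 3.0])  # made-up sample
n = len(data)
points = []
for i, y in enumerate(data, start=1):
    pct = (i - 0.5) / n                # plotting position for the i-th point
    x = NormalDist().inv_cdf(pct)      # theoretical quantile: the x-axis
    points.append((x, y))              # observed value: the y-axis
    print(f"{x:7.3f}  {y}")
```

Plotting these (x, y) pairs and checking how close they lie to a straight line is exactly the normal probability plot described above.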
‣ definition, properties, conditions
‣ calculating probabilities
‣ mean and standard deviation
Bernoulli random variables
‣ each person in Milgram's experiment can be thought of as a trial
‣ a person is labeled a success if she refuses to administer a severe shock, and a failure if she administers such a shock
‣ since only 35% of people refused to administer a shock, probability of success is p = 0.35.
‣ when an individual trial has only two possible outcomes, it is called a Bernoulli random variable
the binomial distribution describes the probability of having exactly k successes in n independent Bernoulli trials with probability of success p
If p represents probability of success, (1 - p) represents probability of failure, n represents number of independent trials, and k represents number of successes, then
P(exactly k successes in n trials) = (n choose k) p^k (1 - p)^(n - k)
1. the trials must be independent
2. the number of trials, n, must be fixed
3. each trial outcome must be classified asa success or a failure
4. the probability of success, p, must bethe same for each trial
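Under these conditions, the binomial probability of exactly k successes can be computed directly (a sketch using the Milgram success probability from the notes; the choice of n = 4 and k = 1 is for illustration):

```python
from math import comb

def binomial_pmf(k, n, p):
    # (n choose k) arrangements of k successes, each occurring
    # with probability p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# p = 0.35 that a person refuses to administer the shock (a success).
# Probability that exactly 1 of 4 people refuses:
print(round(binomial_pmf(1, 4, 0.35), 4))  # 0.3845
```

The pmf values over k = 0..n sum to 1, as required of a probability distribution.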
normal approximation to binomial
‣ shapes of binomial distributions
‣ normal approximation
Success-failure rule: A binomial distribution with at least 10 expected successes and 10 expected failures closely follows a normal distribution.
np ≥ 10
n(1-p) ≥ 10
Normal approximation to the binomial: If the success-failure condition holds,
Binomial(n, p) ~ Normal(μ, σ), with μ = np and σ = √(np(1 - p))
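A quick sketch of the approximation in use (n = 100 is an assumed sample size, paired with the p = 0.35 from the Milgram example):

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.35
# Success-failure rule: at least 10 expected successes and 10 expected failures.
assert n * p >= 10 and n * (1 - p) >= 10

mu = n * p                       # mean of the binomial: 35.0
sigma = sqrt(n * p * (1 - p))    # standard deviation of the binomial
approx = NormalDist(mu, sigma)

# e.g. P(X <= 40) under the normal approximation:
print(round(approx.cdf(40), 4))
```

When the success-failure condition fails, the binomial is skewed and this normal probability can be badly off.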
This article is shared from the WeChat public account 机器学习与统计学 (tjxj666).