Duke@coursera Data Analysis and Statistical Inference, unit2: probability and distributions

Statistician (统计学家)

Published 2019-04-10 17:09:01


This article is included in the column: Machine Learning and Statistics (机器学习与统计学)

random process

In a random process we know what outcomes could happen, but we don't know which particular outcome will happen.

law of large numbers

law of large numbers states that as more observations are collected, the proportion of occurrences with a particular outcome converges to the probability of that outcome.
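A quick simulation illustrates both ideas: each toss is a random process, yet the proportion of heads settles near the true probability as tosses accumulate. This is a minimal sketch assuming a fair coin (P(heads) = 0.5); the function name `proportion_of_heads` and the fixed seed are illustrative choices, not part of the course material.

```python
import random

def proportion_of_heads(n_flips, seed=0):
    """Simulate n_flips tosses of a fair coin and return the proportion of heads."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))  # True counts as 1
    return heads / n_flips

# Small samples wander; large samples stay close to 0.5.
for n in (10, 100, 1_000, 100_000):
    print(n, proportion_of_heads(n))
```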

disjoint events + general addition rule

‣ disjoint events

‣ the general addition rule

‣ sample space

‣ probability distributions

‣ complementary events

disjoint (mutually exclusive)

disjoint (mutually exclusive) events cannot happen at the same time.

‣ the outcome of a single coin toss cannot be a head and a tail.

‣ a student can't both fail and pass a class.

‣ a single card drawn from a deck cannot be an ace and a queen.

non-disjoint events can happen at the same time.

‣ a student can get an A in Stats and an A in Econ in the same semester.

union of disjoint events

For disjoint events A and B, P(A or B) = P(A) + P(B)

union of non-disjoint events

For non-disjoint events A and B, P(A or B) = P(A) + P(B) - P(A and B)

General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
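Both rules can be checked against a direct count over a standard 52-card deck. This sketch assumes the usual deck (13 ranks in 4 suits, hearts and diamonds red); "jack or red" is an illustrative non-disjoint pair chosen here, since the two red jacks belong to both events.

```python
from fractions import Fraction

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
RED = {"hearts", "diamonds"}
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]  # 52 equally likely cards

def prob(event):
    """Probability of an event by counting favourable cards."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

# Disjoint: ace and queen cannot both occur, so the probabilities simply add.
p_ace_or_queen = prob(lambda c: c[0] == "A") + prob(lambda c: c[0] == "Q")

# Non-disjoint: subtract the overlap (red jacks) once, per the general addition rule.
p_jack = prob(lambda c: c[0] == "J")
p_red = prob(lambda c: c[1] in RED)
p_red_jack = prob(lambda c: c[0] == "J" and c[1] in RED)
p_jack_or_red = p_jack + p_red - p_red_jack

print(p_ace_or_queen)  # 2/13
print(p_jack_or_red)   # 7/13
print(p_jack_or_red == prob(lambda c: c[0] == "J" or c[1] in RED))  # True
```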

sample space

a sample space is a collection of all possible outcomes of a trial.

probability distributions

a probability distribution lists all possible outcomes in the sample space, and the probabilities with which they occur.

rules:

1. the events listed must be disjoint

2. each probability must be between 0 and 1

3. the probabilities must total 1
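Rules 2 and 3 can be checked mechanically. The sketch below is a hypothetical helper, `is_valid_distribution`; it uses `math.isclose` so floating-point rounding does not reject a valid list. Rule 1 (disjoint outcomes) depends on how the outcomes are defined, so it cannot be verified from the numbers alone.

```python
import math

def is_valid_distribution(probs):
    """Check rules 2 and 3 for a list of outcome probabilities.

    Rule 1 (the listed outcomes are disjoint) is a property of how the
    outcomes are defined and is not checked here.
    """
    in_range = all(0 <= p <= 1 for p in probs)            # rule 2
    sums_to_one = math.isclose(sum(probs), 1.0)           # rule 3, float-tolerant
    return in_range and sums_to_one

print(is_valid_distribution([0.25, 0.5, 0.25]))  # True
print(is_valid_distribution([0.6, 0.6, -0.2]))   # False: -0.2 is outside [0, 1]
print(is_valid_distribution([0.3, 0.3, 0.3]))    # False: total is 0.9
```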

complementary events

complementary events are two mutually exclusive events whose probabilities add up to 1.

disjoint vs. complementary

Do the probabilities of two disjoint outcomes always add up to 1?

Not necessarily; there may be more than 2 outcomes in the sample space.

Do the probabilities of two complementary outcomes always add up to 1?

Yes, that's the definition of complementary events.
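A fair six-sided die makes the distinction concrete. In this sketch (the die and the chosen events are illustrative assumptions), "roll a 1" and "roll a 2" are disjoint but leave four other faces uncovered, while "even" and "odd" are disjoint and together cover the whole sample space.

```python
from fractions import Fraction

FACES = range(1, 7)  # fair six-sided die, each face has probability 1/6

def prob(event):
    """Probability of an event on one roll, by counting favourable faces."""
    return Fraction(sum(1 for face in FACES if event(face)), len(FACES))

# Disjoint but not complementary: the sum falls short of 1.
disjoint_sum = prob(lambda f: f == 1) + prob(lambda f: f == 2)

# Complementary: disjoint and exhaustive, so the sum is exactly 1.
complementary_sum = prob(lambda f: f % 2 == 0) + prob(lambda f: f % 2 == 1)

print(disjoint_sum)       # 1/3
print(complementary_sum)  # 1
```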

independence

‣ independent events

‣ assessing independence

‣ multiplication rule for independent events

two processes are independent if knowing the outcome of one provides no useful information about the outcome of the other

Checking for independence: if P(A | B) = P(A), then A and B are independent

Product rule for independent events: if A and B are independent, P(A and B) = P(A) × P(B)
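Both checks can be run by enumerating the 36 equally likely outcomes of two fair dice. The events here are illustrative choices (A: the first die shows 6; B: the second die is even), and P(A | B) is computed from the definition P(A and B) / P(B).

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: 36 equally likely (first, second) pairs.
SPACE = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event by counting favourable outcomes."""
    return Fraction(sum(1 for o in SPACE if event(o)), len(SPACE))

p_a = prob(lambda o: o[0] == 6)                        # A: first die shows 6
p_b = prob(lambda o: o[1] % 2 == 0)                    # B: second die is even
p_ab = prob(lambda o: o[0] == 6 and o[1] % 2 == 0)     # A and B

p_a_given_b = p_ab / p_b  # conditional probability P(A | B)

print(p_a_given_b == p_a)  # True: knowing B tells us nothing about A
print(p_ab == p_a * p_b)   # True: the product rule holds
```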

conditional probability

‣ marginal, joint, conditional probability

‣ Bayes’ theorem

‣ general product rule

independence and conditional probabilities

Generically, if P(A | B) = P(A) then the events A and B are said to be independent

‣ Conceptually: knowing B doesn't tell us anything about A

‣ Mathematically: if events A and B are independent, P(A and B) = P(A) × P(B). Then, P(A | B) = P(A and B) / P(B) = P(A) × P(B) / P(B) = P(A)

probability trees

bayesian inference

posterior

‣ The probability we just calculated is also called the posterior probability.

P(H1: good die on the Right | you rolled ≥4 with the die on the Right)

‣ Posterior probability is generally defined as P(hypothesis | data).

‣ It tells us the probability of a hypothesis we set forth, given the data we just observed.

‣ It depends on both the prior probability we set and the observed data.

‣ This is different from what we calculated at the end of the randomization test on gender discrimination: the probability of observed or more extreme data given the null hypothesis being true, i.e. P(data | hypothesis), also called a p-value.
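The posterior follows from Bayes' theorem: P(H1 | data) = P(data | H1) P(H1) / [P(data | H1) P(H1) + P(data | H2) P(H2)]. The dice details are not given in these notes, so this sketch assumes, for illustration only, that the good die is 12-sided, the bad die is 6-sided, the observed data is "rolled ≥4", and the prior is 0.5.

```python
def posterior(prior_h1, lik_h1, lik_h2):
    """Bayes' theorem for two competing hypotheses H1 and H2 = not H1.

    prior_h1: P(H1) before seeing the data
    lik_h1:   P(data | H1)
    lik_h2:   P(data | H2)
    returns:  P(H1 | data)
    """
    joint_h1 = prior_h1 * lik_h1          # P(H1 and data)
    joint_h2 = (1 - prior_h1) * lik_h2    # P(H2 and data)
    return joint_h1 / (joint_h1 + joint_h2)

# ASSUMED setup: good die has 12 faces, bad die has 6 faces.
p_ge4_good = 9 / 12   # faces 4..12 out of 12
p_ge4_bad = 3 / 6     # faces 4..6 out of 6

print(posterior(0.5, p_ge4_good, p_ge4_bad))  # 0.6
```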

updating the prior

‣ In the Bayesian approach, we evaluate claims iteratively as we collect more data.

‣ In the next iteration (roll) we get to take advantage of what we learned from the data.

‣ In other words, we update our prior with our posterior probability from the previous iteration.
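Iterative updating is a loop in which each posterior becomes the next prior. This sketch reuses the same illustrative assumptions as before (good die 12-sided, bad die 6-sided, dice staying in place between rolls), and `update` and `sequential_posterior` are hypothetical names. Each outcome is True when the roll on the chosen side was ≥4.

```python
def update(prior_h1, rolled_ge4, p_good=9 / 12, p_bad=3 / 6):
    """One Bayesian update of P(H1: good die on the right) after one roll.

    p_good, p_bad: ASSUMED chances of rolling >= 4 with the good / bad die.
    """
    lik_h1 = p_good if rolled_ge4 else 1 - p_good  # P(outcome | H1)
    lik_h2 = p_bad if rolled_ge4 else 1 - p_bad    # P(outcome | H2)
    joint_h1 = prior_h1 * lik_h1
    return joint_h1 / (joint_h1 + (1 - prior_h1) * lik_h2)

def sequential_posterior(prior_h1, outcomes):
    """Return the posterior after each roll; each posterior is the next prior."""
    history = []
    for rolled_ge4 in outcomes:
        prior_h1 = update(prior_h1, rolled_ge4)
        history.append(prior_h1)
    return history

print(sequential_posterior(0.5, [True, True, False, True]))
```

Updating roll by roll gives the same final answer as applying Bayes' theorem once to all the rolls together, which is why the approach can absorb data as it arrives.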

recap

‣ Take advantage of prior information, like a previously published study or a physical model.

‣ Naturally integrate data as you collect it, and update your priors.

‣ Avoid the counter-intuitive definition of a p-value:

P(observed or more extreme outcome | H0 is true)

‣ Instead base decisions on the posterior probability:

P(hypothesis is true | observed data)

‣ A good prior helps, a bad prior hurts, but the prior matters less the more data you have.

‣ More advanced Bayesian techniques offer flexibility not present in frequentist models.

normal distribution

‣ normal distribution

‣ 68-95-99.7% rule

‣ standardized scores

‣ probabilities and percentiles

normal distribution

‣ unimodal and symmetric

‣ bell curve

‣ follows very strict guidelines about how variably the data are distributed around the mean

‣ many variables are nearly normal, but none are exactly normal

68 - 95 - 99.7% rule
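For a normal distribution, roughly 68%, 95%, and 99.7% of observations fall within 1, 2, and 3 standard deviations of the mean. The exact areas come from the identity P(|Z| < k) = erf(k / √2) for a standard normal Z; the function name `prob_within` is illustrative.

```python
import math

def prob_within(k):
    """P(|Z| < k) for a standard normal Z: the area within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))  # about 0.6827, 0.9545, 0.9973
```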

standardizing with Z scores

‣ standardized (Z) score of an observation is the number of standard deviations it falls above or below the mean

‣ Z score of mean = 0

‣ unusual observation: |Z| > 2

‣ defined for distributions of any shape
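The Z score is Z = (x - μ) / σ. This sketch uses hypothetical values (mean 100, SD 15) to show the sign convention and the |Z| > 2 rule of thumb for flagging unusual observations.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical values for illustration: mean 100, SD 15.
for x in (100, 130, 70, 145):
    z = z_score(x, 100, 15)
    label = "unusual" if abs(z) > 2 else "not unusual"  # strict |Z| > 2
    print(x, z, label)
```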

percentiles

‣ when the distribution is normal, Z scores can be used to calculate percentiles

‣ percentile is the percentage of observations that fall below a given data point

‣ graphically, percentile is the area below the probability distribution curve to the left of that observation.
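Turning a Z score into a percentile means evaluating the standard normal CDF, and the inverse CDF goes the other way. This sketch uses Python's `statistics.NormalDist` (Python 3.8+); the helper names and the mean/SD values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1

def percentile(x, mean, sd):
    """Fraction of a Normal(mean, sd) population below x: area to the left of x."""
    z = (x - mean) / sd
    return STD_NORMAL.cdf(z)

def value_at_percentile(p, mean, sd):
    """Inverse: the value below which a fraction p of observations fall."""
    return mean + sd * STD_NORMAL.inv_cdf(p)

print(percentile(130, 100, 15))           # about 0.977 (Z = 2)
print(value_at_percentile(0.9, 100, 15))  # about 119.2
```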

normal probability plot

anatomy of a normal probability plot

‣ Data are plotted on the y-axis of a normal probability plot, and theoretical quantiles (following a normal distribution) on the x-axis.

‣ If there is a one-to-one relationship between the data and the theoretical quantiles, then the data follow a nearly normal distribution.

‣ Since a one-to-one relationship would appear as a straight line on a scatter plot, the closer the points are to a perfect straight line, the more confident we can be that the data follow the normal model.

‣ Constructing a normal probability plot requires calculating percentiles and corresponding z-scores for each observation, which is tedious. Therefore we generally rely on software when making these plots.
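The tedious part can be sketched in a few lines: sort the data, assign each observation a percentile, and convert it to a theoretical z quantile. This assumes the plotting position (i - 0.5) / n, one common convention; statistical software may use slightly different positions. The sample values are made up, and the resulting pairs can be handed to any plotting library.

```python
from statistics import NormalDist

def normal_plot_points(data):
    """(theoretical quantile, observed value) pairs for a normal probability plot.

    Theoretical quantiles go on the x-axis, sorted data on the y-axis.
    Uses plotting positions (i - 0.5) / n, an assumed convention.
    """
    observed = sorted(data)
    n = len(observed)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

sample = [4.1, 5.0, 3.8, 4.6, 5.3, 4.4, 4.9]  # made-up data for illustration
for q, y in normal_plot_points(sample):
    print(f"{q:6.2f}  {y}")
```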

binomial distribution

‣ definition, properties, conditions

‣ calculating probabilities

‣ mean and standard deviation

Bernoulli random variables

‣ each person in Milgram's experiment can be thought of as a trial

‣ a person is labeled a success if she refuses to administer a severe shock, and a failure if she administers such a shock

‣ since only 35% of people refused to administer a shock, the probability of success is p = 0.35.

‣ when an individual trial has only two possible outcomes, it is called a Bernoulli random variable

binomial distribution

the binomial distribution describes the probability of having exactly k successes in n independent Bernoulli trials with probability of success p

Binomial distribution:

If p represents probability of success, (1-p) represents probability of failure, n represents number of independent trials, and k represents number of successes, then

P(k successes in n trials) = (n choose k) p^k (1-p)^(n-k), where (n choose k) = n! / (k! (n-k)!)
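The formula translates directly into code with `math.comb` (Python 3.8+), which computes n choose k. The scenario of 4 randomly chosen participants is an illustrative assumption; only p = 0.35 comes from the notes.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Illustration: chance that exactly 1 of 4 randomly chosen participants
# refuses to administer the shock, with p = 0.35.
print(binomial_pmf(1, 4, 0.35))  # about 0.3845
```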

binomial conditions

1. the trials must be independent

2. the number of trials, n, must be fixed

3. each trial outcome must be classified as a success or a failure

4. the probability of success, p, must be the same for each trial
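A simulation that satisfies all four conditions (fixed n, independent draws, a success/failure outcome, the same p every time) can be compared with the standard binomial results: mean np and standard deviation √(np(1-p)). The values n = 100 and p = 0.35, the seed, and the helper name are illustrative assumptions.

```python
import random
import statistics

def simulate_binomial(n, p, reps, seed=1):
    """Count successes in n independent Bernoulli(p) trials, repeated reps times."""
    rng = random.Random(seed)  # independent draws, same p for every trial
    return [sum(rng.random() < p for _ in range(n)) for _ in range(reps)]

n, p = 100, 0.35
counts = simulate_binomial(n, p, reps=10_000)
print(statistics.mean(counts), n * p)                      # simulated vs np
print(statistics.stdev(counts), (n * p * (1 - p)) ** 0.5)  # simulated vs sqrt(np(1-p))
```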

normal approximation to binomial

‣ shapes of binomial distributions

‣ normal approximation

Success-failure rule: A binomial distribution with at least 10 expected successes and 10 expected failures closely follows a normal distribution.

np ≥ 10

n(1-p) ≥ 10

Normal approximation to the binomial: If the success-failure condition holds,

Binomial(n,p) ~ Normal(μ,σ), where μ = np and σ = √(np(1-p))
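The approximation can be checked by comparing an exact binomial probability with its normal counterpart. This sketch uses hypothetical numbers (n = 1000, p = 0.35, P(X ≤ 330)) and no continuity correction; `statistics.NormalDist` and `math.comb` require Python 3.8+.

```python
from math import comb, sqrt
from statistics import NormalDist

def success_failure_ok(n, p):
    """Success-failure condition: at least 10 expected successes and failures."""
    return n * p >= 10 and n * (1 - p) >= 10

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """P(X <= k) using Normal(mu = np, sigma = sqrt(np(1-p)))."""
    return NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(k)

n, p, k = 1000, 0.35, 330  # hypothetical numbers for illustration
print(success_failure_ok(n, p))
print(binomial_cdf(k, n, p), normal_approx_cdf(k, n, p))
```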

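This article is part of the Tencent Cloud self-media sharing program and was shared from a WeChat official account.

Originally published: 2015-05-05. In case of infringement, contact cloudcommunity@tencent.com for removal.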
