
Duke@coursera Data Analysis and Statistical Inference, unit 2: probability and distributions

random process

In a random process we know what outcomes could happen, but we don't know which particular outcome will happen

law of large numbers

law of large numbers states that as more observations are collected, the proportion of occurrences with a particular outcome converges to the probability of that outcome.
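A quick simulation makes this concrete; the fair coin and its probability of 0.5 are illustrative choices, not part of the original notes:

```python
import numpy as np

# Toss a fair coin many times and track the running proportion of heads.
rng = np.random.default_rng(seed=42)
tosses = rng.integers(0, 2, size=100_000)  # 1 = heads, 0 = tails
running_prop = np.cumsum(tosses) / np.arange(1, tosses.size + 1)

# As n grows, the proportion of heads converges to P(heads) = 0.5.
for n in (10, 100, 1_000, 100_000):
    print(f"after {n:>7} tosses: proportion of heads = {running_prop[n - 1]:.4f}")
```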

disjoint events + general addition rule

‣ disjoint events

‣ the general addition rule

‣ sample space

‣ probability distributions

‣ complementary events

disjoint (mutually exclusive)

disjoint (mutually exclusive) events cannot happen at the same time.

‣ the outcome of a single coin toss cannot be a head and a tail.

‣ a student can’t both fail and pass a class.

‣ a single card drawn from a deck cannot be an ace and a queen.

non-disjoint events can happen at the same time.

‣ a student can get an A in Stats and an A in Econ in the same semester.

union of disjoint events

For disjoint events A and B, P(A or B) = P(A) + P(B)

union of non-disjoint events

For non-disjoint events A and B, P(A or B) = P(A) + P(B) - P(A and B)

General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
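A small worked example, using card-draw events of my own choosing (hearts and face cards), shows why the overlap must be subtracted:

```python
from fractions import Fraction

# A = "draw a heart", B = "draw a face card (J, Q, K)" from a 52-card deck.
p_A = Fraction(13, 52)        # 13 hearts
p_B = Fraction(12, 52)        # 12 face cards
p_A_and_B = Fraction(3, 52)   # J, Q, K of hearts fall in both events

# General addition rule: subtract the overlap so it is not counted twice.
p_A_or_B = p_A + p_B - p_A_and_B
print(p_A_or_B)  # 11/26, i.e. 22/52
```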

sample space

a sample space is a collection of all possible outcomes of a trial.

probability distributions

a probability distribution lists all possible outcomes in the sample space, and the probabilities with which they occur

rules:

1. the events listed must be disjoint

2. each probability must be between 0 and 1

3. the probabilities must total 1
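A minimal sketch that checks rules 2 and 3 numerically; the function name and the sample numbers are mine, and rule 1 (disjointness) concerns the events themselves, so it cannot be verified from the probabilities alone:

```python
def is_valid_distribution(probs, tol=1e-9):
    """Rule 2: each probability is between 0 and 1.
    Rule 3: the probabilities total 1 (up to floating-point tolerance)."""
    return all(0 <= p <= 1 for p in probs) and abs(sum(probs) - 1) < tol

print(is_valid_distribution([0.2, 0.5, 0.3]))  # True
print(is_valid_distribution([0.4, 0.4, 0.4]))  # False: totals 1.2
```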

complementary events

complementary events are two mutually exclusive events whose probabilities add up to 1.

disjoint vs. complementary

Does the sum of the probabilities of two disjoint outcomes always add up to 1?

Not necessarily; there may be more than two outcomes in the sample space.

Does the sum of the probabilities of two complementary outcomes always add up to 1?

Yes, that’s the definition of complementary events.

independence

‣ independent events

‣ assessing independence

‣ multiplication rule for independent events

two processes are independent if knowing the outcome of one provides no useful information about the outcome of the other

Checking for independence: if P(A | B) = P(A), then A and B are independent

Product rule for independent events: If A and B are independent, P(A and B) = P(A) × P(B)
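As a sanity check, a simulation of two independent fair coin flips (my own illustrative setup) agrees with the product rule:

```python
import numpy as np

# By the product rule, P(both flips are heads) = 0.5 * 0.5 = 0.25.
rng = np.random.default_rng(seed=1)
flips = rng.integers(0, 2, size=(1_000_000, 2))  # two independent flips per row
both_heads = np.mean((flips[:, 0] == 1) & (flips[:, 1] == 1))
print(f"simulated P(both heads) = {both_heads:.4f}  (product rule: 0.25)")
```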

conditional probability

‣ marginal, joint, conditional probability

‣ Bayes’ theorem

‣ general product rule

independence and conditional probabilities

Generically, if P(A | B) = P(A) then the events A and B are said to be independent

‣ Conceptually: knowing B doesn’t tell us anything about A

‣ Mathematically: If events A and B are independent, P(A and B) = P(A) × P(B). Then, P(A | B) = P(A and B) / P(B) = P(A) × P(B) / P(B) = P(A)

probability trees

bayesian inference

posterior

‣ The probability we just calculated is also called the posterior probability.

P(H1: good die on the Right | you rolled ≥ 4 with the die on the Right)

‣ Posterior probability is generally defined as P(hypothesis | data).

‣ It tells us the probability of a hypothesis we set forth, given the data we just observed.

‣ It depends on both the prior probability we set and the observed data.

‣ This is different than what we calculated at the end of the randomization test on gender discrimination – the probability of observed or more extreme data given the null hypothesis being true, i.e. P(data | hypothesis), also called a p-value.
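The mechanics of that posterior calculation look like this; the likelihoods 0.75 and 0.25 below are hypothetical stand-ins for the course's dice, chosen only to illustrate Bayes' theorem with a 50/50 prior:

```python
def posterior_good_right(prior_good, p_high_good=0.75, p_high_bad=0.25):
    """P(good die on the Right | rolled >= 4 with the die on the Right).

    p_high_good / p_high_bad are hypothetical likelihoods of rolling >= 4
    with the good and bad die respectively; they are not the course's values.
    """
    # Law of total probability for P(data), then Bayes' theorem.
    p_data = prior_good * p_high_good + (1 - prior_good) * p_high_bad
    return prior_good * p_high_good / p_data

print(posterior_good_right(prior_good=0.5))  # 0.75 under these assumptions
```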

updating the prior

‣ In the Bayesian approach, we evaluate claims iteratively as we collect more data.

‣ In the next iteration (roll) we get to take advantage of what we learned from the data.

‣ In other words, we update our prior with our posterior probability from the previous iteration, as sketched below.
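Continuing the hypothetical dice sketch above, each posterior becomes the prior for the next roll:

```python
# Iterative updating with the same hypothetical likelihoods as above.
prior = 0.5
for rolled_at_least_4 in (True, True, False, True):  # an invented roll sequence
    like_good = 0.75 if rolled_at_least_4 else 0.25
    like_bad = 0.25 if rolled_at_least_4 else 0.75
    p_data = prior * like_good + (1 - prior) * like_bad
    prior = prior * like_good / p_data  # the posterior becomes the new prior
    print(f"updated P(good die on the Right) = {prior:.3f}")
```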

recap

‣ Take advantage of prior information, like a previously published study or a physical model.

‣ Naturally integrate data as you collect it, and update your priors.

‣ Avoid the counter-intuitive definition of a p-value:

P(observed or more extreme outcome | H0 is true)

‣ Instead base decisions on the posterior probability:

P(hypothesis is true | observed data)

‣ A good prior helps, a bad prior hurts,but the prior matters less the more data you have.

‣ More advanced Bayesian techniques offer flexibility not present in Frequentist models.

normal distribution

‣ normal distribution

‣ 68-95-99.7% rule

‣ standardized scores

‣ probabilities and percentiles

normal distribution

‣ unimodal and symmetric

‣ bell curve

‣ follows very strict guidelines about how variably the data are distributed around the mean

‣ many variables are nearly normal, but none are exactly normal

68 - 95 - 99.7% rule

‣ roughly 68% of observations fall within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations
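The three percentages can be recovered from the standard normal CDF, for example with scipy:

```python
from scipy.stats import norm

# Area under the standard normal curve within k standard deviations of the mean.
for k in (1, 2, 3):
    area = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} SD: {area:.4f}")
# within 1 SD: 0.6827, within 2 SD: 0.9545, within 3 SD: 0.9973
```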

standardizing with Z scores

‣ standardized (Z) score of an observation is the number of standard deviations it falls above or below the mean

‣ Z score of mean = 0

‣ unusual observation: |Z| > 2

‣ defined for distributions of any shape
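The definition translates directly into code; the numbers in the example are illustrative:

```python
def z_score(x, mean, sd):
    """Number of standard deviations x falls above (+) or below (-) the mean."""
    return (x - mean) / sd

z = z_score(1800, mean=1500, sd=300)                  # illustrative values
print(z, "unusual" if abs(z) > 2 else "not unusual")  # 1.0, not unusual
```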

percentiles

‣ when the distribution is normal, Z scores can be used to calculate percentiles

‣ percentile is the percentage of observations that fall below a given data point

‣ graphically, percentile is the area below the probability distribution curve to the left of that observation.
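With scipy, the percentile for a Z score is the normal CDF, and the inverse (percentile back to Z score) is the quantile function:

```python
from scipy.stats import norm

z = 1.0
print(norm.cdf(z))     # ~0.8413: a Z score of 1 sits near the 84th percentile
print(norm.ppf(0.84))  # ~0.9945: the Z score with 84% of the area to its left
```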

normal probability plot

anatomy of a normal probability plot

‣ Data are plotted on the y-axis of a normal probability plot, and theoretical quantiles (following a normal distribution) on the x-axis.

‣ If there is a one-to-one relationship between the data and the theoretical quantiles, then the data follow a nearly normal distribution.

‣ Since a one-to-one relationship would appear as a straight line on a scatter plot, the closer the points are to a perfect straight line, the more confident we can be that the data follow the normal model.

‣ Constructing a normal probability plot requires calculating percentiles and corresponding z-scores for each observation, which is tedious. Therefore we generally rely on software when making these plots.
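For instance, scipy can draw one; the simulated data below are my own:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Simulated near-normal data; points close to the line support normality.
rng = np.random.default_rng(seed=7)
data = rng.normal(loc=10, scale=2, size=200)

# Theoretical normal quantiles on the x-axis, ordered data on the y-axis.
stats.probplot(data, dist="norm", plot=plt)
plt.show()
```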

binomial distribution

‣ definition, properties, conditions

‣ calculating probabilities

‣ mean and standard deviation

Bernoulli random variables

‣ each person in Milgram’s experiment can be thought of as a trial

‣ a person is labeled a success if she refuses to administer a severe shock, and a failure if she administers such a shock

‣ since only 35% of people refused to administer a shock, the probability of success is p = 0.35.

‣ when an individual trial has only two possible outcomes, it is called a Bernoulli random variable

binomial distribution

the binomial distribution describes the probability of having exactly k successes in n independent Bernoulli trials with probability of success p

Binomial distribution: If p represents probability of success, (1 - p) represents probability of failure, n represents number of independent trials, and k represents number of successes, then

P(exactly k successes in n trials) = C(n, k) × p^k × (1 - p)^(n - k)

where C(n, k) = n! / (k! (n - k)!) counts the ways to choose which k of the n trials are successes.
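A hedged example applying the formula, with p = 0.35 echoing the Milgram example above and n = 10, k = 4 chosen arbitrarily:

```python
from math import comb
from scipy.stats import binom

n, k, p = 10, 4, 0.35
manual = comb(n, k) * p**k * (1 - p)**(n - k)  # the formula, verbatim
print(manual)
print(binom.pmf(k, n, p))  # scipy computes the same probability
```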

binomial conditions

1. the trials must be independent

2. the number of trials, n, must be fixed

3. each trial outcome must be classified as a success or a failure

4. the probability of success, p, must bethe same for each trial

normal approximation to binomial

‣ shapes of binomial distributions

‣ normal approximation

Success-failure rule: A binomial distribution with at least 10 expected successes and 10 expected failures closely follows a normal distribution.

np ≥ 10

n(1-p) ≥ 10

Normal approximation to the binomial: If the success-failure condition holds,

Binomial(n, p) ~ Normal(μ, σ), with μ = np and σ = √(np(1 - p))
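A short sketch comparing the two, with an arbitrary n = 100 (large enough to satisfy the success-failure condition at p = 0.35):

```python
from math import sqrt
from scipy.stats import binom, norm

n, p = 100, 0.35
assert n * p >= 10 and n * (1 - p) >= 10  # success-failure condition holds

mu, sigma = n * p, sqrt(n * p * (1 - p))
# P(X <= 40): exact binomial vs. the normal approximation.
print(binom.cdf(40, n, p))      # exact binomial probability
print(norm.cdf(40, mu, sigma))  # close, though not identical, to the exact value
```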
