
DASI_1_IntroToData

用户1147754
Published 2019-05-27 08:27:47
Included in the column: YoungGy

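anecdotal evidence: judging the whole population from extreme individual cases. For example, citing "my uncle smokes three cigarettes a day and is in great health" to argue that "smoking does no harm to the body".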

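type of data: before processing the data any further, first think about what type it is: qualitative (ordered, i.e. ordinal, or unordered, i.e. nominal) or quantitative (continuous or discrete).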

Correlation does not imply causation

observation lets us establish correlation (more advanced methods can also get at causation); experiment lets us establish causation.

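studies are divided into observational studies and experiments. An observational study can usually only establish association (correlation), while an experiment can establish causation.
Example: determining whether working out affects energy level.
obs: take a group of people who work out and a group who do not, and compare their energy levels. This gives an association, but the difference in energy level is not necessarily caused by working out; other uncontrolled factors (called confounding variables) may be responsible.
exp: randomly assign people from the population to two groups, have one group work out and the other not, then measure energy level. In this respect it resembles the "control of variables" method.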
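The difference between the two study types can be sketched with a small simulation. Everything here is an illustrative assumption, not part of the original notes: sleep quality plays the role of the confounding variable, the effect sizes are arbitrary, and working out is given no effect on energy level at all.

```python
import random
import statistics


def simulate_studies(n=20000, seed=0):
    """Return (observational_diff, experimental_diff) in mean energy level."""
    rng = random.Random(seed)
    obs = {True: [], False: []}  # energy levels keyed by "works out?"
    exp = {True: [], False: []}
    for _ in range(n):
        # Hypothetical confounder: sleep quality drives energy level.
        sleep = rng.gauss(0, 1)
        # By construction, working out has NO effect on energy here.
        energy = 2.0 * sleep + rng.gauss(0, 1)
        # Observational study: well-rested people tend to choose to work out.
        chooses_workout = sleep + rng.gauss(0, 1) > 0
        obs[chooses_workout].append(energy)
        # Experiment: a coin flip decides who works out.
        assigned_workout = rng.random() < 0.5
        exp[assigned_workout].append(energy)
    obs_diff = statistics.fmean(obs[True]) - statistics.fmean(obs[False])
    exp_diff = statistics.fmean(exp[True]) - statistics.fmean(exp[False])
    return obs_diff, exp_diff


obs_diff, exp_diff = simulate_studies()
print(f"observational difference: {obs_diff:.2f}")
print(f"experimental difference:  {exp_diff:.2f}")
```

Under these assumptions the observational comparison shows a clear gap between the two groups even though working out does nothing, while random assignment balances the confounder and the gap shrinks to roughly zero.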

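sample bias
- convenience sample: only easily accessible cases are sampled
- non-response: only a fraction of the randomly sampled people respond
- voluntary response: the sample consists of people who volunteer to respond, so the result depends on who chooses to take part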

sample methods
- simple random sample (SRS): each case is equally likely to be selected.
- stratified sample: divide the population into homogeneous strata, then randomly sample from within each stratum
- clusters: divide the population into clusters, randomly sample a few clusters, then sample all obs within these clusters
- multistage: like clusters, but randomly sample within the chosen clusters (e.g. when surveying a city, divide it into districts; this avoids having to visit every district)
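The four methods above can be sketched with the standard library. The population, the `district` field, and all function names are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def group_by(population, key):
    """Group units by the value of unit[key]."""
    groups = defaultdict(list)
    for unit in population:
        groups[unit[key]].append(unit)
    return groups


def simple_random_sample(population, n, rng):
    # Every unit is equally likely to be selected.
    return rng.sample(population, n)


def stratified_sample(population, key, n_per_stratum, rng):
    # Sample from within EVERY stratum.
    sample = []
    for members in group_by(population, key).values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


def cluster_sample(population, key, n_clusters, rng):
    # Pick a few clusters, then keep ALL units in them.
    groups = group_by(population, key)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [unit for c in chosen for unit in groups[c]]


def multistage_sample(population, key, n_clusters, n_per_cluster, rng):
    # Pick a few clusters, then sample WITHIN each chosen cluster.
    groups = group_by(population, key)
    chosen = rng.sample(sorted(groups), n_clusters)
    sample = []
    for c in chosen:
        sample.extend(rng.sample(groups[c], n_per_cluster))
    return sample


rng = random.Random(42)
# Hypothetical city: 100 residents spread evenly over 5 districts.
population = [{"id": i, "district": f"D{i % 5}"} for i in range(100)]

srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(population, "district", 3, rng)
clus = cluster_sample(population, "district", 2, rng)
multi = multistage_sample(population, "district", 2, 4, rng)
```

Note the contrast: the stratified sample touches every district, while the cluster and multistage samples visit only the two districts that were drawn.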

principles of experimental design
1. control: compare the treatment of interest to a control group
2. randomize: randomly assign subjects to treatments
3. replicate: collect a sufficiently large sample, or replicate the entire study
4. block: block for variables known or suspected to affect the outcome

more on blocking
design an experiment investigating whether energy gels help you run faster
treatment: energy gel
control: no energy gel
block: energy gel might affect pro and amateur athletes differently
block for pro status:
1. divide the sample into pro and amateur
2. randomly assign pro and amateur athletes to treatment and control groups
3. pro and amateur athletes are equally represented in both groups
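The blocking steps for the energy gel example can be sketched as follows. The subject list, the `status` field, and the function name are hypothetical; the even split within each block is one simple way to keep both groups balanced.

```python
import random


def blocked_assignment(subjects, block_key, rng):
    """Randomly split each block in half between treatment and control."""
    # Step 1: divide the sample into blocks (here: pro vs amateur).
    blocks = {}
    for subject in subjects:
        blocks.setdefault(subject[block_key], []).append(subject)

    assignment = {"treatment": [], "control": []}
    for members in blocks.values():
        # Step 2: randomize within the block.
        shuffled = members[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        # Step 3: each block contributes equally to both groups.
        assignment["treatment"].extend(shuffled[:half])
        assignment["control"].extend(shuffled[half:])
    return assignment


rng = random.Random(7)
# Hypothetical sample: 8 pro and 12 amateur athletes.
subjects = [{"id": i, "status": "pro"} for i in range(8)]
subjects += [{"id": i, "status": "amateur"} for i in range(8, 20)]

groups = blocked_assignment(subjects, "status", rng)


def count(group, status):
    return sum(1 for s in groups[group] if s["status"] == status)


print(count("treatment", "pro"), count("control", "pro"))
print(count("treatment", "amateur"), count("control", "amateur"))
```

Because the randomization happens inside each block, pro status cannot end up unevenly distributed between the energy gel group and the control group.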

experimental terminology
1. placebo: fake treatment, often used as the control group for medical studies
2. placebo effect: showing change despite being on the placebo (subjects believe they are being treated; the cause is psychological)
3. blinding: experimental units don't know which group they're in
4. double-blind: both the experimental units and the researchers don't know the group assignment

random sampling and random assignment
1. random sampling: in an observational study, randomly sample from the population.
2. random assignment: in an experiment, randomly assign subjects to the treatment and control groups.
3. random sampling happens first, then random assignment.
4. only a study using both random sampling and random assignment can be both causal and generalizable.

modality
1. unimodal
2. bimodal
3. uniform
4. multimodal

robust statistics
center: median, not mean
spread: IQR, not SD or range
robust statistics are good at describing skewed data with extreme observations.
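A minimal sketch of why the median and IQR are called robust, using a made-up data set in which one value is replaced by an extreme observation. The `iqr` helper is defined here for illustration and relies on the default (exclusive) method of `statistics.quantiles`.

```python
import statistics


def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = data[:-1] + [900]  # replace the largest value with an extreme one

for name, values in [("original", data), ("with outlier", with_outlier)]:
    print(
        f"{name:>12}: "
        f"mean={statistics.mean(values):.1f} "
        f"median={statistics.median(values)} "
        f"sd={statistics.stdev(values):.1f} "
        f"iqr={iqr(values)}"
    )
```

In this example the mean and SD shift dramatically once the extreme value is introduced, while the median and IQR stay the same.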

transformation
1. (natural) log transformation: often applied when much of the data cluster near zero (relative to the larger values in the data set) and all observations are positive. For example, log-transforming right-skewed data gives data that is less skewed and has fewer extreme values.
2. square root
3. inverse
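The effect of the log transformation on skew can be checked numerically. This is a sketch on simulated data: the log-normal sample and the moment-based `skewness` helper are assumptions made for illustration, not something the notes prescribe.

```python
import math
import random
import statistics


def skewness(values):
    """Moment-based sample skewness (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


rng = random.Random(1)
# Simulated right-skewed, strictly positive data clustered near zero.
raw = [rng.lognormvariate(0, 1) for _ in range(5000)]
logged = [math.log(v) for v in raw]

print(f"skewness before log: {skewness(raw):.2f}")
print(f"skewness after log:  {skewness(logged):.2f}")
```

The raw sample is strongly right-skewed, and after taking logs the skewness is close to zero. The same comparison could be repeated with `math.sqrt(v)` or `1 / v` to try the other two transformations.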

goals of transformations
1. see the data structure differently
2. reduce skew and assist in modeling
3. straighten a nonlinear relationship in a scatterplot

Originally published: 2015-09-15
