
[Study] Reading Notes on "R in Action" (Chapter 6)

小莹莹
Published 2018-04-19 15:23:52

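A reading group is an activity for broadening horizons, developing big-picture thinking, exchanging knowledge, and enriching life. The PPV课 R language reading group works under the motto "learn, share, improve": members collaborate on the close reading and discussion of specialized R books in order to learn and study the R language. Books are recommended by a tutor or by group members and chosen after discussion. The group reads one book per month, reads it closely, and shares the results with everyone.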

Chapter 6: Basic Graphs

Chapter overview

1 Bar plots, box plots, and dot plots

2 Pie charts and fan plots

3 Histograms and kernel density plots

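The content of this chapter is summarized below.

Data visualization is an effective way to understand data. R provides a rich set of plotting functions, and graphs help in understanding both categorical and continuous variables. The chapter covers two tasks:

1 Visualizing the distribution of a variable

2 Comparing results across groups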

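Bar plots

A bar plot shows the frequency distribution of a categorical variable as vertical or horizontal bars. The basic form is:

barplot(height)

where height is a vector or matrix of counts. An example follows.

Data source: the Arthritis dataset in the vcd package.

The Arthritis dataset is described as follows.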

Data from Koch & Edwards (1988) from a double-blind clinical trial investigating a new treatment for rheumatoid arthritis.
> data(Arthritis, package="vcd")
> Arthritis
   ID Treatment    Sex Age Improved
1  57   Treated   Male  27     Some
2  46   Treated   Male  29     None
3  77   Treated   Male  30     None
4  17   Treated   Male  32   Marked
5  36   Treated   Male  46   Marked
6  23   Treated   Male  58   Marked
7  75   Treated   Male  59     None
8  39   Treated   Male  59   Marked
9  33   Treated   Male  63     None
10 55   Treated   Male  63     None
11 30   Treated   Male  64     None
12  5   Treated   Male  64     Some
13 63   Treated   Male  69     None
14 83   Treated   Male  70   Marked
15 66   Treated Female  23     None
16 40   Treated Female  32     None
17  6   Treated Female  37     Some
18  7   Treated Female  41     None
19 72   Treated Female  41   Marked
20 37   Treated Female  48     None
21 82   Treated Female  48   Marked
22 53   Treated Female  55   Marked
23 79   Treated Female  55   Marked
24 26   Treated Female  56   Marked
25 28   Treated Female  57   Marked
26 60   Treated Female  57   Marked
27 22   Treated Female  57   Marked
28 27   Treated Female  58     None
29  2   Treated Female  59   Marked
30 59   Treated Female  59   Marked
31 62   Treated Female  60   Marked
32 84   Treated Female  61   Marked
33 64   Treated Female  62     Some
34 34   Treated Female  62   Marked
35 58   Treated Female  66   Marked
36 13   Treated Female  67   Marked
37 61   Treated Female  68     Some
38 65   Treated Female  68   Marked
39 11   Treated Female  69     None
40 56   Treated Female  69     Some
41 43   Treated Female  70     Some
42  9   Placebo   Male  37     None
43 14   Placebo   Male  44     None
44 73   Placebo   Male  50     None
45 74   Placebo   Male  51     None
46 25   Placebo   Male  52     None
47 18   Placebo   Male  53     None
48 21   Placebo   Male  59     None
49 52   Placebo   Male  59     None
50 45   Placebo   Male  62     None
51 41   Placebo   Male  62     None
52  8   Placebo   Male  63   Marked
53 80   Placebo Female  23     None
54 12   Placebo Female  30     None
55 29   Placebo Female  30     None
56 50   Placebo Female  31     Some
57 38   Placebo Female  32     None
58 35   Placebo Female  33   Marked
59 51   Placebo Female  37     None
60 54   Placebo Female  44     None
61 76   Placebo Female  45     None
62 16   Placebo Female  46     None
63 69   Placebo Female  48     None
64 31   Placebo Female  49     None
65 20   Placebo Female  51     None
66 68   Placebo Female  53     None
67 81   Placebo Female  54     None
68  4   Placebo Female  54     None
69 78   Placebo Female  54   Marked
70 70   Placebo Female  55   Marked
71 49   Placebo Female  57     None
72 10   Placebo Female  57     Some
73 47   Placebo Female  58     Some
74 44   Placebo Female  59     Some
75 24   Placebo Female  59   Marked
76 48   Placebo Female  61     None
77 19   Placebo Female  63     Some
78  3   Placebo Female  64     None
79 67   Placebo Female  65   Marked
80 32   Placebo Female  66     None
81 42   Placebo Female  66     None
82 15   Placebo Female  66     Some
83 71   Placebo Female  68     Some
84  1   Placebo Female  74   Marked
> rm(list=ls())
> library(vcd)
Loading required package: grid
> counts <- table(Arthritis$Improved)
> counts
  None   Some Marked
    42     14     28
> par(mfrow=c(1,2))
> barplot(counts, main="Simple Bar Plot", xlab="Improvement", ylab="Frequency")
> barplot(counts, main="Horizontal Bar Plot", xlab="Frequency", ylab="Improvement", horiz=TRUE)

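The result is shown in Figure 1.

Figure 1: Simple vertical and horizontal bar plots.

Note: if the categorical variable is already a factor, there is no need to tabulate it with table() first. Passing the factor directly to plot() draws a vertical bar plot. barplot() itself expects counts (a vector or matrix), not a factor.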
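
The note above can be illustrated with a minimal sketch. The `improved` factor below is hypothetical data made up for the illustration (it is not the Arthritis dataset), so that the example runs with base R alone.

```r
# Hypothetical responses, invented for illustration (not the Arthritis data)
improved <- factor(c("None", "Some", "Marked", "None", "Marked", "None"),
                   levels = c("None", "Some", "Marked"))

# Explicit tabulation: barplot() needs counts
counts <- table(improved)

par(mfrow = c(1, 2))
barplot(counts, main = "barplot(table(x))")
# plot() on a factor tabulates it internally and draws the same bars
plot(improved, main = "plot(x)")
```

Both panels should show the same three bars, in the order given by the factor levels.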

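Stacked and grouped bar plots

When the input is a matrix instead of a vector, barplot() draws a stacked bar plot by default and a grouped bar plot when beside=TRUE. An example follows.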

> rm(list=ls())
> library(vcd)
Loading required package: grid
> counts <- table(Arthritis$Improved, Arthritis$Treatment)
> counts
         Placebo Treated
  None        29      13
  Some         7       7
  Marked       7      21
> # Stacked: one bar per treatment, segmented by improvement level
> barplot(counts, main="Stacked Bar Plot",
+ xlab="Treatment", ylab="Frequency",
+ col=c("red", "yellow", "green"),
+ legend=rownames(counts))
> # Grouped: beside=TRUE draws the improvement levels side by side
> barplot(counts, main="Grouped Bar Plot",
+ xlab="Treatment", ylab="Frequency",
+ col=c("red", "yellow", "green"),
+ legend=rownames(counts), beside=TRUE)

The result is shown in Figure 2.

Figure 2: Stacked and grouped bar plots
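
To make the difference between the two layouts concrete, the following sketch inspects what barplot() returns in each case. The counts are typed in by hand from the table printed in the example, so the vcd package is not required; the variable names `mid_stacked` and `mid_grouped` are chosen here for illustration.

```r
# Counts copied by hand from the Improved x Treatment table in the example
counts <- matrix(c(29, 7, 7, 13, 7, 21), nrow = 3,
                 dimnames = list(c("None", "Some", "Marked"),
                                 c("Placebo", "Treated")))

par(mfrow = c(1, 2))
# Stacked: one bar per column, so one midpoint per column is returned
mid_stacked <- barplot(counts, main = "Stacked")
# Grouped: one bar per cell, so a matrix of midpoints is returned
mid_grouped <- barplot(counts, main = "Grouped", beside = TRUE)

length(mid_stacked)   # number of stacked bars
dim(mid_grouped)      # rows x columns of grouped bars
colSums(counts)       # total height of each stacked bar
```

The returned midpoints are useful when adding text or error bars on top of the bars with text() or arrows().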

Mean bar plots, in which the height of each bar is a mean instead of a count.

An example follows.

> rm(list=ls())
> states <- data.frame(state.region, state.x77)
> # Mean illiteracy rate per region
> means <- aggregate(states$Illiteracy, by=list(state.region), FUN=mean)
> means
        Group.1        x
1     Northeast 1.000000
2         South 1.737500
3 North Central 0.700000
4          West 1.023077
> # Sort ascending so the bars increase from left to right
> means <- means[order(means$x),]
> means
        Group.1        x
3 North Central 0.700000
1     Northeast 1.000000
4          West 1.023077
2         South 1.737500
> barplot(means$x, names.arg=means$Group.1)
> title("Mean Illiteracy Rate")
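
The same plot can also be sketched with tapply(), which returns a named vector that barplot() labels automatically. This is an alternative to the aggregate() approach in the notes, not the book's own listing; `illiteracy` and `means` are names chosen here.

```r
# state.x77 and state.region ship with base R (datasets package)
illiteracy <- state.x77[, "Illiteracy"]

# Mean per region as a named vector, then sorted ascending
means <- tapply(illiteracy, state.region, mean)
means <- means[order(means)]

# The names of the vector become the bar labels
barplot(means, main = "Mean Illiteracy Rate")
```

Because the names travel with the values, there is no separate names.arg argument to keep in sync after sorting.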

Further reading: the barplot2() function in the gplots package draws enhanced bar plots; see http://addictedtor.free.fr/graphiques

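Pie charts

Pie charts are drawn with the pie() function. The form is:

pie(x, labels)

where x is a vector of non-negative values giving the area of each slice and labels is a character vector of slice labels. An example follows.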

> rm(list=ls())
> slices <- c(10, 12, 4, 16, 8)
> lbls <- c("US", "UK", "Australia", "Germany", "France")
> pie(slices, labels = lbls, main="Simple Pie Chart")

The result is shown in Figure 3.

Figure 3: Pie chart
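
Slice areas are hard to compare by eye, so a common refinement is to put the percentages into the labels. A minimal sketch using the same slices as the example follows; the names `pct` and `lbls2` and the rainbow() palette are choices made here for illustration.

```r
slices <- c(10, 12, 4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")

# Share of each slice as a whole-number percentage
pct <- round(slices / sum(slices) * 100)

# Append the percentage to each label, e.g. "US 20%"
lbls2 <- paste0(lbls, " ", pct, "%")

pie(slices, labels = lbls2, col = rainbow(length(lbls2)),
    main = "Pie Chart with Percentages")
```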

Further reading: the fan.plot() function in the plotrix package.

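Histograms

A histogram shows the distribution of a continuous variable by dividing its range into bins and plotting the frequency of values in each bin. The form is:

hist(x)

An example follows.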

> rm(list=ls())
> par(mfrow=c(1,2))
> # Default histogram
> hist(mtcars$mpg)
> # Density scale (freq=FALSE) so that a density curve can be overlaid
> hist(mtcars$mpg,
+ freq=FALSE,
+ breaks=12,
+ col="red",
+ xlab="Miles Per Gallon",
+ main="Histogram, rug plot, density curve")
> rug(jitter(mtcars$mpg))
> lines(density(mtcars$mpg), col="blue", lwd=2)

The result is shown in Figure 4.

Figure 4: Histograms
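
The reason for freq=FALSE in the second histogram can be checked numerically. hist() returns an object holding the bin breaks, counts, and densities; the sketch below computes it without drawing (plot=FALSE). The names `h` and `area` are chosen here. Note that breaks=12 is only a suggestion to hist(), so the actual number of bins may differ.

```r
# Compute the histogram without plotting it
h <- hist(mtcars$mpg, breaks = 12, plot = FALSE)

h$breaks            # bin boundaries actually used
sum(h$counts)       # every observation falls into exactly one bin

# On the density scale the bar areas sum to 1, which is what makes
# the bars comparable with a kernel density curve
area <- sum(h$density * diff(h$breaks))
area
```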

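Kernel density plots

A kernel density plot is an effective way to view the distribution of a continuous variable. The form is:

plot(density(x))

An example follows.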

> rm(list=ls())
> d <- density(mtcars$mpg)
> # Default plot
> plot(d)
> # Same curve with a title, filled in red with a blue border
> plot(d, main="Kernel Density of Miles Per Gallon")
> polygon(d, col="red", border="blue")
> rug(mtcars$mpg, col="brown")

The result is shown in Figure 5.

Figure 5: Kernel density plot
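
The smoothness of the curve is controlled by the bandwidth. As a rough sketch, the adjust argument of density() scales the default bandwidth, so values below 1 give a wigglier curve. The names `d_narrow` and `area` are chosen here for illustration.

```r
d <- density(mtcars$mpg)                       # default bandwidth
d_narrow <- density(mtcars$mpg, adjust = 0.5)  # half the default bandwidth

d$bw          # bandwidth actually used
d_narrow$bw   # half of d$bw

# The curve is evaluated on an equally spaced grid; a simple Riemann sum
# over that grid should come out close to 1
area <- sum(d$y) * diff(d$x[1:2])

plot(d, main = "Effect of bandwidth")
lines(d_narrow, col = "blue", lty = 2)
```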

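Further reading: the sm.density.compare() function in the sm package.

Box plots

A box plot (box-and-whisker plot) describes the distribution of a continuous variable with its five-number summary: the minimum, the lower quartile (25th percentile), the median, the upper quartile (75th percentile), and the maximum. Box plots are drawn with the boxplot() function.

An example follows.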

> rm(list=ls())
> boxplot(mtcars$mpg, main="Box plot", ylab="Miles per Gallon")

The result, with its components annotated, is shown in Figure 6.

Figure 6: Box plot
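
The numbers behind the box can be retrieved with boxplot.stats(). One detail worth knowing is that, by default, the whiskers only extend to the most extreme points within 1.5 times the interquartile range of the box, and anything beyond is reported as an outlier. The sketch below checks this for mtcars$mpg; the names `s` and `five` are chosen here.

```r
s <- boxplot.stats(mtcars$mpg)

s$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out     # points beyond the whiskers (empty for this variable)

# With no outliers, the whisker ends coincide with the minimum and maximum,
# so the statistics match Tukey's five-number summary
five <- fivenum(mtcars$mpg)
five
```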

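Further reading: the vioplot() function in the vioplot package.

Dot plots

A dot plot displays a set of labeled values along a simple horizontal scale. The form is:

dotchart(x, labels=)

where x is a numeric vector and labels is a vector of labels, one per point. An example follows.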

> dotchart(mtcars$mpg, labels=row.names(mtcars),
+ cex=.7,
+ main="Gas Mileage for Car Models",
+ xlab="Miles Per Gallon")

The result is shown in Figure 7.

Figure 7: Dot plot
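
Dot plots tend to be more informative when the values are sorted and grouped. The following is a sketch along those lines, sorting by mileage and grouping by number of cylinders through the groups argument of dotchart(). The data frame name `x` is chosen here for illustration.

```r
# Sort the cars by mileage, lowest first
x <- mtcars[order(mtcars$mpg), ]

# The grouping variable must be a factor
x$cyl <- factor(x$cyl)

dotchart(x$mpg, labels = row.names(x), groups = x$cyl, cex = 0.7,
         main = "Gas Mileage for Car Models\ngrouped by cylinder",
         xlab = "Miles Per Gallon")
```

Within each cylinder group the cars keep their sorted order, which makes the overlap between groups easy to read off.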

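Summary

1 Data visualization techniques

2 Drawing several common graphs in R (bar plots, pie charts, fan plots, histograms, kernel density plots, box plots, and dot plots)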

Resources

1 http://www.wangluqing.com/2014/06/r-in-action-note8/

2 "R in Action", Part II, Chapter 6

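The articles in this column are provided by the PPV课 R language reading group. When reposting, please credit the PPV课 R language reading group.

All rights reserved.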

Originally published 2014-07-31 on the PPV课数据科学社区 WeChat official account.
