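A reading group is an activity meant to broaden horizons, build big-picture thinking, exchange knowledge, and enrich everyday life. The PPV课 R Language Reading Group follows the motto "learn, share, improve": members work together to read specialist R books closely and present them to one another, with the goal of learning and studying the R language. Books are recommended by the tutors or by group members, the title is chosen after discussion, and the group reads one book closely each month and shares notes together.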
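Chapter 6: Basic Graphs
Chapter outline
1 Bar plots, box plots, and dot plots
2 Pie charts and fan plots
3 Histograms and kernel density plots
The content of this chapter is summarized below.
Data visualization makes data much easier to understand. R provides a rich set of plotting functions, and graphs help in understanding both categorical and continuous variables. The chapter has two aims:
1 Visualizing the distribution of a variable
2 Comparing results across groups
Bar plots
A bar plot shows the frequency distribution of a categorical variable using vertical or horizontal bars. The general form is shown below, where height is a vector or matrix of values.
barplot(height)
An example follows.
Data source: the Arthritis data set from the vcd package.
The Arthritis data set is described as follows.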
Data from Koch & Edwards (1988) from a double-blind clinical trial investigating a new treatment for rheumatoid arthritis.
> library(vcd)  # provides the Arthritis data set
> Arthritis
ID Treatment Sex Age Improved
1 57 Treated Male 27 Some
2 46 Treated Male 29 None
3 77 Treated Male 30 None
4 17 Treated Male 32 Marked
5 36 Treated Male 46 Marked
6 23 Treated Male 58 Marked
7 75 Treated Male 59 None
8 39 Treated Male 59 Marked
9 33 Treated Male 63 None
10 55 Treated Male 63 None
11 30 Treated Male 64 None
12 5 Treated Male 64 Some
13 63 Treated Male 69 None
14 83 Treated Male 70 Marked
15 66 Treated Female 23 None
16 40 Treated Female 32 None
17 6 Treated Female 37 Some
18 7 Treated Female 41 None
19 72 Treated Female 41 Marked
20 37 Treated Female 48 None
21 82 Treated Female 48 Marked
22 53 Treated Female 55 Marked
23 79 Treated Female 55 Marked
24 26 Treated Female 56 Marked
25 28 Treated Female 57 Marked
26 60 Treated Female 57 Marked
27 22 Treated Female 57 Marked
28 27 Treated Female 58 None
29 2 Treated Female 59 Marked
30 59 Treated Female 59 Marked
31 62 Treated Female 60 Marked
32 84 Treated Female 61 Marked
33 64 Treated Female 62 Some
34 34 Treated Female 62 Marked
35 58 Treated Female 66 Marked
36 13 Treated Female 67 Marked
37 61 Treated Female 68 Some
38 65 Treated Female 68 Marked
39 11 Treated Female 69 None
40 56 Treated Female 69 Some
41 43 Treated Female 70 Some
42 9 Placebo Male 37 None
43 14 Placebo Male 44 None
44 73 Placebo Male 50 None
45 74 Placebo Male 51 None
46 25 Placebo Male 52 None
47 18 Placebo Male 53 None
48 21 Placebo Male 59 None
49 52 Placebo Male 59 None
50 45 Placebo Male 62 None
51 41 Placebo Male 62 None
52 8 Placebo Male 63 Marked
53 80 Placebo Female 23 None
54 12 Placebo Female 30 None
55 29 Placebo Female 30 None
56 50 Placebo Female 31 Some
57 38 Placebo Female 32 None
58 35 Placebo Female 33 Marked
59 51 Placebo Female 37 None
60 54 Placebo Female 44 None
61 76 Placebo Female 45 None
62 16 Placebo Female 46 None
63 69 Placebo Female 48 None
64 31 Placebo Female 49 None
65 20 Placebo Female 51 None
66 68 Placebo Female 53 None
67 81 Placebo Female 54 None
68 4 Placebo Female 54 None
69 78 Placebo Female 54 Marked
70 70 Placebo Female 55 Marked
71 49 Placebo Female 57 None
72 10 Placebo Female 57 Some
73 47 Placebo Female 58 Some
74 44 Placebo Female 59 Some
75 24 Placebo Female 59 Marked
76 48 Placebo Female 61 None
77 19 Placebo Female 63 Some
78 3 Placebo Female 64 None
79 67 Placebo Female 65 Marked
80 32 Placebo Female 66 None
81 42 Placebo Female 66 None
82 15 Placebo Female 66 Some
83 71 Placebo Female 68 Some
84 1 Placebo Female 74 Marked
> rm(list=ls())
> counts <- table(Arthritis$Improved)
> counts
  None   Some Marked
    42     14     28
> par(mfrow=c(1,2))
> barplot(counts, main="Simple Bar Plot", xlab="Improvement", ylab="Frequency")
> barplot(counts, main="Horizontal Bar Plot", xlab="Frequency", ylab="Improvement", horiz=TRUE)
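The result is shown in Figure 1.
Figure 1: Simple vertical and horizontal bar plots.
Note: if the categorical variable is a factor (or ordered factor), there is no need to tabulate it with table() first. Pass it directly to plot(), which draws a bar plot of the level counts. barplot() itself expects a numeric vector or matrix, so it still needs the tabulated counts.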
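The note above can be checked with a small sketch. To avoid depending on the vcd package, it rebuilds a factor with the same level counts as Arthritis$Improved (42, 14, 28, taken from the output shown earlier); the variable name improved is an illustrative assumption.

```r
# Rebuild a factor with the same level counts as Arthritis$Improved
improved <- factor(rep(c("None", "Some", "Marked"), times = c(42, 14, 28)),
                   levels = c("None", "Some", "Marked"))

# plot() on a factor tabulates the levels and draws a bar plot directly
plot(improved, main = "Bar plot from a factor",
     xlab = "Improvement", ylab = "Frequency")

# barplot() needs the counts, so tabulate the factor first
counts <- table(improved)
barplot(counts, main = "Bar plot from a table",
        xlab = "Improvement", ylab = "Frequency")
```

Both calls should produce the same picture; the only difference is who does the counting.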
Stacked and grouped bar plots
An example follows.
> rm(list=ls())
> library(vcd)
Loading required package: grid
> counts <- table(Arthritis$Improved, Arthritis$Treatment)
> counts
         Placebo Treated
  None        29      13
  Some         7       7
  Marked       7      21
> # Stacked bars: one bar per treatment, segments are improvement levels
> barplot(counts, main="Stacked Bar Plot",
+ xlab="Treatment", ylab="Frequency",
+ col=c("red", "yellow", "green"),
+ legend=rownames(counts))
> # Grouped bars: beside=TRUE places the levels side by side
> barplot(counts, main="Grouped Bar Plot",
+ xlab="Treatment", ylab="Frequency",
+ col=c("red", "yellow", "green"),
+ legend=rownames(counts), beside=TRUE)
The result is shown in Figure 2.
Figure 2: Stacked and grouped bar plots.
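When the groups differ in size, relative frequencies can be easier to compare than raw counts. The following is a minimal sketch, not part of the original notes: it types in the cross-tabulation printed above as a matrix (so vcd is not required) and uses prop.table() to convert each treatment column to proportions.

```r
# Cross-tabulation of Improved by Treatment, copied from the output above
counts <- matrix(c(29, 7, 7, 13, 7, 21), nrow = 3,
                 dimnames = list(Improved = c("None", "Some", "Marked"),
                                 Treatment = c("Placebo", "Treated")))

# margin = 2 scales each column so that it sums to 1
props <- prop.table(counts, margin = 2)

barplot(props, main = "Stacked Bar Plot (proportions)",
        xlab = "Treatment", ylab = "Proportion",
        col = c("red", "yellow", "green"),
        legend.text = rownames(props))
```

Each bar now has height 1, so the segments show the share of each improvement level within a treatment group.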
Mean bar plots, in which each bar represents a group mean rather than a count.
An example follows.
> rm(list=ls())
> states <- data.frame(state.region,state.x77)
> means <- aggregate(states$Illiteracy, by=list(state.region),FUN=mean)
> means
        Group.1        x
1     Northeast 1.000000
2         South 1.737500
3 North Central 0.700000
4          West 1.023077
> # Sort the regions by mean illiteracy, smallest first
> means <- means[order(means$x),]
> means
        Group.1        x
3 North Central 0.700000
1     Northeast 1.000000
4          West 1.023077
2         South 1.737500
> barplot(means$x, names.arg=means$Group.1)
> title("Mean Illiteracy Rate")
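As an alternative sketch (an assumption of these notes, not the book's code), the same group means can be computed as a named vector with tapply(). barplot() then takes the bar labels from the names, so names.arg is not needed.

```r
states <- data.frame(state.region, state.x77)

# Mean illiteracy per region as a named vector, sorted ascending
means <- sort(tapply(states$Illiteracy, states$state.region, mean))

# Bar labels come from names(means)
barplot(means, main = "Mean Illiteracy Rate")
```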
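Further reading: the barplot2() function in the gplots package provides enhanced bar plots. See http://addictedtor.free.fr/graphiques
Pie charts
Pie charts are drawn with the pie() function, whose general form is:
pie(x, labels)
An example follows.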
> rm(list=ls())
> slices <- c(10, 12, 4, 16, 8)
> lbls <- c("US", "UK", "Australia", "Germany", "France")
> pie(slices, labels = lbls, main="Simple Pie Chart")
The result is shown in Figure 3.
Figure 3: Pie chart.
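Slice areas are hard to judge by eye, so it often helps to print the percentages in the labels. A minimal sketch using the same data; the names pct and lbls_pct are illustrative.

```r
slices <- c(10, 12, 4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")

# Convert the raw values to whole-number percentages of the total
pct <- round(slices / sum(slices) * 100)
lbls_pct <- paste0(lbls, " ", pct, "%")

pie(slices, labels = lbls_pct, col = rainbow(length(lbls_pct)),
    main = "Pie Chart with Percentages")
```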
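Further reading: the fan.plot() function in the plotrix package.
Histograms
A histogram shows the distribution of a continuous variable by dividing its range into bins and plotting the frequency of values in each bin. The general form is:
hist(x)
An example follows.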
> rm(list=ls())
> par(mfrow=c(1,2))
> hist(mtcars$mpg)
> hist(mtcars$mpg,
+ freq=FALSE,
+ breaks=12,
+ col="red",
+ xlab="Miles Per Gallon",
+ main="Histogram, rug plot, density curve")
> rug(jitter(mtcars$mpg))
> lines(density(mtcars$mpg), col="blue", lwd=2)
The result is shown in Figure 4.
Figure 4: Histograms.
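The second call uses freq=FALSE so that the bars are drawn on the density scale, which is what allows the density curve to be overlaid. A short sketch of what hist() computes, using plot=FALSE to return the object without drawing it:

```r
# Compute the histogram without plotting it
h <- hist(mtcars$mpg, breaks = 12, plot = FALSE)

h$breaks   # bin boundaries (breaks = 12 is only a suggestion to hist())
h$counts   # number of cars in each bin

# Every observation falls in exactly one bin
total <- sum(h$counts)

# On the density scale the bar areas sum to 1
area <- sum(h$density * diff(h$breaks))
```

Here total equals the number of rows in mtcars and area equals 1, which is why the histogram and the density curve share a vertical scale.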
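Kernel density plots
A kernel density plot is an effective way to view the distribution of a continuous variable. The general form is:
plot(density(x))
An example follows.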
> rm(list=ls())
> d <- density(mtcars$mpg)
> plot(d)
> plot(d, main="Kernel Density of Miles Per Gallon")
> polygon(d, col="red", border="blue")
> rug(mtcars$mpg, col="brown")
The result is shown in Figure 5.
Figure 5: Kernel density plot.
Further reading: the sm.density.compare() function in the sm package.
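Densities can also be compared across groups with base graphics alone. The sketch below is a base-R stand-in for the idea, not the sm package's method: it overlays the mpg density for each cylinder count. The colors and variable names are illustrative choices.

```r
# One density estimate per cylinder group (4, 6, 8)
groups <- split(mtcars$mpg, mtcars$cyl)
dens <- lapply(groups, density)

# Axis limits that cover every curve
xr <- range(sapply(dens, function(d) range(d$x)))
ymax <- max(sapply(dens, function(d) max(d$y)))

# Empty frame first, then one line per group
plot(xr, c(0, ymax), type = "n",
     xlab = "Miles Per Gallon", ylab = "Density",
     main = "MPG Density by Number of Cylinders")
cols <- c("red", "blue", "darkgreen")
for (i in seq_along(dens)) lines(dens[[i]], col = cols[i], lwd = 2)
legend("topright", legend = paste(names(dens), "cylinders"),
       col = cols, lwd = 2)
```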
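Box plots
A box plot describes the distribution of a continuous variable through its five-number summary: minimum, lower quartile (25th percentile), median, upper quartile (75th percentile), and maximum. By default the whiskers extend only to the most extreme points within 1.5 times the interquartile range of the box, and any values beyond that are drawn as individual points. Box plots are drawn with the boxplot() function.
An example follows.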
> rm(list=ls())
> boxplot(mtcars$mpg, main="Box plot", ylab="Miles per Gallon")
The result, with the elements of the plot labeled, is shown in Figure 6.
Figure 6: Box plot.
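The numbers behind the box can be inspected directly. The following sketch uses fivenum() and boxplot.stats() from base R, then draws one box per cylinder group with the formula interface; the grouped plot is an added illustration.

```r
# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(mtcars$mpg)

# The statistics boxplot() actually draws
b <- boxplot.stats(mtcars$mpg)
b$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out     # points beyond the whiskers, drawn individually

# Parallel box plots: mpg split by number of cylinders
boxplot(mpg ~ cyl, data = mtcars,
        main = "Car Mileage Data",
        xlab = "Number of Cylinders", ylab = "Miles Per Gallon")
```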
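Further reading: the vioplot() function in the vioplot package.
Dot plots
A dot plot displays a set of labeled values along a horizontal scale. The general form is:
dotchart(x, labels=)
An example follows.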
> dotchart(mtcars$mpg, labels=row.names(mtcars),
+ cex=.7,
+ main="Gas Mileage for Car Models",
+ xlab="Miles Per Gallon")
The result is shown in Figure 7.
Figure 7: Dot plot.
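Dot plots are usually most informative when the values are sorted and grouped. A minimal sketch, sorting the cars by mpg and grouping them by number of cylinders; the colors and the helper column color are illustrative assumptions.

```r
# Sort by mileage, then treat cylinders as a grouping factor
x <- mtcars[order(mtcars$mpg), ]
x$cyl <- factor(x$cyl)

# One color per cylinder group
x$color <- c("red", "blue", "darkgreen")[as.integer(x$cyl)]

dotchart(x$mpg, labels = row.names(x), cex = .7,
         groups = x$cyl, gcolor = "black", color = x$color, pch = 19,
         main = "Gas Mileage for Car Models\ngrouped by cylinder",
         xlab = "Miles Per Gallon")
```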
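Summary
1 Data visualization techniques
2 Several commonly used plots in R (bar plots, pie charts, fan plots, histograms, kernel density plots, box plots, and dot plots)
Resources
1 http://www.wangluqing.com/2014/06/r-in-action-note8/
2 R in Action, Part II, Chapter 6
The articles in this column are provided by the PPV课 R Language Reading Group. If you repost them, please credit the PPV课 R Language Reading Group.
All rights reserved.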