The data-processing powerhouse tidyverse (2): ggplot2

User 1359560

Published 2019-08-29 10:19:41

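The tidyverse includes an important visualization package: ggplot2. ggplot2 is a data visualization package created by Hadley Wickham, built on the principle of layers. The basic idea is that ggplot2 places geometric objects (circles, lines, etc.), themes, and scales on top of the data. The form of a geometric object is defined by a geom_xxx() function, and the properties of the object that depend on data variables (position, size, colour) are specified by the aesthetic function aes() (inside the geom_xxx() function). The base layer of any ggplot figure is the empty ggplot layer created by the ggplot() function, which describes the data frame used for plotting.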

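library(tidyverse)  # loads ggplot2, dplyr and the %>% pipe
library(gapminder)  # provides the gapminder data set

ggplot(gapminder)

gapminder %>% 
  filter(year == 2007) %>%
  ggplot()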

Both of these produce a blank plot, because no geometric objects have been added yet.
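To make the layering idea concrete, the sketch below inspects the plot object itself: a plot built with ggplot() alone has an empty list of layers, and each + geom_xxx() appends one more. It reads the `layers` element of the ggplot object, which is an internal detail rather than documented API, so treat it as an illustration only.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

# base layer only: the data is declared, but there are no geoms to draw
base <- gapminder %>%
  filter(year == 2007) %>%
  ggplot()

# each "+ geom_xxx()" returns a new plot object with one more layer
with_points <- base + geom_point(aes(x = gdpPercap, y = lifeExp))
with_both   <- with_points +
  geom_smooth(aes(x = gdpPercap, y = lifeExp), method = "loess")

length(base$layers)         # 0: nothing to draw, hence the blank plot
length(with_points$layers)  # 1
length(with_both$layers)    # 2
```

Adding a layer does not modify `base`; it returns a new object, which is why the three plots can coexist.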

Adding geom layers

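Next, I will add a "geom" layer to the ggplot object. Layers are added to a ggplot object with +. Probably the most common geom layer is geom_point. Inside geom_point(), you specify the aesthetic mapping from variables to the desired geometric objects. For example, to draw a scatterplot with gdpPercap on the x-axis and lifeExp on the y-axis, add a geom_point() layer with the corresponding aesthetic mapping: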

# describe the base ggplot object and tell it what data we are interested in
gapminder %>%
  filter(year == 2007) %>%
  ggplot() +
  # add a points layer on top
  geom_point(aes(x = gdpPercap, y = lifeExp))

We can also add a smoothed trend line layer on top of the points with geom_smooth().

# describe the base ggplot object and tell it what data we are interested in
gapminder %>%
  filter(year == 2007) %>%
  ggplot() +
  # add a points layer on top
  geom_point(aes(x = gdpPercap, y = lifeExp)) +
  # add a smoothed LOESS layer
  geom_smooth(aes(x = gdpPercap, y = lifeExp), method = "loess")
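
Because both layers use the same x and y mapping, you can instead specify it once, globally, in ggplot(); every layer then inherits it:
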
# describe the base ggplot object and tell it what data we are interested in along with the aesthetic mapping
gapminder %>%
  filter(year == 2007) %>%
  # specify global aesthetic mappings
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
  # add a points layer on top
  geom_point() +
  # add a smoothed LOESS layer
  geom_smooth(method = "loess")
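Global and layer-level mappings can also be mixed. The sketch below keeps x and y global but adds colour only inside geom_point(), so the LOESS layer still fits a single curve to all points. It uses ggplot2's `layer_data()` to look at what each layer actually computed; the specific combination of layers is just an illustration.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

p <- gapminder %>%
  filter(year == 2007) %>%
  # global mapping: inherited by every layer below
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
  # layer-specific mapping: colour applies to this layer only
  geom_point(aes(colour = continent)) +
  # no colour mapping here, so one LOESS curve is fitted to all points
  geom_smooth(method = "loess")

# layer_data(plot, i) returns the data ggplot2 computed for layer i
n_point_colours <- length(unique(layer_data(p, 1)$colour))
n_smooth_groups <- length(unique(layer_data(p, 2)$group))
n_point_colours  # 5: one colour per continent
n_smooth_groups  # 1: a single smooth curve
```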

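We can also combine a points geom layer with a line geom layer, or with any other type of geom layer. Line plots are well suited to time series, so below we plot the average life expectancy using both point and line layers. Here dplyr and ggplot2 combine neatly: summarize life expectancy by year and pipe the result straight into ggplot, without defining any intermediate variables.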

gapminder %>%
  # calculate the average life expectancy for each year
  group_by(year) %>%
  summarise(avg_lifeExp = mean(lifeExp)) %>%
  ungroup() %>%
  # specify global aesthetic mappings
  ggplot(aes(x = year, y = avg_lifeExp)) +
  # add a points layer on top
  geom_point() +
  # add a line layer on top
  geom_line()

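If you want a separate line for each continent on the plot (rather than one line aggregated over all continents), you do not need to add a separate layer for each continent.

Instead, when you compute the average life expectancy by year, first group by continent as well.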

gapminder %>%
  group_by(continent, year) %>%
  summarise(avg_lifeExp = mean(lifeExp))

## # A tibble: 60 x 3
## # Groups:   continent [5]
##    continent  year avg_lifeExp
##    <fct>     <int>       <dbl>
##  1 Africa     1952        39.1
##  2 Africa     1957        41.3
##  3 Africa     1962        43.3
##  4 Africa     1967        45.3
##  5 Africa     1972        47.5
##  6 Africa     1977        49.6
##  7 Africa     1982        51.6
##  8 Africa     1987        53.3
##  9 Africa     1992        53.6
## 10 Africa     1997        53.6
## # … with 50 more rows
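The "# Groups: continent [5]" line in the output above shows that summarise() removes only the last grouping variable (year), so the result is still grouped by continent; this is why some of the pipelines in this article call ungroup(). A minimal check, assuming dplyr 1.0 or later, where summarise() accepts the `.groups` argument:

```r
library(dplyr)
library(gapminder)

by_cont_year <- gapminder %>%
  group_by(continent, year) %>%
  # "drop_last" is the default behaviour here; written out for clarity
  summarise(avg_lifeExp = mean(lifeExp), .groups = "drop_last")

group_vars(by_cont_year)           # "continent": still grouped
group_vars(ungroup(by_cont_year))  # character(0): grouping removed
nrow(by_cont_year)                 # 60 = 5 continents x 12 years
```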

However, if you try to draw a line on the continent-year grouped data frame using the same code as above, you get a strange zigzag pattern.

gapminder %>%
  group_by(continent, year) %>%
  summarise(avg_lifeExp = mean(lifeExp)) %>%
  ungroup() %>%
  ggplot() +
  # add a points layer on top
  geom_point(aes(x = year, y = avg_lifeExp)) +
  # add a lines layer on top
  geom_line(aes(x = year, y = avg_lifeExp))

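This happens because there are now several average life expectancy values for each year, but you have not specified which values belong together. To fix the plot, specify how the lines are grouped (i.e., which variable defines each line) by adding group = continent to the aes() function of the geom_line() layer.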

gapminder %>%
  group_by(continent, year) %>%
  summarise(avg_lifeExp = mean(lifeExp)) %>%
  ggplot() +
  # add a points layer on top
  geom_point(aes(x = year, y = avg_lifeExp)) +
  # add a lines layer on top that is grouped by continent
  geom_line(aes(x = year, y = avg_lifeExp, group = continent))
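The difference between the zigzag plot and the fixed one can be checked directly: `layer_data()` reports a `group` column for each layer. In the sketch below, the line layer without `group` treats all 60 points as one path, while `group = continent` yields one path per continent. The `.groups = "drop"` argument assumes dplyr 1.0 or later.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

avg <- gapminder %>%
  group_by(continent, year) %>%
  summarise(avg_lifeExp = mean(lifeExp), .groups = "drop")

zigzag  <- ggplot(avg) + geom_line(aes(x = year, y = avg_lifeExp))
grouped <- ggplot(avg) +
  geom_line(aes(x = year, y = avg_lifeExp, group = continent))

# without group: every point belongs to one single path
n_zigzag <- length(unique(layer_data(zigzag)$group))
# with group = continent: one path per continent
n_grouped <- length(unique(layer_data(grouped)$group))
n_zigzag   # 1
n_grouped  # 5
```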


More aesthetic mappings based on variables

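So far we have only specified the x and y position aesthetic mappings from the data to the geom objects. But you can also specify other kinds of aesthetic mappings, for example using a variable to set the colour of the points. If you want all points to be the same colour, set a fixed colour argument for the points outside the aes() function.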

gapminder %>%
  ggplot() +
  geom_point(aes(x = gdpPercap, y = lifeExp),
             col = "cornflowerblue")

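However, if you want to use a variable from the data frame to define the colour (or any other aesthetic feature) of the geoms, you need to include it inside the aes() function.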

gapminder %>%
  ggplot() +
  geom_point(aes(x = gdpPercap, 
                 y = lifeExp, 
                 col  = continent))
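A common pitfall follows from this rule: a fixed colour written inside aes() is treated as a variable with a single value, not as a colour. The sketch below compares the two placements using `layer_data()`; the second plot gets a default hue and a legend labelled "cornflowerblue" instead of blue points.

```r
library(ggplot2)
library(gapminder)

# fixed colour OUTSIDE aes(): every point is drawn in cornflowerblue
outside <- ggplot(gapminder) +
  geom_point(aes(x = gdpPercap, y = lifeExp), col = "cornflowerblue")

# fixed colour INSIDE aes(): treated as a one-level variable, coloured
# by the default discrete scale and shown in a legend
inside <- ggplot(gapminder) +
  geom_point(aes(x = gdpPercap, y = lifeExp, col = "cornflowerblue"))

unique(layer_data(outside)$colour)  # "cornflowerblue"
unique(layer_data(inside)$colour)   # a default hue, not "cornflowerblue"
```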

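Note that the continent variable itself does not specify the colours: they are assigned automatically. You can choose the colours yourself by adding a colour scale layer.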

gapminder %>%
  ggplot() +
  geom_point(aes(x = gdpPercap, 
                 y = lifeExp, 
                 col  = continent)) +
  scale_colour_manual(values = c("orange", "red4", "purple", "darkgreen", "blue"))
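The unnamed vector above assigns colours in the order of the factor levels of continent. A possibly more robust variant, sketched below, passes a named vector to `scale_colour_manual()`, so each continent is tied to its colour by name. Here it reproduces the same assignment as above, because the levels are Africa, Americas, Asia, Europe and Oceania.

```r
library(ggplot2)
library(gapminder)

# named vector: each continent is pinned to a colour by name,
# independent of the order of the factor levels
continent_cols <- c(Africa = "orange", Americas = "red4", Asia = "purple",
                    Europe = "darkgreen", Oceania = "blue")

p <- ggplot(gapminder) +
  geom_point(aes(x = gdpPercap, y = lifeExp, col = continent)) +
  scale_colour_manual(values = continent_cols)

# the colours used by the layer are exactly the ones supplied
used_cols <- unique(layer_data(p)$colour)
used_cols
```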

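We can also add aesthetic mappings for other features, such as shape, size and transparency (alpha). For example, vary the point size by population (note that alpha = 0.5 below is set outside aes(), so it is a fixed transparency for all points rather than a mapping):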

gapminder %>%
  ggplot() +
  geom_point(aes(x = gdpPercap, y = lifeExp, 
                 col = continent, size = pop),
             alpha = 0.5)

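In the line plot example above, where we plotted the average life expectancy over time for each continent, you can map colour to continent instead of specifying the group argument. The lines are then automatically grouped and coloured by continent.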

gapminder %>%
  group_by(continent, year) %>%
  summarise(avg_lifeExp = mean(lifeExp)) %>%
  # create the base ggplot object
  ggplot() +
  # add a line layer, coloured (and therefore grouped) by continent
  geom_line(aes(x = year, y = avg_lifeExp, colour = continent))
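The claim that colour implies grouping can be verified: mapping a discrete variable to colour makes ggplot2 derive the groups from it. The sketch below inspects the computed layer with `layer_data()`; the `.groups = "drop"` argument assumes dplyr 1.0 or later.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

avg <- gapminder %>%
  group_by(continent, year) %>%
  summarise(avg_lifeExp = mean(lifeExp), .groups = "drop")

p <- ggplot(avg) +
  geom_line(aes(x = year, y = avg_lifeExp, colour = continent))

ld <- layer_data(p)
n_groups  <- length(unique(ld$group))   # groups derived from colour
n_colours <- length(unique(ld$colour))  # one colour per line
n_groups   # 5
n_colours  # 5
```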

Other types of layers

So far we have only seen scatterplots (points) and line plots, but there are many other geoms you can add, including:

Histograms

A histogram only needs the x-axis to be specified; the bar heights (counts) are computed automatically.

gapminder %>%
  ggplot() + 
  geom_histogram(aes(x = lifeExp), binwidth = 3)
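The y values are not missing from the histogram; they are computed by the layer's statistical transformation (stat_bin), which counts observations per bin. The sketch below reads those computed counts back with `layer_data()` and checks that every row of gapminder lands in exactly one bin.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder) +
  geom_histogram(aes(x = lifeExp), binwidth = 3)

bins <- layer_data(p)
# each row is one bin: its edges and the number of observations in it
head(bins[, c("xmin", "xmax", "count")])
# the bar heights (y) are the computed counts
all_counted <- sum(bins$count) == nrow(gapminder)
all_counted  # TRUE
```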

Boxplots

To colour the boxes of a boxplot, use the fill argument rather than col (or color/colour), which only colours the outlines.

gapminder %>%
  ggplot() +
  geom_boxplot(aes(x = continent, y = lifeExp, fill = continent))
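The difference between fill and colour for boxplots can be seen in the computed layer data. In the sketch below, mapping fill gives each box its own interior colour, while mapping colour leaves every box with the same default fill and varies only the outline colour.

```r
library(ggplot2)
library(gapminder)

# fill = continent: the box interiors are coloured
filled <- ggplot(gapminder) +
  geom_boxplot(aes(x = continent, y = lifeExp, fill = continent))

# colour = continent: only outlines, whiskers and outliers change
outlined <- ggplot(gapminder) +
  geom_boxplot(aes(x = continent, y = lifeExp, colour = continent))

n_fill_filled     <- length(unique(layer_data(filled)$fill))      # 5
n_fill_outlined   <- length(unique(layer_data(outlined)$fill))    # 1
n_colour_outlined <- length(unique(layer_data(outlined)$colour))  # 5
```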

Combining plots (facets)

You can create a grid of plots (or "facets"), split by a categorical variable of your choice (e.g., continent), by adding a facet layer.

gapminder %>%
  ggplot() +
  geom_point(aes(x = gdpPercap, y = lifeExp)) +
  facet_wrap(~continent, ncol = 2)
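With five continents and ncol = 2, facet_wrap() fills a grid of three rows and two columns. The sketch below reads the panel layout that ggplot2 computed from the built plot. It relies on the internal `layout` component of the `ggplot_build()` result, which is not documented API and could change between ggplot2 versions.

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder) +
  geom_point(aes(x = gdpPercap, y = lifeExp)) +
  facet_wrap(~continent, ncol = 2)

# one row per panel: its position in the grid and its continent
panels <- ggplot_build(p)$layout$layout
panels[, c("PANEL", "ROW", "COL", "continent")]
```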

Customizing ggplot2

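Although we have stuck to the default ggplot2 features here, there is much more you can do with ggplot2. For example, with practice you will learn to produce highly customized plots by combining many layers. As motivation, here is a prettier plot made with ggplot2: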

gapminder %>% 
  filter(year == 2007) %>%
  ggplot() +
  # add scatter points
  geom_point(aes(x = gdpPercap, y = lifeExp, col = continent, size = pop),
             alpha = 0.5) +
  # add some text annotations for the very large countries
  geom_text(aes(x = gdpPercap, y = lifeExp + 3, label = country),
            col = "grey50",
            data = filter(gapminder, year == 2007, pop > 1000000000 | country %in% c("Nigeria", "United States"))) +
  # use a log10 scale for the x-axis and set its limits
  scale_x_log10(limits = c(200, 60000)) +
  # change labels
  labs(title = "GDP versus life expectancy in 2007",
       x = "GDP per capita (log scale)",
       y = "Life expectancy",
       size = "Population",
       col = "Continent") +
  # change the size scale
  scale_size(range = c(0.1, 10),
             # remove size legend
             guide = "none") +
  # add a nicer theme
  theme_classic() +
  # place the legend at the top
  theme(legend.position = "top")

This article is part of the Tencent Cloud self-media sync program and is shared from the author's personal site/blog.

Originally published: 2019.08.28. For copyright concerns, contact cloudcommunity@tencent.com to request removal.