
R4DS Study Notes: ggplot2 - CG

Original
Crazy_George
Published 2024-03-30 16:50:19
Included in the column: R语言一周生信入门R语言 (R in one week: an introduction to bioinformatics)

1. Data and plotting goal

Show the relationship between flipper length and body mass across the different penguin species by reproducing the figure below.

[Figure: the target plot this post reproduces]

2. Basic concepts

2.1 Variable

A variable is a quantity, quality, or property that you can measure.

2.2 Value

A value is the state of a variable when you measure it. The value of a variable may change from measurement to measurement.

2.3 Observation

An observation is a set of measurements made under similar conditions (you usually make all of the measurements in an observation at the same time and on the same object). An observation will contain several values, each associated with a different variable. We’ll sometimes refer to an observation as a data point.

2.4 Tabular data

Tabular data is a set of values, each associated with a variable and an observation. Tabular data is tidy if each value is placed in its own “cell”, each variable in its own column, and each observation in its own row.
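To make the tidy-data definition concrete, here is a minimal sketch with a tiny made-up table (the numbers are hypothetical, not real palmerpenguins data). The year variable is first hidden in the column names; tidyr's pivot_longer() then moves it into its own column, so that every variable is a column, every observation a row, and every value a cell.

```r
library(tibble)
library(tidyr)

# Hypothetical numbers for illustration only (not real penguin measurements).
# Untidy: the variable "year" is hidden in the column names 2007 and 2008.
untidy <- tibble(
  species = c("Adelie", "Gentoo"),
  `2007` = c(3700, 5000),
  `2008` = c(3750, 5050)
)

# Tidy: each variable (species, year, mean_body_mass_g) has its own column,
# each observation (one species in one year) its own row,
# and each value its own cell.
tidy <- pivot_longer(
  untidy,
  cols = c(`2007`, `2008`),
  names_to = "year",
  values_to = "mean_body_mass_g"
)

print(tidy)
```

The untidy table has 2 rows; the tidy one has 4, one per species-year combination.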

3. Setup

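```r
# Run once: palmerpenguins and ggthemes are not part of the tidyverse
install.packages(c("tidyverse", "palmerpenguins", "ggthemes"))
library(tidyverse)
library(palmerpenguins) # provides the penguins data frame
library(ggthemes)       # provides the colorblind-safe color palette used with ggplot2 below
```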

4. Reproducing the figure

4.1 Inspect penguins

```r
penguins # print the data; alternatively run View(penguins) or glimpse(penguins)
#> # A tibble: 344 × 8
#>    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex    year
#>    <fct>   <fct>              <dbl>         <dbl>             <int>       <int> <fct> <int>
#>  1 Adelie  Torgersen           39.1          18.7               181        3750 male   2007
#>  2 Adelie  Torgersen           39.5          17.4               186        3800 fema…  2007
#>  3 Adelie  Torgersen           40.3          18                 195        3250 fema…  2007
#>  4 Adelie  Torgersen           NA            NA                  NA          NA NA     2007
#>  5 Adelie  Torgersen           36.7          19.3               193        3450 fema…  2007
#>  6 Adelie  Torgersen           39.3          20.6               190        3650 male   2007
#>  7 Adelie  Torgersen           38.9          17.8               181        3625 fema…  2007
#>  8 Adelie  Torgersen           39.2          19.6               195        4675 male   2007
#>  9 Adelie  Torgersen           34.1          18.1               193        3475 NA     2007
#> 10 Adelie  Torgersen           42            20.2               190        4250 NA     2007
#> # ℹ 334 more rows
#> # ℹ Use `print(n = ...)` to see more rows
```

The variables used in this plot are:

  • species
  • flipper_length_mm
  • body_mass_g
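A quick check on these three variables, as a sketch using dplyr's select() and base R's complete.cases(): it counts the rows where a value needed for the plot is missing. Those rows are the source of the "Removed 2 rows" warnings that appear once geoms are added.

```r
library(dplyr)
library(palmerpenguins)

# Keep only the three variables used in the plot
plot_vars <- penguins |>
  select(species, flipper_length_mm, body_mass_g)

# Count rows where at least one of the three values is missing
n_incomplete <- sum(!complete.cases(plot_vars))

dim(plot_vars)
n_incomplete
```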

4.2 Building the plot step by step

4.2.1 Define the x and y axes

```r
ggplot(data = penguins) # empty canvas: nothing has been mapped or drawn yet
ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) # x and y axes are now defined; in ggplot2 the mapping argument is always defined with aes()
```
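A ggplot object can be inspected before anything is drawn. This sketch reads the plot object's `$layers` and `$mapping` fields (treat them as an inspection aid rather than a stable public interface): the object already holds the x and y mapping but has no layers, which is why the canvas is blank.

```r
library(ggplot2)
library(palmerpenguins)

# Data + aesthetic mapping only; no geom layer has been added yet
p <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
)

length(p$layers)  # no layers, so nothing is drawn inside the panel
names(p$mapping)  # the aesthetics that were mapped
```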

4.2.2 Add a geom

ggplot2 provides geom functions such as geom_bar(), geom_line(), geom_point(), and geom_boxplot().

```r
ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point()
#> Warning message:
#> Removed 2 rows containing missing values or values outside the scale range
#> (`geom_point()`).
```
[Figure: scatterplot of body mass vs. flipper length; a point geom suits this goal]

4.2.3 How do we tell the species apart?

```r
ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)
) +
  geom_point()
```
[Figure: points colored by species]
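Mapping a discrete variable such as species to color also splits the data into groups. The sketch below uses layer_data(), which returns the data ggplot2 computed for a layer, to confirm that the point layer ends up with three colors and three groups, one per species.

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(
  penguins,
  aes(x = flipper_length_mm, y = body_mass_g, color = species)
) +
  geom_point()

# Computed data behind layer 1 (the points), after aesthetics are mapped
pts <- layer_data(p, 1)

# One colour and one group per species level
length(unique(pts$colour))
length(unique(pts$group))
```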

4.2.4 Add a trend line (a new geometric object)

The geom for trend lines is geom_smooth().

```r
ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)
) +
  geom_point() +
  geom_smooth(method = "lm") # method = "lm": draw the line of best fit from a linear model
```
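Because color = species is set in ggplot(), geom_smooth(method = "lm") fits one least-squares line per species. As a rough cross-check, this sketch computes each species' slope directly with the OLS formula slope = cov(x, y) / var(x); it reports numbers only and does not reuse geom_smooth()'s internals.

```r
library(dplyr)
library(palmerpenguins)

# Per-species least-squares slope of body mass on flipper length,
# using slope = cov(x, y) / var(x) on the complete rows only
slopes <- penguins |>
  filter(!is.na(flipper_length_mm), !is.na(body_mass_g)) |>
  group_by(species) |>
  summarise(
    slope_g_per_mm = cov(flipper_length_mm, body_mass_g) / var(flipper_length_mm),
    n = n()
  )

slopes
```

All three slopes are positive, matching the upward lines in the plot.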

4.2.5 One trend line shared by all species

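Understand the difference between global and local mappings in ggplot2: a mapping given in ggplot() is global and inherited by every layer, while a mapping given inside a geom is local and applies only to that layer. Mapping color = species only inside geom_point() colors the points by species but leaves geom_smooth() ungrouped, so it fits a single line through all the data.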

```r
ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(mapping = aes(color = species)) +
  geom_smooth(method = "lm")
```
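The effect of global versus local mappings can be checked directly. This sketch builds both versions and counts the distinct groups geom_smooth() was fitted to (layer 2 of each plot); suppressMessages() and suppressWarnings() only hide the formula message and the "Removed 2 rows" warning produced while computing the fit.

```r
library(ggplot2)
library(palmerpenguins)

# color mapped globally: inherited by both layers, one fitted line per species
p_global <- ggplot(
  penguins,
  aes(x = flipper_length_mm, y = body_mass_g, color = species)
) +
  geom_point() +
  geom_smooth(method = "lm")

# color mapped locally in geom_point(): geom_smooth() sees no grouping, one line
p_local <- ggplot(
  penguins,
  aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(aes(color = species)) +
  geom_smooth(method = "lm")

# Count the distinct groups in the computed data of layer 2 (geom_smooth())
n_lines <- function(p) {
  smooth <- suppressMessages(suppressWarnings(layer_data(p, 2)))
  length(unique(smooth$group))
}

n_lines(p_global)
n_lines(p_local)
```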

4.2.6 Give each species its own point shape

```r
ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(mapping = aes(color = species, shape = species)) +
  geom_smooth(method = "lm")
```

4.2.7 Polish the plot (title, axis labels, etc.)

```r
ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(aes(color = species, shape = species)) +
  geom_smooth(method = "lm") +
  labs(
    title = "Body mass and flipper length",
    subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
    x = "Flipper length (mm)", y = "Body mass (g)",
    color = "Species", shape = "Species"
  ) +
  scale_color_colorblind()
```

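scale_color_colorblind() (from ggthemes): switches to a colorblind-safe color palette.

labs(): sets the title, subtitle, axis titles, and legend titles. Giving color and shape the same title ("Species") lets ggplot2 merge them into a single legend.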

The figure is now reproduced.

5. Exercises

5.1 How many rows are in penguins? How many columns?

```r
glimpse(penguins)
#> Rows: 344
#> Columns: 8
```
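glimpse() prints the dimensions as part of its output; base R's nrow(), ncol(), and dim() return them as numbers that can be reused in code. A minimal sketch:

```r
library(palmerpenguins)

nrow(penguins)  # number of rows (observations)
ncol(penguins)  # number of columns (variables)
dim(penguins)   # both at once: rows, then columns
```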

5.2 What does the bill_depth_mm variable in the penguins data frame describe? Read the help for ?penguins to find out.

5.3 Make a scatterplot of bill_depth_mm vs. bill_length_mm. That is, make a scatterplot with bill_depth_mm on the y-axis and bill_length_mm on the x-axis. Describe the relationship between these two variables. What happens if you make a scatterplot of species vs. bill_depth_mm? What might be a better choice of geom?

```r
ggplot(
  data = penguins,
  mapping = aes(x = bill_length_mm, y = bill_depth_mm, color = species, shape = species)
) +
  geom_point() +
  geom_smooth(method = "lm")
```

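All three species show the same trend: the longer the bill (bill_length_mm), the deeper it is (bill_depth_mm). If the species are pooled and a single trend line is fitted, however, the result is misleading: the pooled line slopes slightly downward, the opposite of the within-species trend (an instance of Simpson's paradox).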
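The pooling claim can be checked numerically. This sketch compares the correlation between bill length and bill depth across all penguins with the correlation within each species, using cor() with use = "complete.obs" to skip the two incomplete rows.

```r
library(dplyr)
library(palmerpenguins)

# Correlation across all penguins, ignoring species
pooled_r <- cor(
  penguins$bill_length_mm, penguins$bill_depth_mm,
  use = "complete.obs"
)

# Correlation within each species
by_species <- penguins |>
  group_by(species) |>
  summarise(r = cor(bill_length_mm, bill_depth_mm, use = "complete.obs"))

pooled_r     # negative when species are pooled
by_species   # positive within every species
```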

```r
ggplot(
  data = penguins,
  mapping = aes(x = bill_length_mm, y = bill_depth_mm)
) +
  geom_point(aes(color = species, shape = species)) +
  geom_smooth(method = "lm") # misleading: one trend line fitted to all species pooled
```
[Figure: the misleading plot with one pooled trend line]
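For the last part of the exercise (species vs. bill_depth_mm): species is categorical, so a scatterplot piles the points into three overlapping strips. A boxplot is one reasonable alternative (a suggestion, not the only answer); this sketch draws one and checks, via layer_data(), that one box is computed per species.

```r
library(ggplot2)
library(palmerpenguins)

# One box per species summarising the distribution of bill depth;
# na.rm = TRUE drops the incomplete rows without a warning
p_box <- ggplot(penguins, aes(x = species, y = bill_depth_mm)) +
  geom_boxplot(na.rm = TRUE)

# Computed box statistics (lower hinge, median, upper hinge, ...) per species
box_stats <- layer_data(p_box, 1)
nrow(box_stats)
```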

5.4 Fixing an error

```r
ggplot(data = penguins) +
  geom_point()
#> Error in `geom_point()`:
#> ! Problem while setting up geom.
#> ℹ Error occurred in the 1st layer.
#> Caused by error in `compute_geom_1()`:
#> ! `geom_point()` requires the following missing aesthetics: x and y.
#> Run `rlang::last_trace()` to see where the error occurred.
```

geom_point() cannot place any points without knowing which variables go on the x and y axes; mapping them with aes() fixes the error.
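A sketch of one fix: supply x and y through aes(). Here the mapping goes inside geom_point() (a local mapping); putting it in ggplot() as in section 4.2.2 works equally well. na.rm = TRUE is added only to keep the output quiet.

```r
library(ggplot2)
library(palmerpenguins)

# x and y are mapped inside the layer (local mapping), so geom_point()
# now has the aesthetics it requires
p <- ggplot(data = penguins) +
  geom_point(aes(x = flipper_length_mm, y = body_mass_g), na.rm = TRUE)

names(p$layers[[1]]$mapping)
```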

5.5 What does the na.rm argument do in geom_point()? What is the default value of the argument? Create a scatterplot where you successfully use this argument set to TRUE.

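Run ?geom_point to open the help page. The na.rm argument controls what happens to missing values: with the default, na.rm = FALSE, they are removed with a warning; with na.rm = TRUE they are removed silently.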

Compare the two code chunks below:

```r
ggplot(
  data = penguins,
  mapping = aes(x = bill_length_mm, y = bill_depth_mm, color = species, shape = species)
) +
  geom_point() +
  geom_smooth(method = "lm")
#> `geom_smooth()` using formula = 'y ~ x'
#> Warning messages:
#> 1: Removed 2 rows containing non-finite outside the scale range (`stat_smooth()`).
#> 2: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_point()`).
```

```r
ggplot(
  data = penguins,
  mapping = aes(x = bill_length_mm, y = bill_depth_mm, color = species, shape = species)
) +
  geom_point(na.rm = TRUE) +
  geom_smooth(method = "lm")
#> `geom_smooth()` using formula = 'y ~ x'
#> Warning message:
#> Removed 2 rows containing non-finite outside the scale range (`stat_smooth()`).
```

With na.rm = TRUE, the geom_point() warning disappears; the stat_smooth() warning remains because na.rm was set only on geom_point().
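To silence the remaining stat_smooth() warning as well, one option is to pass na.rm = TRUE to geom_smooth() too, since geom_smooth() forwards na.rm to its stat. Giving formula = y ~ x explicitly also skips the "`geom_smooth()` using formula" message. A sketch:

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(
  data = penguins,
  mapping = aes(x = bill_length_mm, y = bill_depth_mm, color = species, shape = species)
) +
  # geom_point() drops the 2 incomplete rows silently
  geom_point(na.rm = TRUE) +
  # geom_smooth() forwards na.rm to stat_smooth(), silencing its warning too;
  # an explicit formula skips the "using formula" message
  geom_smooth(method = "lm", formula = y ~ x, na.rm = TRUE)

print(p)
```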

5.6 Add the following caption to the plot you made in the previous exercise: “Data come from the palmerpenguins package.” Hint: Take a look at the documentation for labs().

```r
ggplot(
  data = penguins,
  mapping = aes(x = bill_length_mm, y = bill_depth_mm, color = species, shape = species)
) +
  geom_point(na.rm = TRUE) +
  geom_smooth(method = "lm") +
  labs(
    caption = "Data come from the palmerpenguins package."
  )
```

5.7 Recreate the following visualization. What aesthetic should bill_depth_mm be mapped to? And should it be mapped at the global level or at the geom level?

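```r
ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(aes(color = bill_depth_mm)) +
  geom_smooth()
```

bill_depth_mm is mapped to the color aesthetic at the geom level, inside geom_point(), because only the points are colored by it; the smooth curve stays a single line in the default color.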

5.8 Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.

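```r
ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g, color = island)
) +
  geom_point() +
  geom_smooth(se = FALSE)
```

Prediction: the points are colored by island, and because color is mapped globally, geom_smooth() fits a separate smooth curve for each island; se = FALSE hides the gray confidence band around each curve.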

5.9

```r
ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point() +
  geom_smooth()
```

```r
ggplot() +
  geom_point(
    data = penguins,
    mapping = aes(x = flipper_length_mm, y = body_mass_g)
  ) +
  geom_smooth(
    data = penguins,
    mapping = aes(x = flipper_length_mm, y = body_mass_g)
  )
```

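The two chunks above produce the same plot. In the first, data and mapping are set once in ggplot() and inherited by both layers; in the second, each layer receives the same data and mapping directly.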
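The claim that both chunks draw the same plot can be tested with layer_data(), which returns the computed data behind each layer. This sketch compares the two plots layer by layer with all.equal(); the messages and warnings from geom_smooth() are suppressed to keep the output short.

```r
library(ggplot2)
library(palmerpenguins)

# Global data and mapping, inherited by both layers
p1 <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point() +
  geom_smooth()

# The same data and mapping given to each layer separately
p2 <- ggplot() +
  geom_point(data = penguins, mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_smooth(data = penguins, mapping = aes(x = flipper_length_mm, y = body_mass_g))

# TRUE if layer i has the same computed data in both plots
same_layer <- function(i) {
  quiet <- function(p) suppressMessages(suppressWarnings(layer_data(p, i)))
  isTRUE(all.equal(quiet(p1), quiet(p2)))
}

same_layer(1)  # points
same_layer(2)  # smooth curve
```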

6. Ways to write the code

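Fully spelled out:

```r
ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point()
```

data and mapping are the first two arguments of ggplot(), so their names can be dropped:

```r
ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()
```

With the native pipe |> (R 4.1 or later), penguins is passed to ggplot() as its first argument:

```r
penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()
```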
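One practical benefit of the pipe form is that a data-manipulation step can sit right before the plot. A sketch, assuming dplyr's filter() and using Gentoo penguins only as an arbitrary example; note that the chain switches from |> for data steps to + for ggplot2 layers.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

# |> passes the filtered data frame to ggplot() as its first argument;
# + then adds layers to the resulting plot
p <- penguins |>
  filter(species == "Gentoo") |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(na.rm = TRUE)

unique(as.character(p$data$species))
```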

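Originality statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

In case of infringement, please contact cloudcommunity@tencent.com for removal.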
