# R for Data Science | Section 5.5.2: Overview and Exercise Solutions

## 5.5.2 Two Categorical Variables

• Use the built-in `geom_count()` function:

```r
library(tidyverse)  # loads ggplot2 (with the diamonds data) and dplyr

ggplot(data = diamonds) +
  geom_count(mapping = aes(x = cut, y = color))
```

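[Note] The size of each circle in the plot shows how many observations occur at each combination of values. Covariation then appears as a strong correlation between specific x values and specific y values.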
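To see how circle size relates to the data, one option is to pull the computed counts back out of the plot. The sketch below assumes ggplot2's `layer_data()` helper and the `n` column computed by the layer's default stat; the object names `p` and `point_data` are illustrative only.

```r
library(ggplot2)

p <- ggplot(data = diamonds) +
  geom_count(mapping = aes(x = cut, y = color))

# layer_data() returns the data frame that the layer actually draws;
# for geom_count() it includes the computed count `n` for each point.
point_data <- layer_data(p)
head(point_data[, c("x", "y", "n")])
```

Here `x` and `y` hold the integer positions of the factor levels, and `n` is the number of diamonds at that combination.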

• Compute the counts with `dplyr`:

```r
diamonds %>%
  count(color, cut)
#> Source: local data frame [35 x 3]
#> Groups: color [?]
#>
#>   color       cut     n
#>   <ord>     <ord> <int>
#> 1     D      Fair   163
#> 2     D      Good   662
#> 3     D Very Good  1513
#> 4     D   Premium  1603
#> 5     D     Ideal  2834
#> 6     E      Fair   224
#> # ... with 29 more rows
```
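As a rough illustration of what `count()` does, the same table can be built with an explicit `group_by()` and `summarise()`. This is a sketch, assuming dplyr 1.0.0 or later for the `.groups` argument; `by_count` and `by_summarise` are illustrative names.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

by_count <- diamonds %>%
  count(color, cut)

by_summarise <- diamonds %>%
  group_by(color, cut) %>%
  summarise(n = n(), .groups = "drop")

# Both approaches give one row per (color, cut) pair; compare the counts
all.equal(by_count$n, by_summarise$n)
```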

```r
diamonds %>%
  count(color, cut) %>%
  ggplot(mapping = aes(x = color, y = cut)) +
  geom_tile(mapping = aes(fill = n))
```

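[Note] If the categorical variables are unordered, you can use the `seriation` package to reorder the rows and columns simultaneously so that interesting patterns show up more clearly. For larger plots, you can use the `d3heatmap` or `heatmaply` packages, both of which produce interactive plots.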
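A minimal sketch of the reordering idea, under several assumptions: the `seriation` package is installed, the `"PCA"` method is an acceptable choice (other methods may suit other data), and ggplot2's `mpg` data stands in for a data set with unordered categories, since `cut` and `color` in `diamonds` are already ordered. The names `m`, `o`, `row_levels`, and `col_levels` are illustrative.

```r
library(dplyr)
library(ggplot2)    # provides the mpg data set
library(seriation)  # assumed to be installed

# Cross-tabulate two unordered categorical variables as a plain matrix
m <- unclass(table(mpg$manufacturer, mpg$class))

# Seriate rows and columns together; "PCA" is one of several available methods
o <- seriate(m, method = "PCA")
row_levels <- rownames(m)[get_order(o, 1)]
col_levels <- colnames(m)[get_order(o, 2)]

# Apply the new orderings as factor levels, then draw the heatmap
mpg %>%
  count(manufacturer, class) %>%
  mutate(
    manufacturer = factor(manufacturer, levels = row_levels),
    class = factor(class, levels = col_levels)
  ) %>%
  ggplot(mapping = aes(x = class, y = manufacturer)) +
  geom_tile(mapping = aes(fill = n))
```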

# 5.5.2 Exercise Solutions

## Exercise 1

### Solution

```r
# Proportion of each cut within each color
diamonds %>%
  count(color, cut) %>%
  group_by(color) %>%
  mutate(prop = n / sum(n)) %>%
  ggplot(mapping = aes(x = color, y = cut)) +
  geom_tile(mapping = aes(fill = prop))
```

```r
# Proportion of each color within each cut
diamonds %>%
  count(color, cut) %>%
  group_by(cut) %>%
  mutate(prop = n / sum(n)) %>%
  ggplot(mapping = aes(x = color, y = cut)) +
  geom_tile(mapping = aes(fill = prop))
```
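One quick sanity check on the rescaling, sketched here for the grouping by `color`, is that the proportions within each group add up to 1. The names `within_color` and `check` are illustrative.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

within_color <- diamonds %>%
  count(color, cut) %>%
  group_by(color) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Sum the proportions for each color; every total should be 1
check <- within_color %>%
  group_by(color) %>%
  summarise(total = sum(prop))

check
```

The same check applies to the second plot after swapping `group_by(color)` for `group_by(cut)`.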

## Exercise 2

### Solution

```r
library(nycflights13)  # provides the flights data set

# Mean departure delay for each (month, dest) pair, drawn as a heatmap
flights %>%
  group_by(month, dest) %>%
  summarise(dep_delay = mean(dep_delay, na.rm = TRUE)) %>%
  ggplot(aes(x = factor(month), y = dest, fill = dep_delay)) +
  geom_tile() +
  labs(x = "Month", y = "Destination", fill = "Departure Delay")
#> `summarise()` regrouping output by 'month' (override with `.groups` argument)
```
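The message printed above only reports how the result is grouped. A minimal sketch of one way to silence it, assuming dplyr 1.0.0 or later, is to set `.groups` explicitly; `monthly_delay` is an illustrative name.

```r
library(dplyr)
library(nycflights13)

monthly_delay <- flights %>%
  group_by(month, dest) %>%
  summarise(dep_delay = mean(dep_delay, na.rm = TRUE), .groups = "drop")

# With .groups = "drop" the result is a plain, ungrouped tibble
is_grouped_df(monthly_delay)
```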

```r
flights %>%
  group_by(month, dest) %>%                                 # this gives us (month, dest) pairs
  summarise(dep_delay = mean(dep_delay, na.rm = TRUE)) %>%
  group_by(dest) %>%                                        # group all (month, dest) pairs by dest ...
  filter(n() == 12) %>%                                     # ... and keep only those with one entry per month
  ungroup() %>%
  mutate(dest = reorder(dest, dep_delay)) %>%               # order destinations by their delay
  ggplot(aes(x = factor(month), y = dest, fill = dep_delay)) +
  geom_tile() +
  labs(x = "Month", y = "Destination", fill = "Departure Delay")
#> `summarise()` regrouping output by 'month' (override with `.groups` argument)
```
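To get a sense of how much the `filter(n() == 12)` step removes, the following sketch counts the months in which each destination appears. The names `months_per_dest`, `kept`, and `dropped` are illustrative, and no particular counts are asserted here.

```r
library(dplyr)
library(nycflights13)

# Number of distinct months in which each destination has at least one flight
months_per_dest <- flights %>%
  distinct(month, dest) %>%
  count(dest, name = "n_months")

# Destinations present in all 12 months survive the filter; the rest are dropped
kept    <- months_per_dest %>% filter(n_months == 12)
dropped <- months_per_dest %>% filter(n_months < 12)

c(kept = nrow(kept), dropped = nrow(dropped))
```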

## Exercise 3

### Solution

```r
# Same heatmap with the axes swapped: cut on x, color on y
diamonds %>%
  count(color, cut) %>%
  ggplot(mapping = aes(y = color, x = cut)) +
  geom_tile(mapping = aes(fill = n))
```
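Another way to get the same orientation, if one prefers to leave the original aesthetic mapping untouched, is ggplot2's `coord_flip()`. This is only a sketch of that alternative, not the method used in the solution above; `p_flipped` is an illustrative name.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

p_flipped <- diamonds %>%
  count(color, cut) %>%
  ggplot(mapping = aes(x = color, y = cut)) +
  geom_tile(mapping = aes(fill = n)) +
  coord_flip()  # draws color on the vertical axis and cut on the horizontal axis

p_flipped
```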

