# R for Data Science | Section 3.6 Exercise Solutions

## Question 1

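• A flight is 15 minutes early 50% of the time and 15 minutes late 50% of the time.
• A flight is always 10 minutes late.
• A flight is 30 minutes early 50% of the time and 30 minutes late 50% of the time.
• A flight is on time 99% of the time and 2 hours late 1% of the time.

Which is more important: arrival delay or departure delay?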
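One way to compare the four scenarios is to encode each as a small vector of delays (in minutes, negative meaning early) and compute several summaries side by side. The following is a minimal sketch: the vectors are hypothetical samples of 100 flights built to match the stated percentages, not real flight data, and the choice of summaries is only one possible set.

```r
library(dplyr)

# Hypothetical samples of 100 flights per scenario (minutes; negative = early).
scenarios <- bind_rows(
  tibble(scenario = "A: +/-15 min, 50/50", delay = rep(c(-15, 15), each = 50)),
  tibble(scenario = "B: always 10 min late", delay = rep(10, 100)),
  tibble(scenario = "C: +/-30 min, 50/50", delay = rep(c(-30, 30), each = 50)),
  tibble(scenario = "D: 99% on time, 1% 2 h late", delay = c(rep(0, 99), 120))
)

# Several candidate measures of "typical" delay for each scenario.
delay_summary <- scenarios %>%
  group_by(scenario) %>%
  summarise(
    mean_delay   = mean(delay),
    median_delay = median(delay),
    sd_delay     = sd(delay),
    prop_late    = mean(delay > 0),
    max_delay    = max(delay)
  )

delay_summary
```

In this sketch the mean alone cannot tell scenario A from scenario C (both average zero), while the standard deviation separates them. Scenario D shows that a rare long delay barely moves the mean or median but dominates the maximum.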

## Question 2

### Solution

• Method 1
```r
# not_cancelled: flights with non-missing dep_delay and arr_delay
not_cancelled %>%
  count(dest)
```
• Method 2
```r
not_cancelled %>%
  group_by(dest) %>%
  summarise(n = length(dest))
```
• Method 3
```r
not_cancelled %>%
  group_by(dest) %>%
  summarise(n = n())
```
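The equivalence of the three methods can be checked on a small table. The sketch below uses `toy_flights`, a hypothetical stand-in for `not_cancelled` with made-up values, and also shows how the same pattern plausibly extends to a weighted count such as `count(tailnum, wt = distance)`, which sums the weights within each group.

```r
library(dplyr)

# Hypothetical stand-in for not_cancelled: destination, tail number, distance.
toy_flights <- tibble(
  dest     = c("ATL", "ATL", "BOS", "ORD", "ORD", "ORD"),
  tailnum  = c("N1", "N2", "N1", "N2", "N2", "N3"),
  distance = c(760, 760, 190, 730, 730, 730)
)

by_count  <- toy_flights %>% count(dest)
by_length <- toy_flights %>% group_by(dest) %>% summarise(n = length(dest))
by_n      <- toy_flights %>% group_by(dest) %>% summarise(n = n())

# All three approaches should agree row for row.
all.equal(as.data.frame(by_count), as.data.frame(by_length))
all.equal(as.data.frame(by_count), as.data.frame(by_n))

# Weighted count: sum the weight column per group instead of counting rows.
by_wt     <- toy_flights %>% count(tailnum, wt = distance)
by_wt_sum <- toy_flights %>% group_by(tailnum) %>% summarise(n = sum(distance))
all.equal(as.data.frame(by_wt), as.data.frame(by_wt_sum))
```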

## Question 4

### Solution

```r
# Count cancelled flights and total flights for each day.
cancelled_per_day <-
  flights %>%
  mutate(cancelled = (is.na(arr_delay) | is.na(dep_delay))) %>%
  group_by(year, month, day) %>%
  summarise(
    cancelled_num = sum(cancelled),
    flights_num = n()
  )

# Plot cancelled flights against total flights per day.
ggplot(cancelled_per_day) +
  geom_point(aes(x = flights_num, y = cancelled_num))
```

```r
# Daily proportion of cancelled flights and daily average delays.
cancelled_and_delays <-
  flights %>%
  mutate(cancelled = (is.na(arr_delay) | is.na(dep_delay))) %>%
  group_by(year, month, day) %>%
  summarise(
    cancelled_prop = mean(cancelled),
    avg_dep_delay = mean(dep_delay, na.rm = TRUE),
    avg_arr_delay = mean(arr_delay, na.rm = TRUE)
  ) %>%
  ungroup()

# Plot the cancelled proportion against the average departure delay.
ggplot(cancelled_and_delays) +
  geom_point(aes(x = avg_dep_delay, y = cancelled_prop))
```
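The second plot relates the daily cancelled proportion to the average departure delay. A correlation coefficient is one possible way to put a number on that relationship. The sketch below applies the same daily summary to `toy_flights`, a hypothetical table of made-up values with a single `day` column standing in for `year`, `month`, and `day`; it says nothing about the real data.

```r
library(dplyr)

# Hypothetical per-flight records for three days; NA delays mark cancellations.
toy_flights <- tibble(
  day       = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  dep_delay = c(0, 5, -2, 1, 30, NA, 45, 25, 90, NA, NA, 70),
  arr_delay = c(-3, 4, -5, 0, 28, NA, 50, 20, 95, NA, NA, 80)
)

# Same daily summary as in the solution, reduced to the columns needed here.
toy_daily <- toy_flights %>%
  mutate(cancelled = is.na(arr_delay) | is.na(dep_delay)) %>%
  group_by(day) %>%
  summarise(
    cancelled_prop = mean(cancelled),
    avg_dep_delay  = mean(dep_delay, na.rm = TRUE)
  )

# Pearson correlation between average departure delay and cancelled proportion.
delay_cancel_cor <- cor(toy_daily$avg_dep_delay, toy_daily$cancelled_prop)
delay_cancel_cor
```

On the real data the same call would be `cor(cancelled_and_delays$avg_dep_delay, cancelled_and_delays$cancelled_prop)`, assuming no day has all of its departure delays missing.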

## Question 5

### Solution

```r
# Rank carriers by mean arrival delay, worst first.
flights %>%
  group_by(carrier) %>%
  summarise(arr_delay = mean(arr_delay, na.rm = TRUE)) %>%
  arrange(desc(arr_delay))
```

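```r
flights %>%
  filter(!is.na(arr_delay)) %>%
  # Total delay and number of flights by route and carrier
  group_by(origin, dest, carrier) %>%
  summarise(
    arr_delay = sum(arr_delay),
    flights = n()
  ) %>%
  # Total delay and number of flights by route, across all carriers
  group_by(origin, dest) %>%
  mutate(
    arr_delay_total = sum(arr_delay),
    flights_total = sum(flights)
  ) %>%
  ungroup() %>%
  # Compare each carrier's mean delay with the other carriers on the same route
  mutate(
    arr_delay_others = (arr_delay_total - arr_delay) /
      (flights_total - flights),
    arr_delay_mean = arr_delay / flights,
    arr_delay_diff = arr_delay_mean - arr_delay_others
  ) %>%
  # Drop routes served by a single carrier (the division above is not finite)
  filter(is.finite(arr_delay_diff)) %>%
  # Average the route-level differences for each carrier
  group_by(carrier) %>%
  summarise(arr_delay_diff = mean(arr_delay_diff)) %>%
  arrange(desc(arr_delay_diff))
```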
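The route-level comparison above is easier to follow with numbers small enough to check by hand. The sketch below runs a condensed version of the same steps on `toy_flights`, a hypothetical table with two routes and two made-up carriers (`AA` and `BB` here are placeholders, not statements about real airlines). Every toy route has two carriers, so the `is.finite()` filter is not needed.

```r
library(dplyr)

# Hypothetical flights on two routes; carrier codes and delays are made up.
toy_flights <- tibble(
  origin    = c("JFK", "JFK", "JFK", "JFK", "LGA", "LGA", "LGA"),
  dest      = c("ORD", "ORD", "ORD", "ORD", "BOS", "BOS", "BOS"),
  carrier   = c("AA", "AA", "BB", "BB", "AA", "BB", "BB"),
  arr_delay = c(10, 20, 40, 60, 0, 5, 15)
)

route_comparison <- toy_flights %>%
  group_by(origin, dest, carrier) %>%
  summarise(arr_delay = sum(arr_delay), flights = n(), .groups = "drop") %>%
  group_by(origin, dest) %>%
  mutate(
    # Mean delay of all other carriers on the same route.
    arr_delay_others = (sum(arr_delay) - arr_delay) / (sum(flights) - flights),
    arr_delay_mean   = arr_delay / flights,
    arr_delay_diff   = arr_delay_mean - arr_delay_others
  ) %>%
  ungroup()

# Average each carrier's route-level difference, worst first.
carrier_ranking <- route_comparison %>%
  group_by(carrier) %>%
  summarise(arr_delay_diff = mean(arr_delay_diff)) %>%
  arrange(desc(arr_delay_diff))

carrier_ranking
```

On JFK to ORD, `AA` averages 15 minutes against 50 for the other carrier (a difference of -35), and on LGA to BOS it averages 0 against 10 (a difference of -10), so its overall difference is -22.5. Comparing within a route is what keeps a carrier from being penalized merely for flying to airports where every carrier is delayed.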

## Question 6

What does the `sort` argument to `count()` do? When should you use it?
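A minimal sketch of the behavior, using `toy_flights`, a hypothetical table of made-up destinations: with `sort = TRUE`, `count()` orders its result by `n` with the largest groups first, which matches following `count()` with `arrange(desc(n))`.

```r
library(dplyr)

# Hypothetical destinations for a handful of flights.
toy_flights <- tibble(dest = c("ORD", "ATL", "ORD", "BOS", "ORD", "ATL"))

# sort = TRUE orders the result by n, largest first.
sorted_counts <- toy_flights %>% count(dest, sort = TRUE)

# The same result written as count() followed by arrange().
manual_counts <- toy_flights %>% count(dest) %>% arrange(desc(n))

sorted_counts
```

This makes `sort = TRUE` a convenient shorthand whenever the most common values are what you want to see first.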

