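(dplyr) Errors when using mutate(), case_when(), and which()

Combining dplyr's mutate() and case_when() with base R's which() can produce several kinds of errors and silent surprises. The most common problems and their fixes are listed below: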

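Basic concepts

mutate(): adds new variables to a data frame or modifies existing ones.
case_when(): similar to SQL's CASE WHEN statement; creates or modifies a variable according to a series of conditions.
which(): a base R function that returns the indices (positions) of the elements for which a condition is TRUE.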
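
One consequence of these definitions is worth sketching: which() returns integer positions, whereas case_when() expects a logical vector on the left-hand side of each `~`. The snippet below is a minimal illustration on a toy data frame; the names `idx`, `label`, `by_condition`, and `by_position` are invented for this example.

```r
library(dplyr)

df <- data.frame(x = c(1, 2, 3, 4))

# which() yields positions (integers), not TRUE/FALSE values
idx <- which(df$x > 2)

# Option 1: give case_when() the logical condition directly
by_condition <- df %>%
  mutate(label = case_when(x > 2 ~ "high", TRUE ~ "low"))

# Option 2: turn stored positions back into a logical vector
by_position <- df %>%
  mutate(label = case_when(row_number() %in% idx ~ "high", TRUE ~ "low"))
```

Both options give the same labels. Option 1 is usually simpler; Option 2 only makes sense when the positions were computed earlier and the original condition is no longer at hand.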

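Common errors and fixes

Error 1: Conditions that leave cases uncovered

Problem: the conditions passed to case_when() may not cover every value. No error is raised; rows that match none of the conditions silently receive NA.

Example code: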

library(dplyr)

df <- data.frame(x = c(1, 2, 3, 4))

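# Incomplete conditions: x == 2 matches neither branch, so y is NA for that row
df %>% mutate(y = case_when(x > 2 ~ "greater", x < 2 ~ "less"))

Fix: make sure the conditions cover every possible case, and end with a catch-all condition (TRUE ~ ...) for anything left over.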

df %>% mutate(y = case_when(x > 2 ~ "greater", x < 2 ~ "less", TRUE ~ "equal"))
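
A catch-all also captures rows where x is missing, which may not be intended. The sketch below, which assumes a hypothetical data frame `df_na` containing one NA, contrasts the output without a fallback against a version that handles NA explicitly before the catch-all.

```r
library(dplyr)

df_na <- data.frame(x = c(1, 2, 3, NA))

# Without a fallback, rows matching no condition get NA
no_fallback <- df_na %>%
  mutate(y = case_when(x > 2 ~ "greater", x < 2 ~ "less"))

# Handle NA first so that a missing x is not labelled "equal"
with_fallback <- df_na %>%
  mutate(y = case_when(
    is.na(x) ~ NA_character_,
    x > 2 ~ "greater",
    x < 2 ~ "less",
    TRUE ~ "equal"
  ))
```

Conditions are evaluated in order and the first match wins, so the is.na() branch has to come before the catch-all. In dplyr 1.1.0 and later, the `.default` argument can be used in place of `TRUE ~ ...`.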

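Error 2: which() returns an empty vector

Problem: when no element satisfies the condition, which() returns an empty integer vector, integer(0). This is not an error in itself, but code that uses the result without checking it can misbehave.

Example code: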

df <- data.frame(x = c(1, 2, 3, 4))

# No value of x exceeds 5, so which() returns integer(0)
indices <- which(df$x > 5)

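Fix: which() is safe to call even when nothing matches, so no guard is needed beforehand. Instead, check the length of the result before using it.

indices <- which(df$x > 5)

if (length(indices) > 0) {
  matched <- df[indices, , drop = FALSE]
} else {
  message("No rows satisfy x > 5")
}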
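
The case where an empty result causes real damage is negative indexing. The following sketch, with the illustrative names `drop_idx`, `wrong`, and `right`, shows how removing rows by an empty index removes everything, and a safer alternative that filters on the logical condition itself.

```r
df <- data.frame(x = c(1, 2, 3, 4))

drop_idx <- which(df$x > 5)   # integer(0): nothing matches

# Pitfall: negating an empty index still selects nothing, so every row is lost
wrong <- df[-drop_idx, , drop = FALSE]

# Safer: filter with the logical condition itself
right <- df[!(df$x > 5), , drop = FALSE]
```

Here `wrong` has zero rows while `right` keeps all four. This sketch assumes x contains no NA; with missing values, the logical form would need an explicit is.na() check as well.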

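Error 3: Undefined variable in mutate()

Problem: a variable referenced inside mutate() does not exist in the data frame or is misspelled, which typically fails with an "object not found" error.

Example code: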

df <- data.frame(x = c(1, 2, 3, 4))

# y is not a column of df, so this fails
df %>% mutate(z = x + y)

Fix: make sure every variable referenced in mutate() exists in the data frame and is spelled correctly.

df <- data.frame(x = c(1, 2, 3, 4), y = c(5, 6, 7, 8))
df %>% mutate(z = x + y)
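
When the columns of an incoming data frame are not guaranteed, it may help to check for them before calling mutate(). This is one possible sketch rather than a prescribed pattern; `needed`, `missing_cols`, and `result` are names made up for the example. An explicit check also guards against a subtler case: if an object named y happens to exist in the calling environment, mutate() will use it without complaint.

```r
library(dplyr)

df <- data.frame(x = c(1, 2, 3, 4))

needed <- c("x", "y")                       # columns the mutate() call relies on
missing_cols <- setdiff(needed, names(df))  # which of them are absent

if (length(missing_cols) > 0) {
  message("Missing columns: ", paste(missing_cols, collapse = ", "))
  result <- df                              # leave the data unchanged
} else {
  result <- df %>% mutate(z = x + y)
}
```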

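Use cases

Data cleaning: use mutate() and case_when() to transform values and assign them conditionally.
Data analysis: use which() to find the indices of elements that meet a condition for further analysis.
Data visualization: preprocess the data before plotting, using mutate() to add helper variables.

Example code

The following combined example shows mutate(), case_when(), and which() used together correctly: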

library(dplyr)

df <- data.frame(x = c(1, 2, 3, 4), y = c(5, 6, 7, 8))

# Create a new variable with mutate() and case_when()
df <- df %>%
  mutate(
    category = case_when(
      x > 2 ~ "greater",
      x < 2 ~ "less",
      TRUE ~ "equal"
    )
  )

# Use which() to find the indices of rows that satisfy the condition
indices <- which(df$x > 2)

print(df)
print(indices)

The approaches above resolve the most common problems encountered when using dplyr's mutate() and case_when() together with base R's which().