I have this df:
data.frame(Name = c("AI147", "AI147", "AI147", "AI147", "AI147",
"AI20", "AI20", "AI87", "AI88", "AI88", "AI88", "AI65", "AI65"),
Presence1 = c("both_type1", "soil", "soil", "water", "both_type2",
"soil", "water", "both_type2", "soil", "soil", "soil", "water",
"water"))
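I want to create a conditional column (Final) based on the data for each Name: (1) if a given Name has more than one presence type, or has only "both_type1" or "both_type2", then Final = both; (2) if a given Name has only "soil", then Final = soil; (3) if a given Name has only "water", then Final = water. The resulting table should look like this: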
data.frame(Name = c("AI147", "AI147", "AI147", "AI147", "AI147",
"AI20", "AI20", "AI87", "AI88", "AI88", "AI88", "AI65", "AI65"),
Presence1 = c("both_type1", "soil", "soil", "water", "both_type2",
"soil", "water", "both_type2", "soil", "soil", "soil", "water",
"water"),
Final = c("both", "both", "both", "both", "both", "both",
"both", "both", "soil", "soil", "soil", "water", "water"))
I have tried several approaches that I found on this site, but none of them came close to doing this.
Answered 2021-09-10 21:32:47
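We can use n_distinct or str_detect to build the condition: after grouping by "Name", if "Presence1" contains the substring "both" in any row, or has more than one unique value (n_distinct), return "both"; otherwise return the value of "Presence1".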
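As a quick illustration of the two building blocks, here is roughly what they return for the values of a single group. The vector `ai20` and the other variable names are introduced here only for illustration; the values are the Presence1 entries of AI20 from the question.

```r
library(dplyr)
library(stringr)

# Presence1 values for one group (AI20 in the question's data);
# `ai20` is a hypothetical name used only in this sketch.
ai20 <- c("soil", "water")

# str_detect() returns one logical per element: does it contain "both"?
has_both <- str_detect(ai20, "both")

# n_distinct() counts the unique values in the vector.
n_types <- n_distinct(ai20)

# The group is labelled "both" if either condition holds.
is_both <- any(has_both) || n_types > 1
```

Here neither value contains "both", but the group has two distinct values, so the second condition is the one that applies.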
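library(dplyr)
library(stringr)

# assumes the question's data frame has been assigned to df
df1 <- df %>%
  group_by(Name) %>%
  # "both" if any value in the group contains "both" or the group has
  # more than one distinct Presence1 value; otherwise keep the value
  mutate(Final = case_when(
    any(str_detect(Presence1, "both")) | n_distinct(Presence1) > 1 ~ 'both',
    TRUE ~ Presence1)) %>%
  ungroup()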
-output
df1
# A tibble: 13 x 3
   Name  Presence1  Final
   <chr> <chr>      <chr>
 1 AI147 both_type1 both
 2 AI147 soil       both
 3 AI147 soil       both
 4 AI147 water      both
 5 AI147 both_type2 both
 6 AI20  soil       both
 7 AI20  water      both
 8 AI87  both_type2 both
 9 AI88  soil       soil
10 AI88  soil       soil
11 AI88  soil       soil
12 AI65  water      water
13 AI65  water      water
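To check the result against the table requested in the question, a self-contained sketch could look like the following. It assumes the question's data is assigned to `df`; `expected` and `all_match` are names introduced here for illustration, and `stringsAsFactors = FALSE` is added so the columns stay character on older R versions.

```r
library(dplyr)
library(stringr)

# Input data from the question, assigned to `df`.
df <- data.frame(
  Name = c("AI147", "AI147", "AI147", "AI147", "AI147",
           "AI20", "AI20", "AI87", "AI88", "AI88", "AI88", "AI65", "AI65"),
  Presence1 = c("both_type1", "soil", "soil", "water", "both_type2",
                "soil", "water", "both_type2", "soil", "soil", "soil",
                "water", "water"),
  stringsAsFactors = FALSE
)

# Final column requested in the question (hypothetical helper vector).
expected <- c("both", "both", "both", "both", "both", "both",
              "both", "both", "soil", "soil", "soil", "water", "water")

# Same grouping logic as in the answer.
df1 <- df %>%
  group_by(Name) %>%
  mutate(Final = case_when(
    any(str_detect(Presence1, "both")) | n_distinct(Presence1) > 1 ~ "both",
    TRUE ~ Presence1)) %>%
  ungroup()

# TRUE when the computed column equals the requested one, row by row.
all_match <- identical(df1$Final, expected)
```

Because `mutate()` on grouped data keeps the original row order, the comparison can be done directly on the two vectors.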
https://stackoverflow.com/questions/69138162