I have the following df:
df <- tibble(country = c("US", "US", "US", "US", "US", "US", "US", "US", "US", "Mex", "Mex"),
year = c(1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2000, 2001),
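score = c(NA, NA, NA, NA, 426, NA, NA, 430, NA, 450, NA))
What I want to do: create a new variable before_after that is 0 for every year before the first year in which a country's score is not NA, and 1 from that year onward.
In other words, hard-coding it, I would like to end up with the following df: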
df <- tibble(country = c("US", "US", "US", "US", "US", "US", "US", "US", "US", "Mex", "Mex"),
year = c(1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2000, 2001),
score = c(NA, NA, NA, NA, 426, NA, NA, 430, NA, 450, NA),
before_after = c(0,0,0,0,1,1,1,1,1,1,1))
I tried the following code, but it did not work:
df %>%
arrange(year) %>%
group_by(country) %>%
mutate(before_after = ifelse(which.max(!is.na(score)),1,0)) %>%
arrange(country, year)
A tidyverse solution would be very welcome, but any help is greatly appreciated.
Thanks in advance!
Answered 2019-03-11 03:29:31
You can use cumsum:
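To see why a running sum does the job, here is a minimal sketch on a bare vector. The `score` vector below is a made-up stand-in for one country's scores ordered by year, not data from the question. The running count of non-NA values stays at 0 until the first observed score and is positive from then on, so comparing it with 0 gives the flag.

```r
# Hypothetical toy vector: one country's scores, already sorted by year
score <- c(NA, NA, 426, NA, 430, NA)

# TRUE wherever a score was observed
seen <- !is.na(score)

# Running count of observed scores; it is 0 until the first non-NA value
running <- cumsum(seen)

# 0 before the first observed score, 1 from that row onward
before_after <- as.integer(running > 0)
before_after
```

Inside a grouped `mutate()`, the same expression is evaluated separately for each country, which is why the rows need to be sorted by year within each country first.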
df %>%
arrange(country, year) %>%
group_by(country) %>%
mutate(before_after = ifelse(cumsum(!is.na(score)) > 0, 1, 0))
country year score before_after
<chr> <dbl> <dbl> <dbl>
1 Mex 2000 450 1
2 Mex 2001 NA 1
3 US 1999 NA 0
4 US 2000 NA 0
5 US 2001 NA 0
6 US 2002 NA 0
7 US 2003 426 1
8 US 2004 NA 1
9 US 2005 NA 1
10 US 2006 430 1
11 US 2007 NA 1
Answered 2019-03-11 03:29:22
Use group_by together with fill
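As a rough sketch of the whole idea in one pipeline: `fill()` carries the last non-missing value downward within each group, so rows before a country's first score stay NA. To get the 0 the question asks for, one option (an assumption on my part, not part of the steps in this answer) is to finish with `tidyr::replace_na()`. The sketch also uses `if_else()` in place of `case_when()` and sorts the rows explicitly.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  country = c("US", "US", "US", "US", "US", "US", "US", "US", "US", "Mex", "Mex"),
  year    = c(1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2000, 2001),
  score   = c(NA, NA, NA, NA, 426, NA, NA, 430, NA, 450, NA)
)

result <- df %>%
  arrange(country, year) %>%
  group_by(country) %>%
  # mark rows that have a score with 1, leave the rest missing
  mutate(before_after = if_else(!is.na(score), 1, NA_real_)) %>%
  # carry the 1 downward within each country
  fill(before_after, .direction = "down") %>%
  ungroup() %>%
  # rows before the first score are still NA; turn them into 0
  mutate(before_after = replace_na(before_after, 0))

result
```

The `arrange()` step matters here because `fill()` works on row order, not on the value of year.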
library(tidyverse)
# create dataframe
df <- tibble(country = c("US", "US", "US", "US", "US", "US", "US", "US", "US", "Mex", "Mex"),
year = c(1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2000, 2001),
score = c(NA, NA, NA, NA, 426, NA, NA, 430, NA, 450, NA))
# create before_after variable with case_when
(df <- mutate(df, before_after = case_when(!is.na(score) ~ 1)))
# A tibble: 11 x 4
country year score before_after
<chr> <dbl> <dbl> <dbl>
1 Mex 2000 450 1
2 Mex 2001 NA NA
3 US 1999 NA NA
4 US 2000 NA NA
5 US 2001 NA NA
# run fill
df %>%
group_by(country) %>%
fill(before_after)
# A tibble: 11 x 4
# Groups: country [2]
country year score before_after
<chr> <dbl> <dbl> <dbl>
1 Mex 2000 450 1
2 Mex 2001 NA 1
3 US 1999 NA NA
4 US 2000 NA NA
5 US 2001 NA NA
https://stackoverflow.com/questions/55091412