我正在试图找出如何识别一个组中的名称更改。
例如,我有一个如下所示的dataframe:
df <- data.frame(
state = rep(c("CA", "WI", "NY"), each = 3),
year = rep(c(2000, 2001), each = 9),
name = c("John", "Paul", "Sally",
"Mary", "Fred", "Jane",
"Linda", "Carl", "Jim",
"Peter", "Paul", "Sally",
"Mary", "Kate", "Jane",
"Linda", "Carl", "Jim")
)
> df
state year name
1 CA 2000 John
2 CA 2000 Paul
3 CA 2000 Sally
4 WI 2000 Mary
5 WI 2000 Fred
6 WI 2000 Jane
7 NY 2000 Linda
8 NY 2000 Carl
9 NY 2000 Jim
10 CA 2001 Peter
11 CA 2001 Paul
12 CA 2001 Sally
13 WI 2001 Mary
14 WI 2001 Kate
15 WI 2001 Jane
16 NY 2001 Linda
17 NY 2001 Carl
18 NY 2001 Jim
正如你所看到的,“彼得”在2001年取代了“约翰”,“凯特”在2001年取代了“弗雷德”。
所以我希望输出看起来像:
df <- data.frame(
state = rep(c("CA", "WI", "NY"), each = 3),
year = rep(c(2000, 2001), each = 9),
name = c("John", "Paul", "Sally",
"Mary", "Fred", "Jane",
"Linda", "Carl", "Jim",
"Peter", "Paul", "Sally",
"Mary", "Kate", "Jane",
"Linda", "Carl", "Jim"),
change = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
1, 0, 0, 0, 1, 0, 0, 0, 0)
)
> df
state year name change
1 CA 2000 John NA
2 CA 2000 Paul NA
3 CA 2000 Sally NA
4 WI 2000 Mary NA
5 WI 2000 Fred NA
6 WI 2000 Jane NA
7 NY 2000 Linda NA
8 NY 2000 Carl NA
9 NY 2000 Jim NA
10 CA 2001 Peter 1
11 CA 2001 Paul 0
12 CA 2001 Sally 0
13 WI 2001 Mary 0
14 WI 2001 Kate 1
15 WI 2001 Jane 0
16 NY 2001 Linda 0
17 NY 2001 Carl 0
18 NY 2001 Jim 0
如你所见,2001年的Peter和2001年的Kate在"change“栏中都被标记为"1”,因为它们分别在2000年-CA和2000年-纽约取代了"John“和"Fred”。
我一直在考虑使用一些滞后方法,但它似乎只是查看了前一行,而不是按州组、年份组:
df2 <- df %>%
group_by(state, year) %>%
mutate(change = lag(name, order_by = year))
任何帮助都将不胜感激!
发布于 2022-02-21 18:10:09
根据预期的输出,这可能会有所帮助--在整个数据中创建一个基于duplicated
'name‘的逻辑列,然后按’if
‘分组,if
all
值为FALSE (!change
),然后用NA
或else
替换为将逻辑转换为二进制(+
)。
library(dplyr)
df %>%
mutate(rn = row_number()) %>%
arrange(state, year, name) %>%
group_by(state) %>%
mutate(change = !duplicated(name)) %>%
group_by(year, .add = TRUE) %>%
mutate(
change = if(all(change)) NA_integer_ else +(change)) %>%
ungroup %>%
arrange(rn) %>%
select(-rn)
-output
# A tibble: 18 × 4
state year name change
<chr> <dbl> <chr> <int>
1 CA 2000 John NA
2 CA 2000 Paul NA
3 CA 2000 Sally NA
4 WI 2000 Mary NA
5 WI 2000 Fred NA
6 WI 2000 Jane NA
7 NY 2000 Linda NA
8 NY 2000 Carl NA
9 NY 2000 Jim NA
10 CA 2001 Peter 1
11 CA 2001 Paul 0
12 CA 2001 Sally 0
13 WI 2001 Mary 0
14 WI 2001 Kate 1
15 WI 2001 Jane 0
16 NY 2001 Linda 0
17 NY 2001 Carl 0
18 NY 2001 Jim 0
使用注释中显示的另一个数据
df2 <- structure(list(state = c("AK", "AK", "AK", "AK", "AK", "AK",
"AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK",
"AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK"), year = c(1997L,
1998L, 1995L, 1996L, 1997L, 1995L, 1996L, 1998L, 1997L, 1998L,
1996L, 1995L, 1996L, 1997L, 1998L, 1995L, 1996L, 1997L, 1998L,
1995L, 1996L, 1995L, 1996L, 1997L, 1998L), name = c("A", "A", "A",
"A", "B", "B", "B", "B", "C", "C", "C", "C", "D", "D", "D", "E",
"E", "E", "E", "F", "F", "G", "G", "G", "G")), class = "data.frame",
row.names = c(NA, -25L))
df2 %>%
mutate(rn = row_number()) %>%
arrange(state, year, name) %>%
group_by(state) %>%
mutate(change = !duplicated(name)) %>%
group_by(year, .add = TRUE) %>%
mutate(
change = if(all(change)) NA_integer_ else +(change)) %>%
ungroup %>%
arrange(rn) %>%
select(-rn) %>%
as.data.frame
-output
state year name change
1 AK 1997 A 0
2 AK 1998 A 0
3 AK 1995 A NA
4 AK 1996 A 0
5 AK 1997 B 0
6 AK 1995 B NA
7 AK 1996 B 0
8 AK 1998 B 0
9 AK 1997 C 0
10 AK 1998 C 0
11 AK 1996 C 0
12 AK 1995 C NA
13 AK 1996 D 1
14 AK 1997 D 0
15 AK 1998 D 0
16 AK 1995 E NA
17 AK 1996 E 0
18 AK 1997 E 0
19 AK 1998 E 0
20 AK 1995 F NA
21 AK 1996 F 0
22 AK 1995 G NA
23 AK 1996 G 0
24 AK 1997 G 0
25 AK 1998 G 0
发布于 2022-02-21 18:31:27
一种基R方法,该方法忽略了NA
df2 <- split(df, df$year)
cbind(df, change=rep((!(df2$"2000"$name == df2$"2001"$name))*1, length(df2)))
state year name change
1 CA 2000 John 1
2 CA 2000 Paul 0
3 WI 2000 Sally 0
4 WI 2000 Mary 0
5 NY 2000 Fred 1
6 NY 2000 Jane 0
7 CA 2000 Linda 0
8 CA 2000 Carl 0
9 WI 2000 Jim 0
10 WI 2001 Peter 1
11 NY 2001 Paul 0
12 NY 2001 Sally 0
13 CA 2001 Mary 0
14 CA 2001 Kate 1
15 WI 2001 Jane 0
16 WI 2001 Linda 0
17 NY 2001 Carl 0
18 NY 2001 Jim 0
https://stackoverflow.com/questions/71210859
复制相似问题