
How to identify name changes within groups using dplyr?

Stack Overflow user
Asked 2022-02-21 17:51:46
Answers: 2 · Views: 39 · Followers: 0 · Score: 1

I am trying to figure out how to identify a name change within a group.

For example, I have a data frame that looks like this:

df <- data.frame(
  state = rep(c("CA", "WI", "NY"), each = 3),
  year = rep(c(2000, 2001), each = 9),
  name = c("John", "Paul", "Sally",
           "Mary", "Fred", "Jane",
           "Linda", "Carl", "Jim",
           "Peter", "Paul", "Sally",
           "Mary", "Kate", "Jane",
           "Linda", "Carl", "Jim")
)

> df
   state year  name
1     CA 2000  John
2     CA 2000  Paul
3     CA 2000 Sally
4     WI 2000  Mary
5     WI 2000  Fred
6     WI 2000  Jane
7     NY 2000 Linda
8     NY 2000  Carl
9     NY 2000   Jim
10    CA 2001 Peter
11    CA 2001  Paul
12    CA 2001 Sally
13    WI 2001  Mary
14    WI 2001  Kate
15    WI 2001  Jane
16    NY 2001 Linda
17    NY 2001  Carl
18    NY 2001   Jim

As you can see, "Peter" replaced "John" in 2001, and "Kate" replaced "Fred" in 2001.

So I would like the output to look like this:

df <- data.frame(
  state = rep(c("CA", "WI", "NY"), each = 3),
  year = rep(c(2000, 2001), each = 9),
  name = c("John", "Paul", "Sally",
           "Mary", "Fred", "Jane",
           "Linda", "Carl", "Jim",
           "Peter", "Paul", "Sally",
           "Mary", "Kate", "Jane",
           "Linda", "Carl", "Jim"),
  change = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
             1, 0, 0, 0, 1, 0, 0, 0, 0)
)

> df
   state year  name change
1     CA 2000  John     NA
2     CA 2000  Paul     NA
3     CA 2000 Sally     NA
4     WI 2000  Mary     NA
5     WI 2000  Fred     NA
6     WI 2000  Jane     NA
7     NY 2000 Linda     NA
8     NY 2000  Carl     NA
9     NY 2000   Jim     NA
10    CA 2001 Peter      1
11    CA 2001  Paul      0
12    CA 2001 Sally      0
13    WI 2001  Mary      0
14    WI 2001  Kate      1
15    WI 2001  Jane      0
16    NY 2001 Linda      0
17    NY 2001  Carl      0
18    NY 2001   Jim      0

As you can see, Peter (2001) and Kate (2001) are both flagged with "1" in the "change" column, because they replaced "John" (CA, 2000) and "Fred" (WI, 2000), respectively.

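I have been considering some kind of lag approach, but it seems to only look at the previous row rather than working by state group and year group: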

library(dplyr)

df2 <- df %>%
  group_by(state, year) %>%
  mutate(change = lag(name, order_by = year))

Any help would be greatly appreciated!


2 Answers

Stack Overflow user

Accepted answer

Posted 2022-02-21 18:10:09

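Based on the expected output, this may help: after sorting by state, year, and name, create a logical column within each state that is TRUE the first time a name appears (!duplicated(name)). Then, grouped by state and year, if all values are TRUE (as in the first year, where every name is new), replace them with NA; else convert the logical to binary with unary plus (+).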

library(dplyr)
df %>%
    mutate(rn = row_number()) %>%
    arrange(state, year, name) %>%
    group_by(state) %>%
    mutate(change = !duplicated(name)) %>% 
    group_by(year, .add = TRUE) %>%
    mutate(
      change = if(all(change)) NA_integer_ else +(change)) %>% 
   ungroup %>% 
   arrange(rn) %>% 
   select(-rn)

Output:

# A tibble: 18 × 4
   state  year name  change
   <chr> <dbl> <chr>  <int>
 1 CA     2000 John      NA
 2 CA     2000 Paul      NA
 3 CA     2000 Sally     NA
 4 WI     2000 Mary      NA
 5 WI     2000 Fred      NA
 6 WI     2000 Jane      NA
 7 NY     2000 Linda     NA
 8 NY     2000 Carl      NA
 9 NY     2000 Jim       NA
10 CA     2001 Peter      1
11 CA     2001 Paul       0
12 CA     2001 Sally      0
13 WI     2001 Mary       0
14 WI     2001 Kate       1
15 WI     2001 Jane       0
16 NY     2001 Linda      0
17 NY     2001 Carl       0
18 NY     2001 Jim        0
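The two building blocks used above can be tried in isolation. The following is a minimal sketch on a made-up vector (nm and first_seen are illustrative names, not part of the answer); it only shows how duplicated() and unary plus behave.

```r
# Hypothetical vector standing in for one state's names, already sorted by year
nm <- c("John", "Paul", "Sally", "Peter", "Paul", "Sally")

# duplicated() is FALSE the first time a value is seen and TRUE afterwards,
# so negating it marks first appearances
first_seen <- !duplicated(nm)
print(first_seen)
# TRUE TRUE TRUE TRUE FALSE FALSE

# Unary plus coerces a logical vector to integer 1/0
print(+(first_seen))
# 1 1 1 1 0 0
```

The first three TRUE values correspond to the first year, which is why the answer replaces a group that is entirely TRUE with NA.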

Using another dataset shown in the comments:

df2 <- structure(list(state = c("AK", "AK", "AK", "AK", "AK", "AK",  
"AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", 
 "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK"), year = c(1997L,  
1998L, 1995L, 1996L, 1997L, 1995L, 1996L, 1998L, 1997L, 1998L, 
 1996L, 1995L, 1996L, 1997L, 1998L, 1995L, 1996L, 1997L, 1998L,  
1995L, 1996L, 1995L, 1996L, 1997L, 1998L), name = c("A", "A",  "A", 
"A", "B", "B", "B", "B", "C", "C", "C", "C", "D", "D", "D",  "E", 
"E", "E", "E", "F", "F", "G", "G", "G", "G")), class = "data.frame", 
row.names = c(NA,  -25L))
df2 %>%
    mutate(rn = row_number()) %>%
    arrange(state, year, name) %>%
    group_by(state) %>%
    mutate(change = !duplicated(name)) %>% 
    group_by(year, .add = TRUE) %>%
    mutate(
      change = if(all(change)) NA_integer_ else +(change)) %>% 
   ungroup %>% 
   arrange(rn) %>% 
   select(-rn) %>%
   as.data.frame

Output:

   state year name change
1     AK 1997    A      0
2     AK 1998    A      0
3     AK 1995    A     NA
4     AK 1996    A      0
5     AK 1997    B      0
6     AK 1995    B     NA
7     AK 1996    B      0
8     AK 1998    B      0
9     AK 1997    C      0
10    AK 1998    C      0
11    AK 1996    C      0
12    AK 1995    C     NA
13    AK 1996    D      1
14    AK 1997    D      0
15    AK 1998    D      0
16    AK 1995    E     NA
17    AK 1996    E      0
18    AK 1997    E      0
19    AK 1998    E      0
20    AK 1995    F     NA
21    AK 1996    F      0
22    AK 1995    G     NA
23    AK 1996    G      0
24    AK 1997    G      0
25    AK 1998    G      0
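Note that the code above flags a name the first time it ever appears within a state. If the intent is instead to compare each year only against the immediately preceding year, one possible variant is sketched below. It is not part of the answer: it assumes years are consecutive integers, and the names prev, in_prev, and res are introduced here purely for illustration.

```r
library(dplyr)

# Data from the question (state is recycled to 18 rows by data.frame)
df <- data.frame(
  state = rep(c("CA", "WI", "NY"), each = 3),
  year = rep(c(2000, 2001), each = 9),
  name = c("John", "Paul", "Sally", "Mary", "Fred", "Jane",
           "Linda", "Carl", "Jim", "Peter", "Paul", "Sally",
           "Mary", "Kate", "Jane", "Linda", "Carl", "Jim")
)

# Lookup table: each (state, year, name) shifted forward one year,
# so a match means "this name was present in the previous year".
# Assumption: years are consecutive, so previous year == year - 1.
prev <- df %>%
  distinct(state, year, name) %>%
  mutate(year = year + 1, in_prev = TRUE)

res <- df %>%
  left_join(prev, by = c("state", "year", "name")) %>%
  group_by(state) %>%
  # First year of each state has nothing to compare against, so NA;
  # otherwise 1 if the name was absent in the previous year, else 0
  mutate(change = if_else(year == min(year),
                          NA_integer_,
                          as.integer(is.na(in_prev)))) %>%
  ungroup() %>%
  select(-in_prev)

print(as.data.frame(res))
```

On the question's data this gives the same change column as the expected output. The two approaches differ when a name disappears for a year and later returns: this sketch would flag the return, while the duplicated()-based code would not.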
Score: 2

Stack Overflow user

Posted 2022-02-21 18:31:27

A base R approach, which ignores the NA requirement:

df2 <- split(df, df$year)

cbind(df, change=rep((!(df2$"2000"$name == df2$"2001"$name))*1, length(df2)))
   state year  name change
1     CA 2000  John      1
2     CA 2000  Paul      0
3     CA 2000 Sally      0
4     WI 2000  Mary      0
5     WI 2000  Fred      1
6     WI 2000  Jane      0
7     NY 2000 Linda      0
8     NY 2000  Carl      0
9     NY 2000   Jim      0
10    CA 2001 Peter      1
11    CA 2001  Paul      0
12    CA 2001 Sally      0
13    WI 2001  Mary      0
14    WI 2001  Kate      1
15    WI 2001  Jane      0
16    NY 2001 Linda      0
17    NY 2001  Carl      0
18    NY 2001   Jim      0
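The comparison above is positional: it relies on both years listing the same slots in the same row order, and it flags the 2000 rows as well. As a hedged alternative, the sketch below matches on state and name instead of row position and leaves each state's first year as NA. It is not part of the answer; it assumes consecutive years and that no value contains the "|" separator, and first_year, key, and prev_key are illustrative names.

```r
# Data from the question (state is recycled to 18 rows by data.frame)
df <- data.frame(
  state = rep(c("CA", "WI", "NY"), each = 3),
  year = rep(c(2000, 2001), each = 9),
  name = c("John", "Paul", "Sally", "Mary", "Fred", "Jane",
           "Linda", "Carl", "Jim", "Peter", "Paul", "Sally",
           "Mary", "Kate", "Jane", "Linda", "Carl", "Jim")
)

# Earliest year observed for each row's state
first_year <- ave(df$year, df$state, FUN = min)

# Composite keys: the row itself, and the same name one year earlier.
# Assumption: "|" does not occur in state or name values.
key      <- paste(df$state, df$year,     df$name, sep = "|")
prev_key <- paste(df$state, df$year - 1, df$name, sep = "|")

# NA in a state's first year; otherwise 1 when the name was not
# present in that state in the previous year, else 0
df$change <- ifelse(df$year == first_year,
                    NA_integer_,
                    as.integer(!(prev_key %in% key)))

print(df)
```

Because the lookup uses %in% on keys rather than aligned vectors, the rows can be in any order and the groups can differ in size between years.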
Score: 2
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/71210859
