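I am working with a dataset in which a list of students (ID) is linked to their favourite sports, which can only be chosen from 7 different sports. A single ID can have several favourite sports. Below is a snapshot of the data.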
ID Sports
1 Soccer
2 Basketball
3 Tennis
1 Basketball
4 Soccer
2 Hockey
3 Basketball
5 Soccer
6 Rafting
2 surfing
1 Hockey
6 Soccer
7 Tennis
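I need to create a dataset that shows how many different sports each student (ID) likes and lists those sports. Some of the expected results look like this: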
ID count All Favourite Sports
1 3 Soccer, Basketball, Hockey
2 3 Basketball, Hockey, surfing
3 2 Tennis, Basketball
4 1 Soccer
5 1 Soccer
6 2 Rafting, Soccer
7 1 Tennis
Posted on 2020-09-11 19:12:53
You can do this with the dplyr package and the code below. Note that `data` should be the name of your data.frame:
library(dplyr)

data %>%
  group_by(ID) %>%
  summarize(count = n_distinct(Sports),                  # number of different sports per ID
            all_sports = toString(unique(Sports))) %>%   # comma-separated list, duplicates dropped
  ungroup()
# # A tibble: 7 x 3
#      ID count all_sports
#   <int> <int> <chr>
# 1     1     3 Soccer, Basketball, Hockey
# 2     2     3 Basketball, Hockey, surfing
# 3     3     2 Tennis, Basketball
# 4     4     1 Soccer
# 5     5     1 Soccer
# 6     6     2 Rafting, Soccer
# 7     7     1 Tennis
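To try the pipeline on the snapshot from the question, the data frame can be rebuilt by hand. The sketch below is only an illustration: the name `data` and the integer IDs are assumptions, and `Sports` is wrapped in `unique()` so that a sport repeated for the same ID would not be listed twice.

```r
library(dplyr)

# Snapshot from the question, typed in by hand (name `data` is an assumption)
data <- data.frame(
  ID = c(1L, 2L, 3L, 1L, 4L, 2L, 3L, 5L, 6L, 2L, 1L, 6L, 7L),
  Sports = c("Soccer", "Basketball", "Tennis", "Basketball", "Soccer",
             "Hockey", "Basketball", "Soccer", "Rafting", "surfing",
             "Hockey", "Soccer", "Tennis"),
  stringsAsFactors = FALSE
)

result <- data %>%
  group_by(ID) %>%
  summarize(count = n_distinct(Sports),               # distinct sports per ID
            all_sports = toString(unique(Sports))) %>% # sports in order of first appearance
  ungroup()

print(result)
```

The sports appear in the order they first occur for each ID, which is why student 6 shows "Rafting, Soccer" rather than an alphabetical list.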
Posted on 2020-09-11 20:06:29
Another approach you could try:
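library(dplyr)

df %>%
  group_by(ID) %>%
  # transmute() keeps the grouping column ID automatically
  transmute(count = n_distinct(Sports),   # distinct sports, not the number of rows
            `All Favourite Sports` = paste(unique(Sports), collapse = ", ")) %>%
  slice(1) %>%   # all rows within a group are now identical, so keep the first
  ungroup()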
# ID count `All Favourite Sports`
# <int> <int> <chr>
# 1 1 3 Soccer, Basketball, Hockey
# 2 2 3 Basketball, Hockey, surfing
# 3 3 2 Tennis, Basketball
# 4 4 1 Soccer
# 5 5 1 Soccer
# 6 6 2 Rafting, Soccer
# 7 7 1 Tennis
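Whether rows or distinct values are counted only matters when an ID can list the same sport more than once. The following is a minimal base R sketch (no dplyr) of that difference. The small data frame, including the duplicated "Soccer" row, is made up for illustration and is not part of the question's data.

```r
# Hypothetical data: student 1 lists Soccer twice
df <- data.frame(
  ID = c(1L, 1L, 1L, 2L),
  Sports = c("Soccer", "Soccer", "Hockey", "Tennis"),
  stringsAsFactors = FALSE
)

# Rows per ID: the duplicate is counted
rows_per_id <- tapply(df$Sports, df$ID, length)

# Distinct sports per ID: the duplicate is ignored
distinct_per_id <- tapply(df$Sports, df$ID, function(x) length(unique(x)))

# Comma-separated list of distinct sports per ID
sports_per_id <- tapply(df$Sports, df$ID, function(x) toString(unique(x)))

result <- data.frame(
  ID = as.integer(names(distinct_per_id)),
  count = as.integer(distinct_per_id),
  all_sports = as.character(sports_per_id),
  stringsAsFactors = FALSE
)

print(rows_per_id)
print(result)
```

For student 1 the row count is 3 while the distinct count is 2, so a count of different sports should be based on distinct values.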
https://stackoverflow.com/questions/63849566