I have the following data:
  points_team1 points_team2 team1             team2
------------------------------------------------------------------------
1 42           32           Doppler/Horst     Doherty/Allen
2 40           46           Abbiati/Andreatta Mesa/Garcia
3 50           49           Bergmann/Harms    Basta/Kolaric
4 46           48           Mol H./Berntsen   Regza/Smits
5 29           42           Doppler/Horst     Hyden/Brunner
6 31           42           Hyden/Brunner     Liamin/Krasilnikov
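Now I want to compute, for each team, the total points won and the total points lost. Note that a team can appear as either team1 or team2 (for example, Hyden/Brunner appears once on each side).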
I tried using gather, but then got stuck on how to apply something like SUMIF.
k <- structure(list(points_team1 = c(42, 40, 50, 46, 29, 31), points_team2 = c(32,
46, 49, 48, 42, 42), team1 = c("Doppler/Horst", "Abbiati/Andreatta",
"Bergmann/Harms", "Mol H. / Berntsen", "Doppler/Horst", "Hyden/Brunner"
), team2 = c("Doherty/Allen", "Mesa/Garcia", "Basta/Kolaric",
"Regza/Smits", "Hyden/Brunner", "Liamin/Krasilnikov")), row.names = c(NA,
-6L), class = "data.frame")
v <- k %>% tidyr::gather('team1','team2', key="team_id", value="teamname") %>%
dplyr::group_by(teamname) %>%
dplyr::summarize(matches_played=n(), points_won=sum(points_team1[team_id == "team1"]))
The expected result for the given dataset is:
  teamname          points_won points_lost
-----------------------------------------------------
1 Doppler/Horst     71         74
2 Abbiati/Andreatta 40         46
3 Mesa/Garcia       46         40
4 Hyden/Brunner     73         71
...
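My searches on Google and Stack Overflow only turned up answers that sum all rows containing a certain element (for example here: Summarize with conditions in dplyr). In my problem, though, which points column has to be summed depends on which of the two team columns the team appears in, and I don't know how to do that.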
Please help!
Posted on 2018-12-27 19:28:44
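You can build two data frames, one for each team column, that use the same column names. Then stack them on top of each other and summarise as usual.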
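library(dplyr)

# Each match seen from team1's side
team1 <- k %>% select(points_won = points_team1,
                      points_lost = points_team2,
                      team = team1)
# The same matches seen from team2's side, so won and lost swap
team2 <- k %>% select(points_won = points_team2,
                      points_lost = points_team1,
                      team = team2)
# Stack both views and sum the points per team
bind_rows(team1, team2) %>%
  group_by(team) %>%
  summarise_all(sum)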
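If you would rather stay close to the gather attempt from the question, something along these lines might also work. This is only a sketch: it assumes tidyr 1.0.0 or later (for pivot_longer, the successor to gather), rebuilds k from the dput output in the question, and the intermediate column names team_id, points_won and points_lost are just illustrative choices.

```r
library(dplyr)
library(tidyr)

# Data from the question, rebuilt as a plain data frame
k <- data.frame(
  points_team1 = c(42, 40, 50, 46, 29, 31),
  points_team2 = c(32, 46, 49, 48, 42, 42),
  team1 = c("Doppler/Horst", "Abbiati/Andreatta", "Bergmann/Harms",
            "Mol H. / Berntsen", "Doppler/Horst", "Hyden/Brunner"),
  team2 = c("Doherty/Allen", "Mesa/Garcia", "Basta/Kolaric",
            "Regza/Smits", "Hyden/Brunner", "Liamin/Krasilnikov"),
  stringsAsFactors = FALSE
)

result <- k %>%
  # One row per (match, team); team_id records which side the team was on
  pivot_longer(c(team1, team2), names_to = "team_id", values_to = "teamname") %>%
  # Pick the won/lost column depending on the side (the "SUMIF" part)
  mutate(
    points_won  = if_else(team_id == "team1", points_team1, points_team2),
    points_lost = if_else(team_id == "team1", points_team2, points_team1)
  ) %>%
  group_by(teamname) %>%
  summarise(
    matches_played = n(),
    points_won     = sum(points_won),
    points_lost    = sum(points_lost)
  )

print(result)
```

Compared with stacking two renamed data frames, this keeps everything in one pipeline and also gives you matches_played, at the cost of an extra mutate step.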
https://stackoverflow.com/questions/53948800