我有一个体育数据集,内容如下:
season team tm shk dgs brs cts cws avg_pt_marg
2015 sharks shk 0-0 1-3 2-0 4-1 3-2 1.2
2015 dogs dgs 3-1 0-0 2-1 1-1 2-0 3.4
2015 bears brs 0-2 1-2 0-0 1-3 2-1 -0.2
2015 cats cts 1-4 1-1 3-1 0-0 2-2 2.0
2015 cows cws 2-3 0-2 1-2 2-2 0-0 -2.1
2014 sharks shk 0-0 1-3 2-0 4-1 3-2 0.7
2014 dogs dgs 3-1 0-0 2-1 1-1 2-0 1.8
2014 bears brs 0-2 1-2 0-0 1-3 2-1 -1.9
2014 cats cts 1-4 1-1 3-1 0-0 2-2 2.3
2014 cows cws 2-3 0-2 1-2 2-2 0-0 -3.0我想为每一行(一支球队的一个赛季)添加一列,其中包含球队对手的平均积分差距。
计算方法是:(该赛季)与一支球队的比赛次数乘以该球队(该赛季)的积分差除以(该赛季)比赛的总次数。
例如,对于2015年的sharks,,一支球队的对手的平均积分差距将是((4 x 3.4)+(2 x -0.2)+(5 x 2.0)+(5 x -2.1)) / 16.
如何计算此列,然后将其添加到数据帧中?
如下所示:
season team tm shk dgs brs cts cws avg_pt_marg opponent_marg
2015 sharks shk 0-0 1-3 2-0 4-1 3-2 1.2
2015 dogs dgs 3-1 0-0 2-1 1-1 2-0 3.4
2015 bears brs 0-2 1-2 0-0 1-3 2-1 -0.2
2015 cats cts 1-4 1-1 3-1 0-0 2-2 2.0
2015 cows cws 2-3 0-2 1-2 2-2 0-0 -2.1
2014 sharks shk 0-0 1-3 2-0 4-1 3-2 0.7
2014 dogs dgs 3-1 0-0 2-1 1-1 2-0 1.8
2014 bears brs 0-2 1-2 0-0 1-3 2-1 -1.9
2014 cats cts 1-4 1-1 3-1 0-0 2-2 2.3
2014 cows cws 2-3 0-2 1-2 2-2 0-0 -3.0发布于 2021-12-03 07:20:33
嗯,不是很漂亮,但是
do.call(
rbind,
by(df,list(df$season),function(x){
tmp=sapply(
1:nrow(x),
function(i){
unlist(
lapply(
strsplit(
as.character(x[i,grepl("tm[0-9]+",colnames(x))]),
"-"
),
function(y){
sum(as.numeric(y))
}
)
)
}
)
cbind(
x,
"opponent_marg"=colSums(tmp*x[,"avg_pt_marg"])/colSums(tmp)
)
})
)结果是
season team tm tm1 tm2 tm3 tm4 tm5 avg_pt_marg opponent_marg
2014.6 2014 team1 tm1 0-0 1-3 2-0 4-1 3-2 0.7 -0.0062500
2014.7 2014 team2 tm2 3-1 0-0 2-1 1-1 2-0 1.8 -0.3909091
2014.8 2014 team3 tm3 0-2 1-2 0-0 1-3 2-1 -1.9 0.5833333
2014.9 2014 team4 tm3 1-4 1-1 3-1 0-0 2-2 2.3 -0.8333333
2014.10 2014 team5 tm5 2-3 0-2 1-2 2-2 0-0 -3.0 0.7571429
2015.1 2015 team1 tm1 0-0 1-3 2-0 4-1 3-2 1.2 0.7937500
2015.2 2015 team2 tm2 3-1 0-0 2-1 1-1 2-0 3.4 0.3636364
2015.3 2015 team3 tm3 0-2 1-2 0-0 1-3 2-1 -0.2 1.1916667
2015.4 2015 team4 tm3 1-4 1-1 3-1 0-0 2-2 2.0 0.2400000
2015.5 2015 team5 tm5 2-3 0-2 1-2 2-2 0-0 -2.1 1.4428571发布于 2021-12-03 07:49:13
另一个不是很好的解决方案,但它是一个相当复杂的小任务-它有很多组件。我在这里使用的是data.table -如果您不熟悉它们,它们只是data.frame的一个增强版,提供了一些额外的功能
library(data.table)
setDT(dt1)首先,将数据整形为更长的格式
# Reshape the data
dt2 <- dt1[, melt(.SD, id.vars=c("tm", "team", "season", "avg_pt_marg"))]我还过滤掉了球队和对手比赛的案例。这一步还为比赛/游戏的数量创建了一个变量
# Filter out cases where team and opponent match
dt2 <- dt2[tm != variable,][,
# Get number of games played
`:=`("games_played" = as.numeric(tstrsplit(value, "-")[[1]])+
as.numeric(tstrsplit(value, "-")[[2]]))]然后,最后一步为您提供所需的值:
# Get the team/season averages
dt3 <- dt2[, sum(avg_pt_marg*games_played)/sum(games_played), keyby=.(season, "tm" = variable)]您可以将其与data.table连接合并
dt1 <- dt1[dt3, on=c("tm", "season")]给予:
season team tm shk dgs brs cts cws avg_pt_marg V1
1: 2014 sharks shk 0-0 1-3 2-0 4-1 3-2 0.7 -0.0062500
2: 2014 dogs dgs 3-1 0-0 2-1 1-1 2-0 1.8 -0.3909091
3: 2014 bears brs 0-2 1-2 0-0 1-3 2-1 -1.9 0.5833333
4: 2014 cats cts 1-4 1-1 3-1 0-0 2-2 2.3 -0.8333333
5: 2014 cows cws 2-3 0-2 1-2 2-2 0-0 -3.0 0.7571429
6: 2015 sharks shk 0-0 1-3 2-0 4-1 3-2 1.2 0.7937500
7: 2015 dogs dgs 3-1 0-0 2-1 1-1 2-0 3.4 0.3636364
8: 2015 bears brs 0-2 1-2 0-0 1-3 2-1 -0.2 1.1916667
9: 2015 cats cts 1-4 1-1 3-1 0-0 2-2 2.0 0.2400000
10: 2015 cows cws 2-3 0-2 1-2 2-2 0-0 -2.1 1.4428571https://stackoverflow.com/questions/70210549
复制相似问题