简化的原始数据如下所示
Data group
2016/1/10 1
2016/2/4 1
2016/3/25 1
2016/4/13 1
2016/5/5 1
2016/7/1 2
2016/8/1 2
2016/10/1 2
2016/12/1 2
2016/12/31 2我想得到的最终数据是这样的:
Data group cum_diff_preceding
2016/1/10 1 0
2016/2/4 1 25
2016/3/25 1 125
2016/4/13 1 182
2016/5/5 1 270
2016/7/1 2 0
2016/8/1 2 31
2016/10/1 2 153
2016/12/1 2 336
2016/12/31 2 380计算方法如下:
for row 2016/1/10, cum_diff_preceding is 0
for row 2016/2/4, cum_diff_preceding is (2016/2/4-2016/1/10)
for row 2016/3/25, cum_diff_preceding is (2016/3/25-2016/1/10)+(2016/3/25-2016/2/4)
for row 2016/4/13, cum_diff_preceding is (2016/4/13-2016/1/10)+(2016/4/13- 2016/2/4)+(2016/4/13-2016/3/25)
for row 2016/5/5, cum_diff_preceding is (2016/5/5-2016/1/10)+(2016/5/5- 2016/2/4)+(2016/5/5-2016/3/25)+(2016/4/13-2016/4/13)
for row 2016/7/1, cum_diff_preceding is 0
for row 2016/8/1, cum_diff_preceding is (2016/8/1-2016/7/1)
for row 2016/10/1, cum_diff_preceding is (2016/10/1-2016/7/1)+(2016/10/1- 2016/8/1)
for row 2016/12/1, cum_diff_preceding is (2016/12/1-2016/7/1)+(2016/10/1- 2016/8/1)+(2016/10/1- 2016/10/1)
for row 2016/12/31, cum_diff_preceding is (2016/12/31-2016/7/1)+(2016/10/1- 2016/8/1)+(2016/10/1- 2016/10/1)+(2016/12/31- 2016/12/1)我的主要代码如下
>as.Date(df$Data,"%Y-%m-%d")
>fun_forcast<-function(df){for(i in 2:nrow(df)){df$cum_diff_preceeding[i]<-sum(df$data[i]-df$data[1:(i-1)])}}
>ddply(df,.(group),transform,cum_diff_preceding<-fun_forcast)但这不管用。
或者当我将代码更改为
>fun_forcast<-function(df)(df$cum_diff_preceding<-sapply(1:NROW(df), >function(i) sum(df$data[i] - df$data[1:(i-1)])))
ddply(df,.(group),fun_forcast)它可以工作,但是结果格式是
> ddply(df,.(group),fun_forcast)
group V1 V2 V3 V4 V5
1 1 0 25 125 182 270
2 2 0 31 153 336 380我不知道如何将结果返回到原始cum_diff_preceding中的data.frame中。
请
发布于 2017-08-06 11:10:09
我们可以使用来自ave的base R来完成这个任务。
df$Data <- as.Date(df$Data, "%Y/%m/%d")
fun_forcast <- function(v1) sapply(seq_along(v1), function(i) sum(v1[i] - v1[1:(i-1)]))
df$cum_diff_preceding <- with(df, ave(as.numeric(Data), group, FUN = fun_forcast))
df$cum_diff_preceding
#[1] 0 25 125 182 270 0 31 153 336 456或者使用dplyr
library(dplyr)
df %>%
group_by(group) %>%
mutate(cum_diff_preceding = fun_forcast(Data))
# A tibble: 10 x 3
# Groups: group [2]
# Data group cum_diff_preceding
# <date> <int> <dbl>
# 1 2016-01-10 1 0
# 2 2016-02-04 1 25
# 3 2016-03-25 1 125
# 4 2016-04-13 1 182
# 5 2016-05-05 1 270
# 6 2016-07-01 2 0
# 7 2016-08-01 2 31
# 8 2016-10-01 2 153
# 9 2016-12-01 2 336
#10 2016-12-31 2 456发布于 2017-08-06 12:00:14
通过将日期转换为数字,并推广公式:
df %>%
group_by(group) %>%
mutate(numdata = as.numeric(Data),
cum_diff_preceding = (1:n())*numdata-cumsum(numdata)) %>%
select(-numdata)
# A tibble: 10 x 3
# Groups: group [2]
# Data group cum_diff_preceding
# <date> <int> <dbl>
# 1 2016-01-10 1 0
# 2 2016-02-04 1 25
# 3 2016-03-25 1 125
# 4 2016-04-13 1 182
# 5 2016-05-05 1 270
# 6 2016-07-01 2 0
# 7 2016-08-01 2 31
# 8 2016-10-01 2 153
# 9 2016-12-01 2 336
# 10 2016-12-31 2 456https://stackoverflow.com/questions/45531286
复制相似问题