我想通过id计算前一年窗口中当前行之前的行数。
以下是我的数据:
df <- structure(list(id = c("1", "1", "1", "1",
"2", "2", "2", "2", "2", "2", "2",
"2", "2"), flag = c(1, 1, 0, 1, 0, 0, 1, 1,
1, 1, 1, 1, 1), date = structure(c(15425, 15456, 16613,
16959, 15513, 15513, 15625, 15635, 15649, 15663, 15670, 16051,
16052), class = "Date")), sorted = "id", class = c("data.table",
"data.frame"), row.names = c(NA, -13L))
roll_sum <- c(0, 1, 0, 1, 0, 1, 2, 3, 4, 5, 6, 0, 1)
flag_sum <- c(0, 1, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 1)
df_desired <- cbind(df, roll_sum) # roll_sum: number of rows excluding current row in 1 year time frame rolling
df_desired <- cbind(df_desired, flag_sum) # flag_sum: number of rows excluding current row in 1 year time frame rolling where flag was 1
数据:
id flag date
1: 1 1 2012-03-26
2: 1 1 2012-04-26
3: 1 0 2015-06-27
4: 1 1 2016-06-07
5: 2 0 2012-06-22
6: 2 0 2012-06-22
7: 2 1 2012-10-12
8: 2 1 2012-10-22
9: 2 1 2012-11-05
10: 2 1 2012-11-19
11: 2 1 2012-11-26
12: 2 1 2013-12-12
13: 2 1 2013-12-13
输出:
df_desired
id flag date roll_sum flag_sum
1: 1 1 2012-03-26 0 0
2: 1 1 2012-04-26 1 1
3: 1 0 2015-06-27 0 0
4: 1 1 2016-06-07 1 0
5: 2 0 2012-06-22 0 0
6: 2 0 2012-06-22 1 0
7: 2 1 2012-10-12 2 0
8: 2 1 2012-10-22 3 1
9: 2 1 2012-11-05 4 2
10: 2 1 2012-11-19 5 3
11: 2 1 2012-11-26 6 4
12: 2 1 2013-12-12 0 0
13: 2 1 2013-12-13 1 1
我尝试了G.Grothendieck在Compute rolling sum by id variables, with missing timepoints中使用zoo
给出的解决方案,但它给了我一个错误:
merge.zoo(z,g)中出现
错误:系列不能与系列中的非唯一索引条目合并。此外:警告消息: in merge.zoo(count,date):
我使用make.index.unique
和make.time.unique
使date列具有唯一性。
任何有关优化解决方案的帮助都将是appreciated.Thanks。
发布于 2018-06-21 09:13:28
不确定这对您的数据维度是否有帮助。
首先,create running index来处理重复日期和汇总不能包含prev dupe date以及一年前的create date (我认为365更好,但看起来OP需要366)。
然后,执行非equi自联接,同时确保未使用prev dupe日期且日期在一年内。
df[, c("rn", "oneYrAgo") := .(.I, date - 366)]
df[df,
.(roll_sum=.N, flag_sum=sum(flag, na.rm=TRUE)),
on=.(date >= oneYrAgo, rn < rn, id, date <= date),
by=.EACHI][,
-seq_len(2L)]
结果:
id date roll_sum flag_sum
1: 1 2012-03-26 0 0
2: 1 2012-04-26 1 1
3: 1 2015-06-27 0 0
4: 1 2016-06-07 1 0
5: 2 2012-06-22 0 0
6: 2 2012-06-22 1 0
7: 2 2012-10-12 2 0
8: 2 2012-10-22 3 1
9: 2 2012-11-05 4 2
10: 2 2012-11-19 5 3
11: 2 2012-11-26 6 4
12: 2 2013-12-12 0 0
13: 2 2013-12-13 1 1
https://stackoverflow.com/questions/50952958
复制相似问题