首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >如何平整/合并重叠时间段

如何平整/合并重叠时间段
EN

Stack Overflow用户
提问于 2015-03-09 08:58:01
回答 4查看 6.2K关注 0票数 20

我有大量的时间段数据集,由“开始”和“结束”列定义。有些时期重叠。

我想合并(扁平/合并/折叠)所有重叠的时间段,使之有一个“开始”值和一个“结束”值。

一些示例数据:

代码语言:javascript
运行
复制
  ID      start        end
1  A 2013-01-01 2013-01-05
2  A 2013-01-01 2013-01-05
3  A 2013-01-02 2013-01-03
4  A 2013-01-04 2013-01-06
5  A 2013-01-07 2013-01-09
6  A 2013-01-08 2013-01-11
7  A 2013-01-12 2013-01-15

预期结果:

代码语言:javascript
运行
复制
  ID      start        end
1  A 2013-01-01 2013-01-06
2  A 2013-01-07 2013-01-11
3  A 2013-01-12 2013-01-15

我试过的是:

代码语言:javascript
运行
复制
  require(dplyr)
  data <- structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L), class = "factor", .Label = "A"), 
    start = structure(c(1356998400, 1356998400, 1357084800, 1357257600, 
    1357516800, 1357603200, 1357948800), tzone = "UTC", class = c("POSIXct", 
    "POSIXt")), end = structure(c(1357344000, 1357344000, 1357171200, 
    1357430400, 1357689600, 1357862400, 1358208000), tzone = "UTC", class = c("POSIXct", 
    "POSIXt"))), .Names = c("ID", "start", "end"), row.names = c(NA, 
-7L), class = "data.frame")

remove.overlaps <- function(data){
data2 <- data
for ( i in 1:length(unique(data$start))) {
x3 <- filter(data2, start>=data$start[i] & start<=data$end[i])
x4 <- x3[1,]
x4$end <- max(x3$end)
data2 <- filter(data2, start<data$start[i] | start>data$end[i])
data2 <- rbind(data2,x4)  
}
data2 <- na.omit(data2)}

data <- remove.overlaps(data)
EN

Stack Overflow用户

发布于 2018-12-21 20:50:11

看来我的派对有点晚了,但我采用了@zach的代码,并在下面使用data.table重新编写了它。我没有进行全面的测试,但这似乎比tidy版本快了20%。(我无法测试IRange方法,因为R3.5.1还没有这个包)

而且,fwiw,可接受的答案并不能捕捉一个日期范围完全在另一个日期范围内的边缘情况(例如,2018-07-072017-07-14是在2018-05-012018-12-01的范围内)。@zach的回答确实抓住了这一边缘问题。

代码语言:javascript
运行
复制
library(data.table)

start_col = c("2018-01-01","2018-03-01","2018-03-10","2018-03-20","2018-04-10","2018-05-01","2018-05-05","2018-05-10","2018-07-07")
end_col = c("2018-01-21","2018-03-21","2018-03-31","2018-04-09","2018-04-30","2018-05-21","2018-05-26","2018-05-30","2018-07-14")

# create fake data, double it, add ID
# change row 17, such that each ID grouping is a little different
# also adds an edge case in which one date range is totally within another
# (this is the edge case not currently captured by the accepted answer)
d <- data.table(start_col = as.Date(start_col), end_col = as.Date(end_col))
d2<- rbind(d,d)
d2[1:(.N/2), ID := 1]
d2[(.N/2 +1):.N, ID := 2]
d2[17,end_col := as.Date('2018-12-01')]

# set keys (also orders)
setkey(d2, ID, start_col, end_col)

# get rid of overlapping transactions and do the date math
squished <- d2[,.(START_DT = start_col, 
                  END_DT = end_col, 
                  indx = c(0, cumsum(as.numeric(lead(start_col)) > cummax(as.numeric(end_col)))[-.N])),
               keyby=ID
               ][,.(start=min(START_DT), 
                    end = max(END_DT)),
                 by=c("ID","indx")
                 ]
票数 2
EN
查看全部 4 条回答
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/28938147

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档