按新年拆分日期

内容来源于 Stack Overflow,并遵循CC BY-SA 3.0许可协议进行翻译与使用

  • 回答 (2)
  • 关注 (0)
  • 查看 (12)

我有来自医院的数据,其中包含许多变量,以及每行的日期和日期,这告诉我们每行何时“有效”。每行最多可以有效一年。

test = data.frame(ID=c(10,10,10,12,12), Disease=c("P","P","P","D","P"), Pass=c("US","US","US","EN","EN"),
                  Payment=c(110,110,115,240,255), 
                  from_date=as.POSIXct(c("2008-01-09","2009-01-09","2010-01-09","2008-01-01","2013-12-31")),
                  to_date=as.POSIXct(c("2009-01-08","2010-01-08","2011-01-08","2008-12-31","2014-12-30"))
                  )

对于从一年传递到另一年的行,我想分割行,这样我最终得到两行而不是原始行,并且还操纵from_date和to_date,这样我最终得到一个新的数据集看起来像这样:

  test_desired = data.frame(ID=c(10,10,10,10,10,10,12,12,12), Disease=c("P","P","P","P","P","P","D","P","P"), Pass=c("US","US","US","US","US","US","EN","EN","EN"),
                              Payment=c(110,110,110,110,115,115,240,255,255), 
                              from_date=as.POSIXct(c("2008-01-09","2009-01-01","2009-01-09","2009-01-01","2010-01-09","2011-01-01","2008-01-01","2013-12-31","2014-01-01")),
                              to_date=as.POSIXct(c("2008-12-31","2009-01-08","2009-12-31","2010-01-08","2010-12-31","2011-01-08","2008-12-31","2013-12-31","2014-12-30"))
    )    

尝试

library(lubridate) #for function "year" below
test_desired=test
row=c()
tmp=c()
for(i in 1:nrow(test_desired)){
  if(year(test_desired$from_date)[i]<year(test_desired$to_date)[i]){
    test_desired$to_date[i] = as.POSIXct(paste0(year(test_desired$from_date[i]),"-12-31"))
    row = test_desired[i,]
    row$from_date = as.POSIXct(paste0(year(test$to_date[i]),"-01-01"))
    row$to_date = test$to_date[i]
    tmp=rbind(tmp,row)

  } else next
}
test_desired=rbind(test_desired,tmp)
library(dplyr)
test_desired=arrange(test_desired,ID,from_date)

是否有更优雅的方式来做到这一点,例如使用dplyr?

提问于
用户回答回答于

我只是使用一个data.table也提供了一个year函数,并忽略了可能的慢速日期转换逻辑as.POSIXct

我也假设,to_date并且from_date可能仅相差一年(不超过一年!)。

library(data.table)  # also provides a "year" function

setDT(test)

# Create additional rows for the new year
additional_rows <- test[year(from_date) < year(to_date), ]
additional_rows[, from_date := as.POSIXct(paste0(year(to_date),"-01-01"))]
additional_rows

# Shorten the "from_date" of the affected original rows
test[year(from_date) < year(to_date), to_date := as.POSIXct(paste0(year(from_date),"-12-31"))]
test

# Create a combined data table as result
result <- rbind(test, additional_rows)
setkey(result, ID, Payment, from_date)  # just to sort the data like the "test_desired" sort order
result

结果

   ID Disease Pass Payment  from_date    to_date
1: 10       P   US     110 2008-01-09 2008-12-31
2: 10       P   US     110 2009-01-01 2009-01-08
3: 10       P   US     110 2009-01-09 2009-12-31
4: 10       P   US     110 2010-01-01 2010-01-08
5: 10       P   US     115 2010-01-09 2010-12-31
6: 10       P   US     115 2011-01-01 2011-01-08
7: 12       D   EN     240 2008-01-01 2008-12-31
8: 12       P   EN     255 2013-12-31 2013-12-31
9: 12       P   EN     255 2014-01-01 2014-12-30
用户回答回答于

使用from_dateto_date我们可以创建一个日期序列,seq.Date然后按年分割这个序列,最后选择每年的最小值和最大值。然后使用applyseparate_rowsseparate获得最终结果。

cr_date <- function(d1, d2){
    #browser()
    sequence_date <- seq.Date(as.Date(d1), as.Date(d2), by='day') 
    lst_dates <- lapply(split(sequence_date, lubridate::year(sequence_date)),
                        function(x) paste0(min(x), '|', max(x)))
    result <- paste0(lst_dates, collapse = ';')
    return(result)
  }

#Test
#cr_date(as.Date('2008-01-09'),as.Date('2009-01-08'))
test$flag <- apply(test, 1, function(x) cr_date(x['from_date'], x['to_date']))

library(tidyr)
separate_rows(test, flag, sep=';') %>% 
  separate(flag, into = c('from_date_new','to_date_new'), '\\|') %>% 
  mutate_at(vars('from_date_new','to_date_new'), list(~as.Date(.)))


    ID Disease Pass Payment  from_date    to_date from_date_new to_date_new
  1 10       P   US     110 2008-01-09 2009-01-08    2008-01-09  2008-12-31
  2 10       P   US     110 2008-01-09 2009-01-08    2009-01-01  2009-01-08
  3 10       P   US     110 2009-01-09 2010-01-08    2009-01-09  2009-12-31
  4 10       P   US     110 2009-01-09 2010-01-08    2010-01-01  2010-01-08
  5 10       P   US     115 2010-01-09 2011-01-08    2010-01-09  2010-12-31
  6 10       P   US     115 2010-01-09 2011-01-08    2011-01-01  2011-01-08
  7 12       D   EN     240 2008-01-01 2008-12-31    2008-01-01  2008-12-31
  8 12       P   EN     255 2013-12-31 2014-12-30    2013-12-31  2013-12-31
  9 12       P   EN     255 2013-12-31 2014-12-30    2014-01-01  2014-12-30

扫码关注云+社区

领取腾讯云代金券