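I'm struggling to use an R script to generate the sequence of dates between two dates that are stored in the same column.
I have a request ID, a sequence ID, a date and a status. [Input table]
What I need is to produce a table like this. [desired output table]
Any help with this would be greatly appreciated.
Thanks
Posted on 2018-05-25 22:14:52
You can do this with the tidyverse. First, convert the date column to the Date class with dmy() from the lubridate package. Then use the tidyr functions complete() and fill() to expand the data table, as shown below. complete() adds a row for every missing day between the first and last date (seq.Date(..., by = "day")), and fill() carries the last known ReqID, ID_Seq and Status values down into those new rows. group_by(ReqID) makes this happen separately for each request.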
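library(tidyverse)
library(lubridate)
# Example data: one request (ReqID 100) with three status changes
# (tibble() replaces the deprecated data_frame())
df <- tibble(ReqID = 100, ID_Seq = 1:3,
             Created = dmy("01/01/2018", "10/01/2018", "18/01/2018"),
             Status = c("Scheduled", "In Execution", "Completed"))
df %>%
  group_by(ReqID) %>%  # expand each request separately
  # add one row for every day from the first to the last Created date
  complete(Created = seq.Date(min(Created), max(Created), by = "day")) %>%
  # carry the last known values down into the newly added rows
  fill(ReqID, ID_Seq, Status)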
# A tibble: 18 x 4
#    Created    ReqID ID_Seq Status
#    <date>     <dbl>  <int> <chr>
#  1 2018-01-01   100      1 Scheduled
#  2 2018-01-02   100      1 Scheduled
#  3 2018-01-03   100      1 Scheduled
#  4 2018-01-04   100      1 Scheduled
#  5 2018-01-05   100      1 Scheduled
#  6 2018-01-06   100      1 Scheduled
#  7 2018-01-07   100      1 Scheduled
#  8 2018-01-08   100      1 Scheduled
#  9 2018-01-09   100      1 Scheduled
# 10 2018-01-10   100      2 In Execution
# 11 2018-01-11   100      2 In Execution
# 12 2018-01-12   100      2 In Execution
# 13 2018-01-13   100      2 In Execution
# 14 2018-01-14   100      2 In Execution
# 15 2018-01-15   100      2 In Execution
# 16 2018-01-16   100      2 In Execution
# 17 2018-01-17   100      2 In Execution
# 18 2018-01-18   100      3 Completed
Posted on 2018-05-31 23:36:44
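Thank you, Jasbner! I installed the dplyr and tidyr packages as suggested, and I'm using mutate() to fix the date format.
My CSV file (file.csv) holds these rows of data:
ReqID Seq Created Status
100 1/01/2018 Scheduled
100 2 10/01/2018 In Execution
100 3/01/2018 On Hold
100 4 18/01/2018 Completed
101 1 10/01/2018 Scheduled
101 2 18/01/2018 In Execution
101 3 20/01/2018 Completed
102 1 18/2018 Scheduled
102 2 22/01/2018 In Execution
102 3 25/01/2018 Cancelled
103 1/02/2018 Scheduled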
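Why the date column has to be converted first: read.csv() reads a dd/mm/yyyy column as text (a factor in R versions before 4.0), and comparing text is alphabetical, not chronological, so min() and max() can pick the wrong day; seq.Date() also expects Date objects. A minimal sketch with two illustrative dates in the same layout (they are not taken from file.csv):

```r
library(lubridate)

# Two illustrative dates in dd/mm/yyyy form (not from the poster's file)
created_chr <- c("31/01/2018", "01/02/2018")

# Compared as text, "31/..." sorts after "01/...", so max() returns the earlier day
latest_as_text <- max(created_chr)

# After parsing with dmy(), the comparison is chronological
created <- dmy(created_chr)
latest_as_date <- max(created)

print(latest_as_text)  # "31/01/2018"
print(latest_as_date)  # "2018-02-01"

# Once the column is a Date, seq.Date() can build the daily sequence
daily <- seq.Date(min(created), max(created), by = "day")
print(daily)
```

This is the same conversion that mutate(Created = dmy(Created)) performs on the whole column before group_by() and complete().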
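# My final R script
library(dplyr); library(tidyr); library(lubridate)
mydata <- read.csv('file.csv')   # read the data from the CSV file
myindf <- as.data.frame(mydata)  # convert to a data frame (read.csv() already returns one)
myoutdf <- myindf %>%
  mutate(Created = dmy(Created)) %>%  # parse the dd/mm/yyyy text into Dates
  group_by(ReqID) %>%
  complete(Created = seq.Date(min(Created), max(Created), by = "day")) %>%
  fill(ReqID, Seq, Status)
print(myoutdf, n = 38)           # print all 38 rows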
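The 38 in print(myoutdf, n = 38) is the total, over all ReqID groups, of the number of days from each group's first to last Created date, counted inclusively, because complete() adds exactly one row per calendar day. A minimal sketch of computing that count before expanding, using only the first and last dates of ReqID 100 and 101 (01/01 to 18/01 and 10/01 to 20/01, as in the answer and the listing above); the object names first_last and day_counts are hypothetical:

```r
library(dplyr)
library(lubridate)

# First and last Created dates for ReqID 100 and 101 (dd/mm/yyyy)
first_last <- tibble(
  ReqID   = c(100, 100, 101, 101),
  Created = dmy(c("01/01/2018", "18/01/2018", "10/01/2018", "20/01/2018"))
)

# complete() adds one row per day, so each group ends up with
# (last date - first date) + 1 rows
day_counts <- first_last %>%
  group_by(ReqID) %>%
  summarise(n_days = as.integer(max(Created) - min(Created)) + 1)

print(day_counts)       # ReqID 100 -> 18 rows, ReqID 101 -> 11 rows
total_rows <- sum(day_counts$n_days)
print(total_rows)       # 29 for these two requests
```

Subtracting two Dates gives a difftime in days, which as.integer() turns into a plain day count; the +1 includes both endpoints.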
https://stackoverflow.com/questions/50530914