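I have edited my question; please read Edit 2 in the second part.
I need to join two datasets by closest timestamp. The first dataset is a diary dataset from a mobile app: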
df1 <- data.frame(stringsAsFactors=FALSE,
datetime = c("2019-03-19T13:26:52Z", "2019-03-19T13:26:19Z",
"2019-03-19T13:23:46Z", "2019-03-19T13:22:20Z",
"2019-03-19T13:09:56Z", "2019-03-19T13:06:04Z", "2019-03-19T13:05:21Z",
"2019-03-19T13:04:37Z", "2019-03-19T12:47:28Z",
"2019-03-19T12:46:42Z"),
transport = c("Trainride", "Trainride", "Trainride", "Trainride",
"Trainride", "Trainride", "Trainride", "Trainride",
"Trainride", "Trainride"),
id = c("5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3",
"5-3"),
disc = c("start", "stop", "start", "stop", "start", "stop", "start",
"stop", "start", "stop")
)
# convert datetime from character to a dttm (POSIXct) object
library(dplyr)
df1 <- df1 %>%
mutate(datetime = lubridate::as_datetime(datetime))
Here it is:
datetime transport id disc
1 2019-03-19 13:26:52 Trainride 5-3 start
2 2019-03-19 13:26:19 Trainride 5-3 stop
3 2019-03-19 13:23:46 Trainride 5-3 start
4 2019-03-19 13:22:20 Trainride 5-3 stop
5 2019-03-19 13:09:56 Trainride 5-3 start
6 2019-03-19 13:06:04 Trainride 5-3 stop
7 2019-03-19 13:05:21 Trainride 5-3 start
8 2019-03-19 13:04:37 Trainride 5-3 stop
9 2019-03-19 12:47:28 Trainride 5-3 start
10 2019-03-19 12:46:42 Trainride 5-3 stop
The second dataset comes from an accelerometer log and shows whether the device was moving (= INVH) or idle (= NIVH):
df2 <- data.frame(stringsAsFactors=FALSE,
datetime = c("2019-03-19T23:20:00Z", "2019-03-19T23:17:30Z",
"2019-03-19T13:08:00Z", "2019-03-19T13:07:00Z",
"2019-03-19T12:38:45Z", "2019-03-19T12:32:45Z",
"2019-03-19T11:13:15Z", "2019-03-19T11:11:45Z", "2019-03-19T10:17:45Z",
"2019-03-19T10:16:45Z"),
id = c("5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3",
"5-3"),
code = c("NIVH", "INVH", "NIVH", "INVH", "NIVH", "INVH", "NIVH",
"INVH", "NIVH", "INVH")
)
# convert datetime from character to a dttm (POSIXct) object
df2 <- df2 %>%
mutate(datetime = lubridate::as_datetime(datetime))
Here it is:
datetime id code
1 2019-03-19 23:20:00 5-3 NIVH
2 2019-03-19 23:17:30 5-3 INVH
3 2019-03-19 13:08:00 5-3 NIVH
4 2019-03-19 13:07:00 5-3 INVH
5 2019-03-19 12:38:45 5-3 NIVH
6 2019-03-19 12:32:45 5-3 INVH
7 2019-03-19 11:13:15 5-3 NIVH
8 2019-03-19 11:11:45 5-3 INVH
9 2019-03-19 10:17:45 5-3 NIVH
10 2019-03-19 10:16:45 5-3 INVH
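I need to join the two data frames based on the time difference between their timestamp fields. For example, a left join on df1 would show how well the app diary data agrees with the actual accelerometer log. A simple left join does not work here because in most cases there is a time lag between the two sources. So my question is how to join the records of these two datasets where the difference between timestamps is the minimum absolute difference.
Edit 2: The solution suggested by @soren is useful, but it does not work if I use an interval defined by a start datetime and an end datetime. Any ideas?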
df1 <- df1 %>%
mutate(datetime = lubridate::as_datetime(datetime)) %>%
arrange(datetime) %>%
mutate(datetime_end = lead(datetime),
# Create an interval object.
Travel_Interval = lubridate::interval(start = datetime, end = datetime_end))
Answered 2019-03-21 19:33:52
A rolling join will do this: it joins the data frames by nearest datetime. The following solution uses data.table:
library(data.table)
dt1 <- as.data.table(df1)
setkeyv(dt1,"datetime")
dt2 <- as.data.table(df2)
setkeyv(dt2,"datetime")
dt2[,nearest_date:=datetime]
dt2[dt1,roll="nearest"]
> dt2[dt1,roll="nearest"]
datetime id code nearest_date transport i.id disc
1: 2019-03-19 12:46:42 5-3 NIVH 2019-03-19 12:38:45 Trainride 5-3 stop
2: 2019-03-19 12:47:28 5-3 NIVH 2019-03-19 12:38:45 Trainride 5-3 start
3: 2019-03-19 13:04:37 5-3 INVH 2019-03-19 13:07:00 Trainride 5-3 stop
4: 2019-03-19 13:05:21 5-3 INVH 2019-03-19 13:07:00 Trainride 5-3 start
5: 2019-03-19 13:06:04 5-3 INVH 2019-03-19 13:07:00 Trainride 5-3 stop
6: 2019-03-19 13:09:56 5-3 NIVH 2019-03-19 13:08:00 Trainride 5-3 start
7: 2019-03-19 13:22:20 5-3 NIVH 2019-03-19 13:08:00 Trainride 5-3 stop
8: 2019-03-19 13:23:46 5-3 NIVH 2019-03-19 13:08:00 Trainride 5-3 start
9: 2019-03-19 13:26:19 5-3 NIVH 2019-03-19 13:08:00 Trainride 5-3 stop
10: 2019-03-19 13:26:52 5-3 NIVH 2019-03-19 13:08:00 Trainride 5-3 start
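Note that the additional column "nearest_date" is created with dt2[,nearest_date:=datetime] so that, when dt2 and dt1 are joined, the datetime value from dt2 is retained (and is available for computing the difference). By default, merging/joining datasets returns only one copy of the key column. Usually that is desirable, because the join columns normally hold identical values. With a rolling join they do not, so it is useful to retain the column or to create a redundant copy manually, as in this example.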
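As a minimal sketch of the difference calculation mentioned above, the retained nearest_date column can be compared with the diary timestamp after the join. The three-row tables below are cut down from the question's data for brevity, and the column name diff_secs is a hypothetical name chosen only for this illustration.

```r
library(data.table)

# Small illustrative tables (a subset of the values in the question).
dt1 <- data.table(
  datetime = as.POSIXct(c("2019-03-19 12:46:42", "2019-03-19 13:04:37",
                          "2019-03-19 13:09:56"), tz = "UTC"),
  disc = c("stop", "stop", "start")
)
dt2 <- data.table(
  datetime = as.POSIXct(c("2019-03-19 12:38:45", "2019-03-19 13:07:00",
                          "2019-03-19 13:08:00"), tz = "UTC"),
  code = c("NIVH", "INVH", "NIVH")
)
setkeyv(dt1, "datetime")
setkeyv(dt2, "datetime")

# Keep a copy of dt2's timestamp; after the join, `datetime` holds dt1's values.
dt2[, nearest_date := datetime]
joined <- dt2[dt1, roll = "nearest"]

# Absolute lag in seconds between each diary entry and its matched log entry.
# `diff_secs` is a hypothetical column name used only in this sketch.
joined[, diff_secs := abs(as.numeric(difftime(datetime, nearest_date, units = "secs")))]
print(joined)
```

A column like this could then be used to discard matches whose lag is too large to be meaningful, if the application calls for that.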
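Note also that matching on the id value as well as on the nearest datetime value may be relevant to the application (the OP gives only one id value). This is done by setting both columns as keys, which the join syntax then uses: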
setkeyv(dt2,c("id","datetime"))
setkeyv(dt1,c("id","datetime"))
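To illustrate what the two-column key does, here is a small sketch. With several key columns, data.table matches the leading columns exactly and applies the roll only to the last one, so each diary row is matched to the nearest log entry within its own id. The second id, "7-1", is hypothetical and does not appear in the question's data.

```r
library(data.table)

# Two diary rows with the same timestamp but different ids.
# "7-1" is a made-up id used only to show the per-id matching.
dt1 <- data.table(
  id = c("5-3", "7-1"),
  datetime = as.POSIXct(c("2019-03-19 13:04:37", "2019-03-19 13:04:37"), tz = "UTC"),
  disc = c("stop", "stop")
)
dt2 <- data.table(
  id = c("5-3", "7-1"),
  datetime = as.POSIXct(c("2019-03-19 13:07:00", "2019-03-19 12:38:45"), tz = "UTC"),
  code = c("INVH", "NIVH")
)
dt2[, nearest_date := datetime]

# id is matched exactly; datetime (the last key column) is rolled to nearest.
setkeyv(dt2, c("id", "datetime"))
setkeyv(dt1, c("id", "datetime"))
res <- dt2[dt1, roll = "nearest"]
print(res)
```

In this sketch the "7-1" row is matched to the 12:38:45 log entry even though the 13:07:00 entry is closer in time, because that closer entry belongs to a different id.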
https://stackoverflow.com/questions/55287807