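I have a dataset with a datetime (start) column and a datetime_end column. After some data manipulation, I want to break each row's interval down into one row per minute. Suppose I have this interval: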
datetime datetime_end id disc
2019-03-19 12:47:28 2019-03-19 12:50:37 5-3 start
I want to split it into minutes, like this:
datetime id disc
2019-03-19 12:48:00 5-3 start
2019-03-19 12:49:00 5-3 start
2019-03-19 12:50:00 5-3 start
2019-03-19 12:51:00 5-3 start
Here is some dummy data:
df1 <- data.frame(stringsAsFactors = FALSE,
  datetime = c("2019-03-19T13:26:52Z", "2019-03-19T13:26:19Z",
               "2019-03-19T13:23:46Z", "2019-03-19T13:22:20Z",
               "2019-03-19T13:09:56Z", "2019-03-19T13:06:04Z", "2019-03-19T13:05:21Z",
               "2019-03-19T13:04:37Z", "2019-03-19T12:47:28Z",
               "2019-03-19T12:46:42Z"),
  id = c("5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3",
         "5-3"),
  disc = c("car", "stop", "start", "stop", "start", "stop", "start",
           "stop", "start", "stop")
)
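I tried using lubridate::interval to create an interval object (the travel interval), but I am struggling to break it down into one row per minute (as shown above). If anyone knows a solution, I would be very grateful.
Here is my script: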
library(tidyverse)
library(lubridate)
df <- df1 %>%
  mutate(datetime = lubridate::as_datetime(datetime)) %>%
  arrange(datetime) %>%
  mutate(datetime_end = lead(datetime),
         # Create an interval object.
         Travel_Interval =
           lubridate::interval(start = datetime, end = datetime_end)) %>%
  filter(!is.na(Travel_Interval)) %>%
  # select(-Travel_Interval)
  select(datetime, datetime_end, id, disc, Travel_Interval) %>%
  filter(disc == "start")
Answered 2019-06-15 11:04:44
I would use purrr::map2() to do this:
# Convert datetime to a date-time, sort by it, and add datetime_end as the
# lead of datetime. Drop the last record, which has no datetime_end. Then use
# purrr::map2() to iterate over each (datetime, datetime_end) pair and build a
# one-minute sequence from the "minute ceiling" of datetime to the "minute
# ceiling" of datetime_end. The result is a list column, so unnest it.
df <- df1 %>%
  mutate(datetime = as_datetime(datetime)) %>%
  arrange(datetime) %>%
  mutate(datetime_end = lead(datetime, n = 1L)) %>%
  filter(!is.na(datetime_end)) %>%
  mutate(minute = purrr::map2(datetime, datetime_end, function(start, stop) {
    seq.POSIXt(from = ceiling_date(start, "minute"),
               to = ceiling_date(stop, "minute"), by = "min")
  })) %>%
  # Name the list column explicitly; newer tidyr versions warn on a bare unnest().
  unnest(minute)
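Note, however, that because you are effectively truncating the timestamps to minute resolution with some form of rounding (here, a ceiling), you will have to decide how to handle the boundary cases. For example, the first disc == "stop" run ends with minute == 2019-03-19 12:48:00, but the disc == "start" run that follows it also begins with minute == 2019-03-19 12:48:00: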
datetime id disc datetime_end minute
1 2019-03-19 12:46:42 5-3 stop 2019-03-19 12:47:28 2019-03-19 12:47:00
2 2019-03-19 12:46:42 5-3 stop 2019-03-19 12:47:28 2019-03-19 12:48:00
3 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:48:00
4 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:49:00
5 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:50:00
6 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:51:00
7 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:52:00
8 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:53:00
9 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:54:00
10 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:55:00
11 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:56:00
12 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:57:00
13 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:58:00
14 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 12:59:00
15 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:00:00
16 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:01:00
17 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:02:00
18 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:03:00
19 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:04:00
20 2019-03-19 12:47:28 5-3 start 2019-03-19 13:04:37 2019-03-19 13:05:00
21 2019-03-19 13:04:37 5-3 stop 2019-03-19 13:05:21 2019-03-19 13:05:00
22 2019-03-19 13:04:37 5-3 stop 2019-03-19 13:05:21 2019-03-19 13:06:00
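One possible way to resolve the duplicated boundary minute is sketched below. It rests on an assumption that is not part of the answer above: a shared minute is assigned to the earlier run, and the later run loses it. The two-row `runs` table is a hypothetical cut-down input that reuses the first "stop" timestamps from the data and the example interval from the question.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(lubridate)

# Hypothetical two-run input: the first "stop" run from the data, followed by
# the example "start" interval from the question.
runs <- tibble(
  datetime     = as_datetime(c("2019-03-19T12:46:42Z", "2019-03-19T12:47:28Z")),
  datetime_end = as_datetime(c("2019-03-19T12:47:28Z", "2019-03-19T12:50:37Z")),
  disc         = c("stop", "start")
)

# Same expansion as in the answer: one row per minute, ceiling at both ends.
per_minute <- runs %>%
  mutate(minute = map2(datetime, datetime_end, function(from, to) {
    seq(ceiling_date(from, "minute"), ceiling_date(to, "minute"), by = "min")
  })) %>%
  unnest(minute)

# Assumption: when two runs share a minute, keep the row from the earlier run.
deduped <- per_minute %>%
  arrange(datetime, minute) %>%
  distinct(minute, .keep_all = TRUE)
```

If the later run should own the boundary minute instead, sorting with `arrange(desc(datetime), minute)` before `distinct()` would flip the choice.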
Answered 2019-06-15 11:33:30
df1 %>%
  mutate(datetime = lubridate::as_datetime(datetime)) %>%
  arrange(datetime) %>%
  mutate(datetime_end = lead(datetime)) %>%
  filter(!is.na(datetime_end)) %>%
  mutate_at(vars(contains("datetime")), ~ round_date(.x + seconds(30), unit = "minute")) %>%
  mutate(diff = time_length(interval(datetime, datetime_end), unit = "minutes")) %>%
  mutate(time = map2(datetime, diff, ~ .x + minutes(seq(0, .y)))) %>%
  unnest(time)
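I just wanted to post this because I was already writing it, even though there is already a good answer. It uses the lubridate functions time_length and interval to get the sequence.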
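To make the rounding and sequence steps easier to follow, here is a minimal sketch that applies them to the single example interval from the question (12:47:28 to 12:50:37). The variable names are illustrative and not part of the answer.

```r
library(lubridate)

# Example interval from the question.
start <- as_datetime("2019-03-19 12:47:28")
end   <- as_datetime("2019-03-19 12:50:37")

# Shift by 30 seconds, then round to the nearest minute.
start_min <- round_date(start + seconds(30), unit = "minute")
end_min   <- round_date(end + seconds(30), unit = "minute")

# Whole minutes between the two rounded endpoints.
n_min <- time_length(interval(start_min, end_min), unit = "minutes")

# One timestamp per minute, both endpoints included.
per_minute <- start_min + minutes(seq(0, n_min))
```

For this interval the sequence runs from 12:48:00 to 12:51:00, which matches the output requested in the question.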
https://stackoverflow.com/questions/56609502