我有一个类似于这样的事件的数据:
EVENT DATE LONG LAT TYPE
1 1/1/2000 23 45 A
2 2/1/2000 23 45 B
3 3/1/2000 23 45 B
3 5/2/2000 22 56 A
4 6/2/2000 19 21 A我希望将此折叠起来,以便将在同一位置连续发生的任何事件(如LONG,LAT所定义的)折叠为一个具有起始日期和结束日期的单个事件,以及所涉及类型的级联列。
因此,上表将成为:
EVENT START-DATE END-DATE LONG LAT TYPE
1 1/1/2000 3/1/2000 23 45 ABB
2 5/2/2000 5/2/2000 22 56 A
3 6/2/2000 6/2/2000 19 21 A任何关于如何最好地解决这一问题的建议都将不胜感激。
发布于 2017-09-01 05:36:22
以下是Ronak解决方案的修改版本,将同一地点的非连续事件作为单独的事件周期。
# expanded data sample
df <- data.frame(
DATE = as.Date(c("2000-01-01", "2000-01-02", "2000-01-03", "2000-01-05",
"2000-02-05", "2000-02-06", "2000-02-07"), format = "%Y-%m-%d"),
LONG = c(23, 23, 23, 23, 22, 19, 22),
LAT = c(45, 45, 45, 45, 56, 21, 56),
TYPE = c("A", "B", "B", "A", "A", "B", "A")
)
library(dplyr)
df %>%
group_by(LONG, LAT) %>%
arrange(DATE) %>%
mutate(DATE.diff = c(1, diff(DATE))) %>%
mutate(PERIOD = cumsum(DATE.diff != 1)) %>%
ungroup() %>%
group_by(LONG, LAT, PERIOD) %>%
summarise(START_DATE = min(DATE),
END_DATe = max(DATE),
TYPE = paste(TYPE, collapse = "")) %>%
ungroup()
# A tibble: 5 x 6
LONG LAT PERIOD START_DATE END_DATe TYPE
<dbl> <dbl> <int> <date> <date> <chr>
1 19 21 0 2000-02-06 2000-02-06 B
2 22 56 0 2000-02-05 2000-02-05 A
3 22 56 1 2000-02-07 2000-02-07 A
4 23 45 0 2000-01-01 2000-01-03 ABB
5 23 45 1 2000-01-05 2000-01-05 A编辑以添加对“句点”变量所发生的情况的解释。
为了简单起见,让我们考虑在同一位置连续和非连续的一些事件,这样我们就可以跳过group_by(LONG, LAT)和arrange(DATE)步骤:
# sample dataset of 10 events at the same location.
# first 3 are on consecutive days, next 2 are on consecutive days,
# next 4 are on consecutive days, & last 1 is on its own.
df2 <- data.frame(
DATE = as.Date(c("2001-01-01", "2001-01-02", "2001-01-03",
"2001-01-05", "2001-01-06",
"2001-02-01", "2001-02-02", "2001-02-03", "2001-02-04",
"2001-04-01"), format = "%Y-%m-%d"),
LONG = rep(23, 10),
LAT = rep(45, 10),
TYPE = LETTERS[1:10]
)作为中间步骤,我们创建了一些助手变量:
DATE.diff != 1更改为DATE.diff > 1。作为辅助变量的结果,“期间”对于每组连续日期具有不同的值。
df2.intermediate <- df2 %>%
mutate(DATE.diff = c(1, diff(DATE))) %>%
mutate(non.consecutive = DATE.diff != 1) %>%
mutate(PERIOD = cumsum(non.consecutive))
> df2.intermediate
DATE LONG LAT TYPE DATE.diff non.consecutive PERIOD
1 2001-01-01 23 45 A 1 FALSE 0
2 2001-01-02 23 45 B 1 FALSE 0
3 2001-01-03 23 45 C 1 FALSE 0
4 2001-01-05 23 45 D 2 TRUE 1
5 2001-01-06 23 45 E 1 FALSE 1
6 2001-02-01 23 45 F 26 TRUE 2
7 2001-02-02 23 45 G 1 FALSE 2
8 2001-02-03 23 45 H 1 FALSE 2
9 2001-02-04 23 45 I 1 FALSE 2
10 2001-04-01 23 45 J 56 TRUE 3然后,我们可以将“期”视为分组变量,以便找到每个期间内的开始/结束日期&事件:
df2.intermediate %>%
group_by(PERIOD) %>%
summarise(START_DATE = min(DATE),
END_DATe = max(DATE),
TYPE = paste(TYPE, collapse = "")) %>%
ungroup()
# A tibble: 4 x 4
PERIOD START_DATE END_DATe TYPE
<int> <date> <date> <chr>
1 0 2001-01-01 2001-01-03 ABC
2 1 2001-01-05 2001-01-06 DE
3 2 2001-02-01 2001-02-04 FGHI
4 3 2001-04-01 2001-04-01 Jhttps://stackoverflow.com/questions/45993458
复制相似问题