Question: Time intervals: grouping by hour of day when a sample runs past the hour mark

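Stack Overflow user

Asked 2020-03-03 21:03:14

Answers 4 | Views 420 | Followers 0 | Votes 1

I have video data on the durations of two behaviors in birds: when the bird is on the nest and when the bird is off the nest. For my analysis, I need the on-nest and off-nest time for each hour. However, the behaviors often overlap the hour mark. For example, if the bird is on the nest from 4:10-4:42 and 4:50-5:20, I need to split the second period into 4:50-5:00 and 5:00-5:20 so that I can add up the time per hour. I have looked through the lubridate package for quite a while without finding a way to do this, but I think there must be something. Any suggestions?

Sample data is below. "off.time.diff" is the duration in seconds of the bout identified by "off.bout.ID", and "on.time.diff" is the same for "on.bout.ID". To take one example, the bird is on the nest from 17:25:39 to 18:03:29. I can get the total time (2270 seconds), but I don't know how to split it per hour.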

Event   DT.event        off.bout.ID  on.bout.ID  off.time.diff  on.time.diff
off     4/27/12 17:25:13    1          0           NA               NA
on      4/27/12 17:25:39    1          1           26               NA
off     4/27/12 18:03:29    2          1           NA              2270
on      4/27/12 18:03:57    2          2           28               NA
off     4/27/12 19:41:16    3          2           NA              5839
on      4/27/12 19:43:50    3          3           154              NA
off     4/28/12 6:23:57     4          3           NA              38407
on      4/28/12 6:32:13     4          4           496              NA
off     4/28/12 6:40:20     5          4           NA              487
on      4/28/12 6:40:48     5          5           28               NA
off     4/28/12 8:16:07     6          5           NA              5719

Tags: datetime, time, intervals, lubridate

4 Answers

Stack Overflow user

Accepted answer

Answered 2020-03-03 23:43:56

A prettier solution than dario's can be achieved with the tidyverse:

Read the data

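library(dplyr)
library(tidyr)
library(lubridate)

# Collapse runs of two or more spaces into ";" so that the single space
# inside the date-time field survives, then parse as ";"-separated text.
# The data rows start at the left margin so no leading blanks are read.
a =  
        read.csv(header = F, sep = ";",
                 col.names = c("Event","DT.event","off.bout.ID","on.bout.ID","off.time.diff","on.time.diff"),
                 text = gsub(pattern = " {2,}",replacement = ";", 
                             x="off     4/27/12 17:25:13    1          0           NA               NA
on      4/27/12 17:25:39    1          1           26               NA
off     4/27/12 18:03:29    2          1           NA               2270
on      4/27/12 18:03:57    2          2           28               NA
off     4/27/12 19:41:16    3          2           NA               5839
on      4/27/12 19:43:50    3          3           154              NA
off     4/28/12 6:23:57     4          3           NA               38407
on      4/28/12 6:32:13     4          4           496              NA
off     4/28/12 6:40:20     5          4           NA               487
on      4/28/12 6:40:48     5          5           28               NA
off     4/28/12 8:16:07     6          5           NA               5719"
                 )
        ) 

# Parse the month/day/year timestamps into date-times (UTC by default)
a$DT.event <- mdy_hms(a$DT.event)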

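Add new rows for the full hours that may be of interest.

# One row per clock hour in which an event occurs, joined back onto the
# original events and sorted by time
b <- a %>% select(DT.event) %>%
        mutate(DT.event = floor_date(DT.event,"hours")) %>%
        distinct() %>%
        full_join(a, by = "DT.event") %>%
        arrange(DT.event)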

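Compute the differences

# Label each hour row with the next event, then take the seconds since the
# previous row for every row that ends an "on" period.
# `units` must be named: the third positional argument of difftime() is `tz`.
c <- b %>% fill(Event, .direction = "up") %>%
        mutate(on.time.diff.hour = ifelse(Event == "off",
                                          as.numeric(difftime(DT.event, lag(DT.event),
                                                              units = "secs")), NA))

Just take care with the extra value in the second row (1513): there is no "on" event before it, so it does not represent time on the nest.

Result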

# A tibble: 16 x 7
   DT.event            Event off.bout.ID on.bout.ID off.time.diff on.time.diff on.time.diff.hour
   <dttm>              <fct>       <int>      <int>         <int>        <int>             <dbl>
 1 2012-04-27 17:00:00 off            NA         NA            NA           NA                NA
 2 2012-04-27 17:25:13 off             1          0            NA           NA              1513
 3 2012-04-27 17:25:39 on              1          1            26           NA                NA
 4 2012-04-27 18:00:00 off            NA         NA            NA           NA              2061
 5 2012-04-27 18:03:29 off             2          1            NA         2270               209
 6 2012-04-27 18:03:57 on              2          2            28           NA                NA
 7 2012-04-27 19:00:00 off            NA         NA            NA           NA              3363
 8 2012-04-27 19:41:16 off             3          2            NA         5839              2476
 9 2012-04-27 19:43:50 on              3          3           154           NA                NA
10 2012-04-28 06:00:00 off            NA         NA            NA           NA             36970
11 2012-04-28 06:23:57 off             4          3            NA        38407              1437
12 2012-04-28 06:32:13 on              4          4           496           NA                NA
13 2012-04-28 06:40:20 off             5          4            NA          487               487
14 2012-04-28 06:40:48 on              5          5            28           NA                NA
15 2012-04-28 08:00:00 off            NA         NA            NA           NA              4752
16 2012-04-28 08:16:07 off             6          5            NA         5719               967
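
Note that the hour rows above exist only for hours in which an event occurs, so the long bout ending at 06:23:57 shows up as a single 36970-second piece at 06:00 rather than one piece per hour. As a rough sketch (not part of the answer), the base R helper below cuts one bout at every full hour. `split_by_hour` is a hypothetical name, and the code assumes timestamps in UTC or another zone with a whole-hour offset.

```r
# Hypothetical helper (an assumption, not from the answer): split one
# bout [start, end) at every full hour and return the seconds per hour.
split_by_hour <- function(start, end) {
  s <- as.numeric(start)  # seconds since the epoch
  e <- as.numeric(end)
  first_mark <- (floor(s / 3600) + 1) * 3600  # first full hour after start
  marks <- if (first_mark < e) seq(first_mark, e, by = 3600) else numeric(0)
  marks <- marks[marks < e]  # drop a mark that coincides with the end
  cuts <- c(s, marks, e)
  data.frame(
    hour = as.POSIXct(floor(head(cuts, -1) / 3600) * 3600,
                      origin = "1970-01-01", tz = "UTC"),
    secs = diff(cuts)
  )
}

# Two "on" bouts taken from the sample data
on  <- as.POSIXct(c("2012-04-27 17:25:39", "2012-04-27 18:03:57"), tz = "UTC")
off <- as.POSIXct(c("2012-04-27 18:03:29", "2012-04-27 19:41:16"), tz = "UTC")

# One data frame with a row per (bout, clock hour) piece
pieces <- do.call(rbind, Map(split_by_hour, on, off))

# Total seconds on the nest per clock hour
per_hour <- tapply(pieces$secs, format(pieces$hour, "%Y-%m-%d %H:00"), sum)
print(per_hour)
```

For these two bouts the totals come to 2061, 3572 and 2476 seconds for the 17:00, 18:00 and 19:00 hours, which add up to the original 2270 + 5839 seconds.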

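Votes 1

Stack Overflow user

Answered 2020-03-03 22:49:43

The idea behind my suggestion is to check, for every event, how many full-hour marks are passed, insert one additional row for each of those hours, and adjust the times accordingly.

Load the sample data: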

df <- read.table(text='Event   DT.event        off.bout.ID  on.bout.ID  off.time.diff  on.time.diff
off     4/27/12-17:25:13    1          0           NA               NA
on      4/27/12-17:25:39    1          1           26               NA
off     4/27/12-18:03:29    2          1           NA              2270
on      4/27/12-18:03:57    2          2           28               NA
off     4/27/12-19:41:16    3          2           NA              5839
on      4/27/12-19:43:50    3          3           154              NA
off     4/28/12-6:23:57     4          3           NA              38407
on      4/28/12-6:32:13     4          4           496              NA
off     4/28/12-6:40:20     5          4           NA              487
on      4/28/12-6:40:48     5          5           28               NA
off     4/28/12-8:16:07     6          5           NA              5719', header=T, stringsAsFactors=F)

Set up the date-time variable. Adjust the tz parameter if necessary:

df$DT.event <- as.POSIXct(df$DT.event, format = "%m/%d/%y-%H:%M:%S")


library(dplyr)
library(tidyr)

# reshape data
# 
df2 <- df %>%
  select(Event, DT.event, on.bout.ID) %>% 
  pivot_wider(names_from = Event,
              values_from = DT.event) %>% 
  select(on.bout.ID, on, off)

df2 holds some of the information in df in a wider form:

  on.bout.ID on                  off
1          0 NA                  2012-04-27 17:25:13
2          1 2012-04-27 17:25:39 2012-04-27 18:03:29
3          2 2012-04-27 18:03:57 2012-04-27 19:41:16
4          3 2012-04-27 19:43:50 2012-04-28 06:23:57
5          4 2012-04-28 06:32:13 2012-04-28 06:40:20
6          5 2012-04-28 06:40:48 2012-04-28 08:16:07

# Make a copy so we don't mutate the object we are using to iterate
#
df3 <- df2

for (i in seq_along(df2$on.bout.ID)) {

  # extract current iterations start and end time
  # 
  id <- df2$on.bout.ID[i]
  from <- df2$on[i]
  to <- df2$off[i]

  # calculate number of rows to insert
  # 
  hoursDiff <- as.numeric(format(to, "%H")) - as.numeric(format(from , "%H"))

  # compensate for crossing of midnight (00:00AM)
  # by adding 24
  #
  hoursDiff <- as.integer(difftime(as.Date(to), as.Date(from), unit="days")) * 24 + hoursDiff

  # if at least one full-hour mark is passed, insert a copy of the
  # current row but adapt on and off times
  # 
  if (!is.na(hoursDiff) && hoursDiff > 0) {
    for (hour in 1:hoursDiff) {

      # start time of this additional row
      # 
      fromTime <- as.POSIXct(paste0(format(from  + 3600 * hour, "%m/%d/%y-%H"), ":00:00"), format="%m/%d/%y-%H:%M:%S")

      # Maximal endtime of this additional row
      # 
      toTime <- fromTime + 3599

      # copy current line
      # 
      insert <- df2[i, ]

      # set start time for this new row to full hour
      #
      insert$on <- fromTime

      # if this is the last row to insert do NOT adapt off time
      # 
      if (!(toTime > to)) {
        insert$off <- toTime
      } 

      # add additional row
      # 
      df3 <- rbind(df3, insert)
    }

  # set off-time for the current line to end of first hour
  # 
  df3[df3$on.bout.ID == id & df3$on == from & df3$off == to,]$off <-  as.POSIXct(paste0(format(from, "%m/%d/%y-%H"), ":59:59"), format="%m/%d/%y-%H:%M:%S")
  }
}

# Use `dplyr` to sort result
#
library(dplyr)    
df3 %>% arrange(on.bout.ID, on)

# A tibble: 21 x 3
   on.bout.ID on                  off
        <int> <dttm>              <dttm>
 1          0 NA                  2012-04-27 17:25:13
 2          1 2012-04-27 17:25:39 2012-04-27 17:59:59
 3          1 2012-04-27 18:00:00 2012-04-27 18:03:29
 4          2 2012-04-27 18:03:57 2012-04-27 18:59:59
 5          2 2012-04-27 19:00:00 2012-04-27 19:41:16
 6          3 2012-04-27 19:43:50 2012-04-27 19:59:59
 7          3 2012-04-27 20:00:00 2012-04-27 20:59:59
 8          3 2012-04-27 21:00:00 2012-04-27 21:59:59
 9          3 2012-04-27 22:00:00 2012-04-27 22:59:59
10          3 2012-04-27 23:00:00 2012-04-27 23:59:59
# … with 11 more rows

Is it pretty? No! Does it work? I think so.

Edit:

Added

 hoursDiff <- as.integer(difftime(as.Date(to), as.Date(from), unit="days")) * 24 + hoursDiff

to extend the functionality to bouts that cross midnight.
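
As a cross-check on the midnight adjustment, the number of full-hour marks a bout passes can also be computed from epoch seconds, where a date change needs no special case. This is only a sketch: `hours_crossed` is a made-up name, and it assumes a time zone with a whole-hour offset (UTC here).

```r
# Hypothetical helper (not from the answer): number of full-hour marks
# in the half-open interval (from, to], computed from epoch seconds.
hours_crossed <- function(from, to) {
  floor(as.numeric(to) / 3600) - floor(as.numeric(from) / 3600)
}

# Three "on" bouts from the sample data: one crossing a single hour mark,
# one crossing midnight, and one that stays within a single hour
from <- as.POSIXct(c("2012-04-27 17:25:39", "2012-04-27 19:43:50",
                     "2012-04-28 06:32:13"), tz = "UTC")
to   <- as.POSIXct(c("2012-04-27 18:03:29", "2012-04-28 06:23:57",
                     "2012-04-28 06:40:20"), tz = "UTC")

print(hours_crossed(from, to))
```

The three bouts pass 1, 11 and 0 hour marks, so they need 1, 11 and 0 extra rows, which matches the 12 rows for on.bout.ID 3 in the 21-row result.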

Votes 2

Stack Overflow user

Answered 2020-03-03 22:01:07

Here's an idea

library(dplyr)
library(lubridate)

# Your data: collapse runs of two or more spaces into ";" so that the
# single space inside the date-time field survives
a =  
  read.csv(header = F, sep = ";", stringsAsFactors = F,
           col.names = c("Event","DT.event","off.bout.ID","on.bout.ID","off.time.diff","on.time.diff"),
           text = gsub(pattern = " {2,}",replacement = ";", 
                       x="off     4/27/12 17:25:13    1          0           NA               NA
on      4/27/12 17:25:39    1          1           26               NA
off     4/27/12 18:03:29    2          1           NA               2270
on      4/27/12 18:03:57    2          2           28               NA
off     4/27/12 19:41:16    3          2           NA               5839
on      4/27/12 19:43:50    3          3           154              NA
off     4/28/12 6:23:57     4          3           NA               38407
on      4/28/12 6:32:13     4          4           496              NA
off     4/28/12 6:40:20     5          4           NA               487
on      4/28/12 6:40:48     5          5           28               NA
off     4/28/12 8:16:07     6          5           NA               5719"
           )
  ) %>% mutate(DT.event = as.POSIXct(DT.event, format = "%m/%d/%y %H:%M:%S", tz = "UTC")  # %y: two-digit year
              )
# Ordering by time, if it isn't ordered
a = a[order(a$DT.event),]

# Build a helper column holding the time of the next event
a = a %>% mutate(next_eve = lead(DT.event))

# Build column with the time difference in seconds, cut at the last full
# hour before the next event when the two events fall in different hours
a = a %>%
      mutate(dif_comp_hour_sec =  
               case_when(
                 floor_date(next_eve,unit = "hour") > floor_date(DT.event,unit = "hour") ~ as.numeric(difftime(floor_date(next_eve,unit = "hour"), DT.event, units = "secs")),
                                  T ~ as.numeric(difftime(next_eve, DT.event, units = "secs"))
                                 )
            )

If needed, you can use the "Event" column to split the result into on/off columns again.

Here is the output:

#    Event            DT.event off.bout.ID on.bout.ID off.time.diff on.time.diff            next_eve dif_comp_hour_sec
# 1    off 2012-04-27 17:25:13           1          0            NA           NA 2012-04-27 17:25:39                26
# 2     on 2012-04-27 17:25:39           1          1            26           NA 2012-04-27 18:03:29              2061
# 3    off 2012-04-27 18:03:29           2          1            NA         2270 2012-04-27 18:03:57                28
# 4     on 2012-04-27 18:03:57           2          2            28           NA 2012-04-27 19:41:16              3363
# 5    off 2012-04-27 19:41:16           3          2            NA         5839 2012-04-27 19:43:50               154
# 6     on 2012-04-27 19:43:50           3          3           154           NA 2012-04-28 06:23:57             36970
# 7    off 2012-04-28 06:23:57           4          3            NA        38407 2012-04-28 06:32:13               496
# 8     on 2012-04-28 06:32:13           4          4           496           NA 2012-04-28 06:40:20               487
# 9    off 2012-04-28 06:40:20           5          4            NA          487 2012-04-28 06:40:48                28
# 10    on 2012-04-28 06:40:48           5          5            28           NA 2012-04-28 08:16:07              4752
# 11   off 2012-04-28 08:16:07           6          5            NA         5719                <NA>                NA
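
For the on/off split mentioned above, one option is to cross-tabulate the seconds by clock hour and by Event. A minimal base R sketch follows, using three rows from the sample data whose durations stay within one hour. The data frame `res` and its `secs` column are illustrative names, not objects from the answer.

```r
# Illustrative input (names are assumptions): each row is an event, and
# `secs` is the time until the next event, all within the 06:00 hour.
res <- data.frame(
  Event    = c("off", "on", "off"),
  DT.event = as.POSIXct(c("2012-04-28 06:23:57", "2012-04-28 06:32:13",
                          "2012-04-28 06:40:20"), tz = "UTC"),
  secs     = c(496, 487, 28)
)

# Label every row with its clock hour, then sum seconds per hour and Event.
# The result is a matrix with one row per hour and one column per Event.
hour <- format(res$DT.event, "%Y-%m-%d %H:00")
by_hour <- tapply(res$secs, list(hour = hour, Event = res$Event), sum)
print(by_hour)
```

Here the single 06:00 row holds 524 seconds off the nest and 487 seconds on it.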

Votes 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/60515468
