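I have two data frames (GPSData and ACCData) that come from the same original data frame, but I applied a function only to ACCData. Now I want to join their rows. Sometimes they do not have the same number of rows, so I cannot simply bind them together and be sure the timelines agree.
This is the GPSData df: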
device_id UTC_datetime UTC_date UTC_time datatype
182197 2020-02-19 13:53:03 2020-02-19 13:53:03 GPS
182197 2020-02-19 13:53:28 2020-02-19 13:53:28 GPS
182197 2020-02-19 13:53:59 2020-02-19 13:53:59 GPS
182197 2020-02-19 13:54:30 2020-02-19 13:54:30 GPS
182197 2020-02-19 13:54:58 2020-02-19 13:54:58 GPS
182197 2020-02-19 13:55:28 2020-02-19 13:55:28 GPS
This is the ACCData df:
start.timestamp burst.id axis.sample burst.duration
2020-02-19 13:53:04 0 200 9
2020-02-19 13:53:29 1 200 9
2020-02-19 13:54:00 2 200 9
2020-02-19 13:54:31 3 200 9
2020-02-19 13:54:59 4 200 9
2020-02-19 13:55:29 5 200 9
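I would like a line of code that keeps only the rows of GPSData that correspond to a burst in ACCData, that is, the rows whose UTC_datetime is close to a start.timestamp in ACCData. Most of the time the ACCData burst starts 1 or 2 seconds after the GPS position. I hope you can help; it would be a great help for my studies. Thanks!
Answered 2020-04-01 23:21:12
This looks like a difficult problem, but it is easy to solve with a rolling join in data.table.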
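library(data.table)
library(fasttime)
# Convert both data frames to data.tables by reference
setDT(GPSData)
setDT(ACCData)
# Build a common POSIXct key; fastPOSIXct parses the strings as UTC
GPSData[,Time := fastPOSIXct(UTC_datetime)]
ACCData[,Time := fastPOSIXct(start.timestamp)]
# For each ACCData row, take the GPSData row with the nearest Time
Result <- GPSData[ACCData,on = "Time", roll="nearest"]
Result[,.(device_id,UTC_datetime,start.timestamp,burst.id,axis.sample,burst.duration)]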
# device_id UTC_datetime start.timestamp burst.id axis.sample burst.duration
#1: 182197 2020-02-19 13:53:03 2020-02-19 13:53:04 0 200 9
#2: 182197 2020-02-19 13:53:28 2020-02-19 13:53:29 1 200 9
#3: 182197 2020-02-19 13:53:59 2020-02-19 13:54:00 2 200 9
#4: 182197 2020-02-19 13:54:30 2020-02-19 13:54:31 3 200 9
#5: 182197 2020-02-19 13:54:58 2020-02-19 13:54:59 4 200 9
#6: 182197 2020-02-19 13:55:28 2020-02-19 13:55:29 5 200 9
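Because the question mentions that the two tables do not always have the same number of rows, here is a minimal sketch of how the rolling join behaves in that case. The tables `gps` and `acc` and the column `fix_id` are made up for illustration, and `as.POSIXct(..., tz = "UTC")` is used instead of fasttime only to keep the example dependency-free.

```r
library(data.table)

# Hypothetical toy data: four GPS fixes but only three ACC bursts
gps <- data.table(
  Time = as.POSIXct(c("2020-02-19 13:53:03", "2020-02-19 13:53:28",
                      "2020-02-19 13:53:59", "2020-02-19 13:54:30"), tz = "UTC"),
  fix_id = 1:4
)
acc <- data.table(
  Time = as.POSIXct(c("2020-02-19 13:53:04", "2020-02-19 13:54:00",
                      "2020-02-19 13:54:31"), tz = "UTC"),
  burst.id = 0:2
)

# gps[acc, ...] returns one row per row of acc;
# each burst is paired with the GPS fix whose Time is nearest
matched <- gps[acc, on = "Time", roll = "nearest"]
matched[, .(burst.id, fix_id)]
```

The result has as many rows as `acc`, so the GPS fix at 13:53:28, which has no burst near it, simply does not appear.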
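After the join, you can compute the difference between UTC_datetime and start.timestamp to filter out rows that are too far apart. Note that fastPOSIXct reads the strings as UTC and, when no tz argument is given, the result prints in the session's local time zone, so the converted times in the output appear shifted (08:53 rather than 13:53).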
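# Drop the helper columns and convert both timestamps to POSIXct
Result[,`:=`(UTC_date = NULL, UTC_time = NULL, datatype = NULL, Time = NULL, UTC_datetime = fastPOSIXct(UTC_datetime), start.timestamp = fastPOSIXct(start.timestamp))]
# Absolute gap, with the units fixed to seconds so the threshold below is unambiguous
Result[,Diff := abs(difftime(UTC_datetime, start.timestamp, units = "secs"))]
# Keep only the matches that are less than 5 seconds apart
Result[Diff < 5,]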
# device_id UTC_datetime start.timestamp burst.id axis.sample burst.duration Diff
#1: 182197 2020-02-19 08:53:03 2020-02-19 08:53:04 0 200 9 1 secs
#2: 182197 2020-02-19 08:53:28 2020-02-19 08:53:29 1 200 9 1 secs
#3: 182197 2020-02-19 08:53:59 2020-02-19 08:54:00 2 200 9 1 secs
#4: 182197 2020-02-19 08:54:30 2020-02-19 08:54:31 3 200 9 1 secs
#5: 182197 2020-02-19 08:54:58 2020-02-19 08:54:59 4 200 9 1 secs
#6: 182197 2020-02-19 08:55:28 2020-02-19 08:55:29 5 200 9 1 secs
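The question asks for the GPSData rows themselves rather than the joined table. One possible way to get them, sketched here on made-up data (`gps`, `acc`, and `fix_id` are hypothetical), is to ask the join for row numbers with `which = TRUE` and to put the tolerance into the join itself. A positive finite `roll` carries each GPS fix forward by at most that many seconds, which assumes, as the question describes, that the burst starts shortly after the GPS fix.

```r
library(data.table)

# Hypothetical toy data: the burst at 13:53:45 has no GPS fix close to it
gps <- data.table(
  Time = as.POSIXct(c("2020-02-19 13:53:03", "2020-02-19 13:53:28",
                      "2020-02-19 13:53:59", "2020-02-19 13:54:30"), tz = "UTC"),
  fix_id = 1:4
)
acc <- data.table(
  Time = as.POSIXct(c("2020-02-19 13:53:04", "2020-02-19 13:53:45",
                      "2020-02-19 13:54:31"), tz = "UTC"),
  burst.id = 0:2
)

# roll = 2      : a GPS fix is carried forward for at most 2 seconds
# nomatch = 0L  : bursts with no fix inside that window are dropped
# which = TRUE  : return the matching row numbers of gps, not the joined rows
idx <- gps[acc, on = "Time", roll = 2, nomatch = 0L, which = TRUE]

# Subset the original GPS table to the fixes that belong to a burst
gps_kept <- gps[idx]
gps_kept
```

With this approach the separate Diff column is not needed, at the cost of only matching fixes that precede the burst; `roll = "nearest"` followed by the Diff filter remains the symmetric alternative.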
If you also want to join by device_id, you can join on multiple columns, but only the last column listed in on = can be the rolling one.
GPSData[ACCData,on = c("device_id","Time"), roll="nearest"]
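To illustrate why the extra join column matters, here is a small sketch with two devices. The device ids, timestamps, and the `fix_id` labels are invented for the example. `device_id` is matched exactly and only `Time`, the last column in `on`, is rolled.

```r
library(data.table)

# Hypothetical toy data for two devices
gps <- data.table(
  device_id = c(1L, 1L, 2L, 2L),
  Time = as.POSIXct(c("2020-02-19 13:53:03", "2020-02-19 13:54:30",
                      "2020-02-19 13:53:06", "2020-02-19 13:54:00"), tz = "UTC"),
  fix_id = c("a1", "a2", "b1", "b2")
)
acc <- data.table(
  device_id = c(1L, 2L),
  Time = as.POSIXct(c("2020-02-19 13:53:04", "2020-02-19 13:53:04"), tz = "UTC"),
  burst.id = 0:1
)

# Exact match on device_id, nearest match on Time within each device
res <- gps[acc, on = c("device_id", "Time"), roll = "nearest"]
res[, .(device_id, burst.id, fix_id)]
```

Device 2's burst is paired with its own fix `b1` (2 seconds away) even though device 1's fix `a1` is closer in time, which a join on `Time` alone would not guarantee.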
Data
GPSData <- structure(list(device_id = c(182197L, 182197L, 182197L, 182197L,
182197L, 182197L), UTC_datetime = structure(1:6, .Label = c("2020-02-19 13:53:03",
"2020-02-19 13:53:28", "2020-02-19 13:53:59", "2020-02-19 13:54:30",
"2020-02-19 13:54:58", "2020-02-19 13:55:28"), class = "factor"),
UTC_date = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "2020-02-19", class = "factor"),
UTC_time = structure(1:6, .Label = c("13:53:03", "13:53:28",
"13:53:59", "13:54:30", "13:54:58", "13:55:28"), class = "factor"),
datatype = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "GPS", class = "factor"),
Time = structure(c(1582120383, 1582120408, 1582120439, 1582120470,
1582120498, 1582120528), class = c("POSIXct", "POSIXt"))), class = "data.frame", row.names = c(NA,
-6L))
ACCData <- structure(list(start.timestamp = structure(1:6, .Label = c("2020-02-19 13:53:04",
"2020-02-19 13:53:29", "2020-02-19 13:54:00", "2020-02-19 13:54:31",
"2020-02-19 13:54:59", "2020-02-19 13:55:29"), class = "factor"),
burst.id = 0:5, axis.sample = c(200L, 200L, 200L, 200L, 200L,
200L), burst.duration = c(9L, 9L, 9L, 9L, 9L, 9L), Time = structure(c(1582120384,
1582120409, 1582120440, 1582120471, 1582120499, 1582120529
), class = c("POSIXct", "POSIXt"))), class = "data.frame", row.names = c(NA,
-6L))
https://stackoverflow.com/questions/60974156