Q: Is it possible to group data by time interval?

Stack Overflow user

Asked on 2019-09-22 02:03:12

1 answer, 52 views, 0 followers, 1 vote

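I'm trying to analyse satellite telemetry data. Our satellite has 178 channels, and we want to group them by their update interval. For example, channel 1 sends a message every 10 seconds (there are 100 channels like this), channel 2 every 20 seconds, channel 3 every 30 seconds, and channel 4 every 60 seconds. In other words, each channel sends information at its own update rate. Is it possible to sort the channels by that interval? For example, a 10-second group containing every channel that updates every 10 seconds, and so on.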

Data:

channel 1:  3:25:15 (update time 10 secs)
channel 1:  3:25:25
channel 2:  3:25:35 (update time 20 secs)
channel 1:  3:25:35
channel 1:  3:25:45
channel 3:  3:25:45 (update time 30 secs)
channel 1:  3:25:55
channel 2:  3:25:55
channel 1:  3:26:05
channel 1:  3:26:15
channel 2:  3:26:15
channel 3:  3:26:15
channel 4:  3:26:15  (update time 60 secs)

I would like a result like this:

group by 10 secs:

channel 1: 3:25:15
channel 1: 3:25:25
channel 1: 3:25:35
channel 1: 3:25:45
channel 1: 3:25:55
channel 1: 3:26:05

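And so on for each time interval.

Note: there are 178 channels in total, and I don't know in advance which ones update every 10 seconds, 20 seconds, and so on. I therefore have to sort them based on their observed update times.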

python

1 Answer

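Stack Overflow user

Posted on 2019-09-22 02:58:00

OK, the code below may need some tweaking, but you should get the idea :) Let's go step by step.

First, my data is stored in /tmp/data. I had to add another measurement for channel 4, otherwise it would be excluded (see later):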

$ cat /tmp/data
channel 1:  3:25:15 (update time 10 secs)
channel 1:  3:25:25
channel 2:  3:25:35 (update time 20 secs)
channel 1:  3:25:35
channel 1:  3:25:45
channel 3:  3:25:45 (update time 30 secs)
channel 1:  3:25:55
channel 2:  3:25:55
channel 1:  3:26:05
channel 1:  3:26:15
channel 2:  3:26:15
channel 3:  3:26:15
channel 4:  3:26:15  (update time 60 secs)
channel 4:  3:27:15  (update time 60 secs)

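Now I create a loading function that builds a dictionary where each key is a channel number (int) and each value is the list of update times for that channel: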

from collections import defaultdict
from datetime import datetime

import re

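def read_data(fpath):
    # Format: {channel_number (int): [timestamp1, timestamp2, ...]}
    data = defaultdict(list)
    with open(fpath) as f:
        for line in f:
            # Tokens look like: ['channel', '1:', '3:25:15', ...]
            parts = re.findall(r'[:\w]+', line)
            if len(parts) < 3:
                continue  # skip blank or malformed lines

            # parts[1] is '1:' (strip the colon), parts[2] is the timestamp
            data[int(parts[1][:-1])].append(parts[2])

    return data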


data = read_data("/tmp/data")

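# Sort timestamps chronologically (a plain string sort would misorder
# non-zero-padded hours such as '9:59:59' and '10:00:00')
for channel in data:
    data[channel].sort(key=lambda t: datetime.strptime(t, '%H:%M:%S'))
print(data)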

This gives you (output formatted for readability):

defaultdict(<type 'list'>, {
  1: ['3:25:15', '3:25:25', '3:25:35', '3:25:45', '3:25:55', '3:26:05', '3:26:15'], 
  2: ['3:25:35', '3:25:55', '3:26:15'], 
  3: ['3:25:45', '3:26:15'], 
  4: ['3:26:15', '3:27:15']
})
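Timestamps such as 3:25:15 are not zero-padded, so comparing them as plain strings would misorder a log that crosses from 9:59:59 to 10:00:00. A minimal sketch of a numeric sort key, assuming the H:MM:SS format shown in the sample data (the helper name `to_seconds` is made up for illustration):

```python
def to_seconds(timestamp):
    """Convert an 'H:MM:SS' string to seconds since midnight."""
    hours, minutes, seconds = (int(p) for p in timestamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


times = ["10:00:05", "9:59:55", "3:25:15"]

as_strings = sorted(times)                  # lexicographic order
as_numbers = sorted(times, key=to_seconds)  # chronological order

print(as_strings)
print(as_numbers)
```

The same key can be passed to `list.sort` for each channel's timestamps. It does not handle a log that wraps past midnight; that would need a date as well.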

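Finally, on to the interesting code! We loop over this data and, for each channel, we:

Calculate the list of update intervals (the time difference between consecutive timestamps)
Average the time differences and round to an integer (this part may need more work with real data)

We store the result in another dictionary whose keys are the intervals and whose values are the lists of channels that appear to have that update interval: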

# Identify intervals
channel_interval = defaultdict(list)
FMT = '%H:%M:%S'

for channel, report_times in data.items():
    # We need at least 2 samples to determine interval - 
    # channel 4 needed another entry for this to work
    if len(report_times) < 2:
        continue

    # Collect all reports timediff for this channel
    diffs = []
    # Parse the first timestamp into a datetime
    prev_time = datetime.strptime(report_times[0], FMT)

    for rt in report_times[1:]:
        cur_time = datetime.strptime(rt, FMT)
        diffs.append((cur_time - prev_time).seconds)
        prev_time = cur_time

    # average the report time difference - int division
    # here you might need to be smarter with real data and round up a bit
    # if needed
    interval = sum(diffs) // len(diffs)
    channel_interval[interval].append(channel)
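The integer average is sensitive to dropped or delayed messages: one missing report doubles a gap and pulls the mean away from the nominal rate. One possible refinement, sketched here under the assumption that the nominal rates are known in advance, is to take the median gap and snap it to the nearest candidate. The `CANDIDATES` values of 10, 20, 30 and 60 seconds come from the question; the function name `estimate_interval` is hypothetical.

```python
from statistics import median

# Nominal update rates in seconds (assumed, taken from the question)
CANDIDATES = [10, 20, 30, 60]


def estimate_interval(diffs, candidates=CANDIDATES):
    """Return the candidate interval closest to the median gap in diffs."""
    typical = median(diffs)
    return min(candidates, key=lambda c: abs(c - typical))


# A 10-second channel with some jitter and one dropped message (the 20 s gap)
diffs = [10, 11, 9, 20, 10, 10]

mean_interval = sum(diffs) // len(diffs)    # integer mean, as in the loop above
median_interval = estimate_interval(diffs)  # median snapped to a candidate

print(mean_interval, median_interval)
```

If the nominal rates are not known, the median alone (without snapping) is still less affected by a few outliers than the mean.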

Report: simply loop over channel_interval and, for each channel that falls into a given interval, print its timestamps:

# report
for interval, channels in channel_interval.items():
    print("Updating every {} seconds (channels={})".format(interval, channels))
    for channel in channels:
        hdr = '\nchannel {}: '.format(channel)
        print(hdr + hdr.join(data[channel]))
        print("\n")

The final output is:

Updating every 60 seconds (channels=[4])

channel 4: 3:26:15
channel 4: 3:27:15


Updating every 10 seconds (channels=[1])

channel 1: 3:25:15
channel 1: 3:25:25
channel 1: 3:25:35
channel 1: 3:25:45
channel 1: 3:25:55
channel 1: 3:26:05
channel 1: 3:26:15


Updating every 20 seconds (channels=[2])

channel 2: 3:25:35
channel 2: 3:25:55
channel 2: 3:26:15


Updating every 30 seconds (channels=[3])

channel 3: 3:25:45
channel 3: 3:26:15

As I said, the code above may need small modifications for real data, but it should be a good start. Let me know if you have any questions.
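Channels with a single report are skipped by the `len(report_times) < 2` check, which is why an extra channel 4 line had to be added to the sample data. A possible alternative, shown here as a sketch rather than as the answer's method, is to keep such channels under a separate `None` key so they stay visible in the report. The function name `group_by_interval` and the inline `sample` dictionary are illustrative only.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%H:%M:%S"


def group_by_interval(data):
    """Map interval in seconds (or None if unknown) to a list of channels."""
    groups = defaultdict(list)
    for channel, stamps in data.items():
        times = sorted(datetime.strptime(s, FMT) for s in stamps)
        if len(times) < 2:
            # Not enough samples to measure a gap; keep the channel visible
            groups[None].append(channel)
            continue
        gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        groups[round(sum(gaps) / len(gaps))].append(channel)
    return dict(groups)


sample = {
    1: ["3:25:15", "3:25:25", "3:25:35"],
    2: ["3:25:35", "3:25:55"],
    4: ["3:26:15"],  # single report, as in the original question data
}
result = group_by_interval(sample)
print(result)
```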

Update 1: if you want the printout sorted by interval, you can loop like this:

for interval, channels in sorted(channel_interval.items(), key=lambda x: x[0]):
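To illustrate the sorted loop, here is a small self-contained sketch. The `channel_interval` contents are hard-coded to match the output shown earlier rather than computed from the data file.

```python
# Interval (seconds) -> channels, hard-coded for illustration
channel_interval = {60: [4], 10: [1], 20: [2], 30: [3]}

lines = []
# Tuples sort by their first element, so the items come out ordered by interval
for interval, channels in sorted(channel_interval.items()):
    lines.append(
        "Updating every {} seconds (channels={})".format(interval, channels)
    )

print("\n".join(lines))
```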

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/58042826
