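I am trying to analyze satellite telemetry data. Our satellite has 178 channels, and we want to group them by their update interval. For example, channel 1 sends a message every 10 seconds (there are 100 such channels), channel 2 every 20 seconds, channel 3 every 30 seconds, and channel 4 every 60 seconds. So each channel sends data according to its own update time. Is it possible to sort the channels by interval? For example, a 10-second group containing all channels that update every 10 seconds, and so on.
Data: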
channel 1:  3:25:15 (update time 10 secs)
channel 1:  3:25:25
channel 2:  3:25:35 (update time 20 secs)
channel 1:  3:25:35
channel 1:  3:25:45
channel 3:  3:25:45 (update time 30 secs)
channel 1:  3:25:55
channel 2:  3:25:55
channel 1:  3:26:05
channel 1:  3:26:15
channel 2:  3:26:15
channel 3:  3:26:15
channel 4:  3:26:15  (update time 60 secs)

I would like a result like this:
group by 10 secs:
channel 1: 3:25:15
channel 1: 3:25:25
channel 1: 3:25:35
channel 1: 3:25:45
channel 1: 3:25:55
channel 1: 3:26:05

And so on for each time interval.
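Note: there are 178 channels in total. I do not know which channels update every 10 seconds, every 20 seconds, and so on, so I have to sort them based on their observed update times.
Posted on 2019-09-22 02:58:00
OK, the following may need tweaking, but you should get the idea :) Let's go step by step.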
First, my data is stored in /tmp/data. I had to add another measurement for channel 4, otherwise it would be excluded (see below):
$ cat /tmp/data
channel 1:  3:25:15 (update time 10 secs)
channel 1:  3:25:25
channel 2:  3:25:35 (update time 20 secs)
channel 1:  3:25:35
channel 1:  3:25:45
channel 3:  3:25:45 (update time 30 secs)
channel 1:  3:25:55
channel 2:  3:25:55
channel 1:  3:26:05
channel 1:  3:26:15
channel 2:  3:26:15
channel 3:  3:26:15
channel 4:  3:26:15  (update time 60 secs)
channel 4:  3:27:15  (update time 60 secs)

Now I create a loader function that builds a dictionary in which the key is the channel number (int) and the value is the list of update times:
from collections import defaultdict
from datetime import datetime
import re
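def read_data(fpath):
    # Format: {channel_number (int): [timestamp1, timestamp2, ...]}
    data = defaultdict(list)
    with open(fpath) as f:
        for line in f:
            # e.g. ['channel', '1:', '3:25:15', 'update', 'time', '10', 'secs']
            parts = re.findall(r'[:\w]+', line)
            if len(parts) < 3:
                continue  # skip blank or malformed lines
            # parts[1] looks like '1:' - strip the trailing colon
            data[int(parts[1][:-1])].append(parts[2])
    return data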
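data = read_data("/tmp/data")
# Sort timestamps chronologically; parsing first avoids the string-sort
# pitfall where '10:00:00' would sort before '9:59:59'
for channel in data:
    data[channel].sort(key=lambda t: datetime.strptime(t, '%H:%M:%S'))
print(data)

This gives you (output formatted to make it easier to read):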
defaultdict(<type 'list'>, {
  1: ['3:25:15', '3:25:25', '3:25:35', '3:25:45', '3:25:55', '3:26:05', '3:26:15'], 
  2: ['3:25:35', '3:25:55', '3:26:15'], 
  3: ['3:25:45', '3:26:15'], 
  4: ['3:26:15', '3:27:15']
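})

Finally, on to the fun code! We will loop over this data and, for each channel, compute the time differences between consecutive reports and average them (channels with fewer than two samples are skipped). The result is stored in another dictionary whose key is the interval and whose value is the list of channels that appear to have this update interval: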
# Identify intervals
channel_interval = defaultdict(list)
FMT = '%H:%M:%S'
for channel, report_times in data.items():
    # We need at least 2 samples to determine interval - 
    # channel 4 needed another entry for this to work
    if len(report_times) < 2:
        continue
    # Collect all reports timediff for this channel
    diffs = []
    # This converts the timestamp string to a datetime
    prev_time = datetime.strptime(report_times[0], FMT)
    for rt in report_times[1:]:
        cur_time = datetime.strptime(rt, FMT)
        diffs.append((cur_time - prev_time).seconds)
        prev_time = cur_time
    # average the report time difference - int division
    # here you might need to be smarter with real data and round up a bit
    # if needed
    interval = sum(diffs) // len(diffs)
    channel_interval[interval].append(channel)

Reporting: just loop over each entry in channel_interval and print the timestamps of every channel that falls into that interval:
# report
for interval, channels in channel_interval.items():
    print("Updating every {} seconds (channels={})".format(interval, channels))
    for channel in channels:
        hdr = '\nchannel {}: '.format(channel)
        print(hdr + hdr.join(data[channel]))
        print("\n")

The final output is:
Updating every 60 seconds (channels=[4])
channel 4: 3:26:15
channel 4: 3:27:15
Updating every 10 seconds (channels=[1])
channel 1: 3:25:15
channel 1: 3:25:25
channel 1: 3:25:35
channel 1: 3:25:45
channel 1: 3:25:55
channel 1: 3:26:05
channel 1: 3:26:15
Updating every 20 seconds (channels=[2])
channel 2: 3:25:35
channel 2: 3:25:55
channel 2: 3:26:15
Updating every 30 seconds (channels=[3])
channel 3: 3:25:45
channel 3: 3:26:15

As I said, the above may need small modifications for real data, but it should be a good start. If you have any questions, let me know.
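
One possible way to be "smarter with real data", as the code comment above suggests: real telemetry may have jitter or dropped messages, and a plain integer average is sensitive to both. The sketch below is not part of the original answer. It assumes the nominal intervals are known in advance (the hypothetical `NOMINAL_INTERVALS` tuple, taken from the question's example) and introduces a hypothetical helper, `estimate_interval`, that takes the median of the observed differences and snaps it to the nearest nominal value.

```python
from statistics import median

# Assumption: the set of nominal update intervals (in seconds) is known.
# These values come from the question's example; adjust for the real satellite.
NOMINAL_INTERVALS = (10, 20, 30, 60)


def estimate_interval(diffs, nominal=NOMINAL_INTERVALS):
    """Return the nominal interval closest to the median of diffs.

    The median is less affected than the mean by a single large gap
    caused by a dropped message.
    """
    typical = median(diffs)
    return min(nominal, key=lambda n: abs(n - typical))


# A 10-second channel with jitter and one dropped message (the 21 s gap).
# The integer average would be 61 // 5 == 12; the median is 10.
print(estimate_interval([9, 11, 10, 21, 10]))  # -> 10

# A 60-second channel with slight jitter.
print(estimate_interval([59, 61, 60]))  # -> 60
```

In the interval-detection loop, `interval = estimate_interval(diffs)` could stand in for the `sum(diffs) // len(diffs)` line. If the nominal intervals are not known, rounding the median to the nearest 5 or 10 seconds might be a reasonable alternative.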
Update 1: if you want the printout sorted by interval, you can loop over
for interval, channels in sorted(channel_interval.items(), key=lambda x: x[0]):

https://stackoverflow.com/questions/58042826