
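How do I align two unequal-sized time-series numpy arrays?

Stack Overflow user

Asked 2012-05-28 17:11:59

Answers: 6, Views: 5.5K, Followers: 0, Votes: 14

I have two numpy arrays containing timeseries (unix timestamps).

I want to find pairs of timestamps (one from each array) whose difference is within a threshold.

To achieve this, I need to align the two time series into two arrays such that every index holds its closest pair. (If two timestamps in one array are equally close to a timestamp in the other array, I don't mind choosing either of them, because the count of pairs matters more than the actual values.)

So the aligned data set will consist of two arrays of the same size, with the smaller array padded with null data.

I was thinking of using the timeseries package and its align function.

But I am not sure how to use align for my data, which is a timeseries.

Example: consider two timeseries arrays: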

ts1=np.array([ 1311242821.0, 1311242882.0, 1311244025.0, 1311244145.0, 1311251330.0, 
               1311282555.0, 1311282614.0])
ts2=np.array([ 1311226761.0, 1311227001.0, 1311257033.0, 1311257094.0, 1311281265.0])

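Sample output:

For ts2[2] (1311257033.0), its closest pair should be ts1[4] (1311251330.0), because the difference is 5703.0, which is within the threshold and is the smallest. Since ts2[2] and ts1[4] are now paired, they should be excluded from further calculations.

Such pairs should be found, so the output arrays may be longer than the actual arrays.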

abs(ts1[0]-ts2[0]) = 16060
abs(ts1[0]-ts2[1]) = 15820 // pair
abs(ts1[0]-ts2[2]) = 14212
abs(ts1[0]-ts2[3]) = 14273
abs(ts1[0]-ts2[4]) = 38444

abs(ts1[1]-ts2[0]) = 16121
abs(ts1[1]-ts2[1]) = 15881
abs(ts1[1]-ts2[2]) = 14151
abs(ts1[1]-ts2[3]) = 14212
abs(ts1[1]-ts2[4]) = 38383

abs(ts1[2]-ts2[0]) = 17264
abs(ts1[2]-ts2[1]) = 17024
abs(ts1[2]-ts2[2]) = 13008
abs(ts1[2]-ts2[3]) = 13069
abs(ts1[2]-ts2[4]) = 37240

abs(ts1[3]-ts2[0]) = 17384
abs(ts1[3]-ts2[1]) = 17144
abs(ts1[3]-ts2[2]) = 12888
abs(ts1[3]-ts2[3]) = 12949
abs(ts1[3]-ts2[4]) = 37120

abs(ts1[4]-ts2[0]) = 24569
abs(ts1[4]-ts2[1]) = 24329
abs(ts1[4]-ts2[2]) = 5703 // pair
abs(ts1[4]-ts2[3]) = 5764
abs(ts1[4]-ts2[4]) = 29935

abs(ts1[5]-ts2[0]) = 55794
abs(ts1[5]-ts2[1]) = 55554
abs(ts1[5]-ts2[2]) = 25522
abs(ts1[5]-ts2[3]) = 25461
abs(ts1[5]-ts2[4]) = 1290 // pair

abs(ts1[6]-ts2[0]) = 55853
abs(ts1[6]-ts2[1]) = 55613
abs(ts1[6]-ts2[2]) = 25581
abs(ts1[6]-ts2[3]) = 25520
abs(ts1[6]-ts2[4]) = 1349

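So the pairs are: (ts1[0],ts2[1]), (ts1[4],ts2[2]), (ts1[5],ts2[4])

The remaining elements should have null as their pair.

The two final arrays will have size 9.

Please let me know if the question is clear.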
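
The pairing rule described above (take the closest pair, then exclude both elements from further matching) can be sketched as a greedy match. This is only an illustration under stated assumptions: the function name greedy_pairs is hypothetical, "within the threshold" is read as less than or equal, and the 6000-second threshold in the usage lines is borrowed from one of the answers because the question does not state one.

```python
import numpy as np


def greedy_pairs(a, b, threshold):
    """Greedily pair elements of a and b, smallest difference first.

    Hypothetical helper: each element is used at most once, and only
    differences <= threshold are considered.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = np.abs(a[:, None] - b[None, :])      # all pairwise differences
    ii, jj = np.nonzero(diff <= threshold)      # candidates within the threshold
    order = np.argsort(diff[ii, jj], kind="stable")
    used_a, used_b, pairs = set(), set(), []
    for i, j in zip(ii[order], jj[order]):
        i, j = int(i), int(j)
        if i not in used_a and j not in used_b:
            pairs.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return sorted(pairs)


ts1 = np.array([1311242821.0, 1311242882.0, 1311244025.0, 1311244145.0,
                1311251330.0, 1311282555.0, 1311282614.0])
ts2 = np.array([1311226761.0, 1311227001.0, 1311257033.0, 1311257094.0,
                1311281265.0])
print(greedy_pairs(ts1, ts2, 6000))  # [(4, 2), (5, 4)]
```

A greedy match is simple but not guaranteed to maximize the number of pairs or minimize the total difference; it just follows the rule as stated in the question.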

python

alignment

time-series

closest-points

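6 Answers

Stack Overflow user

Answered 2017-06-21 18:52:33

A solution using numpy masked arrays that outputs the aligned time series (_ts1, _ts2).

The result is 3 pairs. Only pairs with an index distance of 1 can be used to align the time series, hence threshold=1.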

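import numpy as np

# ts1, ts2 (the input arrays) and DTNULL (the null placeholder) are
# module-level names that are not defined in this snippet.

def compute_diffs(threshold):
    # One record per (ts1[i1], ts2[i2]) combination. Despite its name, the
    # 'threshold' field stores the index distance abs(i1 - i2).
    dtype = [('diff', int), ('ts1', int), ('ts2', int), ('threshold', int)]
    diffs = np.empty((ts1.shape[0], ts2.shape[0]), dtype=dtype)
    pairs = np.ma.make_mask_none(diffs.shape)

    for i1, t1 in enumerate(ts1):
        for i2, t2 in enumerate(ts2):
            diffs[i1, i2] = (abs(t1 - t2), i1, i2, abs(i1-i2))

        # Mark a pair only if exactly one entry in this row has the
        # requested index distance.
        d1 = diffs[i1][diffs[i1]['threshold'] == threshold]
        if d1.size == 1:
            (diff, y, x, t) = d1[0]
            pairs[y, x] = True
    return diffs, pairs

def align_timeseries(diffs):
    def _sync(ts, ts1, ts2, i1, i2, ii):
        # Copy the unpaired elements ts[i1:i2] into ts1 and put DTNULL
        # at the same positions in ts2.
        while i1 < i2:
            ts1[ii] = ts[i1]
            i1 += 1
            ts2[ii] = DTNULL
            ii += 1
        return ii, i1

    # The output length (9) is hard-coded for the example data.
    _ts1 = np.array([DTNULL]*9)
    _ts2 = np.copy(_ts1)
    ii = _i1 = _i2 = 0

    for n, (diff, i1, i2, t) in enumerate(np.sort(diffs, order='ts1')):
        # Emit everything that precedes the next pair, then the pair itself.
        ii, _i1 = _sync(ts1, _ts1, _ts2, _i1, i1, ii)
        ii, _i2 = _sync(ts2, _ts2, _ts1, _i2, i2, ii)

        if _i1 == i1:
            _ts1[ii] = ts1[i1]
            _i1 += 1
            _ts2[ii] = ts2[i2]
            _i2 += 1
            ii += 1

    # Emit the trailing unpaired elements of ts1.
    ii, _i1 = _sync(ts1, _ts1, _ts2, _i1, ts1.size, ii)
    return _ts1, _ts2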

Main:

diffs, pairs = compute_diffs(threshold=1)
print('diffs[pairs]:{}'.format(diffs[pairs]))
_ts1, _ts2 = align_timeseries(diffs[pairs])
pprint(ts1, ts2, _ts1, _ts2)

Output:

(15820, 0, 1) ( 5703, 4, 2) ( 1290, 5, 4)

               _ts1   diff                 _ts2
2011-07-21 12:07:01  15820  2011-07-21 07:43:21
2011-07-21 14:28:50   5703  2011-07-21 16:03:53
2011-07-21 23:09:15   1290  2011-07-21 22:47:45

Tested with Python: 3.4.2

Votes: 2

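Stack Overflow user

Answered 2012-05-28 18:26:17

I don't know what you mean by aligning timestamps. But you can use the time module to represent timestamps as floats or integers. In a first step, you can convert any user format into the structure defined by time.struct_time. In a second step, you can convert this into the number of seconds since the start of the epoch. You then have plain numbers to do calculations with the timestamps.

How to convert a user format with time.strptime() is well explained in the documentation: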

    >>> import time
    >>> t = time.strptime("30 Nov 00", "%d %b %y")
    >>> t
    time.struct_time(tm_year=2000, tm_mon=11, tm_mday=30, tm_hour=0, tm_min=0,
             tm_sec=0, tm_wday=3, tm_yday=335, tm_isdst=-1)
    >>> time.mktime(t)
    975538800.0
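
For timestamps that are already Unix seconds in a numpy array, as in the question, the reverse direction can be handy for inspection. The following is a minimal sketch, assuming whole-second timestamps and using numpy's datetime64/timedelta64 types; the variable names d1, d2 and gaps are illustrative only.

```python
import numpy as np

ts1 = np.array([1311242821.0, 1311242882.0, 1311244025.0])
ts2 = np.array([1311226761.0, 1311227001.0])

# Unix seconds -> datetime64[s]; the values are interpreted as UTC
d1 = ts1.astype("int64").astype("datetime64[s]")
d2 = ts2.astype("int64").astype("datetime64[s]")
print(d1[0])  # 2011-07-21T10:07:01

# Pairwise gaps as timedelta64[s], converted back to integer seconds
gaps = np.abs(d1[:, None] - d2[None, :]).astype("int64")
print(gaps[0, 1])  # 15820
```

The differences come out the same as when subtracting the raw floats, so the conversion is only needed for readability.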

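Votes: 1

Stack Overflow user

Answered 2017-06-18 16:27:25

Apart from the small mistakes in the question, I could guess what the problem is really about.

What you are looking at is a classic example of the assignment problem. Scipy provides an implementation of the Hungarian algorithm; check the documentation for scipy.optimize.linear_sum_assignment. The inputs don't have to be timestamps; they can be any numbers (integers or floats).

The snippet below takes 2 numpy arrays of different sizes along with a threshold, and gives you either the costs (filtered by the threshold) or the index pairs into the 2 numpy arrays (again, only the pairs whose cost passes the threshold).

The comments walk you through the whole process, using the given timestamp arrays as an example.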

import numpy as np
from scipy.optimize import linear_sum_assignment


def closest_pairs(inp1, inp2, threshold=np.inf):
    # pairwise absolute differences: cost[x][y] = abs(inp1[x] - inp2[y])
    cost = np.abs(inp1[:, None] - inp2[None, :]).astype(np.int64)

    print(cost)
    # cost for the above example:
    # [[16060 15820 14212 14273 38444]
    # [16121 15881 14151 14212 38383]
    # [17264 17024 13008 13069 37240]
    # [17384 17144 12888 12949 37120]
    # [24569 24329  5703  5764 29935]
    # [55794 55554 25522 25461  1290]
    # [55853 55613 25581 25520  1349]]

    # hungarian algorithm implementation provided by scipy
    row_ind, col_ind = linear_sum_assignment(cost)
    # row_ind = [0 1 3 4 5], col_ind = [1 0 3 2 4] 
    # where (ts1[5] - ts2[4]) = 1290

    # if you want the distances only
    out = [item 
           for item in cost[row_ind, col_ind] 
           if item < threshold]

    # if you want the pair of indices filtered by the threshold
    pairs = [(row, col) 
             for row, col in zip(row_ind, col_ind) 
             if cost[row, col] < threshold]

    return out, pairs


if __name__ == '__main__':
    # the example arrays from the question
    ts1 = np.array([1311242821.0, 1311242882.0, 1311244025.0, 1311244145.0,
                    1311251330.0, 1311282555.0, 1311282614.0])
    ts2 = np.array([1311226761.0, 1311227001.0, 1311257033.0, 1311257094.0,
                    1311281265.0])

    out, pairs = closest_pairs(ts1, ts2, 6000)
    print(out, pairs)
    # out = [5703, 1290] 
    # pairs = [(4, 2), (5, 4)]

    out, pairs = closest_pairs(ts1, ts2)
    print(out, pairs)
    # out = [15820, 16121, 12949, 5703, 1290] 
    # pairs = [(0, 1), (1, 0), (3, 3), (4, 2), (5, 4)]
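
The question also asks for two equal-length arrays in which unmatched elements are paired with null. One possible way to build them from a list of index pairs is sketched below. It is an assumption-laden illustration, not part of the answer's code: the helper name align_from_pairs is hypothetical, NaN stands in for null, and rows are ordered by the earliest timestamp present in each row.

```python
import numpy as np


def align_from_pairs(a, b, pairs):
    """Build two equal-length arrays from index pairs, NaN-padding the rest.

    Hypothetical helper: NaN plays the role of null.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    used_a = {i for i, _ in pairs}
    used_b = {j for _, j in pairs}
    rows = [(a[i], b[j]) for i, j in pairs]
    rows += [(a[i], np.nan) for i in range(a.size) if i not in used_a]
    rows += [(np.nan, b[j]) for j in range(b.size) if j not in used_b]
    # order rows by the earliest timestamp present in each row
    rows.sort(key=lambda r: np.nanmin(r))
    out = np.array(rows)
    return out[:, 0], out[:, 1]


ts1 = np.array([1311242821.0, 1311242882.0, 1311244025.0, 1311244145.0,
                1311251330.0, 1311282555.0, 1311282614.0])
ts2 = np.array([1311226761.0, 1311227001.0, 1311257033.0, 1311257094.0,
                1311281265.0])
al1, al2 = align_from_pairs(ts1, ts2, [(4, 2), (5, 4)])
print(len(al1), len(al2))  # 10 10
```

With n pairs, each output array has len(ts1) + len(ts2) - n entries, so the three pairs listed in the question would give the size 9 mentioned there.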

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/10788233
