Indexing duplicates in aligned time-series indices

Time: 2022-02-15 07:38:02

Say I have two time sequences whose indices are aligned as follows:

import numpy as np    

t1_ind = np.array([ 1,  1,  1,  2,  3,  4,  5,  5,  6])
t2_ind = np.array([20, 21, 22, 23, 23, 24, 25, 26, 27])

which means that index 1 of t1 is aligned with indices 20, 21 and 22 of t2 (implying that t1 is faster than t2 in the first three increments), and so on.

The expected output should be:

y = np.array(([ 1,  2,  4,  5,  6], 
              [20, 23, 24, 25, 27]))

The logic is to "scan" t1_ind and t2_ind and mark both the onset and the offset of every duplicate segment. In this example, the entry 1 in t1_ind is repeated, so the onset pair is recorded in y[:,0] and the corresponding offset pair (the first pair after the segment ends) is y[:,1]. The next duplicate segment in t1_ind (the two 5s) has its onset and offset at y[:,3] and y[:,4], respectively. t2_ind is handled the same way: its duplicate segment (the two 23s) gives the pairs y[:,1] (which should not be recorded twice) and y[:,2]. It seems similar to a duplicate-removal problem, but I don't know how to approach it.
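The scan described above can be transcribed fairly literally into a plain-Python loop. This is a minimal sketch, not the answer's method: the helper names duplicate_run_bounds and onset_offset_pairs are hypothetical, and it assumes "offset" means the first position after a run of repeated values, which is how the example y reads.

```python
import numpy as np


def duplicate_run_bounds(a):
    """Hypothetical helper: return the positions that start a run of repeated
    values in `a` (onsets) plus the first position after each run (offsets)."""
    marks = set()
    n = len(a)
    i = 0
    while i < n:
        j = i
        # Advance j to the last element of the run that starts at i.
        while j + 1 < n and a[j + 1] == a[i]:
            j += 1
        if j > i:                 # run of length >= 2: a duplicate segment
            marks.add(i)          # onset
            if j + 1 < n:
                marks.add(j + 1)  # offset: first position after the run
        i = j + 1
    return marks


def onset_offset_pairs(t1, t2):
    """Hypothetical helper: merge the onset/offset positions of both arrays
    (a set, so a shared position is recorded only once) and return the
    aligned pairs as a 2-row array."""
    positions = sorted(duplicate_run_bounds(t1) | duplicate_run_bounds(t2))
    return np.array([t1[positions], t2[positions]])


t1_ind = np.array([ 1,  1,  1,  2,  3,  4,  5,  5,  6])
t2_ind = np.array([20, 21, 22, 23, 23, 24, 25, 26, 27])

y = onset_offset_pairs(t1_ind, t2_ind)
print(y)
```

On this example the loop yields the same five pairs as the expected y. Note that it only keeps position 0 when the series starts with a duplicate segment, so on other inputs it can differ from a rule that always keeps the first pair.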

Sorry it is kinda hard for me to think of a proper title and to explain the logic precisely in short. Thanks for any help.

1 solution

#1

You can build a boolean mask from the conditions you describe and use it to index both arrays. Since nothing comes before the first elements, we always keep them. For every later position, you can detect a repeat by subtracting the array shifted by one from itself (t1_ind[1:] - t1_ind[:-1]): a difference of 0 means the value did not change. Doing this for both arrays and combining the results with & (keep a position only if both arrays changed) gives the boolean array to use as the mask.

# Each comparison is parenthesized because & binds tighter than != in Python.
array_slice = np.concatenate((
    np.array([True]),                      # always keep the first pair
    ((t1_ind[1:] - t1_ind[:-1]) != 0) &    # t1_ind changed from i-1 to i
    ((t2_ind[1:] - t2_ind[:-1]) != 0)      # ...and so did t2_ind
))

array_slice
# returns:
# array([ True, False, False,  True, False,  True,  True, False,  True])

t1_ind[array_slice]
t2_ind[array_slice]
# returns:
# array([1, 2, 4, 5, 6])
# array([20, 23, 24, 25, 27])
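The shifted subtraction a[1:] - a[:-1] is exactly what np.diff computes, so the same mask can be wrapped in a small function that accepts any number of aligned index arrays. A sketch under that reading; keep_mask is a hypothetical name, and the rule is unchanged: keep position 0, and keep position i only if every array changed value between i-1 and i.

```python
import numpy as np


def keep_mask(*arrays):
    """Hypothetical helper: True at position 0, and True at position i > 0
    only if every input array changed value between i-1 and i."""
    # np.diff(a) equals a[1:] - a[:-1]; build one boolean row per array
    # and AND the rows together element-wise.
    changed = np.logical_and.reduce([np.diff(a) != 0 for a in arrays])
    return np.concatenate(([True], changed))


t1_ind = np.array([ 1,  1,  1,  2,  3,  4,  5,  5,  6])
t2_ind = np.array([20, 21, 22, 23, 23, 24, 25, 26, 27])

mask = keep_mask(t1_ind, t2_ind)
# Stack the kept entries into the 2-row layout of the expected output y.
y = np.vstack((t1_ind[mask], t2_ind[mask]))
print(y)
```

Stacking with np.vstack gives a single 2-row array in the same shape as the y from the question, rather than two separate 1-D results.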
