Please excuse me if this (or something similar) has already been asked.
I've got a numpy structured array with > 1E7 entries. One of the columns of the array is the timestamp of a specific event. What I'd like to do is filter the array based on timestamps: I'd like to keep the N'th row if the (N+1)'th row's timestamp is larger than the N'th row's timestamp by more than T. Is there an efficient way to do this in numpy? I've been going about it in the following way, but it's too slow to be useful (y is the structured array filled with all of our data; x is the filtered array):
T = 250
x = np.ndarray(len(y), dtype=y.dtype)
for i in range(len(y['timestamp']) - 1):
    if y['timestamp'][i+1] - y['timestamp'][i] > T:
        x[i] = y[i]
1 solution
#1
This is a good example of using advanced indexing in numpy:
this_row = y['timestamp'][:-1]
next_row = y['timestamp'][1:]
selection = next_row - this_row > T
result = y[:-1][selection]
The y[:-1] in the last line is necessary because selection has only length len(y) - 1, and the last element should always be dropped according to your code. Alternatively, you could concatenate another False to selection, but this might be slower since it necessitates copying the values of selection. If performance is really an issue, you should benchmark these two options.
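To make the two options concrete, here is a minimal self-contained sketch on made-up data. The 'value' field and the sample timestamps are assumptions for illustration only, and np.diff is used as an equivalent way of writing next_row - this_row:

```python
import numpy as np

# Hypothetical sample data: only the 'timestamp' field comes from the question,
# the 'value' field and the numbers are invented for this example.
y = np.zeros(6, dtype=[('timestamp', 'i8'), ('value', 'f8')])
y['timestamp'] = [0, 100, 400, 450, 900, 1000]
y['value'] = np.arange(6)

T = 250

# Difference between each row's timestamp and the next row's timestamp.
# np.diff(a) computes a[1:] - a[:-1], so this mask has length len(y) - 1.
selection = np.diff(y['timestamp']) > T

# Option 1: slice y so that it matches the length of the mask.
result_sliced = y[:-1][selection]

# Option 2: append a trailing False so the mask matches len(y).
# np.concatenate allocates a new array, which is the extra copy mentioned above.
padded = np.concatenate((selection, [False]))
result_padded = y[padded]
```

Both variants select the same rows; unlike the loop in the question, they return only the kept rows rather than an array of length len(y) with unfilled entries.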