If all indices are in sub-arrays, how do I use a dictionary to map array indices to the corresponding argsorted indices?

Time: 2020-11-27 19:38:31

I have multiple arrays that correspond to data parameters of a time-series. The data parameters include things like speed, hour of occurrence, day of occurrence, month of occurrence, elapsed hour of occurrence, etc.

I am trying to find the indices that group a specified data parameter by value, with the groups ordered from highest to lowest frequency of occurrence.

As a simple example, consider the following:

import numpy as np

speed = np.array([4, 6, 8, 3, 6, 9, 7, 6, 4, 3])*100
elap_hr = np.sort(np.random.randint(low=1, high=40, size=10))
## ... other time parameter arrays

print(speed)
# [400 600 800 300 600 900 700 600 400 300]

print(elap_hr)
# [ 1  2  6  7 13 19 21 28 33 38]

So observed speed = 400 (2 occurrences) corresponds to the elapsed hours = 1, 33; speed = 600 (3 occurrences) corresponds to elapsed hours = 2, 13, 28.

For this example, say I am interested in grouping speed by frequency of occurrence. Once I have the indices that group speed from highest to lowest frequency, I can apply the same indices to the other data parameter arrays (like elap_hr).

I first sort and argsort speed; then I find the unique elements of sorted speed. I combine these to find the indices of sorted speed that correspond to the sorted unique speed, which are grouped as sub-arrays per value in the sorted unique speed.

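def get_sorted_data(data, sort_type='default'):
    ## 'default' sorts values, 'argsort' returns sorting indices, 'by size' sorts sub-arrays by length
    if sort_type == 'default':
        res = sorted(data)
    elif sort_type == 'argsort':
        res = np.argsort(data)
    elif sort_type == 'by size':
        res = sorted(data, key=len)
    else:
        raise ValueError('unknown sort_type: {}'.format(sort_type))
    return res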

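def sort_data_by_frequency(data):
    uniq_data = np.unique(data)
    ## convert to an array so the comparison below is element-wise
    sorted_data = np.asarray(get_sorted_data(data))
    ## positions in the sorted data for each unique value
    res = [np.where(sorted_data == value)[0] for value in uniq_data]
    ## most frequent first
    res = get_sorted_data(res, 'by size')[::-1]
    return res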

sorted_speed = get_sorted_data(speed)
argsorted_speed = get_sorted_data(speed, 'argsort')
freqsorted_speed = sort_data_by_frequency(speed)

print(sorted_speed)
# [300, 300, 400, 400, 600, 600, 600, 700, 800, 900]
print(argsorted_speed)
# [3 9 0 8 1 4 7 6 2 5]
print(freqsorted_speed)
# [array([4, 5, 6]), array([2, 3]), array([0, 1]), array([9]), array([8]), array([7])]

In freqsorted_speed, the first sub-array [4, 5, 6] corresponds to the indices of elements [600, 600, 600] in the sorted array.
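As a cross-check on the frequency ordering, the counts per unique value can be computed directly. This is a minimal sketch using `np.unique` with `return_counts=True`, which is a different technique from the `np.where` approach above; the names `values`, `counts`, and `order` are chosen here for illustration only.

```python
import numpy as np

speed = np.array([4, 6, 8, 3, 6, 9, 7, 6, 4, 3]) * 100

# Unique values (ascending) and how often each one occurs
values, counts = np.unique(speed, return_counts=True)

# Rank by descending count; kind="stable" keeps ties in ascending-value order
order = np.argsort(-counts, kind="stable")

print(values[order].tolist())  # [600, 300, 400, 700, 800, 900]
print(counts[order].tolist())  # [3, 2, 2, 1, 1, 1]
```

This only ranks the values; it does not yet give the indices of each group.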

This works up to this point. However, these indices refer to positions in the sorted array, and I want indices that apply to all of the data parameter arrays. So I need to map the positions in the sorted array back to the original array indices.

def get_dictionary_mapping(keys, values):
    ## since all indices are unique, there is no worry about identical keys
    return dict(zip(keys, values))

idx_orig = np.arange(len(argsorted_speed))
index_to_index_map = get_dictionary_mapping(idx_orig, argsorted_speed)

print(index_to_index_map)
# {0: 3, 1: 9, 2: 0, 3: 8, 4: 1, 5: 4, 6: 7, 7: 6, 8: 2, 9: 5}

print(speed[idx_orig])
# [400 600 800 300 600 900 700 600 400 300]

print(speed[argsorted_speed])
# [300 300 400 400 600 600 600 700 800 900]

print([index_to_index_map[idx_orig[i]] for i in range(len(idx_orig))])
# [3, 9, 0, 8, 1, 4, 7, 6, 2, 5]
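One way the pieces might fit together, sketched under the assumption that the dictionary is not strictly needed: `argsorted_speed[k]` is already the original index of the k-th smallest value, so indexing `argsorted_speed` with each sub-array of sorted positions gives the original indices. The name `orig_groups` is hypothetical, and `kind="stable"` is added here so that tied values keep their original order.

```python
import numpy as np

speed = np.array([4, 6, 8, 3, 6, 9, 7, 6, 4, 3]) * 100

# kind="stable" is an assumption added here; it keeps equal values in original order
argsorted_speed = np.argsort(speed, kind="stable")

# Positions within the sorted array, copied from the freqsorted_speed output above
freqsorted_speed = [np.array([4, 5, 6]), np.array([2, 3]), np.array([0, 1]),
                    np.array([9]), np.array([8]), np.array([7])]

# Fancy indexing maps each group of sorted positions to original indices
orig_groups = [argsorted_speed[pos] for pos in freqsorted_speed]

for group in orig_groups:
    print(group.tolist(), speed[group].tolist())
# [1, 4, 7] [600, 600, 600]
# [0, 8] [400, 400]
# [3, 9] [300, 300]
# [5] [900]
# [2] [800]
# [6] [700]
```

Note that groups of equal frequency come out from max to min here, because `[::-1]` reverses the whole list. Replacing that step with `sorted(res, key=len, reverse=True)` would likely keep ties in min-to-max order, since Python's sort is stable.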

I have all the necessary pieces to accomplish what I want, but I'm not quite sure how to put them all together. Any advice would be appreciated.

EDIT:

As an end result, I would like to have the original indices of speed grouped by frequency like so:

res = [[1, 4, 7], [3, 9], [0, 8], ...]
## corresponds to 3 600's, 2 300's, 2 400's, etc.
## for values of equal frequency, the secondary grouping is from min-to-max

This way, I can choose the values by the nth most frequent value or by the frequency itself.
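To illustrate the two kinds of selection described here, a rough sketch follows. The helper names `nth_most_frequent` and `groups_with_frequency` are hypothetical, `res` is written out by hand in the desired form, and `elap_hr` uses the sample values printed earlier rather than a fresh random draw.

```python
import numpy as np

speed = np.array([4, 6, 8, 3, 6, 9, 7, 6, 4, 3]) * 100
elap_hr = np.array([1, 2, 6, 7, 13, 19, 21, 28, 33, 38])

# Original indices grouped by frequency (most frequent first), written out by hand
res = [np.array([1, 4, 7]), np.array([3, 9]), np.array([0, 8]),
       np.array([6]), np.array([2]), np.array([5])]

def nth_most_frequent(groups, n):
    """Return the index group of the n-th most frequent value (n=0 is the most frequent)."""
    return groups[n]

def groups_with_frequency(groups, freq):
    """Return every index group whose value occurs exactly `freq` times."""
    return [g for g in groups if len(g) == freq]

top = nth_most_frequent(res, 0)
print(speed[top].tolist(), elap_hr[top].tolist())
# [600, 600, 600] [2, 13, 28]

pairs = groups_with_frequency(res, 2)
print([speed[g].tolist() for g in pairs])
# [[300, 300], [400, 400]]
```

The same index groups can be applied to any other parameter array of the same length.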

1 solution

#1


Your desired result can be obtained as follows:

>>> idx = np.argsort(speed)
>>> res = sorted(np.split(idx, np.flatnonzero(np.diff(speed[idx])) + 1), key=len, reverse=True)
>>> res
[array([1, 4, 7]), array([3, 9]), array([0, 8]), array([6]), array([2]), array([5])]
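The one-liner above can be unpacked into separate steps to show what each call contributes. This is a sketch of the same approach, not a different method; the intermediate names `boundaries` and `groups` are introduced here for illustration, and `kind="stable"` is an added assumption so that tied values keep their original order on larger inputs.

```python
import numpy as np

speed = np.array([4, 6, 8, 3, 6, 9, 7, 6, 4, 3]) * 100

# Assumption: kind="stable" is added so equal values keep their original order
idx = np.argsort(speed, kind="stable")

# Positions in the sorted array where the value changes (start of each new group)
boundaries = np.flatnonzero(np.diff(speed[idx])) + 1

# Cut the argsorted indices at those boundaries: one sub-array per unique value
groups = np.split(idx, boundaries)

# sorted() is stable, so groups of equal length stay in ascending-value order
res = sorted(groups, key=len, reverse=True)

print(boundaries.tolist())        # [2, 4, 7, 8, 9]
print([g.tolist() for g in res])  # [[1, 4, 7], [3, 9], [0, 8], [6], [2], [5]]
```

Because `idx` already holds original indices, the groups can be used directly on the other parameter arrays, for example `elap_hr[res[0]]`, provided `elap_hr` is a NumPy array.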
