Populating a SortedList from an n×2 numpy array

I have a numpy array of shape n×2, effectively n pairs of integers, that I would like to transfer to a SortedList. The goal is a SortedList of integer tuples of length 2.

The problem is that the constructor of SortedList checks the truth value of each entry. This works fine for one-dimensional arrays:

In [1]: import numpy as np
In [2]: from sortedcontainers import SortedList
In [3]: a = np.array([1,2,3,4])
In [4]: SortedList(a)
Out[4]: SortedList([1, 2, 3, 4], load=1000)

But for two dimensions, when each entry is an array, there is no clear truth value and SortedList is uncooperative:

In [5]: a.resize(2,2)
In [6]: a
Out[6]: 
array([[1, 2],
       [3, 4]])

In [7]: SortedList(a)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-7a4b2693bb52> in <module>()
----> 1 SortedList(a)

/home/me/miniconda3/envs/env/lib/python3.6/site-packages/sortedcontainers/sortedlist.py in __init__(self, iterable, load)
     81 
     82         if iterable is not None:
---> 83             self._update(iterable)
     84 
     85     def __new__(cls, iterable=None, key=None, load=1000):

/home/me/miniconda3/envs/env/lib/python3.6/site-packages/sortedcontainers/sortedlist.py in update(self, iterable)
    176         _lists = self._lists
    177         _maxes = self._maxes
--> 178         values = sorted(iterable)
    179 
    180         if _maxes:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

My current workaround is to convert each row to a tuple manually:

sl = SortedList()
for t in np_array:
    x, y = t
    sl.add((x,y))

However, this solution offers some room for improvement. Does anyone have an idea how to solve this problem without explicitly unpacking all the arrays into tuples?

1 Answer

#1

The problem isn't that the truth value of the arrays themselves is being checked; it's that the arrays are being compared so that they can be sorted. If you use comparison operators on arrays, you get arrays back:

>>> import numpy as np
>>> np.array([1, 4]) < np.array([2, 3])
array([ True, False], dtype=bool)

This resulting boolean array is actually the array whose truth value is being checked by sorted.

On the other hand, the same operation on tuples (or lists) compares them lexicographically, element by element, and returns a single boolean value:

>>> (1, 4) < (2, 3)
True
>>> (1, 4) < (1, 3)
False

So when SortedList tries to use sorted on a sequence of numpy arrays, it can't do a comparison, because it needs single boolean values to be returned by comparison operators.
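
The failure can be reproduced without SortedList at all. The following is a minimal sketch (the `rows` data is made up for illustration) showing that plain `sorted` raises the same ValueError on a list of arrays, while the same rows converted to tuples sort without trouble:

```python
import numpy as np

# Two hypothetical rows, deliberately out of order so a comparison is needed.
rows = [np.array([3, 4]), np.array([1, 2])]

try:
    # Each `<` between rows returns a boolean array, which has no single truth value.
    sorted(rows)
    failed = False
except ValueError:
    failed = True

# Tuples compare lexicographically and return a plain bool, so sorting works.
as_tuples = sorted(tuple(row) for row in rows)
```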

One way to abstract this would be to create a new array class that implements comparison operators like __eq__, __lt__, __gt__, etc. to reproduce the sorting behavior of tuples. Ironically, the easiest way to do this would be to cast the underlying arrays to tuples, like:

class SortableArray(object):

    def __init__(self, seq):
        self._values = np.array(seq)

    def __eq__(self, other):
        return tuple(self._values) == tuple(other._values)
        # or:
        # return np.all(self._values == other._values)

    def __lt__(self, other):
        return tuple(self._values) < tuple(other._values)

    def __gt__(self, other):
        return tuple(self._values) > tuple(other._values)

    def __le__(self, other):
        return tuple(self._values) <= tuple(other._values)

    def __ge__(self, other):
        return tuple(self._values) >= tuple(other._values)

    def __str__(self):
        return str(self._values)

    def __repr__(self):
        return repr(self._values)

With this implementation, you can now sort a list of SortableArray objects:

In [4]: ar1 = SortableArray([1, 3])

In [5]: ar2 = SortableArray([1, 4])

In [6]: ar3 = SortableArray([1, 3])

In [7]: ar4 = SortableArray([4, 5])

In [8]: ar5 = SortableArray([0, 3])

In [9]: lst1 = [ar1, ar2, ar3, ar4, ar5]

In [10]: lst1
Out[10]: [array([1, 3]), array([1, 4]), array([1, 3]), array([4, 5]), array([0, 3])]

In [11]: sorted(lst1)
Out[11]: [array([0, 3]), array([1, 3]), array([1, 3]), array([1, 4]), array([4, 5])]

This might be overkill for what you need, but it's one way to do it. In any case, you won't get away with using sorted on a sequence of objects that don't return a single boolean value on comparison.
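
If a wrapper class is still wanted, it can probably be shortened. The sketch below is a hypothetical variant (the name `CompactSortableArray` and the `_key` helper are not part of the class above) that defines only `__eq__` and `__lt__` and lets `functools.total_ordering` from the standard library derive the remaining comparison operators:

```python
from functools import total_ordering

import numpy as np


@total_ordering
class CompactSortableArray:
    """Hypothetical shorter variant of SortableArray, compared as a tuple."""

    def __init__(self, seq):
        self._values = np.array(seq)

    def _key(self):
        # tolist() yields plain Python scalars; the tuple compares lexicographically.
        return tuple(self._values.tolist())

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()

    def __repr__(self):
        return repr(self._values)


items = [CompactSortableArray(p) for p in ([1, 4], [0, 3], [1, 3])]
ordered = sorted(items)
```

As with the longer class, defining `__eq__` without `__hash__` makes instances unhashable, so this variant is only meant for sorting, not for use as dictionary keys or set members.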

If all you're after is avoiding the for loop, you could just replace it with a list comprehension (i.e. SortedList([tuple(row) for row in np_array])).
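
A closely related sketch, assuming the rows should end up as tuples of plain Python ints rather than numpy scalars: calling `tolist()` first converts the whole array in one step, and `map(tuple, ...)` then turns each row into a tuple. The sample data in `np_array` is made up for illustration.

```python
import numpy as np
from sortedcontainers import SortedList

# Hypothetical n×2 input, deliberately unsorted.
np_array = np.array([[3, 4], [1, 2], [1, 1]])

# tolist() gives nested lists of Python ints; each inner list becomes a tuple.
sl = SortedList(map(tuple, np_array.tolist()))

# Later insertions keep the list sorted.
sl.add((2, 0))
```

Either form still builds one tuple per row; what changes is only that the loop is no longer written out by hand.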
