Prevent NumPy from creating a multidimensional array

NumPy tries to be helpful when creating arrays. If the first argument to numpy.array has __getitem__ and __len__ methods, NumPy uses them on the basis that the object might be a valid sequence.

Unfortunately, I want to create an array with dtype=object that holds my objects as they are, without NumPy being "helpful".

Broken down to a minimal example, the class would look like this:

import numpy as np

class Test(object):
    def __init__(self, iterable):
        self.data = iterable

    def __getitem__(self, idx):
        return self.data[idx]

    def __len__(self):
        return len(self.data)

    def __repr__(self):
        return '{}({})'.format(self.__class__.__name__, self.data)

If the "iterables" have different lengths, everything is fine and I get exactly the result I want:

>>> np.array([Test([1,2,3]), Test([3,2])], dtype=object)
array([Test([1, 2, 3]), Test([3, 2])], dtype=object)

But NumPy creates a multidimensional array if they happen to have the same length:

>>> np.array([Test([1,2,3]), Test([3,2,1])], dtype=object)
array([[1, 2, 3],
       [3, 2, 1]], dtype=object)

Unfortunately there is only an ndmin argument, so I was wondering if there is a way to enforce an ndmax, or somehow prevent NumPy from interpreting the custom classes as another dimension (without deleting __len__ or __getitem__)?

3 Answers

#1

A workaround is of course to create an array of the desired shape and then copy the data:

In [19]: lst = [Test([1, 2, 3]), Test([3, 2, 1])]

In [20]: arr = np.empty(len(lst), dtype=object)

In [21]: arr[:] = lst[:]

In [22]: arr
Out[22]: array([Test([1, 2, 3]), Test([3, 2, 1])], dtype=object)

Note that I would not be surprised if NumPy's behavior when interpreting iterable objects (which is what you want to use, right?) depends on the NumPy version. It may also be buggy, or some of those bugs may actually be features. Either way, I'd be wary of breakage when the NumPy version changes.

By contrast, copying into a pre-created array should be far more robust.
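As a minimal sketch of that copy-into-a-preallocated-array approach, the helper below fills the array one element at a time. The name as_object_vector is made up for illustration and is not a NumPy function, and Seq is only a stand-in for the question's Test class. Assigning through a single integer index stores the object itself, so the sketch does not rely on how NumPy interprets a whole list during slice assignment.

```python
import numpy as np


class Seq:
    """Stand-in for the question's Test class (hypothetical name)."""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, idx):
        return self.data[idx]

    def __len__(self):
        return len(self.data)


def as_object_vector(items):
    """Return a 1-D object array that holds each item unchanged."""
    items = list(items)
    arr = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        # A plain integer index stores the object as a single element.
        arr[i] = item
    return arr


vec = as_object_vector([Seq([1, 2, 3]), Seq([3, 2, 1])])
print(vec.shape)
```

The loop is slower than a single slice assignment for very long lists, but it keeps the result independent of the sequence-detection rules discussed above.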

#2

This behavior has been discussed a number of times before (e.g. Override a dict with numpy support). np.array tries to make an array with as many dimensions as it can. The model case is nested lists: if it can iterate and the sublists are equal in length, it will 'drill' on down.

Here it went down 2 levels before encountering lists of different length:

In [250]: np.array([[[1,2],[3]],[1,2]],dtype=object)
Out[250]: 
array([[[1, 2], [3]],
       [1, 2]], dtype=object)
In [251]: _.shape
Out[251]: (2, 2)

Without a shape or ndmax parameter, it has no way of knowing whether I want the result to be (2,) or (2,2). Both shapes would work with this dtype.
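To make the ambiguity concrete, here is a small sketch that builds both candidate shapes from the same nested list by choosing the shape up front and assigning element by element. The variable names are illustrative only.

```python
import numpy as np

nested = [[[1, 2], [3]], [1, 2]]

# Interpretation 1: two elements, each one a whole sublist -> shape (2,)
flat = np.empty(2, dtype=object)
for i, item in enumerate(nested):
    flat[i] = item

# Interpretation 2: a 2x2 grid whose cells are whatever sits at [i][j]
grid = np.empty((2, 2), dtype=object)
for i, row in enumerate(nested):
    for j, item in enumerate(row):
        grid[i, j] = item

print(flat.shape, grid.shape)
```

Both arrays are valid object arrays built from the same input, which is why np.array has to guess when it is given no shape.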

It's compiled code, so it isn't easy to see exactly which tests it uses. It tries to iterate on lists and tuples, but not on sets or dictionaries.
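A quick way to check that claim on your own NumPy version is to compare the resulting shapes. This is only a sketch, and the exact rules may differ between releases.

```python
import numpy as np

# Equal-length tuples are treated as a nested sequence.
tuples = np.array([(1, 2), (3, 4)], dtype=object)

# Sets and dicts are stored as single objects, one per element.
sets = np.array([{1, 2}, {3, 4}], dtype=object)
dicts = np.array([{'a': 1}, {'b': 2}], dtype=object)

print(tuples.shape, sets.shape, dicts.shape)
```

If the tuples come out two-dimensional while the sets and dicts stay one-dimensional, the behavior matches the description above.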

The surest way to make an object array with a given dimension is to start with an empty one and fill it:

In [266]: A=np.empty((2,3),object)
In [267]: A.fill([[1,'one']])
In [276]: A[:]={1,2}
In [277]: A[:]=[1,2]   # broadcast error

Another way is to start with at least one different element (e.g. a None), and then replace that.

There is a more primitive constructor, np.ndarray, that takes a shape:

In [280]: np.ndarray((2,3),dtype=object)
Out[280]: 
array([[None, None, None],
       [None, None, None]], dtype=object)

But that's basically the same as np.empty (unless I give it a buffer).

These are fudges, but they aren't expensive (time-wise).

================ (edit)

https://github.com/numpy/numpy/issues/5933, "Enh: Object array creation function", is an enhancement request. See also https://github.com/numpy/numpy/issues/5303, "the error message for accidentally irregular arrays is confusing".

The developer sentiment seems to favor a separate function to create dtype=object arrays, one with more control over the initial dimensions and depth of iteration. They might even strengthen the error checking to keep np.array from creating 'irregular' arrays.

Such a function could detect the shape of a regular nested iterable down to a specified depth, and build an object type array to be filled.

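def objarray(alist, depth=1):
    # Walk down the first element at each level to work out the shape.
    shape = []
    level = alist
    for _ in range(depth):
        shape.append(len(level))
        level = level[0]
    # Allocate an object array of that shape, then copy the data in.
    arr = np.empty(shape, dtype=object)
    arr[:] = alist
    return arr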

With various depths:

In [528]: alist=[[Test([1,2,3])], [Test([3,2,1])]]
In [529]: objarray(alist,1)
Out[529]: array([[Test([1, 2, 3])], [Test([3, 2, 1])]], dtype=object)
In [530]: objarray(alist,2)
Out[530]: 
array([[Test([1, 2, 3])],
       [Test([3, 2, 1])]], dtype=object)
In [531]: objarray(alist,3)  
Out[531]: 
array([[[1, 2, 3]],

       [[3, 2, 1]]], dtype=object)
In [532]: objarray(alist,4)
...
TypeError: object of type 'int' has no len()
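A possible variant of objarray, sketched here under the assumption that the input is regular down to the requested depth, fills the array cell by cell with np.ndindex instead of using slice assignment. The name objarray_filled and the Seq class (a stand-in for Test) are hypothetical. Writing through a full integer index stores each object as is, so whatever sits below the requested depth is never unpacked.

```python
import numpy as np


class Seq:
    """Stand-in for the question's Test class (hypothetical name)."""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, idx):
        return self.data[idx]

    def __len__(self):
        return len(self.data)


def objarray_filled(nested, depth=1):
    """Build an object array with `depth` dimensions from a regular nesting."""
    # Measure the shape by following the first element at each level.
    shape = []
    level = nested
    for _ in range(depth):
        shape.append(len(level))
        level = level[0]

    arr = np.empty(shape, dtype=object)
    for index in np.ndindex(*shape):
        # Descend to the item at this index and store it unchanged.
        item = nested
        for i in index:
            item = item[i]
        arr[index] = item
    return arr


one_level = objarray_filled([Seq([1, 2, 3]), Seq([3, 2, 1])], depth=1)
two_levels = objarray_filled([[Seq([1, 2, 3])], [Seq([3, 2, 1])]], depth=2)
print(one_level.shape, two_levels.shape)
```

Unlike the slice-assignment version, this sketch never produces the depth-3 unpacking on its own unless depth=3 is requested explicitly.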

#3

This workaround may not be the most efficient, but I like it for its clarity:

test_list = [Test([1,2,3]), Test([3,2,1])]
test_list.append(None)
test_array = np.array(test_list, dtype=object)[:-1]

Summary: take your list, append None, then convert it to a NumPy array; the None prevents NumPy from building a multidimensional array. Finally, slice off the last entry to get the structure you want.
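One detail worth noting is that the snippet above leaves the extra None in test_list. A hedged sketch of the same trick that works on a copy instead is shown below. The function name object_vector_via_sentinel and the Seq class (a stand-in for Test) are invented for illustration, and the result relies on np.array falling back to a 1-D object array when the elements are irregular.

```python
import numpy as np


class Seq:
    """Stand-in for the question's Test class (hypothetical name)."""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, idx):
        return self.data[idx]

    def __len__(self):
        return len(self.data)


def object_vector_via_sentinel(items):
    """Append a None sentinel to a copy, convert, then drop the sentinel."""
    padded = list(items) + [None]  # copy, so the caller's list is untouched
    # The trailing None makes the input irregular, so it stays 1-D.
    # .copy() detaches the result from the padded base array.
    return np.array(padded, dtype=object)[:-1].copy()


test_list = [Seq([1, 2, 3]), Seq([3, 2, 1])]
test_array = object_vector_via_sentinel(test_list)
print(test_array.shape, len(test_list))
```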
