How to make a multidimensional numpy array with a varying row size?

Time: 2021-07-22 21:46:58

I would like to create a two dimensional numpy array of arrays that has a different number of elements on each row.

Trying

cells = numpy.array([[0,1,2,3], [2,3,4]])

gives an error

ValueError: setting an array element with a sequence.

4 answers

#1


17  

While Numpy supports arrays of arbitrary objects, it is optimized for homogeneous arrays of numbers with fixed dimensions. If you really need arrays of arrays, you are better off using a nested list. Depending on the intended use of your data, a different data structure might be even better, e.g. a masked array if some of your data points are invalid.
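To make the masked-array suggestion concrete, here is a minimal sketch, assuming it is acceptable to pad every row to the length of the longest one. The padding positions are masked, so reductions along axis 1 ignore them. The helper name to_masked and the fill value 0 are illustrative choices, not part of NumPy; numpy.ma.masked_array is the standard masked-array constructor.

```python
import numpy as np

def to_masked(rows, fill_value=0):
    """Pad ragged rows into a rectangular masked array (hypothetical helper).

    Positions that exist only as padding are masked, so reductions skip them.
    """
    n_rows = len(rows)
    width = max(len(r) for r in rows)
    data = np.full((n_rows, width), fill_value)
    mask = np.ones((n_rows, width), dtype=bool)   # start with everything masked
    for i, r in enumerate(rows):
        data[i, :len(r)] = r
        mask[i, :len(r)] = False                  # unmask the real values
    return np.ma.masked_array(data, mask=mask)

m = to_masked([[0, 1, 2, 3], [2, 3, 4]])
row_sums = m.sum(axis=1)     # per-row sums with padding ignored: 6 and 9
row_means = m.mean(axis=1)   # per-row means: 1.5 and 3.0
print(row_sums, row_means)
```

The trade-off is memory: every row costs as much as the longest one, so this suits rows of similar length better than data with a few very long outliers.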

If you really want flexible Numpy arrays, use something like this:

numpy.array([[0,1,2,3], [2,3,4]], dtype=object)

However this will create a one-dimensional array that stores references to lists, which means that you will lose most of the benefits of Numpy (vector processing, locality, slicing, etc.).
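The dtype=object construction has a catch: whether you get a one-dimensional array of rows depends on the row lengths. If all rows happen to have the same length, numpy.array(..., dtype=object) builds a 2-D array of scalars instead. As a sketch (make_ragged is a hypothetical helper, not a NumPy function), filling a preallocated object array element by element always gives exactly one slot per row:

```python
import numpy as np

def make_ragged(rows):
    """Build a 1-D object array holding one ndarray per row (hypothetical helper).

    Assigning into an empty object array one element at a time guarantees a
    1-D result, even when every row happens to have the same length.
    """
    out = np.empty(len(rows), dtype=object)
    for i, row in enumerate(rows):
        out[i] = np.asarray(row)
    return out

ragged = make_ragged([[0, 1, 2, 3], [2, 3, 4]])
print(ragged.shape)      # (2,)
print(ragged[1][2])      # 4

# With equal-length rows, dtype=object alone produces a 2-D array instead.
square = np.array([[1, 2], [3, 4]], dtype=object)
print(square.shape)                           # (2, 2)
print(make_ragged([[1, 2], [3, 4]]).shape)    # (2,)
```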

#2


17  

Almost 7 years after the question was asked, your code

cells = numpy.array([[0,1,2,3], [2,3,4]])

runs without error in numpy 1.12.0 with Python 3.5, and cells contains:

array([[0, 1, 2, 3], [2, 3, 4]], dtype=object)

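You can access the elements of cells as cells[0][2], which gives 2. Note that this behavior did not last: NumPy 1.19 deprecated creating arrays from ragged nested sequences without dtype=object, and since NumPy 1.24 it raises a ValueError again, so current code should pass dtype=object explicitly.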
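A small sketch of what indexing looks like on such an object array. It passes dtype=object explicitly so that it behaves the same on old and new NumPy versions. The outer array has only one axis, so chained indexing works, a two-index subscript does not, and a "column" has to be gathered in Python:

```python
import numpy as np

# Explicit dtype=object: a 1-D array whose two elements are Python lists
cells = np.array([[0, 1, 2, 3], [2, 3, 4]], dtype=object)

print(cells.shape)    # (2,)
print(cells[0][2])    # 2: index the outer array, then the inner list

# A two-index subscript fails because there is no second axis.
try:
    cells[0, 2]
    two_index_ok = True
except IndexError:
    two_index_ok = False    # too many indices for a 1-D array

# A "column" must be collected row by row, skipping rows that are too short.
col2 = [row[2] for row in cells if len(row) > 2]
print(col2)           # [2, 4]
```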

If you want to build your list of numpy arrays on the fly as new elements (i.e. arrays) become available, an alternative to tom10's solution is to use append:

import numpy as np

d = []                   # initialize an empty list
a = np.arange(3)         # array([0, 1, 2])
d.append(a)              # [array([0, 1, 2])]
b = np.arange(3, -1, -1) # array([3, 2, 1, 0])
d.append(b)              # [array([0, 1, 2]), array([3, 2, 1, 0])]
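Once the rows have been collected, a common next step is to pack them into one contiguous array plus a length per row, which is the layout answer #4 builds on. A sketch, assuming the rows arrive from some source modeled here by the hypothetical list incoming:

```python
import numpy as np

# Hypothetical source of rows that become available one at a time
incoming = [np.arange(3), np.arange(3, -1, -1), np.array([7])]

d = []
for row in incoming:
    d.append(row)                      # stores a reference, no copy

lengths = np.array([r.size for r in d])                # [3, 4, 1]
flat = np.concatenate(d)                               # one copy of all values
starts = np.concatenate(([0], lengths.cumsum()[:-1]))  # [0, 3, 7]

# np.split returns one view into flat per original row
rows_again = np.split(flat, starts[1:])
print(rows_again)
```

Appending to a Python list only stores a reference, whereas repeatedly calling numpy.append or numpy.concatenate on a growing array copies all the data each time, so collecting first and concatenating once is usually the cheaper order.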

#3


12  

This isn't well supported in Numpy: almost everywhere, a "two-dimensional array" is by definition one whose rows all have the same length. A Python list of Numpy arrays may be a good solution for you, since it keeps Numpy's advantages wherever you can use them:

import numpy

cells = [numpy.array(a) for a in [[0,1,2,3], [2,3,4]]]
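A short sketch of what keeping Numpy's advantages means in practice: each element of cells is an ordinary ndarray, so vectorized arithmetic and boolean masks work within a row, and only the loop over rows runs in Python.

```python
import numpy as np

cells = [np.array(a) for a in [[0, 1, 2, 3], [2, 3, 4]]]

# Each row is a regular ndarray, so whole-row operations are vectorized.
row_sums = [int(row.sum()) for row in cells]        # [6, 9]
normalized = [row / row.sum() for row in cells]     # float rows, each sums to 1
filtered = [row[row >= 2] for row in cells]         # boolean mask applied per row
lengths = np.array([row.size for row in cells])     # [4, 3]
```

The Python-level overhead grows roughly with the number of rows rather than the number of elements, so this works best when rows are few and long.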

#4


1  

Another option is to store your arrays as one contiguous array, together with their sizes or offsets. This takes a little more thought about how to operate on the data, but a surprisingly large number of operations can be made to work as if you had a two-dimensional array with rows of different lengths. Where they can't, numpy.split can recover the list of arrays that calocedrus recommends. The easiest operations are ufuncs, because they need almost no modification. Here are some examples:

import numpy

cells_flat = numpy.array([0, 1, 2, 3, 2, 3, 4])
# Only one of these is required; it's easy to convert between them,
# but having both keeps the examples short
cell_lengths = numpy.array([4, 3])
cell_starts = numpy.insert(cell_lengths[:-1].cumsum(), 0, 0)
cell_lengths2 = numpy.diff(numpy.append(cell_starts, cells_flat.size))
assert numpy.all(cell_lengths == cell_lengths2)

# copy() so that the arrays in cells don't share memory with cells_flat
cells = numpy.split(cells_flat.copy(), cell_starts[1:])
# [array([0, 1, 2, 3]), array([2, 3, 4])]

numpy.array([x.sum() for x in cells])
# array([6, 9])
numpy.add.reduceat(cells_flat, cell_starts)
# array([6, 9])

[a + v for a, v in zip(cells, [1, 3])]
# [array([1, 2, 3, 4]), array([5, 6, 7])]
cells_flat + numpy.repeat([1, 3], cell_lengths)
# array([1, 2, 3, 4, 5, 6, 7])

[a.astype(float) / a.sum() for a in cells]
# [array([ 0.        ,  0.16666667,  0.33333333,  0.5       ]),
#  array([ 0.22222222,  0.33333333,  0.44444444])]
cells_flat.astype(float) / numpy.add.reduceat(cells_flat, cell_starts).repeat(cell_lengths)
# array([ 0.        ,  0.16666667,  0.33333333,  0.5       ,  0.22222222,
#         0.33333333,  0.44444444])

def complex_modify(array):
    """Some complicated function that modifies array

    pretend this is more complex than it is"""
    array *= 3

for arr in cells:
    complex_modify(arr)
cells
# [array([0, 3, 6, 9]), array([ 6,  9, 12])]
for arr in numpy.split(cells_flat, cell_starts[1:]):
    complex_modify(arr)
cells_flat
# array([ 0,  3,  6,  9,  6,  9, 12])
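Two more operations on the flat layout, as a sketch using the same data under shorter local names (flat, lengths, starts): a row index for every element, which lets numpy.bincount compute per-row sums, and direct element access through the offsets (the element helper is hypothetical). It also shows a reduceat caveat worth knowing before relying on it: for an empty row, where two consecutive starts are equal, reduceat does not return the reduction's identity but copies the element at that offset.

```python
import numpy as np

flat = np.array([0, 1, 2, 3, 2, 3, 4])
lengths = np.array([4, 3])
starts = np.concatenate(([0], lengths.cumsum()[:-1]))   # [0, 4]

# Row index of every element: [0, 0, 0, 0, 1, 1, 1]
row_ids = np.repeat(np.arange(lengths.size), lengths)

row_max = np.maximum.reduceat(flat, starts)                           # [3, 4]
row_sum = np.bincount(row_ids, weights=flat, minlength=lengths.size)  # [6.0, 9.0]

def element(i, j):
    """Hypothetical accessor for item (i, j) of the ragged layout."""
    if not 0 <= j < lengths[i]:
        raise IndexError(f"row {i} has only {lengths[i]} elements")
    return flat[starts[i] + j]

# Caveat: with an empty middle row, reduceat copies an element instead of giving 0.
flat_e = np.array([1, 2, 3, 4, 5])
lengths_e = np.array([2, 0, 3])
starts_e = np.concatenate(([0], lengths_e.cumsum()[:-1]))  # [0, 2, 2]
bad = np.add.reduceat(flat_e, starts_e)                    # [3, 3, 12]: middle is wrong
ids_e = np.repeat(np.arange(lengths_e.size), lengths_e)    # [0, 0, 2, 2, 2]
good = np.bincount(ids_e, weights=flat_e, minlength=lengths_e.size)  # [3.0, 0.0, 12.0]
```

The minlength argument matters as well: without it, bincount would silently drop trailing empty rows from the result.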