Append/concatenate arrays inside a 2D array

I'm trying to build a 3x4 array in which each element is itself an array of unknown size. Along the way I append new numbers, one at a time, to certain cells of the 3x4 matrix. I eventually want to end up with an array that looks like this:

[[[1,8,9],[1,2],[],[]],
[[8],[],[4,5],[9,1]],
[[],[7,1,4],[],[2,1,3]]]

Right now I've been trying to use append and concatenate, but I can't seem to find a good way to do this since the inside arrays are of changing size. Also I don't know what the best way is to initialize my matrix. Simplified, my code looks like this:

mat = np.empty((3,4,1))
for x in range(1000):
    i, j, value = somefunction()
    mat[i,j,:] = np.append(mat[i,j,:], value)

Does anybody know the best way to append (or concatenate or...) these values to my matrix? I have been looking up similar questions concerning appending and concatenation and tried a lot of different things, but I wasn't able to figure it out. I found it quite hard to explain my question, so I hope my description is clear.

2 answers

#1

You can use so-called object arrays to get this job done. Normally, numpy arrays consist of primitive types, but it is possible to create arrays where each element is an arbitrary Python object. This way you can make an array that contains arrays.

mat = np.empty((3, 4), dtype=object)

Note that each element in mat is now None. Let's fill the matrix:

for x in range(1000):
    i, j, value = somefunction()
    if mat[i, j] is None:
        mat[i, j] = np.array(value)
    else:
        mat[i, j] = np.append(mat[i, j], value)

This should get the job done, but it's horribly inefficient for two reasons:

dtype=object loses almost all of the properties that make numpy arrays fast. Every operation on an element has to go through the Python interpreter, which normally would not happen.

numpy arrays are designed to be static; they are not designed to grow. What np.append really does is copy the old array into a new, bigger array, so each append gets slower as the array grows.
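
To make the second point concrete, here is a minimal sketch (the names a and b are only for illustration) showing that np.append hands back a freshly allocated array and leaves the original untouched:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)  # allocates a new array and copies a into it

print(a)                       # the original array is unchanged
print(b)                       # the new array holds the old values plus 4
print(np.shares_memory(a, b))  # False: b does not reuse a's buffer
```

Because every call copies everything accumulated so far, appending one value at a time in a loop costs roughly quadratic time in the final length.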

Considering that you want to reduce the whole thing into a 3x4 array in the end, it's probably better to work with regular Python lists:

# initialize a 3x4x0 hierarchy of nested lists
mat = [[[] for _ in range(4)] for _ in range(3)]

for x in range(1000):
    i, j, value = somefunction()
    mat[i][j].append(value)

# reduce each sub-list to its mean (empty list -> nan)
for i in range(3):
    for j in range(4):
        mat[i][j] = np.mean(mat[i][j])

# FINALLY convert to array
mat = np.array(mat)
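
For anyone who wants to run the list-based approach end to end, here is a small self-contained sketch. The stream list is a hypothetical stand-in for somefunction() from the question, chosen so that it rebuilds the example matrix, and the reduction used here is the number of values per cell rather than the mean:

```python
import numpy as np

# Hypothetical stand-in for somefunction(): a fixed stream of (i, j, value).
stream = [
    (0, 0, 1), (0, 0, 8), (0, 0, 9), (0, 1, 1), (0, 1, 2),
    (1, 0, 8), (1, 2, 4), (1, 2, 5), (1, 3, 9), (1, 3, 1),
    (2, 1, 7), (2, 1, 1), (2, 1, 4), (2, 3, 2), (2, 3, 1), (2, 3, 3),
]

# One independent empty list per cell of the 3x4 grid.
mat = [[[] for _ in range(4)] for _ in range(3)]

for i, j, value in stream:
    mat[i][j].append(value)  # list.append grows in place

# Reduce every cell to a scalar, then convert to a regular 3x4 array.
counts = np.array([[len(cell) for cell in row] for row in mat])
print(counts.shape)
```

The nested comprehension matters: [[[]] * 4] * 3 would make every cell refer to the same list object, so an append to one cell would show up in all of them.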

#2

An easy way to test whether such an array will be useful is to wrap your list of lists in np.array:

In [767]: mat = np.array([[[1,8,9],[1,2],[],[]],
     ...: [[8],[],[4,5],[9,1]],
     ...: [[],[7,1,4],[],[2,1,3]]])
In [768]: mat
Out[768]: 
array([[list([1, 8, 9]), list([1, 2]), list([]), list([])],
       [list([8]), list([]), list([4, 5]), list([9, 1])],
       [list([]), list([7, 1, 4]), list([]), list([2, 1, 3])]], dtype=object)
In [769]: mat.shape
Out[769]: (3, 4)

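The result is a (3,4) object-dtype array. This isn't the most reliable way of making an object-dtype array (starting with np.empty((3,4), object) is more general), but in this case it works fine. Note that on NumPy 1.24 and later, ragged input like this raises a ValueError unless you pass dtype=object explicitly.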
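
As a rough sketch of the more general route, one can start from np.empty with object dtype and give every cell its own empty list before appending. The loop over np.ndindex is just one way to do the fill, and the appended values are picked arbitrarily from the example:

```python
import numpy as np

mat = np.empty((3, 4), dtype=object)  # every cell starts out as None

# Put a fresh, independent list into each cell.
for idx in np.ndindex(mat.shape):
    mat[idx] = []

# Cells can now grow with plain list.append.
mat[0, 0].append(1)
mat[0, 0].append(8)
mat[1, 2].append(4)

print(mat.shape)
print(mat[0, 0], mat[1, 2], mat[0, 1])
```

This never depends on how np.array interprets nested lists, which is why it is the safer starting point.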

But such an array doesn't have many advantages compared to the original list of lists. Most of the faster array operations don't work. Most tasks will require Python level iteration over the list elements.

I could use np.vectorize to iterate, for example to take means:

In [775]: np.vectorize(np.mean)(mat)
/usr/local/lib/python3.5/dist-packages/numpy/core/fromnumeric.py:2909: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/usr/local/lib/python3.5/dist-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Out[775]: 
array([[ 6. ,  1.5,  nan,  nan],
       [ 8. ,  nan,  4.5,  5. ],
       [ nan,  4. ,  nan,  2. ]])

It doesn't like taking the mean of an empty list. We could write a simple function that handles [] more gracefully.
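
One possible version of such a function is sketched below. safe_mean is a made-up helper name, and returning nan for an empty cell is an assumption about what "gracefully" should mean here:

```python
import numpy as np

def safe_mean(values):
    """Mean of a list, or nan for an empty list, without a RuntimeWarning."""
    return float(np.mean(values)) if len(values) else float("nan")

nested = [[[1, 8, 9], [1, 2], [], []],
          [[8], [], [4, 5], [9, 1]],
          [[], [7, 1, 4], [], [2, 1, 3]]]

# Build the (3, 4) object array cell by cell.
mat = np.empty((3, 4), dtype=object)
for i, j in np.ndindex(mat.shape):
    mat[i, j] = nested[i][j]

# otypes=[float] tells np.vectorize the output dtype up front.
means = np.vectorize(safe_mean, otypes=[float])(mat)
print(means)
```

The values match the np.vectorize(np.mean) result above; only the warnings go away.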

I could turn the lists into arrays (note the use of otypes):

In [777]: arr = np.vectorize(np.array,otypes=[object])(mat)
In [778]: arr
Out[778]: 
array([[array([1, 8, 9]), array([1, 2]), array([], dtype=float64),
        array([], dtype=float64)],
       [array([8]), array([], dtype=float64), array([4, 5]), array([9, 1])],
       [array([], dtype=float64), array([7, 1, 4]),
        array([], dtype=float64), array([2, 1, 3])]], dtype=object)

though I'm not sure this buys us much.
