Shape of np.arrays: an unexpected extra dimension

I'm working with arrays in Python, and a few things have left me confused.

1) I build a list of lists by reading 4 columns from N files, so the list holds N entries of 4 elements each. I then convert this list to a numpy array:

s = np.array(s)

I then ask for the shape of this array, and the answer is what I expect:

print(s.shape)
# (N, 4)

I then compute the mean of this Nx4 array:

s_m = sum(s)/len(s)
print(s_m.shape)
# (4,)

which I take to mean that this is a 1D array. Is that correct?

2) To subtract the mean vector s_m from each row of the array s, I can proceed in two ways:

residuals_s = s - s_m

or:

residuals_s = []

for i in range(len(s)):
    residuals_s.append([])
    tmp = s[i] - s_m
    residuals_s.append(tmp)

If I now ask for the shape of residuals_s in the two cases, I get two different answers. In the first case I obtain:

(N,4)

and in the second:

(N,1,4)

Can someone explain why there is an additional dimension?

2 answers

#1

You can get the mean using the numpy method (producing the same (4,) shape):

s_m = s.mean(axis=0)

s - s_m works because s_m is broadcast to the dimensions of s: the (4,) array is treated as (1,4) and stretched across the N rows.
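
As a minimal sketch of that broadcasting step (the random 10x4 array and the seed are stand-ins for the asker's data, not taken from the question), the following compares the vectorized subtraction with an explicit row-by-row loop and with the builtin sum(s)/len(s) mean:

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed, for illustration only
s = rng.normal(size=(10, 4))        # stand-in for the N x 4 data (N = 10 here)

s_m = s.mean(axis=0)                # shape (4,): one mean per column
builtin_mean = sum(s) / len(s)      # builtin sum iterates over rows, also shape (4,)

residuals_s = s - s_m               # (10, 4) minus (4,) broadcasts to (10, 4)

# The explicit loop over rows produces the same values.
looped = np.array([row - s_m for row in s])

print(s_m.shape, s_m.ndim)                  # (4,) 1
print(residuals_s.shape)                    # (10, 4)
print(np.allclose(residuals_s, looped))     # True
print(np.allclose(s_m, builtin_mean))       # True
```

So a (4,) shape does mean a 1D array, and the broadcast subtraction keeps the (N,4) shape of s.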

If I run your second version of residuals_s, I get a list of alternating empty lists and arrays:

[[],
 array([ 1.02649662,  0.43613824,  0.66276758,  2.0082684 ]),
 [],
 array([ 1.13000227, -0.94129685,  0.63411801, -0.383982  ]),
 ...
]

That does not convert to an (N,1,4) array, but rather to an (M,) array with dtype=object (M = 2N, since each iteration appends twice). Did you copy and paste correctly?

A corrected iteration is:

residuals_s = []
for i in range(len(s)):
    residuals_s.append(s[i]-s_m)

This produces a simpler list of arrays:

[array([ 1.02649662,  0.43613824,  0.66276758,  2.0082684 ]),
 array([ 1.13000227, -0.94129685,  0.63411801, -0.383982  ]),
...]

which converts to an (N,4) array.
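
A short sketch of that conversion, assuming a small hypothetical 3x4 array in place of the real data. np.stack is shown alongside np.array as an alternative that was not in the original answer:

```python
import numpy as np

s = np.arange(12, dtype=float).reshape(3, 4)   # hypothetical 3 x 4 data
s_m = s.mean(axis=0)                           # column means, shape (4,)

residuals_list = []
for i in range(len(s)):
    residuals_list.append(s[i] - s_m)          # one (4,) array per row

as_array = np.array(residuals_list)            # list of N (4,) arrays -> (N, 4)
stacked = np.stack(residuals_list)             # same result, joined along a new axis 0

print(as_array.shape, stacked.shape)           # (3, 4) (3, 4)
print(np.array_equal(as_array, s - s_m))       # True
```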

Iteration like this is usually not needed. But if it is, appending to a list like this is one way to go. Another is to preallocate an array and assign rows:

residuals_s = np.zeros_like(s)
for i in range(s.shape[0]):
    residuals_s[i,:] = s[i]-s_m

I can reproduce your (N,1,4) with:

In [39]: residuals_s=[]
In [40]: for i in range(len(s)):
   ....:     residuals_s.append([])
   ....:     tmp = s[i] - s_m
   ....:     residuals_s[-1].append(tmp)
In [41]: residuals_s
Out[41]: 
[[array([ 1.02649662,  0.43613824,  0.66276758,  2.0082684 ])],
 [array([ 1.13000227, -0.94129685,  0.63411801, -0.383982  ])],
...]
In [43]: np.array(residuals_s).shape
Out[43]: (10, 1, 4)

Here the s[i]-s_m array is appended to an empty list, which has itself just been appended to the main list. So it's an array within a list within a list. It's this intermediate list that produces the middle dimension of size 1.
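
To make the nesting concrete, here is a small sketch with a hypothetical 3x4 array. The squeeze call at the end is one possible way to drop the extra axis after the fact and is an addition for illustration; avoiding the inner list in the first place is simpler:

```python
import numpy as np

s = np.arange(12, dtype=float).reshape(3, 4)   # hypothetical 3 x 4 data
s_m = s.mean(axis=0)

nested = []
for i in range(len(s)):
    nested.append([])                  # a new inner list for this row
    nested[-1].append(s[i] - s_m)      # the (4,) array goes inside that inner list

arr = np.array(nested)                 # N outer items, 1 inner item each, 4 values
print(arr.shape)                       # (3, 1, 4)

flat = arr.squeeze(axis=1)             # remove the length-1 middle axis
print(flat.shape)                      # (3, 4)
print(np.array_equal(flat, s - s_m))   # True
```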

#2

You are using a NumPy ndarray without using NumPy's own functions. sum() is a Python builtin; you should use numpy.sum() (with axis=0) instead.

I suggest you change your code to:

import numpy as np
np.random.seed(0)
s = np.random.randn(10, 4)
s_m = np.mean(s, axis=0, keepdims=True)
residuals_s = s - s_m

print(s.shape, s_m.shape, residuals_s.shape)

Using mean() with the axis and keepdims arguments gives the correct result; here s_m has shape (1,4), which broadcasts against s.
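
A brief sketch of what keepdims changes, reusing the same seeded random array. The names m_keep, m_flat and row_means are illustrative only, and the axis=1 case is an extra example beyond the original answer:

```python
import numpy as np

np.random.seed(0)
s = np.random.randn(10, 4)

m_keep = np.mean(s, axis=0, keepdims=True)   # reduced axis kept with length 1
m_flat = np.mean(s, axis=0)                  # reduced axis dropped

print(m_keep.shape, m_flat.shape)            # (1, 4) (4,)
print(np.allclose(s - m_keep, s - m_flat))   # True: both broadcast to (10, 4)

# keepdims matters more along axis=1: (10, 1) broadcasts against (10, 4),
# whereas a plain (10,) result would not line up with the last axis of s.
row_means = s.mean(axis=1, keepdims=True)
print(row_means.shape, (s - row_means).shape)   # (10, 1) (10, 4)
```

For column means, both forms give the same residuals, so keepdims is a matter of preference there.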
