In numpy, the dimensions of the resulting array vary at run time. There is often confusion between a 1d array and a 2d array with 1 column. In one case I can iterate over the columns, in the other case I cannot.
How do you solve that problem elegantly? To avoid littering my code with if statements checking for the dimensionality, I use this function:
def reshape_to_vect(ar):
    if len(ar.shape) == 1:
        return ar.reshape(ar.shape[0], 1)
    return ar
However, this feels inelegant and costly. Is there a better solution?
5 answers
#1
7
You could do:
ar.reshape(ar.shape[0],-1)
The second argument to reshape, -1, lets numpy work out the number of elements for the second axis. Thus, for a 2D input, it changes nothing. For a 1D input, it creates a 2D array with all elements "pushed" to the first axis because of ar.shape[0], which is the total number of elements.
Sample runs
1D Case :
In [87]: ar
Out[87]: array([ 0.80203158, 0.25762844, 0.67039516, 0.31021513, 0.80701097])
In [88]: ar.reshape(ar.shape[0],-1)
Out[88]:
array([[ 0.80203158],
       [ 0.25762844],
       [ 0.67039516],
       [ 0.31021513],
       [ 0.80701097]])
2D Case :
In [82]: ar
Out[82]:
array([[ 0.37684126,  0.16973899,  0.82157815,  0.38958523],
       [ 0.39728524,  0.03952238,  0.04153052,  0.82009233],
       [ 0.38748174,  0.51377738,  0.40365096,  0.74823535]])
In [83]: ar.reshape(ar.shape[0],-1)
Out[83]:
array([[ 0.37684126,  0.16973899,  0.82157815,  0.38958523],
       [ 0.39728524,  0.03952238,  0.04153052,  0.82009233],
       [ 0.38748174,  0.51377738,  0.40365096,  0.74823535]])
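The same behaviour can be reproduced with small deterministic arrays. This is only a sketch: the helper name as_columns is made up for illustration and is not part of numpy. Note that an input with more than two dimensions would have its trailing axes flattened into one, and a 0-d input would fail because it has no shape[0].

```python
import numpy as np

def as_columns(ar):
    """Hypothetical helper: 1D input becomes one column, 2D input keeps its shape."""
    ar = np.asarray(ar)
    return ar.reshape(ar.shape[0], -1)

vec = np.arange(5.0)                 # shape (5,)
mat = np.arange(12.0).reshape(3, 4)  # shape (3, 4)

col = as_columns(vec)
same = as_columns(mat)

print(col.shape)   # (5, 1)
print(same.shape)  # (3, 4)
```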
#2
5
The simplest way:
ar.reshape(-1, 1)
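A quick check of what this does to each kind of input may help. The sketch below uses made-up arrays. For a 1D input the result is a single column, but a 2D input with more than one column is also flattened into a single column, so this form is not a no-op in the 2D case.

```python
import numpy as np

vec = np.arange(3)                # shape (3,)
mat = np.arange(6).reshape(3, 2)  # shape (3, 2)

vec_col = vec.reshape(-1, 1)
mat_col = mat.reshape(-1, 1)

print(vec_col.shape)  # (3, 1)
print(mat_col.shape)  # (6, 1): the two columns were merged into one
```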
#3
2
A variant of the answer by divakar is x = np.reshape(x, (len(x), -1)), which also handles the case where the input is a 1d or 2d list.
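As a rough illustration with plain Python lists (the sample values are arbitrary), np.reshape converts the list to an array first, and len(x) supplies the size of the first axis:

```python
import numpy as np

flat_list = [1, 2, 3]
nested_list = [[1, 2], [3, 4], [5, 6]]

a = np.reshape(flat_list, (len(flat_list), -1))
b = np.reshape(nested_list, (len(nested_list), -1))

print(a.shape)  # (3, 1)
print(b.shape)  # (3, 2)
```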
#4
0
I asked about dtype because your example is puzzling.
I can make a structured array with 3 elements (1d) and 3 fields:
In [1]: A = np.ones((3,), dtype='i,i,i')
In [2]: A
Out[2]:
array([(1, 1, 1), (1, 1, 1), (1, 1, 1)],
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
I can access one field by name (adding brackets doesn't change things)
In [3]: A['f0'].shape
Out[3]: (3,)
but if I access 2 fields, I still get a 1d array
In [4]: A[['f0','f1']].shape
Out[4]: (3,)
In [5]: A[['f0','f1']]
Out[5]:
array([(1, 1), (1, 1), (1, 1)],
      dtype=[('f0', '<i4'), ('f1', '<i4')])
Actually those extra brackets do matter, if I look at values
In [22]: A['f0']
Out[22]: array([1, 1, 1], dtype=int32)
In [23]: A[['f0']]
Out[23]:
array([(1,), (1,), (1,)],
      dtype=[('f0', '<i4')])
If the array is a simple 2d one, I still don't get your shapes
In [24]: A=np.ones((3,3),int)
In [25]: A[0].shape
Out[25]: (3,)
In [26]: A[[0]].shape
Out[26]: (1, 3)
In [27]: A[[0,1]].shape
Out[27]: (2, 3)
But as to the question of making sure an array is 2d, regardless of whether the indexing returns 1d or 2d, your function is basically ok:
def reshape_to_vect(ar):
    if len(ar.shape) == 1:
        return ar.reshape(ar.shape[0], 1)
    return ar
You could test ar.ndim instead of len(ar.shape). But either way it is not costly: the execution time is minimal because there are no big array operations. reshape doesn't copy data (unless your strides are weird), so it is just the cost of creating a new array object with a shared data pointer.
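One way to see the no-copy behaviour is np.shares_memory. The sketch below uses made-up arrays; the transposed array stands in for the "weird strides" case, where flattening in C order cannot be expressed as a view and reshape has to copy.

```python
import numpy as np

ar = np.arange(6.0)
col = ar.reshape(ar.shape[0], 1)
print(np.shares_memory(ar, col))        # True: same buffer, new array object

strided = np.arange(12.0).reshape(3, 4).T  # shape (4, 3), not C-contiguous
flat = strided.reshape(-1)
print(np.shares_memory(strided, flat))  # False: reshape had to copy here
```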
Look at the code for np.atleast_2d; it tests for 0d and 1d. In the 1d case it returns result = ary[newaxis,:]. It adds the extra axis at the front, which is the more natural numpy location for a new axis. You add it at the end.
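The difference in where the new axis lands is easy to check. A small sketch with an arbitrary 3-element array:

```python
import numpy as np

v = np.arange(3)

row = np.atleast_2d(v)   # new axis added in front
col = v[:, np.newaxis]   # new axis added at the end

print(row.shape)  # (1, 3)
print(col.shape)  # (3, 1)
```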
ar.reshape(ar.shape[0],-1) is a clever way of bypassing the if test. In small timing tests it is faster, but we are talking about microseconds, the effect of a function call layer.
np.column_stack is another function that creates column arrays if needed. It uses:
if arr.ndim < 2:
    arr = array(arr, copy=False, subok=True, ndmin=2).T
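Calling np.column_stack directly shows the effect of that branch. In this sketch (the arrays are arbitrary examples), a 1D input is turned into a column while a 2D input is left as it is:

```python
import numpy as np

v = np.arange(3)                # shape (3,)
m = np.arange(6).reshape(3, 2)  # shape (3, 2)

only_v = np.column_stack((v,))
only_m = np.column_stack((m,))
both = np.column_stack((v, m))

print(only_v.shape)  # (3, 1)
print(only_m.shape)  # (3, 2)
print(both.shape)    # (3, 3)
```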
#5
0
To avoid the need to reshape in the first place: if you select a row / column with a list or a "running" slice, you will get a 2D array with one row / column.
import numpy as np
x = np.random.normal(size=(4, 4))
print(x, '\n')
Result:
[[ 0.01360395  1.12130368  0.95429414  0.56827029]
 [-0.66592215  1.04852182  0.20588886  0.37623406]
 [ 0.9440652   0.69157556  0.8252977  -0.53993904]
 [ 0.6437994   0.32704783  0.52523173  0.8320762 ]]
y = x[:, [0]]
print(y, 'col vector \n')
Result:
[[ 0.01360395]
 [-0.66592215]
 [ 0.9440652 ]
 [ 0.6437994 ]] col vector
y = x[[0], :]
print(y, 'row vector \n')
Result:
[[ 0.01360395 1.12130368 0.95429414 0.56827029]] row vector
# Slice with "running" index on a column
y = x[:, 0:1]
print(y, '\n')
Result:
[[ 0.01360395]
 [-0.66592215]
 [ 0.9440652 ]
 [ 0.6437994 ]]
If instead you use a single number to choose the row / column, the result is a 1D array, which is the root cause of your issue:
y = x[:, 0]
print(y, '\n')
Result:
[ 0.01360395 -0.66592215 0.9440652 0.6437994 ]
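The indexing variants can be compared side by side by looking only at the shapes. This sketch uses a deterministic array instead of random values. One difference worth knowing is that indexing with a list makes a copy, while the slice form returns a view of x.

```python
import numpy as np

x = np.arange(16.0).reshape(4, 4)

single = x[:, 0]      # integer index drops the axis
by_list = x[:, [0]]   # list index keeps the axis (copy)
by_slice = x[:, 0:1]  # slice keeps the axis (view)
row = x[[0], :]

print(single.shape)    # (4,)
print(by_list.shape)   # (4, 1)
print(by_slice.shape)  # (4, 1)
print(row.shape)       # (1, 4)
print(np.shares_memory(x, by_slice), np.shares_memory(x, by_list))  # True False
```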