Cannot determine the shape of a numpy array inside a loop that involves a transpose operation

Time: 2021-03-10 21:40:21

I have been trying to create a small neural network to learn the softmax function, following an article from this website: https://mlxai.github.io/2017/01/09/implementing-softmax-classifier-with-vectorized-operations.html


It works well for a single iteration. But when I create a loop to train the network with updated weights, I get the following error: ValueError: operands could not be broadcast together with shapes (5,10) (1,5) (5,10). I have attached a screenshot of the output here.


Debugging this issue, I found that np.max() returns an array of shape (5,1) on some iterations and (1,5) on others, even though axis is set to 1. Please help me identify what went wrong in the following code.


import numpy as np

N = 5
D = 10
C = 10

W = np.random.rand(D,C)
X = np.random.randint(255, size = (N,D))
X = X/255
y = np.random.randint(C, size = (N))
#print (y)
lr = 0.1
reg = 0.05   # regularization strength (not defined in the original post; value assumed here)

for i in range(100):
  print (i)
  loss = 0.0
  dW = np.zeros_like(W)
  N = X.shape[0]
  C = W.shape[1]

  f = X.dot(W)
  #print (f)

  print (np.matrix(np.max(f, axis=1)))
  print (np.matrix(np.max(f, axis=1)).T)
  f -= np.matrix(np.max(f, axis=1)).T
  #print (f)  

  term1 = -f[np.arange(N), y]
  sum_j = np.sum(np.exp(f), axis=1)
  term2 = np.log(sum_j)
  loss = term1 + term2
  loss /= N 
  loss += 0.5 * reg * np.sum(W * W)
  #print (loss)

  coef = np.exp(f) / np.matrix(sum_j).T
  coef[np.arange(N),y] -= 1
  dW = X.T.dot(coef)
  dW /= N
  dW += reg*W

  W = W - lr*dW

1 Answer

#1



In your first iteration, W is an instance of np.ndarray with shape (D, C). f inherits ndarray, so when you do np.max(f, axis=1), it returns an ndarray of shape (N,), which np.matrix() turns into shape (1, N), which is then transposed to (N, 1).


But on the following iterations, W is an instance of np.matrix (which it inherits from dW in W = W - lr*dW). f then inherits np.matrix, and np.max(f, axis=1) returns an np.matrix of shape (N, 1), which passes through np.matrix() unchanged and becomes shape (1, N) after .T.
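A quick way to see the difference between the two cases (a minimal sketch; the shapes below assume N = 5 and C = 10 as in the question):

import numpy as np

a = np.random.rand(5, 10)          # plain ndarray, like f in the first iteration
m = np.matrix(a)                   # np.matrix with the same data, like f in later iterations

print(np.max(a, axis=1).shape)     # (5,)   -> np.matrix(...) makes it (1, 5), and .T makes it (5, 1)
print(np.max(m, axis=1).shape)     # (5, 1) -> np.matrix(...) leaves it alone, and .T makes it (1, 5)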


To fix this, make sure you don't mix np.ndarray with np.matrix. Either define everything as np.matrix from the start (i.e. W = np.matrix(np.random.rand(D,C))) or use keepdims to maintain your axes like:


f -= np.max(f, axis = 1, keepdims = True)

which will let you keep everything 2D without needing to cast to np.matrix (also do this for sum_j).
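For reference, a sketch of what the loop body can look like with keepdims. It reuses the question's variables and assumes reg is the regularization strength; as in the original code, loss is kept as the per-example vector (sum it if you want a scalar):

f = X.dot(W)                                      # (N, C), stays a plain ndarray
f -= np.max(f, axis=1, keepdims=True)             # (N, 1) broadcasts cleanly against (N, C)

sum_j = np.sum(np.exp(f), axis=1, keepdims=True)  # (N, 1)

term1 = -f[np.arange(N), y]                       # (N,)
term2 = np.log(sum_j).ravel()                     # flatten back to (N,) to match term1
loss = (term1 + term2) / N
loss += 0.5 * reg * np.sum(W * W)

coef = np.exp(f) / sum_j                          # (N, C) / (N, 1), no np.matrix cast needed
coef[np.arange(N), y] -= 1
dW = X.T.dot(coef) / N + reg * W

W = W - lr * dW                                   # W remains an np.ndarray on every iteration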
