I recently moved to Python 3.5 and noticed that the new matrix multiplication operator (@) sometimes behaves differently from the numpy dot operator. For example, for 3D arrays:
import numpy as np
a = np.random.rand(8,13,13)
b = np.random.rand(8,13,13)
c = a @ b # Python 3.5+
d = np.dot(a, b)
The @ operator returns an array of shape:
c.shape
(8, 13, 13)
while the np.dot() function returns:
d.shape
(8, 13, 8, 13)
How can I reproduce the same result with numpy dot? Are there any other significant differences?
3 Answers
#1
58
The @ operator calls the array's __matmul__ method, not dot. This method is also present in the API as the function np.matmul.
>>> a = np.random.rand(8,13,13)
>>> b = np.random.rand(8,13,13)
>>> np.matmul(a, b).shape
(8, 13, 13)
From the documentation:
matmul differs from dot in two important ways.
- Multiplication by scalars is not allowed.
- Stacks of matrices are broadcast together as if the matrices were elements.
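As a quick illustration of the first point (a minimal sketch; the exact exception raised varies between numpy versions), dot happily scales by a scalar while @/matmul refuses:

import numpy as np

a = np.random.rand(3, 3)

print(np.dot(a, 2).shape)   # (3, 3) -- dot treats the scalar as plain scaling

try:
    a @ 2                   # matmul/@ rejects scalar operands
except (ValueError, TypeError) as err:
    print('@ with a scalar raises', type(err).__name__)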
The last point makes it clear that dot and matmul behave differently when passed 3D (or higher-dimensional) arrays. Quoting from the documentation some more:
For matmul:
If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
For np.dot:
For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b.
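To answer the question of how to reproduce the @/matmul result with numpy dot: np.dot computes the product of every matrix in a with every matrix in b, so the stacked product is contained in its output and can be pulled out by indexing the "diagonal" of the two stack axes; np.einsum is another way to express the same contraction. A minimal sketch:

import numpy as np

a = np.random.rand(8, 13, 13)
b = np.random.rand(8, 13, 13)

c = a @ b          # stacked matrix product, shape (8, 13, 13)
d = np.dot(a, b)   # all cross products, shape (8, 13, 8, 13)

# Picking the "diagonal" over the two stack axes recovers the stacked product:
d_diag = d[np.arange(8), :, np.arange(8), :]   # shape (8, 13, 13)
print(np.allclose(c, d_diag))                  # True

# Equivalently, spell the stacked contraction out explicitly:
print(np.allclose(c, np.einsum('ijk,ikl->ijl', a, b)))  # True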
#2
4
The answer by @ajcr explains how dot and matmul (invoked by the @ symbol) differ. By looking at a simple example, one clearly sees how the two behave differently when operating on 'stacks of matrices' or tensors.
To clarify the differences, take a 4x4 array and return the dot product and matmul product with a 3x4x2 'stack of matrices' or tensor.
import numpy as np
fourbyfour = np.array([
[1,2,3,4],
[3,2,1,4],
[5,4,6,7],
[11,12,13,14]
])
threebyfourbytwo = np.array([
[[2,3],[11,9],[32,21],[28,17]],
[[2,3],[1,9],[3,21],[28,7]],
[[2,3],[1,9],[3,21],[28,7]],
])
print('4x4 * 3x4x2 dot:\n {}\n'.format(np.dot(fourbyfour, threebyfourbytwo)))
print('4x4 * 3x4x2 matmul:\n {}\n'.format(np.matmul(fourbyfour, threebyfourbytwo)))
The products of each operation appear below. Notice how the dot product is
...a sum product over the last axis of a and the second-to-last of b
and how the matmul product is formed by broadcasting the matrices together.
4x4 * 3x4x2 dot:
[[[232 152]
[125 112]
[125 112]]
[[172 116]
[123 76]
[123 76]]
[[442 296]
[228 226]
[228 226]]
[[962 652]
[465 512]
[465 512]]]
4x4 * 3x4x2 matmul:
[[[232 152]
[172 116]
[442 296]
[962 652]]
[[125 112]
[123 76]
[228 226]
[465 512]]
[[125 112]
[123 76]
[228 226]
[465 512]]]
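Incidentally, for this particular example the two results contain the same numbers, just arranged differently: the matmul product is the dot product with the stack axis moved to the front (this only holds here because the single 4x4 array broadcasts against every stacked matrix). A quick check, reusing the arrays defined above:

dot_result = np.dot(fourbyfour, threebyfourbytwo)        # shape (4, 3, 2)
matmul_result = np.matmul(fourbyfour, threebyfourbytwo)  # shape (3, 4, 2)

# Swapping the first two axes of the dot result lines it up with matmul:
print(np.allclose(matmul_result, dot_result.swapaxes(0, 1)))  # True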
#3
1
In mathematics, I think dot in numpy makes more sense:
dot(a,b)_{i,j,k,p,q,c} = \sum_m a_{i,j,k,m} b_{p,q,m,c}
since it gives the usual dot product when a and b are vectors, and ordinary matrix multiplication when a and b are matrices.
As for the matmul operation in numpy, it consists of parts of the dot result, and it can be defined as
matmul(a,b)_{i,j,k,c} = \sum_m a_{i,j,k,m}b_{i,j,m,c}
So you can see that matmul(a,b) returns an array with a smaller shape, which has smaller memory consumption and makes more sense in applications. In particular, combining with broadcasting, you can get
matmul(a,b)_{i,j,k,l} = \sum_m a_{i,j,k,m}b_{j,m,l}
for example.
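As a quick shape check of that broadcasting case (a sketch using the same shapes as the verification code further down):

import numpy as np

a = np.random.rand(5, 6, 2, 4)   # a (5, 6) stack of 2x4 matrices
b = np.random.rand(6, 4, 3)      # a (6,) stack of 4x3 matrices

print(np.matmul(a, b).shape)     # (5, 6, 2, 3) -- the (6,) stack broadcasts against (5, 6)
print(np.dot(a, b).shape)        # (5, 6, 2, 6, 3) -- dot pairs every matrix with every matrix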
From the above two definitions, you can see the requirements for using those two operations. Assume a.shape = (s1, s2, s3, s4) and b.shape = (t1, t2, t3, t4):
- To use dot(a,b) you need
  - t3 = s4
- To use matmul(a,b) you need
  - t3 = s4
  - t2 = s2, or one of t2 and s2 is 1
  - t1 = s1, or one of t1 and s1 is 1
Use the following piece of code to convince yourself.
import numpy as np

for it in range(10000):
    a = np.random.rand(5, 6, 2, 4)
    b = np.random.rand(6, 4, 3)
    c = np.matmul(a, b)   # shape (5, 6, 2, 3)
    d = np.dot(a, b)      # shape (5, 6, 2, 6, 3)
    # print('c shape:', c.shape, 'd shape:', d.shape)
    for i in range(5):
        for j in range(6):
            for k in range(2):
                for l in range(3):
                    if not c[i, j, k, l] == d[i, j, k, j, l]:
                        print(it, i, j, k, l, c[i, j, k, l] == d[i, j, k, j, l])  # you will not see them