The difference between numpy dot() and Python 3.5+ matrix multiplication

Time: 2021-06-24 22:33:59

I recently moved to Python 3.5 and noticed that the new matrix multiplication operator (@) sometimes behaves differently from the numpy dot function. For example, for 3D arrays:

import numpy as np

a = np.random.rand(8,13,13)
b = np.random.rand(8,13,13)
c = a @ b  # Python 3.5+
d = np.dot(a, b)

The @ operator returns an array of shape:

c.shape
(8, 13, 13)

while the np.dot() function returns:

d.shape
(8, 13, 8, 13)

How can I reproduce the same result with numpy dot? Are there any other significant differences?

3 Answers

#1


58  

The @ operator calls the array's __matmul__ method, not dot. This method is also present in the API as the function np.matmul.

>>> a = np.random.rand(8,13,13)
>>> b = np.random.rand(8,13,13)
>>> np.matmul(a, b).shape
(8, 13, 13)

From the documentation:

matmul differs from dot in two important ways.

  • Multiplication by scalars is not allowed.
  • Stacks of matrices are broadcast together as if the matrices were elements.
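
The first point can be checked directly. The following is a minimal sketch, with an illustrative array `m` that is not part of the original answer; it assumes only that np.matmul raises an exception for a scalar operand, without relying on the exact error message.

```python
import numpy as np

m = np.arange(6.0).reshape(2, 3)  # illustrative 2x3 array

# np.dot accepts a scalar operand and simply scales the array.
scaled = np.dot(m, 2)

# np.matmul rejects a scalar operand; elementwise * is the replacement.
try:
    np.matmul(m, 2)
    matmul_rejected_scalar = False
except (ValueError, TypeError):
    matmul_rejected_scalar = True

print(np.array_equal(scaled, m * 2))
print(matmul_rejected_scalar)
```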

The last point makes it clear that dot and matmul methods behave differently when passed 3D (or higher dimensional) arrays. Quoting from the documentation some more:

For matmul:

If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
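
One way to read this sentence is that matmul on two 3D arrays behaves like a loop over the leading axis, multiplying one pair of matrices at a time. A small sketch of that reading, using the shapes from the question (the seed and the name `looped` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((8, 13, 13))
b = rng.random((8, 13, 13))

stacked = np.matmul(a, b)

# The same computation written as an explicit loop over the stack axis,
# using ordinary 2D matrix products.
looped = np.stack([np.dot(a[i], b[i]) for i in range(a.shape[0])])

print(stacked.shape)                 # (8, 13, 13)
print(np.allclose(stacked, looped))  # True
```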

For np.dot:

np.dot:

For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b
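
For the arrays in the question, this means np.dot computes the product of every matrix in a with every matrix in b, while @ only pairs matrices that share the same stack index. As a hedged sketch (not necessarily the most efficient route, since dot computes eight times more products than are needed), the matmul result can be picked out of the dot result like this:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((8, 13, 13))
b = rng.random((8, 13, 13))

c = a @ b          # shape (8, 13, 13)
d = np.dot(a, b)   # shape (8, 13, 8, 13)

# d[i, :, j, :] is the matrix product of a[i] and b[j] for every pair (i, j);
# matmul keeps only the pairs with i == j.
idx = np.arange(a.shape[0])
from_dot = d[idx, :, idx, :]   # shape (8, 13, 13)

print(d.shape, from_dot.shape)
print(np.allclose(c, from_dot))  # True
```

In practice, calling np.matmul (or @) directly is the simpler choice when the stacked behaviour is what you want.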

#2


4  

The answer by @ajcr explains how dot and matmul (invoked by the @ symbol) differ. A simple example shows clearly how the two behave differently when operating on 'stacks of matrices' or tensors.

To clarify the differences, take a 4x4 array and compute its dot product and its matmul product with a 'stack of matrices' or tensor. Note that the array named twobyfourbythree below actually has shape (3, 4, 2), that is, a stack of three 4x2 matrices.

import numpy as np
fourbyfour = np.array([
                       [1,2,3,4],
                       [3,2,1,4],
                       [5,4,6,7],
                       [11,12,13,14]
                      ])


# Despite its name, this array has shape (3, 4, 2): three 4x2 matrices.
twobyfourbythree = np.array([
                             [[2,3],[11,9],[32,21],[28,17]],
                             [[2,3],[1,9],[3,21],[28,7]],
                             [[2,3],[1,9],[3,21],[28,7]],
                            ])

print('4x4*4x2x3 dot:\n {}\n'.format(np.dot(fourbyfour,twobyfourbythree)))
print('4x4*4x2x3 matmul:\n {}\n'.format(np.matmul(fourbyfour,twobyfourbythree)))

The results of each operation appear below. Notice how the dot product is,

...a sum product over the last axis of a and the second-to-last of b

and how the matmul product is formed by broadcasting the 4x4 matrix against each matrix in the stack.

4x4*4x2x3 dot:
 [[[232 152]
  [125 112]
  [125 112]]

 [[172 116]
  [123  76]
  [123  76]]

 [[442 296]
  [228 226]
  [228 226]]

 [[962 652]
  [465 512]
  [465 512]]]

4x4*4x2x3 matmul:
 [[[232 152]
  [172 116]
  [442 296]
  [962 652]]

 [[125 112]
  [123  76]
  [228 226]
  [465 512]]

 [[125 112]
  [123  76]
  [228 226]
  [465 512]]]
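
Comparing the two printouts, the same 4x2 blocks appear in both results, only arranged along different axes. A short sketch of that relationship, reusing the arrays from this answer under the shorter illustrative names `A` and `B`:

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [3, 2, 1, 4],
              [5, 4, 6, 7],
              [11, 12, 13, 14]])
B = np.array([[[2, 3], [11, 9], [32, 21], [28, 17]],
              [[2, 3], [1, 9], [3, 21], [28, 7]],
              [[2, 3], [1, 9], [3, 21], [28, 7]]])

via_dot = np.dot(A, B)        # shape (4, 3, 2): row of A, stack index, column
via_matmul = np.matmul(A, B)  # shape (3, 4, 2): stack index, row of A, column

print(via_dot.shape, via_matmul.shape)
# Swapping the first two axes of the dot result gives the matmul result.
print(np.array_equal(via_matmul, via_dot.transpose(1, 0, 2)))  # True
```

This equivalence holds for a 2D array times a 3D stack; when both operands are stacks, dot produces extra cross terms that matmul never computes.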

#3


1  

In mathematics, I think the dot in numpy makes more sense

dot(a,b)_{i,j,k,a,b,c} = \sum_m a_{i,j,k,m}b_{a,b,m,c}

since it gives the dot product when a and b are vectors, or the matrix multiplication when a and b are matrices

As for the matmul operation in numpy, its result consists of parts of the dot result, and it can be defined as

matmul(a,b)_{i,j,k,c} = \sum_m a_{i,j,k,m}b_{i,j,m,c}

So, you can see that matmul(a,b) returns an array with a smaller shape, which consumes less memory and makes more sense in applications. In particular, combined with broadcasting, you can get

matmul(a,b)_{i,j,k,l} = \sum_m a_{i,j,k,m}b_{j,m,l}

for example.
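
The index formulas in this answer map almost literally onto np.einsum subscripts, which gives a way to check them numerically. This is a sketch under assumed shapes; the names `b4` and `b3` and the seed are illustrative and not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((5, 6, 2, 4))
b4 = rng.random((5, 6, 4, 3))  # same stack shape as a
b3 = rng.random((6, 4, 3))     # no leading axis: broadcast over the first axis of a

# dot: every stack index of a is paired with every stack index of b.
dot_ref = np.einsum('ijkm,abmc->ijkabc', a, b4)
# matmul: the stack indices i, j are shared between a and b.
matmul_ref = np.einsum('ijkm,ijmc->ijkc', a, b4)
# matmul with broadcasting: b3 is reused for every value of i.
bcast_ref = np.einsum('ijkm,jml->ijkl', a, b3)

print(np.allclose(np.dot(a, b4), dot_ref))        # True
print(np.allclose(np.matmul(a, b4), matmul_ref))  # True
print(np.allclose(np.matmul(a, b3), bcast_ref))   # True
```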

From the above two definitions, you can see the requirements for using those two operations. Assume a.shape=(s1,s2,s3,s4) and b.shape=(t1,t2,t3,t4).

  • To use dot(a,b) you need

     1. t3=s4

  • To use matmul(a,b) you need

    1. t3=s4
    2. t2=s2, or one of t2 and s2 is 1
    3. t1=s1, or one of t1 and s1 is 1
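
These requirements can be illustrated with shapes chosen so that only the inner dimensions agree. The shapes below are assumptions picked for the demonstration; the sketch relies on np.matmul raising ValueError when the leading dimensions cannot be broadcast.

```python
import numpy as np

a = np.ones((5, 6, 2, 4))
b = np.ones((3, 7, 4, 3))  # t3 == s4, but (t1, t2) = (3, 7) does not match (5, 6)

# dot only needs t3 == s4, so this works.
dot_shape = np.dot(a, b).shape
print(dot_shape)  # (5, 6, 2, 3, 7, 3)

# matmul also needs the leading dimensions to broadcast, which fails here.
try:
    np.matmul(a, b)
    matmul_ok = True
except ValueError:
    matmul_ok = False
print(matmul_ok)  # False

# A leading dimension of size 1 broadcasts, as the conditions on t1 and t2 allow.
b_ok = np.ones((1, 6, 4, 3))
ok_shape = np.matmul(a, b_ok).shape
print(ok_shape)  # (5, 6, 2, 3)
```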

Use the following piece of code to convince yourself.

Code sample

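import numpy as np

for it in range(10000):  # range replaces Python 2's xrange
    a = np.random.rand(5, 6, 2, 4)
    b = np.random.rand(6, 4, 3)
    c = np.matmul(a, b)  # shape (5, 6, 2, 3)
    d = np.dot(a, b)     # shape (5, 6, 2, 6, 3)
    # print('c shape:', c.shape, 'd shape:', d.shape)

    for i in range(5):
        for j in range(6):
            for k in range(2):
                for l in range(3):
                    # compare with a tolerance, since the two routines may sum in a different order
                    if abs(c[i, j, k, l] - d[i, j, k, j, l]) > 1e-12:
                        print(it, i, j, k, l, c[i, j, k, l], d[i, j, k, j, l])  # you will not see them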
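
The same check can also be written without the nested loops. This is an alternative sketch rather than the answer's own method: it uses np.einsum with a repeated index to pick the entries d[i,j,k,j,l] out of the dot result, and np.allclose in place of exact equality.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((5, 6, 2, 4))
b = rng.random((6, 4, 3))
c = np.matmul(a, b)  # shape (5, 6, 2, 3)
d = np.dot(a, b)     # shape (5, 6, 2, 6, 3)

# Repeating j in the input subscripts selects the entries d[i, j, k, j, l].
diag = np.einsum('ijkjl->ijkl', d)
print(np.allclose(c, diag))  # True
```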
