I recently moved to Python 3.5 and noticed that the new matrix multiplication operator (@) sometimes behaves differently from the numpy dot operator. For example, for 3D arrays:
import numpy as np
a = np.random.rand(8,13,13)
b = np.random.rand(8,13,13)
c = a @ b # Python 3.5+
d = np.dot(a, b)
The @ operator returns an array of shape:
c.shape
(8, 13, 13)
while the np.dot() function returns:
d.shape
(8, 13, 8, 13)
How can I reproduce the same result with numpy dot? Are there any other significant differences?
3 Answers
#1
58
The @ operator calls the array's __matmul__ method, not dot. This method is also present in the API as the function np.matmul.
>>> a = np.random.rand(8,13,13)
>>> b = np.random.rand(8,13,13)
>>> np.matmul(a, b).shape
(8, 13, 13)
From the documentation:
matmul differs from dot in two important ways.
- Multiplication by scalars is not allowed.
- Stacks of matrices are broadcast together as if the matrices were elements.
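As a quick illustration of the first point (a minimal sketch; the exact exception raised varies between numpy versions), dot happily scales by a scalar while @/matmul refuses:

import numpy as np

a = np.random.rand(3, 3)

print(np.dot(a, 2).shape)   # (3, 3) -- dot treats the scalar as plain scaling

try:
    a @ 2                   # matmul/@ rejects scalar operands
except (ValueError, TypeError) as err:
    print('@ with a scalar raises', type(err).__name__)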
The last point makes it clear that dot and matmul behave differently when passed 3D (or higher-dimensional) arrays. Quoting from the documentation some more:
For matmul:
If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
For np.dot:
For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b.
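To answer the question of how to reproduce the @/matmul result with numpy dot: np.dot computes the product of every matrix in a with every matrix in b, so the stacked product is contained in its output and can be pulled out by indexing the "diagonal" of the two stack axes; np.einsum is another way to express the same contraction. A minimal sketch:

import numpy as np

a = np.random.rand(8, 13, 13)
b = np.random.rand(8, 13, 13)

c = a @ b          # stacked matrix product, shape (8, 13, 13)
d = np.dot(a, b)   # all cross products, shape (8, 13, 8, 13)

# Picking the "diagonal" over the two stack axes recovers the stacked product:
d_diag = d[np.arange(8), :, np.arange(8), :]   # shape (8, 13, 13)
print(np.allclose(c, d_diag))                  # True

# Equivalently, spell the stacked contraction out explicitly:
print(np.allclose(c, np.einsum('ijk,ikl->ijl', a, b)))  # True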
#2
4
The answer by @ajcr explains how dot and matmul (invoked by the @ symbol) differ. By looking at a simple example, one clearly sees how the two behave differently when operating on 'stacks of matrices' or tensors.
To clarify the differences, take a 4x4 array and return the dot product and matmul product with a 3x4x2 'stack of matrices' or tensor.
import numpy as np
fourbyfour = np.array([
[1,2,3,4],
[3,2,1,4],
[5,4,6,7],
[11,12,13,14]
])
threebyfourbytwo = np.array([
[[2,3],[11,9],[32,21],[28,17]],
[[2,3],[1,9],[3,21],[28,7]],
[[2,3],[1,9],[3,21],[28,7]],
])
print('4x4 * 3x4x2 dot:\n {}\n'.format(np.dot(fourbyfour, threebyfourbytwo)))
print('4x4 * 3x4x2 matmul:\n {}\n'.format(np.matmul(fourbyfour, threebyfourbytwo)))
The products of each operation appear below. Notice how the dot product is
...a sum product over the last axis of a and the second-to-last of b
and how the matmul product is formed by broadcasting the matrices together.
4x4 * 3x4x2 dot:
[[[232 152]
[125 112]
[125 112]]
[[172 116]
[123 76]
[123 76]]
[[442 296]
[228 226]
[228 226]]
[[962 652]
[465 512]
[465 512]]]
4x4 * 3x4x2 matmul:
[[[232 152]
[172 116]
[442 296]
[962 652]]
[[125 112]
[123 76]
[228 226]
[465 512]]
[[125 112]
[123 76]
[228 226]
[465 512]]]
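Incidentally, for this particular example the two results contain the same numbers, just arranged differently: the matmul product is the dot product with the stack axis moved to the front (this only holds here because the single 4x4 array broadcasts against every stacked matrix). A quick check, reusing the arrays defined above:

dot_result = np.dot(fourbyfour, threebyfourbytwo)        # shape (4, 3, 2)
matmul_result = np.matmul(fourbyfour, threebyfourbytwo)  # shape (3, 4, 2)

# Swapping the first two axes of the dot result lines it up with matmul:
print(np.allclose(matmul_result, dot_result.swapaxes(0, 1)))  # True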
#3
1
In mathematics, I think dot in numpy makes more sense:
dot(a,b)_{i,j,k,p,q,c} = \sum_m a_{i,j,k,m} b_{p,q,m,c}
since it gives the usual dot product when a and b are vectors, and ordinary matrix multiplication when a and b are matrices.
As for the matmul operation in numpy, it consists of parts of the dot result, and it can be defined as
matmul(a,b)_{i,j,k,c} = \sum_m a_{i,j,k,m}b_{i,j,m,c}
So you can see that matmul(a,b) returns an array with a smaller shape, which has smaller memory consumption and makes more sense in applications. In particular, combining with broadcasting, you can get
matmul(a,b)_{i,j,k,l} = \sum_m a_{i,j,k,m}b_{j,m,l}
for example.
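As a quick shape check of that broadcasting case (a sketch using the same shapes as the verification code further down):

import numpy as np

a = np.random.rand(5, 6, 2, 4)   # a (5, 6) stack of 2x4 matrices
b = np.random.rand(6, 4, 3)      # a (6,) stack of 4x3 matrices

print(np.matmul(a, b).shape)     # (5, 6, 2, 3) -- the (6,) stack broadcasts against (5, 6)
print(np.dot(a, b).shape)        # (5, 6, 2, 6, 3) -- dot pairs every matrix with every matrix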
From the above two definitions, you can see the requirements for using those two operations. Assume a.shape = (s1, s2, s3, s4) and b.shape = (t1, t2, t3, t4):
- To use dot(a,b) you need
  - t3 = s4
- To use matmul(a,b) you need
  - t3 = s4
  - t2 = s2, or one of t2 and s2 is 1
  - t1 = s1, or one of t1 and s1 is 1
Use the following piece of code to convince yourself.
import numpy as np

for it in range(10000):
    a = np.random.rand(5, 6, 2, 4)
    b = np.random.rand(6, 4, 3)
    c = np.matmul(a, b)   # shape (5, 6, 2, 3)
    d = np.dot(a, b)      # shape (5, 6, 2, 6, 3)
    # print('c shape:', c.shape, 'd shape:', d.shape)
    for i in range(5):
        for j in range(6):
            for k in range(2):
                for l in range(3):
                    if not c[i, j, k, l] == d[i, j, k, j, l]:
                        print(it, i, j, k, l, c[i, j, k, l] == d[i, j, k, j, l])  # you will not see them