MATLAB:如何将两个矩阵的矩阵相乘?

时间:2021-09-04 12:05:24

I have two 3-dimensional arrays, the first two dimensions of which represent matrices and the last one counts through a parameterspace, as a simple example take

我有两个三维数组,前两个维度表示矩阵,最后一个是通过一个参数空间计算的,这是一个简单的例子。

A = repmat([1,2; 3,4], [1 1 4]);

(but assume A(:,:,j) is different for each j). How can one easily perform a per-j matrix multiplication of two such matrix-arrays A and B?

(但是假设A(:, j)对于每个j都是不同的),一个人如何能轻松地执行一个矩阵乘法,两个矩阵A和B?

C = A; % pre-allocate, nan(size(A,1), size(B,2)) would be better but slower
for jj = 1:size(A, 3)
  C(:,:,jj) = A(:,:,jj) * B(:,:,jj);
end

certainly does the job, but if the third dimension is more like 1e3 elements this is very slow since it doesn't use MATLAB's vectorization. So, is there a faster way?

当然可以,但是如果第三个维度更像1e3元素,这就非常慢了,因为它不使用MATLAB的矢量化。那么,有更快的方法吗?

5 个解决方案

#1


3  

I highly recommend you use the MMX toolbox of matlab. It can multiply n-dimensional matrices as fast as possible.

我强烈推荐你使用matlab的MMX工具箱。它可以使n维矩阵尽可能快。

The advantages of MMX are:

MMX的优点是:

  1. It is easy to use.
  2. 它很容易使用。
  3. Multiply n-dimensional matrices (actually it can multiply arrays of 2-D matrices)
  4. n维矩阵相乘(实际上它可以将二维矩阵的数组相乘)
  5. It performs other matrix operations (transpose, Quadratic Multiply, Chol decomposition and more)
  6. 它执行其他矩阵运算(转置、二次乘、Chol分解等)
  7. It uses C compiler and multi-thread computation for speed up.
  8. 它使用C编译器和多线程计算来加速。

For this problem, you just need to write this command:

对于这个问题,您只需要编写这个命令:

C=mmx('mul',A,B);

I added the following function to @Amro's answer

我将以下函数添加到@Amro的答案。

%# mmx toolbox
function C=func6(A,B,n,m,p)
    C=mmx('mul',A,B);
end

I got this result for n=2,m=2,p=1e5:

我得到了n=2的结果,m=2,p=1e5:

    1.6571 # FOR-loop
    4.3110 # ARRAYFUN
    3.3731 # NUM2CELL/FOR-loop/CELL2MAT
    2.9820 # NUM2CELL/CELLFUN/CELL2MAT
    0.0244 # Loop Unrolling
    0.0221 # MMX toolbox  <===================

I used @Amro's code to run the benchmark.

我使用了@Amro的代码来运行基准测试。

#2


6  

I did some timing tests now, the fastest way for 2x2xN turns out to be calculating the matrix elements:

我做了一些计时测试,2x2xN最快的方法是计算矩阵元素:

C = A;
C(1,1,:) = A(1,1,:).*B(1,1,:) + A(1,2,:).*B(2,1,:);
C(1,2,:) = A(1,1,:).*B(1,2,:) + A(1,2,:).*B(2,2,:);
C(2,1,:) = A(2,1,:).*B(1,1,:) + A(2,2,:).*B(2,1,:);
C(2,2,:) = A(2,1,:).*B(1,2,:) + A(2,2,:).*B(2,2,:);

In the general case it turns out the for loop is actually the fastest (don't forget to pre-allocate C though!).

一般情况下,for循环实际上是最快的(不要忘记预先分配C)。

Should one already have the result as cell-array of matrices though, using cellfun is the fastest choice, it is also faster than looping over the cell elements:

如果一个人已经有了作为矩阵的细胞数组的结果,那么使用cellfun是最快的选择,它也比在单元元素上循环快得多:

C = cellfun(@mtimes, A, B, 'UniformOutput', false);

However, having to call num2cell first (Ac = num2cell(A, [1 2])) and cell2mat for the 3d-array case wastes too much time.

但是,必须首先调用num2cell(Ac = num2cell(A,[1 2]))和cell2mat用于3d数组,浪费了太多时间。


Here's some timing I did for a random set of 2 x 2 x 1e4:

这是我为随机的2 x 2 x 1e4做的一些时间:

 array-for: 0.057112
 arrayfun : 0.14206
 num2cell : 0.079468
 cell-for : 0.033173
 cellfun  : 0.025223
 cell2mat : 0.010213
 explicit : 0.0021338

Explicit refers to using direct calculation of the 2 x 2 matrix elements, see bellow. The result is similar for new random arrays, cellfun is the fastest if no num2cell is required before and there is no restriction to 2x2xN. For general 3d-arrays looping over the third dimension is indeed the fastest choice already. Here's the timing code:

显式是指直接计算2×2矩阵元素,见bellow。结果类似于新的随机数组,如果没有num2cell, cellfun是最快的,并且没有限制到2x2xN。对于一般的三维数组来说,在第三维上的循环确实是最快的选择。这是计时代码:

n = 2;
m = 2;
l = 1e4;

A = rand(n,m,l);
B = rand(m,n,l);

% naive for-loop:
tic
%Cf = nan(n,n,l);
Cf = A;
for jl = 1:l
    Cf(:,:,jl) = A(:,:,jl) * B(:,:,jl);
end;
disp([' array-for: ' num2str(toc)]);

% using arrayfun:
tic
Ca = arrayfun(@(k) A(:,:,k)*B(:,:,k), 1:size(A,3), 'UniformOutput',false);
Ca = cat(3,Ca{:});
disp([' arrayfun : ' num2str(toc)]);

tic
Ac = num2cell(A, [1 2]);
Bc = num2cell(B, [1 2]);
disp([' num2cell : ' num2str(toc)]);

% cell for-loop:
tic
Cfc = Ac;
for jl = 1:l
    Cfc{jl} = Ac{jl} * Bc{jl};
end;
disp([' cell-for : ' num2str(toc)]);

% using cellfun:
tic
Cc = cellfun(@mtimes, Ac, Bc, 'UniformOutput', false);
disp([' cellfun  : ' num2str(toc)]);

tic
Cc = cell2mat(Cc);
disp([' cell2mat : ' num2str(toc)]);

tic
Cm = A;
Cm(1,1,:) = A(1,1,:).*B(1,1,:) + A(1,2,:).*B(2,1,:);
Cm(1,2,:) = A(1,1,:).*B(1,2,:) + A(1,2,:).*B(2,2,:);
Cm(2,1,:) = A(2,1,:).*B(1,1,:) + A(2,2,:).*B(2,1,:);
Cm(2,2,:) = A(2,1,:).*B(1,2,:) + A(2,2,:).*B(2,2,:);
disp([' explicit : ' num2str(toc)]);

disp(' ');

#3


4  

Here is my benchmark test comparing the methods mentioned in @TobiasKienzler answer. I am using the TIMEIT function to get more accurate timings.

这是我的基准测试,比较了@TobiasKienzler答案中提到的方法。我使用TIMEIT函数来获得更精确的计时。

function [t,v] = matrixMultTest()
    n = 2; m = 2; p = 1e5;
    A = rand(n,m,p);
    B = rand(m,n,p);

    %# time functions
    t = zeros(5,1);
    t(1) = timeit( @() func1(A,B,n,m,p) );
    t(2) = timeit( @() func2(A,B,n,m,p) );
    t(3) = timeit( @() func3(A,B,n,m,p) );
    t(4) = timeit( @() func4(A,B,n,m,p) );
    t(5) = timeit( @() func5(A,B,n,m,p) );

    %# check the results
    v = cell(5,1);
    v{1} = func1(A,B,n,m,p);
    v{2} = func2(A,B,n,m,p);
    v{3} = func3(A,B,n,m,p);
    v{4} = func4(A,B,n,m,p);
    v{5} = func5(A,B,n,m,p);
    assert( isequal(v{:}) )
end

%# simple FOR-loop
function C = func1(A,B,n,m,p)
    C = zeros(n,n,p);
    for k=1:p
        C(:,:,k) = A(:,:,k) * B(:,:,k);
    end
end

%# ARRAYFUN
function C = func2(A,B,n,m,p)
    C = arrayfun(@(k) A(:,:,k)*B(:,:,k), 1:p, 'UniformOutput',false);
    C = cat(3, C{:});
end

%# NUM2CELL/FOR-loop/CELL2MAT
function C = func3(A,B,n,m,p)
    Ac = num2cell(A, [1 2]);
    Bc = num2cell(B, [1 2]);
    C = cell(1,1,p);
    for k=1:p
        C{k} = Ac{k} * Bc{k};
    end;
    C = cell2mat(C);
end

%# NUM2CELL/CELLFUN/CELL2MAT
function C = func4(A,B,n,m,p)
    Ac = num2cell(A, [1 2]);
    Bc = num2cell(B, [1 2]);
    C = cellfun(@mtimes, Ac, Bc, 'UniformOutput', false);
    C = cell2mat(C);
end

%# Loop Unrolling
function C = func5(A,B,n,m,p)
    C = zeros(n,n,p);
    C(1,1,:) = A(1,1,:).*B(1,1,:) + A(1,2,:).*B(2,1,:);
    C(1,2,:) = A(1,1,:).*B(1,2,:) + A(1,2,:).*B(2,2,:);
    C(2,1,:) = A(2,1,:).*B(1,1,:) + A(2,2,:).*B(2,1,:);
    C(2,2,:) = A(2,1,:).*B(1,2,:) + A(2,2,:).*B(2,2,:);
end

The results:

结果:

>> [t,v] = matrixMultTest();
>> t
t =
      0.63633      # FOR-loop
      1.5902       # ARRAYFUN
      1.1257       # NUM2CELL/FOR-loop/CELL2MAT
      1.0759       # NUM2CELL/CELLFUN/CELL2MAT
      0.05712      # Loop Unrolling

As I explained in the comments, a simple FOR-loop is the best solution (short of loop unwinding in the last case, which is only feasible for these small 2-by-2 matrices).

正如我在注释中所解释的那样,一个简单的for循环是最好的解决方案(在最后一个例子中,循环unwind是很短的,这对于这些小的2×2矩阵来说是可行的)。

#4


1  

One technique would be to create a 2Nx2N sparse matrix and embed on the diagonal the 2x2 matrices, for both A and B. Do the product with sparse matrices and take the result with slightly clever indexing and reshape it to 2x2xN.

一种方法是创建一个2Nx2N稀疏矩阵,并在对角上嵌入2x2矩阵,对a和b都用稀疏矩阵来做乘积,然后用稍微巧妙的索引来做结果,并将其重构到2x2xN。

But I doubt this will be faster than simple looping.

但我怀疑这不会比简单的循环更快。

#5


1  

An even faster method, in my experience, is to use dot multiplication and summation over the three-dimensional matrix. The following function, function z_matmultiply(A,B) multiplies two three dimensional matrices which have the same depth. Dot multiplication is done in a manner that is as parallel as possible, thus you might want to check the speed of this function and compare it to others over a large number of repetitions.

根据我的经验,一个更快的方法是用点乘和对三维矩阵求和。函数z_matmultiply(A,B)乘以两个具有相同深度的三维矩阵。点乘以尽可能平行的方式进行,因此你可能想要检查这个函数的速度,并将它与其他的重复次数进行比较。

function C = z_matmultiply(A,B)

[ma,na,oa] = size(A);
[mb,nb,ob] = size(B);

%preallocate the output as we will do a loop soon
C = zeros(ma,nb,oa);

%error message if the dimensions are not appropriate
if na ~= mb || oa ~= ob
    fprintf('\n z_matmultiply warning: Matrix Dimmensions Inconsistent \n')
else

% if statement minimizes for loops by looping the smallest matrix dimension 
if ma > nb
    for j = 1:nb
        Bp(j,:,:) = B(:,j,:);
        C(:,j,:) = sum(A.*repmat(Bp(j,:,:),[ma,1]),2);
    end
else
    for i = 1:ma
        Ap(:,i,:) = A(i,:,:);
        C(i,:,:) = sum(repmat(Ap(:,i,:),[1,nb]).*B,1);
    end 
end

end

#1


3  

I highly recommend you use the MMX toolbox of matlab. It can multiply n-dimensional matrices as fast as possible.

我强烈推荐你使用matlab的MMX工具箱。它可以使n维矩阵尽可能快。

The advantages of MMX are:

MMX的优点是:

  1. It is easy to use.
  2. 它很容易使用。
  3. Multiply n-dimensional matrices (actually it can multiply arrays of 2-D matrices)
  4. n维矩阵相乘(实际上它可以将二维矩阵的数组相乘)
  5. It performs other matrix operations (transpose, Quadratic Multiply, Chol decomposition and more)
  6. 它执行其他矩阵运算(转置、二次乘、Chol分解等)
  7. It uses C compiler and multi-thread computation for speed up.
  8. 它使用C编译器和多线程计算来加速。

For this problem, you just need to write this command:

对于这个问题,您只需要编写这个命令:

C=mmx('mul',A,B);

I added the following function to @Amro's answer

我将以下函数添加到@Amro的答案。

%# mmx toolbox
function C=func6(A,B,n,m,p)
    C=mmx('mul',A,B);
end

I got this result for n=2,m=2,p=1e5:

我得到了n=2的结果,m=2,p=1e5:

    1.6571 # FOR-loop
    4.3110 # ARRAYFUN
    3.3731 # NUM2CELL/FOR-loop/CELL2MAT
    2.9820 # NUM2CELL/CELLFUN/CELL2MAT
    0.0244 # Loop Unrolling
    0.0221 # MMX toolbox  <===================

I used @Amro's code to run the benchmark.

我使用了@Amro的代码来运行基准测试。

#2


6  

I did some timing tests now, the fastest way for 2x2xN turns out to be calculating the matrix elements:

我做了一些计时测试,2x2xN最快的方法是计算矩阵元素:

C = A;
C(1,1,:) = A(1,1,:).*B(1,1,:) + A(1,2,:).*B(2,1,:);
C(1,2,:) = A(1,1,:).*B(1,2,:) + A(1,2,:).*B(2,2,:);
C(2,1,:) = A(2,1,:).*B(1,1,:) + A(2,2,:).*B(2,1,:);
C(2,2,:) = A(2,1,:).*B(1,2,:) + A(2,2,:).*B(2,2,:);

In the general case it turns out the for loop is actually the fastest (don't forget to pre-allocate C though!).

一般情况下,for循环实际上是最快的(不要忘记预先分配C)。

Should one already have the result as cell-array of matrices though, using cellfun is the fastest choice, it is also faster than looping over the cell elements:

如果一个人已经有了作为矩阵的细胞数组的结果,那么使用cellfun是最快的选择,它也比在单元元素上循环快得多:

C = cellfun(@mtimes, A, B, 'UniformOutput', false);

However, having to call num2cell first (Ac = num2cell(A, [1 2])) and cell2mat for the 3d-array case wastes too much time.

但是,必须首先调用num2cell(Ac = num2cell(A,[1 2]))和cell2mat用于3d数组,浪费了太多时间。


Here's some timing I did for a random set of 2 x 2 x 1e4:

这是我为随机的2 x 2 x 1e4做的一些时间:

 array-for: 0.057112
 arrayfun : 0.14206
 num2cell : 0.079468
 cell-for : 0.033173
 cellfun  : 0.025223
 cell2mat : 0.010213
 explicit : 0.0021338

Explicit refers to using direct calculation of the 2 x 2 matrix elements, see bellow. The result is similar for new random arrays, cellfun is the fastest if no num2cell is required before and there is no restriction to 2x2xN. For general 3d-arrays looping over the third dimension is indeed the fastest choice already. Here's the timing code:

显式是指直接计算2×2矩阵元素,见bellow。结果类似于新的随机数组,如果没有num2cell, cellfun是最快的,并且没有限制到2x2xN。对于一般的三维数组来说,在第三维上的循环确实是最快的选择。这是计时代码:

n = 2;
m = 2;
l = 1e4;

A = rand(n,m,l);
B = rand(m,n,l);

% naive for-loop:
tic
%Cf = nan(n,n,l);
Cf = A;
for jl = 1:l
    Cf(:,:,jl) = A(:,:,jl) * B(:,:,jl);
end;
disp([' array-for: ' num2str(toc)]);

% using arrayfun:
tic
Ca = arrayfun(@(k) A(:,:,k)*B(:,:,k), 1:size(A,3), 'UniformOutput',false);
Ca = cat(3,Ca{:});
disp([' arrayfun : ' num2str(toc)]);

tic
Ac = num2cell(A, [1 2]);
Bc = num2cell(B, [1 2]);
disp([' num2cell : ' num2str(toc)]);

% cell for-loop:
tic
Cfc = Ac;
for jl = 1:l
    Cfc{jl} = Ac{jl} * Bc{jl};
end;
disp([' cell-for : ' num2str(toc)]);

% using cellfun:
tic
Cc = cellfun(@mtimes, Ac, Bc, 'UniformOutput', false);
disp([' cellfun  : ' num2str(toc)]);

tic
Cc = cell2mat(Cc);
disp([' cell2mat : ' num2str(toc)]);

tic
Cm = A;
Cm(1,1,:) = A(1,1,:).*B(1,1,:) + A(1,2,:).*B(2,1,:);
Cm(1,2,:) = A(1,1,:).*B(1,2,:) + A(1,2,:).*B(2,2,:);
Cm(2,1,:) = A(2,1,:).*B(1,1,:) + A(2,2,:).*B(2,1,:);
Cm(2,2,:) = A(2,1,:).*B(1,2,:) + A(2,2,:).*B(2,2,:);
disp([' explicit : ' num2str(toc)]);

disp(' ');

#3


4  

Here is my benchmark test comparing the methods mentioned in @TobiasKienzler answer. I am using the TIMEIT function to get more accurate timings.

这是我的基准测试,比较了@TobiasKienzler答案中提到的方法。我使用TIMEIT函数来获得更精确的计时。

function [t,v] = matrixMultTest()
    n = 2; m = 2; p = 1e5;
    A = rand(n,m,p);
    B = rand(m,n,p);

    %# time functions
    t = zeros(5,1);
    t(1) = timeit( @() func1(A,B,n,m,p) );
    t(2) = timeit( @() func2(A,B,n,m,p) );
    t(3) = timeit( @() func3(A,B,n,m,p) );
    t(4) = timeit( @() func4(A,B,n,m,p) );
    t(5) = timeit( @() func5(A,B,n,m,p) );

    %# check the results
    v = cell(5,1);
    v{1} = func1(A,B,n,m,p);
    v{2} = func2(A,B,n,m,p);
    v{3} = func3(A,B,n,m,p);
    v{4} = func4(A,B,n,m,p);
    v{5} = func5(A,B,n,m,p);
    assert( isequal(v{:}) )
end

%# simple FOR-loop
function C = func1(A,B,n,m,p)
    C = zeros(n,n,p);
    for k=1:p
        C(:,:,k) = A(:,:,k) * B(:,:,k);
    end
end

%# ARRAYFUN
function C = func2(A,B,n,m,p)
    C = arrayfun(@(k) A(:,:,k)*B(:,:,k), 1:p, 'UniformOutput',false);
    C = cat(3, C{:});
end

%# NUM2CELL/FOR-loop/CELL2MAT
function C = func3(A,B,n,m,p)
    Ac = num2cell(A, [1 2]);
    Bc = num2cell(B, [1 2]);
    C = cell(1,1,p);
    for k=1:p
        C{k} = Ac{k} * Bc{k};
    end;
    C = cell2mat(C);
end

%# NUM2CELL/CELLFUN/CELL2MAT
function C = func4(A,B,n,m,p)
    Ac = num2cell(A, [1 2]);
    Bc = num2cell(B, [1 2]);
    C = cellfun(@mtimes, Ac, Bc, 'UniformOutput', false);
    C = cell2mat(C);
end

%# Loop Unrolling
function C = func5(A,B,n,m,p)
    C = zeros(n,n,p);
    C(1,1,:) = A(1,1,:).*B(1,1,:) + A(1,2,:).*B(2,1,:);
    C(1,2,:) = A(1,1,:).*B(1,2,:) + A(1,2,:).*B(2,2,:);
    C(2,1,:) = A(2,1,:).*B(1,1,:) + A(2,2,:).*B(2,1,:);
    C(2,2,:) = A(2,1,:).*B(1,2,:) + A(2,2,:).*B(2,2,:);
end

The results:

结果:

>> [t,v] = matrixMultTest();
>> t
t =
      0.63633      # FOR-loop
      1.5902       # ARRAYFUN
      1.1257       # NUM2CELL/FOR-loop/CELL2MAT
      1.0759       # NUM2CELL/CELLFUN/CELL2MAT
      0.05712      # Loop Unrolling

As I explained in the comments, a simple FOR-loop is the best solution (short of loop unwinding in the last case, which is only feasible for these small 2-by-2 matrices).

正如我在注释中所解释的那样,一个简单的for循环是最好的解决方案(在最后一个例子中,循环unwind是很短的,这对于这些小的2×2矩阵来说是可行的)。

#4


1  

One technique would be to create a 2Nx2N sparse matrix and embed on the diagonal the 2x2 matrices, for both A and B. Do the product with sparse matrices and take the result with slightly clever indexing and reshape it to 2x2xN.

一种方法是创建一个2Nx2N稀疏矩阵,并在对角上嵌入2x2矩阵,对a和b都用稀疏矩阵来做乘积,然后用稍微巧妙的索引来做结果,并将其重构到2x2xN。

But I doubt this will be faster than simple looping.

但我怀疑这不会比简单的循环更快。

#5


1  

An even faster method, in my experience, is to use dot multiplication and summation over the three-dimensional matrix. The following function, function z_matmultiply(A,B) multiplies two three dimensional matrices which have the same depth. Dot multiplication is done in a manner that is as parallel as possible, thus you might want to check the speed of this function and compare it to others over a large number of repetitions.

根据我的经验,一个更快的方法是用点乘和对三维矩阵求和。函数z_matmultiply(A,B)乘以两个具有相同深度的三维矩阵。点乘以尽可能平行的方式进行,因此你可能想要检查这个函数的速度,并将它与其他的重复次数进行比较。

function C = z_matmultiply(A,B)

[ma,na,oa] = size(A);
[mb,nb,ob] = size(B);

%preallocate the output as we will do a loop soon
C = zeros(ma,nb,oa);

%error message if the dimensions are not appropriate
if na ~= mb || oa ~= ob
    fprintf('\n z_matmultiply warning: Matrix Dimmensions Inconsistent \n')
else

% if statement minimizes for loops by looping the smallest matrix dimension 
if ma > nb
    for j = 1:nb
        Bp(j,:,:) = B(:,j,:);
        C(:,j,:) = sum(A.*repmat(Bp(j,:,:),[ma,1]),2);
    end
else
    for i = 1:ma
        Ap(:,i,:) = A(i,:,:);
        C(i,:,:) = sum(repmat(Ap(:,i,:),[1,nb]).*B,1);
    end 
end

end