Looking for an efficient way to compute this (Matlab)

I have a scalar function f([x,y],[i,j]) = exp(-norm([x,y]-[i,j])^2/sigma^2) that takes two 2-dimensional vectors as input (norm here is the Euclidean norm). The values of x and i range over 1:w, and the values of y and j range over 1:h. I want to create a cell array X such that X{x,y} contains a w x h matrix with X{x,y}(i,j) = f([x,y],[i,j]). This can obviously be done using 4 nested loops like so:

for x=1:w
    for y=1:h
        X{x,y}=zeros(w,h);
        for i=1:w
            for j=1:h
                X{x,y}(i,j)=f([x,y],[i,j]);
            end
        end
    end
end

However, this is extremely inefficient. I would very much appreciate an efficient way to create X.

3 solutions

#1

One way to do this is to remove the two innermost loops and replace them with a vectorised version. Judging by your f function, this shouldn't be too bad.

First we need to construct two w x h matrices, one with 1 to w running down every column and one with 1 to h running along every row, so that entry (i,j) of each matrix holds i and j respectively:

wMat=repmat((1:w)',1,h);
hMat=repmat(1:h,w,1);

These represent the inner two loops, and the transpose lets us get all combinations of i and j. Now we can vectorise the calculation of f([x,y],[i,j]) = exp(-norm([x,y]-[i,j])^2/sigma^2):

for x=1:w
    for y=1:h
        % squared Euclidean norm of [x,y]-[i,j] for every (i,j) at once
        temp1=(x-wMat).^2+(y-hMat).^2;
        X{x,y}=exp(-temp1/(sigma^2));
    end
end

Here the Euclidean distance term for all pairs of nodes in the inner loops is computed at once.
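As a quick sanity check, the vectorised inner loops can be compared entry by entry against the scalar definition of f from the question. This is only a minimal sketch: the sizes and sigma below are arbitrary assumptions, and ndgrid is used here just as a convenient way to build the two index matrices.

```matlab
% Small, arbitrary sizes chosen only for this check (assumptions).
w = 4; h = 3; sigma = 2;

% f as defined in the question, written as an anonymous function.
f = @(p, q) exp(-norm(p - q)^2 / sigma^2);

% Index grids: iMat(i,j) = i and jMat(i,j) = j, both w-by-h.
[iMat, jMat] = ndgrid(1:w, 1:h);

X = cell(w, h);
maxErr = 0;
for x = 1:w
    for y = 1:h
        % Vectorised inner loops: squared distance, so no sqrt is needed.
        X{x,y} = exp(-((x - iMat).^2 + (y - jMat).^2) / sigma^2);
        % Compare every entry against the scalar definition.
        for i = 1:w
            for j = 1:h
                maxErr = max(maxErr, abs(X{x,y}(i,j) - f([x,y],[i,j])));
            end
        end
    end
end
fprintf('largest absolute difference: %g\n', maxErr);
```

The difference should be at rounding level, since norm(...)^2 and the explicit sum of squares only differ by floating-point error.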

#2

Some discussion and code

The trick here is to perform the norm calculations with numeric arrays and save the results into a cell array as late as possible. For the norm calculations you can use ndgrid and bsxfun, plus some permute and reshape calls to give the result the shape needed for the final cell array. So, here's the vectorized approach to perform these tasks -

%// Create x-y/i-j values to be used for calculation of function values
[xi,yi] = ndgrid(1:w,1:h);

%// Get the norm values
normvals = sqrt(bsxfun(@minus,xi(:),xi(:).').^2 + ...
                                bsxfun(@minus,yi(:),yi(:).').^2);
%// Get the actual function values
vals = exp(-normvals.^2/sigma^2); 

%// Get the values into blocks of a 4D array and then re-arrange to match
%// with the shape of numeric array version of X
blks = reshape(permute(reshape(vals, w*h, h, []), [2 1 3]), h, w, h, w);
arranged_blks = reshape(permute(blks,[2 3 1 4]),w,h,w,h);

%// Finally get the cell array version
X = squeeze(mat2cell(arranged_blks,w,h,ones(1,w),ones(1,h)));
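For readers who find the reshape/permute chain hard to follow, here is a shorter variant of the same idea. It is a sketch rather than the code benchmarked in this answer, and the values of w, h and sigma are illustrative assumptions. It relies on the rows of vals indexing (x,y) and the columns indexing (i,j), both in column-major order, so a single reshape already yields a 4-D array V with V(x,y,i,j) = f([x,y],[i,j]).

```matlab
w = 3; h = 5; sigma = 1.5;   % illustrative values, not from the original post

[xi, yi] = ndgrid(1:w, 1:h);
% Squared distances between every pair of grid points, (w*h)-by-(w*h).
d2 = bsxfun(@minus, xi(:), xi(:).').^2 + bsxfun(@minus, yi(:), yi(:).').^2;
vals = exp(-d2 / sigma^2);

% Rows of vals index (x,y) and columns index (i,j), both column-major,
% so V(x,y,i,j) = f([x,y],[i,j]).
V = reshape(vals, w, h, w, h);
% Move (i,j) to the front so that each X{x,y} is a w-by-h slice.
A = permute(V, [3 4 1 2]);
% Split the last two dimensions into cells and drop the singleton dims.
X = squeeze(mat2cell(A, w, h, ones(1,w), ones(1,h)));
```

Working with squared distances directly also avoids taking a square root only to square it again.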

Benchmarking and runtimes

After improving the original loopy code by pre-allocating X and inlining the function f, runtime benchmarks were performed with it against the proposed vectorized approach with data sizes w = h = 60, and the runtime results thus obtained were -

----------- With Improved loopy code
Elapsed time is 41.227797 seconds.
----------- With Vectorized code
Elapsed time is 2.116782 seconds.

This suggests a whopping speedup of close to 20x with the proposed solution!

For extremely large data sizes

If you are dealing with huge data sizes, you may not have enough memory for bsxfun to work with; bsxfun is known to use a lot of memory in exchange for a fast vectorized solution. For such cases, you can use the following loopy approach to replace the normvals calculation listed in the earlier bsxfun-based solution -

%// Get the norm values
nx = numel(xi);
normvals = zeros(nx,nx);
for ii = 1:nx
    normvals(:,ii) = sqrt( (xi(:) - xi(ii)).^2 + (yi(:) - yi(ii)).^2 );
end
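To judge whether memory is likely to be a problem, a back-of-the-envelope estimate of the size of normvals can help. The sketch below assumes double precision (8 bytes per element) and uses the benchmark sizes w = h = 60 purely as an example. Note that the loop above still allocates the full normvals matrix; what it avoids are the additional full-size temporaries created by the bsxfun expressions.

```matlab
w = 60; h = 60;                  % sizes used in the benchmark above
nx = w * h;                      % number of grid points
bytesPerDouble = 8;              % assumption: double-precision storage

% normvals is nx-by-nx, so its footprint grows with (w*h)^2.
fullMatrixMB = nx^2 * bytesPerDouble / 2^20;
fprintf('normvals needs about %.1f MB\n', fullMatrixMB);
```

Doubling both w and h multiplies this figure by 16.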

#3

It seems to me that when you run through the loop for x=w, y=h, you are calculating all the values you need at once, so you don't need to recalculate them. Once you have this:

for i=1:w
    for j=1:h
        temp(i,j)=f([x,y],[i,j]);
    end 
end

Then, e.g., X{1,1} is just temp(1,1), X{2,2} is just temp(1:2,1:2), and so on. If you can vectorise the calculation of f (is norm here just the Euclidean norm of that vector?), it will get even simpler.
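A related way to "compute once and slice", which is not exactly the scheme described in this answer, uses the fact that f depends only on the offsets i-x and j-y. Under that observation, a single table of size (2w-1) x (2h-1) holds every value that can occur, and each X{x,y} is a w x h window into it. The sketch below assumes illustrative values for w, h and sigma, and the name T is hypothetical.

```matlab
w = 4; h = 6; sigma = 2;   % illustrative values (assumptions)

% One table of all possible offsets: rows cover i-x = -(w-1):(w-1),
% columns cover j-y = -(h-1):(h-1).
[dx, dy] = ndgrid(-(w-1):(w-1), -(h-1):(h-1));
T = exp(-(dx.^2 + dy.^2) / sigma^2);

X = cell(w, h);
for x = 1:w
    for y = 1:h
        % Offset i-x lives in row i-x+w of T; offset j-y in column j-y+h.
        X{x,y} = T((1:w) - x + w, (1:h) - y + h);
    end
end
```

This evaluates exp only (2w-1)*(2h-1) times instead of (w*h)^2 times; the remaining cost is copying the windows into the cell array.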

