在R中对四维数组的累积和的快速计算。

I'm relatively new to R programming, and this website has been very helpful for me so far, but I was unable to find a question that already covered what I want to know. So I decided to post a question myself.

我对R编程比较陌生，这个网站到目前为止对我很有帮助，但是我找不到一个已经涵盖了我想知道的问题。所以我决定自己发布一个问题。

My problem is the following: I want to find efficient ways to compute cumulative sums over four-dimensional arrays, i.e. I have data in a four-dimensional array x and want to write a function that computes an array x_sum such that

我的问题是:我想找到一种高效的方法来计算四维数组的累积和，也就是说，我有四维数组x中的数据，并想编写一个函数来计算一个x_sum数组

x_sum[i,j,k,l] = sum_{ind1 <= i, ind2 <= j, ind3 <= k, ind4 <=l} x[ind1, ind2, ind3, ind4].

x_sum[i,j,k,l] = sum_{ind1 <= i, ind2 <= j, ind3 <= k, ind4 <=l}]

I want to use this function billions of times, which makes it very important that it be as efficient as possible. Although I have come up with several ways to calculate the sums (see below), I suspect more experienced R programmers might be able to find a more efficient solution. So if anyone can suggest a better way of doing this, I would be very grateful.

我想要使用这个函数几十亿次，这使得它尽可能的高效非常重要。虽然我提出了几种计算和的方法(见下面)，但我怀疑更有经验的R程序员可能能够找到更有效的解决方案。如果有人能提出更好的方法，我将不胜感激。

Here's what I've tried so far:

以下是我迄今为止所尝试的:

I have found three different implementations (each of which brought a gain in speed) that work (see code below): One in R using the cumsum() function (cumsum_4R) and two implementations where the „heavy lifting“ is done in C (using the .C() interface). The first implementation in C is merely a naive attempt to write the sums using nested for-loops and pointer arithmetic (cumsumC_4_old). In the second C-implementation (cumsumC_4) I tried to adapt my code using the ideas in the following article

我发现三种不同的实现(每一个都带来了增长速度)(请参见下面的代码):工作中使用cumsum R()函数(cumsum_4R)和两种实现„重担”在C(使用C()接口)。C中的第一个实现只是简单地尝试使用嵌套的for循环和指针算法(cumsumC_4_old)来编写和。在第二个c实现(cumsumC_4)中，我尝试使用以下文章中的思想来修改代码

As you can see in the source code below, the adaptation is relatively lopsided: For some dimensions, I was able to replace all the nested for-loops but not for others. Do you have ideas on how to do that?

正如您在下面的源代码中所看到的，适应性是相对不均衡的:对于某些维度，我能够替换所有嵌套的For循环，但对于其他维度则不行。你知道怎么做吗?

Using microbenchmark on the three implementations, I get the following result for arrays of size 40x40x40x40:

使用三个实现上的微基准，我得到了大小为40x40x40x40 x40的数组的以下结果:

Unit: milliseconds
             expr       min         lq       mean     median         uq
     cumsum_4R(x) 976.13258 1029.33371 1064.35100 1051.37782 1074.23234
 cumsumC_4_old(x) 174.72868  177.95875  192.75392  184.11121  203.18141
     cumsumC_4(x)  56.87169   57.73512   67.34714   63.20269   68.80326
       max neval
 1381.5832    50
  283.2384    50
  105.7099    50

Additional information: 1) Since this made it easier to install any needed packages, I ran the benchmarks on my personal computer under Windows, but I plan on running the finished simulations on a computer from my university, which runs under Linux.

附加信息:1)由于这使安装任何需要的软件包更容易，我在Windows下的个人电脑上运行基准测试，但我计划在我大学的一台计算机上运行完成的模拟，该计算机在Linux下运行。

EDIT: 2) The four-dimensional data x[i,j,k,l] is actually the result of the product of two applications of the outer function: First, the outer product of a matrix with itself (i.e. outer(mat,mat)) and then taking the pairwise minima of another matrix (i.e. outer(mat2, mat2, pmin)). Then the data is the product

编辑:2)四维数据x[i,j,k,l]实际上是两个外函数应用的乘积的结果:首先，一个矩阵的外积(即外积(mat,mat))，然后取另一个矩阵(即外积(mat2, mat2, pmin))的对极小。然后数据就是乘积

x = outer(mat, mat) * outer(mat2, mat2, pmin),

x =外层(mat, mat) *外层(mat2, mat2, pmin)

i.e.
x[i,j,k,l] = mat[i,j] * mat[k,l] * min(mat2[i,j], mat2[k,l])

即x[i,j,k,l]=垫(i,j)*垫(k,l)*分钟(mat2(i,j),mat2[k,l])

The four-dimensional array has the corresponding symmetries.

四维数组具有相应的对称性。

3)The reason I need these cumulative sums in the first place is that I want to run simulations of a test for which I need partial sums over „rectangles“ of indices: I want to iterate over all sums of the form

3)我需要这些累积金额的原因首先是,我想要运行的模拟测试,我需要部分和„矩形”指标:我要遍历所有的表单

sum_{k1<=i1 <= m1,k2<=i2 <= m2, k1 <= i3 <= m1, k2 <= i4 <=m2} x[i1, i2, i3, i4],

sum_ { k1 < = i1和i2 < < = m1,k2 < = = m2,k1 < = i3 < = m1,k2 < = < = m2 } x(i1、i2 i3,预告),

where 1<=k1<=m1<=n, 1<=k2<=m2<=n. In order to avoid calculating the sum of the same variables over and over again, I first calculate all the cumulative sums and then calculate the sums over rectangles as functions of the cumulative sums. Do you know of a more efficient way to do this? EDIT to 3): In order to include all potentially important aspects: I also want to calculate sums of the form

1 < = k1 m1 < = n,1 < < = = k2 < = m2 < = n。为了避免反复计算相同变量的和，我首先计算所有的累积和，然后把对矩形的和作为累积和的函数来计算。你知道有什么更有效的方法吗?为了包含所有潜在的重要方面:我还想计算表单的和

sum_{k1<=i1 <= m1,k2<=i2 <= m2, 1 <= i3 <= n, 1 <= i4 <=n} x[i1, i2, i3, i4].

sum_ { k1 < = i1和i2 < < = m1,k2 < = =平方米,1 < = i3 < = n,1 < = < = n } x(i1、i2 i3,预告)。

(Since I can trivially obtain them using the cumulative sums, I had not included this specification before).

(由于我可以使用累积和来获得它们，所以我以前没有包含此规范)。

Here is the C code I use (which I save as „cumsumC.c“):

这是我使用的C代码(我另存为„cumsumC.c”):

#include<R.h>
#include<math.h>
#include <stdio.h>

int min(int a, int b){
    if(a <= b) return a;
    else return b;
}

void cumsumC_4_old(double* x, int* nv){
    int n = *nv;
    int n2 = n*n;
    int n3 = n*n*n;
    //Dim 1
    for(int i=0; i<n; i++){
        for(int j=0; j<n; j++){
            for(int k=0; k<n; k++){
                for(int l=1; l<n; l++){
                    x[i+j*n+k*n2+l*n3] += x[i + j*n +k*n2+(l-1)*n3];
                }
            }
        }
    }
    //Dim 2
    for(int i=0; i<n; i++){
        for(int j=0; j<n; j++){
            for(int k=1; k<n; k++){
                for(int l=0; l<n; l++){
                    x[i+j*n+k*n2+l*n3] += x[i + j*n +(k-1)*n2+l*n3];
                }
            }
        }
    }
    //Dim 3
    for(int i=0; i<n; i++){
        for(int j=1; j<n; j++){
            for(int k=0; k<n; k++){
                for(int l=0; l<n; l++){
                    x[i+j*n+k*n2+l*n3] += x[i + (j-1)*n +k*n2+l*n3];
                }
            }
        }
    }
    //Dim 4
    for(int i=1; i<n; i++){
        for(int j=0; j<n; j++){
            for(int k=0; k<n; k++){
                for(int l=0; l<n; l++){
                    x[i+j*n+k*n2+l*n3] += x[i-1 + j*n +k*n2+l*n3];
                }
            }
        }
    }
}


void cumsumC_4(double* x, int* nv){
    int n = *nv;
    int n2 = n*n;
    int n3 = n*n*n;
    long ind1, ind2; 
    long index, indexges = n +(n-1)*n+(n-1)*n2+(n-1)*n3, indexend;
    //Dim 1
    index = n3;
    while(index != indexges){
        x[index] += x[index-n3];
        index++;
    }   
    //Dim 2
    long teilind = n+(n-1)*n;
    for(int k=1; k<n; k++){
        ind1 = k*n2;
        ind2 = ind1 - n2;
        for(int l=0; l<n; l++){
            index = l*n3;
            indexend = teilind+index;
            while(index != indexend){
                x[index+ind1] += x[index+ind2];
                index++;
            }
        }
    }
    //Dim 3
    ind1 = n;
    while(ind1 < n+(n-1)*n){
        index = 0;
        indexend = indexges - ind1;
        ind2 = ind1-n;
        while(index < indexend){
            x[ind1+index] += x[ind2+index];
            index += n2;
        }
        ind1++;
    }
    //Dim 4
    index = 0;
    int i;
    long minind;
    while(index < indexges){
        i = 1;
        minind = min(indexges, index+n);
        while(index+i < minind){ 
            x[index+i] += x[index+i-1];
            i++;
        }
        index+=n;
    }
}

Here is the R function „cumsum_4R“ and code used to call and compare the cumulative sum functions in R (under Windows; for Linux, the commands dyn.load/dyn.unload need to be adjusted; ideally, I want to use the functions on 50^4 size arrays, but since the call to microbenchmark would then take a while, I have chosen n=40 here):

这是R函数„cumsum_4R”和代码用于调用和比较累积求和函数R(Windows下;对于Linux，命令是dyn.load/dyn。卸货需要调整;理想情况下,我想用50 ^ 4大小数组函数,但由于调用微基准测试将需要一段时间,我在这里选择了n = 40):

library("microbenchmark")

# dyn.load("cumsumC.so")
dyn.load("cumsumC.dll")

cumsum_4R <- function(x){
   return(aperm(apply(apply(aperm(apply(apply(x, 2:4,function(a) cumsum(as.numeric(a))), c(1,3,4) , function(a) cumsum(as.numeric(a))), c(2,1,3,4)), c(1,2,4), function(a) cumsum(as.numeric(a))), 1:3, function(a) cumsum(as.numeric(a))), c(3,4,2,1)))
}

cumsumC_4_old <- function(x){
    n <- dim(x)[1]
    arr <- array(.C("cumsumC_4_old", res=as.double(x), as.integer(n))$res, dim=c(n,n,n,n))
    return(arr)
}


cumsumC_4 <- function(x){
    n <- dim(x)[1]
    arr <- array(.C("cumsumC_4", res=as.double(x), as.integer(n))$res, dim=c(n,n,n,n))
    return(arr)
}

set.seed(1234)

n <- 40
x <- array(rnorm(n^4),dim=c(n,n,n,n))
r <- 6 #parameter for rounding results for comparison

res1 <- cumsum_4R(x)
res2 <- cumsumC_4_old(x)
res3 <- cumsumC_4(x)

print(c("Identical R and C1:", identical(round(res1,r),round(res2,r))))
print(c("Identical R and C2:",identical(round(res1,r),round(res3,r))))

times <- microbenchmark(cumsum_4R(x), cumsumC_4_old(x),cumsumC_4(x),times=50)
print(times)

dyn.unload("cumsumC.dll")
# dyn.unload("cumsumC.so")

Thank you for your help!

谢谢你的帮助!

1 个解决方案

#1

You can indeed use points 2 and 3 in your original question to solve the problem more efficiently. Actually, this makes the problem separable. By separable I mean that the limits of the 4 sums in Equation 3 do not depend on the variables you sum over. This and the fact that x is an outer product of 2 matrices enables you to separate the 4-fold sum in Eq. 3 into an outer product of two 2-fold sums. Even better: the 2 matrices used to define x are the same (denoted by mat by you) - so the two 2-fold sums give the same matrix, which has to be calculated only once. Here the code:

你确实可以在最初的问题中使用2点和3点来更有效地解决问题。实际上，这个问题是可分离的。所谓可分离，我的意思是，方程3中的4个和的极限不取决于你求和的变量。这个和x是2个矩阵的外积的事实使你可以把4倍的和，3的和2倍的和。更好的是:用来定义x的两个矩阵是相同的(你用mat表示)——所以这两个2倍和给出的矩阵是相同的，只需要计算一次。下面的代码:

set.seed(1234)
n=40
mat=array(rnorm(n^2),dim=c(n,n))
x=outer(mat,mat)

cumsum_sep=function(x) {
  #calculate matrix corresponding to 2-fold sums
  #actually it's just one matrix because x is an outer product of mat with itself
  tmp=t(apply(apply(x,2,cumsum),1,cumsum))
  #outer product of two-fold sums
  outer(tmp,tmp)
}

y1=cumsum_4R(x)
#note that cumsum_sep operates on the original matrix mat!
y2=cumsum_sep(mat)

Check whether results are the same

检查结果是否相同

all.equal(y1,y2)
[1] TRUE

This gives the benchmark results

这将给出基准测试结果

microbenchmark(cumsum_4R(x),cumsum_sep(mat),times=10)

Unit: milliseconds
          expr         min          lq       mean     median         uq       max neval cld
 cumsum_4R(xx) 2084.454155 2135.852305 2226.59692 2251.95928 2270.15198 2402.2724    10   b
 cumsum_sep(x)    6.844939    7.145546   32.75852   14.45762   34.94397  120.0846    10  a

Quite a difference! :)

很差!:)

#1

set.seed(1234)
n=40
mat=array(rnorm(n^2),dim=c(n,n))
x=outer(mat,mat)

cumsum_sep=function(x) {
  #calculate matrix corresponding to 2-fold sums
  #actually it's just one matrix because x is an outer product of mat with itself
  tmp=t(apply(apply(x,2,cumsum),1,cumsum))
  #outer product of two-fold sums
  outer(tmp,tmp)
}

y1=cumsum_4R(x)
#note that cumsum_sep operates on the original matrix mat!
y2=cumsum_sep(mat)