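I have been trying to code this: for each pair of unique values of x and z, compute the sum of y over the rows where x and z are both less than or equal to those values.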
So far, the best way I have come up with is to use a loop. Here is an example:
y=rnorm(10)
x=c(1,1,1,2,2,2,3,3,3,4)
z=c(5,5,6,6,7,7,8,8,9,9)
data=data.frame(y,x,z)
n=10
s=rep(NA,length(unique(x))*length(unique(z)))
dim(s)=c(length(unique(x)),length(unique(z)))
for (i in 1:length(unique(x))){
  for (j in 1:length(unique(z))){
    s[i,j]=sum(y*as.numeric((x<=unique(x)[i]))*
               as.numeric((z<=unique(z)[j])))
  }
}
The output is fine like this, but as my dimensions grow, the loop becomes inefficient. Since, for a given z, this looks like a conditional cumulative sum, I am 100% sure that there is a more efficient way of doing this without the loop.
Would any of you have a suggestion? If I didn't have z, I know I could use data.table:
library(data.table)
s=as.data.table(data)[order(x)][,lapply(.SD, sum),by=c("x"), .SDcols=c("y")]
s=s[,lapply(.SD, cumsum), .SDcols=c("y")]
but with more than one index (x and z, not just x) I was not able to work out how to write it.
2 Answers
#1
8
I don't think you need data.table for this, as you're using the whole of "y" for each group. This is easier to accomplish with some linear algebra:
t(y*outer(x, unique(x), '<=')) %*% outer(z, unique(z), '<=')
[,1] [,2] [,3] [,4] [,5]
[1,] 0.3538152 0.1762013 0.1762013 0.1762013 0.1762013
[2,] 0.3538152 -0.7308157 -1.2421102 -1.2421102 -1.2421102
[3,] 0.3538152 -0.7308157 -1.2421102 -1.1770919 -1.8315592
[4,] 0.3538152 -0.7308157 -1.2421102 -1.1770919 -4.1171477
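To see why the matrix product works, here is a minimal sketch that rebuilds it step by step and checks it against the double loop from the question. The names A, B, ux, uz, s_mat and s_loop are introduced here for illustration only, and the seed is fixed just to make the check reproducible.

```r
set.seed(1)                       # fixed seed so the comparison is reproducible
y <- rnorm(10)
x <- c(1,1,1,2,2,2,3,3,3,4)
z <- c(5,5,6,6,7,7,8,8,9,9)

ux <- unique(x)
uz <- unique(z)

# A[r, i] is TRUE when x[r] <= ux[i]; B[r, j] is TRUE when z[r] <= uz[j]
A <- outer(x, ux, '<=')
B <- outer(z, uz, '<=')

# y * A scales row r of A by y[r]; the product then sums over the rows r,
# so entry [i, j] is sum(y * (x <= ux[i]) * (z <= uz[j]))
s_mat <- t(y * A) %*% B

# reference result: the double loop from the question
s_loop <- matrix(NA_real_, length(ux), length(uz))
for (i in seq_along(ux)) {
  for (j in seq_along(uz)) {
    s_loop[i, j] <- sum(y * (x <= ux[i]) * (z <= uz[j]))
  }
}

all.equal(s_mat, s_loop)
```

The indicator matrices cost memory proportional to the number of rows times the number of unique values, so this trades memory for speed.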
Here's your version of the code for three dimensions:
set.seed(1)
y=rnorm(10)
x=c(1,1,1,2,2,2,3,3,3,4)
z=c(5,5,6,6,7,7,8,8,9,9)
w=c(7,7,8,8,9,9,10,10,11,11)
n=10
s=rep(NA,length(unique(w))*length(unique(z))*length(unique(x)))
dim(s)=c(length(unique(w)),length(unique(z)), length(unique(x)))
for (i in 1:length(unique(w))) {
  for (j in 1:length(unique(z))) {
    for (k in 1:length(unique(x))) {
      s[i,j, k]=sum(y*as.numeric((w<=unique(w)[i]))*
                    as.numeric((z<=unique(z)[j]))*
                    as.numeric((x<=unique(x)[k])))
    }
  }
}
Here's how you can accomplish this with the same idea as my previous answer:
t1 <- outer(x, unique(x), '<=')
t2 <- outer(z, unique(z), '<=')
t3 <- outer(w, unique(w), '<=')
lapply(seq_along(unique(x)), function(idx) t(y*t1[,idx]*t2) %*% t3)
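Here the output is a list (instead of an array), with one matrix per unique value of x. Each matrix has rows indexed by unique(z) and columns by unique(w), so in general element k equals t(s[,,k]); in this example w = z + 2, so the matrices are symmetric and match s[,,k] directly. You can compare the results with "s". You should be able to take it from here.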
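If an array is preferred over a list, one way to do the comparison with "s" is sketched below. The names res and arr are introduced here for illustration; the list is stacked with array() and the first two dimensions are swapped with aperm() so the layout is [w, z, x] like the loop version.

```r
set.seed(1)
y <- rnorm(10)
x <- c(1,1,1,2,2,2,3,3,3,4)
z <- c(5,5,6,6,7,7,8,8,9,9)
w <- c(7,7,8,8,9,9,10,10,11,11)

t1 <- outer(x, unique(x), '<=')
t2 <- outer(z, unique(z), '<=')
t3 <- outer(w, unique(w), '<=')

# one [z, w] matrix per unique value of x
res <- lapply(seq_along(unique(x)), function(idx) t(y * t1[, idx] * t2) %*% t3)

# stack the list into a [z, w, x] array, then reorder to [w, z, x]
arr <- array(unlist(res), dim = c(dim(res[[1]]), length(res)))
arr <- aperm(arr, c(2, 1, 3))

# reference result: the triple loop, indexed [w, z, x]
s <- array(NA_real_, c(length(unique(w)), length(unique(z)), length(unique(x))))
for (i in seq_along(unique(w))) {
  for (j in seq_along(unique(z))) {
    for (k in seq_along(unique(x))) {
      s[i, j, k] <- sum(y * (w <= unique(w)[i]) *
                            (z <= unique(z)[j]) *
                            (x <= unique(x)[k]))
    }
  }
}

all.equal(arr, s)
```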
#2
0
Following @Arun's argument, I managed to nest two lapply calls to generalize the solution to higher dimensions.
# assumes a fourth index vector r, with t4 <- outer(r, unique(r), '<=')
lapply(seq_along(unique(x)), function(idx) {
  lapply(seq_along(unique(r)), function(idr) {
    t(y*t1[,idx]*t2) %*% (t3*t4[,idr])
  })
})
To add further dimensions, I would have to keep nesting lapply calls. Is there a cleaner way to do this?
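One possible way to avoid the nesting, offered as a sketch rather than a definitive answer: this swaps the indicator-matrix product for a different technique, namely summing y within each cell with tapply and then taking a cumulative sum along every dimension in turn. The helpers cumsum_along and cum_sum_nd are hypothetical names introduced here. The sketch assumes y has no NA values, and it indexes the result by sorted unique values, which matches the unique() order only when values first appear in ascending order, as they do in the examples above.

```r
# cumulative sum of an array along dimension d (hypothetical helper)
cumsum_along <- function(a, d) {
  m <- dim(a)
  # view the array as [dims before d, dim d, dims after d]; column-major
  # storage means this reshape does not move any element
  b <- array(a, dim = c(prod(m[seq_len(d - 1)]), m[d], prod(m[-seq_len(d)])))
  for (k in seq_len(m[d])[-1]) {
    b[, k, ] <- b[, k, ] + b[, k - 1, ]
  }
  array(b, dim = m)
}

# sum of y over rows where every index is <= the corresponding unique value
cum_sum_nd <- function(y, ...) {
  idx <- list(...)
  f <- lapply(idx, function(v) factor(v, levels = sort(unique(v))))
  a <- tapply(y, f, sum)        # per-cell totals; empty cells come back as NA
  a[is.na(a)] <- 0              # assumes y itself contains no NA
  a <- array(a, dim = dim(a))   # drop dimnames
  for (d in seq_along(idx)) a <- cumsum_along(a, d)
  a
}

set.seed(1)
y <- rnorm(10)
x <- c(1,1,1,2,2,2,3,3,3,4)
z <- c(5,5,6,6,7,7,8,8,9,9)
w <- c(7,7,8,8,9,9,10,10,11,11)

s2 <- cum_sum_nd(y, x, z)       # indexed [x, z]
s3 <- cum_sum_nd(y, w, z, x)    # indexed [w, z, x]

# check the two-index case against the matrix product from the first answer
all.equal(s2, t(y * outer(x, unique(x), '<=')) %*% outer(z, unique(z), '<='))
```

Adding a dimension is then just another argument, e.g. cum_sum_nd(y, w, z, x, r) for a fourth index vector r. The work grows with the number of cells rather than with rows times cells, though I have not benchmarked it.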