R: split a numeric vector at a position

Time: 2022-02-02 21:46:05

I am wondering about the simple task of splitting a vector into two at a certain index:

splitAt <- function(x, pos){
  # Note: `1:pos-1` parses as `(1:pos) - 1`, i.e. 0:(pos - 1); R silently drops the 0 index.
  list(x[1:pos-1], x[pos:length(x)])
}

a <- c(1, 2, 2, 3)

> splitAt(a, 4)
[[1]]
[1] 1 2 2

[[2]]
[1] 3

My question: there must be an existing function for this, but I can't find one. Is split perhaps a possibility? My naive implementation also does not work if pos = 0 or pos > length(a).

3 solutions

#1


23  

An improvement would be:

splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))

which can now take a vector of positions:

splitAt(a, c(2, 4))
# [[1]]
# [1] 1
# 
# [[2]]
# [1] 2 2
# 
# [[3]]
# [1] 3

It also behaves properly (subjectively) when every pos is <= 1 or > length(x), in the sense that it returns the whole original vector as a single list item. Note that a pos equal to length(x) still splits off the last element, as in the example above. If you would rather have it throw an error, add a stopifnot check at the top of the function.
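
A minimal sketch of the stopifnot idea follows. The name splitAtStrict and the exact bounds are assumptions made for illustration: here every position must be a valid index of x, and you may well want different limits.

```r
# Hypothetical strict variant: same splitting logic as splitAt above,
# but it stops with an error when a position is not a valid index of x.
splitAtStrict <- function(x, pos) {
  stopifnot(is.numeric(pos), length(pos) >= 1, !anyNA(pos),
            pos >= 1, pos <= length(x))   # assumed bounds, adjust as needed
  unname(split(x, cumsum(seq_along(x) %in% pos)))
}

a <- c(1, 2, 2, 3)
pieces <- splitAtStrict(a, c(2, 4))             # same three pieces as splitAt(a, c(2, 4))
res <- try(splitAtStrict(a, 0), silent = TRUE)  # violates pos >= 1, so this errors
inherits(res, "try-error")
```

stopifnot requires every element of each condition to be TRUE, so a single bad entry in a vector of positions is enough to trigger the error.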

#2


4  

I tried flodel's answer, but it was too slow in my case, with a very large x and a function that has to be called repeatedly. So I wrote the following function, which is much faster but also very ugly, and it does not behave properly. In particular, it checks nothing and returns buggy results at least for pos >= length(x) or pos <= 0, and perhaps in other cases as well, so be careful. You can add those checks yourself if you are unsure about your inputs and not too concerned about speed.

splitAt2 <- function(x, pos) {
    out <- list()
    # Segment boundaries: piece i runs from pos2[i] to pos2[i+1] - 1.
    pos2 <- c(1, pos, length(x)+1)
    for (i in seq_along(pos2[-1])) {
        out[[i]] <- x[pos2[i]:(pos2[i+1]-1)]
    }
    return(out)
}

However, splitAt2 runs about 20 times faster with an x of length 10^6:

library(microbenchmark)
W <- rnorm(1e6)
splits <- cumsum(rep(1e5, 9))
tm <- microbenchmark(
                     splitAt(W, splits),
                     splitAt2(W, splits),
                     times=10)
tm
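
Before reading much into the timings, it may be worth checking that both functions return the same pieces for this kind of input. The sketch below repeats the two definitions so that it runs on its own. The seed and the smaller vector size are arbitrary choices made here, not part of the benchmark.

```r
# Definitions repeated from the answers above so the block is self-contained.
splitAt  <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))
splitAt2 <- function(x, pos) {
  out <- list()
  pos2 <- c(1, pos, length(x) + 1)
  for (i in seq_along(pos2[-1])) {
    out[[i]] <- x[pos2[i]:(pos2[i + 1] - 1)]
  }
  out
}

set.seed(1)                     # arbitrary seed, only for reproducibility
W <- rnorm(1e4)                 # smaller than the benchmark vector, to keep the check quick
splits <- cumsum(rep(1e3, 9))   # 1000, 2000, ..., 9000: sorted and strictly inside 2..length(W)

same <- identical(splitAt(W, splits), splitAt2(W, splits))
same
```

The comparison is only meaningful for sorted positions that lie strictly between 1 and length(x), since splitAt2 makes no attempt to handle anything else.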

#3


1  

Another alternative that might be faster and/or more readable/elegant than flodel's solution:

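splitAt <- function(x, pos) {
  # Group by position, not by value: findInterval must see seq_along(x), and pos must be sorted.
  unname(split(x, findInterval(seq_along(x), pos)))
}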
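
To see what the grouping step produces, here is a small sketch. It passes seq_along(x) to findInterval so that the cut is made by position rather than by value. The vector x and the names grp, grp_by_value and pieces are made up for illustration, and pos is assumed to be sorted.

```r
x   <- c(10, 20, 20, 30)   # values deliberately unrelated to their positions
pos <- c(2, 4)

# Group label per element: how many cut points lie at or before its position.
grp <- findInterval(seq_along(x), pos)
grp

# For contrast: feeding the values themselves compares 10, 20, ... against the cut points.
grp_by_value <- findInterval(x, pos)
grp_by_value

pieces <- unname(split(x, grp))
pieces
```

With these inputs grp is 0 1 1 2, which gives three pieces, while grp_by_value is 2 2 2 2, which would put everything in a single piece.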
