To give an example right away, suppose I have three arrays a, b and c:
a = c(3,5)
b = c(6,1,8,7)
c = c(4,2,9)
I need to extract the consecutive triplets among them, each made of one value from each array, i.e.,
c(1,2,3),c(4,5,6)
But this is just an example. I will have a larger data set with more than 10 arrays, so I need to be able to find consecutive series of length ten.
So could anyone provide a general algorithm for finding consecutive series of length n among n arrays?
I am actually doing this in R, so R code is preferable, but an algorithm in any language is more than welcome.
4 answers
#1
First reorganize the data into a list of (value, array number) pairs and sort the list by value. You would get something like:
1-2
2-3
3-1 (i.e. "there's a three in array 1")
4-3
5-1
6-2
7-2
8-2
9-3
Then loop over the list: check whether n successive entries are actually consecutive numbers, then check whether they come from different arrays.
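The sort-and-scan idea above can be sketched in R roughly as follows. This is a minimal sketch, not the answerer's code: find_runs() is a hypothetical name, and it assumes the values are whole numbers that are distinct across arrays (a repeated value would sit inside a window and break the run). The window length n is taken to be the number of arrays passed in.

```r
# Hypothetical helper sketching the sort-and-scan approach.
# Assumes whole-number values that are distinct across all input vectors.
find_runs <- function(...) {
  vecs <- list(...)
  n <- length(vecs)
  # Build the (value, array number) table and sort it by value
  tab <- data.frame(
    value = unlist(vecs),
    array = rep(seq_len(n), lengths(vecs))
  )
  tab <- tab[order(tab$value), ]
  runs <- list()
  if (nrow(tab) < n) return(runs)
  # Slide a window of n rows over the sorted table
  for (i in seq_len(nrow(tab) - n + 1)) {
    win <- tab[i:(i + n - 1), ]
    consecutive <- all(diff(win$value) == 1)       # values step by 1
    distinct <- length(unique(win$array)) == n     # one value per array
    if (consecutive && distinct) runs[[length(runs) + 1]] <- win$value
  }
  runs
}

a <- c(3, 5); b <- c(6, 1, 8, 7); c <- c(4, 2, 9)
find_runs(a, b, c)  # list(c(1, 2, 3), c(4, 5, 6))
```

Because it sorts once and then slides a window, it never enumerates combinations of values, and the same call works unchanged for ten or more arrays.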
#2
Here's one approach. It assumes there are no breaks in the combined sequence of observations, i.e. after sorting, neighbouring values always differ by 1, because the code only checks which groups occur in each window. Here is the data:
N <- 3
a <- c(3,5)
b <- c(6,1,8,7)
c <- c(4,2,9)
Then I combine them and order the rows by observation:
dd <- lattice::make.groups(a,b,c)
dd <- dd[order(dd$data),]
Now I look for windows of N adjacent rows in this table in which all N groups are represented:
idx <- apply(embed(as.numeric(dd$which),N), 1, function(x) {
length(unique(x))==N
})
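The embed() call above is what produces the sliding windows. A small illustration on a toy vector (not from the original answer) shows that each row is one window, with the most recent element first; since the check only counts unique groups, the reversed order does not matter.

```r
# Each row of embed(x, 3) is one length-3 window of x, newest element first
w <- embed(1:5, 3)
print(w)
#      [,1] [,2] [,3]
# [1,]    3    2    1
# [2,]    4    3    2
# [3,]    5    4    3
# Reversing a row restores the original order of that window
rev(w[1, ])
# [1] 1 2 3
```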
Then we can see the triplets with:
lapply(which(idx), function(i) {
dd[i:(i+N-1),]
})
# [[1]]
# data which
# b2 1 b
# c2 2 c
# a1 3 a
#
# [[2]]
# data which
# c1 4 c
# a2 5 a
# b1 6 b
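If the no-breaks assumption cannot be guaranteed, the same embed() trick can also test that each window's values step by 1. The sketch below is a hypothetical variation, not part of the original answer: it swaps lattice::make.groups() for base R's stack() (which needs a named list), and runs_embed() is an illustrative name.

```r
# Hypothetical variation: require every group once AND values stepping by 1.
# `vecs` must be a named list, e.g. list(a = a, b = b, c = c).
runs_embed <- function(vecs) {
  N <- length(vecs)
  dd <- stack(vecs)                        # columns: values, ind (group factor)
  dd <- dd[order(dd$values), ]
  if (nrow(dd) < N) return(list())
  grp_win <- embed(as.integer(dd$ind), N)  # one window per row (reversed)
  val_win <- embed(dd$values, N)
  all_groups  <- apply(grp_win, 1, function(x) length(unique(x)) == N)
  consecutive <- apply(val_win, 1, function(x) all(diff(rev(x)) == 1))
  lapply(which(all_groups & consecutive), function(i) dd$values[i:(i + N - 1)])
}

runs_embed(list(a = c(3, 5), b = c(6, 1, 8, 7), c = c(4, 2, 9)))
runs_embed(list(a = 1, b = 5, c = 9))  # empty: 1, 5, 9 are not consecutive
```

The second call is the case the original code would accept (all three groups appear) even though the values are not consecutive.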
#3
Here is a brute-force method using expand.grid() with the three vectors from the example:
# get all combinations
df <- expand.grid(a,b,c)
Use combn() to calculate the absolute difference for each pairwise combination of columns:
# get all pairwise absolute differences (one column per pair of vectors)
myDiffs <- combn(names(df), 2, FUN = function(x) abs(df[[x[1]]] - df[[x[2]]]))
# subset data using `rowSums` and `which`:
# n distinct consecutive integers have exactly n - 1 pairs that differ by 1
df[which(rowSums(myDiffs == 1) == ncol(df) - 1), ]
#    Var1 Var2 Var3
# 2     5    6    4
# 11    3    1    2
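The same brute-force idea extends to any number of vectors by building the grid with do.call(). This is a hypothetical sketch (brute_runs() is an illustrative name); instead of counting pairwise differences it sorts each row and checks that successive values differ by 1, which also rejects rows with repeated values. The grid has prod(lengths(vecs)) rows, so with ten vectors of moderate length it can easily become too large to fit in memory.

```r
# Hypothetical brute-force helper for any number of vectors.
# `vecs` is a list of numeric vectors; list names become column names.
brute_runs <- function(vecs) {
  df <- do.call(expand.grid, vecs)   # every combination, one column per vector
  # A row is a run if its sorted values step by exactly 1
  keep <- apply(df, 1, function(row) all(diff(sort(row)) == 1))
  df[keep, , drop = FALSE]
}

brute_runs(list(a = c(3, 5), b = c(6, 1, 8, 7), c = c(4, 2, 9)))
```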
#4
I have hacked together a small recursive function that finds the consecutive triplets among as many vectors as you pass to it (at least three are needed). It is probably a little crude, but it seems to work.
The function uses the ellipsis (...) for passing arguments, so it accepts however many numeric vectors you provide and puts them in a list named items. Then the smallest value in each passed vector is located, along with its index.
Next, the indices of the three vectors with the smallest minima (the "smallest triplet") are computed and iterated over in a for() loop, which appends the selected values to the output vector out and removes them from the input vectors in items. The pruned vectors are then passed back into the function recursively. Only when all vectors are NA, i.e. there are no values left, does the function return the final result.
library(magrittr)
# define function to find the triplets
tripl <- function(...){
  items <- list(...)
  # find the smallest number in each passed vector, along with its index
  # output is an n-by-2 matrix, where n is the number of passed arguments
  # (which.min() returns a single index even if the minimum is tied)
  triplet.id <- lapply(items, function(x){
    if (all(is.na(x))) c(NA, NA)
    else c(which.min(x), min(x))
  }) %>% unlist %>% matrix(., ncol = 2, byrow = TRUE)
  # find the smallest triplet from the passed vectors
  index <- order(triplet.id[, 2])[1:3]
  # create empty vector for output
  out <- vector()
  # go through the smallest triplet's indices
  for (i in index) {
    # .. append the corresponding item from the input vector to the out vector
    # .. and remove the value from the input vector
    if (length(items[[i]]) == 1) {
      out <- append(out, items[[i]])
      # .. if the input vector has no value left, fill it with NA
      items[[i]] <- NA
    } else {
      out <- append(out, items[[i]][triplet.id[i, 1]])
      items[[i]] <- items[[i]][-triplet.id[i, 1]]
    }
  }
  # recurse until all vectors are empty (NA)
  if (!all(is.na(items))) {
    out <- append(list(out), do.call("tripl", items, quote = FALSE))
  } else {
    out <- list(out)
  }
  # return result
  return(out)
}
The function can be called by passing the input vectors as arguments.
# input vectors
a = c(3,5)
b = c(6,1,8,7)
c = c(4,2,9)
# find all the triplets using our function
y <- tripl(a,b,c)
The result is a list that contains all the necessary information, albeit unordered.
print(y)
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 4 5 6
#
# [[3]]
# [1] 7 9 NA
#
# [[4]]
# [1] 8 NA NA
Everything can be put in order using sapply():
# put everything in order
sapply(y, function(x){x[order(x)]}) %>% t
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 4 5 6
# [3,] 7 9 NA
# [4,] 8 NA NA
The catch is that it uses only one value per vector for each triplet. It will therefore not find the consecutive triplet c(6,7,8) among, e.g., c(6,7,11), c(8,9,13) and c(10,12,14). In this instance it returns c(6,8,10) instead (see below).
a<-c(6,7,11)
b<-c(8,9,13)
c<-c(10,12,14)
y <- tripl(a,b,c)
sapply(y, function(x){x[order(x)]}) %>% t
# [,1] [,2] [,3]
# [1,] 6 8 10
# [2,] 7 9 12
# [3,] 11 13 14
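Since tripl() does not itself check that a group is consecutive, one option is to filter its output afterwards. A minimal sketch, assuming the result has the list shape printed above; here the two results are typed in by hand so the snippet runs on its own, and is_run() is a hypothetical helper.

```r
# Keep a group only if it has no NA and its sorted values step by exactly 1
is_run <- function(x) !anyNA(x) && all(diff(sort(x)) == 1)

# Results of tripl() from the two examples above, typed in by hand
y1 <- list(c(1, 2, 3), c(4, 5, 6), c(7, 9, NA), c(8, NA, NA))
y2 <- list(c(6, 8, 10), c(7, 9, 12), c(11, 13, 14))

Filter(is_run, y1)  # keeps c(1, 2, 3) and c(4, 5, 6)
Filter(is_run, y2)  # empty list: none of these groups is consecutive
```

This only removes false positives; it cannot recover runs such as c(6,7,8) that tripl() never produced.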