How to find consecutive numbers among multiple arrays?

Time: 2022-08-04 09:26:38

I'll give an example right away. Suppose I have 3 arrays a, b, c such as

a = c(3,5)
b = c(6,1,8,7)
c = c(4,2,9)

I must be able to extract the consecutive triplets among them, i.e.,

c(1,2,3),c(4,5,6)

But this is just an example. My real data set is larger, with more than 10 arrays, so I must be able to find consecutive series of length ten.

So could anyone provide an algorithm that, in general, finds the consecutive series of length n among n arrays?

I am doing this in R, so code in R is preferable, but an algorithm in any language is more than welcome.

4 solutions

#1


7  

First reorganize the data into a list of pairs, each containing a value and its array number. Sort the list by value; you'd have something like:

1-2
2-3
3-1 (i.e. "there's a three in array 1")
4-3
5-1
6-2
7-2
8-2
9-3

Then loop over the list, check whether there are actually n consecutive numbers, and then check whether they come from n different arrays.
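The two steps described in this answer could be sketched in R roughly as follows. This is only an illustration, not the answerer's code: the helper name `find_runs` and its argument are hypothetical, and it assumes each value occurs at most once across all arrays.

```r
# Hypothetical helper: find runs of n consecutive integers that take
# exactly one value from each of the n input vectors.
# Assumption: no value appears more than once across the vectors.
find_runs <- function(vectors) {
  n <- length(vectors)
  # Pair every value with the number of the array it came from
  value <- unlist(vectors)
  array_id <- rep(seq_along(vectors), lengths(vectors))
  # Sort the pairs by value
  ord <- order(value)
  value <- value[ord]
  array_id <- array_id[ord]
  runs <- list()
  if (length(value) < n) return(runs)
  # Slide a window of length n over the sorted pairs
  for (start in seq_len(length(value) - n + 1)) {
    win <- start:(start + n - 1)
    consecutive <- all(diff(value[win]) == 1)          # values step by 1
    distinct <- anyDuplicated(array_id[win]) == 0      # n different arrays
    if (consecutive && distinct) {
      runs[[length(runs) + 1]] <- value[win]
    }
  }
  runs
}

res <- find_runs(list(c(3, 5), c(6, 1, 8, 7), c(4, 2, 9)))
print(res)
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 4 5 6
```

Because the work is one sort plus one pass over the sorted pairs, this should scale to ten or more arrays without trouble.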

#2


5  

Here's one approach. It assumes there are no gaps in the combined sequence of observations, because the code only checks which groups are represented, not that the values differ by 1. Here is the data.

N <- 3
a <- c(3,5)
b <- c(6,1,8,7)
c <- c(4,2,9)

Then I combine them and order by the observations:

dd <- lattice::make.groups(a,b,c)
dd <- dd[order(dd$data),]

Now I look for windows of N adjacent rows in this table in which all three groups are represented:

idx <- apply(embed(as.numeric(dd$which),N), 1, function(x) {
    length(unique(x))==N
})

Then we can see the triplets with:

lapply(which(idx), function(i) {
    dd[i:(i+N-1),]
})

# [[1]]
#    data which
# b2    1     b
# c2    2     c
# a1    3     a
# 
# [[2]]
#    data which
# c1    4     c
# a2    5     a
# b1    6     b
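If the data may contain gaps, the same embed() idea can be applied to the values as well as to the groups. The following is a sketch under assumptions, not part of the original answer: the names `groups_ok` and `values_ok` are made up here, the data frame is built in base R as a stand-in for lattice::make.groups, and `b` is shortened on purpose so that 7 and 8 are missing.

```r
N <- 3
a <- c(3, 5)
b <- c(6, 1)        # shortened on purpose: 7 and 8 are missing
c <- c(4, 2, 9)

# Base-R stand-in for lattice::make.groups(a, b, c)
dd <- data.frame(
  data  = c(a, b, c),
  which = factor(rep(c("a", "b", "c"),
                     times = c(length(a), length(b), length(c))))
)
dd <- dd[order(dd$data), ]

# Window test 1 (as in the answer): all N groups are represented
groups_ok <- apply(embed(as.numeric(dd$which), N), 1, function(x) {
  length(unique(x)) == N
})
# Window test 2 (added): the N values increase in steps of exactly 1.
# embed() lists each window in reverse order, hence rev().
values_ok <- apply(embed(dd$data, N), 1, function(x) {
  all(diff(rev(x)) == 1)
})

idx <- groups_ok & values_ok
triplets <- lapply(which(idx), function(i) dd$data[i:(i + N - 1)])
print(triplets)
```

With this data the group test alone would also accept the window 5, 6, 9, since it draws from a, b and c; the added value test rejects it.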

#3


2  

Here is a brute-force method using expand.grid with the three vectors from the example.

# get all combinations
df <- expand.grid(a,b,c)

Use combn to calculate the absolute difference for each pairwise combination of columns.

# get all pairwise differences (one column per pair of variables)
myDiffs <- combn(names(df), 2, FUN=function(x) abs(df[[x[1]]]-df[[x[2]]]))

# subset data using `rowSums` and `which`
df[which(rowSums(myDiffs == 1) == ncol(myDiffs)-1), ]
#    Var1 Var2 Var3
# 2     5    6    4
# 11    3    1    2
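The pairwise-difference test above is tied to three vectors. A possible generalisation, offered only as a sketch with hypothetical names (`vectors`, `is_run`), is to sort each row of the grid and require every step to be exactly 1:

```r
# Assumed input: a named list of numeric vectors (any number of them)
vectors <- list(a = c(3, 5), b = c(6, 1, 8, 7), c = c(4, 2, 9))

# One row per way of picking a single value from each vector
df <- expand.grid(vectors)

# A row is a consecutive run if its sorted values increase by exactly 1
is_run <- apply(df, 1, function(r) all(diff(sort(r)) == 1))

df[is_run, ]
#    a b c
# 2  5 6 4
# 11 3 1 2
```

The grid has as many rows as the product of the vector lengths, so this brute-force route is likely to become impractical for ten or more arrays of any real size.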

#4


1  

I have hacked together a little recursive function that will find all the consecutive triplets among as many vectors as you pass it (you need to pass at least three). It is probably a little crude, but it seems to work.

The function uses the ellipsis, ..., for passing arguments, so it takes however many arguments (i.e. numeric vectors) you provide and puts them in the list items. Then the smallest value in each passed vector is located, along with its index.

Then the indices of the vectors that hold the smallest triplet are determined and iterated over in a for() loop, which appends the output values to the output vector out. The input vectors in items are pruned and passed into the function again, recursively. Only when all vectors are NA, i.e. there are no more values left in the vectors, does the function return the final result.

library(magrittr)

# define function to find the triplets
tripl <- function(...){
  items <- list(...)

  # find the smallest number in each passed vector, along with its index
  # output is a matrix of n-by-2, where n is the number of passed arguments
  triplet.id <- lapply(items, function(x){
    if(is.na(x) %>% prod) id <- c(NA, NA)
    else id <- c(which(x == min(x)), x[which(x == min(x))])
  }) %>% unlist %>% matrix(., ncol=2, byrow=TRUE)


  # find the smallest triplet from the passed vectors
  index <- order(triplet.id[,2])[1:3]
  # create empty vector for output
  out <- vector()

  # go through the smallest triplet's indices
  for(i in index){
    # .. append the corresponding item from the input vector to the out vector
    # .. and remove the value from the input vector
    if(length(items[[i]]) == 1) {
      out <- append(out, items[[i]])
      # .. if the input vector has no value left fill with NA
      items[[i]] <- NA
    }
    else {
      out <- append(out, items[[i]][triplet.id[i,1]])
      items[[i]] <- items[[i]][-triplet.id[i,1]]
    }
  }

  # recurse until all vectors are empty (NA)
  if(!prod(unlist(is.na(items)))) out <- append(list(out), 
                                                do.call("tripl", items, quote = FALSE))
  else out <- list(out)

  # return result
  return(out)
}

The function can be called by passing the input vectors as arguments.

# input vectors
a = c(3,5)
b = c(6,1,8,7)
c = c(4,2,9)

# find all the triplets using our function
y <- tripl(a,b,c) 

The result is a list that contains all the necessary information, albeit unordered.

print(y)
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 4 5 6
# 
# [[3]]
# [1]  7  9 NA
#
# [[4]]
# [1]  8 NA NA

Each triplet can be put in order using sapply():

# put everything in order
sapply(y, function(x){x[order(x)]}) %>% t
#       [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# [3,]    7    9   NA
# [4,]    8   NA   NA
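Note that the last two rows of this matrix are not consecutive triplets. One way to keep only the genuine ones, shown here as a small sketch rather than as part of the original function, is to filter the sorted rows. The matrix `m` below is copied from the output above and the name `is_consecutive` is made up for illustration.

```r
# Sorted result matrix, copied from the output shown above
m <- rbind(c(1, 2, 3),
           c(4, 5, 6),
           c(7, 9, NA),
           c(8, NA, NA))

# Keep a row only if it has no NA and its values increase in steps of 1
is_consecutive <- apply(m, 1, function(r) !anyNA(r) && all(diff(r) == 1))

m[is_consecutive, , drop = FALSE]
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```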

The catch is that it uses only one value per vector when forming each triplet. It will therefore not find the consecutive triplet c(6,7,8) among e.g. c(6,7,11), c(8,9,13) and c(10,12,14). In this instance it returns c(6,8,10) (see below).

a<-c(6,7,11)
b<-c(8,9,13)
c<-c(10,12,14)

y <- tripl(a,b,c)
sapply(y, function(x){x[order(x)]}) %>% t
#     [,1] [,2] [,3]
# [1,]    6    8   10
# [2,]    7    9   12
# [3,]   11   13   14
