Splitting strings into vectors of different lengths in R

Date: 2021-04-25 21:41:16

I am stuck on a problem in R: I am trying to split a vector of strings into a vector of vectors. Any help would be appreciated.

I have:

V <- c("AAAAA", "AAAAA BBBBB", "CCCCC DDDDD")

Using strsplit I get:

s <- strsplit(V, split = " ")
s
[[1]]
[1] "AAAAA"

[[2]]
[1] "AAAAA" "BBBBB"

[[3]]
[1] "CCCCC" "DDDDD" 

However, I cannot access these to compare them. I would like something like:

 s
[1] "AAAAA"
[2] "AAAAA" "BBBBB"
[3] "CCCCC" "DDDDD" 

I would then like to check whether the elements of each of these vectors are all included in my validation vector (like c("AAAAA", "BBBBB", "CCCCC")) and return a boolean at the end (TRUE if all elements are in it, FALSE otherwise). For now my problem is getting those vectors. Any suggestion is welcome.

5 answers

#1


1  

Using the tidyverse, you could go with:

V <- c("AAAAA", "AAAAA BBBBB", "CCCCC DDDDD")
validation <- c("AAAAA", "BBBBB", "CCCCC")

library(purrr)
library(stringr)
str_split(V, pattern = " ") %>% 
  map_lgl(~all(.x %in% validation))
#> [1]  TRUE  TRUE FALSE

You could also combine this with dplyr to get a clear summary of which strings are validated and which are not.

library(dplyr, warn.conflicts = FALSE)
tibble(V) %>%
  mutate(validate = str_split(V, pattern = " ") %>% 
           map_lgl(~all(.x %in% validation)))
#> # A tibble: 3 x 2
#>             V validate
#>         <chr>    <lgl>
#> 1       AAAAA     TRUE
#> 2 AAAAA BBBBB     TRUE
#> 3 CCCCC DDDDD    FALSE

#2


3  

strsplit returns a list. You can go through it by using lapply with a custom function:

V <- c("AAAAA", "AAAAA BBBBB", "CCCCC DDDDD")
s <- strsplit(V, split = " ")
val <- c("AAAAA", "BBBBB", "CCCCC")

lapply(s, function(x) x %in% val)

You can access list elements like this:

s[[1]]
s[[2]]
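
Since the question mentions trouble accessing the pieces, here is a small sketch of the difference between double and single brackets on the list returned by strsplit. It only uses base R and the variables from this answer.

```r
V <- c("AAAAA", "AAAAA BBBBB", "CCCCC DDDDD")
s <- strsplit(V, split = " ")

s[[2]]      # the character vector for the second string
s[[2]][2]   # its second element: "BBBBB"
s[2]        # single brackets return a list of length 1, not the vector
lengths(s)  # number of parts per string: 1 2 2
```

Double brackets extract the vector itself, so the result can be passed directly to functions such as %in% or all().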

To check whether all elements of each vector are present in val:

all_in_val <- lapply(s, function(x) all(x %in% val))
# output
[[1]]
[1] TRUE

[[2]]
[1] TRUE

[[3]]
[1] FALSE

To convert this list to a vector:

all_in_val <- unlist(all_in_val)

To return the original elements of V that passed the check:

V[all_in_val]
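
As a sketch of the same check in a type-stable form, vapply can replace the lapply/unlist pair: it returns a logical vector directly and raises an error if the function ever returns anything other than a single logical. The helper name all_in is hypothetical and not part of base R.

```r
# Hypothetical helper: TRUE for each string whose space-separated
# parts all appear in `allowed`.
all_in <- function(strings, allowed) {
  parts <- strsplit(strings, split = " ", fixed = TRUE)
  vapply(parts, function(x) all(x %in% allowed), logical(1))
}

V <- c("AAAAA", "AAAAA BBBBB", "CCCCC DDDDD")
val <- c("AAAAA", "BBBBB", "CCCCC")

ok <- all_in(V, val)
ok      # TRUE TRUE FALSE
V[ok]   # "AAAAA" "AAAAA BBBBB"
```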

#3


0  

R does not have a vector of vectors.

To emulate this behavior, you would usually use lists and the apply family of functions.


input_vector <- c("AAAAA", "AAAAA BBBBB", "CCCCC DDDDD")

# split the string like you did
s <- strsplit(input_vector, split = " ")
s
#> [[1]]
#> [1] "AAAAA"
#> 
#> [[2]]
#> [1] "AAAAA" "BBBBB"
#> 
#> [[3]]
#> [1] "CCCCC" "DDDDD"

# create a vector with the conditions that we look for
validation_vector <- c("AAAAA", "BBBBB")

# create a matrix of matches
res_matrix <- sapply(s, function(s_part) {
  validation_vector %in% s_part
})

# check whether all validation_vector elements occur in a given input_vector string
# by applying the 'all' function over each column ("are all elements of a given column TRUE?")
res_vector <- apply(res_matrix, 2, all)
# for aesthetic purposes: add the name of the initial input_vector again
names(res_vector) <- input_vector

# display the result
res_vector
#>       AAAAA AAAAA BBBBB CCCCC DDDDD 
#>       FALSE        TRUE       FALSE
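
Note that this answer asks whether every validation element occurs in a string, which is the reverse of asking whether every part of a string occurs in the validation vector. A minimal sketch contrasting the two directions, reusing the variables from this answer (the names parts_in_validation and validation_in_parts are made up for illustration):

```r
input_vector <- c("AAAAA", "AAAAA BBBBB", "CCCCC DDDDD")
validation_vector <- c("AAAAA", "BBBBB")
s <- strsplit(input_vector, split = " ")

# Direction 1: are all parts of each string found in the validation vector?
parts_in_validation <- vapply(s, function(x) all(x %in% validation_vector), logical(1))

# Direction 2 (what this answer computes): are all validation elements found in each string?
validation_in_parts <- vapply(s, function(x) all(validation_vector %in% x), logical(1))

parts_in_validation   # TRUE  TRUE FALSE
validation_in_parts   # FALSE TRUE FALSE
```

Which direction is appropriate depends on whether the validation vector is meant as a set of allowed values or a set of required values.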

#4


0  

You can have a look at the *apply family of functions. For example, using sapply to apply the strsplit function to each element of your vector, you get:

vs <- sapply(V, strsplit, split = " ")

vs

$AAAAA
[1] "AAAAA"

$`AAAAA BBBBB`
[1] "AAAAA" "BBBBB"

$`CCCCC DDDDD`
[1] "CCCCC" "DDDDD"

Further, to check against your validation vector you can do:

validation <- c("AAAAA", "BBBBB", "CCCCC")
vs_in_val <- sapply(vs, `%in%`, validation) 

vs_in_val

$AAAAA
[1] TRUE

$`AAAAA BBBBB`
[1] TRUE TRUE

$`CCCCC DDDDD`
[1]  TRUE FALSE
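
The result is still a list with one logical vector per string. To get the single TRUE/FALSE per string that the question asks for, one option is to reduce each element with all(). This is a sketch continuing from this answer's variables, not part of the original answer; all_valid is an illustrative name.

```r
V <- c("AAAAA", "AAAAA BBBBB", "CCCCC DDDDD")
validation <- c("AAAAA", "BBBBB", "CCCCC")

vs <- sapply(V, strsplit, split = " ")
vs_in_val <- sapply(vs, `%in%`, validation)

# Reduce each logical vector to one value; names are carried over from V
all_valid <- vapply(vs_in_val, all, logical(1))
all_valid
```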

#5


0  

strsplit can help you do it if you combine it with sapply.

V <- c("AAAAA", "AAAAA BBBBB", "CCCCC DDDDD")
s <- strsplit(V, " ")
sapply(s, function(x) mean(x %in% c("AAAAA", "BBBBB", "CCCCC")))
[1] 1.0 1.0 0.5

If the result is 0, none of the elements are in your validation vector.

If it is 1, all of the elements are in your validation vector.

If it is between 0 and 1, only some of the elements are in your validation vector.
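
If a single boolean per string is needed, as in the question, the proportion can be compared with 1. A small sketch building on this answer's code (prop, all_valid and any_valid are illustrative names):

```r
V <- c("AAAAA", "AAAAA BBBBB", "CCCCC DDDDD")
validation <- c("AAAAA", "BBBBB", "CCCCC")
s <- strsplit(V, " ")

# Share of each string's parts that appear in the validation vector
prop <- sapply(s, function(x) mean(x %in% validation))

all_valid <- prop == 1   # TRUE only when every part matched
any_valid <- prop > 0    # TRUE when at least one part matched

all_valid   # TRUE TRUE FALSE
any_valid   # TRUE TRUE TRUE
```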
