R - Splitting strings in a list of data frames

Date: 2021-09-08 18:34:44

I have never worked with lists of data frames in R before. It may not even be complicated, but I'm stuck right now.

So I have a list of data frames:

df1 <- data.frame(v5 = c(0.5,0.6,0.7,0.96),v6 = c("Tiny|Marsian|Worker", "Tiny|Human|Student", "Tiny|Goblin|Soldier", "Tiny|Horse|Guardian"))
df2 <- data.frame(v5 = c(0.56,0.32,0.55),v6 = c("Tiny|Human|Worker", "Tiny|Marsian|Student", "Tiny|Goblin|Soldier"))

ldf <- list(df1,df2)

Each data frame contains 6 columns (only 2 in this example), and the number of rows differs between data frames. Column v6 holds three pieces of information, separated by a pipe (|). I need to split that string at the pipe into three individual columns, as I would get for a single data frame with:

library(stringr)
split = str_split_fixed(string = df1$v6, pattern = "\\|", n = 3)

After that I'd like to append the values that end up in column 2 of the split result back to the individual data frames in ldf.

In the end I want my data frames to look like this:

df1 <- data.frame(v5 = c(0.5,0.6,0.7,0.96),
                  v6 = c("Tiny|Marsian|Worker", "Tiny|Human|Student", "Tiny|Goblin|Soldier", "Tiny|Horse|Guardian"),
                  v7 = c("Marsian", "Human", "Goblin", "Horse"))
df2 <- data.frame(v5 = c(0.56,0.32,0.55),
                  v6 = c("Tiny|Human|Worker", "Tiny|Marsian|Student", "Tiny|Goblin|Soldier"),
                  v7 = c("Human", "Marsian", "Goblin"))

How do I achieve this? I have already tried several things with

x <- lapply(ldf, `[`, 6)

but I run into problems when applying the split functions. Please help me.

2 answers

#1



With dplyr and purrr (plus stringr for str_split_fixed()):

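library('dplyr')
library('purrr')
library('stringr')  # provides str_split_fixed()
# add column v7 (the second pipe-separated field of v6) to every data frame in ldf
ldf2 <- map(ldf, mutate, v7 = str_split_fixed(string = v6, pattern = "\\|", n = 3)[, 2])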

ldf2

[[1]]
    v5                  v6      v7
1 0.50 Tiny|Marsian|Worker Marsian
2 0.60  Tiny|Human|Student   Human
3 0.70 Tiny|Goblin|Soldier  Goblin
4 0.96 Tiny|Horse|Guardian   Horse

[[2]]
    v5                   v6      v7
1 0.56    Tiny|Human|Worker   Human
2 0.32 Tiny|Marsian|Student Marsian
3 0.55  Tiny|Goblin|Soldier  Goblin

mutate() adds a new column to a data.frame based on the string split, and map() applies this mutate() call to every element of ldf.
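For comparison, the same idea can be sketched in base R without any packages. This is only an illustration under the assumption that every v6 value has at least two pipe-separated fields; add_middle_field() and ldf_base are hypothetical names that do not appear in the original answer.

```r
# Example data copied from the question
df1 <- data.frame(v5 = c(0.5, 0.6, 0.7, 0.96),
                  v6 = c("Tiny|Marsian|Worker", "Tiny|Human|Student",
                         "Tiny|Goblin|Soldier", "Tiny|Horse|Guardian"))
df2 <- data.frame(v5 = c(0.56, 0.32, 0.55),
                  v6 = c("Tiny|Human|Worker", "Tiny|Marsian|Student",
                         "Tiny|Goblin|Soldier"))
ldf <- list(df1, df2)

# Hypothetical helper: adds column v7 holding the second field of v6
add_middle_field <- function(df) {
  # fixed = TRUE treats "|" as a literal character, so no regex escaping is needed;
  # as.character() guards against v6 being a factor on older R versions
  parts <- strsplit(as.character(df$v6), "|", fixed = TRUE)
  # pick the second piece of each split string (NA if there is none)
  df$v7 <- vapply(parts, function(p) p[2], character(1))
  df
}

# lapply() plays the role of map(): one call per data frame, list in, list out
ldf_base <- lapply(ldf, add_middle_field)
```

The result is again a list with one data frame per input, so the structure asked for in the question is preserved.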

EDIT:

If you want three different columns, you should use separate() from tidyr:

library('tidyr')  # provides separate()
ldf2 <- map(ldf, separate, col = 'v6', into = c('Col1', 'Col2', 'Col3'), sep = '\\|')
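By default separate() drops the column it splits. A self-contained sketch of the variant that keeps v6 next to the three new columns might look like the following; ldf3 is a hypothetical name, and remove = FALSE is an addition that the answer's call does not use.

```r
library(purrr)
library(tidyr)

# Example data copied from the question
df1 <- data.frame(v5 = c(0.5, 0.6, 0.7, 0.96),
                  v6 = c("Tiny|Marsian|Worker", "Tiny|Human|Student",
                         "Tiny|Goblin|Soldier", "Tiny|Horse|Guardian"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(v5 = c(0.56, 0.32, 0.55),
                  v6 = c("Tiny|Human|Worker", "Tiny|Marsian|Student",
                         "Tiny|Goblin|Soldier"),
                  stringsAsFactors = FALSE)
ldf <- list(df1, df2)

# Split v6 into three columns in every data frame of the list;
# remove = FALSE keeps the original v6 column as well
ldf3 <- map(ldf, separate, col = "v6",
            into = c("Col1", "Col2", "Col3"),
            sep = "\\|", remove = FALSE)

names(ldf3[[1]])
```

Here Col2 carries the same values as v7 in the mutate() version.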

#2



With lapply, tidyr::separate and do.call you could do:

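library(dplyr)  # provides %>% and select()

# split v6 in each data frame, keep only the middle field as v7, then stack the results
combinedDF = do.call(rbind, lapply(ldf, function(x) {
  x %>%
    tidyr::separate(v6, c("v70","v7","v72"), sep = "\\|", remove=FALSE) %>%
    dplyr::select(-c(v70,v72))
}))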

Without lapply/rbind (thanks to @Sotos):

combinedDF <- bind_rows(ldf) %>%
  tidyr::separate(v6,c("v70","v7","v72"), sep = "\\|", remove=FALSE) %>%
  select(-c(v70, v72))


combinedDF
#    v5                   v6      v7
#1 0.50  Tiny|Marsian|Worker Marsian
#2 0.60   Tiny|Human|Student   Human
#3 0.70  Tiny|Goblin|Soldier  Goblin
#4 0.96  Tiny|Horse|Guardian   Horse
#5 0.56    Tiny|Human|Worker   Human
#6 0.32 Tiny|Marsian|Student Marsian
#7 0.55  Tiny|Goblin|Soldier  Goblin
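Both variants above return a single combined data frame, whereas the question asks for the individual data frames to be kept. One possible way to bridge the two, sketched here as an assumption rather than as part of the original answer, is to record each row's origin with the .id argument of bind_rows() and split on it afterwards; the names source, combined and ldf_again are hypothetical.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question
df1 <- data.frame(v5 = c(0.5, 0.6, 0.7, 0.96),
                  v6 = c("Tiny|Marsian|Worker", "Tiny|Human|Student",
                         "Tiny|Goblin|Soldier", "Tiny|Horse|Guardian"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(v5 = c(0.56, 0.32, 0.55),
                  v6 = c("Tiny|Human|Worker", "Tiny|Marsian|Student",
                         "Tiny|Goblin|Soldier"),
                  stringsAsFactors = FALSE)
ldf <- list(df1, df2)

# .id = "source" adds a column saying which list element each row came from
# (for an unnamed list the values are "1", "2", ...)
combined <- bind_rows(ldf, .id = "source") %>%
  separate(v6, c("v70", "v7", "v72"), sep = "\\|", remove = FALSE) %>%
  select(-c(v70, v72))

# Split back into one data frame per original list element, dropping the helper column
ldf_again <- split(select(combined, -source), combined$source)
```

The split is done once on the stacked data, which can be convenient when the same transformation is applied to many data frames with identical columns.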
