I have never worked with lists of dataframes in R before. It may not even be complicated, but I'm stuck right now.
So I have a list of dataframes:
df1 <- data.frame(v5 = c(0.5,0.6,0.7,0.96),v6 = c("Tiny|Marsian|Worker", "Tiny|Human|Student", "Tiny|Goblin|Soldier", "Tiny|Horse|Guardian"))
df2 <- data.frame(v5 = c(0.56,0.32,0.55),v6 = c("Tiny|Human|Worker", "Tiny|Marsian|Student", "Tiny|Goblin|Soldier"))
ldf <- list(df1,df2)
Each dataframe contains 6 columns (only 2 in this example), and the number of rows differs between dataframes. Column v6 contains three pieces of information, each separated by a pipe (|). What I need to do is split this information at the pipe and make three individual columns out of it, as I would get for a single df with:
library(stringr)
split = str_split_fixed(string = df1$v6, pattern = "\\|", n = 3)
After that I'd like to append the information that ends up in column 2 of the split back to the individual dataframes of ldf.
In the end I want my dataframes to look like this:
df1 <- data.frame(v5 = c(0.5,0.6,0.7,0.96),
v6 = c("Tiny|Marsian|Worker", "Tiny|Human|Student", "Tiny|Goblin|Soldier", "Tiny|Horse|Guardian"),
v7=c("Marsian","Human","Goblin","Horse"))
df2 <- data.frame(v5 = c(0.56,0.32,0.55),
v6 = c("Tiny|Human|Worker", "Tiny|Marsian|Student", "Tiny|Goblin|Soldier"),
v7 = c("Human", "Marsian", "Goblin"))
How do I achieve this? I already tried several things with
x <- lapply(ldf, `[`, 6)
but I run into problems when applying the split functions. Please help!
2 Answers
#1
With dplyr and purrr:
library('dplyr')
library('purrr')
library('stringr')  # provides str_split_fixed()
ldf2 <- map(ldf, mutate, v7 = str_split_fixed(string = v6, pattern = "\\|", n = 3)[, 2])
ldf2
[[1]]
    v5                  v6      v7
1 0.50 Tiny|Marsian|Worker Marsian
2 0.60  Tiny|Human|Student   Human
3 0.70 Tiny|Goblin|Soldier  Goblin
4 0.96 Tiny|Horse|Guardian   Horse

[[2]]
    v5                   v6      v7
1 0.56    Tiny|Human|Worker   Human
2 0.32 Tiny|Marsian|Student Marsian
3 0.55  Tiny|Goblin|Soldier  Goblin
mutate() adds a new column to each data.frame based on the string split, and map() applies this mutate() call to every element of ldf.
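For comparison, the same idea can be sketched in base R without any packages. This is only an illustration, not part of the original answer: middle_field is a hypothetical helper name introduced here, and the example data is rebuilt from the question so the block runs on its own.

```r
# Rebuild the example data from the question
df1 <- data.frame(v5 = c(0.5, 0.6, 0.7, 0.96),
                  v6 = c("Tiny|Marsian|Worker", "Tiny|Human|Student",
                         "Tiny|Goblin|Soldier", "Tiny|Horse|Guardian"))
df2 <- data.frame(v5 = c(0.56, 0.32, 0.55),
                  v6 = c("Tiny|Human|Worker", "Tiny|Marsian|Student",
                         "Tiny|Goblin|Soldier"))
ldf <- list(df1, df2)

# Hypothetical helper: return the second pipe-separated field of each string.
# as.character() guards against v6 being a factor (the default before R 4.0).
middle_field <- function(x) {
  parts <- strsplit(as.character(x), "|", fixed = TRUE)
  vapply(parts, `[`, character(1), 2)
}

# Add v7 to every data frame in the list and return the modified copies
ldf_base <- lapply(ldf, function(d) {
  d$v7 <- middle_field(d$v6)
  d
})
```

Using fixed = TRUE treats the pipe as a literal character, so no regex escaping is needed here.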
EDIT:
If you want three separate columns, you should use separate() from tidyr:
ldf2 <- map(ldf, tidyr::separate, col = 'v6', into = c('Col1', 'Col2', 'Col3'), sep = '\\|')
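Note that separate() drops the original v6 column by default. A minimal sketch of a variant that keeps it, assuming tidyr is installed; it uses lapply instead of map only to keep the example free of further dependencies, and ldf3 is a name introduced here for illustration.

```r
library(tidyr)

# Rebuild the example data from the question
df1 <- data.frame(v5 = c(0.5, 0.6, 0.7, 0.96),
                  v6 = c("Tiny|Marsian|Worker", "Tiny|Human|Student",
                         "Tiny|Goblin|Soldier", "Tiny|Horse|Guardian"))
df2 <- data.frame(v5 = c(0.56, 0.32, 0.55),
                  v6 = c("Tiny|Human|Worker", "Tiny|Marsian|Student",
                         "Tiny|Goblin|Soldier"))
ldf <- list(df1, df2)

# remove = FALSE keeps v6 next to the three new columns
ldf3 <- lapply(ldf, separate, col = "v6",
               into = c("Col1", "Col2", "Col3"),
               sep = "\\|", remove = FALSE)

# Inspect the column names and the middle field of the first data frame
names(ldf3[[1]])
ldf3[[1]]$Col2
```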
#2
With the lapply, tidyr::separate and do.call functions you could do:
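library(dplyr)  # provides %>%, select() and bind_rows()
# Split v6 in each data frame, keep only the middle field as v7, then stack the results
combinedDF = do.call(rbind,lapply(ldf,function(x) {
  x %>%
    tidyr::separate(v6,c("v70","v7","v72"), sep = "\\|", remove=FALSE) %>%
    dplyr::select(-c(v70,v72))
}))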
Without lapply/rbind (thanks to @Sotos):
combinedDF = bind_rows(ldf) %>%
  tidyr::separate(v6,c("v70","v7","v72"), sep = "\\|", remove=FALSE) %>%
  select(-c(v70, v72))
combinedDF
#    v5                   v6      v7
#1 0.50  Tiny|Marsian|Worker Marsian
#2 0.60   Tiny|Human|Student   Human
#3 0.70  Tiny|Goblin|Soldier  Goblin
#4 0.96  Tiny|Horse|Guardian   Horse
#5 0.56    Tiny|Human|Worker   Human
#6 0.32 Tiny|Marsian|Student Marsian
#7 0.55  Tiny|Goblin|Soldier  Goblin
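Both variants above return one stacked data frame rather than a list. If the list structure from the question is still needed, one possible sketch (an assumption added here, not part of the original answer) is to record the list position with the .id argument of bind_rows() and split on it afterwards. The column name "source" and the object names combined and ldf_again are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question
df1 <- data.frame(v5 = c(0.5, 0.6, 0.7, 0.96),
                  v6 = c("Tiny|Marsian|Worker", "Tiny|Human|Student",
                         "Tiny|Goblin|Soldier", "Tiny|Horse|Guardian"))
df2 <- data.frame(v5 = c(0.56, 0.32, 0.55),
                  v6 = c("Tiny|Human|Worker", "Tiny|Marsian|Student",
                         "Tiny|Goblin|Soldier"))
ldf <- list(df1, df2)

# .id = "source" adds a column holding each row's position in ldf ("1", "2")
combined <- bind_rows(ldf, .id = "source") %>%
  separate(v6, c("v70", "v7", "v72"), sep = "\\|", remove = FALSE) %>%
  select(-c(v70, v72))

# Split the stacked data frame back into one data frame per original element
ldf_again <- split(combined, combined$source)
```

Each element of ldf_again still carries the helper column source, which can be dropped if it is not wanted.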