My aim is to join a dataframe to the dataframes held within a nested list-column, e.g.:
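data(mtcars)
library(tibble) # rownames_to_column
library(dplyr)  # rename, select, group_by, mutate, inner_join
library(tidyr)  # nest
library(purrr)  # map_df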
mtcars_nest <- mtcars %>% rownames_to_column() %>% rename(rowname_1 = rowname) %>% select(-mpg) %>% group_by(cyl) %>% nest()
mtcars_mpg <- mtcars %>% rownames_to_column() %>% rename(rowname_2 = rowname) %>% select(rowname_2, mpg)
join_df <- function(df_nest, df_other) {
  df_all <- df_nest %>% inner_join(df_other, by = c("rowname_1" = "rowname_2"))
}
join_df <- mtcars_nest %>%
  mutate(new_mpg = map_df(data, join_df(., mtcars_mpg)))
This returns the following error:
# Error in mutate_impl(.data, dots) : Evaluation error: `by` can't contain join column `rowname_1` which is missing from LHS.
So the dataframe that `map_*` receives from the nested input isn't offering a column name (i.e. `rowname_1`) to take part in the join. I can't work out why this is the case. I'm passing the `data` column, which contains the dataframes from the nested dataframe. I want a dataframe output that can be added as a new column in the input nested dataframe, e.g.:
| rowname_1 | cyl | disp |...|mpg|
|:----------|:----|:-----|:--|:--|
1 solution
A couple of things:
- you should use the tilde (in `purrr`) to turn the function argument to `map*` into a function; and
- I think you should be using `map` instead of `map_df`, and though I cannot find exactly why `map_df` doesn't work right, I can get what I think is your desired behavior without it.
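To illustrate the tilde point, here is a minimal sketch. The toy tibble `nested` and the helper `cols_of` are made up for this example and are not part of the original post. Without the tilde, `.` is resolved by the pipe to its whole left-hand side and the call is evaluated once, up front. That would explain the error about `rowname_1` missing from the LHS, since the nested frame itself only has the `cyl` and `data` columns.

```r
library(dplyr)
library(purrr)

# Toy nested frame (hypothetical data, for illustration only).
nested <- tibble(
  g = c("a", "b"),
  data = list(tibble(x = 1:2), tibble(x = 3:5))
)

# Hypothetical helper: report the column names of whatever it is given.
cols_of <- function(df) paste(names(df), collapse = ",")

# Without the tilde, `.` is the pipe's left-hand side (the whole nested frame),
# so cols_of() sees the columns `g` and `data`.
no_tilde <- nested %>% mutate(seen = cols_of(.))

# With the tilde, `.` is each element of the `data` list-column in turn,
# so cols_of() sees the column `x` of each inner tibble.
with_tilde <- nested %>% mutate(seen = map_chr(data, ~ cols_of(.)))

no_tilde$seen    # "g,data" "g,data"
with_tilde$seen  # "x" "x"
```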
Minor point:
- you assign to `df_all` within `join_df()`, and the only reason it works is that the assignment invisibly returns the value you assigned to `df_all`; I suggest being explicit: either follow up with `return(df_all)`, or just don't assign it and end with `inner_join(...)`.
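The invisible return can be checked with base R's `withVisible()`. The three toy functions below are hypothetical stand-ins for the variants of `join_df()` discussed above, not code from the original post.

```r
# Last expression is an assignment: the value is returned, but invisibly.
f_assign <- function(x) {
  y <- x + 1
}

# Explicit return(): the value is returned visibly.
f_explicit <- function(x) {
  y <- x + 1
  return(y)
}

# No assignment at all: the last expression is the (visible) return value.
f_bare <- function(x) {
  x + 1
}

withVisible(f_assign(1))$visible    # FALSE
withVisible(f_explicit(1))$visible  # TRUE
withVisible(f_bare(1))$visible      # TRUE
```

All three return the same value, so the difference only shows up when the result is printed at the console rather than assigned or piped onward.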
Try this:
library(tibble) # rownames_to_column
library(dplyr)
library(tidyr) # nest
library(purrr)
join_df <- function(df_nest, df_other) {
  df_all <- inner_join(df_nest, df_other, by = c("rowname_1" = "rowname_2"))
  return(df_all)
}
mtcars_nest %>%
  mutate(new_mpg = map(data, ~ join_df(., mtcars_mpg)))
# # A tibble: 3 x 3
# cyl data new_mpg
# <dbl> <list> <list>
# 1 6. <tibble [7 x 10]> <tibble [7 x 11]>
# 2 4. <tibble [11 x 10]> <tibble [11 x 11]>
# 3 8. <tibble [14 x 10]> <tibble [14 x 11]>
The `new_mpg` column is effectively the `data` column with one additional column. Since we know that we have full redundancy, you can always overwrite (or remove) `data`:
mtcars_nest %>%
  mutate(data = map(data, ~ join_df(., mtcars_mpg)))
# # A tibble: 3 x 2
# cyl data
# <dbl> <list>
# 1 6. <tibble [7 x 11]>
# 2 4. <tibble [11 x 11]>
# 3 8. <tibble [14 x 11]>
and you get your nested, and now augmented, frames.
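If a flat frame like the table sketched in the question is wanted in the end, one possible follow-up is to unnest the augmented list-column. This is only a sketch: it assumes tidyr 1.0.0 or later (where `unnest()` takes a `cols` argument), rebuilds the inputs in a slightly shorter form than the question does, and the name `flat` is made up for the example.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(purrr)

# Rebuild the two inputs from the question (same column names).
mtcars_nest <- mtcars %>%
  rownames_to_column("rowname_1") %>%
  select(-mpg) %>%
  group_by(cyl) %>%
  nest()
mtcars_mpg <- mtcars %>%
  rownames_to_column("rowname_2") %>%
  select(rowname_2, mpg)

# Join inside each nested tibble, then flatten back to one row per car.
flat <- mtcars_nest %>%
  mutate(data = map(data, ~ inner_join(., mtcars_mpg,
                                       by = c("rowname_1" = "rowname_2")))) %>%
  unnest(cols = c(data)) %>%
  ungroup()

dim(flat)  # 32 rows, 12 columns (rowname_1, cyl, ..., mpg)
```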