Consider this example:
library(tidyverse)

# tibble() replaces the deprecated data_frame()
mydata <- tibble(ind_1 = c(NA, NA, 3, 4),
                 ind_2 = c(2, 3, 4, 5),
                 ind_3 = c(5, 6, NA, NA),
                 y     = c(28, 34, 25, 12),
                 group = c('a', 'a', 'b', 'b'))
> mydata
# A tibble: 4 x 5
  ind_1 ind_2 ind_3     y group
  <dbl> <dbl> <dbl> <dbl> <chr>
1    NA     2     5    28 a
2    NA     3     6    34 a
3     3     4    NA    25 b
4     4     5    NA    12 b
Here I want, for each group, to regress y on whatever variables are not missing in that group, and store the corresponding lm object in a list-column.
That is:
- for group a, these variables are ind_2 and ind_3
- for group b, they are ind_1 and ind_2
I tried the following, but it does not work:
mydata %>% group_by(group) %>% nest() %>%
do(filtered_df <- . %>% select(which(colMeans(is.na(.)) == 0)),
myreg = lm(y~ names(filtered_df)))
Any ideas? Thanks!
2 Solutions
#1
8
We can use map and mutate. We can either select and model in one step (nestdat1), or use two separate map calls if you want to preserve the filtered data (nestdat2):
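library(tidyverse)

# One step: drop every column containing an NA, then fit the model.
# (A ~ lambda replaces the deprecated funs() helper.)
nestdat1 <- mydata %>%
  group_by(group) %>%
  nest() %>%
  mutate(model = data %>% map(~ select_if(., ~ !anyNA(.)) %>%
                                lm(y ~ ., data = .)))

# Two steps: overwrite data with the filtered columns, then fit the model.
nestdat2 <- mydata %>%
  group_by(group) %>%
  nest() %>%
  mutate(data = data %>% map(~ select_if(., ~ !anyNA(.))),
         model = data %>% map(~ lm(y ~ ., data = .)))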
Output:
They produce different data columns:
> nestdat1 %>% pull(data)
[[1]]
# A tibble: 2 x 4
  ind_1 ind_2 ind_3     y
  <dbl> <dbl> <dbl> <dbl>
1    NA     2     5    28
2    NA     3     6    34

[[2]]
# A tibble: 2 x 4
  ind_1 ind_2 ind_3     y
  <dbl> <dbl> <dbl> <dbl>
1     3     4    NA    25
2     4     5    NA    12

> nestdat2 %>% pull(data)
[[1]]
# A tibble: 2 x 3
  ind_2 ind_3     y
  <dbl> <dbl> <dbl>
1     2     5    28
2     3     6    34

[[2]]
# A tibble: 2 x 3
  ind_1 ind_2     y
  <dbl> <dbl> <dbl>
1     3     4    25
2     4     5    12
But the same model column:
> nestdat1 %>% pull(model)
[[1]]
Call:
lm(formula = y ~ ., data = .)
Coefficients:
(Intercept) ind_2 ind_3
16 6 NA
[[2]]
Call:
lm(formula = y ~ ., data = .)
Coefficients:
(Intercept) ind_1 ind_2
64 -13 NA
> nestdat2 %>% pull(model)
[[1]]
Call:
lm(formula = y ~ ., data = .)
Coefficients:
(Intercept) ind_2 ind_3
16 6 NA
[[2]]
Call:
lm(formula = y ~ ., data = .)
Coefficients:
(Intercept) ind_1 ind_2
64 -13 NA
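To confirm that each group's model really used only the non-missing variables, one could extract the predictor names from every fitted lm object. The following is a minimal sketch, not part of the original answer: the `fitted` and `predictors` names are made up for illustration, and it assumes dplyr, tidyr and purrr are installed.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Same example data as in the question
mydata <- tibble(ind_1 = c(NA, NA, 3, 4),
                 ind_2 = c(2, 3, 4, 5),
                 ind_3 = c(5, 6, NA, NA),
                 y     = c(28, 34, 25, 12),
                 group = c("a", "a", "b", "b"))

# `fitted` and `predictors` are hypothetical names used only in this sketch
fitted <- mydata %>%
  group_by(group) %>%
  nest() %>%
  mutate(
    # fit on the columns of this group's data that contain no NA
    model = map(data, ~ lm(y ~ ., data = select_if(.x, ~ !anyNA(.x)))),
    # term labels of the fitted model, i.e. the predictors actually used
    predictors = map(model, ~ attr(terms(.x), "term.labels"))
  )

fitted$predictors
```

The first element should list ind_2 and ind_3 (group a) and the second ind_1 and ind_2 (group b), matching the coefficient names printed above.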
#2
2
Here's another tidyverse option. The result is a list with one model per group, so if you wish to keep it in a tibble, assign it to a model column of the nested tibble (one row per group) rather than to the four-row mydata:
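library(tidyverse)

mydata %>%
  nest(data = -group) %>%   # nest(-group) is deprecated since tidyr 1.0
  pull(data) %>%
  # discard(., anyNA) drops every column that contains an NA
  map(~ lm(y ~ ., discard(., anyNA)))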
# [[1]]
#
# Call:
# lm(formula = y ~ ., data = discard(., anyNA))
#
# Coefficients:
# (Intercept) ind_2 ind_3
# 16 6 NA
#
#
# [[2]]
#
# Call:
# lm(formula = y ~ ., data = discard(., anyNA))
#
# Coefficients:
# (Intercept) ind_1 ind_2
# 64 -13 NA
#
#
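If the models should live next to the data instead of in a free-standing list, the same discard() idea can be used inside mutate. This is a minimal sketch under a few assumptions: `nested` is a name chosen here for illustration, and the `nest(data = -group)` syntax assumes tidyr 1.0 or later.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Same example data as in the question
mydata <- tibble(ind_1 = c(NA, NA, 3, 4),
                 ind_2 = c(2, 3, 4, 5),
                 ind_3 = c(5, 6, NA, NA),
                 y     = c(28, 34, 25, 12),
                 group = c("a", "a", "b", "b"))

# `nested` is a hypothetical name; it has one row per group
nested <- mydata %>%
  nest(data = -group) %>%
  # discard(.x, anyNA) drops the columns of each nested tibble that contain NA
  mutate(model = map(data, ~ lm(y ~ ., data = discard(.x, anyNA))))

nested
```

Each row of `nested` then holds the group label, the original nested data, and the lm object fitted on that group's complete columns.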