Adding a column of predicted values to a data frame with dplyr

Time: 2022-07-08 19:35:48

I have a data frame with a column of models, and I am trying to add a column of predicted values to it. A minimal example is:

library(dplyr)

exampleTable <- data.frame(x = c(1:5, 1:5),
                           y = c((1:5) + rnorm(5), 2*(5:1)),
                           groups = rep(LETTERS[1:2], each = 5))

models <- exampleTable %>% group_by(groups) %>% do(model = lm(y ~ x, data = .))
exampleTable <- left_join(as_tibble(exampleTable), models, by = "groups")

estimates <- exampleTable %>% rowwise() %>% do(Est = predict(.$model, newdata = .["x"]))

How can I add a column of numeric predictions to exampleTable? I tried using mutate to add the column directly to the table, without success.

> exampleTable <- exampleTable %>% rowwise() %>% mutate(data.frame(Pred = predict(.$model, newdata = .["x"])))
Error: no applicable method for 'predict' applied to an object of class "list"

Now I use bind_cols to add the estimates to exampleTable, but I am looking for a better solution.

estimates <- exampleTable %>% rowwise() %>% do(data.frame(Pred = predict(.$model, newdata = .["x"])))
exampleTable <- bind_cols(exampleTable, estimates)

How can it be done in a single step?

3 solutions

#1


3  

For the record, this is painless in data.table:

library(data.table)
setDT(exampleTable)

# actually, the more typical usage is to set the newdata
#   argument here to .SD (especially for multivariate regressions); see:
#   https://*.com/a/32277135/3576984
exampleTable[ , estimates := predict(lm(y ~ x), data.frame(x)), by = groups]

exampleTable
#     x          y groups  estimates
#  1: 1  0.3123549      A  0.6826629
#  2: 2  2.7636593      A  1.8297796
#  3: 3  1.7771181      A  2.9768963
#  4: 4  5.2031623      A  4.1240130
#  5: 5  4.8281869      A  5.2711297
#  6: 1 10.0000000      B 10.0000000
#  7: 2  8.0000000      B  8.0000000
#  8: 3  6.0000000      B  6.0000000
#  9: 4  4.0000000      B  4.0000000
# 10: 5  2.0000000      B  2.0000000
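
The code comment above mentions passing .SD as the newdata argument. A minimal sketch of that variant follows, assuming data.table is installed. The table name dt, the column name estimates_sd, and the seed are made up for illustration and are not part of the answer.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dt <- data.table(
  x = c(1:5, 1:5),
  y = c((1:5) + rnorm(5), 2 * (5:1)),
  groups = rep(LETTERS[1:2], each = 5)
)

# Within `by = groups`, .SD holds the rows of the current group
# (all columns except the grouping column), so it can serve both
# as the fitting data and as newdata.
dt[, estimates_sd := predict(lm(y ~ x, data = .SD), newdata = .SD), by = groups]

# The form used in the answer, for comparison
dt[, estimates := predict(lm(y ~ x), data.frame(x)), by = groups]

dt[, all.equal(estimates, estimates_sd)]
```

Using .SD avoids rebuilding a data frame by hand, which matters more once the formula has several predictors.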

If you're sold on data.table's clarity as I was, check out the intro vignettes!

Also, you don't really need to group by groups. Just include that as a dummy interaction. If I recall, that's the proper approach to get correct standard errors, anyway:

exampleTable[ , estimates2 := predict(lm(y ~ x * factor(groups)), .SD)]
exampleTable[ , all.equal(estimates, estimates2)]
# [1] TRUE
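
To illustrate why the single interaction model reproduces the per-group predictions, here is a base R sketch that fits the data both ways and compares the results. The names pooled, pred_pooled, and pred_by_group and the seed are illustrative assumptions, not taken from the answer. Note that only the fitted values are expected to coincide: the pooled model estimates a single residual variance across both groups, so its standard errors will generally differ from those of separate per-group fits.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
exampleTable <- data.frame(
  x = c(1:5, 1:5),
  y = c((1:5) + rnorm(5), 2 * (5:1)),
  groups = rep(LETTERS[1:2], each = 5)
)

# One model with a separate intercept and slope for each group
pooled <- lm(y ~ x * factor(groups), data = exampleTable)
pred_pooled <- unname(predict(pooled, newdata = exampleTable))

# One model per group; unsplit() puts the predictions back in row order
pred_by_group <- unsplit(
  lapply(
    split(exampleTable, exampleTable$groups),
    function(d) unname(predict(lm(y ~ x, data = d), newdata = d))
  ),
  exampleTable$groups
)

all.equal(pred_pooled, pred_by_group)
```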

#2


1  

Using the tidyverse:

library(dplyr)
library(purrr)
library(tidyr)
library(broom)

exampleTable <- data.frame(
  x = c(1:5, 1:5),
  y = c((1:5) + rnorm(5), 2*(5:1)),
  groups = rep(LETTERS[1:2], each = 5)
)

exampleTable %>% 
  group_by(groups) %>%
  nest() %>% 
  mutate(model = data %>% map(~lm(y ~ x, data = .))) %>% 
  mutate(Pred = map2(model, data, predict)) %>% 
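  # newer tidyr (>= 1.0) keeps the model list-column and expects c() in unnest()
  select(groups, Pred, data) %>% 
  unnest(c(Pred, data))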

# A tibble: 10 × 4
#    groups      Pred     x          y
#    <fctr>     <dbl> <int>      <dbl>
# 1       A  1.284185     1  0.9305908
# 2       A  1.909262     2  1.9598293
# 3       A  2.534339     3  3.2812002
# 4       A  3.159415     4  2.9283637
# 5       A  3.784492     5  3.5717085
# 6       B 10.000000     1 10.0000000
# 7       B  8.000000     2  8.0000000
# 8       B  6.000000     3  6.0000000
# 9       B  4.000000     4  4.0000000
# 10      B  2.000000     5  2.0000000

#3


0  

Eh, this is only slightly better:

answer = 
  exampleTable %>%
  group_by(groups) %>%
  do(lm( y ~ x , data = .) %>% 
       predict %>% 
       tibble(prediction = .)) %>%
  bind_cols(exampleTable)

I was hoping this would work but it didn't.

answer = 
  exampleTable %>%
  group_by(groups) %>%
  mutate(prediction = 
           lm( y ~ x , data = .) %>% 
           predict)
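
A likely reason the attempt above fails: inside mutate, the `.` in `data = .` is the object passed in by the pipe, which is the whole grouped table rather than the current group. The model is then fitted on all 10 rows and predict returns 10 values for a group of 5. A minimal sketch of a variant that drops `data = .`, so that y and x are looked up in the current group, is shown below. It assumes a reasonably recent dplyr; the seed is arbitrary and only there for reproducibility.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
exampleTable <- data.frame(
  x = c(1:5, 1:5),
  y = c((1:5) + rnorm(5), 2 * (5:1)),
  groups = rep(LETTERS[1:2], each = 5)
)

answer <- exampleTable %>%
  group_by(groups) %>%
  # no `data = .` here: lm() finds y and x among the current group's columns
  mutate(prediction = unname(predict(lm(y ~ x)))) %>%
  ungroup()

answer
```

This adds the prediction column in a single step, without a separate bind_cols call.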
