I have a data frame with a column of models, and I am trying to add a column of predicted values to it. A minimal example:
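library(dplyr)

exampleTable <- data.frame(x = c(1:5, 1:5),
                           y = c((1:5) + rnorm(5), 2*(5:1)),
                           groups = rep(LETTERS[1:2], each = 5))
# Fit one linear model per group; `model` is a list column
models <- exampleTable %>% group_by(groups) %>% do(model = lm(y ~ x, data = .))
# Attach each group's model to every row of that group
exampleTable <- left_join(tbl_df(exampleTable), models)
# Predict row by row; `Est` is again a list column
estimates <- exampleTable %>% rowwise() %>% do(Est = predict(.$model, newdata = .["x"]))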
How can I add a column of numeric predictions to exampleTable? I tried using mutate to add the column directly to the table, without success.
> exampleTable <- exampleTable %>% rowwise() %>% mutate(data.frame(Pred = predict(.$model, newdata = .["x"])))
Error: no applicable method for 'predict' applied to an object of class "list"
For now I use bind_cols to add the estimates to exampleTable, but I am looking for a better solution.
estimates <- exampleTable %>% rowwise() %>% do(data.frame(Pred = predict(.$model, newdata = .["x"])))
exampleTable <- bind_cols(exampleTable, estimates)
How can it be done in a single step?
3 Answers
#1
3
For the record, this is painless in data.table:
library(data.table)
setDT(exampleTable)
# actually, the more typical usage is to set the newdata
# argument here to .SD (especially for multivariate regressions); see:
# https://*.com/a/32277135/3576984
exampleTable[ , estimates := predict(lm(y ~ x), data.frame(x)), by = groups]
exampleTable
#     x          y groups  estimates
#  1: 1  0.3123549      A  0.6826629
#  2: 2  2.7636593      A  1.8297796
#  3: 3  1.7771181      A  2.9768963
#  4: 4  5.2031623      A  4.1240130
#  5: 5  4.8281869      A  5.2711297
#  6: 1 10.0000000      B 10.0000000
#  7: 2  8.0000000      B  8.0000000
#  8: 3  6.0000000      B  6.0000000
#  9: 4  4.0000000      B  4.0000000
# 10: 5  2.0000000      B  2.0000000
If you're sold on data.table's clarity, as I was, check out the intro vignettes!
Also, you don't really need to group by groups. Just include it in the model as a dummy-variable interaction. If I recall correctly, that's the proper approach to get correct standard errors anyway:
exampleTable[ , estimates2 := predict(lm(y ~ x * factor(groups)), .SD)]
exampleTable[ , all.equal(estimates, estimates2)]
# [1] TRUE
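To make the remark about standard errors concrete, here is a minimal base-R sketch. The data frame `d`, the seed, and all variable names are assumptions made up for this illustration, not part of the question. It compares per-group fits against a single fit with a group interaction: the fitted values coincide, while the standard errors of the fit generally differ, because the pooled model estimates one residual variance across both groups.

```r
set.seed(1)  # hypothetical seed and data, chosen only for this illustration
d <- data.frame(x = rep(1:5, 2), g = rep(c("A", "B"), each = 5))
d$y <- ifelse(d$g == "A", d$x, 2 * (6 - d$x)) + rnorm(10)

# Separate fit per group; the rows of d are already ordered by g,
# so concatenating the per-group results preserves row order
by_group <- lapply(split(d, d$g), function(s) {
  predict(lm(y ~ x, data = s), se.fit = TRUE)
})
fit_sep <- unlist(lapply(by_group, `[[`, "fit"), use.names = FALSE)
se_sep  <- unlist(lapply(by_group, `[[`, "se.fit"), use.names = FALSE)

# Single pooled fit with a group interaction
pooled   <- predict(lm(y ~ x * factor(g), data = d), se.fit = TRUE)
fit_pool <- unname(pooled$fit)
se_pool  <- unname(pooled$se.fit)

all.equal(fit_sep, fit_pool)        # compares the fitted values
isTRUE(all.equal(se_sep, se_pool))  # compares the standard errors
```

Which set of standard errors is appropriate depends on whether a common residual variance across groups is a reasonable assumption for the data at hand.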
#2
1
Using the tidyverse:
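library(dplyr)
library(purrr)
library(tidyr)

exampleTable <- data.frame(
  x = c(1:5, 1:5),
  y = c((1:5) + rnorm(5), 2*(5:1)),
  groups = rep(LETTERS[1:2], each = 5)
)

exampleTable %>%
  group_by(groups) %>%
  nest() %>%
  # fit one model per group, then predict on that group's own data
  mutate(model = map(data, ~ lm(y ~ x, data = .))) %>%
  mutate(Pred = map2(model, data, predict)) %>%
  # drop the model list column before unnesting; older tidyr versions
  # accepted unnest(Pred, data), and column order may vary by version
  select(-model) %>%
  unnest(c(Pred, data))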
# A tibble: 10 × 4
#    groups      Pred     x          y
#    <fctr>     <dbl> <int>      <dbl>
#  1      A  1.284185     1  0.9305908
#  2      A  1.909262     2  1.9598293
#  3      A  2.534339     3  3.2812002
#  4      A  3.159415     4  2.9283637
#  5      A  3.784492     5  3.5717085
#  6      B 10.000000     1 10.0000000
#  7      B  8.000000     2  8.0000000
#  8      B  6.000000     3  6.0000000
#  9      B  4.000000     4  4.0000000
# 10      B  2.000000     5  2.0000000
#3
0
Eh, this is only slightly better:
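answer =
  exampleTable %>%
  group_by(groups) %>%
  # fit per group and return the predictions as a one-column table
  do(lm(y ~ x, data = .) %>%
       predict %>%
       tibble(prediction = .)) %>%
  # relies on exampleTable already being ordered by groups
  bind_cols(exampleTable)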
I was hoping this would work but it didn't.
answer =
  exampleTable %>%
  group_by(groups) %>%
  mutate(prediction =
           lm(y ~ x, data = .) %>%
           predict)
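A possible explanation, offered as a sketch rather than a confirmed diagnosis: inside mutate, `.` stands for the whole piped table, not the current group, so lm() is fit on all ten rows and the predictions do not match the group size. Dropping `data = .` lets lm() pick up `y` and `x` from the current group instead. The seed below is an assumption added only to make the example reproducible.

```r
library(dplyr)

set.seed(42)  # hypothetical seed, only to make the sketch reproducible
exampleTable <- data.frame(
  x = c(1:5, 1:5),
  y = c((1:5) + rnorm(5), 2 * (5:1)),
  groups = rep(LETTERS[1:2], each = 5)
)

answer <- exampleTable %>%
  group_by(groups) %>%
  # y and x resolve to the current group's columns, so lm() is fit per group
  mutate(prediction = predict(lm(y ~ x))) %>%
  ungroup()

answer
```

This adds the numeric prediction column in a single step and keeps the original row order, so no bind_cols is needed.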