Create predictions for subsets of a dataframe and append them to the original

Time: 2021-03-28 18:37:21

I am using R 3.3.2.

I would like to predict the scores of institutions for various subrankings based on their scores in previous years. Then I need to add these predicted scores as new rows to the original dataframe. My input is a csv file.

I want to use a least-squares linear model, and found that "lm" and "predict" do exactly what I need.

I know this is a pretty beginner question, but I hope someone can help me. Please see below the data and code with two solutions I've started.

score<-c(63.6,  60.3,   60.4,   53.4,   46.5,   65.8,   45.8,   65.9,   
44.9,   60, 83.5,   81.7,   81.2,   78.8,   83.3,   79.4,   83.2,   77.3,   
79.4)

year<-c(2013,   2014,   2015,   2016,   2014,   2014,   2015,   2015,   
2016,   2016,   2011,   2012,   2013,   2014,   2014,   2015,   2015,   
2016,   2016)

institution<-c(1422,    1422,   1422,   1422,   1384,   1422,   1384,   
1422,   1384,   1422,   1384,   1384,   1384,   1422,   1384,   1422,   
1384,   1422,   1384)

subranking<-c('CMP',    'CMP',  'CMP',  'CMP',  'SSC',  'SSC',  'SSC',  
'SSC',  'SSC',  'SSC',  'ETC',  'ETC',  'ETC',  'ETC',  'ETC',  'ETC',  
'ETC',  'ETC',  'ETC')

d <- data.frame(score, year, institution,subranking)


#-----------SOLUTION 1 -------------------

p<- unique(d$institution)
for (i in (1:length(p))){
  x<- d$score[d$institution==p[i]]
  y<- d$year[d$institution==p[i]]
  model<- lm(x~y)
  result<-predict(model, data.frame(y = c(2017,2018,2019,2020)))
  z<- cbind(result,data.frame(y = c(2017,2018,2019,2020)))
  print(z)
}

##----------SOLUTION 2 -------------------

calculate_predicted_scores <- function(scores, years) {
  # Fit score as a linear function of year, then predict 2017-2020
  mod <- lm(scores ~ years)
  predicted_scores <- predict(mod, data.frame(years = c(2017, 2018, 2019, 2020)))
  return(predicted_scores)
}

To illustrate, this is what I want to get at the end - the yellow rows are the predictions:

[Image: desired output, with the predicted rows highlighted in yellow]

1 Solution

#1


You can try dplyr with broom, as described in this very helpful answer.

library(dplyr)
library(broom)
pred_per_group = d %>% group_by(subranking, institution) %>%
  do(predicted_scores=predict(lm(score ~ year, data=.), data.frame(year = c(2017,2018,2019, 2020))))
pred_df = tidy(pred_per_group, predicted_scores)

Then, add the resulting data frame with predictions to yours with rbind.

pred_df <- data.frame(score=pred_df$x, year=rep(c(2017,2018,2019,2020), 5), institution=pred_df$institution, subranking=pred_df$subranking)
result <- rbind(d, pred_df)
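
The dplyr/broom version above depends on tidy() accepting the rowwise result of do(), which, as far as I know, newer broom releases no longer support, and it hard-codes the number of groups (5). As a hedged alternative, here is a minimal base-R sketch of the same idea: split the data by institution and subranking, fit lm per group, predict, and rbind the predictions onto the original. The names future_years and predict_group are illustrative and not from the original post.

```r
# Data from the question
score <- c(63.6, 60.3, 60.4, 53.4, 46.5, 65.8, 45.8, 65.9, 44.9, 60,
           83.5, 81.7, 81.2, 78.8, 83.3, 79.4, 83.2, 77.3, 79.4)
year <- c(2013, 2014, 2015, 2016, 2014, 2014, 2015, 2015, 2016, 2016,
          2011, 2012, 2013, 2014, 2014, 2015, 2015, 2016, 2016)
institution <- c(1422, 1422, 1422, 1422, 1384, 1422, 1384, 1422, 1384, 1422,
                 1384, 1384, 1384, 1422, 1384, 1422, 1384, 1422, 1384)
subranking <- c('CMP', 'CMP', 'CMP', 'CMP', 'SSC', 'SSC', 'SSC', 'SSC', 'SSC',
                'SSC', 'ETC', 'ETC', 'ETC', 'ETC', 'ETC', 'ETC', 'ETC', 'ETC',
                'ETC')
d <- data.frame(score, year, institution, subranking,
                stringsAsFactors = FALSE)

# Years to predict (illustrative name)
future_years <- c(2017, 2018, 2019, 2020)

# Fit one linear model for a single group and return its predictions
# as rows with the same columns as d (illustrative helper)
predict_group <- function(g) {
  fit <- lm(score ~ year, data = g)
  data.frame(
    score = unname(predict(fit, data.frame(year = future_years))),
    year = future_years,
    institution = g$institution[1],
    subranking = g$subranking[1],
    stringsAsFactors = FALSE
  )
}

# drop = TRUE skips combinations with no rows (e.g. 1384 / CMP)
groups <- split(d, list(d$institution, d$subranking), drop = TRUE)
pred_df <- do.call(rbind, lapply(groups, predict_group))

# Append the predicted rows to the original data
result <- rbind(d, pred_df)
rownames(result) <- NULL
```

Because the number of groups comes from split(), nothing needs to change if institutions or subrankings are added to the data.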

EDIT on 3 Aug: since you wanted to finish your own approach to the code, I would go about it as follows:

p <- unique(d$institution)
r <- unique(d$subranking)
for (i in seq_along(p)) {
  for (j in seq_along(r)) {
    # Scores and years for this institution/subranking combination
    score <- d$score[d$institution == p[i] & d$subranking == r[j]]
    year <- d$year[d$institution == p[i] & d$subranking == r[j]]
    if (length(score) == 0) {
      print(sprintf("No level for the following combination: Institution: %s and Subrank: %s", p[i], r[j]))
    } else {
      model <- lm(score ~ year)
      result <- predict(model, data.frame(year = c(2017, 2018, 2019, 2020)))
      z <- cbind(result, data.frame(year = c(2017, 2018, 2019, 2020)))
      print(sprintf("For Institution: %s and Subrank: %s the Score is:", p[i], r[j]))
      print(z)
    }
  }
}

giving

[1] "For Institution: 1422 and Subrank: CMP the Score is:"
  result year
1  51.80 2017
2  48.75 2018
3  45.70 2019
4  42.65 2020
[1] "For Institution: 1422 and Subrank: SSC the Score is:"
  result year
1   58.1 2017
2   55.2 2018
3   52.3 2019
4   49.4 2020
[1] "For Institution: 1422 and Subrank: ETC the Score is:"
  result year
1  77.00 2017
2  76.25 2018
3  75.50 2019
4  74.75 2020
[1] "No level for the following combination: Institution: 1384 and Subrank: CMP"
[1] "For Institution: 1384 and Subrank: SSC the Score is:"
    result year
1 44.13333 2017
2 43.33333 2018
3 42.53333 2019
4 41.73333 2020
[1] "For Institution: 1384 and Subrank: ETC the Score is:"
     result year
1 80.66000 2017
2 80.26286 2018
3 79.86571 2019
4 79.46857 2020
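
The loop above only prints the predictions. To get them back into the dataframe as new rows, one possible variation is to collect each group's predictions in a list and rbind everything once at the end. This is a sketch under the same data as the question; pieces, future_years, inst and sub are illustrative names.

```r
# Data from the question
score <- c(63.6, 60.3, 60.4, 53.4, 46.5, 65.8, 45.8, 65.9, 44.9, 60,
           83.5, 81.7, 81.2, 78.8, 83.3, 79.4, 83.2, 77.3, 79.4)
year <- c(2013, 2014, 2015, 2016, 2014, 2014, 2015, 2015, 2016, 2016,
          2011, 2012, 2013, 2014, 2014, 2015, 2015, 2016, 2016)
institution <- c(1422, 1422, 1422, 1422, 1384, 1422, 1384, 1422, 1384, 1422,
                 1384, 1384, 1384, 1422, 1384, 1422, 1384, 1422, 1384)
subranking <- c('CMP', 'CMP', 'CMP', 'CMP', 'SSC', 'SSC', 'SSC', 'SSC', 'SSC',
                'SSC', 'ETC', 'ETC', 'ETC', 'ETC', 'ETC', 'ETC', 'ETC', 'ETC',
                'ETC')
d <- data.frame(score, year, institution, subranking,
                stringsAsFactors = FALSE)

future_years <- c(2017, 2018, 2019, 2020)
pieces <- list()  # one data frame of predictions per group

for (inst in unique(d$institution)) {
  for (sub in unique(d$subranking)) {
    rows <- d[d$institution == inst & d$subranking == sub, ]
    if (nrow(rows) == 0) next  # no data for this combination
    model <- lm(score ~ year, data = rows)
    pieces[[length(pieces) + 1]] <- data.frame(
      score = unname(predict(model, data.frame(year = future_years))),
      year = future_years,
      institution = inst,
      subranking = sub,
      stringsAsFactors = FALSE
    )
  }
}

# Original rows first, predicted rows appended after them
result <- rbind(d, do.call(rbind, pieces))
```

Binding once after the loop avoids growing the dataframe row by row inside it.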
