Interpolating missing data in a data frame with R

Date: 2020-12-05 22:51:29

I have a dataframe which is similar to the one below:

Country Ccode Year Happiness Power   
1  France    FR 2000      1000  1000  
2  France    FR 2001        NA    NA
3  France    FR 2002        NA    NA
4  France    FR 2003      1600  2200
5  France    FR 2004        NA    NA
6      UK    UK 2000      1000  1000  
7      UK    UK 2001        NA    NA
8      UK    UK 2002      1000  1000  
9      UK    UK 2003      1000  1000
10     UK    UK 2004      1000  1000 

I have previously used the following code to get the differences:

library(dplyr)

df <- df %>%
  arrange(Country, Year) %>%  # sort data
  group_by(Country) %>%
  mutate_if(is.numeric, list(d = ~ . - lag(.)))

I would like to expand on this code: for Happiness and Power, calculate the difference between the surrounding data points, divide it by the number of years between those data points, and use the result to replace the NAs, giving the following output.

Country Ccode Year Happiness Power   
1  France    FR 2000      1000  1000  
2  France    FR 2001      1200  1400    
3  France    FR 2002      1400  1800
4  France    FR 2003      1600  2200
5  France    FR 2004        NA    NA
6      UK    UK 2000      1000  1000  
7      UK    UK 2001      1000  1000
8      UK    UK 2002      1000  1000  
9      UK    UK 2003      1000  1000
10     UK    UK 2004      1000  1000  
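To make the arithmetic concrete for France: Happiness is known in 2000 (1000) and 2003 (1600), so the per-year change is (1600 - 1000) / (2003 - 2000) = 200, and Power changes by (2200 - 1000) / 3 = 400 per year. A minimal R sketch of that calculation, with the values hard-coded from the table above:

```r
# Known France values at the two surrounding data points (from the table above)
year0 <- 2000
year1 <- 2003
missing_years <- c(2001, 2002)

# Per-year change = difference in values / difference in years
happiness_step <- (1600 - 1000) / (year1 - year0)  # 200 per year
power_step     <- (2200 - 1000) / (year1 - year0)  # 400 per year

# Fill each missing year by stepping forward from the earlier data point
happiness_filled <- 1000 + happiness_step * (missing_years - year0)
power_filled     <- 1000 + power_step * (missing_years - year0)

print(happiness_filled)  # 1200 1400
print(power_filled)      # 1400 1800
```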

What would be an efficient way of carrying out this task?

EDIT: Please note that France 2004 is also NA. The extend function does seem to handle such a situation properly.

EDIT 2: Adding group_by(country) seems to break things for unknown reasons: the code appears to be trying to convert a character column to numeric, although I do not really understand why. When I convert the column to character, the error becomes an evaluation error. Any suggestions?

> TRcomplete<-TRcomplete%>%
+     group_by(country) %>%
+     mutate_at(70:73,~na.fill(.x,"extend"))
Error in mutate_impl(.data, dots) : 
  Column `F116.s` can't be converted from character to numeric
> TRcomplete$F116.s <- as.numeric(TRcomplete$F116.s)
> TRcomplete<-TRcomplete%>%
+     group_by(country) %>%
+     mutate_at(70:73,~na.fill(.x,"extend"))
Error in mutate_impl(.data, dots) : 
  Column `F116.s` can't be converted from character to numeric
> TRcomplete$F116.s <- as.numeric(as.character(TRcomplete$F116.s))
> TRcomplete<-TRcomplete%>%
+     group_by(country) %>%
+     mutate_at(70:73,~na.fill(.x,"extend"))
Error in mutate_impl(.data, dots) : 
  Column `F116.s` can't be converted from character to numeric
> TRcomplete$F116.s <- as.character(TRcomplete$F116.s))
Error: unexpected ')' in "TRcomplete$F116.s <- as.character(TRcomplete$F116.s))"
> TRcomplete$F116.s <- as.character(TRcomplete$F116.s)
> str(TRcomplete$F116.s)
 chr [1:6984] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA ...
> TRcomplete<-TRcomplete%>%
+     group_by(country) %>%
+     mutate_at(70:73,~na.fill(.x,"extend"))
Error in mutate_impl(.data, dots) : 
  Evaluation error: need at least two non-NA values to interpolate.

1 answer

#1

You can use na.fill() with fill = "extend" from the zoo package:

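library(zoo)
# rapply() applies na.fill() to each integer/numeric column of the whole data frame, across all countries
rapply(df, na.fill, classes = c("integer", "numeric"), fill = "extend", how = "replace")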
  Country Ccode Year Happiness Power
1  France    FR 2000      1000  1000
2  France    FR 2001      1200  1400
3  France    FR 2003      1400  1800
4  France    FR 2004      1600  2200
5      UK    UK 2000      1000  1000
6      UK    UK 2001      1000  1000
7      UK    UK 2003      1000  1000
8      UK    UK 2004      1000  1000
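fill = "extend" tells na.fill() to interpolate linearly between interior non-NA values and to repeat the first or last non-NA value at either end. A small sketch on the France Happiness values from the question shows both behaviours; note that the trailing 2004 value becomes 1600 rather than staying NA, and that the interpolation is by position in the vector, not by Year.

```r
library(zoo)

# France Happiness for 2000-2004, as in the question
happiness <- c(1000, NA, NA, 1600, NA)

# Interior NAs are linearly interpolated; the trailing NA repeats the last value
filled <- na.fill(happiness, fill = "extend")
print(filled)  # 1000 1200 1400 1600 1600
```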

EDIT:

library(tidyverse)
library(zoo)
df %>%
  group_by(Country) %>%                      # interpolate within each country only
  mutate_at(4:5, ~ na.fill(.x, "extend"))    # columns 4:5 are Happiness and Power

  Country Ccode Year Happiness Power
1  France    FR 2000      1000  1000
2  France    FR 2001      1200  1400
3  France    FR 2003      1400  1800
4  France    FR 2004      1600  2200
5      UK    UK 2000      1000  1000
6      UK    UK 2001      1000  1000
7      UK    UK 2003      1000  1000
8      UK    UK 2004      1000  1000
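na.fill() interpolates by row position, so it only matches the "divide by the difference in years" rule from the question when the years within each country are consecutive. If years can be missing, one option is to interpolate against Year directly with base R's approx(). This is a sketch under assumptions: it uses the column names from the question, interp_by_year() is a hypothetical helper, and rule = 1 leaves values after the last observation as NA (as in the question's expected output for France 2004) instead of extending them.

```r
# Hypothetical helper: linear interpolation of y against year.
# rule = 1 returns NA outside the observed range (no extension at the ends).
interp_by_year <- function(year, y) {
  ok <- !is.na(y)
  if (sum(ok) < 2) return(as.numeric(y))  # approx() needs at least two points
  approx(year[ok], y[ok], xout = year, rule = 1)$y
}

# Data from the question
df <- data.frame(
  Country   = rep(c("France", "UK"), each = 5),
  Ccode     = rep(c("FR", "UK"), each = 5),
  Year      = rep(2000:2004, 2),
  Happiness = c(1000, NA, NA, 1600, NA, 1000, NA, 1000, 1000, 1000),
  Power     = c(1000, NA, NA, 2200, NA, 1000, NA, 1000, 1000, 1000)
)

# Interpolate each column separately within each country, then restore row order
for (col in c("Happiness", "Power")) {
  df[[col]] <- unsplit(
    lapply(split(df[c("Year", col)], df$Country),
           function(g) interp_by_year(g$Year, g[[col]])),
    df$Country
  )
}
print(df)
```

With dplyr 1.0 or later, the same helper can be applied per country with group_by(Country) followed by mutate(across(c(Happiness, Power), ~ interp_by_year(Year, .x))).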

If all the elements in a group are NA, skip the interpolation for that group and return NA:

df %>%
  group_by(Country) %>%
  mutate_if(is.numeric, ~ if (all(is.na(.x))) NA else na.fill(.x, "extend"))
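The "need at least two non-NA values to interpolate" message in EDIT 2 matches the error text of base R's approx(): linear interpolation needs at least two observed points per group. A minimal diagnostic sketch, using hypothetical values for the F116.s column, reproduces the message and lists the countries that have too few observed values, so they can be excluded or handled separately:

```r
# Reproduce the message: approx() drops NA pairs, leaving a single point here
msg <- tryCatch(
  approx(x = c(2000, 2001, 2002), y = c(NA, 1000, NA), xout = 2000:2002),
  error = conditionMessage
)
print(msg)

# Hypothetical data: France has only one observed F116.s value
df <- data.frame(
  Country = rep(c("France", "UK"), each = 3),
  Year    = rep(2000:2002, 2),
  F116.s  = c(NA, 5, NA, 1, 2, 3)
)

# Count non-NA values per country; groups with fewer than two cannot be interpolated
n_obs <- tapply(!is.na(df$F116.s), df$Country, sum)
too_few <- names(n_obs)[n_obs < 2]
print(too_few)  # "France"
```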