```
862 2006-05-19 6.241603 5.774208
863 2006-05-20       NA       NA
864 2006-05-21       NA       NA
865 2006-05-22 6.383929 5.906426
866 2006-05-23 6.782068 6.268758
867 2006-05-24 6.534616 6.013767
868 2006-05-25 6.370312 5.856366
869 2006-05-26 6.225175 5.781617
870 2006-05-27       NA       NA
```
I have a data frame x like the one above, with some NA values that I want to fill using the neighboring non-NA values. For example, the value for 2006-05-20 would be the average of the values for the 19th and the 22nd.
How can I do this?
2 Answers
#1
Properly formatted, your data looks like this
```
862 2006-05-19 6.241603 5.774208
863 2006-05-20       NA       NA
864 2006-05-21       NA       NA
865 2006-05-22 6.383929 5.906426
866 2006-05-23 6.782068 6.268758
867 2006-05-24 6.534616 6.013767
868 2006-05-25 6.370312 5.856366
869 2006-05-26 6.225175 5.781617
870 2006-05-27       NA       NA
```
and is of a time-series nature. So I would load it into an object of class zoo (from the zoo package), as that lets you pick from a number of strategies (see below). Which one you pick depends on the nature of your data and application. In general, the field of 'figuring missing data out' is called data imputation, and there is a rather large literature on it.
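The session below assumes a data frame X that holds the question's data, with the dates in column 2 and the two value columns in columns 3 and 4. As a minimal sketch, X can be rebuilt from the posted text with read.table(). The column names id, date, x and y are assumptions; x and y are chosen to match the headers in the printed zoo output.

```r
# Rebuild the question's data; column names are assumed, not from the post.
X <- read.table(text = "
862 2006-05-19 6.241603 5.774208
863 2006-05-20       NA       NA
864 2006-05-21       NA       NA
865 2006-05-22 6.383929 5.906426
866 2006-05-23 6.782068 6.268758
867 2006-05-24 6.534616 6.013767
868 2006-05-25 6.370312 5.856366
869 2006-05-26 6.225175 5.781617
870 2006-05-27       NA       NA",
  col.names  = c("id", "date", "x", "y"),
  colClasses = c("integer", "character", "numeric", "numeric"))

str(X)  # 9 rows; "NA" strings are read as missing values
```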
```
R> library(zoo)
R> x <- zoo(X[,3:4], order.by=as.Date(X[,2]))
R> x
               x     y
2006-05-19 6.242 5.774
2006-05-20    NA    NA
2006-05-21    NA    NA
2006-05-22 6.384 5.906
2006-05-23 6.782 6.269
2006-05-24 6.535 6.014
2006-05-25 6.370 5.856
2006-05-26 6.225 5.782
2006-05-27    NA    NA
R> na.locf(x)      # last observation carried forward
               x     y
2006-05-19 6.242 5.774
2006-05-20 6.242 5.774
2006-05-21 6.242 5.774
2006-05-22 6.384 5.906
2006-05-23 6.782 6.269
2006-05-24 6.535 6.014
2006-05-25 6.370 5.856
2006-05-26 6.225 5.782
2006-05-27 6.225 5.782
R> na.approx(x)    # approximation based on before/after values
               x     y
2006-05-19 6.242 5.774
2006-05-20 6.289 5.818
2006-05-21 6.336 5.862
2006-05-22 6.384 5.906
2006-05-23 6.782 6.269
2006-05-24 6.535 6.014
2006-05-25 6.370 5.856
2006-05-26 6.225 5.782
R> na.spline(x)    # spline fit ...
               x     y
2006-05-19 6.242 5.774
2006-05-20 5.585 5.159
2006-05-21 5.797 5.358
2006-05-22 6.384 5.906
2006-05-23 6.782 6.269
2006-05-24 6.535 6.014
2006-05-25 6.370 5.856
2006-05-26 6.225 5.782
2006-05-27 5.973 5.716
R>
```
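na.approx() interpolates linearly in time, so with two missing days in a row 2006-05-20 gets 6.289 rather than the plain mean of the 19th and the 22nd. If the plain average of the nearest non-NA neighbours is what is wanted, as the question describes, one sketch (assuming the zoo package is installed) is to carry values forward and backward with na.locf() and average the two. The object names and the fallback for an edge NA that has only one neighbour (here 2006-05-27) are illustrative choices, not part of the answer.

```r
library(zoo)

# The question's values, indexed by date (built inline so this runs on its own).
z <- zoo(cbind(x = c(6.241603, NA, NA, 6.383929, 6.782068, 6.534616,
                     6.370312, 6.225175, NA),
               y = c(5.774208, NA, NA, 5.906426, 6.268758, 6.013767,
                     5.856366, 5.781617, NA)),
         order.by = seq(as.Date("2006-05-19"), by = "day", length.out = 9))

# Previous non-NA value (carried forward) and next non-NA value (carried
# backward); na.rm = FALSE keeps edge rows that have no value to carry.
prev_m <- coredata(na.locf(z, na.rm = FALSE))
next_m <- coredata(na.locf(z, fromLast = TRUE, na.rm = FALSE))

# Plain mean of the two neighbours.
avg_m <- (prev_m + next_m) / 2

# Where only one neighbour exists, fall back to that neighbour's value.
miss <- is.na(avg_m)
avg_m[miss] <- ifelse(is.na(prev_m), next_m, prev_m)[miss]

filled <- zoo(avg_m, index(z))
filled
```

With this approach both 2006-05-20 and 2006-05-21 receive the same value, the mean of the 19th and the 22nd, whereas na.approx() spreads the gap linearly across the missing days.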
#2
Depending on the data, tidyr::fill() might be an option:
```r
library(tidyverse)
df %>% fill(x)                     # single column x
df %>% fill(x, y)                  # multiple columns, x and y
df %>% fill(x, .direction = 'up')  # filling from the bottom up rather than top down
```
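Note that fill() copies the nearest non-NA value rather than averaging neighbours, so by default it behaves like zoo's na.locf(). A minimal self-contained sketch on the question's values follows; fill() comes from tidyr, so only that package is loaded, and the data frame df with columns date, x and y is a hypothetical stand-in for the question's data.

```r
library(tidyr)

# Hypothetical data frame shaped like the question's data.
df <- data.frame(
  date = as.Date("2006-05-19") + 0:8,
  x = c(6.241603, NA, NA, 6.383929, 6.782068, 6.534616, 6.370312, 6.225175, NA),
  y = c(5.774208, NA, NA, 5.906426, 6.268758, 6.013767, 5.856366, 5.781617, NA)
)

# Default direction "down": each NA takes the last non-NA value above it.
down <- fill(df, x, y)

# Direction "up": each NA takes the next non-NA value below it; the trailing
# NA on 2006-05-27 has nothing below it, so it stays NA.
up <- fill(df, x, y, .direction = "up")

down
up
```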