How to analyse irregular time series in R

Time: 2021-10-01 05:54:22

I have a zoo time series in R:

d <- structure(c(50912, 50912, 50912, 50912, 50913, 50913, 50914, 
50914, 50914, 50915, 50915, 50915, 50916, 50916, 50916, 50917, 
50917, 50917, 50918, 50918, 2293.8, 2302.64, 2310.5, 2324.02, 
2312.25, 2323.93, 2323.83, 2338.67, 2323.1, 2320.77, 2329.73, 
2319.63, 2330.86, 2323.38, 2322.92, 2317.71, 2322.76, 2286.64, 
2294.83, 2305.06, 55.9, 62.8, 66.4, 71.9, 59.8, 65.7, 61.9, 67.9, 
38.5, 36.7, 43.2, 30.3, 42.4, 33.5, 48.8, 52.7, 61.2, 30, 41.7, 
50, 8.6, 9.7, 10.3, 11.1, 9.2, 10.1, 9.6, 10.4, 5.9, 5.6, 6.6, 
4.7, 6.5, 5.2, 7.5, 8.1, 9.5, 4.6, 6.4, 7.7, 9.29591864400155, 
10.6585128174944, 10.4386464748912, 11.5738448647708, 10.9486074772952, 
10.9546547052814, 10.3733963771546, 9.15627378048238, 8.22993822910891, 
5.69045896511178, 6.95269658370746, 7.78781665368086, 7.20089569039135, 
4.9759716583555, 8.99378907920762, 10.0924594632635, 10.3909638115674, 
6.28203685114275, 9.16021859457356, 7.56829801052175, 0.695918644001553, 
0.9585128174944, 0.138646474891241, 0.473844864770827, 1.74860747729523, 
0.854654705281426, 0.773396377154565, -1.24372621951762, 2.32993822910891, 
0.0904589651117833, 0.352696583707458, 3.08781665368086, 0.700895690391349, 
-0.224028341644497, 1.49378907920762, 1.99245946326349, 0.890963811567351, 
1.68203685114275, 2.76021859457356, -0.131701989478247), .Dim = c(20L, 
6L), .Dimnames = list(NULL, c("station_id", "ztd", "zwd", "iwv", 
"radiosonde", "error")), index = structure(c(892094400, 892116000, 
892137600, 892159200, 892180800, 892245600, 892267200, 892288800, 
892332000, 892353600, 892375200, 892418400, 892440000, 892461600, 
892504800, 892526400, 892548000, 892591200, 892612800, 892634400
), class = c("POSIXct", "POSIXt")), class = "zoo")

I want to perform some of the analyses that the ts package allows me to do, such as decomposing the time series into trend and seasonal components and looking at the autocorrelation function. However, trying to do any of these gives the error: Error in na.fail.default(as.ts(x)) : missing values in object.

Looking into this in more depth, it seems that all of these functions work on ts objects, which by definition have regularly spaced observations. My observations aren't regularly spaced, so I end up with a lot of NAs and everything fails.

Is there a way to analyse the irregular time-series in R? Or do I need to convert them to be regular somehow? If so, is there a simple way to do this?

1 solution

#1


13  

I have analysed such irregular data in the past using an additive model to "decompose" the seasonal and trend components. As this is a regression-based approach you need to model the residuals as a time series process to account for lack of independence in the residuals.

I used the mgcv package for these analyses. Essentially the model fitted is:

library(mgcv)  # gamm()
library(nlme)  # corCAR1()
# foo: data frame with numeric columns response, dayOfYear and timeOfSampling
mod <- gamm(response ~ s(dayOfYear, bs = "cc") + s(timeOfSampling), data = foo,
            correlation = corCAR1(form = ~ timeOfSampling))

This fits a cyclic spline in the day-of-year variable dayOfYear for the seasonal term, while the trend is represented by a smooth of timeOfSampling, which is a numeric variable. The residuals are modelled here as a continuous-time AR(1), using timeOfSampling as the time component of the CAR(1). This assumes that the correlation between residuals drops off exponentially with increasing temporal separation.
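As a rough sketch of how the inputs to that call might be prepared, the following builds the two covariates from the POSIXct index of a zoo series and then shows the correlation pattern a CAR(1) implies. The layout of foo, the choice of days as the time unit, and the value of phi are assumptions made for illustration; the series z simply reuses the first six timestamps and iwv values from the question.

```r
library(zoo)

# Stand-in for the question's `d`: its first six timestamps and `iwv` values.
idx <- as.POSIXct(c(892094400, 892116000, 892137600,
                    892159200, 892180800, 892245600),
                  origin = "1970-01-01", tz = "UTC")
z <- zoo(c(8.6, 9.7, 10.3, 11.1, 9.2, 10.1), order.by = idx)

# Assumed layout of `foo`; the column names follow the gamm() call above.
foo <- data.frame(
  response       = as.numeric(coredata(z)),
  # Day of year (1-366), the covariate for the cyclic seasonal smooth
  dayOfYear      = as.numeric(format(index(z), "%j")),
  # Days elapsed since the first observation, used for the trend and the CAR(1)
  timeOfSampling = as.numeric(difftime(index(z), index(z)[1], units = "days"))
)

# CAR(1): the correlation between residuals s time units apart is phi^s.
phi  <- 0.5  # illustrative value only, not an estimate from any fit
lags <- abs(outer(foo$timeOfSampling, foo$timeOfSampling, "-"))
car1 <- phi^lags
round(car1[1, ], 3)  # correlation of the first observation with each of the others
```

Because the spacing enters only through the lags, nothing here requires the observations to be regular. Note that the sample series in the question spans only about a week, so a seasonal smooth of dayOfYear would need a much longer record to be estimable.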

I have written some blog posts on some of these ideas:

  1. Smoothing temporally correlated data
  2. Additive modelling and the HadCRUT3v global mean temperature series

which contain additional R code for you to follow.
