I have a very large data frame in R containing weather data in the following format.
  valid             temp
1 17/08/2014 00:20  14
2 17/08/2014 00:50  14
3 17/08/2014 01:20  13.5
4 17/08/2014 01:50  13
5 17/08/2014 02:20  12
6 17/08/2014 02:50  10
I would like to convert these sub-hourly data to hourly averages, like the following.
  valid                tmpc
1 2014-08-17 00:00:00  14
2 2014-08-17 01:00:00  13.75
3 2014-08-17 02:00:00  12.5
The class of df$valid is 'factor'. I first tried converting the values to dates via POSIXct, but that returns only NA values. I also tried changing the system locale, and I still get NAs.
2 solutions
#1
We can do this in base R by converting to POSIXlt, setting the minute to 0, converting back to POSIXct, and using aggregate to get the mean of 'temp'.
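# Parse the day/month/year timestamps into POSIXlt
df1$valid <- strptime(df1$valid, "%d/%m/%Y %H:%M")
# Zero the minutes so every reading falls on the start of its hour
df1$valid$min <- 0
# Convert back to POSIXct so the column can be used for grouping
df1$valid <- as.POSIXct(df1$valid)
aggregate(temp ~ valid, df1, FUN = mean)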
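As a self-contained sketch of the same flooring idea, the six sample rows from the question can be rebuilt and floored with trunc() instead of zeroing the `min` field. The names `parsed`, `hour` and `hourly_floor` are only illustrative, and tz = "UTC" is an assumption (the question does not state a time zone).

```r
# Sample rows from the question; 'valid' is a factor, as described there.
df1 <- data.frame(
  valid = factor(c("17/08/2014 00:20", "17/08/2014 00:50", "17/08/2014 01:20",
                   "17/08/2014 01:50", "17/08/2014 02:20", "17/08/2014 02:50")),
  temp = c(14, 14, 13.5, 13, 12, 10)
)

# Parse with an explicit day/month/year format; tz = "UTC" is an assumption.
parsed <- as.POSIXct(strptime(as.character(df1$valid), "%d/%m/%Y %H:%M", tz = "UTC"))

# trunc() drops the minutes, which floors each timestamp to its hour.
df1$hour <- as.POSIXct(trunc(parsed, units = "hours"))

# Mean temperature per floored hour.
hourly_floor <- aggregate(temp ~ hour, data = df1, FUN = mean)
print(hourly_floor)
```

On these six rows the hourly means come out as 14, 13.25 and 11. Note that this differs from the 13.75 and 12.5 shown in the question for 01:00 and 02:00, so flooring may not be exactly what was asked for.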
#2
Option 1: The lubridate solution, using ceiling_date or round_date. It is not clear from your data frame and expected results whether you want to round to the nearest hour or round up (ceiling). For instance, in the first row you appear to be rounding, and in the third taking the ceiling. Anyway, here is an example:
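library(lubridate)
df <- data.frame(i = 1, valid = "17/08/2014 01:28", temp = 14)
# Parse day/month/year hour:minute strings into POSIXct (UTC by default)
df$valid <- dmy_hm(df$valid)
# Round up to the next full hour; round_date() rounds to the nearest hour instead
df$valid_round <- ceiling_date(df$valid, unit = "hours")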
Results:
  i valid               temp valid_round
1 1 2014-08-17 01:28:00 14   2014-08-17 02:00:00
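To make the difference between the three lubridate rounding helpers concrete, here is a small sketch applying floor_date, round_date and ceiling_date to the same 01:28 timestamp. The variable names are illustrative only.

```r
library(lubridate)

# Same timestamp as in the example; dmy_hm() parses it as UTC by default.
x <- dmy_hm("17/08/2014 01:28")

x_floor   <- floor_date(x, unit = "hour")    # start of the current hour: 01:00
x_round   <- round_date(x, unit = "hour")    # nearest hour: 01:00, since 28 < 30
x_ceiling <- ceiling_date(x, unit = "hour")  # next full hour: 02:00

print(data.frame(x, x_floor, x_round, x_ceiling))
```

Whichever helper matches the intended binning can then be used to build the grouping column before averaging 'temp'.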
Option 2: using base functions. Use: df$valid <- as.POSIXct(strptime(df$valid, "%d/%m/%Y %H:%M", tz = "UTC")) and then round it.
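The "then round it" step is not spelled out above. One way to sketch it in base R is with round() on the parsed timestamps, followed by aggregate(). This is only a sketch under assumptions: the sample rows are copied from the question, and the names `df`, `hour` and `hourly_round` are illustrative.

```r
# Sample rows from the question; 'valid' is a factor, as described there.
df <- data.frame(
  valid = factor(c("17/08/2014 00:20", "17/08/2014 00:50", "17/08/2014 01:20",
                   "17/08/2014 01:50", "17/08/2014 02:20", "17/08/2014 02:50")),
  temp = c(14, 14, 13.5, 13, 12, 10)
)

# Parse as in Option 2 (as.character() makes the factor handling explicit).
df$valid <- as.POSIXct(strptime(as.character(df$valid), "%d/%m/%Y %H:%M", tz = "UTC"))

# round() on a date-time rounds to the nearest hour and returns POSIXlt,
# so convert back to POSIXct before grouping.
df$hour <- as.POSIXct(round(df$valid, "hours"))

# Mean temperature per rounded hour.
hourly_round <- aggregate(temp ~ hour, data = df, FUN = mean)
print(hourly_round)
```

On the six sample rows this gives 14, 13.75 and 12.5 for 00:00, 01:00 and 02:00, which matches the values shown in the question, plus a 03:00 row (10) that comes from the 02:50 reading.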