I have one week of data with a reading every 5 seconds. An example of the data is below.
9/1/2012 00:00:00 1
9/1/2012 00:00:05 2
9/1/2012 00:00:10 3
I want to calculate the hourly average for each day, then make a multi-line plot of "average hourly reading vs. hour" with one line per date.
What I have so far computes the weekly average:
data$date = as.POSIXct(strptime(data$date,
                                format = "%d/%m/%Y %H:%M", tz = "GMT"))
means <- aggregate(data["nox"], format(data["date"], "%Y-%U"),
                   mean, na.rm = TRUE)
For the daily average, it is:
data$date = as.POSIXct(strptime(data$date,
                                format = "%d/%m/%Y %H:%M", tz = "GMT"))
means <- aggregate(data["nox"], format(data["date"], "%Y-%j"),
                   mean, na.rm = TRUE)
Does anyone know how to calculate the hourly average for each day?
3 Answers
#1
8
I like @DWin's answer, but I also remembered once seeing the help file for ?cut.Date, which can be used in this case as well. I've made up some data so you can see the results over a few hours:
set.seed(1)
data <- data.frame(date = seq(from = ISOdatetime(2012, 01, 01, 00, 00, 00),
                              length.out = 4320, by = 5),
                   nox = sample(1:20, 4320, replace = TRUE))
hr.means <- aggregate(data["nox"],
                      list(hour = cut(data$date, breaks = "hour")),
                      mean, na.rm = TRUE)
hr.means
#                  hour      nox
# 1 2012-01-01 00:00:00 10.60694
# 2 2012-01-01 01:00:00 10.13194
# 3 2012-01-01 02:00:00 10.33333
# 4 2012-01-01 03:00:00 10.38194
# 5 2012-01-01 04:00:00 10.51111
# 6 2012-01-01 05:00:00 10.26944
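The question also asks for a plot with one line per date, which the hourly means above do not yet give. A minimal base-R sketch of one way to get there, assuming the same kind of made-up data stretched to three days and timestamps in GMT. The names `day`, `hour`, `hourly` and `plot_hourly_by_day` are illustrative and not part of the original answer.

```r
set.seed(1)

# Made-up data: three days of readings every 5 seconds (17280 per day), in GMT
n <- 3 * 17280
data <- data.frame(
  date = seq(from = ISOdatetime(2012, 1, 1, 0, 0, 0, tz = "GMT"),
             length.out = n, by = 5),
  nox  = sample(1:20, n, replace = TRUE)
)

# Split each timestamp into a calendar day and an hour of day (0-23)
data$day  <- format(data$date, "%Y-%m-%d", tz = "GMT")
data$hour <- as.integer(format(data$date, "%H", tz = "GMT"))

# Mean reading for every hour/day pair: rows are hours, columns are days
hourly <- tapply(data$nox, list(hour = data$hour, day = data$day),
                 mean, na.rm = TRUE)

# Draw one line per day, with hour of day on the x-axis
plot_hourly_by_day <- function(m) {
  matplot(as.integer(rownames(m)), m, type = "l", lty = 1,
          col = seq_len(ncol(m)),
          xlab = "Hour of day", ylab = "Mean nox")
  legend("topright", legend = colnames(m), lty = 1,
         col = seq_len(ncol(m)), bty = "n")
}
```

Calling `plot_hourly_by_day(hourly)` draws the figure. Keeping the means in an hour-by-day matrix is a convenience for `matplot`, which plots each column as its own line.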
#2
5
It would only require changing the format specification in the by-vector:
hr.means <- aggregate(dat["V1"], format(dat["date"], "%Y-%m-%d %H"),
                      mean, na.rm = TRUE)
hr.means
#---------
#            date V2
# 1 2012-01-09 00  2
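To make the grouping step concrete, here is a small self-contained sketch of the same idea using the three sample rows from the question. It assumes the timestamps are day/month/year in GMT (as in the question's own format string) and that the reading column is called `V1`; the grouping column name `hour` is illustrative.

```r
# The three sample rows from the question, parsed as day/month/year in GMT
dat <- data.frame(
  date = as.POSIXct(c("9/1/2012 00:00:00",
                      "9/1/2012 00:00:05",
                      "9/1/2012 00:00:10"),
                    format = "%d/%m/%Y %H:%M:%S", tz = "GMT"),
  V1 = c(1, 2, 3)
)

# Formatting down to the hour gives every reading in the same hour of the
# same day an identical label, which aggregate() then uses as the group key
hr.means <- aggregate(dat["V1"],
                      list(hour = format(dat$date, "%Y-%m-%d %H", tz = "GMT")),
                      mean, na.rm = TRUE)
hr.means
```

All three readings fall in the same hour, so the result has a single row whose mean is 2.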
#3
0
I came to this question from another page, so my data is in a slightly different form, but with lubridate you can easily parse your date format as well.
library(tibble)
library(dplyr)
library(lubridate)
tbl <- tribble(
  ~TIME,                 ~MEASURE,
  "2018-01-01 06:58:50", 05,
  "2018-01-01 07:00:00", 10,
  "2018-01-01 07:04:45", 20,
  "2018-01-01 07:04:55", 25,
  "2018-01-01 07:21:00", 20,
  "2018-01-01 07:58:04", 18,
  "2018-01-01 07:59:59", 12,
  "2018-01-01 08:00:00", 17,
  "2018-01-01 08:01:04", 30
) %>% mutate(TIME = ymd_hms(TIME))
With the data in a form where you can manipulate the date/time, you can summarise it per date and hour, or per hour over all dates, like this:
# if you want per date
tbl %>%
  mutate(date = date(TIME), hour = hour(TIME)) %>%
  group_by(date, hour) %>% summarise(m = mean(MEASURE))
# if you want per hour over all dates
tbl %>%
  mutate(hour = hour(TIME)) %>%
  group_by(hour) %>% summarise(m = mean(MEASURE))
To plot it using points and lines with ggplot2, you can do:
library(ggplot2)
tbl %>%
  mutate(hour = hour(TIME)) %>%
  group_by(hour) %>% summarise(m = mean(MEASURE)) %>%
  ggplot(aes(x = hour, y = m)) + geom_point() + geom_line()
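The plot above averages over all dates. For the one-line-per-date plot the question asks about, the per-date summary can be mapped to colour instead. A hedged sketch, assuming a made-up table `tbl2` with readings on two dates (the values are invented for illustration) and a dplyr version that supports the `.groups` argument of `summarise()`:

```r
library(tibble)
library(dplyr)
library(lubridate)
library(ggplot2)

# Made-up readings on two dates (illustrative values only)
tbl2 <- tribble(
  ~TIME,                 ~MEASURE,
  "2018-01-01 07:00:00", 10,
  "2018-01-01 07:30:00", 20,
  "2018-01-01 08:00:00", 17,
  "2018-01-02 07:00:00", 12,
  "2018-01-02 07:30:00", 18,
  "2018-01-02 08:00:00", 30
) %>% mutate(TIME = ymd_hms(TIME))

# Hourly mean for each date
per_day <- tbl2 %>%
  mutate(day = as_date(TIME), hour = hour(TIME)) %>%
  group_by(day, hour) %>%
  summarise(m = mean(MEASURE), .groups = "drop")

# One coloured line per date, hour of day on the x-axis
p <- ggplot(per_day, aes(x = hour, y = m,
                         colour = factor(day), group = day)) +
  geom_point() + geom_line() +
  labs(x = "Hour of day", y = "Mean MEASURE", colour = "Date")
```

Printing `p` draws the figure. Converting `day` to a factor gives a discrete colour per date rather than a continuous date scale.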