I have some data that I need to analyse. I want to create a graph of the average usage per day of the week. The data is in a data.table with the following structure:
time value
2014-10-22 23:59:54 7433033.0
2014-10-23 00:00:12 7433034.0
2014-10-23 00:00:31 7433035.0
2014-10-23 00:00:49 7433036.0
...
2014-10-23 23:59:21 7443032.0
2014-10-23 23:59:40 7443033.0
2014-10-23 23:59:59 7443034.0
2014-10-24 00:00:19 7443035.0
Since the value is cumulative, I need the maximum value of each day minus the minimum value of that day, and then the average of those daily differences for each day of the week.
I already know how to get the day of the week (using as.POSIXlt and $wday). So how can I get the daily difference? Once I have the data in a structure like:
dayOfWeek value
0 10
1 20
2 50
I should be able to find the mean myself using some functions.
Here is a sample:
library(data.table)
data <- fread("http://pastebin.com/raw.php?i=GXGiCAiu", header=T)
#get the difference per day
#create average per day of week
3 solutions
#1
This is actually a trickier problem than it seems at first glance. I think you need two separate aggregations: one to aggregate the cumulative usage values within each calendar day by taking the difference of the range, and a second to aggregate the per-calendar-day usage values by weekday. You can extract the weekday with weekdays(), calculate the daily difference with diff() on the range(), calculate the mean with mean(), and aggregate with aggregate():
set.seed(1);
N <- as.integer(60*60*24/19*14);
df <- data.frame(time=seq(as.POSIXct('2014-10-23 00:00:12',tz='UTC'),by=19,length.out=N)+rnorm(N,0,0.5), value=seq(7433034,by=1,length.out=N)+rnorm(N,0,0.5) );
head(df);
## time value
## 1 2014-10-23 00:00:11 7433034
## 2 2014-10-23 00:00:31 7433035
## 3 2014-10-23 00:00:49 7433036
## 4 2014-10-23 00:01:09 7433037
## 5 2014-10-23 00:01:28 7433039
## 6 2014-10-23 00:01:46 7433039
tail(df);
## time value
## 63658 2014-11-05 23:58:14 7496691
## 63659 2014-11-05 23:58:33 7496692
## 63660 2014-11-05 23:58:51 7496693
## 63661 2014-11-05 23:59:11 7496694
## 63662 2014-11-05 23:59:31 7496695
## 63663 2014-11-05 23:59:49 7496697
df2 <- aggregate(value~date,cbind(df,date=as.Date(df$time)),function(x) diff(range(x)));
df2;
## date value
## 1 2014-10-23 4547.581
## 2 2014-10-24 4546.679
## 3 2014-10-25 4546.410
## 4 2014-10-26 4545.726
## 5 2014-10-27 4546.602
## 6 2014-10-28 4545.194
## 7 2014-10-29 4546.136
## 8 2014-10-30 4546.454
## 9 2014-10-31 4545.712
## 10 2014-11-01 4546.901
## 11 2014-11-02 4544.684
## 12 2014-11-03 4546.378
## 13 2014-11-04 4547.061
## 14 2014-11-05 4547.082
df3 <- aggregate(value~dayOfWeek,cbind(df2,dayOfWeek=weekdays(df2$date)),mean);
df3;
## dayOfWeek value
## 1 Friday 4546.196
## 2 Monday 4546.490
## 3 Saturday 4546.656
## 4 Sunday 4545.205
## 5 Thursday 4547.018
## 6 Tuesday 4546.128
## 7 Wednesday 4546.609
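Note that aggregate() sorts the weekday names alphabetically, and weekdays() returns locale-dependent names. A minimal sketch, assuming you want the rows in calendar order for plotting, is to group on the numeric weekday from format(date, "%u") instead. The `daily` data frame and its values are made up for illustration and stand in for df2:

```r
# Hypothetical per-day usage, standing in for df2 above (values are made up)
daily <- data.frame(
  date  = as.Date("2014-10-23") + 0:13,   # two full weeks, starting on a Thursday
  value = rep(c(10, 20, 30, 40, 50, 60, 70), 2)
)

# ISO weekday number: 1 = Monday, ..., 7 = Sunday (not locale-dependent)
daily$dayOfWeek <- as.integer(format(daily$date, "%u"))

# Mean usage per weekday; rows come out ordered Monday to Sunday
byWeekday <- aggregate(value ~ dayOfWeek, daily, mean)
byWeekday

# Bar chart of the averages (uncomment to draw)
# barplot(byWeekday$value, names.arg = byWeekday$dayOfWeek)
```

Grouping on a number rather than a name keeps the bars in a predictable order regardless of the session locale.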
#2
There are many ways to do this in R. You can use ave from base R, or the data.table or dplyr packages. These solutions all add the summaries as columns of your data.
data
df <- data.frame(dayOfWeek = c(0L, 0L, 1L, 1L, 2L),
value = c(10L, 5L, 20L, 60L, 50L))
base R
df$min <- ave(df$value, df$dayOfWeek, FUN = min)
df$max <- ave(df$value, df$dayOfWeek, FUN = max)
data.table
require(data.table)
setDT(df)[, ":="(min = min(value), max = max(value)), by = dayOfWeek][]
dplyr
require(dplyr)
df %>% group_by(dayOfWeek) %>% mutate(min = min(value), max = max(value))
If you just want the summaries, you can also use the following:
# base
aggregate(value~dayOfWeek, df, FUN = min)
aggregate(value~dayOfWeek, df, FUN = max)
# data.table
setDT(df)[, list(min = min(value), max = max(value)), by = dayOfWeek]
# dplyr
df %>% group_by(dayOfWeek) %>% summarise(min(value), max(value))
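The min and max summaries can also be collapsed into the per-group difference that the question asks for. A small sketch on the same toy data (rebuilt here so the block runs on its own), using only base R; the name `spread` is just an illustrative choice:

```r
df <- data.frame(dayOfWeek = c(0L, 0L, 1L, 1L, 2L),
                 value     = c(10L, 5L, 20L, 60L, 50L))

# max - min within each group, computed as diff(range(x))
spread <- aggregate(value ~ dayOfWeek, df, FUN = function(x) diff(range(x)))
spread
```

A group with a single row, like dayOfWeek 2 here, gets a difference of 0.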
#3
Came across this while looking for something else. I think you were looking for the difference and mean per Monday, Tuesday, etc. Sticking with data.table allows a quick all-in-one call to get the mean per day of the week and the difference per day of the week.
library(data.table)
data <- fread("http://pastebin.com/raw.php?i=GXGiCAiu", header=T)
data_summary <- data[,list(mean = mean(value),
diff = max(value)-min(value)),
by = list(date = format(as.POSIXct(time), format = "%A"))]
This gives an output of 7 rows and three columns.
date mean diff
1: Thursday 7470107 166966
2: Friday 7445945 6119
3: Saturday 7550000 100000
4: Sunday 7550000 100000
5: Monday 7550000 100000
6: Tuesday 7550000 100000
7: Wednesday 7550000 100000
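One caveat, following the reasoning in the first answer: because the value is cumulative, max(value) - min(value) taken over all Thursdays at once spans every week in the data rather than a single day. A hedged sketch of a two-step data.table variant is below. It uses a small made-up table (`dt`, with hypothetical readings) instead of the pastebin data, and the column names `usage` and `mean_usage` are illustrative:

```r
library(data.table)

# Made-up cumulative meter readings: two per day over 14 days (illustrative only)
dt <- data.table(
  time  = as.POSIXct("2014-10-23 00:00:00", tz = "UTC") + (0:27) * 12 * 3600,
  value = 1000 + (0:27) * 5
)

# Step 1: usage per calendar day = highest minus lowest reading of that day
daily <- dt[, .(usage = max(value) - min(value)),
            by = .(date = as.Date(time, tz = "UTC"))]

# Step 2: average the daily usage per weekday (0 = Sunday, as with $wday)
weekly <- daily[, .(mean_usage = mean(usage)),
                by = .(dayOfWeek = as.POSIXlt(date)$wday)][order(dayOfWeek)]
weekly
```

With real data, passing the time zone explicitly to as.Date() matters, since readings near midnight can otherwise be assigned to the wrong calendar day.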