I'm new to R and I'm working with a (private) dataset.
I have the following problem. I have a lot of timestamps:
2015-04-27 12:29:48
2015-04-27 12:31:48
2015-04-27 12:34:50
2015-04-27 12:50:43
2015-04-27 12:53:55
2015-04-28 00:00:00
2015-04-28 00:00:10
Each timestamp has a value:
Datetime value
2015-04-27 12:29:48 0.0
2015-04-27 12:31:48 0.0
2015-04-27 12:34:50 1.1
2015-04-27 12:50:43 45.0
2015-04-27 12:53:55 0.0
2015-04-28 00:00:00 1.0
2015-04-28 00:00:10 2.0
I want to drop the time of day (hours, minutes, seconds) and sum the values per date, like this:
Datetime value
2015-04-27 46.1
2015-04-28 3.0
The first thing I did was convert the datetime column:
energy$datetime <- as.POSIXlt(energy$datetime)
I tried several things with the summarize function:
df %>% group_by(energy$datetime) %>% summarize (energy$newname(energy$value))
But that isn't working.
I also read related posts on the internet (e.g. http://r.789695.n4.nabble.com/How-to-sum-and-group-data-by-DATE-in-data-frame-td903708.html), but they don't make sense to me (yep, I'm a noob).
Hopefully someone can help me!
4 solutions
#1
8
Use as.Date() then aggregate().
energy$Date <- as.Date(energy$Datetime)
aggregate(energy$value, by=list(energy$Date), sum)
EDIT
Emma made a good point about column names. You can preserve column names in aggregate by using the following instead.
aggregate(energy["value"], by=energy["Date"], sum)
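As a minimal sketch of the same approach on the sample data from the question: the column names `datetime` and `value` follow the question, while the `date` column, the `daily` result name, and the explicit `tz = "UTC"` are assumptions made here for illustration. The formula interface of aggregate() is used because it also keeps the column names.

```r
# Sample rows copied from the question
energy <- data.frame(
  datetime = c("2015-04-27 12:29:48", "2015-04-27 12:31:48",
               "2015-04-27 12:34:50", "2015-04-27 12:50:43",
               "2015-04-27 12:53:55", "2015-04-28 00:00:00",
               "2015-04-28 00:00:10"),
  value = c(0.0, 0.0, 1.1, 45.0, 0.0, 1.0, 2.0),
  stringsAsFactors = FALSE
)

# Parse the timestamps; the time zone is an assumption, adjust it to your data
energy$datetime <- as.POSIXct(energy$datetime, tz = "UTC")

# Drop the time of day, keeping only the calendar date
energy$date <- as.Date(energy$datetime, tz = "UTC")

# Sum `value` within each date; the formula interface preserves column names
daily <- aggregate(value ~ date, data = energy, FUN = sum)
print(daily)
```

Passing the same time zone to both as.POSIXct() and as.Date() avoids timestamps near midnight being assigned to a neighbouring day.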
#2
2
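Using data.table:
library(data.table)
# Keep only the calendar date of each timestamp
Test$Datetime <- as.Date(Test$Datetime)
DT <- data.table(Test)
# Sum value within each date; the unnamed result column is called V1
DT[, sum(value), by = Datetime]
     Datetime   V1
1: 2015-04-27 46.1
2: 2015-04-28  3.0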
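If you would rather have a named result column than `V1`, one possible variant is sketched below. It assumes the data.table package is installed; the sample rows are a subset of the question's data, and the names `DT`, `Date`, and `daily` are chosen here for illustration only.

```r
library(data.table)

# A few rows from the question, with the time zone fixed to UTC (an assumption)
DT <- data.table(
  Datetime = as.POSIXct(c("2015-04-27 12:29:48", "2015-04-27 12:34:50",
                          "2015-04-27 12:50:43", "2015-04-28 00:00:00",
                          "2015-04-28 00:00:10"), tz = "UTC"),
  value = c(0.0, 1.1, 45.0, 1.0, 2.0)
)

# Group by the calendar date computed on the fly and name the summed column
daily <- DT[, .(value = sum(value)), by = .(Date = as.Date(Datetime))]
print(daily)
```

Computing the date inside `by` leaves the original `Datetime` column untouched, so the full timestamps stay available for other summaries.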
#3
0
You are on the right path. Inside the pipe, refer to columns by name rather than with energy$, and try summarise(newVal = sum(value)) for your summarise call. Note that grouping on the full timestamp gives one group per timestamp, so datetime should hold only the date (e.g. via as.Date()) before grouping:
df <- energy %>% group_by(datetime) %>% summarise(sum = sum(value))
#4
0
Using the tidyverse, specifically lubridate and dplyr:
library(lubridate)
library(tidyverse)
set.seed(10)
df <- tibble(Datetime = sample(seq(as.POSIXct("2015-04-27"), as.POSIXct("2015-04-29"), by = "min"), 10),
value = sample(1:100, 10)) %>%
arrange(Datetime)
df
#> # A tibble: 10 x 2
#> Datetime value
#> <dttm> <int>
#> 1 2015-04-27 04:04:00 35
#> 2 2015-04-27 10:48:00 41
#> 3 2015-04-27 13:02:00 25
#> 4 2015-04-27 13:09:00 5
#> 5 2015-04-27 14:43:00 57
#> 6 2015-04-27 20:29:00 12
#> 7 2015-04-27 20:34:00 77
#> 8 2015-04-28 00:22:00 66
#> 9 2015-04-28 05:29:00 37
#> 10 2015-04-28 09:14:00 58
df %>%
mutate(date_col = date(Datetime)) %>%
group_by(date_col) %>%
summarize(value = sum(value))
#> # A tibble: 2 x 2
#> date_col value
#> <date> <int>
#> 1 2015-04-27 252
#> 2 2015-04-28 161
Created on 2018-08-01 by the reprex package (v0.2.0).