Using R version 3.1.3, I'm attempting to count events in event log data.
I have a data set of timestamped events. I've cleaned the data and loaded it into a data.table for easier manipulation.
The column names are OrderDate, EventDate, OrderID, EventTypeID, LocationID and EncounterID.
The events are nested as follows: each EncounterID has multiple OrderIDs, and each OrderID has multiple events.
Example data:
library(data.table)
DT <- fread("OrderDate,EventDate,OrderID,EventTypeID,LocationID,EncounterID
1/12/2012 5:40,01/12/2012 05:40,100001,12344,1,5998887
1/12/2012 5:40,01/12/2012 05:49,100001,12345,1,5998887
1/12/2012 5:40,01/12/2012 06:40,100001,12345,1,5998887
1/12/2012 5:45,01/12/2012 05:45,100002,12344,1,5998887
1/12/2012 5:45,01/12/2012 05:49,100002,12345,1,5998887
1/12/2012 5:45,01/12/2012 06:40,100002,12345,1,5998887
1/12/2012 5:46,01/12/2012 05:46,100003,12344,2,5948887
1/12/2012 5:46,01/12/2012 05:49,100003,12345,2,5948887
1/12/2013 7:40,01/12/2013 07:40,123001,12345,2,6008887
1/12/2013 7:40,01/12/2013 07:41,123001,12346,2,6008887
1/12/2013 7:40,01/12/2013 07:50,123001,12345,2,6008887
1/12/2013 7:40,01/12/2013 07:55,123001,12345,2,6008887")
# Parse both timestamp columns in place (format is day/month/year)
DT[, OrderDate := as.POSIXct(OrderDate, format = "%d/%m/%Y %H:%M")]
DT[, EventDate := as.POSIXct(EventDate, format = "%d/%m/%Y %H:%M")]
My ultimate goal is to explore this data visually with ggplot2, looking at the counts of various combinations by month, but I'm having trouble aggregating the data with data.table.
My specific question (one example): how can I generate a table with the columns Month-Year, LocationID and Count_of_Orders?
If I do the following:
DT[,.N,by=.(month(OrderDate),year(OrderDate))]
This gives me a count of all events (rows), but I need the count of distinct OrderIDs per month per LocationID:
month year N
1: 12 2012 8
2: 12 2013 4
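For comparison with the .N call above, here is a minimal sketch that counts distinct OrderIDs instead of rows, grouped by month, year and LocationID. It assumes a data.table version that provides uniqueN() (added in 1.9.6); orders_per_group is just an illustrative name. It only returns combinations that actually occur, so location/month pairs without orders do not appear at all.

```r
library(data.table)

# Same example data as in the question
DT <- fread("OrderDate,EventDate,OrderID,EventTypeID,LocationID,EncounterID
1/12/2012 5:40,01/12/2012 05:40,100001,12344,1,5998887
1/12/2012 5:40,01/12/2012 05:49,100001,12345,1,5998887
1/12/2012 5:40,01/12/2012 06:40,100001,12345,1,5998887
1/12/2012 5:45,01/12/2012 05:45,100002,12344,1,5998887
1/12/2012 5:45,01/12/2012 05:49,100002,12345,1,5998887
1/12/2012 5:45,01/12/2012 06:40,100002,12345,1,5998887
1/12/2012 5:46,01/12/2012 05:46,100003,12344,2,5948887
1/12/2012 5:46,01/12/2012 05:49,100003,12345,2,5948887
1/12/2013 7:40,01/12/2013 07:40,123001,12345,2,6008887
1/12/2013 7:40,01/12/2013 07:41,123001,12346,2,6008887
1/12/2013 7:40,01/12/2013 07:50,123001,12345,2,6008887
1/12/2013 7:40,01/12/2013 07:55,123001,12345,2,6008887")
DT[, OrderDate := as.POSIXct(OrderDate, format = "%d/%m/%Y %H:%M")]

# .N would count event rows; uniqueN(OrderID) counts distinct orders per group
orders_per_group <- DT[, .(Count_of_Orders = uniqueN(OrderID)),
                       by = .(month = month(OrderDate), year = year(OrderDate), LocationID)]
print(orders_per_group)
```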
But what I'm looking for is N by month-year and LocationID:
Month-Year,LocationID,Count_of_orders
01-12,1,2
01-12,2,1
01-13,1,0
01-13,2,1
Note that any location with no orders in a given month should still be listed, with a count of zero. The set of locations would therefore need to be determined from the list of unique LocationIDs.
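One way to get the zero rows, sketched here as an alternative rather than a definitive method: count the orders that do exist, build every month/location combination with CJ() from the unique values, right-join the counts onto that grid with on=, and set the missing counts to zero. The "%m-%y" month.year label and the names counts, grid and result are illustrative choices, and uniqueN() assumes data.table 1.9.6 or later.

```r
library(data.table)

# Same example data as in the question
DT <- fread("OrderDate,EventDate,OrderID,EventTypeID,LocationID,EncounterID
1/12/2012 5:40,01/12/2012 05:40,100001,12344,1,5998887
1/12/2012 5:40,01/12/2012 05:49,100001,12345,1,5998887
1/12/2012 5:40,01/12/2012 06:40,100001,12345,1,5998887
1/12/2012 5:45,01/12/2012 05:45,100002,12344,1,5998887
1/12/2012 5:45,01/12/2012 05:49,100002,12345,1,5998887
1/12/2012 5:45,01/12/2012 06:40,100002,12345,1,5998887
1/12/2012 5:46,01/12/2012 05:46,100003,12344,2,5948887
1/12/2012 5:46,01/12/2012 05:49,100003,12345,2,5948887
1/12/2013 7:40,01/12/2013 07:40,123001,12345,2,6008887
1/12/2013 7:40,01/12/2013 07:41,123001,12346,2,6008887
1/12/2013 7:40,01/12/2013 07:50,123001,12345,2,6008887
1/12/2013 7:40,01/12/2013 07:55,123001,12345,2,6008887")
DT[, OrderDate := as.POSIXct(OrderDate, format = "%d/%m/%Y %H:%M")]
DT[, month.year := format(OrderDate, "%m-%y")]

# 1. Count distinct orders for the combinations that actually occur
counts <- DT[, .(Count_of_Orders = uniqueN(OrderID)), by = .(month.year, LocationID)]

# 2. Every month.year x LocationID combination, built from the unique values
grid <- CJ(month.year = unique(DT$month.year), LocationID = unique(DT$LocationID))

# 3. Right join: keep all grid rows; combinations with no orders get NA
result <- counts[grid, on = c("month.year", "LocationID")]

# 4. Turn the NA counts into zeros
result[is.na(Count_of_Orders), Count_of_Orders := 0L]
print(result)
```

Because the question parses dates as day/month/year, the labels come out as "12-12" and "12-13" (December 2012 and December 2013); if the source dates are really month/day/year, the parse format would be "%m/%d/%Y %H:%M" instead.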
Can someone please provide solutions?
Thanks
1 Solution
I'm assuming your date/times are in POSIXct format (since you call month/year). Then,
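DT[, month.year := format(OrderDate, '%m-%y')]
setkey(DT, month.year, LocationID, OrderID)
# by = key(DT) keeps one row per (month.year, LocationID, OrderID), i.e. one per order,
# and the key is preserved, so the cross join (CJ) can join on it directly
unique(DT, by = key(DT))[CJ(unique(month.year), unique(LocationID)), .N, by = .EACHI]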
# month.year LocationID N
#1: 01-12 1 2
#2: 01-12 2 1
#3: 01-13 1 0
#4: 01-13 2 1
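I used the fact that unique, given by = key(DT), picks one row per combination of the key columns and also preserves the key, so I can do the next join easily. (In data.table versions before 1.9.8, unique used the key columns by default, so a plain unique(DT) was enough; from 1.9.8 on it uses all columns unless by is supplied.)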
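To see why the by argument matters on current data.table versions, the sketch below reruns the answer's steps on the question's data and compares unique() with and without by = key(DT). It assumes data.table 1.9.8 or later, where unique() de-duplicates on all columns by default; all_cols, by_key and counts are illustrative names.

```r
library(data.table)

# Same example data as in the question
DT <- fread("OrderDate,EventDate,OrderID,EventTypeID,LocationID,EncounterID
1/12/2012 5:40,01/12/2012 05:40,100001,12344,1,5998887
1/12/2012 5:40,01/12/2012 05:49,100001,12345,1,5998887
1/12/2012 5:40,01/12/2012 06:40,100001,12345,1,5998887
1/12/2012 5:45,01/12/2012 05:45,100002,12344,1,5998887
1/12/2012 5:45,01/12/2012 05:49,100002,12345,1,5998887
1/12/2012 5:45,01/12/2012 06:40,100002,12345,1,5998887
1/12/2012 5:46,01/12/2012 05:46,100003,12344,2,5948887
1/12/2012 5:46,01/12/2012 05:49,100003,12345,2,5948887
1/12/2013 7:40,01/12/2013 07:40,123001,12345,2,6008887
1/12/2013 7:40,01/12/2013 07:41,123001,12346,2,6008887
1/12/2013 7:40,01/12/2013 07:50,123001,12345,2,6008887
1/12/2013 7:40,01/12/2013 07:55,123001,12345,2,6008887")
DT[, OrderDate := as.POSIXct(OrderDate, format = "%d/%m/%Y %H:%M")]
DT[, month.year := format(OrderDate, "%m-%y")]
setkey(DT, month.year, LocationID, OrderID)

all_cols <- unique(DT)                # default: every column must match, so all 12 event rows survive
by_key   <- unique(DT, by = key(DT))  # one row per (month.year, LocationID, OrderID)

# by_key keeps the key, so it joins directly to the cross join of unique
# months and unique locations; .N is 0 where a combination has no orders
counts <- by_key[CJ(unique(month.year), unique(LocationID)), .N, by = .EACHI]
print(counts)
```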