Following is the aggregation query :
下面是聚合查询:
[
{
"$match": {
"UserId": {
"$in": [
5
]
},
"WorkflowStartTime": {
"$gte": ISODate('2015-04-09T00:00:00.000Z'),
"$lte": ISODate('2015-04-16T00:00:00.000Z')
}
}
},
{
"$group": {
"_id": {
"Task": "$TaskId",
"WorkflowId": "$WorkflowInstanceId"
},
"TaskName": {
"$first": "$Task"
},
"StartTime": {
"$first": "$StartTime"
},
"EndTime": {
"$last": "$EndTime"
},
"LastExecutionTime": {
"$last": "$StartTime"
},
"WorkflowName": {
"$first": "$WorkflowName"
}
}
},
{
"$project": {
"_id": 1,
"LastExecutionTime": 1,
"TaskName": 1,
"AverageExecutionTime": {
"$subtract": [
"$EndTime",
"$StartTime"
]
},
"WorkflowName": 1
}
},
{
"$group": {
"_id": "$_id.Task",
"LastExecutionTime": {
"$last": "$LastExecutionTime"
},
"AverageExecutionTime": {
"$avg": "$AverageExecutionTime"
},
"TaskName": {
"$first": "$TaskName"
},
"TotalInstanceCount": {
"$sum": 1
},
"WorkflowName": {
"$first": "$WorkflowName"
}
}
},
{
"$project": {
"Id": "$_id",
"_id": 0,
"Name": "$TaskName",
"LastExecutionDate": {
"$substr": [
"$LastExecutionTime",
0,
30
]
},
"AverageExecutionTimeInMilliSeconds": "$AverageExecutionTime",
"TotalInstanceCount": "$TotalInstanceCount",
"WorkflowName": 1
}
}
]
My collection documents are as follows :
我的收集文件如下:
{
"_id" : ObjectId("550ff07ce4b09bf056df4ac1"),
"OutputData" : "xyz",
"InputData" : null,
"Location" : null,
"ChannelName" : "XYZ",
"UserId" : 5,
"TaskId" : 95,
"ChannelId" : 5,
"Status" : "Success",
"TaskTypeId" : 7,
"WorkflowId" : 37,
"Task" : "XYZ",
"WorkflowStartTime" : ISODate("2015-03-23T05:09:26Z"),
"EndTime" : ISODate("2015-03-23T05:22:44Z"),
"StartTime" : ISODate("2015-03-23T05:22:44Z"),
"TaskType" : "TRIGGER",
"WorkflowInstanceId" : "23-3-2015-95d17f17-2580-4fe3-b627-12e862af08ce",
"StackTrace" : null,
"WorkflowName" : "XYZ data workflow"
}
I have a index on {WorkflowStartTime:1,UserId:1, StartTime:1}
我在{WorkflowStartTime:1,UserId:1, StartTime:1}上有一个索引
Their are hardly 900000 records in collection, and as it is i am using a subset of data while quering using date range still it taking around 1.5 to 1.7 seconds. I have used aggregation framework with other collections with huge data and the performance is very good. Don't know what is wrong with this query as its showing very slow output, i expect it to be in mills as its a real time analytics query. Any pointer on it appreciated.
它们在收集中几乎没有90万的记录,而且我使用的是数据的子集,而在使用日期范围的时候,仍然需要大约1.5到1.7秒。我使用了聚合框架和其他具有大量数据的集合,性能非常好。不知道这个查询有什么问题,因为它显示的输出非常慢,我希望它是米尔斯实时分析查询。任何关于它的指示,我都很感激。
Output when {explain : true } added to aggregation query
当{explain: true}添加到聚合查询时输出
{
"stages": [
{
"$cursor": {
"query": {
"UserId": {
"$in": [
5
]
},
"WorkflowStartTime": {
"$gte": "ISODate(2015-04-09T00:00:00Z)",
"$lte": "ISODate(2015-04-16T00:00:00Z)"
}
},
"fields": {
"EndTime": 1,
"StartTime": 1,
"Task": 1,
"TaskId": 1,
"WorkflowInstanceId": 1,
"WorkflowName": 1,
"_id": 0
},
"plan": {
"cursor": "BtreeCursor ",
"isMultiKey": false,
"scanAndOrder": false,
"indexBounds": {
"WorkflowStartTime": [
[
"ISODate(2015-04-16T00:00:00Z)",
"ISODate(2015-04-09T00:00:00Z)"
]
],
"UserId": [
[
5,
5
]
]
},
"allPlans": [
{
"cursor": "BtreeCursor ",
"isMultiKey": false,
"scanAndOrder": false,
"indexBounds": {
"WorkflowStartTime": [
[
"ISODate(2015-04-16T00:00:00Z)",
"ISODate(2015-04-09T00:00:00Z)"
]
],
"UserId": [
[
5,
5
]
]
}
}
]
}
}
},
{
"$group": {
"_id": {
"Task": "$TaskId",
"WorkflowId": "$WorkflowInstanceId"
},
"TaskName": {
"$first": "$Task"
},
"StartTime": {
"$first": "$StartTime"
},
"EndTime": {
"$last": "$EndTime"
},
"LastExecutionTime": {
"$last": "$StartTime"
},
"WorkflowName": {
"$first": "$WorkflowName"
}
}
},
{
"$project": {
"_id": true,
"LastExecutionTime": true,
"TaskName": true,
"AverageExecutionTime": {
"$subtract": [
"$EndTime",
"$StartTime"
]
},
"WorkflowName": true
}
},
{
"$group": {
"_id": "$_id.Task",
"LastExecutionTime": {
"$last": "$LastExecutionTime"
},
"AverageExecutionTime": {
"$avg": "$AverageExecutionTime"
},
"TaskName": {
"$first": "$TaskName"
},
"TotalInstanceCount": {
"$sum": {
"$const": 1
}
},
"WorkflowName": {
"$first": "$WorkflowName"
}
}
},
{
"$project": {
"_id": false,
"Id": "$_id",
"Name": "$TaskName",
"LastExecutionDate": {
"$substr": [
"$LastExecutionTime",
{
"$const": 0
},
{
"$const": 30
}
]
},
"AverageExecutionTimeInMilliSeconds": "$AverageExecutionTime",
"TotalInstanceCount": "$TotalInstanceCount",
"WorkflowName": true
}
}
],
"ok": 1
}
1 个解决方案
#1
2
The aggregation don't use any Index. You need create a new Index:
聚合不使用任何索引。您需要创建一个新的索引:
{UserId:1,WorkflowStartTime:1}
If all is good, the agregation + explain must appear this line:
如果一切顺利,agregation + explain必须出现以下一行:
"winningPlan" :...
#1
2
The aggregation don't use any Index. You need create a new Index:
聚合不使用任何索引。您需要创建一个新的索引:
{UserId:1,WorkflowStartTime:1}
If all is good, the agregation + explain must appear this line:
如果一切顺利,agregation + explain必须出现以下一行:
"winningPlan" :...