如何加速聚合查询?

时间:2021-01-04 23:36:17

Following is the aggregation query :

下面是聚合查询:

[
  {
    "$match": {
      "UserId": {
        "$in": [
          5
        ]
      },
      "WorkflowStartTime": {
        "$gte": ISODate('2015-04-09T00:00:00.000Z'),
        "$lte": ISODate('2015-04-16T00:00:00.000Z')
      }
    }
  },
  {
    "$group": {
      "_id": {
        "Task": "$TaskId",
        "WorkflowId": "$WorkflowInstanceId"
      },
      "TaskName": {
        "$first": "$Task"
      },
      "StartTime": {
        "$first": "$StartTime"
      },
      "EndTime": {
        "$last": "$EndTime"
      },
      "LastExecutionTime": {
        "$last": "$StartTime"
      },
      "WorkflowName": {
        "$first": "$WorkflowName"
      }
    }
  },
  {
    "$project": {
      "_id": 1,
      "LastExecutionTime": 1,
      "TaskName": 1,
      "AverageExecutionTime": {
        "$subtract": [
          "$EndTime",
          "$StartTime"
        ]
      },
      "WorkflowName": 1
    }
  },
  {
    "$group": {
      "_id": "$_id.Task",
      "LastExecutionTime": {
        "$last": "$LastExecutionTime"
      },
      "AverageExecutionTime": {
        "$avg": "$AverageExecutionTime"
      },
      "TaskName": {
        "$first": "$TaskName"
      },
      "TotalInstanceCount": {
        "$sum": 1
      },
      "WorkflowName": {
        "$first": "$WorkflowName"
      }
    }
  },
  {
    "$project": {
      "Id": "$_id",
      "_id": 0,
      "Name": "$TaskName",
      "LastExecutionDate": {
        "$substr": [
          "$LastExecutionTime",
          0,
          30
        ]
      },
      "AverageExecutionTimeInMilliSeconds": "$AverageExecutionTime",
      "TotalInstanceCount": "$TotalInstanceCount",
      "WorkflowName": 1
    }
  }
]

My collection documents are as follows :

我的收集文件如下:

{
        "_id" : ObjectId("550ff07ce4b09bf056df4ac1"),
        "OutputData" : "xyz",
        "InputData" : null,
        "Location" : null,
        "ChannelName" : "XYZ",
        "UserId" : 5,
        "TaskId" : 95,
        "ChannelId" : 5,
        "Status" : "Success",
        "TaskTypeId" : 7,
        "WorkflowId" : 37,
        "Task" : "XYZ",
        "WorkflowStartTime" : ISODate("2015-03-23T05:09:26Z"),
        "EndTime" : ISODate("2015-03-23T05:22:44Z"),
        "StartTime" : ISODate("2015-03-23T05:22:44Z"),
        "TaskType" : "TRIGGER",
        "WorkflowInstanceId" : "23-3-2015-95d17f17-2580-4fe3-b627-12e862af08ce",
        "StackTrace" : null,
        "WorkflowName" : "XYZ data workflow"
}

I have a index on {WorkflowStartTime:1,UserId:1, StartTime:1}

我在{WorkflowStartTime:1,UserId:1, StartTime:1}上有一个索引

Their are hardly 900000 records in collection, and as it is i am using a subset of data while quering using date range still it taking around 1.5 to 1.7 seconds. I have used aggregation framework with other collections with huge data and the performance is very good. Don't know what is wrong with this query as its showing very slow output, i expect it to be in mills as its a real time analytics query. Any pointer on it appreciated.

它们在收集中几乎没有90万的记录,而且我使用的是数据的子集,而在使用日期范围的时候,仍然需要大约1.5到1.7秒。我使用了聚合框架和其他具有大量数据的集合,性能非常好。不知道这个查询有什么问题,因为它显示的输出非常慢,我希望它是米尔斯实时分析查询。任何关于它的指示,我都很感激。

Output when {explain : true } added to aggregation query

当{explain: true}添加到聚合查询时输出

{
  "stages": [


       {
          "$cursor": {
            "query": {
              "UserId": {
                "$in": [
                  5
                ]
              },
              "WorkflowStartTime": {
                "$gte": "ISODate(2015-04-09T00:00:00Z)",
                "$lte": "ISODate(2015-04-16T00:00:00Z)"
              }
            },
            "fields": {
              "EndTime": 1,
              "StartTime": 1,
              "Task": 1,
              "TaskId": 1,
              "WorkflowInstanceId": 1,
              "WorkflowName": 1,
              "_id": 0
            },
            "plan": {
              "cursor": "BtreeCursor ",
              "isMultiKey": false,
              "scanAndOrder": false,
              "indexBounds": {
                "WorkflowStartTime": [
                  [
                    "ISODate(2015-04-16T00:00:00Z)",
                    "ISODate(2015-04-09T00:00:00Z)"
                  ]
                ],
                "UserId": [
                  [
                    5,
                    5
                  ]
                ]
              },
              "allPlans": [
                {
                  "cursor": "BtreeCursor ",
                  "isMultiKey": false,
                  "scanAndOrder": false,
                  "indexBounds": {
                    "WorkflowStartTime": [
                      [
                        "ISODate(2015-04-16T00:00:00Z)",
                        "ISODate(2015-04-09T00:00:00Z)"
                      ]
                    ],
                    "UserId": [
                      [
                        5,
                        5
                      ]
                    ]
                  }
                }
              ]
            }
          }
        },
        {
          "$group": {
            "_id": {
              "Task": "$TaskId",
              "WorkflowId": "$WorkflowInstanceId"
            },
            "TaskName": {
              "$first": "$Task"
            },
            "StartTime": {
              "$first": "$StartTime"
            },
            "EndTime": {
              "$last": "$EndTime"
            },
            "LastExecutionTime": {
              "$last": "$StartTime"
            },
            "WorkflowName": {
              "$first": "$WorkflowName"
            }
          }
        },
        {
          "$project": {
            "_id": true,
            "LastExecutionTime": true,
            "TaskName": true,
            "AverageExecutionTime": {
              "$subtract": [
                "$EndTime",
                "$StartTime"
              ]
            },
            "WorkflowName": true
          }
        },
        {
          "$group": {
            "_id": "$_id.Task",
            "LastExecutionTime": {
              "$last": "$LastExecutionTime"
            },
            "AverageExecutionTime": {
              "$avg": "$AverageExecutionTime"
            },
            "TaskName": {
              "$first": "$TaskName"
            },
            "TotalInstanceCount": {
              "$sum": {
                "$const": 1
              }
            },
            "WorkflowName": {
              "$first": "$WorkflowName"
            }
          }
        },
        {
          "$project": {
            "_id": false,
            "Id": "$_id",
            "Name": "$TaskName",
            "LastExecutionDate": {
              "$substr": [
                "$LastExecutionTime",
                {
                  "$const": 0
                },
                {
                  "$const": 30
                }
              ]
            },
            "AverageExecutionTimeInMilliSeconds": "$AverageExecutionTime",
            "TotalInstanceCount": "$TotalInstanceCount",
            "WorkflowName": true
          }
        }
      ],
      "ok": 1
    }

1 个解决方案

#1


2  

The aggregation don't use any Index. You need create a new Index:

聚合不使用任何索引。您需要创建一个新的索引:

{UserId:1,WorkflowStartTime:1}

If all is good, the agregation + explain must appear this line:

如果一切顺利,agregation + explain必须出现以下一行:

    "winningPlan" :...

#1


2  

The aggregation don't use any Index. You need create a new Index:

聚合不使用任何索引。您需要创建一个新的索引:

{UserId:1,WorkflowStartTime:1}

If all is good, the agregation + explain must appear this line:

如果一切顺利,agregation + explain必须出现以下一行:

    "winningPlan" :...