I'm helping maintain a program that's essentially a friendly read-only front-end for a big and complicated MySQL database -- the program builds ad-hoc SELECT queries from users' input, sends the queries to the DB, gets the results, post-processes them, and displays them nicely back to the user.
I'd like to add some form of reasonable/heuristic prediction for the constructed query's expected performance -- sometimes users inadvertently make queries that are inevitably going to take a very long time (because they'll return huge result sets, or because they're "going against the grain" of the way the DB is indexed) and I'd like to be able to display to the user some "somewhat reliable" information/guess about how long the query is going to take. It doesn't have to be perfect, as long as it doesn't get so badly and frequently out of whack with reality as to cause a "cry wolf" effect where users learn to disregard it;-) Based on this info, a user might decide to go get a coffee (if the estimate is 5-10 minutes), go for lunch (if it's 30-60 minutes), kill the query and try something else instead (maybe tighter limits on the info they're requesting), etc, etc.
I'm not very familiar with MySQL's EXPLAIN statement -- I see a lot of information around on how to use it to optimize a query or a DB's schema, indexing, etc, but not much on how to use it for my more limited purpose -- simply make a prediction, taking the DB as a given (of course if the predictions are reliable enough I may eventually switch to using them also to choose between alternate forms a query could take, but, that's for the future: for now, I'd be plenty happy just to show the performance guesstimates to the users for the above-mentioned purposes).
Any pointers...?
3 Answers
#1
20
EXPLAIN won't give you any indication of how long a query will take. At best you could use it to guess which of two queries might be faster, but unless one of them is obviously badly written then even that is going to be very hard.
You should also be aware that if you're using sub-queries, even running EXPLAIN can be slow (almost as slow as the query itself in some cases).
As far as I'm aware, MySQL doesn't provide any way to estimate the time a query will take to run. Could you log the time each query takes to run, then build an estimate based on the history of past similar queries?
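A minimal sketch of that history-based idea, in Python. Everything here is an assumption for illustration: the names fingerprint and QueryTimeHistory are hypothetical, "similar" is taken to mean "same SQL once literal values are replaced by placeholders", and the estimate is simply the median of past timings. A real tool would persist the log rather than keep it in memory.

```python
import re
import statistics
from collections import defaultdict


def fingerprint(sql: str) -> str:
    """Collapse literals so queries that differ only in values share a key."""
    s = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)  # quoted strings -> ?
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)     # numeric literals -> ?
    return re.sub(r"\s+", " ", s).strip().lower()


class QueryTimeHistory:
    """In-memory log of past run times, grouped by query fingerprint."""

    def __init__(self):
        self._durations = defaultdict(list)

    def record(self, sql: str, seconds: float) -> None:
        self._durations[fingerprint(sql)].append(seconds)

    def estimate(self, sql: str):
        """Median of past timings for similar queries, or None if unseen."""
        past = self._durations.get(fingerprint(sql))
        if not past:
            return None
        return statistics.median(past)
```

The median is used rather than the mean so that one pathological run does not skew the guess. Note that two queries with the same shape but very different value ranges can still differ wildly in run time, so this is only a starting point.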
#2
11
I think if you want to have a chance of building something reasonably reliable out of this, what you should do is build a statistical model out of table sizes and broken-down EXPLAIN result components correlated with query processing times. Trying to build a query execution time predictor based on thinking about the contents of an EXPLAIN is just going to spend way too long giving embarrassingly poor results before it gets refined to vague usefulness.
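As a rough illustration of what such a model could look like, here is a deliberately tiny Python sketch. It makes several assumptions that are not in the answer above: the EXPLAIN output is available as a list of dicts (one per row), the only feature is the sum of log10 of each row's rows estimate (that is, the log of their product, a crude proxy for rows examined), and the model is a single straight line fitted by least squares in log-log space. The function names are hypothetical, and a usable predictor would need more features (join type, key usage, table sizes) and far more training data.

```python
import math


def explain_feature(explain_rows):
    """Sum of log10(rows) over all EXPLAIN rows (log of their product)."""
    total = 0.0
    for row in explain_rows:
        total += math.log10(max(int(row.get("rows") or 1), 1))
    return total


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need samples with at least two distinct feature values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train(samples):
    """samples: list of (explain_rows, measured_seconds) pairs."""
    xs = [explain_feature(rows) for rows, _ in samples]
    ys = [math.log10(seconds) for _, seconds in samples]
    return fit_line(xs, ys)


def predict_seconds(model, explain_rows):
    slope, intercept = model
    return 10 ** (slope * explain_feature(explain_rows) + intercept)
```

Training pairs would come from logging the EXPLAIN output next to the measured run time of each real query. Working in log space keeps a handful of very slow queries from dominating the fit.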
#3
3
MySQL EXPLAIN has a column called key. If there is something in this column, this is a very good indication: it means that the query will use an index.
Queries that use indexes are generally safe to use since they were likely thought out by the database designer when (s)he designed the database.
However
There is another field called Extra. This field sometimes contains the text Using filesort.
This can be very bad. It means MySQL cannot use an index to return the rows in the requested order and has to do an extra sorting pass; when the rows to sort don't fit in the sort buffer, it spills to temporary files on disk, which gets slow for large result sets.
Conclusion
Instead of trying to predict the time a query takes, simply look at these two indicators. If a query shows Using filesort, deny the user. And depending on how strict you want to be, if the query is not using any keys, you should also deny it.
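The two checks could be sketched in Python roughly as follows. This assumes the EXPLAIN output has already been fetched as a list of dicts keyed by column name (for example via a dict-style cursor), and that the sort marker appears in the Extra column as the text "Using filesort". The function name should_deny and the require_index flag are hypothetical.

```python
def should_deny(explain_rows, require_index=False):
    """Return True if the EXPLAIN output trips either indicator.

    explain_rows: list of dicts, one per EXPLAIN row.
    require_index: the stricter mode; also reject any row with no key.
    """
    for row in explain_rows:
        extra = row.get("Extra") or ""  # Extra can be NULL/None
        if "Using filesort" in extra:
            return True
        if require_index and row.get("key") is None:
            return True
    return False
```

For the read-only front-end described in the question, the same function could just as well drive a warning to the user instead of an outright refusal.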
Read more about the resultset of the MySQL EXPLAIN statement