I am trying to understand the performance of an SQL query in MySQL. With only the primary-key indexes, the query had not completed after more than 10 minutes. I have added indexes on all the columns used in the WHERE clauses (timestamp, hostname, path, type), and the query now completes in approximately 50 seconds. However, this still seems a long time for what does not seem an overly complex query.
So, I'd like to understand what it is about the query that causes this. My assumption is that the inner subquery is somehow causing an explosion in the number of comparisons required.
There are two tables involved:
storage (~5,000 rows, 4.6 MB) and machines (12 rows, < 4 KB)
The query is as follows:
```sql
SELECT T.hostname, T.path, T.used_pct,
       T.used_gb, T.avail_gb, T.timestamp, machines.type AS type
FROM storage AS T
JOIN machines ON T.hostname = machines.hostname
WHERE timestamp = ( SELECT max(timestamp) FROM storage AS st
                    WHERE st.hostname = T.hostname AND
                          st.path = T.path)
  AND (machines.type = 'nfs')
ORDER BY used_pct DESC
```
An EXPLAIN EXTENDED for the query returns the following:
```text
id  select_type         table     type  possible_keys     key          key_len  ref                           rows  filtered  Extra
1   PRIMARY             machines  ref   hostname,type     type         768      const                         1     100.00    Using where; Using temporary; Using filesort
1   PRIMARY             T         ref   fk_hostname       fk_hostname  768      monitoring.machines.hostname  4535  100.00    Using where
2   DEPENDENT SUBQUERY  st        ref   fk_hostname,path  path         1002     monitoring.T.path             648   100.00    Using where
```
I notice that the Extra column for row 1 includes "Using filesort", and the question "MySQL explain Query understanding" states that "Using filesort is a sorting algorithm where MySQL isn't able to use an index for sorting and therefore can't do the complete sort in memory."
What is the nature of this query which is causing slow performance?
Why is it necessary for MySQL to use 'filesort' for this query?
Answer
Indexes don't need to be populated: they are built when you create them and then kept up to date on every insert, update, and delete. That's why inserts and updates become slower the more indexes a table has.
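A minimal way to see this, using Python's built-in sqlite3 module as a runnable stand-in for MySQL (an assumption: the table, column, and index names are hypothetical, and SQLite's EXPLAIN QUERY PLAN wording differs from MySQL's EXPLAIN). The index is built from rows that already exist and is usable immediately; rows inserted later are written into it by the INSERT itself.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption; names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE storage (hostname TEXT, path TEXT, used_pct REAL)")

# Load rows first, before any secondary index exists.
conn.executemany(
    "INSERT INTO storage VALUES (?, ?, ?)",
    [(f"host{i % 12}", f"/data{i % 7}", float(i % 100)) for i in range(5000)],
)

# CREATE INDEX builds the index from the existing rows right away;
# there is no separate step that "populates" it later.
conn.execute("CREATE INDEX idx_storage_hostname ON storage (hostname)")


def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# The planner can use the new index immediately.
details = plan("SELECT path FROM storage WHERE hostname = 'host3'")
print(details)

# A row inserted afterwards is written into the index by the INSERT itself;
# that extra write per index is what slows down inserts and updates.
conn.execute("INSERT INTO storage VALUES ('host99', '/new', 1.0)")
rows = conn.execute("SELECT path FROM storage WHERE hostname = 'host99'").fetchall()
print(rows)
```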
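Your query runs fast after the first time because the whole result of the query is stored in the query cache (this applies only when the query cache is enabled; it was deprecated in MySQL 5.7.20 and removed in MySQL 8.0). To see how fast the query is without the cache, you can run: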
```sql
SELECT SQL_NO_CACHE T.hostname ...
```
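MySQL uses filesort whenever it cannot read rows in the required order from an index, most often for ORDER BY. In your EXPLAIN output, "Using temporary; Using filesort" is reported on the first PRIMARY row, which means it belongs to the outer query's ORDER BY used_pct DESC: used_pct has no index that MySQL could read in order, and it comes from T rather than from machines, the first table in the join order. Finding max(timestamp) does not need a sort; MySQL tracks the largest value while reading the matching rows, or reads it straight from the end of a suitable index.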
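The ORDER BY case can be sketched in SQLite via Python's sqlite3 module, used here as a runnable stand-in for MySQL (an assumption: SQLite reports an explicit sort as "USE TEMP B-TREE FOR ORDER BY" rather than "Using filesort", and the table and index names are hypothetical).

```python
import sqlite3

# SQLite standing in for MySQL (assumption); its explicit sort step is
# reported as "USE TEMP B-TREE FOR ORDER BY", roughly MySQL's filesort.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE storage (hostname TEXT, path TEXT, used_pct REAL);
    CREATE INDEX idx_storage_hostname ON storage (hostname);
""")


def plan(sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# No index covers used_pct, so the rows must be sorted after they are read.
unindexed = plan("SELECT hostname, used_pct FROM storage ORDER BY used_pct DESC")

# The index already stores hostnames in order, so reading it in index order
# yields sorted output without a separate sort step.
indexed = plan("SELECT hostname FROM storage ORDER BY hostname")

print(unindexed)
print(indexed)
```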
So, why is your query slow? Two things caught my eye.
1) Your subquery
```sql
WHERE timestamp = ( SELECT max(timestamp) FROM storage AS st
                    WHERE st.hostname = T.hostname AND
                          st.path = T.path)
```
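is evaluated once for every distinct (hostname, path) pair (EXPLAIN lists it as a DEPENDENT SUBQUERY), and according to your EXPLAIN each evaluation reads an estimated 648 rows through the single-column path index and then filters them by hostname. Have a try with an index on timestamp (by the way, I discourage naming columns after keywords or data types such as timestamp). If that alone doesn't help, try rewriting your query. The MySQL manual has excellent examples in the section "The Rows Holding the Group-wise Maximum of a Certain Column".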
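One possible rewrite, sketched under assumptions: it follows the uncorrelated-subquery pattern from that manual section (compute the latest timestamp once per (hostname, path) in a derived table, then join back to fetch the full rows), and it adds a composite index on (hostname, path, ts) so the per-group maximum can be read from the index instead of filtering rows found through the path index alone. In MySQL the same composite index should also help the original correlated MAX(), but that is worth confirming with EXPLAIN on your data. The sketch runs in SQLite via Python's sqlite3 module as a stand-in for MySQL; the sample rows are made up, and ts is a hypothetical rename of your timestamp column.

```python
import sqlite3

# SQLite standing in for MySQL (assumption); ts is a hypothetical rename of timestamp.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE machines (hostname TEXT PRIMARY KEY, type TEXT);
    CREATE TABLE storage (
        hostname TEXT, path TEXT, used_pct REAL,
        used_gb REAL, avail_gb REAL, ts INTEGER
    );
    -- Composite index: equality columns first, then the column under MAX().
    CREATE INDEX idx_host_path_ts ON storage (hostname, path, ts);
""")
conn.executemany("INSERT INTO machines VALUES (?, ?)",
                 [("nas1", "nfs"), ("web1", "web")])
conn.executemany("INSERT INTO storage VALUES (?, ?, ?, ?, ?, ?)", [
    ("nas1", "/export", 40.0, 400, 600, 1),
    ("nas1", "/export", 55.0, 550, 450, 2),  # latest row for (nas1, /export)
    ("nas1", "/home",   70.0, 70,  30,  1),  # latest row for (nas1, /home)
    ("web1", "/var",    90.0, 9,   1,   5),  # excluded: type is not 'nfs'
])

# Group-wise maximum without a correlated subquery: the derived table computes
# MAX(ts) per (hostname, path) once, and the join keeps only the latest rows.
QUERY = """
SELECT T.hostname, T.path, T.used_pct, T.used_gb, T.avail_gb, T.ts, m.type
FROM storage AS T
JOIN (
    SELECT hostname, path, MAX(ts) AS max_ts
    FROM storage
    GROUP BY hostname, path
) AS latest
  ON latest.hostname = T.hostname
 AND latest.path = T.path
 AND latest.max_ts = T.ts
JOIN machines AS m ON m.hostname = T.hostname
WHERE m.type = 'nfs'
ORDER BY T.used_pct DESC
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The SQL itself should run on MySQL once ts is replaced with your timestamp column. Like the original query, it returns every row that ties for the latest timestamp of a (hostname, path) pair.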
2) This is a minor issue, but it seems you are joining on CHAR/VARCHAR columns. Joins on integer IDs are generally faster, because integer keys are smaller and cheaper to compare than strings.
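A sketch of what an integer join key could look like, assuming you are free to change the schema: machine_id is a hypothetical surrogate key, the hostname is stored once in machines, and SQLite (via Python's sqlite3) again stands in for MySQL.

```python
import sqlite3

# SQLite standing in for MySQL (assumption); machine_id is a hypothetical surrogate key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE machines (
        machine_id INTEGER PRIMARY KEY,
        hostname   TEXT NOT NULL UNIQUE,
        type       TEXT NOT NULL
    );
    CREATE TABLE storage (
        machine_id INTEGER NOT NULL REFERENCES machines (machine_id),
        path       TEXT NOT NULL,
        used_pct   REAL,
        ts         INTEGER NOT NULL
    );
    CREATE INDEX idx_storage_machine_path_ts ON storage (machine_id, path, ts);
""")

# machine_id values are assigned automatically: nas1 -> 1, web1 -> 2.
conn.executemany("INSERT INTO machines (hostname, type) VALUES (?, ?)",
                 [("nas1", "nfs"), ("web1", "web")])
conn.executemany("INSERT INTO storage VALUES (?, ?, ?, ?)",
                 [(1, "/export", 55.0, 2), (2, "/var", 90.0, 5)])

# The join compares small integers instead of hostname strings.
rows = conn.execute("""
    SELECT m.hostname, s.path, s.used_pct
    FROM storage AS s
    JOIN machines AS m ON m.machine_id = s.machine_id
    WHERE m.type = 'nfs'
""").fetchall()
print(rows)
```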