MySQL query optimization when using indexes

Time: 2021-04-10 00:08:12

I'm trying to optimize, and understand, GROUP BY in MySQL. From an answer on SO, I learned this from Gordon:

All columns in the select should either be columns in the group by or use aggregate functions (sum(), avg(), and so on).

I have the following table and query:

Table

+-----------------+-----------+------------+-------------+
|Id (primary key) | ip(index) | lastattack | create_date |
+-----------------+-----------+------------+-------------+

Query

  SELECT ip,
         lastattack
    FROM blacklist
   WHERE ip = 'xxx.xxx.xxx.xxx'
GROUP BY ip

When I execute the above query, EXPLAIN gives me the following:

+----+-------------+-------+------+---------------+-----+---------+-------+------+----------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref   | rows | filtered | Extra                 |
+----+-------------+-------+------+---------------+-----+---------+-------+------+----------+-----------------------+
| 1  | SIMPLE      | ipall | ref  | idx           | idx | 257     | const | 2    | 100.00   | Using index condition |
+----+-------------+-------+------+---------------+-----+---------+-------+------+----------+-----------------------+

When I execute the query the way Gordon told me to, I get the following in the Extra column:

  SELECT ip,
         lastattack
    FROM blacklist
   WHERE ip = 'xxx.xxx.xxx.xxx'
GROUP BY ip, lastattack

Using index condition; Using where; Using temporary; Using filesort

People have told me to avoid "Using temporary" and "Using filesort".

1 solution

Your query, as it stands, is incorrect, or rather, it is ambiguous. Suppose you have:

192.168.0.1    Attack1    2017-10-01 23:30
192.168.0.1    Attack2    2017-10-01 23:35

Which value of lastattack should be output? You do not give the server enough information, and it cannot read your mind and guess that, because the field is called "lastattack", you probably want the row with the greatest timestamp.

This is what Gordon Linoff meant by "All columns in the select should either be columns in the GROUP BY, or use aggregate functions". Here, lastattack is neither: you do not GROUP BY lastattack (only by ip), and you do not aggregate it (you SELECT lastattack, not AVG(lastattack) or SOME_AGGREGATE_FUNCTION(lastattack)).
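
A quick way to see what happens when you simply add lastattack to the GROUP BY (as in the question's second query) is a minimal sketch using Python's standard sqlite3 module as a stand-in for MySQL. This assumes SQLite groups these rows the same way MySQL would, although its execution plans differ. The columns ip, lastattack and attackdate follow the sample rows above, and the function name is hypothetical. The query now satisfies the rule, but each (ip, lastattack) pair becomes its own group, so you get one row per attack rather than the last attack per IP.

```python
import sqlite3


def group_by_ip_and_attack():
    """Run the rule-compliant 'GROUP BY ip, lastattack' query on the two sample rows."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.execute("CREATE TABLE blacklist (ip TEXT, lastattack TEXT, attackdate TEXT)")
    conn.executemany(
        "INSERT INTO blacklist VALUES (?, ?, ?)",
        [
            ("192.168.0.1", "Attack1", "2017-10-01 23:30"),
            ("192.168.0.1", "Attack2", "2017-10-01 23:35"),
        ],
    )
    # Every selected column is in the GROUP BY, so each result value is well defined...
    rows = conn.execute(
        "SELECT ip, lastattack FROM blacklist "
        "WHERE ip = ? GROUP BY ip, lastattack ORDER BY lastattack",
        ("192.168.0.1",),
    ).fetchall()
    conn.close()
    # ...but there is now one group per distinct attack, not one per IP.
    return rows


print(group_by_ip_and_attack())
# [('192.168.0.1', 'Attack1'), ('192.168.0.1', 'Attack2')]
```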

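You might still get the correct value, but you might not. In practice, records are usually retrieved in a consistent order, and chances are that's the order you want. However, nothing guarantees it: the MySQL manual describes the value chosen for such a column as nondeterministic, and other DB implementations might return the first value they encounter and leave you with the first attack instead of the last. Also, MySQL 5.7.5 and later enable ONLY_FULL_GROUP_BY by default, and in that mode a query like your first one is rejected with an error.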

To get the result you want, you first need to establish the date of the last attack:

SELECT ip, MAX(attackdate) AS maxdate FROM blacklist GROUP BY ip;
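
As a minimal sketch of this first step, again using Python's sqlite3 as a stand-in for MySQL, here is the query run on the sample rows. One assumption: attackdate is stored as 'YYYY-MM-DD HH:MM' text, which sorts chronologically. In MySQL it would more likely be a DATETIME column. The function and constant names are hypothetical.

```python
import sqlite3

# Hypothetical sample data matching the two rows above: (ip, lastattack, attackdate).
SAMPLE_ROWS = [
    ("192.168.0.1", "Attack1", "2017-10-01 23:30"),
    ("192.168.0.1", "Attack2", "2017-10-01 23:35"),
]


def last_attack_dates(rows):
    """Return (ip, maxdate) pairs: the timestamp of each IP's most recent attack."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blacklist (ip TEXT, lastattack TEXT, attackdate TEXT)")
    conn.executemany("INSERT INTO blacklist VALUES (?, ?, ?)", rows)
    # ISO-style text timestamps compare chronologically, so MAX() picks the latest one.
    result = conn.execute(
        "SELECT ip, MAX(attackdate) AS maxdate FROM blacklist GROUP BY ip ORDER BY ip"
    ).fetchall()
    conn.close()
    return result


print(last_attack_dates(SAMPLE_ROWS))
# [('192.168.0.1', '2017-10-01 23:35')]
```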

The GROUP BY query gives you one row per IP, with the timestamp of its most recent attack. To get the attack itself, you need a JOIN (which risks duplicates if two attacks came in the same second, because then you can't determine which one was last):

SELECT a.ip, a.maxdate, b.lastattack
    FROM (
        SELECT ip, MAX(attackdate) AS maxdate FROM blacklist GROUP BY ip
    ) AS a
JOIN blacklist AS b ON (a.ip = b.ip AND a.maxdate = b.attackdate);
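
The same kind of sketch can run the join, matching maxdate against the attackdate column, on the sample rows plus a hypothetical pair of attacks that share one timestamp. The tie shows the duplicate risk mentioned above. SQLite again stands in for MySQL, and last_attacks is a hypothetical helper.

```python
import sqlite3


def last_attacks(rows):
    """Return (ip, maxdate, lastattack) for each IP's most recent attack(s)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blacklist (ip TEXT, lastattack TEXT, attackdate TEXT)")
    conn.executemany("INSERT INTO blacklist VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT a.ip, a.maxdate, b.lastattack
        FROM (
            SELECT ip, MAX(attackdate) AS maxdate FROM blacklist GROUP BY ip
        ) AS a
        JOIN blacklist AS b ON a.ip = b.ip AND a.maxdate = b.attackdate
        ORDER BY a.ip, b.lastattack
        """
    ).fetchall()
    conn.close()
    return result


sample = [
    ("192.168.0.1", "Attack1", "2017-10-01 23:30"),
    ("192.168.0.1", "Attack2", "2017-10-01 23:35"),
    # Hypothetical tie: two attacks from one IP with the same timestamp.
    ("10.0.0.7", "AttackX", "2017-10-02 08:00"),
    ("10.0.0.7", "AttackY", "2017-10-02 08:00"),
]
for row in last_attacks(sample):
    print(row)
# ('10.0.0.7', '2017-10-02 08:00', 'AttackX')
# ('10.0.0.7', '2017-10-02 08:00', 'AttackY')
# ('192.168.0.1', '2017-10-01 23:35', 'Attack2')
```

The tied IP comes back twice, which is the duplicate case: the data alone cannot say which of the two attacks was last.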

You need a composite index on (ip, attackdate) for the inner query, and the same index should also serve the join in the outer query. You might also try an index on (ip, attackdate, lastattack), in that order, which covers every column the outer query reads, and see whether that changes anything in EXPLAIN.
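
To check that a composite index on (ip, attackdate) is actually picked up, a rough analogue is SQLite's EXPLAIN QUERY PLAN, driven from Python's sqlite3. This is only a stand-in: its output format differs from MySQL's EXPLAIN and does not report "Using temporary" or "Using filesort". The index name idx_ip_attackdate and the helper name are hypothetical. On a real MySQL server, run EXPLAIN on the same query and look at the key column.

```python
import sqlite3


def plan_for_last_attack_date(ip):
    """Return SQLite's query-plan detail strings for the filtered MAX(attackdate) query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE blacklist ("
        "id INTEGER PRIMARY KEY, ip TEXT, lastattack TEXT, attackdate TEXT)"
    )
    # Composite index in the order suggested above: ip first, then attackdate.
    conn.execute("CREATE INDEX idx_ip_attackdate ON blacklist (ip, attackdate)")
    plan = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT ip, MAX(attackdate) AS maxdate FROM blacklist WHERE ip = ? GROUP BY ip",
        (ip,),
    ).fetchall()
    conn.close()
    # The last field of each plan row is a human-readable description of that step.
    return [row[-1] for row in plan]


for step in plan_for_last_attack_date("192.168.0.1"):
    print(step)
```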