How to optimize a MySQL query with 2 different inner joins? (InnoDB)

Time: 2020-12-26 04:17:11

I have a query that runs against tables using the InnoDB storage engine.

I want to optimize it because it takes too long to execute. My database holds about 5 million records, and the query currently takes 250 seconds.

INSERT INTO dynamicgroups (adressid) 

    SELECT SQL_NO_CACHE DISTINCT(addressid) FROM (
        SELECT cluster_0.addressid FROM (
            SELECT DISTINCT addressid FROM (
                SELECT group_all.addressid FROM (
                    SELECT g.addressid FROM table2.635_emadresmgroups g 
                        INNER JOIN table2.emaildata f_0
                               ON f_0.addressid = g.addressid
                        WHERE  (f_0.birthday > date(DATE_SUB(NOW(),INTERVAL 18 MONTH))
                            AND f_0.birthday < CURDATE() )
                ) group_all
            ) AS groups

        ) AS cluster_0

        INNER JOIN(
            SELECT DISTINCT addressid FROM (
                SELECT group_all.addressid FROM (
                    SELECT g.addressid FROM table2.635_emadresmgroups g 
                        INNER JOIN table2.emaildata f_0
                               ON f_0.addressid = g.addressid
                        WHERE  (marriage_date = ''
                             OR marriage_date = '1900-01-01'
                             OR marriage_date = '0000-00-00' )
                ) group_all
            ) AS groups
        ) AS cluster_1 ON cluster_1.addressid = cluster_0.addressid

        INNER JOIN(
            SELECT DISTINCT addressid FROM (
                SELECT group_all.addressid FROM (
                    SELECT g.addressid FROM table2.635_emadresmgroups g 
                        INNER JOIN table2.emaildata f_0
                                ON f_0.addressid = g.addressid
                        WHERE  (f_0.city = '34' )
                ) group_all
            ) AS groups
        ) AS cluster_2 ON cluster_2.addressid = cluster_1.addressid 
    ) AS t

3 answers

#1


Your queries all seem to be variations of this query:

SELECT g.addressid
FROM table2.635_emadresmgroups g INNER JOIN
     table2.emaildata f_0
     ON f_0.addressid = g.addressid
WHERE  (f_0.birthday > date(DATE_SUB(NOW(),INTERVAL 18 MONTH)) AND f_0.birthday < CURDATE() )

I would suggest approaching this using group by and having:

SELECT g.addressid
FROM table2.635_emadresmgroups g INNER JOIN
     table2.emaildata f_0
     ON f_0.addressid = g.addressid
GROUP BY g.addressid
HAVING SUM(f_0.birthday > date(DATE_SUB(NOW(), INTERVAL 18 MONTH)) AND f_0.birthday < CURDATE() ) > 0 AND
       SUM(marriage_date = '' OR marriage_date = '1900-01-01'  OR marriage_date = '0000-00-00' ) > 0 AND
       SUM(f_0.city = '34' ) > 0;

Depending on the volume of data, filtering before the group by can also help:

SELECT g.addressid
FROM table2.635_emadresmgroups g INNER JOIN
     table2.emaildata f_0
     ON f_0.addressid = g.addressid
WHERE (f_0.birthday > date(DATE_SUB(NOW(), INTERVAL 18 MONTH)) AND f_0.birthday < CURDATE() ) OR
      (marriage_date = ''  OR marriage_date = '1900-01-01' OR marriage_date = '0000-00-00' ) OR
      (f_0.city = '34' )
GROUP BY g.addressid
HAVING SUM(f_0.birthday > date(DATE_SUB(NOW(), INTERVAL 18 MONTH)) AND f_0.birthday < CURDATE() ) > 0 AND
       SUM(marriage_date = '' OR marriage_date = '1900-01-01'  OR marriage_date = '0000-00-00' ) > 0 AND
       SUM(f_0.city = '34' ) > 0;

#2


MySQL's EXPLAIN may not be as well implemented as the equivalents in other databases, but I'd still suggest running it on your query.

After that, you can analyse the output EXPLAIN gives and decide which columns should be indexed.
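
As a rough sketch of that workflow, EXPLAIN can be run on one of the filter subqueries from the question. The index name and column order below are assumptions for illustration only; the question does not show the existing indexes, so check SHOW CREATE TABLE before adding anything.

```sql
-- Show the execution plan for the city filter taken from the question.
EXPLAIN
SELECT g.addressid
FROM table2.`635_emadresmgroups` g
INNER JOIN table2.emaildata f_0
        ON f_0.addressid = g.addressid
WHERE f_0.city = '34';

-- If the plan reports type = ALL (a full table scan) for emaildata,
-- an index along these lines may help.
-- The name idx_city_address is hypothetical.
CREATE INDEX idx_city_address ON table2.emaildata (city, addressid);
```

In the EXPLAIN output, the type, key, and rows columns are usually the first to look at: type = ALL together with key = NULL means no index is being used for that table.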

For more information I'd suggest viewing these sources:

MySQL syntax: EXPLAIN

MySQL using: EXPLAIN

Furthermore, the last two SELECTs look very similar. Maybe you can build a temporary table or a view from them, so that you don't have to run the entire SELECT twice?

#3


marriage_date -- Make it NULLable and use NULL instead of '', '1900-01-01', and '0000-00-00'. That avoids an inefficient OR and may make an INDEX usable.
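
A possible migration for this could look like the sketch below. It assumes marriage_date is a DATE column and that the three placeholder values from the question are the only ones meaning "no date"; neither is confirmed by the question. The index name is hypothetical. Depending on the SQL mode, zero dates and the '' comparison may raise errors, so this should be tried on a copy of the table first.

```sql
-- Assumption: the column is of type DATE. Adjust the definition
-- to whatever SHOW CREATE TABLE reports.
ALTER TABLE table2.emaildata
    MODIFY marriage_date DATE NULL;

-- Replace the placeholder values with NULL.
UPDATE table2.emaildata
SET marriage_date = NULL
WHERE marriage_date IN ('', '1900-01-01', '0000-00-00');

-- Hypothetical index name; lets the optimizer consider an index
-- for the IS NULL lookup.
CREATE INDEX idx_marriage_date ON table2.emaildata (marriage_date);

-- The three-way OR from the question then shrinks to one predicate.
SELECT f_0.addressid
FROM table2.emaildata f_0
WHERE f_0.marriage_date IS NULL;
```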

Please provide SHOW CREATE TABLE so we can assess the current indexes.

What version are you running? Until very recently this construct was very inefficient:

FROM ( SELECT ... )
JOIN ( SELECT ... )

The workaround was to put the subqueries into tmp tables and add an INDEX.

This may help in your case, since you seem to be using the JOINs for filtering: Turn JOIN ( SELECT ... ) into WHERE EXISTS ( SELECT * ... ).
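
Applied to the query in the question, that rewrite might look like the following sketch. It assumes the intent is "every addressid in the group table that has at least one emaildata row matching each of the three filters", which is how the nested joins read; the table name is quoted with backticks because it starts with a digit.

```sql
SELECT DISTINCT g.addressid
FROM table2.`635_emadresmgroups` g
WHERE EXISTS (              -- birthday within the last 18 months
        SELECT 1
        FROM table2.emaildata f
        WHERE f.addressid = g.addressid
          AND f.birthday > DATE(DATE_SUB(NOW(), INTERVAL 18 MONTH))
          AND f.birthday < CURDATE()
      )
  AND EXISTS (              -- no real marriage date recorded
        SELECT 1
        FROM table2.emaildata f
        WHERE f.addressid = g.addressid
          AND f.marriage_date IN ('', '1900-01-01', '0000-00-00')
      )
  AND EXISTS (              -- city filter
        SELECT 1
        FROM table2.emaildata f
        WHERE f.addressid = g.addressid
          AND f.city = '34'
      );
```

Each EXISTS can stop at the first matching row, so an index on emaildata that starts with addressid should keep the probes cheap.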

Please describe, in English, what the query is trying to do.

Another approach, building on Gordon's suggestion of having a common SELECT: Put that common SELECT into a TEMPORARY table; add index(es), then query from it.
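
A sketch of the temporary-table variant follows. The table name tmp_group_rows is hypothetical, and the final query reuses the GROUP BY / HAVING form from answer #1 so that the temporary table is referenced only once in the statement.

```sql
-- Materialize the common join once, keeping only the needed columns.
CREATE TEMPORARY TABLE tmp_group_rows AS
SELECT g.addressid,
       f_0.birthday,
       f_0.marriage_date,
       f_0.city
FROM table2.`635_emadresmgroups` g
INNER JOIN table2.emaildata f_0
        ON f_0.addressid = g.addressid;

-- Index the grouping column.
ALTER TABLE tmp_group_rows ADD INDEX (addressid);

-- Keep the addressids that satisfy all three filters at least once.
SELECT addressid
FROM tmp_group_rows
GROUP BY addressid
HAVING SUM(birthday > DATE(DATE_SUB(NOW(), INTERVAL 18 MONTH))
           AND birthday < CURDATE()) > 0
   AND SUM(marriage_date IN ('', '1900-01-01', '0000-00-00')) > 0
   AND SUM(city = '34') > 0;

DROP TEMPORARY TABLE tmp_group_rows;
```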
