改进带有许多左连接的SQL查询的性能

时间:2022-11-15 04:20:53

I have a database structure serving up news articles with the following associations:

我有一个数据库结构,提供与以下关联的新闻文章:

  • HABTM news_categories
  • HABTM news_categories
  • HABTM tags
  • HABTM标签
  • HABTM uploads
  • HABTM上传

I have written an SQL query to pull all this together:

我写了一个SQL查询,将所有这些整合在一起:

 SELECT `news_articles`.*, 
 GROUP_CONCAT(DISTINCT tags.title) AS `tags`, 
 GROUP_CONCAT(DISTINCT tags.id) AS `tag_ids`,
 GROUP_CONCAT(DISTINCT news_categories.title) AS `news_categories`,
 GROUP_CONCAT(DISTINCT news_categories.id) AS `news_category_ids`,
 GROUP_CONCAT(DISTINCT news_categories.slug) AS `news_category_slugs`, 
 `news_articles_uploads`.`caption` AS `upload_caption`,
 `uploads`.`title` AS `upload_title`, 
 `uploads`.`basename` AS `upload_basename`,
 `uploads`.`extension` AS `upload_extension`,
 `uploads`.`path` AS `upload_path`
 FROM `news_articles`
 LEFT JOIN `news_articles_tags` ON news_articles_tags.news_article_id = news_articles.id
 LEFT JOIN `tags` ON news_articles_tags.tag_id = tags.id
 LEFT JOIN `news_articles_news_categories` ON news_articles_news_categories.news_article_id = news_articles.id
 LEFT JOIN `news_categories` ON news_articles_news_categories.news_category_id = news_categories.id
 LEFT JOIN `news_articles_uploads` ON (news_articles_uploads.news_article_id = news_articles.id AND news_articles_uploads.order = 0)
 LEFT JOIN `uploads` ON news_articles_uploads.upload_id = uploads.id 
 WHERE (news_categories.slug IN ("category-one","category-two","category-three","category-four","category-five")) AND (news_articles.published = 1)
 GROUP BY `news_articles`.`id`
 ORDER BY `news_articles`.`lead_article` DESC, `news_articles`.`created` DESC LIMIT 20;

The problem is that whilst the query runs, it is slow, and during busy periods the CPU usage is getting pretty out of hand!

问题是,当查询运行时,它是缓慢的,并且在繁忙的时候,CPU的使用变得非常失控!

Here is an EXPLAIN for the above query (right-click open in new tab to see full size):

下面是对上述查询的解释(右键单击new选项卡打开,查看完整大小):

改进带有许多左连接的SQL查询的性能

You can find the schema here: http://pastie.org/private/qoe2qo16rbqr5mptb4bug

您可以在这里找到模式:http://pastie.org/private/qoe2qo16rbqr5mptb4bug

The server is running MySQL 5.1.55 and the website uses Zend Framework to execute the query and PHP 5.2.8.

服务器运行MySQL 5.1.55,网站使用Zend Framework执行查询和PHP 5.2.8。

I've been through the MySQL slow query log and added missing indexes to the best of my knowledge but the query still shows up as taking 1-3 seconds to execute. If anyone has any ideas I've be very grateful. Thanks in advance.

我已经浏览了MySQL的慢速查询日志,并尽我所知添加了缺失的索引,但是查询仍然显示为执行时间为1-3秒。如果有人有什么想法,我非常感激。提前谢谢。

2 个解决方案

#1


3  

Since your "WHERE" clause originally included "AND" for your news categories within a specified list, that would FORCE the joins to get there as INNER joins, not LEFT JOINs. Also, I would try adding the "STRAIGHT_JOIN" clause. This typically forces the engine to do the joining in the order specifically stated, instead of trying to think of its own alternative for you... especially when the other tables are more "lookup" references.

由于您的“WHERE”子句最初包含“和”,在指定的列表中包含您的新闻类别,这将迫使连接以内部连接的方式到达那里,而不是左连接。另外,我将尝试添加“STRAIGHT_JOIN”子句。这通常会迫使引擎按照特定的顺序进行连接,而不是试图为您考虑自己的选择……特别是当其他表是更多的“查找”引用时。

I would also apply the index as suggested by Jordan.

我还将按照约旦的建议应用该指数。

SELECT STRAIGHT_JOIN
      NA.*, 
      GROUP_CONCAT(DISTINCT tags.title) AS `tags`, 
      GROUP_CONCAT(DISTINCT tags.id) AS tag_ids,
      GROUP_CONCAT(DISTINCT NC.title) AS news_categories,
      GROUP_CONCAT(DISTINCT NC.id) AS news_category_ids,
      GROUP_CONCAT(DISTINCT NC.slug) AS news_category_slugs, 
      NAUp.`caption` AS upload_caption,
      Up1.`title` AS upload_title, 
      Up1.`basename` AS upload_basename,
      Up1.`extension` AS upload_extension,
      Up1.`path` AS upload_path
   FROM 
      news_articles NA
         INNER JOIN news_articles_news_categories NACats
            ON NA.id = NACats.news_article_id

            INNER JOIN news_categories NC
               ON NACats.news_category_id = NC.id
               AND NC.slug IN ( "category-one",
                                "category-two",
                                "category-three",
                                "category-four",
                                "category-five" )


         LEFT JOIN news_articles_tags NATags
            ON NA.ID = NATags.news_article_id

            LEFT JOIN tags
               ON NATags.tag_id = tags.id

         LEFT JOIN news_articles_uploads NAUp
            ON    NA.ID = NAUp.news_article_id 
              AND NAUp.order = 0

            LEFT JOIN uploads Up1
               ON NAUp.upload_id = Up1.id 

   WHERE 
      NA.Published = 1
   GROUP BY 
      NA.ID
   ORDER BY 
      NA.lead_article DESC, 
      NA.created DESC 
   LIMIT 20;

#2


3  

Do you have an index on news_articles.lead_article, news_articles.created, an index that has both columns.

在news_articles上有索引吗?lead_article news_articles。创建了一个包含两列的索引。

create index news_articles_x1 on news_articles (lead_articles, created);

Without it, you won't be taking advantage of the order by limit clause, it would still scan and sort the whole table.

如果没有它,您就不会利用order by limit子句,它仍然会扫描并排序整个表。

Also I would question whether you need all of this data in one blast?

我还想问你是否需要所有这些数据在一次爆炸中?

#1


3  

Since your "WHERE" clause originally included "AND" for your news categories within a specified list, that would FORCE the joins to get there as INNER joins, not LEFT JOINs. Also, I would try adding the "STRAIGHT_JOIN" clause. This typically forces the engine to do the joining in the order specifically stated, instead of trying to think of its own alternative for you... especially when the other tables are more "lookup" references.

由于您的“WHERE”子句最初包含“和”,在指定的列表中包含您的新闻类别,这将迫使连接以内部连接的方式到达那里,而不是左连接。另外,我将尝试添加“STRAIGHT_JOIN”子句。这通常会迫使引擎按照特定的顺序进行连接,而不是试图为您考虑自己的选择……特别是当其他表是更多的“查找”引用时。

I would also apply the index as suggested by Jordan.

我还将按照约旦的建议应用该指数。

SELECT STRAIGHT_JOIN
      NA.*, 
      GROUP_CONCAT(DISTINCT tags.title) AS `tags`, 
      GROUP_CONCAT(DISTINCT tags.id) AS tag_ids,
      GROUP_CONCAT(DISTINCT NC.title) AS news_categories,
      GROUP_CONCAT(DISTINCT NC.id) AS news_category_ids,
      GROUP_CONCAT(DISTINCT NC.slug) AS news_category_slugs, 
      NAUp.`caption` AS upload_caption,
      Up1.`title` AS upload_title, 
      Up1.`basename` AS upload_basename,
      Up1.`extension` AS upload_extension,
      Up1.`path` AS upload_path
   FROM 
      news_articles NA
         INNER JOIN news_articles_news_categories NACats
            ON NA.id = NACats.news_article_id

            INNER JOIN news_categories NC
               ON NACats.news_category_id = NC.id
               AND NC.slug IN ( "category-one",
                                "category-two",
                                "category-three",
                                "category-four",
                                "category-five" )


         LEFT JOIN news_articles_tags NATags
            ON NA.ID = NATags.news_article_id

            LEFT JOIN tags
               ON NATags.tag_id = tags.id

         LEFT JOIN news_articles_uploads NAUp
            ON    NA.ID = NAUp.news_article_id 
              AND NAUp.order = 0

            LEFT JOIN uploads Up1
               ON NAUp.upload_id = Up1.id 

   WHERE 
      NA.Published = 1
   GROUP BY 
      NA.ID
   ORDER BY 
      NA.lead_article DESC, 
      NA.created DESC 
   LIMIT 20;

#2


3  

Do you have an index on news_articles.lead_article, news_articles.created, an index that has both columns.

在news_articles上有索引吗?lead_article news_articles。创建了一个包含两列的索引。

create index news_articles_x1 on news_articles (lead_articles, created);

Without it, you won't be taking advantage of the order by limit clause, it would still scan and sort the whole table.

如果没有它,您就不会利用order by limit子句,它仍然会扫描并排序整个表。

Also I would question whether you need all of this data in one blast?

我还想问你是否需要所有这些数据在一次爆炸中?