Slow query: how do I convert it to a JOIN?

Time: 2022-04-14 04:17:25

I have the following MySQL query, which takes up to 10 seconds to run in my application:

SELECT tagid, tag FROM tags WHERE tagid IN 
(SELECT DISTINCT tagid FROM news_tags WHERE newsid IN 
(SELECT newsid FROM news_tags WHERE tagid IN (16,32)
GROUP BY newsid HAVING COUNT(newsid)>=2)) 
AND tagid NOT IN (16,32) ORDER BY level, tagid

The tables used are:

  • table news_tags, with columns newsid, tagid
  • table tags, with columns tagid, tag, level

The purpose of the query is to find the "news" items that are tagged with both tagid 16 and tagid 32, and then to find the other tags attached to those items, so that a user can narrow the "news" items down further with more specific tag combinations. The end result should be the tag and tagid columns from the tags table for those remaining tags.

I have made several attempts at an equivalent JOIN, but none of them selected all the remaining tagids on the news items that have the given tags attached.

Here are my EXPLAIN results, in case they point to another cause of slowness that I'm missing:

id|select_type       |table    |type          |possible_keys|key    |key_len|ref |rows|Extra
 1|PRIMARY           |tags     |range         |PRIMARY      |PRIMARY|      4|NULL|  55|Using where; Using filesort
 2|DEPENDENT SUBQUERY|news_tags|index_subquery|tagid        |tagid  |      4|func|  26|Using index; Using where
 3|DEPENDENT SUBQUERY|news_tags|index         |tagid        |PRIMARY|      8|NULL|  11|Using where; Using index

Just to clarify the problem: I wanted remaining tags for news items tagged with BOTH tags 16 and 32, not either 16 or 32. Sorry for any confusion.
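
To make the BOTH-versus-either requirement concrete, here is a minimal sketch that runs the query from the question against a tiny data set. Everything around the query is an assumption made for illustration: the tag names, the sample rows, the composite primary key on news_tags (newsid, tagid), and the use of Python's built-in sqlite3 module in place of MySQL. It shows the intended result only and says nothing about timing.

```python
import sqlite3

# Hypothetical sample data; an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (tagid INTEGER PRIMARY KEY, tag TEXT, level INTEGER);
CREATE TABLE news_tags (
    newsid INTEGER,
    tagid  INTEGER,
    PRIMARY KEY (newsid, tagid)  -- assumed: a tag is attached to an item at most once
);
INSERT INTO tags VALUES
    (16, 'mysql', 1), (32, 'performance', 1),
    (40, 'joins', 2), (50, 'indexes', 2), (60, 'php', 3);
INSERT INTO news_tags VALUES
    (1, 16), (1, 32), (1, 40),   -- item 1 has both 16 and 32
    (2, 16), (2, 32), (2, 50),   -- item 2 has both 16 and 32
    (3, 16), (3, 60),            -- item 3 has only 16
    (4, 32), (4, 60);            -- item 4 has only 32
""")

# The query from the question, unchanged apart from whitespace.
ORIGINAL_QUERY = """
SELECT tagid, tag FROM tags WHERE tagid IN
  (SELECT DISTINCT tagid FROM news_tags WHERE newsid IN
    (SELECT newsid FROM news_tags WHERE tagid IN (16,32)
     GROUP BY newsid HAVING COUNT(newsid)>=2))
AND tagid NOT IN (16,32) ORDER BY level, tagid
"""

rows = conn.execute(ORIGINAL_QUERY).fetchall()
print(rows)  # [(40, 'joins'), (50, 'indexes')]
```

Tag 60 is not returned because it only appears on items that carry one of the two required tags. Note that the COUNT(newsid)>=2 test only means "has both tags" if a (newsid, tagid) pair cannot be stored twice, which is why the sketch assumes that composite key.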

3 solutions

#1


2  

SELECT DISTINCT tags.tagid, tags.tag
FROM
       tags                             -- tags from the ...
  JOIN news_tags AS n0 USING (tagid)    -- ... news items tagged with ...
  JOIN news_tags AS n1 USING (newsid)   -- ... tagid = 16 and ...
  JOIN news_tags AS n2 USING (newsid)   -- ... tagid = 32
WHERE
  n1.tagid = 16 AND n2.tagid = 32
  AND tags.tagid NOT IN (16,32)         -- not the tags we already know about
ORDER BY tags.level, tags.tagid

#2


1  

Edit: My query is strictly based on the SQL the OP provided; I was just trying to speed up the query, as asked in the question title.

SELECT DISTINCT t.tagid, t.tag FROM tags AS t
JOIN            news_tags AS nt1 USING (tagid) 
JOIN            news_tags AS nt2 USING (newsid)
WHERE           nt2.tagid IN (16, 32) AND t.tagid NOT IN (16, 32) 
GROUP BY        nt2.newsid HAVING COUNT(nt2.newsid)>=2
ORDER BY        t.level, t.tagid

#3


0  

I did eventually come up with a fast query that solved this problem using JOINs instead of IN clauses:

SELECT tags.tagid,tags.tag FROM tags 
INNER JOIN (SELECT DISTINCT news_tags.tagid FROM news_tags
INNER JOIN (SELECT newsid FROM news_tags WHERE tagid IN (16,32) 
GROUP BY newsid HAVING count(newsid) >= 2) tagged_news 
ON news_tags.newsid = tagged_news.newsid 
WHERE news_tags.tagid NOT IN (16,32)) rem_tags
ON tags.tagid = rem_tags.tagid
ORDER BY level, tagid

This is obviously nowhere near as clean or as elegant as eggyal's solution, so I adopted his solution in the end in my application.

I'd love to hear more objective reasons (other than elegance) for why eggyal's solution would be preferred to the above SQL statement, both to find the optimal SQL statement for the problem and to learn for the future. I appreciate all the help so far, guys.
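
One objective check that can be made before comparing speed is whether the two statements return the same rows. The sketch below runs the self-join query from answer #1 and the derived-table query from this answer against the same small data set. The sample rows, tag names, composite primary key and the use of Python's built-in sqlite3 module instead of MySQL are all assumptions made for illustration. The ORDER BY clauses are left out and the rows are sorted in Python, since only the set of rows is being compared.

```python
import sqlite3

# Hypothetical sample data; an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (tagid INTEGER PRIMARY KEY, tag TEXT, level INTEGER);
CREATE TABLE news_tags (newsid INTEGER, tagid INTEGER, PRIMARY KEY (newsid, tagid));
INSERT INTO tags VALUES
    (16, 'mysql', 1), (32, 'performance', 1),
    (40, 'joins', 2), (50, 'indexes', 2), (60, 'php', 3);
INSERT INTO news_tags VALUES
    (1, 16), (1, 32), (1, 40),
    (2, 16), (2, 32), (2, 50),
    (3, 16), (3, 60),
    (4, 32), (4, 60);
""")

# Answer #1: one extra self-join of news_tags per required tag.
JOIN_QUERY = """
SELECT DISTINCT tags.tagid, tags.tag
FROM tags
  JOIN news_tags AS n0 USING (tagid)
  JOIN news_tags AS n1 USING (newsid)
  JOIN news_tags AS n2 USING (newsid)
WHERE n1.tagid = 16 AND n2.tagid = 32
  AND tags.tagid NOT IN (16, 32)
"""

# Answer #3: derived tables with GROUP BY / HAVING.
DERIVED_QUERY = """
SELECT tags.tagid, tags.tag FROM tags
INNER JOIN (SELECT DISTINCT news_tags.tagid FROM news_tags
  INNER JOIN (SELECT newsid FROM news_tags WHERE tagid IN (16, 32)
              GROUP BY newsid HAVING COUNT(newsid) >= 2) tagged_news
  ON news_tags.newsid = tagged_news.newsid
  WHERE news_tags.tagid NOT IN (16, 32)) rem_tags
ON tags.tagid = rem_tags.tagid
"""

join_rows = sorted(conn.execute(JOIN_QUERY).fetchall())
derived_rows = sorted(conn.execute(DERIVED_QUERY).fetchall())
print(join_rows == derived_rows)  # True
```

Agreement on one small data set is not a proof of equivalence, and it says nothing about speed. For that, the EXPLAIN output of each statement on the real MySQL server, with the real indexes and row counts, is the thing to compare.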
