Group results by a specific keyword in MySQL?

Time: 2022-12-10 14:23:40

I have pages that can carry multiple tags. A page is sometimes tagged with the keyword I am searching for and sometimes not. When it does have such a tag, my query returns a result like the one below.

Query:

SELECT *
FROM root_pages AS p

LEFT JOIN root_mm_pages_tags AS mm
ON mm.page_id = p.page_id

LEFT JOIN root_tags AS t
ON t.tag_id =  mm.tag_id
AND t.tag_name LIKE '%story%'

WHERE p.page_title LIKE '%article title 8%'
AND p.page_hide != '1'

ORDER BY (t.tag_name+0) ASC

Result:

page_id     page_url            tag_name    
17          article title 8     NULL
17          article title 8     NULL
17          article title 8     sys-rsv-story-1

So I have to use GROUP BY to collapse the duplicate rows:

SELECT *
FROM root_pages AS p

LEFT JOIN root_mm_pages_tags AS mm
ON mm.page_id = p.page_id

LEFT JOIN root_tags AS t
ON t.tag_id =  mm.tag_id
AND t.tag_name LIKE '%story%'

WHERE p.page_title LIKE '%article title 8%'
AND p.page_hide != '1'

GROUP BY p.page_id
ORDER BY (t.tag_name+0) ASC

and it returns something like this:

page_id     page_url            tag_name    
17          article title 8     NULL

But I am after this result, which contains the keyword I am searching for:

page_id     page_url            tag_name    
17          article title 8     sys-rsv-story-1

So, is it possible to group a result by a keyword? Or is there a better query to achieve this?

Also, the query should not return any row if that keyword isn't there, but it still does:

page_id     page_url            tag_name    
17          article title 8     NULL
17          article title 8     NULL

EDIT:

My new solution:

SELECT *
FROM root_pages AS p

INNER JOIN root_mm_pages_tags AS mm
ON mm.page_id = p.page_id

INNER JOIN root_tags AS t
ON t.tag_id =  mm.tag_id

WHERE p.page_title LIKE '%{group1}%'
AND t.tag_name LIKE '%story%'
AND p.page_hide != '1'

AND EXISTS (
    SELECT page_url
    FROM root_pages AS p

    LEFT JOIN root_mm_pages_tags AS mm
    ON mm.page_id = p.page_id

    LEFT JOIN root_tags AS t
    ON t.tag_id =  mm.tag_id

    WHERE page_url = 'article title 1d'
    AND t.tag_name LIKE '%story%'
    AND p.page_hide != '1'
)

ORDER BY (t.tag_name+0) ASC

3 solutions

#1


1  

Ouch.

Your SQL queries are quite strange, I think.

Several things to notice:

  • Using bar LIKE '%foo%' is very hard on the SQL engine: it must scan all rows sequentially and search for the substring 'foo' in the column bar, and no index can be used. So avoid it if you can. Use at least bar LIKE 'foo%' if you can (an index can be used when the start of the string is known). And in your case, pages with a title such as 'article title 80' would match too. Are you sure you do not simply need p.page_title = 'article title 8'?
  • Why do you add +0 in the ORDER BY clause? Do you really want to prevent index usage?
  • p.page_hide != '1': isn't p.page_hide a tinyint? Is it a string? Why use UTF8-encoded characters to store 0 or 1?
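
To make the 'article title 80' point concrete, here is a minimal sketch. It assumes a cut-down root_pages table with hypothetical sample rows, and it uses Python's built-in sqlite3 module as a stand-in for MySQL, so it only illustrates the matching behaviour, not index usage.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE root_pages (page_id INTEGER PRIMARY KEY, page_title TEXT);
INSERT INTO root_pages VALUES (17, 'article title 8'), (18, 'article title 80');
""")

# Substring match: also picks up 'article title 80'.
like_ids = [row[0] for row in conn.execute(
    "SELECT page_id FROM root_pages "
    "WHERE page_title LIKE '%article title 8%' ORDER BY page_id")]

# Exact match: only the intended page.
exact_ids = [row[0] for row in conn.execute(
    "SELECT page_id FROM root_pages "
    "WHERE page_title = 'article title 8' ORDER BY page_id")]

print(like_ids)   # [17, 18]
print(exact_ids)  # [17]
```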

But this is not the problem.


One of your problems is that GROUP BY p.page_id is in fact wrong in standard SQL, but MySQL hides this fact (unless the ONLY_FULL_GROUP_BY SQL mode is enabled). A GROUP BY clause should contain at least every element of the SELECT part that is not an aggregate (an aggregate is COUNT, SUM, AVG, etc.). Here you group by id and obtain an arbitrary row: MySQL assumes you know what you're doing and that every other field in the SELECT is the same when the id is the same (which is not the case here, since tag_name differs).
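
One way to make the grouped row deterministic is to aggregate tag_name explicitly. The sketch below uses MAX, which is my own substitution and not something the answer proposes; it relies on MAX ignoring NULLs. The sample rows are hypothetical, and SQLite (via Python's sqlite3) stands in for MySQL.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE root_pages (page_id INTEGER PRIMARY KEY, page_title TEXT, page_hide INTEGER);
CREATE TABLE root_tags (tag_id INTEGER PRIMARY KEY, tag_name TEXT);
CREATE TABLE root_mm_pages_tags (page_id INTEGER, tag_id INTEGER);
INSERT INTO root_pages VALUES (17, 'article title 8', 0);
INSERT INTO root_tags VALUES (1, 'news'), (2, 'sport'), (3, 'sys-rsv-story-1');
INSERT INTO root_mm_pages_tags VALUES (17, 1), (17, 2), (17, 3);
""")

# Every selected column is either in GROUP BY or wrapped in an aggregate.
# MAX skips the NULLs produced by the non-matching tags.
rows = conn.execute("""
SELECT p.page_id, p.page_title, MAX(t.tag_name) AS tag_name
FROM root_pages AS p
LEFT JOIN root_mm_pages_tags AS mm ON mm.page_id = p.page_id
LEFT JOIN root_tags AS t
    ON t.tag_id = mm.tag_id AND t.tag_name LIKE '%story%'
WHERE p.page_title = 'article title 8' AND p.page_hide != 1
GROUP BY p.page_id, p.page_title
""").fetchall()

print(rows)  # [(17, 'article title 8', 'sys-rsv-story-1')]
```

Note that a page without any matching tag would still be returned, with NULL in tag_name, because of the LEFT JOINs.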

And if several tags match your keyword ('story' here), don't you want the page to be listed several times, once for each matching tag?

So.


You want to select a page that has a matching tag. I would say use the EXISTS keyword and make things simpler.

It may be something like this:

SELECT * 
 FROM root_pages AS p
WHERE p.page_title = 'article title 8'
 AND p.page_hide != 1
 -- EXISTS returns true as soon as the engine finds one matching row
 AND EXISTS (
  SELECT mm.page_id
  FROM root_mm_pages_tags AS mm
    LEFT JOIN root_tags AS t
      ON t.tag_id =  mm.tag_id
  -- here we make a correlation between the subquery and the main query
  WHERE mm.page_id = p.page_id
  AND t.tag_name LIKE '%story%'
)
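
The following sketch runs an EXISTS query of this shape against hypothetical sample data, with SQLite (Python's sqlite3) standing in for MySQL. Page 17 has two matching tags and page 19 has none; both page ids and all tag names except sys-rsv-story-1 are invented for the illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE root_pages (page_id INTEGER PRIMARY KEY, page_title TEXT, page_hide INTEGER);
CREATE TABLE root_tags (tag_id INTEGER PRIMARY KEY, tag_name TEXT);
CREATE TABLE root_mm_pages_tags (page_id INTEGER, tag_id INTEGER);
INSERT INTO root_pages VALUES (17, 'article title 8', 0), (19, 'article title 8', 0);
INSERT INTO root_tags VALUES (1, 'news'), (3, 'sys-rsv-story-1'), (4, 'sys-rsv-story-2');
INSERT INTO root_mm_pages_tags VALUES (17, 1), (17, 3), (17, 4), (19, 1);
""")

# The correlated subquery only checks that at least one matching tag exists,
# so page 17 comes back once even though two of its tags match.
page_ids = [row[0] for row in conn.execute("""
SELECT p.page_id
FROM root_pages AS p
WHERE p.page_title = 'article title 8'
  AND p.page_hide != 1
  AND EXISTS (
    SELECT 1
    FROM root_mm_pages_tags AS mm
    INNER JOIN root_tags AS t ON t.tag_id = mm.tag_id
    WHERE mm.page_id = p.page_id
      AND t.tag_name LIKE '%story%'
  )
""")]

print(page_ids)  # [17]
```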

But with this query you only obtain the page, not the matching tags. If you want to list all matching tags for a page, you need another query, quite close to what you have:

SELECT p.page_id, p.page_name, t.tag_name
 FROM root_pages AS p
   INNER JOIN root_mm_pages_tags AS mm
       ON mm.page_id = p.page_id
     INNER JOIN root_tags AS t
         ON (t.tag_id =  mm.tag_id 
         AND t.tag_name LIKE '%story%')
WHERE p.page_title = 'article title 8'
 AND p.page_hide != 1

With the first INNER JOIN I'm only keeping pages that have tags. With the second INNER JOIN I'm only keeping rows from root_mm_pages_tags that have a matching tag in root_tags. I think your NULLs came from rows in this table linked to other, non-matching tags (hence the NULL fields from root_tags in your query result). So do not use LEFT JOIN if you only want matching results.
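
A small side-by-side sketch of that difference, using hypothetical sample rows and SQLite (Python's sqlite3) in place of MySQL. The same query text is run twice, once with LEFT joins and once with INNER joins.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE root_pages (page_id INTEGER PRIMARY KEY, page_title TEXT, page_hide INTEGER);
CREATE TABLE root_tags (tag_id INTEGER PRIMARY KEY, tag_name TEXT);
CREATE TABLE root_mm_pages_tags (page_id INTEGER, tag_id INTEGER);
INSERT INTO root_pages VALUES (17, 'article title 8', 0);
INSERT INTO root_tags VALUES (1, 'news'), (2, 'sport'), (3, 'sys-rsv-story-1');
INSERT INTO root_mm_pages_tags VALUES (17, 1), (17, 2), (17, 3);
""")

# Only the join type changes between the two runs.
QUERY = """
SELECT t.tag_name
FROM root_pages AS p
{join} JOIN root_mm_pages_tags AS mm ON mm.page_id = p.page_id
{join} JOIN root_tags AS t
    ON t.tag_id = mm.tag_id AND t.tag_name LIKE '%story%'
WHERE p.page_title = 'article title 8' AND p.page_hide != 1
"""

left_tags = [row[0] for row in conn.execute(QUERY.format(join="LEFT"))]
inner_tags = [row[0] for row in conn.execute(QUERY.format(join="INNER"))]

# LEFT JOIN keeps one NULL row per non-matching tag; INNER JOIN drops them.
print(left_tags.count(None))  # 2
print(inner_tags)             # ['sys-rsv-story-1']
```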

If you want only one result for each page, you will need GROUP BY p.page_id, p.page_name, and you will need to add an aggregate function on the remaining field t.tag_name. You could use GROUP_CONCAT(t.tag_name ORDER BY t.tag_name ASC SEPARATOR ",") to obtain a list of all matching tags for this page.
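
As a rough illustration of the one-row-per-page idea, the sketch below uses SQLite's two-argument group_concat(value, separator) through Python's sqlite3. This is an assumption-laden stand-in: the MySQL form with ORDER BY and SEPARATOR inside the call is not used here, so the order of the concatenated tags is not guaranteed and the code sorts them in Python. The sample rows are hypothetical, and page_title is used in place of the answer's page_name.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE root_pages (page_id INTEGER PRIMARY KEY, page_title TEXT, page_hide INTEGER);
CREATE TABLE root_tags (tag_id INTEGER PRIMARY KEY, tag_name TEXT);
CREATE TABLE root_mm_pages_tags (page_id INTEGER, tag_id INTEGER);
INSERT INTO root_pages VALUES (17, 'article title 8', 0);
INSERT INTO root_tags VALUES (1, 'news'), (3, 'sys-rsv-story-1'), (4, 'sys-rsv-story-2');
INSERT INTO root_mm_pages_tags VALUES (17, 1), (17, 3), (17, 4);
""")

# One row per page; all matching tag names are folded into a single string.
rows = conn.execute("""
SELECT p.page_id, p.page_title, group_concat(t.tag_name, ',') AS tag_names
FROM root_pages AS p
INNER JOIN root_mm_pages_tags AS mm ON mm.page_id = p.page_id
INNER JOIN root_tags AS t
    ON t.tag_id = mm.tag_id AND t.tag_name LIKE '%story%'
WHERE p.page_title = 'article title 8' AND p.page_hide != 1
GROUP BY p.page_id, p.page_title
""").fetchall()

page_id, page_title, tag_names = rows[0]
matching_tags = sorted(tag_names.split(","))
print(len(rows))      # 1
print(matching_tags)  # ['sys-rsv-story-1', 'sys-rsv-story-2']
```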

EDIT


So it seems that in fact you want pages with a matching title OR pages with a matching keyword. In this case you should use LEFT JOIN, and you will have NULL values. If you do not need the tag in the result, the EXISTS keyword is still your best friend: just replace AND EXISTS with OR EXISTS. It's the fastest solution.

If you need the matching tags in the result, or NULL when there are no tags, you have 2 solutions: a UNION query mixing the results of a simple query on titles with those of a query on tags using inner joins, or the nice GROUP BY with GROUP_CONCAT. If you do not use GROUP_CONCAT (as in @Dmitry Teplyakov's answer), you may obtain results where the page title didn't match, only the keyword, yet the tag_name field shows NULL, because the first tag row listed before the GROUP BY is applied is a NULL field (for example, the page has 3 keywords and the matching keyword is not the first one in the query).

SELECT 
 p.page_id,
 p.page_name,
 GROUP_CONCAT(t.tag_name ORDER BY t.tag_name ASC SEPARATOR ",")
FROM root_pages AS p
   LEFT JOIN root_mm_pages_tags AS mm
       ON mm.page_id = p.page_id
     LEFT JOIN root_tags AS t
         ON t.tag_id =  mm.tag_id 
WHERE p.page_hide != 1
 AND (p.page_title = 'article title 8'
  OR t.tag_name LIKE '%story%')
GROUP BY p.page_id, p.page_name;

But here we lose your ORDER BY tag_name. Ordering by tag_name implies that you want the same page to appear in several rows if it matches the keyword several times. Or if the title matches and the keyword also... or maybe not. So in fact the UNION query solution may be better. But the key point is that you should explain what you want in the tag_name field :-)
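
The answer does not spell out the UNION variant, so the following is only one possible reading of it: a first branch that selects title matches with NULL as tag_name, and a second branch that selects keyword matches through inner joins. The sample pages, the 'other page' title and the use of page_title are hypothetical, and SQLite (Python's sqlite3) stands in for MySQL.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE root_pages (page_id INTEGER PRIMARY KEY, page_title TEXT, page_hide INTEGER);
CREATE TABLE root_tags (tag_id INTEGER PRIMARY KEY, tag_name TEXT);
CREATE TABLE root_mm_pages_tags (page_id INTEGER, tag_id INTEGER);
INSERT INTO root_pages VALUES (17, 'article title 8', 0), (20, 'other page', 0);
INSERT INTO root_tags VALUES (1, 'news'), (3, 'sys-rsv-story-1');
INSERT INTO root_mm_pages_tags VALUES (17, 1), (20, 3);
""")

rows = conn.execute("""
-- Branch 1: pages matching on the title, no tag reported.
SELECT p.page_id AS page_id, p.page_title AS page_title, NULL AS tag_name
FROM root_pages AS p
WHERE p.page_hide != 1 AND p.page_title = 'article title 8'
UNION
-- Branch 2: pages matching on the keyword, one row per matching tag.
SELECT p.page_id AS page_id, p.page_title AS page_title, t.tag_name AS tag_name
FROM root_pages AS p
INNER JOIN root_mm_pages_tags AS mm ON mm.page_id = p.page_id
INNER JOIN root_tags AS t ON t.tag_id = mm.tag_id
WHERE p.page_hide != 1 AND t.tag_name LIKE '%story%'
ORDER BY page_id
""").fetchall()

print(rows)
# [(17, 'article title 8', None), (20, 'other page', 'sys-rsv-story-1')]
```

With this shape, a page that matches on both the title and the keyword would appear in both branches, once with NULL and once per matching tag. Whether that is acceptable depends on what is wanted in the tag_name field.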

#2


2  

Try not to put the condition in the LEFT JOIN:

SELECT *
FROM root_pages AS p

LEFT JOIN root_mm_pages_tags AS mm
ON mm.page_id = p.page_id

LEFT JOIN root_tags AS t
ON t.tag_id =  mm.tag_id

WHERE p.page_title LIKE '%article title 8%'
AND p.page_hide != '1'
AND t.tag_name LIKE '%story%'

GROUP BY p.page_id
ORDER BY (t.tag_name+0) ASC
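
To show what moving the tag condition from ON to WHERE changes, here is a minimal sketch with a hypothetical page 19 that carries no matching tag. SQLite (Python's sqlite3) stands in for MySQL, and GROUP BY and ORDER BY are left out to keep the comparison small.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE root_pages (page_id INTEGER PRIMARY KEY, page_title TEXT, page_hide INTEGER);
CREATE TABLE root_tags (tag_id INTEGER PRIMARY KEY, tag_name TEXT);
CREATE TABLE root_mm_pages_tags (page_id INTEGER, tag_id INTEGER);
INSERT INTO root_pages VALUES (19, 'article title 8', 0);
INSERT INTO root_tags VALUES (1, 'news');
INSERT INTO root_mm_pages_tags VALUES (19, 1);
""")

# Condition in ON: the page survives, with NULL for the non-matching tag.
on_rows = conn.execute("""
SELECT p.page_id, t.tag_name
FROM root_pages AS p
LEFT JOIN root_mm_pages_tags AS mm ON mm.page_id = p.page_id
LEFT JOIN root_tags AS t
    ON t.tag_id = mm.tag_id AND t.tag_name LIKE '%story%'
WHERE p.page_title LIKE '%article title 8%' AND p.page_hide != 1
""").fetchall()

# Condition in WHERE: rows without a matching tag are filtered out,
# so the LEFT JOINs behave like INNER JOINs here.
where_rows = conn.execute("""
SELECT p.page_id, t.tag_name
FROM root_pages AS p
LEFT JOIN root_mm_pages_tags AS mm ON mm.page_id = p.page_id
LEFT JOIN root_tags AS t ON t.tag_id = mm.tag_id
WHERE p.page_title LIKE '%article title 8%' AND p.page_hide != 1
  AND t.tag_name LIKE '%story%'
""").fetchall()

print(on_rows)     # [(19, None)]
print(where_rows)  # []
```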

EDIT: If you want to fetch rows whose page title contains 'article title' as well as rows that do not have that title but have the needed keyword, use this query (as @user985935 suggested):

SELECT *
FROM root_pages AS p

LEFT JOIN root_mm_pages_tags AS mm
ON mm.page_id = p.page_id

LEFT JOIN root_tags AS t
ON t.tag_id =  mm.tag_id

WHERE (p.page_title LIKE '%article title 8%'
OR t.tag_name LIKE '%story%')
AND p.page_hide != '1'


GROUP BY p.page_id
ORDER BY (t.tag_name+0) ASC

#3


0  

Here is the sample query I mentioned in the comment:

SELECT *
FROM root_pages AS p

LEFT JOIN root_mm_pages_tags AS mm
ON mm.page_id = p.page_id

LEFT JOIN root_tags AS t
ON t.tag_id =  mm.tag_id

WHERE p.page_hide != '1'
AND (t.tag_name LIKE '%story%' OR p.page_title LIKE '%article title 8%')
GROUP BY p.page_id
ORDER BY (t.tag_name+0) ASC
