需要sql查询帮助来查找标记有所有指定标签的内容

时间:2021-08-22 01:50:08

Let's say I have the following tables:

假设我有以下表格:

TAGS

id: integer
name: string

id:整数名称:字符串

POSTS

id: integer
body: text

id:整数体:文本

TAGGINGS

id: integer
tag_id: integer
post_id: integer

id:integer tag_id:integer post_id:integer

How would I go about writing a query that select all posts that are tagged with ALL of the following tags (name attribute of tags table): "Cheese", "Wine", "Paris", "Frace", "City", "Scenic", "Art"

我将如何编写一个查询,选择所有标记有以下所有标记的帖子(标签表的名称属性):“Cheese”,“Wine”,“Paris”,“Frace”,“City”,“风景“,”艺术“

See also: Need help with sql query to find things with most specified tags (note: similar, but not a duplicate!)

另请参阅:需要有关sql查询的帮助以查找具有最多指定标记的内容(注意:类似,但不重复!)

2 个解决方案

#1


17  

Using IN:

SELECT p.*
  FROM POSTS p
 WHERE p.id IN (SELECT tg.post_id
                  FROM TAGGINGS tg
                  JOIN TAGS t ON t.id = tg.tag_id
                 WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
              GROUP BY tg.post_id
                HAVING COUNT(DISTINCT t.name) = 7)

Using a JOIN

SELECT p.*
  FROM POSTS p
  JOIN (SELECT tg.post_id
          FROM TAGGINGS tg
          JOIN TAGS t ON t.id = tg.tag_id
         WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
      GROUP BY tg.post_id
        HAVING COUNT(DISTINCT t.name) = 7) x ON x.post_id = p.id

Using EXISTS

SELECT p.*
  FROM POSTS p
 WHERE EXISTS (SELECT NULL
                 FROM TAGGINGS tg
                 JOIN TAGS t ON t.id = tg.tag_id
                WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
                  AND tg.post_id = p.id
             GROUP BY tg.post_id
               HAVING COUNT(DISTINCT t.name) = 7)

Explanation

The crux of things is that the COUNT(DISTINCT t.name) needs to match the number of tag names to ensure that all those tags are related to the post. Without the DISTINCT, there's a risk that duplicates of one of the names could return a count of 7--so you'd have a false positive.

事情的关键是COUNT(DISTINCT t.name)需要匹配标签名称的数量,以确保所有这些标签与帖子相关。如果没有DISTINCT,其中一个名称的重复可能会返回7的数量 - 因此您会有误报。

Performance

Most will tell you the JOIN is optimal, but JOINs also risk duplicating rows in the resultset. EXISTS would be my next choice--no duplicate risk, and generally faster execution but checking the explain plan will ultimately tell you what's best based on your setup and data.

大多数人会告诉你JOIN是最优的,但是JOIN还冒着重复结果集中的行的风险。 EXISTS将是我的下一个选择 - 没有重复的风险,通常执行速度更快,但检查解释计划最终将根据您的设置和数据告诉您什么是最好的。

#2


1  

Try this:

尝试这个:

Select * From Posts p
   Where Not Exists
       (Select * From tags t
        Where name in 
           ('Cheese', 'Wine', 'Paris', 
             'Frace', 'City', 'Scenic', 'Art')
           And Not Exists
             (Select * From taggings
              Where tag_id = t.Tag_Id
                And post_Id = p.Post_Id))

Explanation: Asking for a list of those Posts that have had every one of a specified set of tags associated with it is equivilent to asking for those posts where there is no tag in that same specified set, that has not been associated with it. i.e., the sql above.

说明:询问那些已经拥有与之关联的指定标记集中的每一个的帖子的列表,等同于询问那些在同一指定集合中没有标记的帖子,这些帖子没有与之关联。即,上面的sql。

#1


17  

Using IN:

SELECT p.*
  FROM POSTS p
 WHERE p.id IN (SELECT tg.post_id
                  FROM TAGGINGS tg
                  JOIN TAGS t ON t.id = tg.tag_id
                 WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
              GROUP BY tg.post_id
                HAVING COUNT(DISTINCT t.name) = 7)

Using a JOIN

SELECT p.*
  FROM POSTS p
  JOIN (SELECT tg.post_id
          FROM TAGGINGS tg
          JOIN TAGS t ON t.id = tg.tag_id
         WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
      GROUP BY tg.post_id
        HAVING COUNT(DISTINCT t.name) = 7) x ON x.post_id = p.id

Using EXISTS

SELECT p.*
  FROM POSTS p
 WHERE EXISTS (SELECT NULL
                 FROM TAGGINGS tg
                 JOIN TAGS t ON t.id = tg.tag_id
                WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
                  AND tg.post_id = p.id
             GROUP BY tg.post_id
               HAVING COUNT(DISTINCT t.name) = 7)

Explanation

The crux of things is that the COUNT(DISTINCT t.name) needs to match the number of tag names to ensure that all those tags are related to the post. Without the DISTINCT, there's a risk that duplicates of one of the names could return a count of 7--so you'd have a false positive.

事情的关键是COUNT(DISTINCT t.name)需要匹配标签名称的数量,以确保所有这些标签与帖子相关。如果没有DISTINCT,其中一个名称的重复可能会返回7的数量 - 因此您会有误报。

Performance

Most will tell you the JOIN is optimal, but JOINs also risk duplicating rows in the resultset. EXISTS would be my next choice--no duplicate risk, and generally faster execution but checking the explain plan will ultimately tell you what's best based on your setup and data.

大多数人会告诉你JOIN是最优的,但是JOIN还冒着重复结果集中的行的风险。 EXISTS将是我的下一个选择 - 没有重复的风险,通常执行速度更快,但检查解释计划最终将根据您的设置和数据告诉您什么是最好的。

#2


1  

Try this:

尝试这个:

Select * From Posts p
   Where Not Exists
       (Select * From tags t
        Where name in 
           ('Cheese', 'Wine', 'Paris', 
             'Frace', 'City', 'Scenic', 'Art')
           And Not Exists
             (Select * From taggings
              Where tag_id = t.Tag_Id
                And post_Id = p.Post_Id))

Explanation: Asking for a list of those Posts that have had every one of a specified set of tags associated with it is equivilent to asking for those posts where there is no tag in that same specified set, that has not been associated with it. i.e., the sql above.

说明:询问那些已经拥有与之关联的指定标记集中的每一个的帖子的列表,等同于询问那些在同一指定集合中没有标记的帖子,这些帖子没有与之关联。即,上面的sql。