Let's say I have the following tables:
假设我有以下表格:
TAGS
id: integer
name: string
id:整数名称:字符串
POSTS
id: integer
body: text
id:整数体:文本
TAGGINGS
id: integer
tag_id: integer
post_id: integer
id:integer tag_id:integer post_id:integer
How would I go about writing a query that select all posts that are tagged with ALL of the following tags (name attribute of tags table): "Cheese", "Wine", "Paris", "Frace", "City", "Scenic", "Art"
我将如何编写一个查询,选择所有标记有以下所有标记的帖子(标签表的名称属性):“Cheese”,“Wine”,“Paris”,“Frace”,“City”,“风景“,”艺术“
See also: Need help with sql query to find things with most specified tags (note: similar, but not a duplicate!)
另请参阅:需要有关sql查询的帮助以查找具有最多指定标记的内容(注意:类似,但不重复!)
2 个解决方案
#1
17
Using IN:
SELECT p.*
FROM POSTS p
WHERE p.id IN (SELECT tg.post_id
FROM TAGGINGS tg
JOIN TAGS t ON t.id = tg.tag_id
WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
GROUP BY tg.post_id
HAVING COUNT(DISTINCT t.name) = 7)
Using a JOIN
SELECT p.*
FROM POSTS p
JOIN (SELECT tg.post_id
FROM TAGGINGS tg
JOIN TAGS t ON t.id = tg.tag_id
WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
GROUP BY tg.post_id
HAVING COUNT(DISTINCT t.name) = 7) x ON x.post_id = p.id
Using EXISTS
SELECT p.*
FROM POSTS p
WHERE EXISTS (SELECT NULL
FROM TAGGINGS tg
JOIN TAGS t ON t.id = tg.tag_id
WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
AND tg.post_id = p.id
GROUP BY tg.post_id
HAVING COUNT(DISTINCT t.name) = 7)
Explanation
The crux of things is that the COUNT(DISTINCT t.name)
needs to match the number of tag names to ensure that all those tags are related to the post. Without the DISTINCT, there's a risk that duplicates of one of the names could return a count of 7--so you'd have a false positive.
事情的关键是COUNT(DISTINCT t.name)需要匹配标签名称的数量,以确保所有这些标签与帖子相关。如果没有DISTINCT,其中一个名称的重复可能会返回7的数量 - 因此您会有误报。
Performance
Most will tell you the JOIN is optimal, but JOINs also risk duplicating rows in the resultset. EXISTS would be my next choice--no duplicate risk, and generally faster execution but checking the explain plan will ultimately tell you what's best based on your setup and data.
大多数人会告诉你JOIN是最优的,但是JOIN还冒着重复结果集中的行的风险。 EXISTS将是我的下一个选择 - 没有重复的风险,通常执行速度更快,但检查解释计划最终将根据您的设置和数据告诉您什么是最好的。
#2
1
Try this:
尝试这个:
Select * From Posts p
Where Not Exists
(Select * From tags t
Where name in
('Cheese', 'Wine', 'Paris',
'Frace', 'City', 'Scenic', 'Art')
And Not Exists
(Select * From taggings
Where tag_id = t.Tag_Id
And post_Id = p.Post_Id))
Explanation: Asking for a list of those Posts that have had every one of a specified set of tags associated with it is equivilent to asking for those posts where there is no tag in that same specified set, that has not been associated with it. i.e., the sql above.
说明:询问那些已经拥有与之关联的指定标记集中的每一个的帖子的列表,等同于询问那些在同一指定集合中没有标记的帖子,这些帖子没有与之关联。即,上面的sql。
#1
17
Using IN:
SELECT p.*
FROM POSTS p
WHERE p.id IN (SELECT tg.post_id
FROM TAGGINGS tg
JOIN TAGS t ON t.id = tg.tag_id
WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
GROUP BY tg.post_id
HAVING COUNT(DISTINCT t.name) = 7)
Using a JOIN
SELECT p.*
FROM POSTS p
JOIN (SELECT tg.post_id
FROM TAGGINGS tg
JOIN TAGS t ON t.id = tg.tag_id
WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
GROUP BY tg.post_id
HAVING COUNT(DISTINCT t.name) = 7) x ON x.post_id = p.id
Using EXISTS
SELECT p.*
FROM POSTS p
WHERE EXISTS (SELECT NULL
FROM TAGGINGS tg
JOIN TAGS t ON t.id = tg.tag_id
WHERE t.name IN ('Cheese','Wine','Paris','Frace','City','Scenic','Art')
AND tg.post_id = p.id
GROUP BY tg.post_id
HAVING COUNT(DISTINCT t.name) = 7)
Explanation
The crux of things is that the COUNT(DISTINCT t.name)
needs to match the number of tag names to ensure that all those tags are related to the post. Without the DISTINCT, there's a risk that duplicates of one of the names could return a count of 7--so you'd have a false positive.
事情的关键是COUNT(DISTINCT t.name)需要匹配标签名称的数量,以确保所有这些标签与帖子相关。如果没有DISTINCT,其中一个名称的重复可能会返回7的数量 - 因此您会有误报。
Performance
Most will tell you the JOIN is optimal, but JOINs also risk duplicating rows in the resultset. EXISTS would be my next choice--no duplicate risk, and generally faster execution but checking the explain plan will ultimately tell you what's best based on your setup and data.
大多数人会告诉你JOIN是最优的,但是JOIN还冒着重复结果集中的行的风险。 EXISTS将是我的下一个选择 - 没有重复的风险,通常执行速度更快,但检查解释计划最终将根据您的设置和数据告诉您什么是最好的。
#2
1
Try this:
尝试这个:
Select * From Posts p
Where Not Exists
(Select * From tags t
Where name in
('Cheese', 'Wine', 'Paris',
'Frace', 'City', 'Scenic', 'Art')
And Not Exists
(Select * From taggings
Where tag_id = t.Tag_Id
And post_Id = p.Post_Id))
Explanation: Asking for a list of those Posts that have had every one of a specified set of tags associated with it is equivilent to asking for those posts where there is no tag in that same specified set, that has not been associated with it. i.e., the sql above.
说明:询问那些已经拥有与之关联的指定标记集中的每一个的帖子的列表,等同于询问那些在同一指定集合中没有标记的帖子,这些帖子没有与之关联。即,上面的sql。