I know the title of this question is a bit confusing, so bear with me. :)
I have a (MySQL) database with a Person record. A Person also has a slug field. Unfortunately, slug fields are not unique. There are a number of duplicate records, i.e., records that have different IDs but the same first name, last name, and slug. A Person may also have zero or more associated articles, blog entries, and podcast episodes.
If that's confusing, here's a diagram of the structure:
Diagram: http://mipadi.cbstaff.com/images/misc/people_db.jpg
I would like to produce a list of records that match these criteria: duplicate records (i.e., same slug field) for people who also have at least 1 article, blog entry, or podcast episode.
I have a SQL query that will list all records with the same slug fields:
SELECT
    id,
    first_name,
    last_name,
    slug,
    COUNT(slug) AS person_records
FROM
    people_person
GROUP BY
    slug
HAVING
    (COUNT(slug) > 1)
ORDER BY
    last_name, first_name, id;
But this includes records for people who may not have at least 1 article, blog entry, or podcast. Can I tweak this to fit the second criterion?
Edit:
I updated the database diagram to simplify it and make it clearer what I am doing. (Note: some of the DB table names changed. I was trying to give a higher-level look at the structure before, but it was a bit unclear.)
5 answers
#1
2
Select P.id, P.first_name, P.last_name, P.slug
From people_person As P
    Join (
        Select P1.slug
        From people_person As P1
        Where Exists (
            Select 1
            From magazine_author As ma1
            Where ma1.person_id = P1.id
            Union All
            Select 1
            From podcast_episode_guests As pod1
            Where pod1.person_id = P1.id
            Union All
            Select 1
            From blogs_blog_authors As b1
            Where b1.person_id = P1.id
        )
        Group By P1.slug
        Having Count(*) > 1
    ) As dup_slugs
        On dup_slugs.slug = P.slug
Order By P.last_name, P.first_name, P.id
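To see what this query actually returns, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The table and column names come from the answer above; the column types, the sample rows, and the helper names build_sample_db and find_duplicates are assumptions made up for illustration, and SQLite stands in for MySQL here.

```python
import sqlite3

# Assumed schema: only the columns the query touches, with guessed types.
SCHEMA = """
CREATE TABLE people_person (
    id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, slug TEXT
);
CREATE TABLE magazine_author (person_id INTEGER);
CREATE TABLE podcast_episode_guests (person_id INTEGER);
CREATE TABLE blogs_blog_authors (person_id INTEGER);
"""

# Hypothetical sample rows: two duplicated slugs and one unique slug.
PEOPLE = [
    (1, "Ann", "Lee", "ann-lee"),
    (2, "Ann", "Lee", "ann-lee"),
    (3, "Bob", "Ray", "bob-ray"),
    (4, "Bob", "Ray", "bob-ray"),
    (5, "Cy", "Orr", "cy-orr"),
]

# The query from the answer, unchanged apart from layout.
QUERY = """
SELECT P.id, P.first_name, P.last_name, P.slug
FROM people_person AS P
JOIN (
    SELECT P1.slug
    FROM people_person AS P1
    WHERE EXISTS (
        SELECT 1 FROM magazine_author AS ma1 WHERE ma1.person_id = P1.id
        UNION ALL
        SELECT 1 FROM podcast_episode_guests AS pod1 WHERE pod1.person_id = P1.id
        UNION ALL
        SELECT 1 FROM blogs_blog_authors AS b1 WHERE b1.person_id = P1.id
    )
    GROUP BY P1.slug
    HAVING COUNT(*) > 1
) AS dup_slugs ON dup_slugs.slug = P.slug
ORDER BY P.last_name, P.first_name, P.id
"""


def build_sample_db():
    """Create the in-memory database and load the hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO people_person VALUES (?, ?, ?, ?)", PEOPLE)
    # Persons 1 and 5 wrote articles, 2 was a podcast guest, 3 wrote a blog.
    conn.executemany("INSERT INTO magazine_author VALUES (?)", [(1,), (5,)])
    conn.executemany("INSERT INTO podcast_episode_guests VALUES (?)", [(2,)])
    conn.executemany("INSERT INTO blogs_blog_authors VALUES (?)", [(3,)])
    return conn


def find_duplicates(conn):
    """Run the answer's query and return the matching person rows."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in find_duplicates(build_sample_db()):
        print(row)
```

With this sample data only the two ann-lee records come back. The bob-ray pair is skipped because only one of its two records has content, so the inner Having Count(*) > 1 sees a single row for that slug. Whether that is the intended reading of the question is worth checking before relying on the query.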
#2
1
You can still include a WHERE clause to filter the results:
SELECT
    id,
    first_name,
    last_name,
    slug,
    COUNT(slug) AS person_records
FROM
    people_person
WHERE id IN (SELECT id FROM article)
GROUP BY
    slug
HAVING
    (COUNT(slug) > 1)
ORDER BY
    last_name, first_name, id;
#3
1
You could perhaps handle it through the HAVING clause:
select Id
    , last_name
    , first_name
    , slug
    , COUNT(*) as Person_Records
from Person as p
group by Id
    , last_name
    , first_name
    , slug
having COUNT(slug) > 1
    and (
        select COUNT(*)
        from Author as a
        where a.Person_Id = p.Id
    ) > 1
    and (
        select COUNT(*)
        from Podcast_Guests as pg
        where pg.Person_Id = p.Id
    ) > 1
I omitted the remaining conditions as this is a simple sample.
I hope this helps! =)
#4
1
SELECT
    id,
    first_name,
    last_name,
    slug,
    COUNT(slug) AS person_records
FROM
    people_person
WHERE
    id IN (SELECT person_id from podcast_guests GROUP BY person_id) OR
    id IN (SELECT person_id from authors GROUP BY person_id) OR
    [....]
GROUP BY
    slug
HAVING
    (COUNT(slug) > 1)
ORDER BY
    last_name, first_name, id;
#5
0
The other SQL statements in the question and in the other answers are all incorrect. I will try to explain how to avoid the chicken-and-egg problem using a function (which makes the code much clearer):
SELECT first_name,
       last_name,
       slug,
       COUNT(slug) AS person_records,
       SUM(get_count_articles(id)) AS total_articles
FROM people_person
GROUP BY first_name,
         last_name,
         slug
HAVING COUNT(*) > 1 AND SUM(get_count_articles(id)) >= 1
ORDER BY last_name, first_name;
This relies on the following function (written in Oracle syntax; please excuse my lack of knowledge of MySQL functions):
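CREATE OR REPLACE FUNCTION get_count_articles(p_id NUMBER) RETURN NUMBER IS
    l_mag_auth   NUMBER;
    l_pod_guests NUMBER;
    l_blog_auth  NUMBER;
BEGIN
    -- Count only authorships that point at an existing article.
    -- The join columns (ma1.article_id, a1.id) are assumed; adjust to the actual schema.
    SELECT COUNT(*)
      INTO l_mag_auth
      FROM magazine_author ma1
      JOIN article a1 ON a1.id = ma1.article_id
     WHERE ma1.person_id = p_id;
    -- Podcast episodes the person appeared on as a guest.
    SELECT COUNT(*)
      INTO l_pod_guests
      FROM podcast_episode_guests pod1
     WHERE pod1.person_id = p_id;
    -- Blogs the person is an author of.
    SELECT COUNT(*)
      INTO l_blog_auth
      FROM blogs_blog_authors b1
     WHERE b1.person_id = p_id;
    RETURN l_mag_auth + l_pod_guests + l_blog_auth;
END;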
Note 1: magazine_author should be joined to article as in the function above, because there may not actually be an article.
Note 2: I have removed the ID from the original question's SELECT and GROUP BY because it forces the wrong answer (since id should be unique in the table, no record would ever be returned). The COUNT(slug) syntax may be confusing the issue here. If the output needs to show both of the duplicate rows, then you must join back to the people_person table to list the IDs for each slug.