How can I use SQL to select duplicate records along with counts of related items?

I know the title of this question is a bit confusing, so bear with me. :)

I have a (MySQL) database with a Person record. A Person also has a slug field. Unfortunately, slug fields are not unique. There are a number of duplicate records, i.e., the records have different IDs but the same first name, last name, and slug. A Person may also have 0 or more associated articles, blog entries, and podcast episodes.

If that's confusing, here's a diagram of the structure:

Database diagram: http://mipadi.cbstaff.com/images/misc/people_db.jpg

I would like to produce a list of records that match these criteria: duplicate records (i.e., same slug field) for people who also have at least 1 article, blog entry, or podcast episode.

I have a SQL query that will list all records with the same slug fields:

SELECT
 id,
 first_name,
 last_name,
 slug,
 COUNT(slug) AS person_records
FROM
 people_person
GROUP BY
 slug
HAVING
 (COUNT(slug) > 1)
ORDER BY
 last_name, first_name, id;

But this includes records for people who may not have at least 1 article, blog entry, or podcast. Can I tweak this to fit the second criterion?

Edit:

I updated the database diagram to simplify it and make it clearer what I am doing. (Note: some of the DB table names changed. I was trying to give a higher-level look at the structure before, but it was a bit unclear.)

5 Answers

#1

Select P.id, P.first_name, P.last_name, P.slug
From people_person as P
    Join    (
            Select P1.slug
            From people_person As P1
            Where Exists    (
                            Select 1
                            From magazine_author As ma1
                            Where ma1.person_id = P1.id
                            Union All
                            Select 1
                            From podcast_episode_guests As pod1
                            Where pod1.person_id = P1.Id
                            Union All
                            Select 1
                            From blogs_blog_authors As b1
                            Where b1.person_id = P1.Id
                            )
            Group By P1.slug
            Having Count(*) > 1
            ) As dup_slugs
        On dup_slugs.slug = P.slug
Order By P.last_name, P.first_name, P.id
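
To see which rows this query returns, here is a minimal sketch that runs it against toy data. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, and the sample rows and single-column link tables are invented for illustration; only the table and column names used in the query come from the answer.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people_person (id INTEGER PRIMARY KEY, first_name TEXT,
                            last_name TEXT, slug TEXT);
CREATE TABLE magazine_author (person_id INTEGER);
CREATE TABLE podcast_episode_guests (person_id INTEGER);
CREATE TABLE blogs_blog_authors (person_id INTEGER);
-- Hypothetical sample rows: two duplicated slugs and one unique slug.
INSERT INTO people_person VALUES
  (1, 'Ann', 'Lee', 'ann-lee'), (2, 'Ann', 'Lee', 'ann-lee'),
  (3, 'Bob', 'Ray', 'bob-ray'), (4, 'Bob', 'Ray', 'bob-ray'),
  (5, 'Cy',  'Orr', 'cy-orr');
INSERT INTO magazine_author VALUES (1);
INSERT INTO podcast_episode_guests VALUES (2), (3);
INSERT INTO blogs_blog_authors VALUES (5);
""")

# The query from the answer above, unchanged apart from layout.
QUERY = """
Select P.id, P.first_name, P.last_name, P.slug
From people_person As P
    Join (
        Select P1.slug
        From people_person As P1
        Where Exists (
            Select 1 From magazine_author As ma1
            Where ma1.person_id = P1.id
            Union All
            Select 1 From podcast_episode_guests As pod1
            Where pod1.person_id = P1.id
            Union All
            Select 1 From blogs_blog_authors As b1
            Where b1.person_id = P1.id
        )
        Group By P1.slug
        Having Count(*) > 1
    ) As dup_slugs
        On dup_slugs.slug = P.slug
Order By P.last_name, P.first_name, P.id
"""

rows = conn.execute(QUERY).fetchall()
ids = [row[0] for row in rows]
print(ids)  # [1, 2]
```

With this data only the ann-lee records come back. The bob-ray slug is duplicated too, but only one of its records has related content, so it does not pass Having Count(*) > 1, which counts the records per slug that have at least one related item.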

#2

You can still include a WHERE clause to filter the results:

SELECT
 id,
 first_name,
 last_name,
 slug,
 COUNT(slug) AS person_records
FROM
 people_person
WHERE id IN (SELECT id FROM article)
GROUP BY
 slug
HAVING
 (COUNT(slug) > 1)
ORDER BY
 last_name, first_name, id;

#3

You could perhaps handle it through the having clause:

select Id
        , last_name
        , first_name
        , slug
        , COUNT(*) as Person_Records
    from Person as p
    group by Id
            , last_name
            , first_name
            , slug
        having COUNT(slug) > 1
            and ( 
                select COUNT(*)
                    from Author as a
                    where a.Person_Id = p.Id
            ) > 1
            and (
                select COUNT(*)
                    from Podcast_Guests as pg
                    where pg.Person_Id = p.Id
            ) > 1

I omitted the remaining conditions as this is a simple sample.

I hope this helps! =)

#4

SELECT
 id,
 first_name,
 last_name,
 slug,
 COUNT(slug) AS person_records
FROM
 people_person
WHERE 
 id IN (SELECT person_id from podcast_guests GROUP BY person_id) OR 
 id IN (SELECT person_id from authors GROUP BY person_id) OR 
 [....]
GROUP BY
 slug
HAVING
 (COUNT(slug) > 1)
ORDER BY
 last_name, first_name, id;

#5

The other SQL statements in the question and the other answers are all incorrect. I will try to explain how to avoid the chicken-and-egg problem using a function (which makes the code much clearer):

SELECT  first_name,
        last_name, 
        slug,
        COUNT(slug) AS person_records,
        SUM(get_count_articles(id)) AS total_articles
FROM  people_person
GROUP BY first_name,
        last_name, 
        slug
HAVING  COUNT(*) > 1 AND SUM(get_count_articles(id))>=1
ORDER   BY  last_name, first_name;

Here is the function (written in Oracle syntax; please excuse my lack of knowledge of MySQL functions):

FUNCTION get_count_articles(p_id NUMBER) RETURNS NUMBER IS
  l_mag_auth NUMBER;
  l_pod_guests NUMBER;
  l_blog_auth NUMBER;
BEGIN
  SELECT COUNT(*)
  INTO l_mag_auth
  FROM magazine_author ma1, article a1
  WHERE ma1.person_id = p_id;

  SELECT COUNT(*) 
  INTO l_pod_guests
  FROM podcast_episode_guests As pod1
  WHERE pod1.person_id = p_id;

  SELECT COUNT(*)
  INTO l_blog_auth
  FROM blogs_blog_authors As b1
  WHERE b1.person_id = p_id;

  RETURN l_mag_auth+l_pod_guests+l_blog_auth;
END;

Note 1: magazine_author should be linked to article as above, because there may not actually be an article.
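
If a stored function is not an option, the same per-person count can probably be computed inline with correlated subqueries. The sketch below is one way to do that, not the answer's own method. It assumes SQLite via Python's sqlite3 as a stand-in for MySQL, uses invented sample rows, groups by slug only for brevity, and counts magazine_author rows without joining article because the article table's columns are not shown in the thread.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people_person (id INTEGER PRIMARY KEY, first_name TEXT,
                            last_name TEXT, slug TEXT);
CREATE TABLE magazine_author (person_id INTEGER);
CREATE TABLE podcast_episode_guests (person_id INTEGER);
CREATE TABLE blogs_blog_authors (person_id INTEGER);
-- Hypothetical sample rows.
INSERT INTO people_person VALUES
  (1, 'Ann', 'Lee', 'ann-lee'), (2, 'Ann', 'Lee', 'ann-lee'),
  (3, 'Bob', 'Ray', 'bob-ray'), (4, 'Bob', 'Ray', 'bob-ray'),
  (5, 'Cy',  'Orr', 'cy-orr');
INSERT INTO magazine_author VALUES (1);
INSERT INTO podcast_episode_guests VALUES (2), (3);
INSERT INTO blogs_blog_authors VALUES (5);
""")

# The inner query plays the role of get_count_articles(id): one row per
# person with the total number of related items. The outer query then
# keeps slugs that are duplicated and have at least one related item.
rows = conn.execute("""
SELECT slug,
       COUNT(*)     AS person_records,
       SUM(n_items) AS total_items
FROM (
    SELECT p.slug,
           (SELECT COUNT(*) FROM magazine_author ma
             WHERE ma.person_id = p.id)
         + (SELECT COUNT(*) FROM podcast_episode_guests pg
             WHERE pg.person_id = p.id)
         + (SELECT COUNT(*) FROM blogs_blog_authors ba
             WHERE ba.person_id = p.id) AS n_items
    FROM people_person p
) AS per_person
GROUP BY slug
HAVING COUNT(*) > 1 AND SUM(n_items) >= 1
ORDER BY slug
""").fetchall()

print(rows)  # [('ann-lee', 2, 2), ('bob-ray', 2, 1)]
```

Here bob-ray qualifies because the condition is applied to the whole group: the slug is duplicated and the group has at least one related item, even though one of its two records has none.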

Note 2: I have removed the id from the original question's SELECT and GROUP BY because it forces the wrong answer (since id should be unique in the table, no record will EVER be returned). The COUNT(slug) syntax may be confusing the issue here. If the output requires both of the duplicate rows, then you MUST re-link to the people_person table to show the list of IDs for the slug.
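
The point about grouping by id can be checked on toy data. This is a minimal sketch assuming SQLite via Python's sqlite3 as a stand-in for MySQL, with invented sample rows: grouping by the unique id never finds duplicates, grouping by slug does, and joining back to people_person lists the individual IDs.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people_person (id INTEGER PRIMARY KEY, first_name TEXT,
                            last_name TEXT, slug TEXT);
-- Hypothetical sample rows: two duplicated slugs and one unique slug.
INSERT INTO people_person VALUES
  (1, 'Ann', 'Lee', 'ann-lee'), (2, 'Ann', 'Lee', 'ann-lee'),
  (3, 'Bob', 'Ray', 'bob-ray'), (4, 'Bob', 'Ray', 'bob-ray'),
  (5, 'Cy',  'Orr', 'cy-orr');
""")

# id is unique, so every group holds exactly one row and COUNT(*) is 1.
by_id = conn.execute("""
SELECT id, slug, COUNT(*) FROM people_person
GROUP BY id, slug HAVING COUNT(*) > 1
""").fetchall()

# Grouping by slug alone finds the duplicated slugs.
by_slug = conn.execute("""
SELECT slug, COUNT(*) FROM people_person
GROUP BY slug HAVING COUNT(*) > 1 ORDER BY slug
""").fetchall()

# Re-linking to people_person recovers the individual ids per slug.
relinked = conn.execute("""
SELECT p.id, p.slug
FROM people_person AS p
JOIN (SELECT slug FROM people_person
      GROUP BY slug HAVING COUNT(*) > 1) AS d
  ON d.slug = p.slug
ORDER BY p.slug, p.id
""").fetchall()

print(by_id)     # []
print(by_slug)   # [('ann-lee', 2), ('bob-ray', 2)]
print(relinked)  # [(1, 'ann-lee'), (2, 'ann-lee'), (3, 'bob-ray'), (4, 'bob-ray')]
```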