SQL Count(*) and Group By - Finding the differences between rows

Below is a SQL query I wrote to find the total number of rows by each Product ID (proc_id):

SELECT proc_id, count(*)
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
ORDER BY proc_id;

Below is the result of the SQL query above:

proc_id count(*)
01  626
02  624
03  626
04  624
05  622
06  624
07  624
09  624

Notice that the total counts for proc_id = '01', proc_id = '03', and proc_id = '05' are different: they are not equal to 624 rows, as they are for the other proc_id values.

How do I write a SQL query to find which rows are different for proc_id = '01', proc_id = '03', and proc_id = '05' compared with the other proc_id values?

6 Solutions

#1

First you need to define the criterion that makes 624 the correct count. Is it the average count(*)? Is it the count(*) that occurs most often? Is it your favorite count(*)?

Then you can use the HAVING clause to separate the ones that don't match your criteria:

SELECT proc_id, count(*)
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
HAVING count(*) <> 624
ORDER BY proc_id;

or:

SELECT proc_id, count(*)
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
HAVING count(*) <> (
  <insert here a subquery that produces the magic '624'>
 )
ORDER BY proc_id;
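
As one possible way to fill in that subquery, here is a minimal sketch that assumes the "correct" count is the one that occurs most often among the proc_id groups. The names per_id, freq, cnt, and n are made up for illustration, and ties are broken by taking the smallest count, which is an arbitrary choice:

```sql
-- Sketch only: treats the most frequent per-proc_id count as the expected one.
WITH per_id AS (
  -- one row per proc_id with its row count
  SELECT proc_id, COUNT(*) AS cnt
  FROM proc
  WHERE grouping_primary = 'SLB'
  AND   eff_date = '01-JUL-09'
  GROUP BY proc_id
),
freq AS (
  -- how many proc_id values share each count
  SELECT cnt, COUNT(*) AS n
  FROM per_id
  GROUP BY cnt
)
SELECT p.proc_id, p.cnt
FROM per_id p
WHERE p.cnt <> (SELECT MIN(f.cnt)   -- MIN breaks ties between equally common counts
                FROM freq f
                WHERE f.n = (SELECT MAX(n) FROM freq))
ORDER BY p.proc_id;
```

With the counts shown in the question, 624 occurs five times, so this would list proc_id 01, 03, and 05.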

#2

If you know 624 is the magic number:

SELECT proc_id, count(*)
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
HAVING count(*) <> 624
ORDER BY proc_id;

#3

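Try this:

SELECT proc_id, count(*)
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
-- compare against the count of a reference proc_id ('02' is assumed to be a "good" one);
-- the subquery needs the same filters as the outer query to be comparable
HAVING count(*) <> (SELECT count(*)
                    FROM proc z
                    WHERE z.grouping_primary = 'SLB'
                    AND   z.eff_date = '01-JUL-09'
                    AND   z.proc_id = '02')
ORDER BY proc_id;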

#4

You can't do this. Some proc_id values simply have fewer rows. In other words, the rows that keep such a proc_id from reaching a count of 624 are rows that DO NOT EXIST. How can any query show those rows?

For the proc_id values that have too many rows, IF (and this is a big if) all the rows in the 624-row sets of the other proc_id values share some attribute with a 624-row subset of the oversized sets, then you might be able to identify the "extra" rows. But there is no way to identify missing rows; all you can do is identify which proc_id values have too many rows or too few.
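
A rough sketch of that last point, assuming 624 really is the expected count (the aliases row_count, diff, and status are made up for illustration):

```sql
-- Sketch only: 624 is hard-coded as the assumed expected count.
SELECT proc_id,
       COUNT(*)       AS row_count,
       COUNT(*) - 624 AS diff,      -- positive = surplus rows, negative = shortfall
       CASE WHEN COUNT(*) > 624 THEN 'too many'
            ELSE 'too few'
       END            AS status
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
GROUP BY proc_id
HAVING COUNT(*) <> 624
ORDER BY proc_id;
```

This only reports which groups are off and by how much; it says nothing about which individual rows are extra or missing.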

#5

If I understand your question correctly (and I read it differently from the other posted answers), you want the rows that make proc_id 01 different. If that's the case, you need to join on all the columns that should be the same and look for the differences. So, to compare 01 with 02:

 SELECT [01].*, [02].* -- show both sides; one of them is all NULL for an unmatched row
 FROM (
    SELECT * FROM proc
    WHERE grouping_primary = 'SLB'
    AND eff_date = '01-JUL-09'
    AND proc_id = '01'
 ) as [01]
 FULL JOIN (
    SELECT * FROM proc
    WHERE grouping_primary = 'SLB'
    AND eff_date = '01-JUL-09'
    AND proc_id = '02'
 ) as [02] ON
    [01].col1 = [02].col1
    AND [01].col2 = [02].col2
    AND [01].col3 = [02].col3
    /* etc...just don't include proc_id */
 WHERE
    [01].proc_id IS NULL --no match in [02]
    OR [02].proc_id IS NULL --no match in [01]

I'm pretty sure MS SQL Server has a row hash function that may make this easier if you have a bunch of columns, but I can't think of its name.
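
One candidate in SQL Server is BINARY_CHECKSUM (CHECKSUM and HASHBYTES are related built-ins). The following is only a sketch under that assumption: col1, col2, and col3 are the same placeholder column names as in the join above, and because different rows can produce the same checksum, a match here does not prove the rows are identical.

```sql
-- SQL Server only. Sketch: rows of proc_id '01' whose checksum over the
-- compared columns does not appear among the rows of proc_id '02'.
SELECT a.*
FROM proc AS a
WHERE a.grouping_primary = 'SLB'
AND   a.eff_date = '01-JUL-09'
AND   a.proc_id = '01'
AND   BINARY_CHECKSUM(a.col1, a.col2, a.col3) NOT IN (
        -- leave proc_id out of the checksum, as in the join
        SELECT BINARY_CHECKSUM(b.col1, b.col2, b.col3)
        FROM proc AS b
        WHERE b.grouping_primary = 'SLB'
        AND   b.eff_date = '01-JUL-09'
        AND   b.proc_id = '02');
```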

#6

Well, to find the extra rows you would use a NOT IN clause. To find the missing rows you would need to reverse the logic. This naturally assumes that all 624 rows are the same from proc_id to proc_id.

SELECT proc_id, varying_column
FROM proc
WHERE grouping_primary = 'SLB'
AND   eff_date = '01-JUL-09'
AND   varying_column NOT IN (SELECT b.varying_column
                             FROM proc b
                             WHERE b.grouping_primary = 'SLB'
                             AND   b.eff_date = '01-JUL-09'
                             -- reference set: the lowest proc_id that has exactly 624 rows
                             AND   b.proc_id = (SELECT MIN(c.proc_id)
                                                FROM (SELECT a.proc_id
                                                      FROM proc a
                                                      WHERE a.grouping_primary = 'SLB'
                                                      AND   a.eff_date = '01-JUL-09'
                                                      GROUP BY a.proc_id
                                                      -- aggregate filters belong in HAVING, not WHERE
                                                      HAVING COUNT(*) = 624) c))
ORDER BY proc_id, varying_column;
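
The reversed logic for missing rows might look like the sketch below. It assumes proc_id '02' is a complete reference set and checks which of its varying_column values have no counterpart under proc_id '05'; both proc_id choices and the aliases r and t are illustrative only.

```sql
-- Sketch only: values present for the reference proc_id ('02', assumed complete)
-- that are absent for the short proc_id ('05').
SELECT r.varying_column
FROM proc r
WHERE r.grouping_primary = 'SLB'
AND   r.eff_date = '01-JUL-09'
AND   r.proc_id = '02'
AND   NOT EXISTS (SELECT 1
                  FROM proc t
                  WHERE t.grouping_primary = 'SLB'
                  AND   t.eff_date = '01-JUL-09'
                  AND   t.proc_id = '05'
                  AND   t.varying_column = r.varying_column)
ORDER BY r.varying_column;
```

NOT EXISTS is used here rather than NOT IN so that a NULL in varying_column does not silently empty the result.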
