Multiple grouped counts from multiple tables, one of which needs a join

Time: 2022-03-17 11:10:33

I have two tables that hold image paths. The count of approved images should equal the count of published images, but I have evidence that they do not. One of the tables needs a join so that it can be grouped by the same criteria as the other. I only want to return rows where the two counts don't match. Below are the two queries I have that return the data individually. Even if I tried to use Excel as a crutch, the query would return millions of rows.


Query 1

select product_id, count(*)
from published p
join published_set ps on ps.id = p.media_set_id
group by ps.product_id

Query 2

select product_id, count(*)
from media
where status = 'APPROVED'
group by product_id

I did use Excel to pull one category of products that I suspected was the worst, and found 8,000 mismatches out of 12,000 products. I want to compare this with other areas of the website, which I believe have few or no mismatches. My suspicion is that some system is inserting data into the table incorrectly for a specific category.
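To compare categories without going through Excel, one option is to compute the mismatch count per category directly in SQL. This is a hypothetical sketch: it assumes a `products` table with a `category` column (not shown in the question), uses a stripped-down schema inferred from the two queries above, and runs against made-up rows in an in-memory SQLite database via Python's standard `sqlite3` module. The `LEFT JOIN`s plus `COALESCE(..., 0)` make a product with no published or no approved images count as a mismatch instead of disappearing.

```python
import sqlite3

# Hypothetical minimal schema inferred from the question's queries.
# products.category is an assumption; the real tables will differ.
SCHEMA = """
CREATE TABLE products      (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE published_set (id INTEGER PRIMARY KEY, product_id INTEGER);
CREATE TABLE published     (id INTEGER PRIMARY KEY, media_set_id INTEGER);
CREATE TABLE media         (id INTEGER PRIMARY KEY, product_id INTEGER, status TEXT);
"""

# Per category: number of products, and how many of them have a
# published count that differs from their approved count.
MISMATCHES_BY_CATEGORY = """
WITH pub AS (
    SELECT ps.product_id, COUNT(*) AS published_count
    FROM published p
    JOIN published_set ps ON ps.id = p.media_set_id
    GROUP BY ps.product_id
),
appr AS (
    SELECT product_id, COUNT(*) AS approved_count
    FROM media
    WHERE status = 'APPROVED'
    GROUP BY product_id
)
SELECT pr.category,
       COUNT(*) AS products,
       SUM(CASE WHEN COALESCE(pub.published_count, 0)
                     <> COALESCE(appr.approved_count, 0)
                THEN 1 ELSE 0 END) AS mismatches
FROM products pr
LEFT JOIN pub  ON pub.product_id  = pr.product_id
LEFT JOIN appr ON appr.product_id = pr.product_id
GROUP BY pr.category
ORDER BY pr.category
"""


def build_sample_db():
    """In-memory database with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     [(1, "shoes"), (2, "shoes"), (3, "hats")])
    conn.executemany("INSERT INTO published_set VALUES (?, ?)",
                     [(10, 1), (20, 2), (30, 3)])
    # Products 1 and 2 each have 2 published images; product 3 has 1.
    conn.executemany("INSERT INTO published VALUES (?, ?)",
                     [(1, 10), (2, 10), (3, 20), (4, 20), (5, 30)])
    # Product 2 has only 1 approved image, so it is the only mismatch.
    conn.executemany("INSERT INTO media VALUES (?, ?, ?)",
                     [(1, 1, "APPROVED"), (2, 1, "APPROVED"), (3, 2, "APPROVED"),
                      (4, 2, "REJECTED"), (5, 3, "APPROVED")])
    return conn


if __name__ == "__main__":
    for category, products, mismatches in build_sample_db().execute(MISMATCHES_BY_CATEGORY):
        print(f"{category}: {mismatches} of {products} products mismatched")
```

On real data, the ratio of `mismatches` to `products` per category is what would show whether the suspect category stands out from the rest of the site.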


1 Answer

#1

Your question leaves a few things to clarify. For example, when you say there are 8,000 mismatches, how are you determining that? Is it just the difference between the totals from the grouping? Also, is your concern that there are published images that have not been approved, or the other way around?
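The first question matters because comparing grand totals can hide offsetting mismatches. A minimal sketch with made-up data in an in-memory SQLite database (Python's standard `sqlite3` module), assuming a stripped-down schema inferred from the queries in the question: two products that are off in opposite directions produce equal totals, while a per-product net difference exposes both, and its sign answers the "which way around" question.

```python
import sqlite3


def build_sample_db():
    """Made-up data: product 1 has 2 published / 1 approved image,
    product 2 has 1 published / 2 approved images."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE published_set (id INTEGER PRIMARY KEY, product_id INTEGER);
        CREATE TABLE published     (id INTEGER PRIMARY KEY, media_set_id INTEGER);
        CREATE TABLE media         (id INTEGER PRIMARY KEY, product_id INTEGER, status TEXT);
    """)
    conn.executemany("INSERT INTO published_set VALUES (?, ?)", [(10, 1), (20, 2)])
    conn.executemany("INSERT INTO published VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)])
    conn.executemany("INSERT INTO media VALUES (?, ?, ?)",
                     [(1, 1, "APPROVED"), (2, 2, "APPROVED"), (3, 2, "APPROVED")])
    return conn


def grand_totals(conn):
    """(total published, total approved) across all products."""
    published = conn.execute(
        "SELECT COUNT(*) FROM published p JOIN published_set ps ON ps.id = p.media_set_id"
    ).fetchone()[0]
    approved = conn.execute(
        "SELECT COUNT(*) FROM media WHERE status = 'APPROVED'"
    ).fetchone()[0]
    return published, approved


def net_difference_per_product(conn):
    """(product_id, published - approved) for every product where they differ.
    Positive: more published than approved. Negative: the other way around."""
    return conn.execute("""
        SELECT product_id, SUM(delta) AS published_minus_approved
        FROM (
            SELECT ps.product_id, 1 AS delta
            FROM published p
            JOIN published_set ps ON ps.id = p.media_set_id
            UNION ALL
            SELECT product_id, -1
            FROM media
            WHERE status = 'APPROVED'
        ) AS combined
        GROUP BY product_id
        HAVING SUM(delta) <> 0
        ORDER BY product_id
    """).fetchall()


if __name__ == "__main__":
    conn = build_sample_db()
    print(grand_totals(conn))                # (3, 3): totals agree
    print(net_difference_per_product(conn))  # [(1, 1), (2, -1)]: both products are off
```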


Listing the columns of each table would also help, since I don't know what information you need to diagnose the issue.


Edit: The following query lists the counts for every product_id where the two counts differ. Nothing too hard here: you really just need to join the two queries you already have on product_id, then select the rows where the counts are not equal. Let me know if this doesn't give you what you're after.


select Published.product_id, Published_Count, Media_Count
from (
    -- published images per product
    select ps.product_id, count(*) as Published_Count
    from published p
    join published_set ps on ps.id = p.media_set_id
    group by ps.product_id
) Published
join (
    -- approved images per product
    select product_id, count(*) as Media_Count
    from media
    where [status] = 'APPROVED'
    group by product_id
) Media
    on Published.product_id = Media.product_id
where Published_Count <> Media_Count
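One thing to be aware of with this query: because it uses an inner join, a product that has published images but no approved ones (or approved images but nothing published) is missing from one subquery and gets silently dropped, even though that is exactly the kind of mismatch you're looking for. Below is a small Python harness (standard-library `sqlite3`, in-memory database, hypothetical minimal schema and made-up rows) that runs the query above and a variant that stacks both sources with `UNION ALL` and compares per-product sums, so one-sided products are reported too. SQLite accepts SQL Server-style `[status]` bracket quoting, so the original query runs unchanged; the variant is plain standard SQL, but treat it as a sketch to check against your actual DBMS.

```python
import sqlite3

# The answer's query, same logic as above.
ANSWER_QUERY = """
select Published.product_id, Published_Count, Media_Count
from (
    select product_id, count(*) as Published_Count
    from published p
    join published_set ps on ps.id = p.media_set_id
    group by ps.product_id
) Published
join (
    select product_id, count(*) as Media_Count
    from media
    where [status] = 'APPROVED'
    group by product_id
) Media
on Published.product_id = Media.product_id
where Published_Count <> Media_Count
"""

# Variant: tag each row by source, stack them with UNION ALL, and compare the
# per-product sums, so products present on only one side still show up.
ALL_MISMATCHES_QUERY = """
SELECT product_id,
       SUM(is_published) AS published_count,
       SUM(is_approved)  AS approved_count
FROM (
    SELECT ps.product_id, 1 AS is_published, 0 AS is_approved
    FROM published p
    JOIN published_set ps ON ps.id = p.media_set_id
    UNION ALL
    SELECT product_id, 0, 1
    FROM media
    WHERE status = 'APPROVED'
) AS combined
GROUP BY product_id
HAVING SUM(is_published) <> SUM(is_approved)
ORDER BY product_id
"""


def build_sample_db():
    """In-memory DB with a hypothetical minimal schema and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE published_set (id INTEGER PRIMARY KEY, product_id INTEGER);
        CREATE TABLE published     (id INTEGER PRIMARY KEY, media_set_id INTEGER);
        CREATE TABLE media         (id INTEGER PRIMARY KEY, product_id INTEGER, status TEXT);
    """)
    # Product 1: 2 published / 2 approved (match)
    # Product 2: 2 published / 1 approved (mismatch, present on both sides)
    # Product 3: 1 published / 0 approved (mismatch, no approved rows)
    # Product 4: 0 published / 1 approved (mismatch, no published rows)
    conn.executemany("INSERT INTO published_set VALUES (?, ?)",
                     [(10, 1), (20, 2), (30, 3)])
    conn.executemany("INSERT INTO published VALUES (?, ?)",
                     [(1, 10), (2, 10), (3, 20), (4, 20), (5, 30)])
    conn.executemany("INSERT INTO media VALUES (?, ?, ?)",
                     [(1, 1, "APPROVED"), (2, 1, "APPROVED"), (3, 2, "APPROVED"),
                      (4, 2, "PENDING"), (5, 4, "APPROVED")])
    return conn


if __name__ == "__main__":
    conn = build_sample_db()
    print(conn.execute(ANSWER_QUERY).fetchall())          # [(2, 2, 1)]
    print(conn.execute(ALL_MISMATCHES_QUERY).fetchall())  # [(2, 2, 1), (3, 1, 0), (4, 0, 1)]
```

On these sample rows the inner-join query reports only product 2, while the variant also reports product 3 (published but never approved) and product 4 (approved but never published).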
