Left outer join in Postgres is not returning null values

Time: 2022-09-25 12:24:13

A download consists of a download time, a download-time ID, and a buno ID. A fault consists of a fault code, a download-time ID, a status, and a type. A download can have many faults, and the two are joined on the download-time ID.

Given a set of fault-codes, the results must contain each fault-code with a corresponding fault-count. If a fault-code is not found in the download, the fault-code must be returned with a fault-count of zero.

The problem seems to require an OUTER JOIN, but I haven't seen this work as expected on Postgres: it does not seem to return the rows with nulls from the LEFT table.

The query is below, with some details left out for brevity:

SELECT  f.faultcode, f.downloadtimeid, d.downloadtime, count(*) as faultcount 
FROM    download_time d 
LEFT OUTER JOIN fs_fault f ON f.downloadtimeid = d.id
    AND f.faultcode IN (1000,1100)
    AND f.statusid IN(2, 4)
WHERE (d.downloadtime BETWEEN '04/11/2011' AND '05/01/2012')
    AND d.bunoid = 166501
GROUP BY d.bunoid, f.downloadtimeid, d.downloadtime, f.faultcode

The following day, I edited the question to show the answer. All answers were close and each helped in some way, but JayC's answer was closest. Here is the final SQL; the only change is that the fault-code IN condition moved into the WHERE clause:

SELECT  f.faultcode, f.downloadtimeid, d.downloadtime, count(*) as faultcount
FROM    download_time d  
RIGHT OUTER JOIN fs_fault f ON f.downloadtimeid = d.id
        AND f.statusid IN(2, 4)
        AND d.downloadtime BETWEEN '04/11/2011' AND '05/01/2012'
        AND d.bunoid = 166501
WHERE f.faultcode IN (1000,1100)
GROUP BY d.bunoid, f.downloadtimeid, d.downloadtime, f.faultcode

Thanks, all, for your assistance! Love this site!

4 solutions

#1


20  

I'm giving my answer because I have significant doubts about the other answers. You have to be careful about filter requirements. Remember, the WHERE clause runs after your joins. So if any filter requirement in the WHERE clause refers to the non-outer-joined table, you have (in many circumstances) nullified your outer join. So, taking your SQL, the simplest solution seems to be to either use the proper join or swap the table names appropriately, and then move the filter conditions out of the WHERE clause and into the join clause.

SELECT  f.faultcode, f.downloadtimeid, d.downloadtime, count(*) as faultcount 
FROM    download_time d 
RIGHT OUTER JOIN fs_fault f ON 
    f.downloadtimeid = d.id
    AND f.faultcode IN (1000,1100)
    AND f.statusid IN(2, 4)
    AND d.downloadtime BETWEEN '04/11/2011' AND '05/01/2012'
    AND d.bunoid = 166501
GROUP BY d.bunoid, f.downloadtimeid, d.downloadtime, f.faultcode

Another way which I believe should be equivalent is

SELECT  f.faultcode, f.downloadtimeid, d.downloadtime, count(*) as faultcount 
FROM    download_time d 
RIGHT OUTER JOIN fs_fault f ON 
    f.downloadtimeid = d.id
    AND d.downloadtime BETWEEN '04/11/2011' AND '05/01/2012'
    AND d.bunoid = 166501
WHERE
    f.faultcode IN (1000,1100)
    AND f.statusid IN(2, 4)
GROUP BY d.bunoid, f.downloadtimeid, d.downloadtime, f.faultcode

This works because it doesn't strictly matter where the filter requirements on fs_fault are placed (and your SQL engine is going to rearrange all of that anyway).

Edit: Here's a SQLFiddle demonstrating filtering on the join clause vs. the where clause.
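
For readers without the fiddle at hand, here is a minimal sketch of the same ON-versus-WHERE effect. It is not the original fiddle: it assumes an in-memory SQLite database (via Python's standard sqlite3 module) as a stand-in for Postgres, a cut-down version of the thread's tables, and made-up rows. It uses LEFT OUTER JOIN with download_time as the preserved table, so the filter in question sits on the outer-joined fs_fault side.

```python
import sqlite3

# Hypothetical miniature of the thread's schema; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE download_time (id INTEGER PRIMARY KEY, bunoid INTEGER);
    CREATE TABLE fs_fault (faultcode INTEGER, downloadtimeid INTEGER, statusid INTEGER);
    INSERT INTO download_time VALUES (1, 166501), (2, 166501), (3, 166501);
    -- Download 1 has a wanted fault, download 2 has an unwanted one, download 3 has none.
    INSERT INTO fs_fault VALUES (1000, 1, 2), (9999, 2, 2);
""")

# Filter on the outer-joined table inside ON: every download survives,
# and downloads without a wanted fault get NULL in the fault columns.
on_rows = conn.execute("""
    SELECT d.id, f.faultcode
    FROM download_time d
    LEFT OUTER JOIN fs_fault f
           ON f.downloadtimeid = d.id
          AND f.faultcode IN (1000, 1100)
    ORDER BY d.id
""").fetchall()

# Same filter in WHERE: it is applied after the join and rejects every row
# whose faultcode is NULL or not in the list, so downloads 2 and 3 disappear.
where_rows = conn.execute("""
    SELECT d.id, f.faultcode
    FROM download_time d
    LEFT OUTER JOIN fs_fault f
           ON f.downloadtimeid = d.id
    WHERE f.faultcode IN (1000, 1100)
    ORDER BY d.id
""").fetchall()

print(on_rows)     # [(1, 1000), (2, None), (3, None)]
print(where_rows)  # [(1, 1000)]
```

The second query returns exactly what an inner join would, which is the "nullified outer join" described above.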

#2


2  

This will require a RIGHT OUTER JOIN. If fs_fault were a table with all fault codes, the right outer join would include all values from the right table, with NULLs where there is no entry in the left table (I'm not sure if this will work with GROUP BY, though...).

In your case, fs_fault seems to contain all faults for a download. Might this be the cause of the unexpected behavior?

#3


2  

If you want counts by faultcode, this seems like the simplest solution:

WITH fc(faultcode) AS (VALUES (1000), (1100))
SELECT fc.faultcode, count(f.downloadtimeid) as faultcount 
  FROM fc
  LEFT JOIN (fs_fault f
             JOIN download_time d ON d.id = f.downloadtimeid
                                 AND d.bunoid = 166501
                                 AND d.downloadtime::date BETWEEN date '2011-04-11'
                                                              AND date '2011-05-01')
         ON f.faultcode = fc.faultcode
        AND f.statusid IN(2, 4)
  GROUP BY fc.faultcode
  ORDER BY fc.faultcode

Note that I kept your conditions, where faults are not counted if they don't have the right statusid or bunoid. I was a bit afraid that the date selection might not have been doing what you thought, so I suggested an alternative. Even that might not do what you want if you're using TIMESTAMP WITHOUT TIME ZONE, but that's another story. I also added an ORDER BY clause, since you probably don't want the results in inconsistent order; without that clause it may or may not be in GROUP BY sequence, and that might change without warning.
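
One detail worth spelling out is why the count is taken over a column from the joined side rather than with count(*). The sketch below illustrates the difference; it assumes an in-memory SQLite database (Python's standard sqlite3 module) as a stand-in for Postgres, a cut-down fs_fault table, and invented rows, so treat it as an illustration rather than the query to run in production.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fs_fault (faultcode INTEGER, downloadtimeid INTEGER, statusid INTEGER);
    -- Made-up data: code 1000 occurs twice, code 1100 never occurs.
    INSERT INTO fs_fault VALUES (1000, 1, 2), (1000, 2, 4);
""")

counts = conn.execute("""
    WITH fc(faultcode) AS (VALUES (1000), (1100))
    SELECT fc.faultcode,
           count(*)                AS star_count,  -- also counts the NULL-extended row
           count(f.downloadtimeid) AS fault_count  -- counts only rows with a real match
    FROM fc
    LEFT JOIN fs_fault f
           ON f.faultcode = fc.faultcode
          AND f.statusid IN (2, 4)
    GROUP BY fc.faultcode
    ORDER BY fc.faultcode
""").fetchall()

print(counts)  # [(1000, 2, 2), (1100, 1, 0)]
```

For the code with no faults, count(*) reports 1 because the outer join still produces one row for it, while counting a column from fs_fault gives the zero the question asks for.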

#4


1  

The left outer join selects everything in the first table plus matching rows in the second table. The first table seems to consist of download attempts. So, your result from the "from" includes all download attempts.

But, it does not necessarily contain all your fault codes. What is happening is that you have no faults for one or more codes that meet the criteria.

You need a table that contains all the fault codes, in order for this to work. Here I just create a list of the fault codes as the first table. I think the following query does this:

SELECT thefaults.faultcode, f.downloadtimeid, d.downloadtime, count(*) as faultcount
FROM  (select 1000 as faultcode union all select 1100
      ) thefaults join
      fs_fault f
      on f.faultcode = thefaults.faultcode and
         f.statusid in (2, 4) left outer join
      download_time d
      ON f.downloadtimeid = d.id
WHERE (d.downloadtime BETWEEN '04/11/2011' AND '05/01/2012') AND
      d.bunoid = 166501
GROUP BY d.bunoid, f.downloadtimeid, d.downloadtime, thefaults.faultcode

I admit: I am using SQL Server syntax to create "thefaults".
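
For what it's worth, a FROM-less SELECT combined with UNION ALL is accepted by Postgres as well, and a VALUES list is a common alternative there. The sketch below compares the two forms; it assumes an in-memory SQLite database (Python's standard sqlite3 module) as a stand-in for Postgres and reuses the thefaults name from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Derived table built with UNION ALL, as in the query above.
union_rows = conn.execute("""
    SELECT faultcode
    FROM (SELECT 1000 AS faultcode UNION ALL SELECT 1100) thefaults
    ORDER BY faultcode
""").fetchall()

# The same list written as a VALUES list: one parenthesised group per row.
values_rows = conn.execute("""
    WITH thefaults(faultcode) AS (VALUES (1000), (1100))
    SELECT faultcode FROM thefaults ORDER BY faultcode
""").fetchall()

# Pitfall: a single group is ONE row with two columns, not two rows.
one_row = conn.execute("VALUES (1000, 1100)").fetchall()

print(union_rows)   # [(1000,), (1100,)]
print(values_rows)  # [(1000,), (1100,)]
print(one_row)      # [(1000, 1100)]
```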
