How to write an SQL condition efficiently?

Time: 2021-09-03 16:58:06

I have the following very large (~ 10e8 records) table (table):

+--------------------------------+
|      id      order       value |
+--------------------------------+
|      PK       int         int  |
|       1        1           1   |
|       2        2           5   |
|       3        2               |
|       4        2           0   |
+--------------------------------+

As you can see, the value column can contain only non-negative integers or null. Now I need to write a query returning the orders that have no record with value > 0 (i.e. order = 2 does not satisfy the condition, because it has a record with value = 5).

The inverse query is simple:

SELECT order
FROM table
WHERE value > 0

The performance of the query is satisfactory for me.

But we can't quite write

SELECT order
FROM table
WHERE value = 0

because another record with the same order may have value > 0. The only way I have found to write that query is:

SELECT order
FROM table
GROUP BY order
HAVING SUM(COALESCE(value, 0)) = 0

But the query is very slow because it has to compute a sum over a very large amount of data.
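
To make the difference between the two queries concrete, here is a minimal sketch that runs both on the sample rows using Python's built-in sqlite3 module. SQLite is only a stand-in for whatever database is actually in use, the table name records is hypothetical, and the two rows for order 3 are an assumption added so that at least one order qualifies. The column name is quoted because order is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "records" is a hypothetical stand-in for the question's table.
conn.execute(
    'CREATE TABLE records (id INTEGER PRIMARY KEY, "order" INTEGER, value INTEGER)'
)
conn.executemany(
    'INSERT INTO records (id, "order", value) VALUES (?, ?, ?)',
    [
        (1, 1, 1), (2, 2, 5), (3, 2, None), (4, 2, 0),  # rows from the question
        (5, 3, 0), (6, 3, None),                        # assumed extra rows
    ],
)

# Filtering rows on value = 0 still reports order 2, which has a row with value = 5.
naive = [row[0] for row in conn.execute(
    'SELECT "order" FROM records WHERE value = 0 ORDER BY "order"'
)]

# Grouping looks at every row of an order before deciding.
grouped = [row[0] for row in conn.execute(
    'SELECT "order" FROM records GROUP BY "order" '
    'HAVING SUM(COALESCE(value, 0)) = 0 ORDER BY "order"'
)]

print(naive)    # [2, 3]
print(grouped)  # [3]
```

The grouped form is correct, but it has to aggregate every row of every order, which is where the cost comes from on a table of this size.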

Is there a way to write the query more efficiently?

1 solution

#1


7  

It might be faster to use not exists:

select o.*
from orders o
where not exists (select 1
                  from table t
                  where t.order = o.order and t.value > 0
                 );

This assumes that you have a table with just the orders (called orders in the query). Also, it will work best with an index on table(order, value).
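
As a rough illustration of this approach, the sketch below builds both tables and the suggested index in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for the real database, the names records and records_order_value are hypothetical, and the rows for order 3 are assumed sample data that do not appear in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "orders" holds one row per order, as the answer assumes.
    CREATE TABLE orders ("order" INTEGER PRIMARY KEY);
    -- "records" is a hypothetical stand-in for the large table.
    CREATE TABLE records (id INTEGER PRIMARY KEY, "order" INTEGER, value INTEGER);
    -- The composite index suggested above.
    CREATE INDEX records_order_value ON records ("order", value);

    INSERT INTO orders ("order") VALUES (1), (2), (3);
    INSERT INTO records (id, "order", value) VALUES
        (1, 1, 1), (2, 2, 5), (3, 2, NULL), (4, 2, 0),
        (5, 3, 0), (6, 3, NULL);
""")

query = """
    SELECT o."order"
    FROM orders o
    WHERE NOT EXISTS (SELECT 1
                      FROM records r
                      WHERE r."order" = o."order" AND r.value > 0)
    ORDER BY o."order"
"""
result = [row[0] for row in conn.execute(query)]
print(result)  # [3]

# Inspect how SQLite plans the subquery; the exact text varies by version.
for plan_row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(plan_row)
```

The subquery can stop at the first row with value > 0 for each order, so with the index in place it should not need to read every row of an order the way the aggregate does.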

I also wonder if the following query would have acceptable performance with an index on table(order, value desc):

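select t.*
from (select distinct on (order) t.*
      from table t
      -- desc sorts nulls first by default in PostgreSQL, so request nulls last;
      -- the index needs the same ordering: table(order, value desc nulls last)
      order by order, value desc nulls last
     ) t
-- a null here means every value for that order is null
where coalesce(value, 0) = 0;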

The distinct on should use the index for the sorting, just taking the first row encountered. The outer where would then filter these, but the two scans would probably be pretty fast.
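
The idea behind the distinct on approach can be sketched in plain Python, without a database: walk the rows in index order, keep the first row of each order, then filter. This is only a model of the technique, not how PostgreSQL executes it. The sample rows for order 3 are assumed, None models null, and the sort key assumes nulls sort last within an order (PostgreSQL puts nulls first under desc unless nulls last is requested).

```python
from itertools import groupby

# (order, value) pairs; None models NULL. The order-3 rows are assumed sample data.
rows = [(1, 1), (2, 5), (2, None), (2, 0), (3, 0), (3, None)]


def index_key(row):
    """Sort by order ascending, then value descending with NULLs last."""
    order, value = row
    return (order, value is None, -(value or 0))


# Model of scanning an index on (order, value desc nulls last).
index_scan = sorted(rows, key=index_key)

# DISTINCT ON (order): keep only the first row seen for each order,
# which is the row with the largest non-null value.
first_per_order = [
    next(group) for _, group in groupby(index_scan, key=lambda r: r[0])
]

# Outer filter: the largest value is 0, or every value was NULL.
result = [order for order, value in first_per_order if (value or 0) == 0]

print(first_per_order)  # [(1, 1), (2, 5), (3, 0)]
print(result)           # [3]
```

If nulls sorted first instead, the first row of order 2 would be the null one, and the largest value of that order would never be examined.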
