EXCEPT executes faster than a JOIN when the table columns are the same

Date: 2021-10-19 04:09:53

To find all the changes between two databases, I am left joining the tables on the pk and using a date_modified field to choose the latest record. Will using EXCEPT improve performance, since the tables have the same schema? I would like to rewrite the query with EXCEPT, but I'm not sure whether the implementation of EXCEPT would outperform a JOIN in every case. Hopefully someone has a more technical explanation of when to use EXCEPT.
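
For concreteness, here is a minimal sketch of the two approaches being compared. The table names (items_old, items_new) and the val column are hypothetical, and SQLite (via Python's sqlite3) is used only as a stand-in so the snippet runs anywhere; it says nothing about performance on the real databases.

```python
import sqlite3

# Hypothetical tables with an identical schema (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items_old (pk INTEGER PRIMARY KEY, val TEXT, date_modified TEXT NOT NULL);
CREATE TABLE items_new (pk INTEGER PRIMARY KEY, val TEXT, date_modified TEXT NOT NULL);
INSERT INTO items_old VALUES (1, 'a', '2021-01-01'), (2, 'b', '2021-01-01'), (3, 'c', '2021-01-01');
INSERT INTO items_new VALUES (1, 'a', '2021-01-01'), (2, 'B', '2021-02-01'), (4, 'd', '2021-02-01');
""")

# JOIN form: rows in items_new that are new, or whose date_modified is later.
join_rows = conn.execute("""
    SELECT n.pk, n.val, n.date_modified
      FROM items_new AS n
      LEFT JOIN items_old AS o ON o.pk = n.pk
     WHERE o.pk IS NULL OR n.date_modified > o.date_modified
     ORDER BY n.pk
""").fetchall()

# EXCEPT form: rows in items_new that have no identical row in items_old.
except_rows = conn.execute("""
    SELECT pk, val, date_modified FROM items_new
    EXCEPT
    SELECT pk, val, date_modified FROM items_old
    ORDER BY pk
""").fetchall()

print(join_rows)
print(except_rows)
```

The two forms agree here only because date_modified changes whenever a row changes. EXCEPT compares every selected column, so it would also report a row whose val changed without a newer date_modified, while the JOIN form would not.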

2 answers

#1


13  

There is no way anyone can tell you that EXCEPT will always or never out-perform an equivalent OUTER JOIN. The optimizer will choose an appropriate execution plan regardless of how you write your intent.

That said, here is my guideline:

Use EXCEPT when at least one of the following is true:

  1. The query is more readable (this will almost always be true).
  2. Performance is improved.

And BOTH of the following are true:

  1. The query produces semantically identical results, and you can demonstrate this through sufficient regression testing, including all edge cases.
  2. Performance is not degraded (again, in all edge cases, as well as environmental changes such as clearing buffer pool, updating statistics, clearing plan cache, and restarting the service).
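
As a rough illustration of the regression-testing condition, the sketch below compares two queries as multisets of rows, so that ordering is ignored but duplicate counts still matter. The helper name same_results and the tables a and b are hypothetical, and SQLite is only a convenient stand-in; a real check would run against your own engine and cover far more cases.

```python
import sqlite3
from collections import Counter

def same_results(conn, sql_a, sql_b):
    """Return True when both queries yield the same multiset of rows (order ignored)."""
    rows_a = Counter(conn.execute(sql_a).fetchall())
    rows_b = Counter(conn.execute(sql_b).fetchall())
    return rows_a == rows_b

# Hypothetical test fixture: two tables with the same columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, v TEXT NOT NULL);
CREATE TABLE b (id INTEGER PRIMARY KEY, v TEXT NOT NULL);
INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
INSERT INTO b VALUES (1, 'x'), (2, 'changed');
""")

except_sql = "SELECT id, v FROM a EXCEPT SELECT id, v FROM b"
not_exists_sql = """
    SELECT id, v FROM a
     WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id AND b.v = a.v)
"""

equivalent = same_results(conn, except_sql, not_exists_sql)
print(equivalent)
```

A check like this only covers the semantic half of the guideline; the performance half still needs timings on the real server under the conditions listed above.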

It is important to note that writing an equivalent EXCEPT query can become a challenge as the JOIN becomes more complex and/or you rely on duplicates in some of the columns but not others. Writing a NOT EXISTS equivalent, while slightly less readable than EXCEPT, should be far more trivial to accomplish, and will often lead to a better plan (but note that I would never say ALWAYS or NEVER, except in the way I just did).
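
To make the duplicates point concrete, here is a small sketch (hypothetical tables lhs and rhs, run on SQLite purely for convenience) where a naive NOT EXISTS rewrite is not equivalent to EXCEPT: EXCEPT removes duplicate rows and treats NULLs as equal when comparing rows, while the `=` comparisons in the NOT EXISTS form do neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lhs (k INTEGER, v TEXT);  -- no key, so duplicate rows are possible
CREATE TABLE rhs (k INTEGER, v TEXT);
INSERT INTO lhs VALUES (1, 'a'), (1, 'a'), (2, NULL), (3, 'c');
INSERT INTO rhs VALUES (2, NULL);
""")

# EXCEPT: result is de-duplicated, and (2, NULL) matches (2, NULL).
except_rows = conn.execute(
    "SELECT k, v FROM lhs EXCEPT SELECT k, v FROM rhs ORDER BY k"
).fetchall()

# Naive NOT EXISTS: keeps duplicates, and NULL = NULL is not true,
# so the (2, NULL) row is never matched and stays in the output.
not_exists_rows = conn.execute("""
    SELECT k, v FROM lhs
     WHERE NOT EXISTS (SELECT 1 FROM rhs WHERE rhs.k = lhs.k AND rhs.v = lhs.v)
     ORDER BY k
""").fetchall()

print(except_rows)
print(not_exists_rows)
```

Getting the same rows out of the NOT EXISTS form would take SELECT DISTINCT plus a NULL-safe comparison (for example IS NOT DISTINCT FROM in PostgreSQL), which is exactly the sort of edge case the regression testing above is meant to catch.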

In this blog post I demonstrate at least one case where EXCEPT is outperformed by both a properly constructed LEFT OUTER JOIN and of course by an equivalent NOT EXISTS variation.

#2


2  

In the following example, the LEFT JOIN takes about 66% less time than EXCEPT, i.e. it is roughly three times faster (PostgreSQL 9.4.3).

Example:

There are three tables: suppliers, parts, and shipments. We need to get all parts not supplied by any supplier in London.

Database (with indexes on all involved columns):

CREATE TABLE suppliers (
  id     bigint    primary key,
  city   character varying NOT NULL
);

CREATE TABLE parts (
  id     bigint    primary key,
  name   character varying NOT NULL
);

CREATE TABLE shipments (
  id          bigint primary key,
  supplier_id bigint NOT NULL,
  part_id     bigint NOT NULL
);

Record counts:

db=# SELECT COUNT(*) FROM suppliers;
  count
---------
 1281280
(1 row)

db=# SELECT COUNT(*) FROM parts;
  count
---------
 1280000
(1 row)

db=# SELECT COUNT(*) FROM shipments;
  count
---------
 1760161
(1 row)

Query using EXCEPT.

SELECT parts.*
  FROM parts

EXCEPT

SELECT parts.*
  FROM parts
  LEFT JOIN shipments
    ON (parts.id = shipments.part_id)
  LEFT JOIN suppliers
    ON (shipments.supplier_id = suppliers.id)
 WHERE suppliers.city = 'London'
;

-- Execution time: 3327.728 ms

Query using LEFT JOIN against a derived table returned by a subquery.

SELECT parts.*
  FROM parts
  LEFT JOIN (
    SELECT parts.id
      FROM parts
      LEFT JOIN shipments
        ON (parts.id = shipments.part_id)
      LEFT JOIN suppliers
        ON (shipments.supplier_id = suppliers.id)
     WHERE suppliers.city = 'London'
  ) AS subquery_tbl
  ON (parts.id = subquery_tbl.id)
WHERE subquery_tbl.id IS NULL
;

-- Execution time: 1136.393 ms
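
Before trusting timings like these, it is worth confirming that the variants return the same rows. The sketch below does that on a tiny made-up data set, and also adds a NOT EXISTS variant that is not part of the benchmark above. It runs on SQLite through Python's sqlite3 as a stand-in for PostgreSQL, so it checks results only, not speed. Because the WHERE filter on suppliers.city discards any NULL-extended rows, the inner LEFT JOINs behave like inner joins, and the sketch writes them as plain JOINs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE suppliers (id bigint PRIMARY KEY, city character varying NOT NULL);
CREATE TABLE parts     (id bigint PRIMARY KEY, name character varying NOT NULL);
CREATE TABLE shipments (id bigint PRIMARY KEY, supplier_id bigint NOT NULL, part_id bigint NOT NULL);
-- Made-up sample rows: only part 10 is shipped by a London supplier.
INSERT INTO suppliers VALUES (1, 'London'), (2, 'Paris');
INSERT INTO parts     VALUES (10, 'nut'), (20, 'bolt'), (30, 'screw');
INSERT INTO shipments VALUES (100, 1, 10), (101, 2, 10), (102, 2, 20);
""")

except_sql = """
    SELECT parts.* FROM parts
    EXCEPT
    SELECT parts.* FROM parts
      JOIN shipments ON parts.id = shipments.part_id
      JOIN suppliers ON shipments.supplier_id = suppliers.id
     WHERE suppliers.city = 'London'
"""

left_join_sql = """
    SELECT parts.* FROM parts
      LEFT JOIN (
        SELECT shipments.part_id AS id
          FROM shipments
          JOIN suppliers ON shipments.supplier_id = suppliers.id
         WHERE suppliers.city = 'London'
      ) AS london ON parts.id = london.id
     WHERE london.id IS NULL
"""

not_exists_sql = """
    SELECT parts.* FROM parts
     WHERE NOT EXISTS (
       SELECT 1
         FROM shipments
         JOIN suppliers ON shipments.supplier_id = suppliers.id
        WHERE shipments.part_id = parts.id
          AND suppliers.city = 'London'
     )
"""

# Sort in Python so the comparison does not depend on row order.
results = {
    name: sorted(conn.execute(sql).fetchall())
    for name, sql in [("except", except_sql),
                      ("left_join", left_join_sql),
                      ("not_exists", not_exists_sql)]
}
print(results)
```

On PostgreSQL itself, running each variant under EXPLAIN ANALYZE would show which plan the optimizer actually picks and where the time goes.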
