To find all the changes between two databases, I am left joining the tables on the primary key and using a date_modified field to choose the latest record. Since the tables have the same schema, would using EXCEPT improve performance? I would like to rewrite the query with EXCEPT, but I'm not sure whether the implementation of EXCEPT would outperform a JOIN in every case. Hopefully someone has a more technical explanation of when to use EXCEPT.
2 Answers
#1
13
There is no way anyone can tell you that EXCEPT will always or never outperform an equivalent OUTER JOIN. The optimizer will choose an appropriate execution plan regardless of how you express your intent.
That said, here is my guideline:
Use EXCEPT when at least one of the following is true:
- The query is more readable (this will almost always be true).
- Performance is improved.
And BOTH of the following are true:
- The query produces semantically identical results, and you can demonstrate this through sufficient regression testing, including all edge cases.
- Performance is not degraded (again, in all edge cases, and across environmental changes such as clearing the buffer pool, updating statistics, clearing the plan cache, and restarting the service).
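One way to approach the regression-testing condition is to compare the two formulations as multisets of rows. The following is only a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for the real database; the tables old_t and new_t, the helper same_rows, and the sample data are all hypothetical. It checks semantics only and says nothing about performance.

```python
import sqlite3
from collections import Counter


def same_rows(conn, query_a, query_b):
    """Return True when both queries yield the same multiset of rows."""
    rows_a = Counter(conn.execute(query_a).fetchall())
    rows_b = Counter(conn.execute(query_b).fetchall())
    return rows_a == rows_b


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_t (id INTEGER, val TEXT);
    CREATE TABLE new_t (id INTEGER, val TEXT);
    -- Edge case: old_t holds a duplicated row with no match in new_t.
    INSERT INTO old_t VALUES (1, 'a'), (2, 'b'), (2, 'b');
    INSERT INTO new_t VALUES (1, 'a');
""")

EXCEPT_QUERY = "SELECT id, val FROM old_t EXCEPT SELECT id, val FROM new_t"
JOIN_QUERY = """
    SELECT o.id, o.val
    FROM old_t AS o
    LEFT JOIN new_t AS n ON n.id = o.id AND n.val = o.val
    WHERE n.id IS NULL
"""

# A query always matches itself; this is a sanity check of the helper.
print(same_rows(conn, EXCEPT_QUERY, EXCEPT_QUERY))
# EXCEPT removes duplicates, the anti-join keeps them, so these differ.
print(same_rows(conn, EXCEPT_QUERY, JOIN_QUERY))
```

With this data the second comparison reports a mismatch, because EXCEPT returns the unmatched row once while the anti-join returns it twice. That is the kind of edge case a regression suite needs to include before the two forms are treated as interchangeable.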
It is important to note that writing an equivalent EXCEPT query can become a challenge as the JOIN becomes more complex and/or you rely on duplicates in some of the columns but not others. Writing a NOT EXISTS equivalent, while slightly less readable than EXCEPT, should be far more trivial to accomplish, and will often lead to a better plan (but note that I would never say ALWAYS or NEVER, except in the way I just did).
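In this blog post, I demonstrate at least one case where EXCEPT is outperformed by a properly constructed LEFT OUTER JOIN and, of course, by an equivalent NOT EXISTS variation.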
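To make the point about equivalence concrete, here is a small sketch of the change-detection case from the question, written with EXCEPT and with NOT EXISTS. It is an illustration under assumptions, not the asker's actual schema: the tables src and dst and their contents are made up, and SQLite (via Python's sqlite3) stands in for the real engine. The null-safe comparison uses SQLite's IS operator; in PostgreSQL the corresponding operator is IS NOT DISTINCT FROM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE dst (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO src VALUES (1, 'same'), (2, 'changed'), (3, NULL), (4, 'only in src');
    INSERT INTO dst VALUES (1, 'same'), (2, 'old'), (3, NULL);
""")

# EXCEPT compares whole rows and treats two NULLs as matching.
except_rows = sorted(conn.execute("""
    SELECT id, val FROM src
    EXCEPT
    SELECT id, val FROM dst
""").fetchall())

# A naive NOT EXISTS with "=" never matches NULL to NULL,
# so row 3 is reported as changed even though it is identical.
naive_rows = sorted(conn.execute("""
    SELECT s.id, s.val FROM src AS s
    WHERE NOT EXISTS (
        SELECT 1 FROM dst AS d
        WHERE d.id = s.id AND d.val = s.val)
""").fetchall())

# A null-safe comparison restores the EXCEPT semantics.
null_safe_rows = sorted(conn.execute("""
    SELECT s.id, s.val FROM src AS s
    WHERE NOT EXISTS (
        SELECT 1 FROM dst AS d
        WHERE d.id = s.id AND d.val IS s.val)
""").fetchall())

print(except_rows)
print(naive_rows)
print(null_safe_rows)
```

The NOT EXISTS form is easy to write, but nullable columns need a null-safe comparison before it returns the same rows as EXCEPT. Duplicate handling is the other difference to check, since EXCEPT also removes duplicate rows from its result.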
#2
2
In the following example, the LEFT JOIN version takes about 66% less time than the EXCEPT version (roughly 1.1 s versus 3.3 s on PostgreSQL 9.4.3).
Example:
There are three tables: suppliers, parts, and shipments. We need to get all parts not supplied by any supplier in London.
Database (indexes exist on all involved columns):
CREATE TABLE suppliers (
    id bigint primary key,
    city character varying NOT NULL
);
CREATE TABLE parts (
    id bigint primary key,
    name character varying NOT NULL
);
CREATE TABLE shipments (
    id bigint primary key,
    supplier_id bigint NOT NULL,
    part_id bigint NOT NULL
);
Row counts:
db=# SELECT COUNT(*) FROM suppliers;
count
---------
1281280
(1 row)
db=# SELECT COUNT(*) FROM parts;
count
---------
1280000
(1 row)
db=# SELECT COUNT(*) FROM shipments;
count
---------
1760161
(1 row)
Query using EXCEPT:
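-- All parts, minus the parts that have at least one shipment from a London supplier.
SELECT parts.*
FROM parts
EXCEPT
SELECT parts.*
FROM parts
LEFT JOIN shipments
    ON (parts.id = shipments.part_id)
LEFT JOIN suppliers
    ON (shipments.supplier_id = suppliers.id)
-- This filter discards the NULL-extended rows, so the LEFT JOINs behave like inner joins here.
WHERE suppliers.city = 'London'
;
-- Execution time: 3327.728 ms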
Query using LEFT JOIN against a derived table returned by a subquery:
-- Anti-join: keep only the parts with no matching row in the London subquery.
SELECT parts.*
FROM parts
LEFT JOIN (
    -- IDs of parts shipped by at least one London supplier.
    SELECT parts.id
    FROM parts
    LEFT JOIN shipments
        ON (parts.id = shipments.part_id)
    LEFT JOIN suppliers
        ON (shipments.supplier_id = suppliers.id)
    WHERE suppliers.city = 'London'
) AS subquery_tbl
    ON (parts.id = subquery_tbl.id)
WHERE subquery_tbl.id IS NULL
;
-- Execution time: 1136.393 ms
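For completeness, the same question can also be phrased with NOT EXISTS. The sketch below is an assumption-laden illustration rather than a benchmark: it loads a handful of made-up rows into SQLite through Python's sqlite3 module (column types simplified to INTEGER and TEXT) and only checks that the three formulations agree on that data. No timing was measured for the NOT EXISTS form; on PostgreSQL you would need to compare the plans yourself with EXPLAIN ANALYZE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE suppliers (id INTEGER PRIMARY KEY, city TEXT NOT NULL);
    CREATE TABLE parts (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE shipments (
        id INTEGER PRIMARY KEY,
        supplier_id INTEGER NOT NULL,
        part_id INTEGER NOT NULL
    );
    -- Hypothetical sample data: only 'bolt' is shipped from London.
    INSERT INTO suppliers VALUES (1, 'London'), (2, 'Paris');
    INSERT INTO parts VALUES (1, 'bolt'), (2, 'nut'), (3, 'screw');
    INSERT INTO shipments VALUES (1, 1, 1), (2, 2, 1), (3, 2, 2);
""")

QUERIES = {
    "EXCEPT": """
        SELECT parts.* FROM parts
        EXCEPT
        SELECT parts.* FROM parts
        LEFT JOIN shipments ON (parts.id = shipments.part_id)
        LEFT JOIN suppliers ON (shipments.supplier_id = suppliers.id)
        WHERE suppliers.city = 'London'
    """,
    "LEFT JOIN": """
        SELECT parts.* FROM parts
        LEFT JOIN (
            SELECT parts.id FROM parts
            LEFT JOIN shipments ON (parts.id = shipments.part_id)
            LEFT JOIN suppliers ON (shipments.supplier_id = suppliers.id)
            WHERE suppliers.city = 'London'
        ) AS subquery_tbl ON (parts.id = subquery_tbl.id)
        WHERE subquery_tbl.id IS NULL
    """,
    "NOT EXISTS": """
        SELECT p.* FROM parts AS p
        WHERE NOT EXISTS (
            SELECT 1 FROM shipments AS sh
            JOIN suppliers AS su ON su.id = sh.supplier_id
            WHERE sh.part_id = p.id AND su.city = 'London')
    """,
}

# Run every formulation and sort the rows so the results are comparable.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```

All three return the parts with no London shipment on this sample. Agreement on a toy data set does not establish equivalence in general, and it carries no information about which form the PostgreSQL planner will execute faster.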