SQL performance of LEFT OUTER JOIN vs NOT EXISTS

Time: 2022-10-18 23:57:36

If I want to find a set of entries in table A but not in table B, I can use either LEFT OUTER JOIN or NOT EXISTS. I've heard SQL Server is geared towards ANSI and that in some cases LEFT OUTER JOINs are far more efficient than NOT EXISTS. Will an ANSI JOIN perform better in this case? And are join operators more efficient than NOT EXISTS in general on SQL Server?
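
For reference, here are the two query shapes being compared, as a minimal sketch. It assumes hypothetical tables TableA and TableB keyed on id, and runs the SQL through Python's built-in sqlite3 module only so the example is self-contained. SQLite is not SQL Server, so this shows that the two forms return the same rows, not how SQL Server plans or costs them.

```python
import sqlite3

# Hypothetical tables; SQLite stands in for SQL Server only so this runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE TableB (id INTEGER PRIMARY KEY);
    INSERT INTO TableA VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO TableB VALUES (2), (4);
""")

# Form 1: LEFT OUTER JOIN, then keep only the rows of A with no partner in B.
left_join_rows = conn.execute("""
    SELECT a.id, a.val
    FROM TableA AS a
    LEFT OUTER JOIN TableB AS b ON b.id = a.id
    WHERE b.id IS NULL
    ORDER BY a.id
""").fetchall()

# Form 2: NOT EXISTS with a correlated subquery.
not_exists_rows = conn.execute("""
    SELECT a.id, a.val
    FROM TableA AS a
    WHERE NOT EXISTS (SELECT 1 FROM TableB AS b WHERE b.id = a.id)
    ORDER BY a.id
""").fetchall()

print(left_join_rows)   # [(1, 'a'), (3, 'c')]
print(not_exists_rows)  # [(1, 'a'), (3, 'c')]
```

The SQL text of both queries is also valid T-SQL; what differs between engines is the execution plan, and therefore the performance.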

6 solutions

#1


52  

Joe's link is a good starting point. Quassnoi covers this too.

In general, if your fields are properly indexed, OR if you expect to filter out more records (i.e. you have a lot of rows that EXIST in the subquery), NOT EXISTS will perform better.

EXISTS and NOT EXISTS both short circuit: as soon as a record matches the criteria, it is either included or filtered out, and the engine moves on to the next record.
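
To make the short-circuit idea concrete, here is a toy nested-loop anti semi join in plain Python. It illustrates the general technique under simplified assumptions (in-memory lists, no indexes) and is not SQL Server's actual operator: for each outer row it scans the inner rows and stops at the first match.

```python
def nested_loop_anti_semi_join(outer_rows, inner_rows, key):
    """Return outer rows with no matching inner row, plus a comparison count.

    Illustrative model of a short-circuiting NOT EXISTS probe only.
    """
    result = []
    comparisons = 0
    for outer in outer_rows:
        matched = False
        for inner in inner_rows:
            comparisons += 1
            if key(outer) == key(inner):
                matched = True
                break  # short circuit: one match is enough to reject this row
        if not matched:
            result.append(outer)
    return result, comparisons


outer = [1, 2, 3]
inner = [2, 2, 2, 5]  # duplicates on the inner side
rows, comparisons = nested_loop_anti_semi_join(outer, inner, key=lambda r: r)
print(rows)         # [1, 3]
print(comparisons)  # 9
```

Without the `break`, the row with key 2 would be compared against all four inner rows, giving 12 comparisons instead of 9.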

LEFT JOIN will join ALL RECORDS regardless of whether they match or not, then filter the result down to the rows that found no match (those where the right-hand key IS NULL). If your tables are large and/or you have multiple JOIN criteria, this can be very, very resource intensive.
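
The sketch below shows what the LEFT JOIN produces logically before the IS NULL filter is applied, using SQLite via Python's sqlite3 and hypothetical tables TableA/TableB. Every duplicate match on the right-hand side is produced and then thrown away; whether a real optimizer (SQL Server's included) physically builds this full intermediate result depends on the plan it chooses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER PRIMARY KEY);
    CREATE TABLE TableB (a_id INTEGER);  -- not unique: several rows per a_id
    INSERT INTO TableA VALUES (1), (2), (3);
    INSERT INTO TableB VALUES (2), (2), (2), (3);
""")

# The LEFT JOIN before any IS NULL filter: every match is produced,
# including duplicates, plus one NULL-extended row per unmatched row of A.
joined = conn.execute("""
    SELECT a.id, b.a_id
    FROM TableA AS a
    LEFT OUTER JOIN TableB AS b ON b.a_id = a.id
    ORDER BY a.id
""").fetchall()

# The WHERE b.a_id IS NULL step then discards every matched row.
unmatched = [row for row in joined if row[1] is None]

print(len(joined))  # 5
print(unmatched)    # [(1, None)]
```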

I normally try to use NOT EXISTS and EXISTS where possible. For SQL Server, IN and NOT IN are semantically equivalent to them and may be easier to write, with one caveat: NOT IN only behaves like NOT EXISTS when the subquery column cannot contain NULLs. These are among the only operators you will find in SQL Server that are guaranteed to short circuit.
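
One difference worth checking before swapping NOT EXISTS for NOT IN is how NULLs in the subquery are handled. This sketch uses SQLite via Python's sqlite3 and hypothetical tables; the NULL behaviour shown is standard SQL three-valued logic, which SQL Server follows as well under its default ANSI_NULLS ON setting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER);
    CREATE TABLE TableB (id INTEGER);  -- nullable column
    INSERT INTO TableA VALUES (1), (2), (3);
    INSERT INTO TableB VALUES (2), (NULL);
""")

# "id NOT IN (2, NULL)" is never TRUE (comparing with NULL is UNKNOWN),
# so this returns no rows at all.
not_in = conn.execute(
    "SELECT id FROM TableA WHERE id NOT IN (SELECT id FROM TableB) ORDER BY id"
).fetchall()

# NOT EXISTS only asks whether a matching row exists, so NULLs are harmless.
not_exists = conn.execute("""
    SELECT a.id FROM TableA AS a
    WHERE NOT EXISTS (SELECT 1 FROM TableB AS b WHERE b.id = a.id)
    ORDER BY a.id
""").fetchall()

# Excluding NULLs in the subquery makes NOT IN agree with NOT EXISTS again.
not_in_fixed = conn.execute(
    "SELECT id FROM TableA WHERE id NOT IN "
    "(SELECT id FROM TableB WHERE id IS NOT NULL) ORDER BY id"
).fetchall()

print(not_in)        # []
print(not_exists)    # [(1,), (3,)]
print(not_in_fixed)  # [(1,), (3,)]
```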

#2


6  

The best discussion I've read on this topic for SQL Server is here.

#3


1  

Personally, I think this one gets a big old "It Depends." I've seen instances where each method has outperformed the other.

Your best bet is to test both and see which performs better. If the tables will always be small and performance isn't crucial, then I'd just go with whichever is clearest to you (for most people that's NOT EXISTS) and move on.
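
A minimal "test both" harness might look like the following. It assumes hypothetical tables TableA and TableB and uses SQLite via Python's sqlite3 with wall-clock timing; the numbers it prints are specific to SQLite and your machine and say nothing about SQL Server. On SQL Server the equivalent check is comparing the actual execution plans and the output of SET STATISTICS IO ON / SET STATISTICS TIME ON on your own data.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE TableB (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO TableA VALUES (?)", ((i,) for i in range(50_000)))
# Only even ids exist in TableB, so half of TableA is unmatched.
conn.executemany("INSERT INTO TableB VALUES (?)", ((i,) for i in range(0, 50_000, 2)))

QUERIES = {
    "LEFT JOIN / IS NULL": """
        SELECT COUNT(*) FROM TableA AS a
        LEFT OUTER JOIN TableB AS b ON b.id = a.id
        WHERE b.id IS NULL""",
    "NOT EXISTS": """
        SELECT COUNT(*) FROM TableA AS a
        WHERE NOT EXISTS (SELECT 1 FROM TableB AS b WHERE b.id = a.id)""",
}


def best_of(sql, repeats=5):
    """Run sql several times; return (row count, fastest wall-clock seconds)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        result = conn.execute(sql).fetchone()[0]
        timings.append(time.perf_counter() - start)
    return result, min(timings)


results = {name: best_of(sql) for name, sql in QUERIES.items()}
for name, (count, seconds) in results.items():
    print(f"{name:22} rows={count} best={seconds * 1000:.2f} ms")
```

Checking that both queries return the same count first matters as much as the timing: a faster query that answers a different question is not a win.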

#4


0  

This blog entry gives examples of various ways (NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT and NOT EXISTS) to achieve the same result, and shows that NOT EXISTS (a Left Anti Semi Join) is the best option in both cold-cache and warm-cache scenarios.
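
Of the alternatives listed, EXCEPT is the one that is not a drop-in replacement: it compares only the projected columns and returns distinct rows (in SQL Server as in SQLite), so it yields the missing keys rather than the full rows of A. The sketch below uses SQLite via Python's sqlite3 and hypothetical tables; OUTER APPLY is omitted because SQLite does not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (id INTEGER, val TEXT);
    CREATE TABLE TableB (id INTEGER);
    INSERT INTO TableA VALUES (1, 'x'), (1, 'y'), (2, 'z'), (3, 'w');
    INSERT INTO TableB VALUES (2);
""")

# EXCEPT compares the selected columns only and removes duplicates,
# so id 1 appears once even though TableA has two rows with id 1.
except_keys = conn.execute(
    "SELECT id FROM TableA EXCEPT SELECT id FROM TableB ORDER BY id"
).fetchall()

# NOT EXISTS keeps every qualifying row of A, with all of its columns.
not_exists_rows = conn.execute("""
    SELECT a.id, a.val FROM TableA AS a
    WHERE NOT EXISTS (SELECT 1 FROM TableB AS b WHERE b.id = a.id)
    ORDER BY a.id, a.val
""").fetchall()

print(except_keys)      # [(1,), (3,)]
print(not_exists_rows)  # [(1, 'x'), (1, 'y'), (3, 'w')]
```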

#5


0  

I've been wondering how we can make use of the index on the table we are deleting from, in cases like the one the OP describes.

Say we have:

 table EMPLOYEE (emp_id int, name varchar) 
and
 table EMPLOYEE_LOCATION (emp_id int, loc_id int)

In my real-world example the tables are much wider and contain over 1 million rows; I have simplified the schema for the purpose of the example.

If I want to delete the rows from EMPLOYEE_LOCATION that don't have a corresponding emp_id in EMPLOYEE, I can obviously use the LEFT OUTER JOIN technique or NOT IN, but I was wondering...
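
For comparison, the set-based version of that delete is a single NOT EXISTS statement. This sketch runs it against SQLite via Python's sqlite3 using the simplified schema above (with TEXT in place of varchar) and made-up sample rows; the index name is hypothetical. The DELETE statement itself is written in a form that is also valid T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE EMPLOYEE_LOCATION (emp_id INTEGER, loc_id INTEGER);
    CREATE INDEX ix_employee_location_emp_id ON EMPLOYEE_LOCATION (emp_id);

    INSERT INTO EMPLOYEE VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO EMPLOYEE_LOCATION VALUES (1, 10), (2, 20), (3, 30), (3, 31), (4, 40);
""")

# One set-based statement: remove locations whose emp_id has no EMPLOYEE row.
# The probe into EMPLOYEE can use its emp_id key (the primary key here).
cur = conn.execute("""
    DELETE FROM EMPLOYEE_LOCATION
    WHERE NOT EXISTS (
        SELECT 1 FROM EMPLOYEE AS e
        WHERE e.emp_id = EMPLOYEE_LOCATION.emp_id
    )
""")
deleted = cur.rowcount
print(deleted)  # 3

remaining = conn.execute(
    "SELECT emp_id, loc_id FROM EMPLOYEE_LOCATION ORDER BY emp_id, loc_id"
).fetchall()
print(remaining)  # [(1, 10), (2, 20)]
```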

If both tables have indexes whose leading column is emp_id, would it be worthwhile trying to use them?

Perhaps I could pull the emp_ids from EMPLOYEE and from EMPLOYEE_LOCATION into temp tables, and then get from those temp tables the emp_ids that I want to delete.

I could then loop over these emp_ids and actually use the index, like so:

-- for each emp_id @X to delete (this would be a cursor):
    DELETE FROM EMPLOYEE_LOCATION WHERE emp_id = @X;

I know a cursor has overhead, but in my real example I am dealing with huge tables, so I think explicitly using the index is desirable.
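
The temp-table-plus-loop idea can be sketched side by side with the single-statement delete to check that both leave the same rows. This uses SQLite via Python's sqlite3, the simplified schema from the question, made-up sample rows, and a hypothetical temp table named to_delete; a Python `for` loop plays the role of the cursor. It checks correctness only and measures nothing: the loop issues one DELETE per orphan emp_id, so its per-statement overhead grows with the number of ids, while the single statement can also use the index on emp_id.

```python
import sqlite3


def build_db():
    """Fresh in-memory copy of the simplified schema with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE EMPLOYEE (emp_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE EMPLOYEE_LOCATION (emp_id INTEGER, loc_id INTEGER);
        CREATE INDEX ix_el_emp_id ON EMPLOYEE_LOCATION (emp_id);
        INSERT INTO EMPLOYEE VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO EMPLOYEE_LOCATION VALUES (1, 10), (2, 20), (3, 30), (3, 31), (4, 40);
    """)
    return conn


def delete_orphans_row_by_row(conn):
    """Collect orphan emp_ids in a temp table, then issue one indexed DELETE per id."""
    conn.execute("""
        CREATE TEMP TABLE to_delete AS
        SELECT DISTINCT el.emp_id FROM EMPLOYEE_LOCATION AS el
        WHERE NOT EXISTS (SELECT 1 FROM EMPLOYEE AS e WHERE e.emp_id = el.emp_id)
    """)
    ids = [row[0] for row in conn.execute("SELECT emp_id FROM to_delete ORDER BY emp_id")]
    for emp_id in ids:  # stands in for the cursor loop
        conn.execute("DELETE FROM EMPLOYEE_LOCATION WHERE emp_id = ?", (emp_id,))
    return len(ids)  # number of DELETE statements issued


def delete_orphans_set_based(conn):
    """The same work as a single statement."""
    conn.execute("""
        DELETE FROM EMPLOYEE_LOCATION
        WHERE NOT EXISTS (SELECT 1 FROM EMPLOYEE AS e
                          WHERE e.emp_id = EMPLOYEE_LOCATION.emp_id)
    """)


loop_conn, set_conn = build_db(), build_db()
statements = delete_orphans_row_by_row(loop_conn)
delete_orphans_set_based(set_conn)

query = "SELECT emp_id, loc_id FROM EMPLOYEE_LOCATION ORDER BY emp_id, loc_id"
loop_rows = loop_conn.execute(query).fetchall()
set_rows = set_conn.execute(query).fetchall()
print(statements)             # 2
print(loop_rows)              # [(1, 10), (2, 20)]
print(loop_rows == set_rows)  # True
```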

#6


0  

Answer on dba.stackexchange

One exception I've noticed to NOT EXISTS being superior (however marginally) to LEFT JOIN ... WHERE ... IS NULL is when using Linked Servers.

From examining the execution plans, it appears that the NOT EXISTS operator is executed in a nested-loop fashion, i.e. it is evaluated on a per-row basis (which I suppose makes sense).
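
Why per-row evaluation hurts across a linked server can be shown with a toy cost model: if every probe of the remote table is a network round trip, a nested-loop NOT EXISTS pays one trip per local row, while fetching the remote keys once and anti-joining locally pays one. The FakeRemoteTable class below is entirely hypothetical and only counts calls; it does not model SQL Server's distributed query engine, which may choose other strategies depending on the plan.

```python
class FakeRemoteTable:
    """Hypothetical stand-in for a table on a linked server.

    Every method call counts as one network round trip.
    """

    def __init__(self, keys):
        self._keys = set(keys)
        self.round_trips = 0

    def contains(self, key):
        """One remote query per call, like a per-row NOT EXISTS probe."""
        self.round_trips += 1
        return key in self._keys

    def fetch_all_keys(self):
        """One remote query returning all keys at once."""
        self.round_trips += 1
        return set(self._keys)


local_rows = list(range(1, 1001))
remote = FakeRemoteTable(keys=range(1, 1001, 2))  # only odd ids exist remotely

# Per-row probing, like a nested-loop NOT EXISTS against the remote table.
per_row = [k for k in local_rows if not remote.contains(k)]
per_row_trips = remote.round_trips

# Pull the remote keys once, then anti-join locally.
remote.round_trips = 0
remote_keys = remote.fetch_all_keys()
bulk = [k for k in local_rows if k not in remote_keys]
bulk_trips = remote.round_trips

print(per_row == bulk, per_row_trips, bulk_trips)  # True 1000 1
```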

Example execution plan demonstrating this behaviour: [execution plan image not available]
