I am working on optimizing one of our SQL jobs.
In a few places we have used the <> operator. The same queries can be rewritten using NOT EXISTS. I am wondering which is the better way of doing it.
Sample Query
IF (@Email <> (SELECT Email FROM Members WHERE MemberId = @MemberId))
    -- Do something.
-- The same thing can be written as
IF (NOT EXISTS (SELECT Email FROM Members WHERE MemberId = @MemberId AND Email = @Email))
Which is better?
I went through the execution plans for both (I couldn't attach them, as all image hosting is blocked in the office). The <> version has an Assert and a Stream Aggregate operator that the NOT EXISTS version does not. I'm not sure whether they are good, bad, or have no impact.
2 Answers
#1
3
NOT EXISTS is generally better (although in your case, if the table is small and/or properly indexed, that may not hold).
You should almost always use EXISTS/NOT EXISTS for queries in which you're trying to find out whether a certain record exists (or does not exist).
The reasoning behind this is that EXISTS (and NOT EXISTS) queries stop as soon as the condition is fulfilled (or, in the case of NOT EXISTS, proven false), as opposed to sub-queries, which continue to scan records through the whole table.
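One caveat worth checking before swapping the two forms from the question: they are not strictly equivalent. Under SQL's three-valued logic, a <> comparison against NULL is neither true nor false, so the <> condition does not fire when the member row is missing or its Email is NULL, while the NOT EXISTS condition does. The sketch below reproduces this with Python's sqlite3 module standing in for SQL Server (an assumption made only so the example is runnable); the sample rows and the compare_forms helper are hypothetical.

```python
import sqlite3


def compare_forms(conn, member_id, email):
    """Return (not_equal_fires, not_exists_fires) for the two conditions."""
    # Form 1: scalar subquery compared with <>. A NULL result makes the
    # comparison unknown, which the CASE expression treats as "not true".
    not_equal = conn.execute(
        "SELECT CASE WHEN ? <> (SELECT Email FROM Members WHERE MemberId = ?) "
        "THEN 1 ELSE 0 END",
        (email, member_id),
    ).fetchone()[0]
    # Form 2: NOT EXISTS over the same lookup with the email folded in.
    not_exists = conn.execute(
        "SELECT NOT EXISTS "
        "(SELECT 1 FROM Members WHERE MemberId = ? AND Email = ?)",
        (member_id, email),
    ).fetchone()[0]
    return bool(not_equal), bool(not_exists)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Members (MemberId INTEGER PRIMARY KEY, Email TEXT)")
conn.executemany(
    "INSERT INTO Members VALUES (?, ?)",
    [(1, "a@example.com"), (2, None)],  # member 2 has no email on file
)

cases = {
    "same email": (1, "a@example.com"),
    "different email": (1, "b@example.com"),
    "stored email is NULL": (2, "b@example.com"),
    "member does not exist": (99, "b@example.com"),
}
for label, (member_id, email) in cases.items():
    print(label, compare_forms(conn, member_id, email))
```

The two forms agree for the first two cases and disagree for the last two, so which one is "better" also depends on what should happen for a missing member or a NULL email.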
#2
1
The difference between your two statements lies in the question of how much is done in pure SQL and how much is done by the engine running the procedures/scripts. (I'd like to say "what is done by the database and what is done outside of it", but in a stored procedure both parts are handled by the database.)
In your example, the first statement uses SQL to fetch one member's Email. The table access uses what I assume is a primary key and its associated unique index, so it should be really fast even for a large table. The Email is passed outside of SQL, and the comparison is then done in the script.
In the second statement, pretty much the same happens. The MemberId is again used to access the unique record, then the email is compared and a boolean result is passed back outside of SQL.
Therefore, the performance for your example should be pretty similar.
There will be different considerations (as MikyD has noted) when more than one value has to be transferred outside of SQL and a more complicated comparison has to be done (e.g. selecting a large number of emails using SQL and then doing the comparison in the script with something like Email IN (Select ..)). In that case it is usually preferable to do as much work as possible in SQL, transfer the least amount of data between SQL and non-SQL, and let the database figure out the most effective way to get at the data.
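To make the "do the work in SQL" point concrete, here is a minimal sketch contrasting the two approaches. It again assumes Python's sqlite3 module as a stand-in for SQL Server, and the table contents and both function names are hypothetical. The first function pulls every email across the SQL boundary and compares in the script; the second lets the database do the comparison and returns a single boolean.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Members (MemberId INTEGER PRIMARY KEY, Email TEXT)")
conn.executemany(
    "INSERT INTO Members (Email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1000)],
)


def email_taken_client_side(conn, email):
    # Transfers every email out of the database, then compares in the script.
    rows = conn.execute("SELECT Email FROM Members").fetchall()
    return email in {row[0] for row in rows}


def email_taken_in_sql(conn, email):
    # The database does the comparison; only one value crosses the boundary.
    row = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM Members WHERE Email = ?)",
        (email,),
    ).fetchone()
    return bool(row[0])


for candidate in ("user42@example.com", "nobody@example.com"):
    print(candidate,
          email_taken_client_side(conn, candidate),
          email_taken_in_sql(conn, candidate))
```

Both functions return the same answer; the difference is only in how much data leaves the database, which is the cost this answer suggests minimizing.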