How can I make a SQL 'NOT IN' query run faster?

Time: 2022-03-10 03:55:20

I have a table (EMAIL) of email addresses:

EmailAddress
------------
jack@aol.com
jill@aol.com
tom@aol.com
bill@aol.com

and a table (BLACKLIST) of blacklisted email addresses:

EmailAddress
------------
jack@aol.com
jill@aol.com

and I want to select those email addresses that are in the EMAIL table but NOT in the BLACKLIST table. I'm doing:

SELECT EmailAddress
FROM EMAIL
WHERE EmailAddress NOT IN
   (
      SELECT EmailAddress
      FROM BLACKLIST
   )

but when the row counts get very high the performance is terrible.

How can I do this more efficiently? (Assume generic SQL if possible. If not, assume T-SQL.)

2 Answers

#1


22  

You can use a left outer join, or a not exists clause.

Left outer join:

select E.EmailAddress
  from EMAIL E left outer join BLACKLIST B on (E.EmailAddress = B.EmailAddress)
 where B.EmailAddress is null;

Not Exists:

select E.EmailAddress
  from EMAIL E where not exists
         (select EmailAddress from BLACKLIST B where B.EmailAddress = E.EmailAddress)

Both are fairly generic SQL solutions (neither depends on a specific DB engine). I would say the latter performs slightly better (though not by much), and definitely better than the NOT IN version.

As commenters stated, you can also try creating an index on BLACKLIST(EmailAddress); that should help speed up the execution of your query.
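
To try this out end to end, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine (an assumption; the question targets generic SQL or T-SQL, and timings will differ per engine). The index name IX_BLACKLIST_EmailAddress is made up for the example. It only checks that the three query forms return the same rows on the sample data, which has no NULLs; it does not measure performance.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in engine (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMAIL (EmailAddress TEXT NOT NULL);
CREATE TABLE BLACKLIST (EmailAddress TEXT NOT NULL);
INSERT INTO EMAIL VALUES
    ('jack@aol.com'), ('jill@aol.com'), ('tom@aol.com'), ('bill@aol.com');
INSERT INTO BLACKLIST VALUES ('jack@aol.com'), ('jill@aol.com');
-- Hypothetical index name; the indexed column is the one used in the lookup.
CREATE INDEX IX_BLACKLIST_EmailAddress ON BLACKLIST (EmailAddress);
""")

QUERIES = {
    "not_in": """
        SELECT EmailAddress FROM EMAIL
        WHERE EmailAddress NOT IN (SELECT EmailAddress FROM BLACKLIST)
    """,
    "left_join": """
        SELECT E.EmailAddress
        FROM EMAIL E LEFT OUTER JOIN BLACKLIST B
          ON E.EmailAddress = B.EmailAddress
        WHERE B.EmailAddress IS NULL
    """,
    "not_exists": """
        SELECT E.EmailAddress FROM EMAIL E
        WHERE NOT EXISTS (SELECT 1 FROM BLACKLIST B
                          WHERE B.EmailAddress = E.EmailAddress)
    """,
}


def run(sql):
    """Run a query and return the addresses as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


results = {name: run(sql) for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```

Whether the index actually gets used depends on the engine's optimizer, so it is worth checking the real execution plan on your own data.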

#2


3  

NOT IN differs from NOT EXISTS if BLACKLIST allows NULL values in EmailAddress. If there is even a single NULL value, the NOT IN query will always return zero rows, because x NOT IN (..., NULL) is unknown / false for every value and never true. The query plans therefore differ slightly, but I don't think there would be any serious performance impact.
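
The NULL behaviour is easy to reproduce. This is a small sketch using Python's built-in sqlite3 module as a stand-in engine (an assumption; the three-valued logic shown is standard SQL, but it is worth confirming on your own engine). One NULL is added to BLACKLIST on top of the sample rows.

```python
import sqlite3

# In-memory SQLite database used as a stand-in engine (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMAIL (EmailAddress TEXT);
CREATE TABLE BLACKLIST (EmailAddress TEXT);
INSERT INTO EMAIL VALUES
    ('jack@aol.com'), ('jill@aol.com'), ('tom@aol.com'), ('bill@aol.com');
-- Note the NULL row: this is what changes the NOT IN result.
INSERT INTO BLACKLIST VALUES ('jack@aol.com'), ('jill@aol.com'), (NULL);
""")

not_in_rows = conn.execute("""
    SELECT EmailAddress FROM EMAIL
    WHERE EmailAddress NOT IN (SELECT EmailAddress FROM BLACKLIST)
""").fetchall()

not_exists_rows = conn.execute("""
    SELECT E.EmailAddress FROM EMAIL E
    WHERE NOT EXISTS (SELECT 1 FROM BLACKLIST B
                      WHERE B.EmailAddress = E.EmailAddress)
""").fetchall()

not_exists_addresses = sorted(row[0] for row in not_exists_rows)

# NOT IN finds nothing because every comparison against NULL is unknown.
print("NOT IN:", not_in_rows)
# NOT EXISTS ignores the NULL row, since NULL = x never matches.
print("NOT EXISTS:", not_exists_addresses)
```

If EmailAddress is declared NOT NULL in BLACKLIST, the two forms return the same rows.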

A suggestion is to create a new table called VALIDEMAIL and add triggers to BLACKLIST that remove addresses from VALIDEMAIL when rows are inserted into BLACKLIST and add them back to VALIDEMAIL when rows are removed from BLACKLIST. Then replace EMAIL with a view that is a union of both VALIDEMAIL and BLACKLIST.
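
A rough sketch of that trigger design, again with Python's built-in sqlite3 module as a stand-in engine (an assumption: T-SQL trigger syntax is different, using the inserted/deleted pseudo-tables, and the trigger names below are invented for the example). New addresses are assumed to be written to VALIDEMAIL directly.

```python
import sqlite3

# In-memory SQLite database used as a stand-in engine (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VALIDEMAIL (EmailAddress TEXT PRIMARY KEY);
CREATE TABLE BLACKLIST (EmailAddress TEXT PRIMARY KEY);

-- Hypothetical trigger: blacklisting an address removes it from VALIDEMAIL.
CREATE TRIGGER blacklist_after_insert AFTER INSERT ON BLACKLIST
BEGIN
    DELETE FROM VALIDEMAIL WHERE EmailAddress = NEW.EmailAddress;
END;

-- Hypothetical trigger: un-blacklisting an address puts it back.
CREATE TRIGGER blacklist_after_delete AFTER DELETE ON BLACKLIST
BEGIN
    INSERT OR IGNORE INTO VALIDEMAIL (EmailAddress) VALUES (OLD.EmailAddress);
END;

-- EMAIL is now a view over both tables instead of a base table.
CREATE VIEW EMAIL AS
    SELECT EmailAddress FROM VALIDEMAIL
    UNION
    SELECT EmailAddress FROM BLACKLIST;
""")


def addresses(table):
    """Return all addresses in a table or view as a sorted list."""
    return sorted(r[0] for r in conn.execute(f"SELECT EmailAddress FROM {table}"))


conn.executemany(
    "INSERT INTO VALIDEMAIL VALUES (?)",
    [("jack@aol.com",), ("jill@aol.com",), ("tom@aol.com",), ("bill@aol.com",)],
)
conn.executemany(
    "INSERT INTO BLACKLIST VALUES (?)",
    [("jack@aol.com",), ("jill@aol.com",)],
)

# The non-blacklisted addresses are now a plain table read, no anti-join.
valid_before = addresses("VALIDEMAIL")
all_emails = addresses("EMAIL")

# Removing an address from the blacklist restores it to VALIDEMAIL.
conn.execute("DELETE FROM BLACKLIST WHERE EmailAddress = ?", ("jack@aol.com",))
valid_after = addresses("VALIDEMAIL")

print(valid_before, all_emails, valid_after)
```

One consequence of this design is that a blacklisted address that was never a known email still shows up in the EMAIL view, so it trades cheaper reads for extra write-time work and a slightly different meaning of EMAIL.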
