I have a table (EMAIL) of email addresses:
EmailAddress
------------
jack@aol.com
jill@aol.com
tom@aol.com
bill@aol.lcom
and a table (BLACKLIST) of blacklisted email addresses:
EmailAddress
------------
jack@aol.com
jill@aol.com
and I want to select those email addresses that are in the EMAIL table but NOT in the BLACKLIST table. I'm doing:
SELECT EmailAddress
FROM EMAIL
WHERE EmailAddress NOT IN
(
SELECT EmailAddress
FROM BLACKLIST
)
but when the row counts get very high the performance is terrible.
How can I do this better? (Assume generic SQL if possible. If not, assume T-SQL.)
2 solutions
#1
22
You can use a left outer join or a NOT EXISTS clause.
Left outer join:
select E.EmailAddress
from EMAIL E left outer join BLACKLIST B on (E.EmailAddress = B.EmailAddress)
where B.EmailAddress is null;
Not Exists:
select E.EmailAddress
from EMAIL E where not exists
(select EmailAddress from BLACKLIST B where B.EmailAddress = E.EmailAddress)
Both are fairly generic SQL solutions (they don't depend on a specific DB engine). I would say the latter is slightly more performant, though not by much, and both should be more performant than the NOT IN version. The actual difference depends on the engine and its query planner.
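To make the comparison concrete, here is a minimal sketch that runs the original NOT IN query and both rewrites against the sample data from the question. It uses Python's built-in sqlite3 module purely as a convenient stand-in engine (an assumption; the question targets generic SQL or T-SQL), and it only checks that the three queries return the same rows, not how fast they are.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for "generic SQL".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMAIL (EmailAddress TEXT NOT NULL);
    CREATE TABLE BLACKLIST (EmailAddress TEXT NOT NULL);
    INSERT INTO EMAIL VALUES ('jack@aol.com'), ('jill@aol.com'),
                             ('tom@aol.com'), ('bill@aol.lcom');
    INSERT INTO BLACKLIST VALUES ('jack@aol.com'), ('jill@aol.com');
""")

QUERIES = {
    # The query from the question.
    "not_in": """
        SELECT EmailAddress FROM EMAIL
        WHERE EmailAddress NOT IN (SELECT EmailAddress FROM BLACKLIST)
    """,
    # Anti-join: keep EMAIL rows that found no BLACKLIST match.
    "left_join": """
        SELECT E.EmailAddress
        FROM EMAIL E
        LEFT OUTER JOIN BLACKLIST B ON E.EmailAddress = B.EmailAddress
        WHERE B.EmailAddress IS NULL
    """,
    # Correlated subquery: keep EMAIL rows with no matching BLACKLIST row.
    "not_exists": """
        SELECT E.EmailAddress
        FROM EMAIL E
        WHERE NOT EXISTS (
            SELECT 1 FROM BLACKLIST B WHERE B.EmailAddress = E.EmailAddress
        )
    """,
}

def run(sql):
    """Return the selected addresses as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

results = {name: run(sql) for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```

With non-nullable columns, as here, all three forms select the same addresses, so the choice between them comes down to how a given engine plans each one.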
As commenters stated, you can also try creating an index on BLACKLIST(EmailAddress), which should help speed up the execution of your query.
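A minimal sketch of the index suggestion, again using Python's sqlite3 module as a stand-in engine (an assumption). The index name idx_blacklist_email is hypothetical. The script prints the plan SQLite reports for the NOT EXISTS query; the plan text is engine- and version-specific, so treat it as something to inspect rather than rely on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMAIL (EmailAddress TEXT NOT NULL);
    CREATE TABLE BLACKLIST (EmailAddress TEXT NOT NULL);
    INSERT INTO EMAIL VALUES ('jack@aol.com'), ('jill@aol.com'),
                             ('tom@aol.com'), ('bill@aol.lcom');
    INSERT INTO BLACKLIST VALUES ('jack@aol.com'), ('jill@aol.com');

    -- Hypothetical index name; the indexed column is the one the
    -- subquery compares against.
    CREATE INDEX idx_blacklist_email ON BLACKLIST (EmailAddress);
""")

query = """
    SELECT E.EmailAddress
    FROM EMAIL E
    WHERE NOT EXISTS (
        SELECT 1 FROM BLACKLIST B WHERE B.EmailAddress = E.EmailAddress
    )
"""

# The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)

# The index changes how rows are found, not which rows are returned.
rows = sorted(row[0] for row in conn.execute(query))
print(rows)

index_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
]
```

On a real system, compare the plan and timings before and after creating the index on realistic row counts, since that is where the original query was slow.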
#2
3
NOT IN differs from NOT EXISTS if the blacklist allows NULL as an EmailAddress. If the blacklist contains even a single NULL, the NOT IN query returns zero rows, because a comparison against NULL is unknown: for every address that is not otherwise blacklisted, NOT IN evaluates to unknown rather than true, and for every blacklisted address it is false. The query plans therefore differ slightly, but I don't think there would be any serious performance impact.
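The NULL behaviour is easy to reproduce. The sketch below uses Python's sqlite3 module as a stand-in engine (an assumption) and adds one NULL row to BLACKLIST. It compares NOT IN, NOT EXISTS, and a NOT IN variant that filters NULLs out of the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMAIL (EmailAddress TEXT);
    CREATE TABLE BLACKLIST (EmailAddress TEXT);  -- nullable on purpose
    INSERT INTO EMAIL VALUES ('jack@aol.com'), ('jill@aol.com'),
                             ('tom@aol.com'), ('bill@aol.lcom');
    INSERT INTO BLACKLIST VALUES ('jack@aol.com'), ('jill@aol.com'), (NULL);
""")

def run(sql):
    """Return the selected addresses as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# NOT IN: the NULL in the subquery makes the predicate unknown for every
# address that is not blacklisted, so no row passes the WHERE clause.
not_in = run("""
    SELECT EmailAddress FROM EMAIL
    WHERE EmailAddress NOT IN (SELECT EmailAddress FROM BLACKLIST)
""")

# NOT EXISTS: the NULL row never satisfies the equality, so it is ignored.
not_exists = run("""
    SELECT E.EmailAddress FROM EMAIL E
    WHERE NOT EXISTS (
        SELECT 1 FROM BLACKLIST B WHERE B.EmailAddress = E.EmailAddress
    )
""")

# NOT IN with NULLs filtered out of the subquery behaves like NOT EXISTS.
not_in_filtered = run("""
    SELECT EmailAddress FROM EMAIL
    WHERE EmailAddress NOT IN (
        SELECT EmailAddress FROM BLACKLIST WHERE EmailAddress IS NOT NULL
    )
""")

print(not_in)           # []
print(not_exists)       # ['bill@aol.lcom', 'tom@aol.com']
print(not_in_filtered)  # ['bill@aol.lcom', 'tom@aol.com']
```

Declaring BLACKLIST.EmailAddress as NOT NULL removes the difference entirely, and it also gives the planner more freedom to treat NOT IN as a plain anti-join.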
One suggestion is to create a new table called VALIDEMAIL and add triggers to BLACKLIST that remove an address from VALIDEMAIL when a row is inserted into BLACKLIST and add it back to VALIDEMAIL when the row is removed from BLACKLIST. Then replace EMAIL with a view that is the union of VALIDEMAIL and BLACKLIST.
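A rough sketch of that trigger-based layout, written in SQLite syntax and driven from Python's sqlite3 module (both assumptions; T-SQL triggers are set-based and would read from the inserted and deleted pseudo-tables instead of NEW and OLD). The trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE VALIDEMAIL (EmailAddress TEXT PRIMARY KEY);
    CREATE TABLE BLACKLIST  (EmailAddress TEXT PRIMARY KEY);

    -- Blacklisting an address removes it from the valid set.
    CREATE TRIGGER blacklist_after_insert AFTER INSERT ON BLACKLIST
    BEGIN
        DELETE FROM VALIDEMAIL WHERE EmailAddress = NEW.EmailAddress;
    END;

    -- Un-blacklisting an address puts it back into the valid set.
    CREATE TRIGGER blacklist_after_delete AFTER DELETE ON BLACKLIST
    BEGIN
        INSERT OR IGNORE INTO VALIDEMAIL (EmailAddress)
        VALUES (OLD.EmailAddress);
    END;

    -- EMAIL becomes a view over both tables.
    CREATE VIEW EMAIL AS
        SELECT EmailAddress FROM VALIDEMAIL
        UNION
        SELECT EmailAddress FROM BLACKLIST;
""")

def column(sql):
    """Return the first column of the result as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

conn.executemany(
    "INSERT INTO VALIDEMAIL (EmailAddress) VALUES (?)",
    [("jack@aol.com",), ("jill@aol.com",), ("tom@aol.com",), ("bill@aol.lcom",)],
)

# Blacklist two addresses; the insert trigger prunes VALIDEMAIL.
conn.execute("INSERT INTO BLACKLIST VALUES ('jack@aol.com')")
conn.execute("INSERT INTO BLACKLIST VALUES ('jill@aol.com')")
valid = column("SELECT EmailAddress FROM VALIDEMAIL")
all_emails = column("SELECT EmailAddress FROM EMAIL")

# Remove one address from the blacklist; the delete trigger restores it.
conn.execute("DELETE FROM BLACKLIST WHERE EmailAddress = 'jack@aol.com'")
valid_after = column("SELECT EmailAddress FROM VALIDEMAIL")

print(valid)        # ['bill@aol.lcom', 'tom@aol.com']
print(all_emails)   # all four addresses
print(valid_after)  # ['bill@aol.lcom', 'jack@aol.com', 'tom@aol.com']
```

The trade-off is that reads of the non-blacklisted addresses become a plain SELECT on VALIDEMAIL, while every change to BLACKLIST pays a small extra cost and new addresses have to be inserted into VALIDEMAIL rather than EMAIL.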