Is there a way I can improve this kind of SQL query performance:
INSERT
INTO ...
WHERE NOT EXISTS(Validation...)
The problem is that when I have a lot of data in my table (millions of rows), the execution of the WHERE NOT EXISTS
clause is very slow. I have to do this verification because I can't insert duplicate data.
I use SQL Server 2005.
thx
6 answers
#1
11
Make sure you are searching on indexed columns, with no manipulation of the data within those columns (like substring etc.)
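To make the indexing advice concrete, here is a rough sketch. The tables dbo.Customers and dbo.NewCustomers, the columns CustomerCode and Name, and the index name are all hypothetical; nothing in the question names them.

```sql
-- Hypothetical schema: dbo.Customers (target), dbo.NewCustomers (rows to add).
-- An index on the column used by the duplicate check lets the
-- NOT EXISTS probe seek into the table instead of scanning it.
CREATE INDEX IX_Customers_CustomerCode ON dbo.Customers (CustomerCode);

-- Likely slow: wrapping the indexed column in a function
-- generally prevents an index seek on it.
INSERT INTO dbo.Customers (CustomerCode, Name)
SELECT n.CustomerCode, n.Name
FROM dbo.NewCustomers AS n
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.Customers AS c
                  WHERE SUBSTRING(c.CustomerCode, 1, 5) = SUBSTRING(n.CustomerCode, 1, 5));

-- Usually better: compare the bare indexed column
-- (an alternative to the statement above, not meant to run after it).
INSERT INTO dbo.Customers (CustomerCode, Name)
SELECT n.CustomerCode, n.Name
FROM dbo.NewCustomers AS n
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.Customers AS c
                  WHERE c.CustomerCode = n.CustomerCode);
```

Comparing the execution plans of the two statements should show whether the index is actually being used for a seek.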
#2
10
Off the top of my head, you could try something like:
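-- Empty the staging table, then load the candidate rows into it
TRUNCATE TABLE temptable
INSERT INTO temptable ...
INSERT INTO temptable ...
...
-- Insert only the candidates that have no match in the real table
-- ([key] is bracketed because KEY is a reserved word in T-SQL)
INSERT INTO realtable
SELECT temptable.* FROM temptable
LEFT JOIN realtable ON realtable.[key] = temptable.[key]
WHERE realtable.[key] IS NULL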
#3
5
Try replacing the NOT EXISTS with a left outer join; it sometimes performs better on large data sets.
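As a sketch of what that rewrite might look like, here are the two forms side by side. The names dbo.RealTable, dbo.Candidates, KeyCol and Payload are hypothetical placeholders, since the question elides the actual statement.

```sql
-- Hypothetical names: dbo.RealTable (target), dbo.Candidates (rows to insert).

-- NOT EXISTS form, as in the question:
INSERT INTO dbo.RealTable (KeyCol, Payload)
SELECT c.KeyCol, c.Payload
FROM dbo.Candidates AS c
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.RealTable AS r
                  WHERE r.KeyCol = c.KeyCol);

-- LEFT OUTER JOIN form: keep only the candidates for which the join
-- found no matching row in the target (an alternative, not a follow-up).
INSERT INTO dbo.RealTable (KeyCol, Payload)
SELECT c.KeyCol, c.Payload
FROM dbo.Candidates AS c
LEFT OUTER JOIN dbo.RealTable AS r ON r.KeyCol = c.KeyCol
WHERE r.KeyCol IS NULL;
```

Which form wins depends on the data and the indexes, so it is worth comparing the execution plans rather than assuming one is faster.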
#4
0
Pay attention to the other answer regarding indexing. NOT EXISTS is typically quite fast if you have good indexes.
But I have had performance issues with statements like the one you describe. One method I've used to get around that is to put the candidate values in a temp table, perform a DELETE FROM ... WHERE EXISTS (...), and then blindly INSERT the remainder. Do this inside a transaction, of course, to avoid race conditions. Splitting up the queries sometimes lets the optimizer do its job without getting confused.
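The delete-then-insert approach described above could look roughly like this. The temp table #Candidates, the target dbo.RealTable and the columns KeyCol and Payload are assumptions made for illustration only.

```sql
-- Hypothetical names throughout: #Candidates, dbo.RealTable, KeyCol, Payload.
CREATE TABLE #Candidates (
    KeyCol  INT          NOT NULL PRIMARY KEY,
    Payload VARCHAR(100) NULL
);

-- ... load the candidate rows into #Candidates here ...

BEGIN TRANSACTION;

-- Step 1: throw away candidates that already exist in the target.
DELETE c
FROM #Candidates AS c
WHERE EXISTS (SELECT 1
              FROM dbo.RealTable AS r
              WHERE r.KeyCol = c.KeyCol);

-- Step 2: whatever is left is new, so insert it without further checks.
INSERT INTO dbo.RealTable (KeyCol, Payload)
SELECT KeyCol, Payload
FROM #Candidates;

COMMIT TRANSACTION;

DROP TABLE #Candidates;
```

One caution: under the default READ COMMITTED isolation level, a transaction by itself may not stop another session from inserting the same key between the two statements. A unique constraint on the key, or lock hints such as UPDLOCK and HOLDLOCK on the existence check, are common ways to close that gap.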
#5
0
If you can at all reduce your problem space, then you'll gain heaps of performance. Are you absolutely sure that every one of those rows in that table needs to be checked?
The other thing you might want to try is a DELETE InsertTable FROM InsertTable INNER JOIN ExistingTable ON <Validation criteria> before your insert. However, your mileage may vary.
#6
0
insert into customers
select *
from newcustomers
where customerid not in (select customerid
from customers)
...may be more efficient. As others have said, make sure you've got indexes on any lookup fields.
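One caveat with the NOT IN form: if the subquery returns even one NULL, NOT IN is never true for any row, so nothing gets inserted. A minimal sketch, assuming customerid might be nullable in customers (the answer does not say either way), filters NULLs out of the subquery:

```sql
-- Same statement as above, with NULLs excluded from the subquery so that
-- a NULL customerid in customers cannot make NOT IN reject every row.
INSERT INTO customers
SELECT *
FROM newcustomers
WHERE customerid NOT IN (SELECT customerid
                         FROM customers
                         WHERE customerid IS NOT NULL);
```

If customerid is declared NOT NULL, the extra filter is unnecessary.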