So I'm upgrading an old parser right now. It's written in C# and uses SQL to insert records into a database.
Currently it reads and parses a few thousand lines of data from a file, then inserts the new data into a database containing over a million records.
Sometimes it can take over 10 minutes just to add a few thousand lines.
I've come to the conclusion that the performance bottleneck is a SQL command that uses an IF NOT EXISTS statement to determine whether the row being inserted already exists, and only inserts the record if it doesn't.
I believe the problem is that it just takes way too long to run the IF NOT EXISTS check on every single row in the new data.
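Roughly, the existing code does something like this per row (the table and column names here — dbo.Records, SourceId, LineNo, Payload — are made-up stand-ins for my real schema):

using System.Data.SqlClient;

// One round trip per parsed line: the existence check and the insert
// both run once for every row in the new file.
const string sql = @"
IF NOT EXISTS (SELECT 1 FROM dbo.Records
               WHERE SourceId = @sourceId AND LineNo = @lineNo)
    INSERT INTO dbo.Records (SourceId, LineNo, Payload)
    VALUES (@sourceId, @lineNo, @payload);";

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    foreach (var rec in parsedRecords) // a few thousand parsed lines
    {
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@sourceId", rec.SourceId);
            cmd.Parameters.AddWithValue("@lineNo", rec.LineNo);
            cmd.Parameters.AddWithValue("@payload", rec.Payload);
            cmd.ExecuteNonQuery(); // thousands of rows = thousands of round trips
        }
    }
}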
Is there a faster way to determine whether data exists already or not?
I was thinking of inserting all of the records first anyway using the SqlBulkCopy class, then running a stored procedure to remove the duplicates.
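Something like this is what I had in mind for the first step (again, dbo.Records and newDataTable stand in for my actual table and parsed data):

using System.Data.SqlClient;

// Stream all parsed rows into the big table in one operation,
// then dedupe afterwards with a stored procedure.
using (var bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "dbo.Records";
    bulk.BatchSize = 1000;            // rows per batch sent to the server
    bulk.BulkCopyTimeout = 0;         // don't time out on larger files
    bulk.WriteToServer(newDataTable); // one streamed copy instead of N round trips
}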
Does anyone else have any suggestions or methods to do this as efficiently and quickly as possible? Anything would be appreciated.
EDIT: To clarify, I'd run the stored procedure on the large table after copying the new data into it.
large table = 1,000,000+ rows
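The dedupe step could be a single set-based statement, assuming duplicates are defined by a natural key like the (SourceId, LineNo) pair from my sketch above:

using System.Data.SqlClient;

// Keep one row per natural key and delete the rest -- one pass over the
// table per import instead of one existence check per row.
const string dedupeSql = @"
WITH Ranked AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY SourceId, LineNo
                              ORDER BY (SELECT 0)) AS rn
    FROM dbo.Records
)
DELETE FROM Ranked WHERE rn > 1;";

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(dedupeSql, conn))
{
    conn.Open();
    cmd.CommandTimeout = 0; // a scan over 1,000,000+ rows can take a while
    cmd.ExecuteNonQuery();
}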
1 Answer
#1
1. Create an IDataReader to loop over your source data.
2. Place the values into a strongly-typed DataSet.
3. Every N rows, send the dataset (.GetXml) to a stored procedure. Let's say 1000 for the heck of it.
4. Have the stored procedure shred the XML.
5. Do your INSERT/UPDATE based on the shredded XML.
6. Return from the procedure and keep looping until you're done (rough sketch below).
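A rough cut of the loop, assuming a strongly-typed DataSet called MyDataSet with a Records table and a proc named dbo.ImportXmlBatch (all names made up to match the question's sketch):

using System.Data;
using System.Data.SqlClient;

// Batch N source rows into the typed DataSet, ship the XML, repeat.
void ImportInBatches(IDataReader reader, string connectionString)
{
    const int batchSize = 1000; // the "sweet spot" number -- tune it

    var ds = new MyDataSet();
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        while (reader.Read())
        {
            // AddRecordsRow is what the typed-DataSet designer generates.
            ds.Records.AddRecordsRow(
                reader.GetInt32(0),   // SourceId
                reader.GetInt32(1),   // LineNo
                reader.GetString(2)); // Payload

            if (ds.Records.Count == batchSize)
            {
                SendBatch(conn, ds.GetXml()); // 1000 rows, one round trip
                ds.Records.Clear();
            }
        }
        if (ds.Records.Count > 0)
            SendBatch(conn, ds.GetXml()); // final partial batch
    }
}

void SendBatch(SqlConnection conn, string xml)
{
    using (var cmd = new SqlCommand("dbo.ImportXmlBatch", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@xml", SqlDbType.Xml).Value = xml;
        cmd.ExecuteNonQuery();
    }
}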
Here is an older example: http://granadacoder.wordpress.com/2009/01/27/bulk-insert-example-using-an-idatareader-to-strong-dataset-to-sql-server-xml/
The key is that you are doing "bulk" operations instead of row by row. And you can pick a sweet-spot batch size (1000, for example) that gives you the best performance.
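Inside the proc, steps 4 and 5 are just a shred with nodes() followed by one set-based statement per batch, so the existence check runs once per 1000 rows instead of once per row. Something along these lines, reusing the made-up names from above (shown here only for the insert-if-missing case):

// T-SQL you'd deploy for the hypothetical proc -- shreds the DataSet XML
// and inserts only the rows that don't already exist.
const string createProcSql = @"
CREATE PROCEDURE dbo.ImportXmlBatch (@xml XML)
AS
BEGIN
    INSERT INTO dbo.Records (SourceId, LineNo, Payload)
    SELECT x.r.value('(SourceId)[1]', 'int'),
           x.r.value('(LineNo)[1]',   'int'),
           x.r.value('(Payload)[1]',  'nvarchar(256)')
    FROM @xml.nodes('/MyDataSet/Records') AS x(r)  -- DataSet root / table names
    WHERE NOT EXISTS
          (SELECT 1 FROM dbo.Records t
           WHERE t.SourceId = x.r.value('(SourceId)[1]', 'int')
             AND t.LineNo   = x.r.value('(LineNo)[1]',   'int'));
END";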