What is the best way to populate a very large table using SqlBulkCopy?

Time: 2021-03-28 07:37:06

Nightly, I need to fill a SQL Server 2005 table from an ODBC source with over 8 million records. Currently I am using an INSERT statement from a linked server, with a SELECT similar to this:

Insert Into SQLStagingTable Select * from OpenQuery(ODBCSource, 'Select * from SourceTable')

This is really inefficient and takes hours to run. I'm in the middle of coding a solution using SqlBulkCopy, similar to the code found in this question.

The code in that question first populates a DataTable in memory and then passes that DataTable to SqlBulkCopy's WriteToServer method.

What should I do if the populated DataTable uses more memory than is available on the machine it is running on (a server with 16 GB of memory in my case)?

I've thought about using the overloaded OdbcDataAdapter.Fill method, which allows you to fill only the records from x to n (where x is the start index and n is the number of records to fill). However, that could turn out to be even slower than what I currently have, since it would mean re-running the select statement on the source a number of times.
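(For reference, a rough sketch of that chunked approach is below; the DSN, query, and chunk size are placeholders, not from the original question.)

using System.Data;
using System.Data.Odbc;

// Hypothetical chunked fill. Fill(startRecord, maxRecords, table) skips
// the first startRecord rows client-side, so the source result set is
// effectively re-read from the beginning on every call.
var adapter = new OdbcDataAdapter("Select * from SourceTable",
                                  new OdbcConnection("DSN=ODBCSource"));
const int chunkSize = 100000;
int start = 0;
while (true)
{
    var table = new DataTable();
    int fetched = adapter.Fill(start, chunkSize, table);
    if (fetched == 0) break;
    // ... hand `table` to SqlBulkCopy.WriteToServer(table) here ...
    start += fetched;
}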

What should I do? Just populate the whole thing at once and let the OS manage the memory? Should I populate it in chunks? Is there another solution I haven't thought of?

3 Answers

#1


4  

The easiest way would be to use ExecuteReader() against your ODBC data source and pass the IDataReader to the WriteToServer(IDataReader) overload.

Most data reader implementations keep only a very small portion of the total results in memory.
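A minimal sketch of that streaming approach, assuming a DSN named ODBCSource and the staging table from the question (the connection strings are placeholders):

using System.Data.Odbc;
using System.Data.SqlClient;

using (var source = new OdbcConnection("DSN=ODBCSource"))
using (var destination = new SqlConnection("Server=.;Database=Staging;Integrated Security=true"))
{
    source.Open();
    destination.Open();

    var command = new OdbcCommand("Select * from SourceTable", source);
    using (var reader = command.ExecuteReader())
    using (var bulkCopy = new SqlBulkCopy(destination))
    {
        bulkCopy.DestinationTableName = "SQLStagingTable";
        bulkCopy.BulkCopyTimeout = 0;   // no timeout for a long-running load
        bulkCopy.BatchSize = 10000;     // commit in batches instead of one huge transaction
        // WriteToServer pulls rows from the reader as they stream in,
        // so only a small buffer is ever held in memory.
        bulkCopy.WriteToServer(reader);
    }
}

Because rows flow straight from the ODBC reader into the bulk copy, memory use stays flat no matter how many rows the source returns.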

#2


1  

SSIS performs well and is very tweakable. In my experience, 8 million rows is not out of its league. One of my larger ETLs pulls in 24 million rows a day and does major conversions and dimensional data warehouse manipulations.

#3


0  

If you have indexes on the destination table, you might consider disabling them until the records have been inserted.
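As a rough illustration (the index and table names here are hypothetical), nonclustered indexes could be disabled before the load and rebuilt afterwards; note that disabling a clustered index would make the table itself inaccessible:

using System.Data.SqlClient;

using (var connection = new SqlConnection("Server=.;Database=Staging;Integrated Security=true"))
{
    connection.Open();

    // Disable a nonclustered index before the bulk load.
    new SqlCommand("ALTER INDEX IX_SQLStagingTable ON SQLStagingTable DISABLE", connection)
        .ExecuteNonQuery();

    // ... run the SqlBulkCopy load here ...

    // Rebuild all indexes on the table once the load completes.
    new SqlCommand("ALTER INDEX ALL ON SQLStagingTable REBUILD", connection)
        .ExecuteNonQuery();
}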
