
Time: 2022-03-06 16:59:03

I have a large amount of constantly incoming data (roughly 10,000 items a minute, and growing) that I want to insert into a database as efficiently as possible. At the moment I'm using prepared insert statements, but I am thinking of using the SqlBulkCopy class to import the data in larger chunks.


The problem is that I'm not inserting into a single table - elements of the data item are inserted into numerous tables, and their identity columns are used as foreign keys in other rows that are inserted at the same time. I understand that bulk copies aren't meant to allow for more complex inserts like this, but I wonder if it is worth exchanging my identity columns (bigints in this case) for uniqueidentifier columns. This will allow me to do a couple of bulk copies for each table, and since I can determine the IDs before the insert, I don't need to check for anything like SCOPE_IDENTITY which is preventing me from using bulk copy.


Does this sound like a viable solution, or are there other potential issues I might face? Or, is there another way I can insert data quickly, but retain my use of bigint identity columns?




2 Solutions



It sounds like you are planning on exchanging a "SQL assigns a [bigint identity() column] surrogate key" methodology for a "data prep routine assigns a GUID surrogate key" methodology. In other words, the key will not be assigned within SQL, but from outside SQL. Given your volumes, if the data-generating process can assign the surrogate key, I'd definitely go with that.


The question then becomes: must you use GUIDs, or can your data-generation process produce auto-incrementing integers? Creating such a process that works consistently and infallibly is hard (one reason why you pay $$$ for SQL Server), but the trade-off for smaller and more human-legible keys within the database might be worth it.
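As a rough illustration of assigning integer keys outside the database, here is a minimal Python sketch. It assumes a single writer process and a starting value that has already been reserved for that writer; how that reservation is made is not shown. The names `KeyAllocator` and `assign_keys`, and the `name`/`readings` item layout, are hypothetical and only serve the example.

```python
import itertools


class KeyAllocator:
    """Hands out consecutive integer keys from a pre-reserved starting value.

    Assumption: only one process uses a given allocator, so no two rows
    can receive the same key.
    """

    def __init__(self, start):
        self._counter = itertools.count(start)

    def next_key(self):
        return next(self._counter)


def assign_keys(items, parent_keys, child_keys):
    """Build parent and child rows with keys and foreign keys already set.

    Each item is a dict with a hypothetical layout:
    {"name": str, "readings": [float, ...]}.
    """
    parent_rows, child_rows = [], []
    for item in items:
        parent_id = parent_keys.next_key()
        parent_rows.append((parent_id, item["name"]))
        for value in item["readings"]:
            # The child's foreign key is known before anything is inserted.
            child_rows.append((child_keys.next_key(), parent_id, value))
    return parent_rows, child_rows


items = [
    {"name": "a", "readings": [1.0, 2.0]},
    {"name": "b", "readings": [3.0]},
]
parents, children = assign_keys(items, KeyAllocator(100), KeyAllocator(500))
```

Because every row carries its key and foreign key up front, each list could then be sent to its table in a single bulk operation, with no need to read generated identities back.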




uniqueidentifier will probably make things worse: random values cause page splits, and the keys are wider (16 bytes against 8 for a bigint).


If your load is, or can be, batched, one option is to:


  • load a staging table
  • load the real tables in one go as a stored procedure
  • use a uniqueidentifier in the staging table for each batch

We deal with peaks of around 50k rows per second (and increasing) this way. We actually use a separate staging database to avoid double transaction log writes.
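The staging pattern listed above can be sketched as follows. This is not SQL Server code: it uses Python's built-in sqlite3 module as a stand-in so the example runs anywhere, and a plain function takes the place of the stored procedure. The table and column names (`staging`, `parent`, `child`, `item_no`) are hypothetical, and keeping `batch_id` and `item_no` on the parent table is just one possible way to map staged rows back to their generated keys.

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE staging (batch_id TEXT, item_no INTEGER, name TEXT, reading REAL);
CREATE TABLE parent (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    batch_id TEXT, item_no INTEGER, name TEXT
);
CREATE TABLE child (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    parent_id INTEGER REFERENCES parent(id),
    reading REAL
);
"""


def load_batch(conn, items):
    """Stage one batch, then move it into the real tables with set-based SQL.

    Assumption: every item has at least one reading, otherwise it produces
    no staged row and therefore no parent row.
    """
    batch_id = str(uuid.uuid4())  # one identifier per batch
    staged = [
        (batch_id, item_no, item["name"], value)
        for item_no, item in enumerate(items)
        for value in item["readings"]
    ]
    with conn:  # a single transaction covers the whole batch
        conn.executemany(
            "INSERT INTO staging (batch_id, item_no, name, reading) "
            "VALUES (?, ?, ?, ?)",
            staged,
        )
        # Parents: one row per staged item; the database assigns the key.
        conn.execute(
            "INSERT INTO parent (batch_id, item_no, name) "
            "SELECT DISTINCT batch_id, item_no, name "
            "FROM staging WHERE batch_id = ?",
            (batch_id,),
        )
        # Children: join back on (batch_id, item_no) to pick up that key.
        conn.execute(
            "INSERT INTO child (parent_id, reading) "
            "SELECT p.id, s.reading FROM staging s "
            "JOIN parent p ON p.batch_id = s.batch_id "
            "AND p.item_no = s.item_no "
            "WHERE s.batch_id = ?",
            (batch_id,),
        )
        conn.execute("DELETE FROM staging WHERE batch_id = ?", (batch_id,))
    return batch_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load_batch(conn, [
    {"name": "a", "readings": [1.0, 2.0]},
    {"name": "b", "readings": [3.0]},
])
```

The real tables keep their database-assigned integer keys, and the batch identifier only has to be unique for as long as the batch sits in staging.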




