How to insert quickly into SQL Server 2008

Time: 2021-11-10 16:58:40

I have a project that involves recording data from a device directly into a SQL Server table.

I do very little processing in code before writing to SQL Server (2008 Express, by the way).

Typically I use the SqlHelper class's ExecuteNonQuery method and pass in a stored procedure name and the list of parameters the procedure expects.

This is very convenient, but I need a much faster way of doing this.

Thanks.


7 solutions

#1


39  

ExecuteNonQuery with an INSERT statement, or even a stored procedure, will get you into the thousands-of-inserts-per-second range on Express. 4000-5000/sec is easily achievable; I know this for a fact.

What usually slows down individual updates is the wait time for the log flush, and you need to account for that. The easiest solution is simply to batch commits, e.g. commit every 1000 inserts, or every second. This fills up the log pages and amortizes the cost of the log-flush wait over all the inserts in a transaction.
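The batch-commit pattern can be sketched in T-SQL; the table and column names below are illustrative, not from the question:

```sql
-- Pay the log-flush wait once per batch rather than once per row.
-- dbo.Readings and its columns are hypothetical.
BEGIN TRANSACTION;

INSERT INTO dbo.Readings (DeviceID, ReadingValue) VALUES (1, 10.5);
INSERT INTO dbo.Readings (DeviceID, ReadingValue) VALUES (1, 10.7);
-- ... repeat for up to ~1000 rows, or ~1 second of incoming data ...

COMMIT TRANSACTION;
```

From ADO.NET the same idea means opening a SqlTransaction, running the individual commands on it, and committing once per batch.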

With batch commits you'll probably bottleneck on disk log-write performance, about which there is nothing you can do short of changing the hardware (e.g. a RAID 0 stripe for the log).

If you hit bottlenecks earlier than that (unlikely), you can look into batching statements, i.e. sending a single T-SQL batch with multiple inserts in it. But this seldom pays off.

Of course, you'll need to reduce the size of your writes to a minimum, meaning reduce the width of your table to the minimally needed columns, eliminate non-clustered indexes, and eliminate unneeded constraints. If possible, use a heap instead of a clustered index, since heap inserts are significantly faster than clustered-index ones.
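As a hedged sketch of that advice: a narrow table created with no primary key or clustered index is stored as a heap. The definition below is illustrative only:

```sql
-- A narrow heap for fast inserts: no clustered index, no
-- constraints, only the columns the device actually produces
-- (all names here are hypothetical).
CREATE TABLE dbo.Readings (
    DeviceID     int       NOT NULL,
    ReadingValue float     NOT NULL,
    RecordedAt   datetime2 NOT NULL
);
```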

There is little need to use the fast insert interface (i.e. SqlBulkCopy). Using ordinary INSERTs and ExecuteNonQuery with batch commits, you'll exhaust the drive's sequential write throughput well before you need to deploy bulk insert. Bulk insert is needed on fast SAN-connected machines, and you mention Express, so that's probably not the case here. There is a perception to the contrary out there, but only because people don't realize that bulk insert gives them batch commit, and it's the batch commit that speeds things up, not the bulk insert.

As with any performance test, make sure you eliminate randomness and preallocate the database and the log; you don't want to hit a database or log growth event during test measurements or in production. That is sooo amateurish.

#2


4  

BULK INSERT would be the fastest, since it is minimally logged.

.NET also has the SqlBulkCopy class.

#3


2  

This is typically done by way of a BULK INSERT. Basically, you prepare a file and then issue the BULK INSERT statement, and SQL Server copies all the data from the file to the table with the fastest method possible.
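A minimal BULK INSERT sketch, assuming a comma-delimited staging file; the path, table name, and delimiters are placeholders:

```sql
-- Load a staged flat file in one bulk operation. TABLOCK helps
-- SQL Server qualify for minimal logging on the load.
BULK INSERT dbo.Readings
FROM 'C:\data\readings.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);
```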

It does have some restrictions (for example, there's no way to do "update or insert" type of behaviour if you have possibly-existing rows to update), but if you can get around those, then you're unlikely to find anything much faster.


#4


2  

Things that can slow inserts include indexes and reads or updates (locks) on the same table. You can speed up situations like yours by avoiding both and inserting individual transactions into a separate holding table with no indexes or other activity. Then batch the holding table's rows over to the main table a little less frequently.
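The holding-table drain might look like the following; all object names are hypothetical:

```sql
-- Move accumulated rows from the index-free holding table to the
-- main table in one set-based statement, then clear the holding
-- table, all inside a single transaction.
BEGIN TRANSACTION;

INSERT INTO dbo.MainReadings (DeviceID, ReadingValue, RecordedAt)
SELECT DeviceID, ReadingValue, RecordedAt
FROM dbo.HoldingReadings;

DELETE FROM dbo.HoldingReadings;

COMMIT TRANSACTION;
```

Run this on a timer (say, once a second or once a minute) so the device-facing inserts never contend with the main table's indexes.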

#5


2  

Here is a good way to insert a lot of records using table variables...


...but best to limit it to 1000 records at a time because table variables are "in Memory"


In this example I will insert 2 records into a table with 3 fields: CustID, Firstname, Lastname.

--first create an in-memory table variable with the same structure
--you could also use a temporary table, but it would be slower

DECLARE @MyTblVar table (CustID int, FName nvarchar(50), LName nvarchar(50))

INSERT INTO @MyTblVar VALUES (100, 'Joe', 'Bloggs')
INSERT INTO @MyTblVar VALUES (101, 'Mary', 'Smith')

INSERT INTO MyCustomerTable
SELECT * FROM @MyTblVar

#6


1  

If you mean from .NET, then use SqlBulkCopy.

#7


1  

It can only really go as fast as your SP will run. Ensure that the table(s) are properly indexed and if you have a clustered index, ensure that it has a narrow, unique, increasing key. Ensure that the remaining indexes and constraints (if any) do not have a lot of overhead.


You shouldn't see much overhead in the ADO.NET layer (I wouldn't necessarily use any other .NET library on top of SqlCommand). You may be able to use the ADO.NET async methods to queue several calls to the stored proc without blocking a single thread in your application (this could potentially free up more throughput than anything else, just like having multiple machines inserting into the database).

Other than that, you really need to tell us more about your requirements.

