Running a large query in the background in MS SQL

Time: 2021-08-11 21:41:52

I am using MS SQL Server 2008. I have a table that is constantly in use (data is always changing and being inserted into it); it now contains ~70 million rows. I am trying to run a simple query over the table with a stored procedure that will probably take a few days.

I need the table to remain usable. I executed the stored procedure, and after a while every simple select-by-identity query I try to run against the table stops responding or runs for so long that I break it.
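
When the selects hang like this, one way to see what they are waiting on is to inspect the blocking chain; a minimal sketch using sys.dm_exec_requests (available on SQL Server 2008):

SELECT session_id, blocking_session_id, wait_type, wait_time, command
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;  -- only requests blocked by another session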

What should I do? Here is what my stored procedure looks like:

SET NOCOUNT ON;
UPDATE SOMETABLE
SET [some_col] = dbo.ufn_SomeFunction(CONVERT(NVARCHAR(500), another_column))
WHERE [some_col] = 243

Even if I try it with this added to the WHERE clause (ANDed with the existing condition):

ID_COL > 57000000 AND ID_COL < 60000000 AND

it still doesn't work.

BTW, SomeFunction performs some simple arithmetic and looks up rows in another table that contains about 300k rows but never changes.
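
For illustration only, a function of the shape described might look like the sketch below; the lookup table and its columns (dbo.SomeLookupTable, lookup_key, lookup_value) are hypothetical placeholders, not the real definitions:

CREATE FUNCTION dbo.ufn_SomeFunction (@input NVARCHAR(500))
RETURNS INT
AS
BEGIN
    -- Hypothetical sketch: simple arithmetic plus a row lookup in a small,
    -- static ~300k-row table (all names here are invented placeholders).
    DECLARE @result INT;

    SELECT @result = lookup_value * 2 + 1
    FROM dbo.SomeLookupTable
    WHERE lookup_key = @input;

    RETURN ISNULL(@result, 0);
END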

I would be happy to hear any advice.

1 solution

#1

From my perspective, your server has a serious performance problem. Even if we assume that none of the records touched by the query

select some_col from SOMETABLE with (nolock) where id_col between 57000000 and 57001000

was in memory, it shouldn't take 21 seconds to read those few pages sequentially from disk (your clustered index on id_col should not be fragmented if it's an auto-identity and you didn't do something stupid like adding "desc" to the index definition).
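
To verify that fragmentation assumption, you could check the clustered index directly; a sketch (dbo.SOMETABLE stands in for the real table name):

SELECT index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(
    DB_ID(), OBJECT_ID('dbo.SOMETABLE'), NULL, NULL, 'LIMITED');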

But if you can't or won't fix that, my advice would be to do the update in small batches of, say, 100-1000 records at a time (depending on how much time the lookup function consumes). One update/transaction should take no more than 30 seconds.
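
To find a batch size that keeps one transaction under that limit, you could time a sample batch and extrapolate; a sketch (the ROLLBACK keeps the test free of side effects):

DECLARE @t0 DATETIME = GETDATE();
BEGIN TRANSACTION;
    UPDATE SOMETABLE
        SET [some_col] = dbo.ufn_SomeFunction(CONVERT(NVARCHAR(500), another_column))
        WHERE [some_col] = 243 AND id_col BETWEEN 57000000 AND 57001000;
ROLLBACK;  -- undo the test batch
SELECT DATEDIFF(MILLISECOND, @t0, GETDATE()) AS elapsed_ms;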

You see, each update keeps an exclusive lock on all the records it modified until the transaction is complete. If you don't use an explicit transaction, each statement is executed in its own automatic transaction context, so the locks are released as soon as the update statement is done.
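
In other words, a minimal sketch of the difference:

-- Autocommit: the statement is its own transaction, so the exclusive
-- locks it takes are released the moment the statement completes.
UPDATE SOMETABLE SET [some_col] = 243 WHERE id_col = 42;

-- Explicit transaction: the same locks are held until COMMIT.
BEGIN TRANSACTION;
    UPDATE SOMETABLE SET [some_col] = 243 WHERE id_col = 42;
    -- the exclusive lock on that row is still held here
COMMIT;  -- and released only now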

But you can still run into deadlocks that way, depending on what the other processes do. If they, too, modify more than one record at a time, or even if they acquire and hold read locks on several rows, you can get deadlocks.

To avoid the deadlocks, your update statement needs to take locks on all the records it will modify at once. The way to do this is to place the single update statement (limited to only a few rows via id_col) in a serializable transaction, like:

IF @@TRANCOUNT > 0
BEGIN
    RAISERROR('You are in a transaction context already', 16, 1);
    RETURN;
END

SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

DECLARE @x BIGINT = 0;       -- current start of the id range
DECLARE @batch BIGINT = 500; -- tune so one transaction stays in the small time range
DECLARE @maxId BIGINT = (SELECT MAX(id_col) FROM SOMETABLE);

-- Loop to work @x through the id range
WHILE @x <= @maxId
BEGIN
    BEGIN TRANSACTION;
        UPDATE SOMETABLE
            SET [some_col] = dbo.ufn_SomeFunction(CONVERT(NVARCHAR(500), another_column))
            WHERE [some_col] = 243 AND id_col BETWEEN @x AND @x + @batch;
    COMMIT;
    SET @x = @x + @batch + 1;
END

-- Get all records inserted while the loop was running. If these are
-- too many, you may have to paginate this as well:
BEGIN TRANSACTION;
    UPDATE SOMETABLE
        SET [some_col] = dbo.ufn_SomeFunction(CONVERT(NVARCHAR(500), another_column))
        WHERE [some_col] = 243 AND id_col >= @x;
COMMIT;

For each update this will take an update/exclusive key-range lock on the given records (but only on them, because you limit the update through the clustered index key). It will wait for any other updates on the same records to finish, then acquire its lock (blocking all other transactions, but still only for the given records), then update the records and release the lock.

The last extra statement is important because it takes a key-range lock up to "infinity" and thus prevents even inserts at the end of the range while the update statement runs.
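
If you want to see those key-range locks, hold the last transaction open in one session and inspect sys.dm_tran_locks from a second one; a sketch:

-- Run from another session while the open-ended UPDATE's transaction is open:
SELECT request_mode, request_status, COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE resource_type = 'KEY'
GROUP BY request_mode, request_status;
-- Range modes such as RangeX-X are the key-range locks; the lock on the
-- last key of an open-ended range covers everything up to "infinity".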
