死锁的四个必要条件:
互斥条件(Mutual exclusion):资源不能被共享,只能由一个进程使用。
请求与保持条件(Hold and wait):已经得到资源的进程可以再次申请新的资源。
非剥夺条件(No pre-emption):已经分配的资源不能从相应的进程中被强制地剥夺。
循环等待条件(Circular wait):系统中若干进程组成环路,该环路中每个进程都在等待相邻进程正占用的资源。
仓库拣货卡死,排查了数据库的很多地方,都没有头绪,最后到SQL Server 错误日志里查看,终于发现了蛛丝马迹
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
|
EXEC xp_readerrorlog 0,1, NULL , NULL , '2015-09-21' , '2015-10-10' , 'DESC'
waiter id=process5c30e08 mode=U requestType=wait
waiter-list
owner id=process5c26988 mode=X
owner-list
keylock hobtid=72057597785604096 dbid=33 objectname=stoxxx.dbo.Orderxxx indexname=IX_PricingExpressProductCode_State id=lock17fa96980 mode=X associatedObjectId=72057597785604096
waiter id=process5c26988 mode=U requestType=wait
waiter-list
owner id=process5c30e08 mode=X
owner-list
keylock hobtid=72057597785604096 dbid=33 objectname=stoxxx.dbo.Orderxxx indexname=IX_PricingExpressProductCode_State id=lock87d69e780 mode=X associatedObjectId=72057597785604096
resource-list
(@OperateState money,@HandledByNewWms bit ,@State int ,@OrderOut int )
UPDATE [Orderxx] SET [OperateState] = @OperateState,[HandledByNewWms] = @HandledByNewWms WHERE (([Orderxxx].[State] = @State) And ([Orderxxx].[OrderOut] = @OrderOut) And ([Orderxxx].[PricingExpressProductCode] IN ( 'UKNIR' )))
inputbuf
unknown
frame procname=unknown line=1 sqlhandle=0x000000000000000000000000000000000000000000000000
UPDATE [Orderxxx] SET [OperateState] = @OperateState,[HandledByNewWms] = @HandledByNewWms WHERE (([Orderxxx].[State] = @State) And ([Orderxxx].[OrderOut] = @OrderOut) And ([Orderxxx].[PricingExpressProductCode] IN ( 'UKNIR' )))
frame procname=adhoc line=1 stmtstart=134 sqlhandle=0x020000009d376d18a17e7ea51289d8caa2fb4de65c976389
executionStack
process id=process5c30e08 taskpriority=0 logused=10320 waitresource= KEY : 33:72057597785604096 (112399c2054a) waittime=4813 ownerId=31578743038 transactionname=user_transaction lasttranstarted=2015-09-24T10:22:58.410 XDES=0x372e95950 lockMode=U schedulerid=17 kpid=8496 status=suspended spid=153 sbid=0 ecid=0 priority=0 trancount=2 lastbatchstarted=2015-09-24T10:22:58.540 lastbatchcompleted=2015-09-24T10:22:58.540 clientapp=.Net SqlClient Data Provider hostname=CK1-WIN-WEB02 hostpid=37992 loginname=ck1.biz isolationlevel= read committed (2) xactid=31578743038 currentdb=33 lockTimeout=4294967295 clientoption1=671088672 clientoption2=128056
(@OperateState money,@HandledByNewWms bit ,@State int ,@OrderOut int ) UPDATE [Orderxxx] SET [OperateState] = @OperateState,[HandledByNewWms] = @HandledByNewWms WHERE (([Orderxxx].[State] = @State) And ([Orderxxx].[OrderOut] = @OrderOut) And ([Orderxxx].[PricingExpressProductCode] IN ( 'UKNIR' )))
inputbuf
unknown
frame procname=unknown line=1 sqlhandle=0x000000000000000000000000000000000000000000000000
UPDATE [Orderxxx] SET [OperateState] = @OperateState,[HandledByNewWms] = @HandledByNewWms WHERE (([Orderxxx].[State] = @State) And ([Orderxxx].[OrderOut] = @OrderOut) And ([Orderxxx].[PricingExpressProductCode] IN ( 'UKNIR' )))
frame procname=adhoc line=1 stmtstart=134 sqlhandle=0x020000009d376d18a17e7ea51289d8caa2fb4de65c976389
executionStack
process id=process5c26988 taskpriority=0 logused=9892 waitresource= KEY : 33:72057597785604096 (70f5b089bb2b) waittime=4813 ownerId=31579268946 transactionname=user_transaction lasttranstarted=2015-09-24T10:27:01.357 XDES=0x98312f950 lockMode=U schedulerid=16 kpid=9184 status=suspended spid=454 sbid=0 ecid=0 priority=0 trancount=2 lastbatchstarted=2015-09-24T10:27:01.490 lastbatchcompleted=2015-09-24T10:27:01.487 clientapp=.Net SqlClient Data Provider hostname=CK1-WIN-WEB02 hostpid=37992 loginname=ck1.biz isolationlevel= read committed (2) xactid=31579268946 currentdb=33 lockTimeout=4294967295 clientoption1=671088672 clientoption2=128056
process-list
deadlock victim=process5c26988
deadlock-list
|
咋一看上面的错误信息,可以发现两条相同的语句造成的死锁,但是这么短的语句不可能持有排他锁太久
再仔细分析一下错误日志,发现都死锁在同一个非聚集索引上,再问了一下开发,开发那边说,这条语句是在一个大事务里面,这个事务会做7、8件事
索引属性
还有索引里面的数据,发现很多重复值
SQL语句是这样的
1
2
3
4
|
(@OperateState money,@HandledByNewWms bit ,@State int ,@OrderOut int )
@HandledByNewWms=(1) @OperateState=($1.0000) @OrderOut=(4055484) @State=(3)
UPDATE [Orderxxx] SET [OperateState] = $1.0000,[HandledByNewWms] = 1
WHERE (([Orderxxx].[State] = 3) And ([Orderxxx].[OrderOut] = 4055484) And ([Orderxxx].[PricingExpressProductCode] IN ( 'UKRRM' , 'UKRLE' )))
|
下图为语句生成的执行计划
当时的情况是大量SQL语句被阻塞,而阻塞的语句正是下面这条语句
1
2
|
UPDATE [Orderxxx] SET [OperateState] = $1.0000,[HandledByNewWms] = 1
WHERE (([Orderxxx].[State] = 3) And ([Orderxxx].[OrderOut] = 4055484) And ([Orderxxx].[PricingExpressProductCode] IN ( 'UKRRM' , 'UKRLE' )))
|
解决方法
上面得出几个症状
1、update语句是在一个大事务里面,事务太大导致其他session等待排他锁的时间变长
2、大家都在使用同一个非聚集索引,并扫描PricingExpressProductCode字段
3、索引里的重复值很多
从上面的症状基本可以判断,这个非聚集索引无啥用,可以禁用之
1
|
ALTER INDEX [IX_PricingExpressProductCode_State] ON [dbo].[Orderxxx] DISABLE
|
禁用之后,死锁消失,问题解决,仓库的怨气也随之消失
这一次排查过程时间有点长,但是很好定位,SQL Server错误日志给出了足够的信息定位死锁问题,所以遇到问题的时候一定要分析清楚日志