WaitType:CXPACKET

时间:2020-12-23 04:05:27

CXPACKET 等待类型是SQL Server 并发执行一个query时产生的。在run一个big query时,SQL Server充分利用系统的所有资源(CPU,Memory,IO),在最短时间内返回结果。在拥有多个Processor的系统中,如果将一个Query的Wordload 分布在多个Processor上,那么每个Processor只处理一部分数据,这将成倍的减少查询的时间消耗,相应地,成倍的提高查询性能。

在执行Query时,SQL Server 估计其时间 Cost,如果大于 Parallelism Threshold Value,那么SQL Server使用多个Thread(每个CPU上在同一时间只能运行一个Thread),以并发方式执行query;如果低于Parallelism Threshold Value,那么SQL Server 使用单个Thread执行query。这个 Threshold Value 是由 cost threshold for parallelism Option 决定,引用《cost threshold for parallelism Option》:

Use the cost threshold for parallelism option to specify the threshold at which Microsoft SQL Server creates and runs parallel plans for queries. SQL Server creates and runs a parallel plan for a query only when the estimated cost to run a serial plan for the same query is higher than the value set in cost threshold for parallelism. The cost refers to an estimated elapsed time in seconds required to run the serial plan on a specific hardware configuration.

Longer queries usually benefit from parallel plans; the performance advantage negates the additional time required to initialize, synchronize, and terminate parallel plans. The cost threshold for parallelism option is actively used when a mix of short and longer queries is run. The short queries run serial plans, whereas the longer queries use parallel plans. The value of cost threshold for parallelism determines which queries are considered short, and they should therefore be run using serial plans.

In certain cases, a parallel plan may be chosen even though the query's cost plan is less than the current cost threshold for parallelism value. This can happen because the decision to use a parallel or serial plan is based on a cost estimate provided before the full optimization is complete.

The cost threshold for parallelism option can be set to any value from 0 through 32767. The default value is 5.

不是所有的query都适合并发执行,并发执行有一定的overhead,如果一个query耗时十分小,SQL Server 创建并发执行结构体的Time Cost都会比执行整个查询的Time Cost高。Cost threshold for parallelism option的默认值是5。

如果SQL Server 以并发方式执行Single Query,那么SQL Server使用多少个Thread来并发执行该Query?这个数值是由 max degree of parallelism Option决定的,引用《max degree of parallelism Option》:

When SQL Server runs on a computer with more than one microprocessor or CPU, it detects the best degree of parallelism, that is, the number of processors employed to run a single statement, for each parallel plan execution. You can use the max degree of parallelism option to limit the number of processors to use in parallel plan execution. To enable the server to determine the maximum degree of parallelism, set this option to 0, the default value. Setting maximum degree of parallelism to 0 allows SQL Server to use all the available processors up to 64 processors. To suppress parallel plan generation, set max degree of parallelism to 1. Set the value to a number greater than 1 to restrict the maximum number of processors used by a single query execution. The maximum value for thedegree of parallelism setting is controlled by the edition of SQL Server, CPU type, and operating system. If a value greater than the number of available processors is specified, the actual number of available processors is used. If the computer has only one processor, the max degree of parallelism value is ignored.

sp_configure ;
GO
reconfigure;
GO
sp_configure ;
GO
reconfigure;
GO

sp_configure ;
GO
RECONFIGURE WITH OVERRIDE;
GO
sp_configure ;
GO
RECONFIGURE WITH OVERRIDE;
GO

在并发执行的Thread中,包括Master Thread,假如MDP=6,那么同时有7个Thread来执行Single Query,其中一个Thread是Master Thread,其他被称作Child Thread,Master Thread负责分配任务和将结果combine在一起,Child Thread负责执行特定的任务。由于Child Thread 被分配的Workload不平均,执行的速度不相同,因此,当一些Child Thread完成任务之后,需要等待未完成的Child Thread。这种在并发执行内部,一些Child Thread需要等待其他Child Thread的Wait type就是:CXPACKET。

CXPACKET indicates that multiple CPUs are working in parallel, dividing up the query in smaller pieces.

推荐阅读:《Wait statistics, or please tell me where it hurts

CXPACKET indicates parallelism, not necessarily that there’s a problem. The coordinator thread in a parallel query always accumulates these waits. If the parallel threads are not given equal amounts of work to do, or one thread blocks, the waiting threads will also accumulate CXPACKET waits, which will make them aggregate a lot faster – this is a problem. One thread may have a lot more to do than the others, and so the whole query is blocked while the long-running thread completes. If this is combined with a high number of PAGEIOLATCH_XX waits, it could be large parallel table scans going on because of incorrect non-clustered indexes, or a bad query plan.

If neither of these are the issue, you might want to try setting MAXDOP to 4, 2, or 1 for the offending queries (or possibly the whole instance). Make sure that if you have a NUMA system that you try setting MAXDOP to the number of cores in a single NUMA node first to see if that helps the problem. You also need to consider the MAXDOP effect on a mixed-load system. To be honest, I’d play with the cost threshold for parallelism setting (bump it up to, say, 25) before reducing the MAXDOP of the whole instance. And don’t forget Resource Governor in Enterprise Edition of  SQL Server 2008 onward that allows DOP governing for a particular group of connections to the server.

引用:Knee-Jerk Wait Statistics : CXPACKET

What does the CXPACKET wait type mean?

The simplest explanation is that CXPACKET means you’ve got queries running in parallel and you will *always* see CXPACKET waits for a parallel query. CXPACKET waits do NOT mean you have problematic parallelism – you need to dig deeper to determine that.

You can see the various threads in a parallel operator that are waiting for a resource using the sys.dm_os_waiting_tasks DMV

select *
from sys.dm_os_waiting_tasks

WaitType:CXPACKET

Script2

SELECT
    [owt].[session_id],
    [owt].[exec_context_id],
    [ot].[scheduler_id],
    [owt].[wait_duration_ms],
    [owt].[wait_type],
    [owt].[blocking_session_id],
    [owt].[resource_description],
    CASE [owt].[wait_type]
        WHEN N'CXPACKET' THEN
            RIGHT ([owt].[resource_description],
                )
        ELSE NULL
    END AS [Node ID],
    --[es].[program_name],
    --[est].text,
    [er].[database_id],
    --[eqp].[query_plan],
    [er].[cpu_time]
FROM sys.dm_os_waiting_tasks [owt]
INNER JOIN sys.dm_os_tasks [ot]
    ON [owt].[waiting_task_address] = [ot].[task_address]
INNER JOIN sys.dm_exec_sessions [es]
    ON [owt].[session_id] = [es].[session_id]
INNER JOIN sys.dm_exec_requests [er]
    ON [es].[session_id] = [er].[session_id]
OUTER APPLY sys.dm_exec_sql_text ([er].[sql_handle]) [est]
--OUTER APPLY sys.dm_exec_query_plan ([er].[plan_handle]) [eqp]

ORDER BY [owt].[session_id],
        [owt].[exec_context_id];
GO

WaitType:CXPACKET

There is always a control thread for any parallel operator, which by historical accident is always thread ID 0. The control thread always registers a CXPACKET wait, with the duration equal to the length of time the operator takes to execute each time it is run by the plan.

The only time non-control threads will register CXPACKET waits is if they complete before the other threads in the operator. This can happen if one of the threads gets stuck waiting for a resource for a long time, so look to see what the wait type is of the thread not showing CXPACKET (using my script above) and troubleshoot appropriately. This can also happen because of a skewed work distribution among the threads, and I’ll go into more depth on that case in my next post here (it’s caused by out-of-date statistics and other cardinality estimation problems).

Unexpected parallelism?

Given that CXPACKET simply means you’ve got parallelism happening, the first thing to look at is whether you expect parallelism for the query that’s using it. My query will give you the query plan node ID where the parallelism is happening (it pulls out the node ID from the XML query plan if the wait type of the thread is CXPACKET) so look for that node ID and determine whether the parallelism makes sense.

One of the common cases of unexpected parallelism is when a table scan happens where you’re expecting a smaller index seek or scan. You’ll either see this in the query plan or you’ll see lots of PAGEIOLATCH_SH waits (discussed in detail here) along with the CXPACKET waits (a classic wait statistics pattern to look out for). There are a variety of causes of unexpected table scans, including:

  • Missing nonclustered index so a table scan is the only alternative
  • Out-of-date statistics so the Query Optimizer thinks a table scan is the best data access method to use
  • An implicit conversion, because of a data type mismatch between a table column and a variable or parameter, which means a nonclustered index cannot be used
  • Arithmetic being performed on a table column instead of a variable or parameter, which means a nonclustered index cannot be used

参考Doc:

Wait statistics, or please tell me where it hurts

What is the CXPACKET Wait Type, and How Do You Reduce It?