Pros and cons of using cursors (in SQL Server)

Time: 2021-06-30 08:40:56

I asked a question here, "Using cursor in OLTP databases (SQL server)", where people responded saying cursors should never be used.

I feel cursors are very powerful tools that are meant to be used (I don't think Microsoft supports cursors for bad developers). Suppose you have a table where the value of a column in a row depends on the value of the same column in the previous row. If it is a one-time back-end process, don't you think using a cursor would be an acceptable choice?

Off the top of my head I can think of a couple of scenarios where I feel there should be no shame in using cursors. Please let me know if you guys feel otherwise.

1> A one-time back-end process to clean bad data that completes within a few minutes.
2> Batch processes that run only once in a long while (something like once a year).

If, in the above scenarios, there is no visible strain on other processes, wouldn't it be unreasonable to spend extra hours writing code to avoid cursors? In other words, in certain cases the developer's time is more important than the performance of a process that has almost no impact on anything else.

In my opinion, these are some scenarios where you should seriously try to avoid using a cursor:
1> A stored procedure called from a website, which can get called very often.
2> A SQL job that runs multiple times a day and consumes a lot of resources.

I think it's very superficial to make a general statement like "cursors should never be used" without analyzing the task at hand and actually weighing it against the alternatives.

Please let me know your thoughts.

5 Answers

#1


11  

There are several scenarios where cursors actually perform better than set-based equivalents. Running totals is the one that always comes to mind - look for Itzik's words on that (and ignore any that involve SQL Server 2012, which adds new windowing functions that give cursors a run for their money in this situation).
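
As a rough sketch of the SQL Server 2012 windowing approach mentioned above, a running total can be written without a cursor. The temporary table #Sales and its columns are hypothetical, made up only for this illustration, and whether this beats a cursor on your data is something to measure rather than assume:

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE #Sales
(
    SaleID int            NOT NULL PRIMARY KEY,
    Amount decimal(10, 2) NOT NULL
);

INSERT #Sales (SaleID, Amount)
VALUES (1, 10.00), (2, 25.50), (3, 4.50);

-- Requires SQL Server 2012 or later.
-- The ROWS frame is spelled out because the default frame when ORDER BY
-- is present is RANGE, which is generally reported to be slower.
SELECT SaleID,
       Amount,
       SUM(Amount) OVER (ORDER BY SaleID
                         ROWS UNBOUNDED PRECEDING) AS RunningTotal
FROM #Sales
ORDER BY SaleID;

DROP TABLE #Sales;
```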

One of the big problems people have with cursors is that they perform slowly, they use temporary storage, etc. This is partially because the default syntax is a global cursor with all kinds of inefficient default options. The next time you're doing something with a cursor that doesn't need to do things like UPDATE...WHERE CURRENT OF (which I've been able to avoid my entire career), give it a fair shake by comparing these two syntax options:

DECLARE c CURSOR 
    FOR <SELECT QUERY>;

DECLARE c CURSOR 
    LOCAL STATIC READ_ONLY FORWARD_ONLY
    FOR <SELECT QUERY>;

In fact, the first version is the cause of a bug in the undocumented stored procedure sp_MSforeachdb, which makes it skip databases if the status of any database changes during execution. I subsequently wrote my own version of the stored procedure (see here and here), which both fixed the bug (simply by using the latter version of the syntax above) and added several parameters to control which databases would be chosen.
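
For illustration only, a minimal loop over databases using the second form of the declaration might look like the following. This is a sketch, not the replacement procedure mentioned above; the PRINT statement stands in for whatever per-database work is needed, and the filter on online databases is an assumption.

```sql
DECLARE @name sysname;

-- Same options as the second declaration above, written in the order
-- shown in the DECLARE CURSOR documentation.
DECLARE c CURSOR
    LOCAL FORWARD_ONLY STATIC READ_ONLY
    FOR SELECT name
        FROM sys.databases
        WHERE state = 0; -- 0 = ONLINE

OPEN c;
FETCH NEXT FROM c INTO @name;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Placeholder for the per-database work.
    PRINT N'Processing database: ' + @name;

    FETCH NEXT FROM c INTO @name;
END

CLOSE c;
DEALLOCATE c;
```

Because a STATIC cursor works from a copy of the result set taken when the cursor is opened, changes to the underlying rows during the loop do not alter the list being iterated.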

A lot of people think that a methodology is not a cursor because it doesn't say DECLARE CURSOR. I've seen people argue that a while loop is faster than a cursor (which I hope I've dispelled here) or that using FOR XML PATH to perform group concatenation is not performing a hidden cursor operation. Looking at the plan in a lot of cases will show the truth.

In a lot of cases cursors are used where set-based is more appropriate. But there are plenty of valid use cases where a set-based equivalent is much more complicated to write, much harder for the optimizer to generate a plan for, both, or simply not possible (e.g. maintenance tasks where you're looping through tables to update statistics, calling a stored procedure for each value in a result, etc.). The same is true for a lot of big multi-table queries where the plan gets too monstrous for the optimizer to handle. In these cases it can be better to dump some of the intermediate results into a temporary structure first. The same goes for some set-based equivalents to cursors (like running totals). I've also written about the opposite case, where people almost always think instinctively to use a while loop / cursor and there are clever set-based alternatives that are much better.

UPDATE 2013-07-25

Just wanted to add some additional blog posts I've written about cursors, which options you should be using if you do have to use them, and using set-based queries instead of loops to generate sets:

Best Approaches for Running Totals - Updated for SQL Server 2012

What impact can different cursor options have?

Generate a Set or Sequence Without Loops: [Part 1] [Part 2] [Part 3]

#2


6  

The issue with cursors in SQL Server is that the engine is set-based internally, unlike other DBMSs such as Oracle, which are cursor-based internally. This means that when you create a cursor in SQL Server, temporary storage needs to be created and the set-based result set needs to be copied over to the temporary cursor storage. You can see why this would be expensive right off the bat, not to mention any row-by-row processing that you might be doing on top of the cursor itself. The bottom line is that set-based processing is more efficient, and often your cursor-based operation can be done better using a CTE or temp table.

That being said, there are cases where a cursor is probably acceptable, as you said for one-off operations. The most common use I can think of is in a maintenance plan where you may be iterating through all the databases on a server executing various maintenance tasks. As long as you limit your usage and don't design whole applications around RBAR (row-by-agonizing-row) processing, you should be fine.

#3


3  

In general, cursors are a bad thing. However, in some cases it is more practical to use a cursor, and in some it is even faster to use one. A good example is a cursor through a contact table sending emails based on some criteria. (Not to open up the question of whether sending an email from your DBMS is a good idea; let's just assume it is for the problem at hand.) There is no way to write that set-based. You could use some trickery to come up with a set-based solution to generate dynamic SQL, but a real set-based solution does not exist.
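
A sketch of that email scenario, under several assumptions: the table dbo.Contacts and its columns Email and IsSubscribed are hypothetical, Database Mail is assumed to be configured with a default profile, and the subject and body are placeholders. It shows the "call a stored procedure once per row" shape rather than a recommended design.

```sql
DECLARE @email varchar(320);

DECLARE contact_cursor CURSOR
    LOCAL FORWARD_ONLY STATIC READ_ONLY
    FOR SELECT Email
        FROM dbo.Contacts          -- hypothetical table
        WHERE IsSubscribed = 1;    -- hypothetical criterion

OPEN contact_cursor;
FETCH NEXT FROM contact_cursor INTO @email;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- One call queues one message, so a separate message per contact
    -- means one call per row.
    EXEC msdb.dbo.sp_send_dbmail
        @recipients = @email,
        @subject    = N'Account notice',
        @body       = N'This is a placeholder message.';

    FETCH NEXT FROM contact_cursor INTO @email;
END

CLOSE contact_cursor;
DEALLOCATE contact_cursor;
```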

However, a calculation involving the previous row can be done using a self join. That is usually still faster than a cursor.
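
As a minimal sketch of the self-join idea, assuming a hypothetical table #Readings whose rows are ordered by ReadingID:

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE #Readings
(
    ReadingID int NOT NULL PRIMARY KEY,
    Reading   int NOT NULL
);

INSERT #Readings (ReadingID, Reading)
VALUES (1, 100), (2, 130), (5, 125);

-- Number the rows so that gaps in ReadingID do not matter, then join
-- each row to the one immediately before it.
WITH Ordered AS
(
    SELECT ReadingID,
           Reading,
           ROW_NUMBER() OVER (ORDER BY ReadingID) AS rn
    FROM #Readings
)
SELECT cur.ReadingID,
       cur.Reading,
       prev.Reading               AS PreviousReading,
       cur.Reading - prev.Reading AS Delta
FROM Ordered AS cur
LEFT JOIN Ordered AS prev
    ON prev.rn = cur.rn - 1
ORDER BY cur.ReadingID;

DROP TABLE #Readings;
```

The first row has no predecessor, so its PreviousReading and Delta come back as NULL. This reads the previous row's stored value; a chain in which each newly computed value feeds the next row is a different problem. On SQL Server 2012 and later, LAG can express the same lookup without the join.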

In all cases you need to balance the effort involved in developing a faster solution. If nobody cares whether your process runs in one minute or one hour, use what gets the job done quickest. If you are looping through a dataset that grows over time, like an [orders] table, try to stay away from a cursor if possible. If you are not sure, do a performance test comparing a cursor-based solution with a set-based solution on several significantly different data sizes.

#4


0  

They are necessary for things like dynamic SQL pivoting, but you should try and avoid using them whenever possible.

#5


0  

I had always disliked cursors because of their slow performance. However, I found I didn't fully understand the different types of cursors and that in certain instances, cursors are a viable solution.

When you have a business problem that can only be solved by processing one row at a time, then a cursor is appropriate.

So to improve performance with a cursor, change the type of cursor you are using. Something I didn't know was that if you don't specify which type of cursor you are declaring, you get the Dynamic Optimistic type by default, which is the slowest because it does a lot of work under the hood. However, by declaring your cursor as a different type, say a static cursor, you can get very good performance.
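
One way to check which type you actually got is to open the cursor and look at it through the sys.dm_exec_cursors dynamic management function. This is a sketch under assumptions: the cursor names are made up, the whole script must run as a single batch, and querying the function typically requires VIEW SERVER STATE permission. The reported type can also differ from what was requested, because SQL Server may implicitly convert cursor types for some queries.

```sql
-- No options specified: whatever the defaults are.
DECLARE default_cursor CURSOR
    FOR SELECT name FROM sys.databases;

-- Options specified explicitly.
DECLARE static_cursor CURSOR
    LOCAL FORWARD_ONLY STATIC READ_ONLY
    FOR SELECT name FROM sys.databases;

OPEN default_cursor;
OPEN static_cursor;

-- The properties column summarizes each cursor's type, concurrency,
-- and scope for the current session.
SELECT name, properties
FROM sys.dm_exec_cursors(@@SPID);

CLOSE default_cursor;
DEALLOCATE default_cursor;
CLOSE static_cursor;
DEALLOCATE static_cursor;
```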

See these articles for a fuller explanation:

The Truth About Cursors: Part I

The Truth About Cursors: Part II

The Truth About Cursors: Part III

I think the biggest con against cursors is performance; however, not laying out a task in a set-based approach would probably rank second. Third would be the readability and layout of the tasks, as they usually don't have a lot of helpful comments.

SQL Server is optimized to run the set-based approach. You write the query to return a result set of data, like a join on tables for example, but the SQL Server query optimizer determines which join to use: Merge Join, Nested Loop Join, or Hash Join. SQL Server determines the best possible joining algorithm based upon the participating columns, data volume, indexing structure, and the set of values in the participating columns. So a set-based approach generally performs better than the procedural cursor approach.
