Speeding up a SQL query over billions of rows

Time: 2021-08-05 02:45:01

I started a stored procedure 17 days ago and it still has not finished. I know the query is not optimal, but I don't see how to speed it up, because I need to analyze every combination of rows across all the tables. I'm using SQL Server 2012.

This is the code of the stored procedure:


USE [DB]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[Calc] 
AS
BEGIN
DECLARE @statA int,@statB int,@statC int,@statD int,@statF int;

DECLARE @statA_Element1 int,@statB_Element1 int,@statC_Element1 int,@statD_Element1 int,@statF_Element1 int,@descriptionElement1 varchar(50);
DECLARE @statA_Element2 int,@statB_Element2 int,@statC_Element2 int,@statD_Element2 int,@statF_Element2 int,@descriptionElement2 varchar(50);
DECLARE @statA_Element3 int,@statB_Element3 int,@statC_Element3 int,@statD_Element3 int,@statF_Element3 int,@descriptionElement3 varchar(50);
DECLARE @statA_Element4 int,@statB_Element4 int,@statC_Element4 int,@statD_Element4 int,@statF_Element4 int,@descriptionElement4 varchar(50);
DECLARE @statA_Element5 int,@statB_Element5 int,@statC_Element5 int,@statD_Element5 int,@statF_Element5 int,@descriptionElement5 varchar(50);
DECLARE @statA_Element6 int,@statB_Element6 int,@statC_Element6 int,@statD_Element6 int,@statF_Element6 int,@descriptionElement6 varchar(50);

DECLARE @statA_Element7 int,@statB_Element7 int,@statC_Element7 int,@statD_Element7 int,@statF_Element7 int,@descriptionElement7 varchar(50);
DECLARE @statA_Element8 int,@statB_Element8 int,@statC_Element8 int,@statD_Element8 int,@statF_Element8 int,@descriptionElement8 varchar(50);
DECLARE @statA_Element9 int,@statB_Element9 int,@statC_Element9 int,@statD_Element9 int,@statF_Element9 int,@descriptionElement9 varchar(50);
DECLARE @statA_Element10 int,@statB_Element10 int,@statC_Element10 int,@statD_Element10 int,@statF_Element10 int,@descriptionElement10 varchar(50);
DECLARE @statA_Element11 int,@statB_Element11 int,@statC_Element11 int,@statD_Element11 int,@statF_Element11 int,@descriptionElement11 varchar(50);
DECLARE @statA_Element12 int,@statB_Element12 int,@statC_Element12 int,@statD_Element12 int,@statF_Element12 int,@descriptionElement12 varchar(50);

DECLARE element_cursor CURSOR FOR 
select  e1.statA,e1.statB,e1.statC,e1.statD,e1.statF,e1.Description,
        e2.statA,e2.statB,e2.statC,e2.statD,e2.statF,e2.Description,
        e3.statA,e3.statB,e3.statC,e3.statD,e3.statF,e3.Description,
        e4.statA,e4.statB,e4.statC,e4.statD,e4.statF,e4.Description,
        e5.statA,e5.statB,e5.statC,e5.statD,e5.statF,e5.Description,
        e6.statA,e6.statB,e6.statC,e6.statD,e6.statF,e6.Description,
        e7.statA,e7.statB,e7.statC,e7.statD,e7.statF,e7.Description,
        e8.statA,e8.statB,e8.statC,e8.statD,e8.statF,e8.Description,
        e9.statA,e9.statB,e9.statC,e9.statD,e9.statF,e9.Description,
        e10.statA,e10.statB,e10.statC,e10.statD,e10.statF,e10.Description,
        e11.statA,e11.statB,e11.statC,e11.statD,e11.statF,e11.Description,
        e12.statA,e12.statB,e12.statC,e12.statD,e12.statF,e12.Description
                from Element1 e1
              ,Element2 e2
              ,Element3 e3
              ,Element4 e4
              ,Element5 e5
              ,Element6 e6
              ,Element7 e7
              ,Element8 e8
              ,Element9 e9
              ,Element10 e10
              ,Element11 e11
              ,Element12 e12;
truncate table resultado;
OPEN element_cursor

FETCH NEXT FROM element_cursor 
INTO @statA_Element1,@statB_Element1,@statC_Element1,@statD_Element1,@statF_Element1,@descriptionElement1,
     @statA_Element2,@statB_Element2,@statC_Element2,@statD_Element2,@statF_Element2,@descriptionElement2,
     @statA_Element3,@statB_Element3,@statC_Element3,@statD_Element3,@statF_Element3,@descriptionElement3,
     @statA_Element4,@statB_Element4,@statC_Element4,@statD_Element4,@statF_Element4,@descriptionElement4,
     @statA_Element5,@statB_Element5,@statC_Element5,@statD_Element5,@statF_Element5,@descriptionElement5,
     @statA_Element6,@statB_Element6,@statC_Element6,@statD_Element6,@statF_Element6,@descriptionElement6,
     @statA_Element7 ,@statB_Element7 ,@statC_Element7 ,@statD_Element7 ,@statF_Element7 ,@descriptionElement7, 
     @statA_Element8 ,@statB_Element8 ,@statC_Element8 ,@statD_Element8 ,@statF_Element8 ,@descriptionElement8, 
     @statA_Element9 ,@statB_Element9 ,@statC_Element9 ,@statD_Element9 ,@statF_Element9 ,@descriptionElement9, 
     @statA_Element10 ,@statB_Element10 ,@statC_Element10 ,@statD_Element10 ,@statF_Element10 ,@descriptionElement10, 
     @statA_Element11 ,@statB_Element11 ,@statC_Element11 ,@statD_Element11 ,@statF_Element11 ,@descriptionElement11, 
     @statA_Element12 ,@statB_Element12 ,@statC_Element12 ,@statD_Element12 ,@statF_Element12 ,@descriptionElement12 

WHILE @@FETCH_STATUS = 0
BEGIN

    set @statA= @statA_Element1+ @statA_Element2+ @statA_Element3+ @statA_Element4+ @statA_Element5+ @statA_Element6+@statA_Element7+@statA_Element11+@statA_Element8+@statA_Element9+@statA_Element10+@statA_Element12;
    set @statB= @statB_Element1+ @statB_Element2+ @statB_Element3+ @statB_Element4+ @statB_Element5+ @statB_Element6+@statB_Element7+@statB_Element8+@statB_Element9+@statB_Element10+@statB_Element11+@statB_Element12;
    set @statC= @statC_Element1+ @statC_Element2+ @statC_Element3+ @statC_Element4+ @statC_Element5+ @statC_Element6+@statC_Element7+@statC_Element8+@statC_Element9+@statC_Element10+@statC_Element11+@statC_Element12;
    set @statD= @statD_Element1+ @statD_Element2+ @statD_Element3+ @statD_Element4+ @statD_Element5+ @statD_Element6+@statD_Element7+@statD_Element8+@statD_Element9+@statD_Element10+@statD_Element11+@statD_Element12;
    set @statF = @statF_Element1+ @statF_Element2+ @statF_Element3+ @statF_Element4+ @statF_Element5+ @statF_Element6+@statF_Element7+@statF_Element8+@statF_Element9+@statF_Element10+@statF_Element11+@statF_Element12;

    if(@statC>=2000)
    begin
        insert into res values(@statA, @statB, @statC, @statD, @statF, @descriptionElement2, @descriptionElement3, @descriptionElement1, @descriptionElement5, @descriptionElement6, @descriptionElement4,@descriptionElement7,@descriptionElement8,@descriptionElement9,@descriptionElement10,@descriptionElement11,@descriptionElement12  );
    end
        -- Fetch the next combination.
    FETCH NEXT FROM element_cursor 
INTO @statA_Element1,@statB_Element1,@statC_Element1,@statD_Element1,@statF_Element1,@descriptionElement1,
     @statA_Element2,@statB_Element2,@statC_Element2,@statD_Element2,@statF_Element2,@descriptionElement2,
     @statA_Element3,@statB_Element3,@statC_Element3,@statD_Element3,@statF_Element3,@descriptionElement3,
     @statA_Element4,@statB_Element4,@statC_Element4,@statD_Element4,@statF_Element4,@descriptionElement4,
     @statA_Element5,@statB_Element5,@statC_Element5,@statD_Element5,@statF_Element5,@descriptionElement5,
     @statA_Element6,@statB_Element6,@statC_Element6,@statD_Element6,@statF_Element6,@descriptionElement6,
     @statA_Element7 ,@statB_Element7 ,@statC_Element7 ,@statD_Element7 ,@statF_Element7 ,@descriptionElement7, 
     @statA_Element8 ,@statB_Element8 ,@statC_Element8 ,@statD_Element8 ,@statF_Element8 ,@descriptionElement8, 
     @statA_Element9 ,@statB_Element9 ,@statC_Element9 ,@statD_Element9 ,@statF_Element9 ,@descriptionElement9, 
     @statA_Element10 ,@statB_Element10 ,@statC_Element10 ,@statD_Element10 ,@statF_Element10 ,@descriptionElement10, 
     @statA_Element11 ,@statB_Element11 ,@statC_Element11 ,@statD_Element11 ,@statF_Element11 ,@descriptionElement11, 
     @statA_Element12 ,@statB_Element12 ,@statC_Element12 ,@statD_Element12 ,@statF_Element12 ,@descriptionElement12
END 
CLOSE element_cursor;
DEALLOCATE element_cursor;
END

Almost every table has 9-11 rows. What can I do to improve this query? Many thanks!!!
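
For a sense of scale, the arithmetic behind the row count can be sketched as follows. The per-table row counts (9, 10 and 11) come from the statement above; the throughput figure `ROWS_PER_SECOND` is purely an assumed number for illustration, not a measurement from this system.

```python
# Rough size of the 12-way Cartesian product, assuming every table
# holds between 9 and 11 rows (the exact counts are not given).
TABLES = 12

smallest = 9 ** TABLES    # every table has 9 rows
typical = 10 ** TABLES    # every table has 10 rows
largest = 11 ** TABLES    # every table has 11 rows

# Days needed at a purely hypothetical row-by-row throughput.
ROWS_PER_SECOND = 100_000  # assumption, not a measurement
days_typical = typical / ROWS_PER_SECOND / 86_400

print(f"{smallest:,} to {largest:,} combinations")
print(f"about {days_typical:.0f} days at {ROWS_PER_SECOND:,} rows/s")
```

So the cursor is walking somewhere between about 2.8 × 10^11 and 3.1 × 10^12 combinations, which is why a run measured in weeks is plausible.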


3 solutions

#1


1  

It is unclear why you need a Cartesian product. But why are you using a cursor? That will slow things down immensely. Just do something like:

select *
into results
from (select (e1.StatA + e2.StatA + . . . ) as StatA,
             (e1.StatB + e2.StatB + . . . ) as StatB,
             (e1.StatC + e2.StatC + . . . ) as StatC,
             (e1.StatD + e2.StatD + . . . ) as StatD,
             (e1.StatF + e2.StatF + . . . ) as StatF,
             e1.description as description1, e2.description as description2, . . .
      from Element1 e1 cross join
           Element2 e2 cross join
           Element3 e3 cross join
           Element4 e4 cross join
           Element5 e5 cross join
           Element6 e6 cross join
           Element7 e7 cross join
           Element8 e8 cross join
           Element9 e9 cross join
           Element10 e10 cross join
           Element11 e11 cross join
           Element12 e12
      ) t
where StatC >= 2000;

This should be much faster than the cursor version, but I don't know if the performance will be that good. Try it out on a smaller data set and see if it helps. In general, you want to avoid cursors when you can use set-based operations instead.
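
To try the set-based shape on a smaller data set, here is a minimal sketch using Python's built-in `sqlite3` module rather than SQL Server. The three tiny tables, their values, and the reduced column list are made up for illustration, and SQLite's `CREATE TABLE ... AS SELECT` stands in for T-SQL's `SELECT ... INTO`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Three tiny hypothetical tables standing in for Element1..Element12.
rows = {
    "Element1": [(100, 900, "e1-a"), (200, 300, "e1-b")],
    "Element2": [(50, 800, "e2-a"), (75, 100, "e2-b")],
    "Element3": [(10, 400, "e3-a"), (20, 250, "e3-b")],
}
for table, data in rows.items():
    cur.execute(f"CREATE TABLE {table} (statA INT, statC INT, Description TEXT)")
    cur.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", data)

# One set-based statement replaces the cursor loop: the engine builds the
# cross join, sums the stats, and filters, with no per-row FETCH.
cur.execute("""
    CREATE TABLE results AS
    SELECT e1.statA + e2.statA + e3.statA AS statA,
           e1.statC + e2.statC + e3.statC AS statC,
           e1.Description AS description1,
           e2.Description AS description2,
           e3.Description AS description3
    FROM Element1 e1
    CROSS JOIN Element2 e2
    CROSS JOIN Element3 e3
    WHERE e1.statC + e2.statC + e3.statC >= 2000
""")
kept = cur.execute("SELECT * FROM results ORDER BY statC DESC").fetchall()
for row in kept:
    print(row)
```

With these made-up values only one of the eight combinations reaches the threshold, so `results` ends up with a single row.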


#2


0  

For starters, don't do this within SQL Server. It's not designed to handle such queries.

If you truly have billions of rows, you might consider breaking it into smaller chunks, or simply sampling the data. Consider what information you think you can pull out of the full data set that you couldn't get by sampling.
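
As a rough illustration of the sampling idea, the sketch below estimates what share of all combinations reach the `statC >= 2000` threshold without enumerating them. The function name `estimate_hit_rate`, the table layout (one list of `statC` values per table), and the generated data are all assumptions made for the example.

```python
import random

def estimate_hit_rate(tables, threshold, samples, seed=0):
    """Estimate the share of combinations whose statC total reaches threshold.

    tables: one list of statC values per element table (hypothetical layout).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # One row picked uniformly from each table is a uniform random combination.
        total = sum(rng.choice(column) for column in tables)
        if total >= threshold:
            hits += 1
    return hits / samples

# Made-up data: 12 tables of 10 statC values each.
data_rng = random.Random(42)
tables = [[data_rng.randint(50, 300) for _ in range(10)] for _ in range(12)]

rate = estimate_hit_rate(tables, threshold=2000, samples=100_000)
total_combinations = 10 ** 12
print(f"estimated share >= 2000: {rate:.4f}")
print(f"estimated matching rows: {rate * total_combinations:,.0f}")
```

An estimate like this also shows how large the result table would be before committing to a full run.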


#3


0  

You are asking the wrong question, so any answer we give is also going to be wrong.

You are asking how to more effectively use the tool you have (SQL Server) to solve an unspecified problem.


The "job to be done" may be more easily solved with some other tool or a completely different design.


If you really-really need to use a Cartesian product of this volume, you should look at using Lazy Evaluation or the call-by-need design pattern. How you do this depends on the client language involved.
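
With Python as the client language, one possible shape for this is a generator that produces matching combinations only when the caller asks for them. This is a sketch under assumptions: the name `lazy_combinations` and the `(statC, description)` row layout are hypothetical, and the pruning step is an addition that goes beyond plain lazy evaluation. It relies on the filter being a threshold on a sum, so a branch can be skipped when even the largest remaining values cannot reach it.

```python
def lazy_combinations(tables, threshold):
    """Lazily yield (total, descriptions) for combinations with total >= threshold.

    tables: list of lists of (statC, description) pairs (hypothetical layout).
    """
    # best_rest[i] = largest total still reachable from tables[i:].
    best_rest = [0] * (len(tables) + 1)
    for i in range(len(tables) - 1, -1, -1):
        best_rest[i] = best_rest[i + 1] + max(stat for stat, _ in tables[i])

    def walk(i, total, picks):
        if i == len(tables):
            if total >= threshold:
                yield total, tuple(picks)
            return
        for stat, desc in tables[i]:
            # Skip the whole subtree if even the best remaining rows fall short.
            if total + stat + best_rest[i + 1] < threshold:
                continue
            picks.append(desc)
            yield from walk(i + 1, total + stat, picks)
            picks.pop()

    yield from walk(0, 0, [])

# Made-up data: three small tables instead of twelve.
tables = [
    [(900, "e1-a"), (300, "e1-b")],
    [(800, "e2-a"), (100, "e2-b")],
    [(400, "e3-a"), (250, "e3-b")],
]
matches = lazy_combinations(tables, 2000)
first = next(matches)  # only the work needed for the first match is done
print(first)
```

How much the pruning saves depends entirely on the data: if most combinations pass the threshold, nearly the whole product still has to be visited.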


You MAY be able to create a T-SQL lazy-evaluation with a procedure and a few sequences, but that is just a hunch.

