Please consider the following two statements in SQL Server:
This one uses nested subqueries:
WITH cte AS
(
SELECT TOP 100 PERCENT *
FROM Segments
ORDER BY InvoiceDetailID, SegmentID
)
SELECT *, ReturnDate =
(SELECT TOP 1 cte.DepartureInfo
FROM cte
WHERE seg.InvoiceDetailID = cte.InvoiceDetailID
AND cte.SegmentID > seg.SegmentID),
DepartureCityCode =
(SELECT TOP 1 cte.DepartureCityCode
FROM cte
WHERE seg.InvoiceDetailID = cte.InvoiceDetailID
AND cte.SegmentID > seg.SegmentID)
FROM Segments seg
And this one uses the OUTER APPLY operator:
WITH cte AS
(
SELECT TOP 100 PERCENT *
FROM Segments
ORDER BY InvoiceDetailID, SegmentID
)
SELECT seg.*, t.DepartureInfo AS ReturnDate, t.DepartureCityCode
FROM Segments seg OUTER APPLY (
SELECT TOP 1 cte.DepartureInfo, cte.DepartureCityCode
FROM cte
WHERE seg.InvoiceDetailID = cte.InvoiceDetailID
AND cte.SegmentID > seg.SegmentID
) t
Which of these two would potentially perform better, considering that the Segments table can have millions of rows?
My intuition is that OUTER APPLY would perform better.
A couple more questions:
- I am almost sure about this, but I still want to confirm: in the first solution, would the CTE effectively be executed twice (because it is referenced twice and a CTE is expanded inline like a macro)?
- Would the CTE be executed once for each row when used in the OUTER APPLY operator? And would it also be executed for each row when used in the nested queries of the first statement?
2 Answers
#1
4
First, get rid of the TOP 100 PERCENT in the CTE. You are not really using TOP here, and if you want the results sorted, you should add an ORDER BY to the end of the entire statement. Second, to address your question about performance: if forced to guess, my bet would be on the second form, only because it has a single subquery instead of two. Third, another form you might try would be:
With RankedSegments As
(
Select S1.SegmentId, ...
, Row_Number() Over( Partition By S1.SegmentId Order By S2.SegmentId ) As Num
From Segments As S1
Left Join Segments As S2
On S2.InvoiceDetailId = S1.InvoiceDetailId
And S2.SegmentId > S1.SegmentID
)
Select ...
From RankedSegments
Where Num = 1
Another possibility:
With MinSegments As
(
Select S1.SegmentId, Min(S2.SegmentId) As MinSegmentId
From Segments As S1
Join Segments As S2
On S2.InvoiceDetailId = S1.InvoiceDetailId
And S2.SegmentId > S1.SegmentID
Group By S1.SegmentId
)
Select ...
From Segments As S1
Left Join (MinSegments As MS1
Join Segments As S2
On S2.SegmentId = MS1.MinSegmentId)
On MS1.SegmentId = S1.SegmentId
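To illustrate the first point above, here is a rough sketch of the OUTER APPLY form with the CTE dropped and the sort moved to the end of the statement. The alias nxt and the ORDER BY inside the APPLY are my assumptions: the original TOP 1 has no ORDER BY, so which matching row it returns is not guaranteed, and I am guessing that "next" means the lowest SegmentID greater than the current one. Treat it as a starting point, not a tested solution.

```sql
-- Sketch only: same tables and columns as in the question.
SELECT seg.*,
       t.DepartureInfo AS ReturnDate,
       t.DepartureCityCode
FROM Segments AS seg
OUTER APPLY (
    SELECT TOP (1) nxt.DepartureInfo, nxt.DepartureCityCode
    FROM Segments AS nxt                       -- direct table reference, no CTE
    WHERE nxt.InvoiceDetailID = seg.InvoiceDetailID
      AND nxt.SegmentID > seg.SegmentID
    ORDER BY nxt.SegmentID                     -- assumption: pick the nearest later segment
) AS t
ORDER BY seg.InvoiceDetailID, seg.SegmentID;   -- final sort belongs on the outer statement
```

With OUTER APPLY, rows that have no later segment are still returned, with NULL in ReturnDate and DepartureCityCode.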
#2
1
Maybe I will use this variation of Thomas' query:
WITH cte AS
(
SELECT *, Row_Number() Over( Partition By SegmentId Order By InvoiceDetailID, SegmentId ) As Num
FROM Segments)
SELECT seg.*, t.DepartureInfo AS ReturnDate, t.DepartureCityCode
FROM Segments seg LEFT JOIN cte t ON seg.InvoiceDetailID = t.InvoiceDetailID AND t.SegmentID > seg.SegmentID AND t.Num = 1
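Whichever form is chosen, the guesses in this thread are probably best checked against the real data. A minimal way to do that, assuming you can run the statements in a session where the Segments table from the question exists, is to turn on the I/O and timing statistics and compare the output for each candidate. The query below is just the question's OUTER APPLY statement used as a placeholder; swap in each variant in turn.

```sql
-- Report per-table scan counts / logical reads and CPU / elapsed time
-- in the Messages output for every statement that follows.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Candidate under test (placeholder: the OUTER APPLY form from the question).
WITH cte AS
(
    SELECT TOP 100 PERCENT *
    FROM Segments
    ORDER BY InvoiceDetailID, SegmentID
)
SELECT seg.*, t.DepartureInfo AS ReturnDate, t.DepartureCityCode
FROM Segments seg
OUTER APPLY (
    SELECT TOP 1 cte.DepartureInfo, cte.DepartureCityCode
    FROM cte
    WHERE seg.InvoiceDetailID = cte.InvoiceDetailID
      AND cte.SegmentID > seg.SegmentID
) t;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The scan count and logical reads reported for Segments give a rough idea of how often the table is actually touched, which also speaks to the question of whether the CTE is evaluated more than once. The actual execution plan shows the same thing in more detail.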