为什么where子句中没有窗口函数?

时间:2021-11-13 22:59:02

Title says it all, why can't I use a windowed function in a where clause in SQL Server?

标题说明了一切,为什么我不能在SQL Server的where子句中使用窗口函数呢?

This query makes perfect sense:

这个查询非常有意义:

select id, sales_person_id, product_type, product_id, sale_amount
from Sales_Log
where 1 = row_number() over(partition by sales_person_id, product_type, product_id order by sale_amount desc)

But it doesn't work. Is there a better way than a CTE/Subquery?

但它不工作。有比CTE/Subquery更好的方法吗?

EDIT

编辑

For what its worth this is the query with a CTE:

它的价值在于使用CTE的查询:

with Best_Sales as (
    select id, sales_person_id, product_type, product_id, sale_amount, row_number() over (partition by sales_person_id, product_type, product_id order by sales_amount desc) rank
    from Sales_log
)
select id, sales_person_id, product_type, product_id, sale_amount
from Best_Sales
where rank = 1

EDIT

编辑

+1 for the answers showing with a subquery, but really I'm looking for the reasoning behind not being able to use windowing functions in where clauses.

+1显示子查询的答案,但实际上我在寻找不能在where子句中使用窗口功能的原因。

7 个解决方案

#1


43  

really I'm looking for the reasoning behind not being able to use windowing functions in where clauses.

实际上,我在寻找不能在where子句中使用窗口功能的原因。

The reason that they are not allowed in the WHERE clause is that it would create ambiguity. Stealing Itzik Ben Gan's example from High-Performance T-SQL Using Window Functions (p.25)

在WHERE子句中不允许它们的原因是它会产生歧义。利用窗口函数从高性能T-SQL中窃取Itzik Ben Gan的示例(p.25)

Suppose your table was

假设你的表

CREATE TABLE T1
(
col1 CHAR(1) PRIMARY KEY
)

INSERT INTO T1 VALUES('A'),('B'),('C'),('D'),('E'),('F')

And your query

和你的查询

SELECT col1
FROM T1
WHERE ROW_NUMBER() OVER (ORDER BY col1) <= 3
AND col1 > 'B'

What would be the right result? Would you expect that the col1 > 'B' predicate ran before or after the row numbering?

正确的结果是什么?您期望col1 > 'B'谓词在行编号之前或之后运行吗?

The ROW_NUMBER is evaluated at the time of the SELECT on the result set remaining after all the WHERE/HAVING clauses have been dealt with.

在处理完所有WHERE/ have子句之后,在结果集中剩下的SELECT时对ROW_NUMBER进行计算。

#2


9  

There is no need for CTE, just use the windowing function in a subquery:

不需要CTE,只需在子查询中使用窗口函数:

select id, sales_person_id, product_type, product_id, sale_amount
from
(
  select id, sales_person_id, product_type, product_id, sale_amount,
    row_number() over(partition by sales_person_id, product_type, product_id order by sale_amount desc) rn
  from Sales_Log
) sl
where rn = 1

Edit, moving my comment to the answer.

编辑,移动我的评论到答案。

Windowing functions are not performed until the data is actually selected which is after the WHERE clause. So if you try to use a row_number in a WHERE clause the value is not yet assigned.

在实际选择数据(位于WHERE子句之后)之前,不会执行窗口函数。因此,如果您试图在WHERE子句中使用row_number,则该值尚未分配。

#3


4  

First of all it something called all-at-once operation

首先,它被称为一次性操作

"All-at-Once Operations" means that all expressions in the same logical query process phase are evaluated logically at the same time.

“all -at- once操作”意味着同一逻辑查询过程阶段中的所有表达式都同时进行逻辑计算。

And great chapter Impact on Window Functions:

对窗口功能的巨大影响:

Suppose you have:

假设你有:

CREATE TABLE #Test ( Id INT) ;

INSERT  INTO #Test VALUES  ( 1001 ), ( 1002 ) ;

SELECT Id
FROM #Test
WHERE Id = 1002
  AND ROW_NUMBER() OVER(ORDER BY Id) = 1;

All-at-Once operations tell us these two conditions evaluated logically at the same point of time. Therefore, SQL Server can evaluate conditions in WHERE clause in arbitrary order, based on estimated execution plan. So the main question here is which condition evaluates first.

一次性操作告诉我们这两个条件在同一时间点上进行逻辑评估。因此,SQL Server可以根据估计的执行计划,以任意顺序评估WHERE子句中的条件。这里的主要问题是哪个条件首先求值。

Case 1:

案例1:

If ( Id = 1002 ) is first, then if ( ROW_NUMBER() OVER(ORDER BY Id) = 1 )

如果(Id = 1002)是第一个,那么If (ROW_NUMBER() / (ORDER BY Id) = 1)

Result: 1002

结果:1002

Case 2:

案例2:

If ( ROW_NUMBER() OVER(ORDER BY Id) = 1 ), then check if ( Id = 1002 )

如果(ROW_NUMBER() / (ORDER BY Id) = 1),则检查是否(Id = 1002)

Result: empty

结果:空

So we have a paradox.

我们有一个悖论。

This example shows why we cannot use Window Functions in WHERE clause. You can think more about this and find why Window Functions are allowed to be used just in SELECT and ORDER BY clauses!

这个例子说明了为什么我们不能在WHERE子句中使用窗口函数。您可以对此进行更多的思考,并找出为什么只允许在SELECT和ORDER BY子句中使用窗口函数!

#4


3  

You don't necessarily need to use a CTE, you can query the result set after using row_number()

您不必使用CTE,您可以在使用row_number()之后查询结果集

select row, id, sales_person_id, product_type, product_id, sale_amount
from (
    select
        row_number() over(partition by sales_person_id, 
            product_type, product_id order by sale_amount desc) AS row,
        id, sales_person_id, product_type, product_id, sale_amount
    from Sales_Log 
    ) a
where row = 1

#5


1  

Yes unfortunately when you do a windowed function SQL gets mad at you even if your where predicate is legitimate. You make a cte or nested select having the value in your select statement, then reference your CTE or nested select with that value later. Simple example that should be self explanatory. If you really HATE cte's for some performance issue on doing a large data set you can always drop to temp table or table variable.

是的,不幸的是,当你做一个窗口函数时,SQL会对你生气,即使你的谓词是合法的。在select语句中创建一个cte或嵌套select,然后在后面引用cte或嵌套select。这个简单的例子应该是不言自明的。如果您真的讨厌cte的一些性能问题,比如在做大数据集时,您可以将它放到临时表或表变量中。

declare @Person table ( PersonID int identity, PersonName varchar(8));

insert into @Person values ('Brett'),('John');

declare @Orders table ( OrderID int identity, PersonID int, OrderName varchar(8));

insert into @Orders values (1, 'Hat'),(1,'Shirt'),(1, 'Shoes'),(2,'Shirt'),(2, 'Shoes');

--Select
--  p.PersonName
--, o.OrderName
--, row_number() over(partition by o.PersonID order by o.OrderID)
--from @Person p 
--  join @Orders o on p.PersonID = o.PersonID
--where row_number() over(partition by o.PersonID order by o.orderID) = 2

-- yields:
--Msg 4108, Level 15, State 1, Line 15
--Windowed functions can only appear in the SELECT or ORDER BY clauses.
;

with a as 
    (
    Select
    p.PersonName
,   o.OrderName
,   row_number() over(partition by o.PersonID order by o.OrderID) as rnk
from @Person p 
    join @Orders o on p.PersonID = o.PersonID
    )
select *
from a 
where rnk >= 2 -- only orders after the first one.

#6


1  

Finally, there's the old-fashioned, pre-SQL Server 2005 way, with a correlated subquery:

最后,还有一种老式的、sql Server 2005前的方式,使用相关的子查询:

select *
from   Sales_Log sl
where  sl.id = (
    Select Top 1 id
    from   Sales_Log sl2
    where  sales_person_id = sl.sales_person_id
       and product_type = sl.product_type
       and product_id = sl.product_id
    order by sale_amount desc
)

I give you this for completeness, merely.

我给你这个是为了完整性。

#7


0  

It's an old thread, but I'll try to answer specifically the question expressed in the topic.

这是一条古老的线索,但我将尝试具体回答这个主题中表达的问题。

Why no windowed functions in where clauses?

为什么where子句中没有窗口函数?

SELECT statement has following main clauses specified in keyed-in order:

SELECT语句有以下主要条款:

SELECT DISTINCT TOP list
FROM  JOIN ON / APPLY / PIVOT / UNPIVOT
WHERE
GROUP BY  WITH CUBE / WITH ROLLUP
HAVING
ORDER BY
OFFSET-FETCH

Logical Query Processing Order, or Binding Order, is conceptual interpretation order, it defines the correctness of the query. This order determines when the objects defined in one step are made available to the clauses in subsequent steps.

逻辑查询处理顺序或绑定顺序是概念解释顺序,它定义了查询的正确性。这个顺序决定了在一个步骤中定义的对象在后续步骤中何时对子句可用。

----- Relational result
  1. FROM
    1.1. ON JOIN / APPLY / PIVOT / UNPIVOT
  2. WHERE
  3. GROUP BY
    3.1. WITH CUBE / WITH ROLLUP
  4. HAVING
  ---- After the HAVING step the Underlying Query Result is ready
  5. SELECT
    5.1. SELECT list
    5.2. DISTINCT
----- Relational result

----- Non-relational result (a cursor)
  6. ORDER BY
  7. TOP / OFFSET-FETCH
----- Non-relational result (a cursor)

For example, if the query processor can bind to (access) the tables or views defined in the FROM clause, these objects and their columns are made available to all subsequent steps.

例如,如果查询处理器可以绑定到(访问)FROM子句中定义的表或视图,那么所有后续步骤都可以使用这些对象及其列。

Conversely, all clauses preceding the SELECT clause cannot reference any column aliases or derived columns defined in SELECT clause. However, those columns can be referenced by subsequent clauses such as the ORDER BY clause.

相反,SELECT子句前面的所有子句都不能引用SELECT子句中定义的任何列别名或派生列。但是,这些列可以被后续的子句引用,比如ORDER by子句。

OVER clause determines the partitioning and ordering of a row set before the associated window function is applied. That is, the OVER clause defines a window or user-specified set of rows within an Underlying Query Result set and window function computes result against that window.

OVER子句确定应用相关窗口函数之前行集的分区和排序。也就是说,OVER子句在底层查询结果集中定义一个窗口或用户指定的一组行,窗口函数根据该窗口计算结果。

Msg 4108, Level 15, State 1, …
Windowed functions can only appear in the SELECT or ORDER BY clauses.

The reason behind is because the way how Logical Query Processing works in T-SQL. Since the underlying query result is established only when logical query processing reaches the SELECT step 5.1. (that is, after processing the FROM, WHERE, GROUP BY and HAVING steps), window functions are allowed only in the SELECT and ORDER BY clauses of the query.

原因在于逻辑查询处理在T-SQL中的工作方式。由于只在逻辑查询处理达到SELECT步骤5.1时才建立底层查询结果。(即,在处理FROM、WHERE、GROUP BY和HAVING steps之后),窗口函数只允许在查询的SELECT和ORDER BY子句中使用。

Note to mention, window functions are still part of relational layer even Relational Model doesn't deal with ordered data. The result after the SELECT step 5.1. with any window function is still relational.

注意,窗口函数仍然是关系层的一部分,甚至关系模型也不处理有序数据。选择步骤5.1之后的结果。对于任何窗口函数都是关系的。

Also, speaking strictly, the reason why window function are not allowed in the WHERE clause is not because it would create ambiguity, but because the order how Logical Query Processing processes SELECT statement in T-SQL.

此外,严格地说,WHERE子句中不允许窗口函数的原因并不是因为它会产生歧义,而是因为逻辑查询处理过程在T-SQL中选择语句的顺序。

Links: here, here and here

链接:这里,这里和这里。

#1


43  

really I'm looking for the reasoning behind not being able to use windowing functions in where clauses.

实际上,我在寻找不能在where子句中使用窗口功能的原因。

The reason that they are not allowed in the WHERE clause is that it would create ambiguity. Stealing Itzik Ben Gan's example from High-Performance T-SQL Using Window Functions (p.25)

在WHERE子句中不允许它们的原因是它会产生歧义。利用窗口函数从高性能T-SQL中窃取Itzik Ben Gan的示例(p.25)

Suppose your table was

假设你的表

CREATE TABLE T1
(
col1 CHAR(1) PRIMARY KEY
)

INSERT INTO T1 VALUES('A'),('B'),('C'),('D'),('E'),('F')

And your query

和你的查询

SELECT col1
FROM T1
WHERE ROW_NUMBER() OVER (ORDER BY col1) <= 3
AND col1 > 'B'

What would be the right result? Would you expect that the col1 > 'B' predicate ran before or after the row numbering?

正确的结果是什么?您期望col1 > 'B'谓词在行编号之前或之后运行吗?

The ROW_NUMBER is evaluated at the time of the SELECT on the result set remaining after all the WHERE/HAVING clauses have been dealt with.

在处理完所有WHERE/ have子句之后,在结果集中剩下的SELECT时对ROW_NUMBER进行计算。

#2


9  

There is no need for CTE, just use the windowing function in a subquery:

不需要CTE,只需在子查询中使用窗口函数:

select id, sales_person_id, product_type, product_id, sale_amount
from
(
  select id, sales_person_id, product_type, product_id, sale_amount,
    row_number() over(partition by sales_person_id, product_type, product_id order by sale_amount desc) rn
  from Sales_Log
) sl
where rn = 1

Edit, moving my comment to the answer.

编辑,移动我的评论到答案。

Windowing functions are not performed until the data is actually selected which is after the WHERE clause. So if you try to use a row_number in a WHERE clause the value is not yet assigned.

在实际选择数据(位于WHERE子句之后)之前,不会执行窗口函数。因此,如果您试图在WHERE子句中使用row_number,则该值尚未分配。

#3


4  

First of all it something called all-at-once operation

首先,它被称为一次性操作

"All-at-Once Operations" means that all expressions in the same logical query process phase are evaluated logically at the same time.

“all -at- once操作”意味着同一逻辑查询过程阶段中的所有表达式都同时进行逻辑计算。

And great chapter Impact on Window Functions:

对窗口功能的巨大影响:

Suppose you have:

假设你有:

CREATE TABLE #Test ( Id INT) ;

INSERT  INTO #Test VALUES  ( 1001 ), ( 1002 ) ;

SELECT Id
FROM #Test
WHERE Id = 1002
  AND ROW_NUMBER() OVER(ORDER BY Id) = 1;

All-at-Once operations tell us these two conditions evaluated logically at the same point of time. Therefore, SQL Server can evaluate conditions in WHERE clause in arbitrary order, based on estimated execution plan. So the main question here is which condition evaluates first.

一次性操作告诉我们这两个条件在同一时间点上进行逻辑评估。因此,SQL Server可以根据估计的执行计划,以任意顺序评估WHERE子句中的条件。这里的主要问题是哪个条件首先求值。

Case 1:

案例1:

If ( Id = 1002 ) is first, then if ( ROW_NUMBER() OVER(ORDER BY Id) = 1 )

如果(Id = 1002)是第一个,那么If (ROW_NUMBER() / (ORDER BY Id) = 1)

Result: 1002

结果:1002

Case 2:

案例2:

If ( ROW_NUMBER() OVER(ORDER BY Id) = 1 ), then check if ( Id = 1002 )

如果(ROW_NUMBER() / (ORDER BY Id) = 1),则检查是否(Id = 1002)

Result: empty

结果:空

So we have a paradox.

我们有一个悖论。

This example shows why we cannot use Window Functions in WHERE clause. You can think more about this and find why Window Functions are allowed to be used just in SELECT and ORDER BY clauses!

这个例子说明了为什么我们不能在WHERE子句中使用窗口函数。您可以对此进行更多的思考,并找出为什么只允许在SELECT和ORDER BY子句中使用窗口函数!

#4


3  

You don't necessarily need to use a CTE, you can query the result set after using row_number()

您不必使用CTE,您可以在使用row_number()之后查询结果集

select row, id, sales_person_id, product_type, product_id, sale_amount
from (
    select
        row_number() over(partition by sales_person_id, 
            product_type, product_id order by sale_amount desc) AS row,
        id, sales_person_id, product_type, product_id, sale_amount
    from Sales_Log 
    ) a
where row = 1

#5


1  

Yes unfortunately when you do a windowed function SQL gets mad at you even if your where predicate is legitimate. You make a cte or nested select having the value in your select statement, then reference your CTE or nested select with that value later. Simple example that should be self explanatory. If you really HATE cte's for some performance issue on doing a large data set you can always drop to temp table or table variable.

是的,不幸的是,当你做一个窗口函数时,SQL会对你生气,即使你的谓词是合法的。在select语句中创建一个cte或嵌套select,然后在后面引用cte或嵌套select。这个简单的例子应该是不言自明的。如果您真的讨厌cte的一些性能问题,比如在做大数据集时,您可以将它放到临时表或表变量中。

declare @Person table ( PersonID int identity, PersonName varchar(8));

insert into @Person values ('Brett'),('John');

declare @Orders table ( OrderID int identity, PersonID int, OrderName varchar(8));

insert into @Orders values (1, 'Hat'),(1,'Shirt'),(1, 'Shoes'),(2,'Shirt'),(2, 'Shoes');

--Select
--  p.PersonName
--, o.OrderName
--, row_number() over(partition by o.PersonID order by o.OrderID)
--from @Person p 
--  join @Orders o on p.PersonID = o.PersonID
--where row_number() over(partition by o.PersonID order by o.orderID) = 2

-- yields:
--Msg 4108, Level 15, State 1, Line 15
--Windowed functions can only appear in the SELECT or ORDER BY clauses.
;

with a as 
    (
    Select
    p.PersonName
,   o.OrderName
,   row_number() over(partition by o.PersonID order by o.OrderID) as rnk
from @Person p 
    join @Orders o on p.PersonID = o.PersonID
    )
select *
from a 
where rnk >= 2 -- only orders after the first one.

#6


1  

Finally, there's the old-fashioned, pre-SQL Server 2005 way, with a correlated subquery:

最后,还有一种老式的、sql Server 2005前的方式,使用相关的子查询:

select *
from   Sales_Log sl
where  sl.id = (
    Select Top 1 id
    from   Sales_Log sl2
    where  sales_person_id = sl.sales_person_id
       and product_type = sl.product_type
       and product_id = sl.product_id
    order by sale_amount desc
)

I give you this for completeness, merely.

我给你这个是为了完整性。

#7


0  

It's an old thread, but I'll try to answer specifically the question expressed in the topic.

这是一条古老的线索,但我将尝试具体回答这个主题中表达的问题。

Why no windowed functions in where clauses?

为什么where子句中没有窗口函数?

SELECT statement has following main clauses specified in keyed-in order:

SELECT语句有以下主要条款:

SELECT DISTINCT TOP list
FROM  JOIN ON / APPLY / PIVOT / UNPIVOT
WHERE
GROUP BY  WITH CUBE / WITH ROLLUP
HAVING
ORDER BY
OFFSET-FETCH

Logical Query Processing Order, or Binding Order, is conceptual interpretation order, it defines the correctness of the query. This order determines when the objects defined in one step are made available to the clauses in subsequent steps.

逻辑查询处理顺序或绑定顺序是概念解释顺序,它定义了查询的正确性。这个顺序决定了在一个步骤中定义的对象在后续步骤中何时对子句可用。

----- Relational result
  1. FROM
    1.1. ON JOIN / APPLY / PIVOT / UNPIVOT
  2. WHERE
  3. GROUP BY
    3.1. WITH CUBE / WITH ROLLUP
  4. HAVING
  ---- After the HAVING step the Underlying Query Result is ready
  5. SELECT
    5.1. SELECT list
    5.2. DISTINCT
----- Relational result

----- Non-relational result (a cursor)
  6. ORDER BY
  7. TOP / OFFSET-FETCH
----- Non-relational result (a cursor)

For example, if the query processor can bind to (access) the tables or views defined in the FROM clause, these objects and their columns are made available to all subsequent steps.

例如,如果查询处理器可以绑定到(访问)FROM子句中定义的表或视图,那么所有后续步骤都可以使用这些对象及其列。

Conversely, all clauses preceding the SELECT clause cannot reference any column aliases or derived columns defined in SELECT clause. However, those columns can be referenced by subsequent clauses such as the ORDER BY clause.

相反,SELECT子句前面的所有子句都不能引用SELECT子句中定义的任何列别名或派生列。但是,这些列可以被后续的子句引用,比如ORDER by子句。

OVER clause determines the partitioning and ordering of a row set before the associated window function is applied. That is, the OVER clause defines a window or user-specified set of rows within an Underlying Query Result set and window function computes result against that window.

OVER子句确定应用相关窗口函数之前行集的分区和排序。也就是说,OVER子句在底层查询结果集中定义一个窗口或用户指定的一组行,窗口函数根据该窗口计算结果。

Msg 4108, Level 15, State 1, …
Windowed functions can only appear in the SELECT or ORDER BY clauses.

The reason behind is because the way how Logical Query Processing works in T-SQL. Since the underlying query result is established only when logical query processing reaches the SELECT step 5.1. (that is, after processing the FROM, WHERE, GROUP BY and HAVING steps), window functions are allowed only in the SELECT and ORDER BY clauses of the query.

原因在于逻辑查询处理在T-SQL中的工作方式。由于只在逻辑查询处理达到SELECT步骤5.1时才建立底层查询结果。(即,在处理FROM、WHERE、GROUP BY和HAVING steps之后),窗口函数只允许在查询的SELECT和ORDER BY子句中使用。

Note to mention, window functions are still part of relational layer even Relational Model doesn't deal with ordered data. The result after the SELECT step 5.1. with any window function is still relational.

注意,窗口函数仍然是关系层的一部分,甚至关系模型也不处理有序数据。选择步骤5.1之后的结果。对于任何窗口函数都是关系的。

Also, speaking strictly, the reason why window function are not allowed in the WHERE clause is not because it would create ambiguity, but because the order how Logical Query Processing processes SELECT statement in T-SQL.

此外,严格地说,WHERE子句中不允许窗口函数的原因并不是因为它会产生歧义,而是因为逻辑查询处理过程在T-SQL中选择语句的顺序。

Links: here, here and here

链接:这里,这里和这里。