Why do aggregate functions in PostgreSQL not work with the boolean data type?

Date: 2021-11-25 16:58:44

Why can't we use boolean values in aggregate functions without first casting them to an integer type? In many cases it makes perfect sense to calculate a sum, an average, or a correlation from columns of boolean data type.

Consider the following example, where the boolean input always has to be cast to int to make it work:

select
   sum(boolinput::int),
   avg(boolinput::int),
   max(boolinput::int),
   min(boolinput::int),
   stddev(boolinput::int),
   corr(boolinput::int,boolinputb::int)   
from
   (select 
      (random() > .5)::boolean as boolinput,
      (random() > .5)::boolean as boolinputB 
    from 
      generate_series(1,100)
   ) a

From the PostgreSQL documentation:

Valid literal values for the "true" state are: TRUE 't' 'true' 'y' 'yes' 'on' '1'

For the "false" state, the following values can be used: FALSE 'f' 'false' 'n' 'no' 'off' '0'

Because by definition TRUE equals 1 and FALSE equals 0, I do not understand why casting is necessary.

Allowing booleans in aggregates would also have interesting side effects. For example, we could simplify many CASE expressions:

Current version (clean and easy to understand):

select sum(case when gs > 50 then 1 else 0 end) from generate_series(1,100) gs;

Using the old-fashioned cast operator (::):

select sum((gs > 50)::int) from generate_series(1,100) gs;

Direct aggregation of boolean values (does not currently work):

select sum(gs > 50) from generate_series(1,100) gs;

Is direct aggregation of boolean values possible in other DBMSs? Why is this not possible in PostgreSQL?

4 Answers

#1


5  

Because by definition TRUE equals 1 and FALSE equals 0, I do not understand why casting is necessary.

Per the docs you quoted in your question, a boolean is not, by definition, 1 for TRUE and 0 for FALSE; '1' and '0' are merely two of the accepted input literals. It's not true in C either, where TRUE is anything non-zero.

For that matter, nor is it for the many languages that mimic C in this respect. Nor is it for languages such as Ruby, where anything other than nil or false evaluates to true, including zero and empty strings. Nor is it for the POSIX shell and its variants, where testing a return code yields TRUE if it is zero and FALSE for anything non-zero.

The point is that a boolean is a boolean, with all sorts of colorful implementation details from one platform to the next; it is not an integer.

It's unclear how you expect Postgres to average true/false values. I doubt that many platforms, if any, will yield a result for that.

Even summing booleans is awkward: would you expect Postgres to OR the input values, or to count the TRUE values?
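
Both readings can already be spelled out explicitly, which sidesteps the ambiguity. A minimal sketch, reusing the generate_series example from the question; bool_or() is introduced in the next paragraph, and the FILTER clause assumes PostgreSQL 9.4 or later:

```sql
select
   bool_or(gs > 50)                as any_true,   -- "OR the inputs": true if at least one row matches
   count(*) filter (where gs > 50) as true_count, -- "count the TRUE values": 50 of the 100 rows
   sum((gs > 50)::int)             as cast_sum    -- the same count via an explicit cast
from generate_series(1,100) gs;
```

Writing the intent out this way makes it obvious to the reader which of the two meanings is wanted.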

At any rate, there are some boolean aggregate functions, namely bool_or() and bool_and(). bool_or() stands in for the standard any() and some(), while bool_and() corresponds to the standard every(), which Postgres also accepts. Postgres deviates from the standard here because of a potential ambiguity. Per the docs:

SELECT b1 = ANY((SELECT b2 FROM t2 ...)) FROM t1 ...;

Here ANY can be considered either as introducing a subquery, or as being an aggregate function, if the subquery returns one row with a Boolean value.

http://www.postgresql.org/docs/current/static/functions-aggregate.html
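
As a small illustration of these aggregates (a sketch only; the inline values list and the column name c are made up for the example), note that NULL inputs are ignored and that every() gives the same result as bool_and():

```sql
select
   bool_and(c) as all_true,   -- false: at least one non-null input is false
   bool_or(c)  as any_true,   -- true: at least one input is true
   every(c)    as every_true  -- same result as bool_and(c)
from (values (true), (false), (null::boolean)) as t(c);
```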

#2


0  

To sum boolean values, I have created the following custom aggregate function:

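-- State transition function: add 1 to the running total when the input is TRUE.
-- FALSE and NULL inputs leave the total unchanged.
create or replace function badd (bigint, boolean)
  returns bigint as
$body$
select $1 + case when $2 then 1 else 0 end;
$body$ language sql;

-- Named bsum so that it matches the calls in the query below.
create aggregate bsum(boolean) (
  sfunc=badd,
  stype=int8,
  initcond='0'
);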

Now I can easily sum boolean values or count the rows that meet a specific condition:

with test (a, b, c) as (
   values
      ('true'::boolean,'a'::varchar, 'd'::text),
      ('true'::boolean,'a'::varchar, 'e'::text),      
      ('false'::boolean,'a'::varchar, 'f'::text),
      ('true'::boolean,'b'::varchar, 'd'::text),
      ('false'::boolean,'b'::varchar, 'd'::text),
      ('true'::boolean,'c'::varchar, 'f'::text),                
      (NULL,'c'::varchar,'d')      
    ) 
select 
   b,
   bsum(a) as sum, -- sum boolean value (TRUE=1, FALSE=0)
   bsum(c = 'd') as dsum -- count the rows where column c equals 'd'
from 
   test
group by
   b

#3


0  

Here are some possibilities:

select max(c::int)::boolean, min(c::int)::boolean, bool_or(c) as max_b,bool_and(c) as min_b from
(
        select false as c
  union select true
  union select null
) t

#4


0  

Here is how one can achieve max(boolean):

CREATE AGGREGATE max(boolean) (
  SFUNC=boolor_statefunc,
  STYPE=bool,
  SORTOP=">"
);  

where "boolor_statefunc" is a built-in function.
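
By analogy, a min(boolean) aggregate can presumably be defined the same way. This is a sketch that is not part of the original answer; it assumes the built-in booland_statefunc (the transition function behind bool_and()) and the boolean < operator:

```sql
-- Hypothetical companion to max(boolean): logical AND acts as the minimum.
create aggregate min(boolean) (
  sfunc  = booland_statefunc,
  stype  = bool,
  sortop = <
);

-- Usage: NULL inputs are skipped because the transition function is strict.
select min(c) as min_c   -- expected: false
from (values (true), (false), (null::boolean)) as t(c);
```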
