I have a query that looks something like this:
SELECT COUNT(DISTINCT A) as a_distinct,
       COUNT(DISTINCT B) as b_distinct,
       COUNT(DISTINCT A)/COUNT(DISTINCT B) as a_b_ratio
FROM sometable_ab
As we can see, this looks very inefficient, since the aggregate functions are run twice even though their results have already been calculated. The only solution I could think of is to break it into two queries. Is that the only possible solution, or is there a better, more efficient one? I am using Redshift, which is mostly based on PostgreSQL, but even a MySQL solution would be acceptable, as I cannot think of a way to do this efficiently in any database.
2 Answers
#1
3
If you are worried about the performance impact, just use a subquery:
SELECT a_distinct, b_distinct, a_distinct / b_distinct as a_b_ratio
FROM (SELECT COUNT(DISTINCT A) as a_distinct,
             COUNT(DISTINCT B) as b_distinct
      FROM sometable_ab
     ) ab
For most aggregation functions, this would be irrelevant, but count(distinct) can be a performance hog.
This is ANSI standard SQL and should work in any database you mention.
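Whether the repeated COUNT(DISTINCT ...) is really evaluated twice depends on the engine's planner, so it may be worth checking before rewriting anything. A rough way to do that, assuming your database supports EXPLAIN (PostgreSQL, Redshift and MySQL all do, though the output format differs in each), is to compare the plans for the two forms:

```sql
-- Plan for the original form, where each COUNT(DISTINCT ...) appears twice.
EXPLAIN
SELECT COUNT(DISTINCT a) AS a_distinct,
       COUNT(DISTINCT b) AS b_distinct,
       COUNT(DISTINCT a) / COUNT(DISTINCT b) AS a_b_ratio
FROM sometable_ab;

-- Plan for the subquery form, where each aggregate appears only once.
EXPLAIN
SELECT a_distinct, b_distinct, a_distinct / b_distinct AS a_b_ratio
FROM (SELECT COUNT(DISTINCT a) AS a_distinct,
             COUNT(DISTINCT b) AS b_distinct
      FROM sometable_ab
     ) ab;
```

If the two plans show the same aggregation steps and similar cost estimates, the rewrite is mostly a readability gain rather than a performance one.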
#2
0
Using a subquery still counts as one query in any RDBMS. More importantly, count() never returns NULL; it returns 0 if no row is found (or if no row has a non-null value for the given expression). That would lead you straight into a division-by-zero error. Fix it with NULLIF (also standard SQL), which gives you NULL in this case.
SELECT *, a_distinct / NULLIF(b_distinct, 0) AS a_b_ratio
FROM  (
   SELECT count(DISTINCT a) AS a_distinct
        , count(DISTINCT b) AS b_distinct
   FROM   sometable_ab
   ) sub;
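One further detail to watch, as a sketch rather than part of the answer above: count() returns an integer type, and in PostgreSQL (and, as far as I know, Redshift) dividing one integer by another truncates the result, so a ratio such as 3 / 2 would come back as 1. Casting one operand to a decimal type avoids that. The sample rows and the decimal(18,4) precision below are assumptions chosen purely for illustration:

```sql
-- Hypothetical sample data: 3 distinct values of a, 2 distinct non-null values of b.
CREATE TEMPORARY TABLE sometable_ab (a int, b int);
INSERT INTO sometable_ab VALUES (1, 10), (2, 10), (3, 20), (3, NULL);

SELECT a_distinct,
       b_distinct,
       -- The cast keeps the division from being truncated to an integer;
       -- NULLIF turns a zero divisor into NULL instead of raising an error.
       CAST(a_distinct AS decimal(18,4)) / NULLIF(b_distinct, 0) AS a_b_ratio
FROM (
    SELECT count(DISTINCT a) AS a_distinct,
           count(DISTINCT b) AS b_distinct
    FROM sometable_ab
) sub;
```

With this data the ratio comes out as 1.5 (shown with however many decimal places the engine chooses) instead of the truncated 1.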