Get the distinct sum of a joined table column

Time: 2021-02-12 01:34:58

I have a problem here, and I'm hoping there is an easy solution. I'll try to make this as simple as possible:

  • A ticket belongs to an attendee
  • Example:

select * from tickets JOIN attendees ON attendees.id = tickets.attendee_id

  • An attendee has a decimal column called "revenue"

That said, I need to run a query that will return a variety of information about the tickets, including the total revenue. The problem is that if 2 tickets belong to the same attendee, it counts their revenue twice. How can I sum the attendee revenue only once?

I don't want to use subqueries, as my ORM makes this difficult. Plus, a subquery solution doesn't scale if I want to do this for multiple columns.

Here's what I have:

  • 1 attendee with a revenue of 100
  • 2 tickets that both belong to that attendee

Select count(tickets.*) as tickets_count
     , sum(attendees.revenue) as attendees_revenue
from tickets LEFT OUTER JOIN attendees ON attendees.id = tickets.attendee_id;

=> This tells me that attendees_revenue is 200. I want it to be 100, since there is one attendee in the database with an existing_revenue of 100. I do NOT want the attendee to be double counted.
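
The double counting described above can be reproduced with a small sketch. It uses Python's built-in sqlite3 module rather than PostgreSQL, the table layout is an assumption based on the description (only id, attendee_id and revenue are modelled), and count(*) stands in for count(tickets.*):

```python
import sqlite3

# Hypothetical in-memory schema mirroring the question: one attendee, two tickets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attendees (id INTEGER PRIMARY KEY, revenue INTEGER);
    CREATE TABLE tickets (id INTEGER PRIMARY KEY, attendee_id INTEGER);
    INSERT INTO attendees (id, revenue) VALUES (1, 100);
    INSERT INTO tickets (id, attendee_id) VALUES (1, 1), (2, 1);
""")

# The join emits one row per ticket, so the attendee's revenue appears twice.
joined_rows = conn.execute("""
    SELECT tickets.id, attendees.id, attendees.revenue
    FROM tickets
    LEFT OUTER JOIN attendees ON attendees.id = tickets.attendee_id
    ORDER BY tickets.id
""").fetchall()

# Summing over those joined rows therefore adds the same revenue once per ticket.
tickets_count, attendees_revenue = conn.execute("""
    SELECT count(*), sum(attendees.revenue)
    FROM tickets
    LEFT OUTER JOIN attendees ON attendees.id = tickets.attendee_id
""").fetchone()

print(joined_rows)                       # [(1, 1, 100), (2, 1, 100)]
print(tickets_count, attendees_revenue)  # 2 200
```

The sum is taken over ticket rows, not attendee rows, which is why the single attendee's 100 shows up as 200.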

Please let me know if this is possible.

4 solutions

#1


7  

To get the result without a subquery, you have to resort to advanced window function trickery:

SELECT sum(count(*))       OVER () AS tickets_count
      ,sum(min(a.revenue)) OVER () AS attendees_revenue
FROM   tickets   t
JOIN   attendees a ON a.id = t.attendee_id
GROUP  BY t.attendee_id
LIMIT  1;

SQL Fiddle.

Explain

The key to understanding this is the sequence of events in the query:

aggregate functions -> window functions -> DISTINCT -> LIMIT

More details here:

Step by step:

  1. I GROUP BY t.attendee_id - which you would normally do in a subquery.

  2. Then I sum over the counts to get the total count of tickets. Not very efficient, but forced by your requirement. The aggregate function count(*) is wrapped in the window function sum( ... ) OVER () to arrive at the not-so-common expression: sum(count(*)) OVER ().

    And sum the minimum revenue per attendee to get the sum without duplicates.

    You could also use max() or avg() instead of min() to the same effect as revenue is guaranteed to be the same for every row per attendee.

    This could be simpler if DISTINCT was allowed in window functions, but PostgreSQL has not (yet) implemented this feature. Per documentation:

    Aggregate window functions, unlike normal aggregate functions, do not allow DISTINCT or ORDER BY to be used within the function argument list.

  3. Final step is to get a single row. This could be done with DISTINCT (SQL standard) since all rows are the same. LIMIT 1 will be faster, though. Or the SQL-standard form FETCH FIRST 1 ROWS ONLY.
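
To make the order of evaluation in the steps above concrete, here is a rough re-enactment in plain Python. It is not PostgreSQL and not the answer's own code; the sample rows (two attendees, three tickets) and all variable names are made up for illustration:

```python
from collections import defaultdict

# Hypothetical joined rows (ticket_id, attendee_id, revenue), one per ticket.
joined_rows = [
    (1, 1, 100),
    (2, 1, 100),
    (3, 2, 40),
]

# Step 1: GROUP BY t.attendee_id.
groups = defaultdict(list)
for _ticket_id, attendee_id, revenue in joined_rows:
    groups[attendee_id].append(revenue)

# Aggregate functions run first: count(*) and min(a.revenue) per group.
per_group = [(len(revenues), min(revenues)) for revenues in groups.values()]

# Step 2: window functions run next, summing over all grouped rows (OVER ()).
tickets_count = sum(count for count, _ in per_group)
attendees_revenue = sum(min_revenue for _, min_revenue in per_group)

# Every grouped row now carries the same two totals.
result_rows = [(tickets_count, attendees_revenue) for _ in per_group]

# Step 3: LIMIT 1 keeps a single row.
result = result_rows[:1]
print(result)  # [(3, 140)]
```

Each attendee contributes its revenue exactly once because min() collapses the per-ticket copies before the outer sum runs.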

#2


3  

What about a simple division:

Select count(tickets.*) as tickets_count
     , sum(attendees.revenue) / count(attendees.id) as atendees_revenue
from tickets LEFT OUTER JOIN attendees ON attendees.id = tickets.attendee_id;

This should handle duplicates, triplicates, et cetera.
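
A quick check of the division approach, sketched with Python's sqlite3 module rather than PostgreSQL. The helper run_division_query and the sample data are assumptions for illustration, and count(*) stands in for count(tickets.*). With a single attendee the division gives the wanted 100, but in this sketch it returns the average revenue per joined row once a second attendee is added:

```python
import sqlite3

QUERY = """
    SELECT count(*), sum(attendees.revenue) / count(attendees.id)
    FROM tickets
    LEFT OUTER JOIN attendees ON attendees.id = tickets.attendee_id
"""

def run_division_query(attendees, tickets):
    """Load the sample rows into a fresh in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE attendees (id INTEGER PRIMARY KEY, revenue INTEGER)")
    conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, attendee_id INTEGER)")
    conn.executemany("INSERT INTO attendees VALUES (?, ?)", attendees)
    conn.executemany("INSERT INTO tickets VALUES (?, ?)", tickets)
    row = conn.execute(QUERY).fetchone()
    conn.close()
    return row

# One attendee (revenue 100) with two tickets: 200 / 2 = 100, as wanted.
one_attendee = run_division_query([(1, 100)], [(1, 1), (2, 1)])

# Two attendees (100 and 40), three tickets: 240 / 3 = 80, not 100 + 40 = 140.
two_attendees = run_division_query([(1, 100), (2, 40)], [(1, 1), (2, 1), (3, 2)])

print(one_attendee)   # (2, 100)
print(two_attendees)  # (3, 80)
```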

#3


0  

You were actually pretty close. There are many ways to do this, and if I understand your question correctly, this should do it:

SELECT
   COUNT(*) AS ticketsCount,
   SUM(DISTINCT attendees.revenue) AS revenueSum
FROM
   tickets
   LEFT JOIN attendees ON
      attendees.id = tickets.attendee_id
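
One thing to watch with SUM(DISTINCT attendees.revenue) is that it removes duplicates by value, not by attendee. A small sketch with Python's sqlite3 module (not PostgreSQL; the schema and sample rows are assumptions) where two different attendees happen to have the same revenue:

```python
import sqlite3

# Hypothetical data: two attendees with identical revenue, three tickets in total.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attendees (id INTEGER PRIMARY KEY, revenue INTEGER);
    CREATE TABLE tickets (id INTEGER PRIMARY KEY, attendee_id INTEGER);
    INSERT INTO attendees (id, revenue) VALUES (1, 100), (2, 100);
    INSERT INTO tickets (id, attendee_id) VALUES (1, 1), (2, 1), (3, 2);
""")

# DISTINCT collapses all three joined revenue values (100, 100, 100) into one.
tickets_count, revenue_sum = conn.execute("""
    SELECT COUNT(*), SUM(DISTINCT attendees.revenue)
    FROM tickets
    LEFT JOIN attendees ON attendees.id = tickets.attendee_id
""").fetchone()

print(tickets_count, revenue_sum)  # 3 100
```

Here the total across both attendees would be 200, but the query reports 100.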

#4


0  

The previous answer is nearly correct. You just need to make DISTINCT work properly when revenues are identical. You can fix this quite simply if your id has a numeric type:

SELECT
  COUNT(*) AS ticketsCount,
  SUM(DISTINCT attendees.id + attendees.revenue) - SUM(DISTINCT attendees.id) AS revenueSum
FROM
  tickets
LEFT JOIN attendees ON
  attendees.id = tickets.attendee_id
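
A sketch of how the id offset behaves, again using Python's sqlite3 module instead of PostgreSQL. The helper revenue_via_id_offset and the sample rows are made up for illustration. The trick handles identical revenues, but it relies on id + revenue being different for every attendee, which the second data set violates:

```python
import sqlite3

QUERY = """
    SELECT COUNT(*),
           SUM(DISTINCT attendees.id + attendees.revenue) - SUM(DISTINCT attendees.id)
    FROM tickets
    LEFT JOIN attendees ON attendees.id = tickets.attendee_id
"""

def revenue_via_id_offset(attendees, tickets):
    """Load the sample rows into a fresh in-memory database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE attendees (id INTEGER PRIMARY KEY, revenue INTEGER)")
    conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, attendee_id INTEGER)")
    conn.executemany("INSERT INTO attendees VALUES (?, ?)", attendees)
    conn.executemany("INSERT INTO tickets VALUES (?, ?)", tickets)
    row = conn.execute(QUERY).fetchone()
    conn.close()
    return row

tickets = [(1, 1), (2, 1), (3, 2)]

# Identical revenues: (101 + 102) - (1 + 2) = 200, which is the intended total.
same_revenue = revenue_via_id_offset([(1, 100), (2, 100)], tickets)

# id + revenue collides (1 + 101 = 2 + 100 = 102): 102 - 3 = 99 instead of 201.
colliding = revenue_via_id_offset([(1, 101), (2, 100)], tickets)

print(same_revenue)  # (3, 200)
print(colliding)     # (3, 99)
```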
