Best way to calculate sums depending on dates using SQL

Time: 2020-12-07 08:49:38

I don't know a good way to maintain sums depending on dates in a SQL database.

Take a database with two tables:

Client

  • clientID
  • name
  • overdueAmount

Invoice

  • clientID
  • invoiceID
  • amount
  • dueDate
  • paymentDate

I need to show a list of clients ordered by overdue amount (the sum of each client's unpaid invoices whose due date has passed). On a big database it isn't possible to calculate this in real time.

The problem is maintaining the overdueAmount field on the client. Its value can change at midnight, from one day to the next, even if nothing has changed on the client's invoices.

The sum changes when an invoice is paid, when a new invoice is created with a due date already in the past, or when a due date that was not yet past yesterday is past today.

The only solution I found is to recalculate this field every night for every client by summing the invoices that meet the conditions. But that is not efficient on very big databases.

I think this is a common problem, and I would like to know whether a best practice exists.

2 solutions

#1


You should read about data warehousing; it will help you solve this problem. It is similar to what you just described:

"The only solution I found is to recalculate every night this field on every client by summing the invoices respecting the conditions. But it's not efficient on very big databases."

But there is more to it than that. When you read about it, try to forget about normalization: a data warehouse is mainly intended for showing data, not for managing it. It may feel strange at first, but once you understand why data warehousing is needed, it becomes very interesting.

This book, a classic, is a good place to start: http://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247

#2


Firstly, I'd like to understand what you mean by "very big databases" - most RDBMSs running on decent hardware should be able to calculate this in real time for anything less than hundreds of millions of invoices. I speak from experience here.

Secondly, "best practice" is one of those expressions that mean very little - it's often used to present someone's opinion as being more meaningful than simply an opinion.

In my opinion, by far the best option is to calculate it on the fly.
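
As a rough illustration of calculating it on the fly, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for whatever RDBMS you run. It makes several assumptions that are not in the question: dates are stored as ISO-8601 text (so string comparison orders them correctly), a NULL paymentDate means the invoice is unpaid, "overdue" means dueDate is strictly before the date passed in, and the names overdue_by_client and idx_unpaid are made up for the example.

```python
import sqlite3


def overdue_by_client(conn, today):
    """Return (clientID, name, overdue) rows, largest overdue amount first."""
    return conn.execute(
        """
        SELECT c.clientID, c.name, COALESCE(SUM(i.amount), 0) AS overdue
        FROM Client c
        LEFT JOIN Invoice i
               ON i.clientID = c.clientID
              AND i.paymentDate IS NULL   -- assumption: NULL means unpaid
              AND i.dueDate < :today      -- due date is already past
        GROUP BY c.clientID, c.name
        ORDER BY overdue DESC, c.clientID
        """,
        {"today": today},
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Client (clientID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Invoice (
        invoiceID INTEGER PRIMARY KEY, clientID INTEGER, amount NUMERIC,
        dueDate TEXT, paymentDate TEXT);
    -- Hypothetical partial index: only unpaid invoices are indexed,
    -- so the index stays small even if the Invoice table is huge.
    CREATE INDEX idx_unpaid ON Invoice (clientID, dueDate)
        WHERE paymentDate IS NULL;
    INSERT INTO Client VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO Invoice VALUES
        (10, 1, 100, '2020-12-01', NULL),
        (11, 1, 50,  '2020-12-10', NULL),
        (12, 2, 70,  '2020-11-15', '2020-11-20'),
        (13, 2, 30,  '2020-12-05', NULL);
""")

for row in overdue_by_client(conn, "2020-12-07"):
    print(row)
```

Because the date is a query parameter, the midnight problem disappears: calling the same function with "2020-12-11" counts invoice 11 as overdue without any stored field having been updated.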

If your database is so big that you really can't do this, I'd consider a nightly batch (as you describe). Nightly batch runs are a pain, especially for systems that need to be available 24/7, but they have the benefit of keeping all the logic in a single place.
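
A nightly batch of this kind could be sketched as a single UPDATE, again using Python's sqlite3 as a stand-in. The function name refresh_overdue_amounts is hypothetical, and the sketch assumes ISO-8601 text dates and that a NULL paymentDate means unpaid; the scheduling itself (cron, a job agent, and so on) is left out.

```python
import sqlite3


def refresh_overdue_amounts(conn, today):
    """Recompute Client.overdueAmount for every client in one statement."""
    with conn:  # commits on success, rolls back if the UPDATE fails
        conn.execute(
            """
            UPDATE Client
            SET overdueAmount = (
                SELECT COALESCE(SUM(i.amount), 0)
                FROM Invoice i
                WHERE i.clientID = Client.clientID
                  AND i.paymentDate IS NULL   -- assumption: NULL means unpaid
                  AND i.dueDate < :today
            )
            """,
            {"today": today},
        )


def clients_by_overdue(conn):
    """Read the stored field; this is what the client list would query."""
    return conn.execute(
        "SELECT name, overdueAmount FROM Client "
        "ORDER BY overdueAmount DESC, clientID"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Client (
        clientID INTEGER PRIMARY KEY, name TEXT,
        overdueAmount NUMERIC DEFAULT 0);
    CREATE TABLE Invoice (
        invoiceID INTEGER PRIMARY KEY, clientID INTEGER, amount NUMERIC,
        dueDate TEXT, paymentDate TEXT);
    INSERT INTO Client (clientID, name) VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO Invoice VALUES
        (10, 1, 100, '2020-12-01', NULL),
        (11, 1, 50,  '2020-12-10', NULL),
        (13, 2, 30,  '2020-12-05', NULL);
""")

refresh_overdue_amounts(conn, "2020-12-07")
print(clients_by_overdue(conn))
```

All of the overdue logic lives in that one statement, which is the benefit mentioned above; the cost is that the stored value is only as fresh as the last run.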

If you want to avoid nightly batches, you can use triggers to populate an "unpaid_invoices" table. When you create a new invoice record, a trigger copies that invoice to the "unpaid_invoices" table; when you update the invoice with a payment, and the payment amount equals the outstanding amount, a trigger deletes it from the unpaid_invoices table. By definition, the unpaid_invoices table should hold far fewer rows than the invoice table, so calculating the outstanding amount for a given customer on the fly should be okay.
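
The trigger approach might look roughly like the sketch below, written in SQLite's trigger syntax and run through Python's sqlite3; other databases use different trigger syntax. The trigger names and the overdue_from_unpaid helper are hypothetical. Since the schema in the question has only a paymentDate column, the sketch assumes that setting paymentDate means the invoice has been paid in full.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Invoice (
        invoiceID INTEGER PRIMARY KEY, clientID INTEGER, amount NUMERIC,
        dueDate TEXT, paymentDate TEXT);
    CREATE TABLE unpaid_invoices (
        invoiceID INTEGER PRIMARY KEY, clientID INTEGER, amount NUMERIC,
        dueDate TEXT);

    -- Copy every new invoice that arrives unpaid.
    CREATE TRIGGER invoice_created AFTER INSERT ON Invoice
    WHEN NEW.paymentDate IS NULL
    BEGIN
        INSERT INTO unpaid_invoices
        VALUES (NEW.invoiceID, NEW.clientID, NEW.amount, NEW.dueDate);
    END;

    -- Assumption: setting paymentDate means the invoice is paid in full.
    CREATE TRIGGER invoice_paid AFTER UPDATE OF paymentDate ON Invoice
    WHEN NEW.paymentDate IS NOT NULL
    BEGIN
        DELETE FROM unpaid_invoices WHERE invoiceID = NEW.invoiceID;
    END;
""")


def overdue_from_unpaid(conn, today):
    """Sum overdue amounts per client from the small side table only."""
    return conn.execute(
        "SELECT clientID, SUM(amount) AS overdue FROM unpaid_invoices "
        "WHERE dueDate < ? GROUP BY clientID "
        "ORDER BY overdue DESC, clientID",
        (today,),
    ).fetchall()


conn.executemany(
    "INSERT INTO Invoice VALUES (?, ?, ?, ?, ?)",
    [
        (10, 1, 100, "2020-12-01", None),
        (11, 1, 50, "2020-12-10", None),
        (12, 2, 70, "2020-11-15", "2020-11-20"),  # already paid, not copied
        (13, 2, 30, "2020-12-05", None),
    ],
)
print(overdue_from_unpaid(conn, "2020-12-07"))
```

This sketch does not handle invoices that are deleted, edited, or marked unpaid again; each of those cases would need its own trigger, which is part of why the approach gets complicated.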

However, triggers are nasty, evil things, with exotic failure modes that can stump the unsuspecting developer, so only consider this if you have a ninja SQL developer on hand. Absolutely make sure you have a SQL query which checks the validity of your unpaid_invoices table, and ideally schedule it as a regular task.
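
A validity check along those lines could be sketched as follows, using Python's sqlite3 again. The function name find_unpaid_drift and the 'missing'/'stale' labels are invented for the example, and it assumes the same conventions as before (a NULL paymentDate means unpaid, and unpaid_invoices is keyed by invoiceID). The sample data is deliberately inconsistent to show what the check reports.

```python
import sqlite3


def find_unpaid_drift(conn):
    """Return (problem, invoiceID) rows where the two tables disagree."""
    return conn.execute(
        """
        -- Unpaid invoices that never made it into the side table.
        SELECT 'missing' AS problem, i.invoiceID
        FROM Invoice i
        LEFT JOIN unpaid_invoices u ON u.invoiceID = i.invoiceID
        WHERE i.paymentDate IS NULL AND u.invoiceID IS NULL
        UNION ALL
        -- Side-table rows whose invoice is paid or no longer exists.
        SELECT 'stale', u.invoiceID
        FROM unpaid_invoices u
        LEFT JOIN Invoice i ON i.invoiceID = u.invoiceID
        WHERE i.invoiceID IS NULL OR i.paymentDate IS NOT NULL
        ORDER BY 1, 2
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Invoice (
        invoiceID INTEGER PRIMARY KEY, clientID INTEGER, amount NUMERIC,
        dueDate TEXT, paymentDate TEXT);
    CREATE TABLE unpaid_invoices (
        invoiceID INTEGER PRIMARY KEY, clientID INTEGER, amount NUMERIC,
        dueDate TEXT);
    INSERT INTO Invoice VALUES
        (10, 1, 100, '2020-12-01', NULL),
        (11, 1, 50,  '2020-12-02', '2020-12-03'),
        (12, 2, 30,  '2020-12-05', NULL);
    -- Deliberately inconsistent: 11 is paid, 99 has no invoice, 12 is absent.
    INSERT INTO unpaid_invoices VALUES
        (10, 1, 100, '2020-12-01'),
        (11, 1, 50,  '2020-12-02'),
        (99, 3, 10,  '2020-12-04');
""")

print(find_unpaid_drift(conn))
```

An empty result means the side table matches the invoices; any rows returned point at the invoices to investigate or repair.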
