How best to handle the storage of historical data?

Time: 2021-11-10 16:58:22

I'm trying to determine how I should store historical transactional data.

Should I store it in a single table where the record just gets reinserted with a new timestamp each time?

Should I break out the historical data into a separate 'history' table and only keep current data in the 'active' table?

If so, how do I best do that? With a trigger that automatically copies the data to the history table? Or with logic in my application?

Update per Welbog's comment:

There will be large amounts of historical data (hundreds of thousands of rows, eventually potentially millions).

The historical data will be used primarily for searches and reporting.

Performance is a concern. The searches shouldn't have to run all night to produce results.

2 answers

#1


8  

If the requirement is solely for reporting, consider building a separate data warehouse. This lets you use data structures such as slowly changing dimensions, which are much better for historical reporting but don't work well in a transactional system. It also moves the historical reporting off your production database, which will be a performance and maintenance win.
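
To make "slowly changing dimension" concrete, here is a minimal sketch of the common Type 2 variant, using Python's built-in sqlite3 so it runs anywhere. The table and column names (dim_customer, customer_id, city, valid_from, valid_to) and both helper functions are made up for illustration; a real warehouse would load these rows through its ETL process rather than ad hoc calls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,  -- surrogate key, one per version
        customer_id  INTEGER NOT NULL,     -- business key from the source system
        city         TEXT NOT NULL,
        valid_from   TEXT NOT NULL,        -- ISO dates compare correctly as text
        valid_to     TEXT                  -- NULL marks the current version
    )
""")

def apply_change(conn, customer_id, city, as_of):
    """Close the current version (if any) and insert the new one."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE dim_customer SET valid_to = ? "
            "WHERE customer_id = ? AND valid_to IS NULL",
            (as_of, customer_id),
        )
        conn.execute(
            "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to) "
            "VALUES (?, ?, ?, NULL)",
            (customer_id, city, as_of),
        )

def city_as_of(conn, customer_id, date):
    """Return the city that was in effect for the customer on the given date."""
    row = conn.execute(
        "SELECT city FROM dim_customer "
        "WHERE customer_id = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?)",
        (customer_id, date, date),
    ).fetchone()
    return row[0] if row else None

apply_change(conn, 1, "Leeds", "2020-01-01")
apply_change(conn, 1, "York", "2021-06-01")
```

Every change adds a row instead of overwriting one, so a report can ask what was true on any past date. That is convenient for reporting but awkward for a transactional system, where most code only wants the current row.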

If you need this history to be available within the application then you should implement some sort of versioning or logical deletion feature, or make everything fully contra and restate (i.e. transactions never get deleted, just reversed out and restated). Think very carefully about whether you really need this, as it will add a lot of complexity. Making a transactional application that can reconstruct historical state correctly is considerably harder than it looks. Financial software (e.g. insurance underwriting systems) fails to do this a lot more often than you might think.
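
As a rough illustration of contra and restate, the sketch below uses an append-only ledger in SQLite via Python's sqlite3 module. The ledger table, its columns, and the post / correct / total_as_of helpers are all hypothetical names invented for this example, not part of any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ledger (
        entry_id   INTEGER PRIMARY KEY,
        txn_id     INTEGER NOT NULL,  -- groups an original with its corrections
        amount     INTEGER NOT NULL,  -- signed, in minor units (e.g. cents)
        entry_type TEXT NOT NULL,     -- 'original', 'contra' or 'restate'
        posted_at  TEXT NOT NULL      -- ISO date
    )
""")

INSERT_SQL = (
    "INSERT INTO ledger (txn_id, amount, entry_type, posted_at) "
    "VALUES (?, ?, ?, ?)"
)

def post(conn, txn_id, amount, posted_at):
    """Record a new transaction."""
    with conn:
        conn.execute(INSERT_SQL, (txn_id, amount, "original", posted_at))

def correct(conn, txn_id, new_amount, posted_at):
    """Reverse the transaction's net effect, then restate it. Nothing is updated or deleted."""
    with conn:
        (net,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM ledger WHERE txn_id = ?",
            (txn_id,),
        ).fetchone()
        conn.execute(INSERT_SQL, (txn_id, -net, "contra", posted_at))
        conn.execute(INSERT_SQL, (txn_id, new_amount, "restate", posted_at))

def total_as_of(conn, date):
    """Sum of everything posted up to and including the given date."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM ledger WHERE posted_at <= ?",
        (date,),
    ).fetchone()
    return total

post(conn, 1, 10000, "2021-01-05")
correct(conn, 1, 12000, "2021-02-01")
```

Because rows are only ever appended, the state as it stood on any earlier date can be rebuilt by ignoring later entries. The hard part in practice is keeping every other table that refers to the transaction consistent with this rule.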

If you need the history solely for audit logging, make shadow tables and audit logging triggers. This is much simpler and more robust than trying to correctly and comprehensively implement audit logging within the application. The triggers will also pick up changes to the database from sources outside the application.
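
A minimal sketch of the shadow-table approach, again using SQLite through Python's sqlite3 module so it is self-contained. The account and account_audit tables and the trigger names are assumptions for illustration, and trigger syntax differs between database engines, so treat this as the shape of the idea rather than something to paste into SQL Server or Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        id      INTEGER PRIMARY KEY,
        owner   TEXT NOT NULL,
        balance INTEGER NOT NULL
    );

    -- Shadow table: same columns as account plus audit metadata.
    CREATE TABLE account_audit (
        audit_id   INTEGER PRIMARY KEY,
        action     TEXT NOT NULL,
        changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        id         INTEGER,
        owner      TEXT,
        balance    INTEGER
    );

    -- Copy the old row image whenever a row is changed or removed.
    CREATE TRIGGER account_audit_update AFTER UPDATE ON account
    BEGIN
        INSERT INTO account_audit (action, id, owner, balance)
        VALUES ('UPDATE', OLD.id, OLD.owner, OLD.balance);
    END;

    CREATE TRIGGER account_audit_delete AFTER DELETE ON account
    BEGIN
        INSERT INTO account_audit (action, id, owner, balance)
        VALUES ('DELETE', OLD.id, OLD.owner, OLD.balance);
    END;
""")

# The application only touches the active table; the triggers do the rest.
conn.execute("INSERT INTO account (id, owner, balance) VALUES (1, 'alice', 100)")
conn.execute("UPDATE account SET balance = 250 WHERE id = 1")
conn.execute("DELETE FROM account WHERE id = 1")

audit_rows = conn.execute(
    "SELECT action, id, owner, balance FROM account_audit ORDER BY audit_id"
).fetchall()
```

Here audit_rows holds the previous image of the row for the update and for the delete, and none of the application statements mention the audit table. The same would hold for a change made by hand from a SQL console.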

#2


2  

This question comes down to business logic. Know your business requirements first, then start from there. A data warehouse is a nice solution for this kind of situation. ETL will give you lots of options for dealing with data flows. Your basic concept of 'History' vs 'Active' is quite correct. Your history data will be more efficient and flexible to work with if it is kept in a data warehouse with its own dimension and fact tables.
