What is the best way to store and search object transactions?

Time: 2022-11-11 16:50:45

We have a decent-sized object-oriented application. Whenever an object in the app is changed, the changes are saved back to the DB. However, this has become less than ideal.

Currently, transactions are stored as a transaction and a set of transactionLI's.

The transaction table has fields for who, what, when, why, foreignKey, and foreignTable. The first four are self-explanatory. foreignKey and foreignTable together identify which object changed.

TransactionLI has timestamp, key, val, oldVal, and a transactionID. This is basically a key/value/oldValue storage system.
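
For readers who want to picture the layout, here is a minimal sketch of the two tables using Python's built-in sqlite3 module. The field list comes from the description above; the table names (txn, txn_li), the snake_case column names, the indexes, and the record_change and history helpers are all assumptions made for illustration, not the application's real schema.

```python
import sqlite3

# Hypothetical schema mirroring the two tables described above.
# TRANSACTION is an SQL keyword, so the header table is called txn here.
SCHEMA = """
CREATE TABLE txn (
    id            INTEGER PRIMARY KEY,
    who           TEXT NOT NULL,
    what          TEXT NOT NULL,
    at            TEXT NOT NULL,   -- the "when" field
    why           TEXT,
    foreign_table TEXT NOT NULL,   -- which kind of object changed
    foreign_key   INTEGER NOT NULL -- which row of that table changed
);
CREATE TABLE txn_li (
    id             INTEGER PRIMARY KEY,
    transaction_id INTEGER NOT NULL REFERENCES txn(id),
    ts             TEXT NOT NULL,
    key            TEXT NOT NULL,
    val            TEXT,
    old_val        TEXT
);
CREATE INDEX txn_object ON txn (foreign_table, foreign_key);
CREATE INDEX txn_li_parent ON txn_li (transaction_id);
"""

def record_change(conn, who, what, why, table, key, changes, now):
    """Store one transaction plus one line item per changed field.

    changes maps a field name to an (old value, new value) pair.
    """
    cur = conn.execute(
        "INSERT INTO txn (who, what, at, why, foreign_table, foreign_key) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (who, what, now, why, table, key),
    )
    conn.executemany(
        "INSERT INTO txn_li (transaction_id, ts, key, val, old_val) "
        "VALUES (?, ?, ?, ?, ?)",
        [(cur.lastrowid, now, k, new, old) for k, (old, new) in changes.items()],
    )
    return cur.lastrowid

def history(conn, table, key):
    """Return (when, field, old, new) rows for one object, oldest first."""
    return conn.execute(
        "SELECT t.at, li.key, li.old_val, li.val FROM txn t "
        "JOIN txn_li li ON li.transaction_id = t.id "
        "WHERE t.foreign_table = ? AND t.foreign_key = ? "
        "ORDER BY t.at, li.id",
        (table, key),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_change(conn, "alice", "update", "typo fix", "customer", 42,
              {"firstName": ("joshua", "Josh")}, "2022-11-11T16:50:45")
rows = history(conn, "customer", 42)
```

Every history lookup in this layout joins the two shared tables, which is why they grow with every object type in the application.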

The problem is that these two tables are used for every object in the application, so they're pretty big tables now. Using them for anything is slow. Indexes only help so much.

So we're thinking about other ways to do something like this. Things we've considered so far:

- Sharding these tables by something like the timestamp.
- Denormalizing the two tables and merging them into one.
- A combination of the two above.
- Serializing each object after a change and storing it in Subversion.
- Probably something else, but I can't think of it right now.

The whole problem is that we'd like to have some mechanism for properly storing and searching through transactional data. Yeah, you can force-feed that into a relational database, but really, it's transactional data and should be stored accordingly.

What is everyone else doing?

4 solutions

#1


1  

We have taken the following approach:

  1. All objects are serialised (using the standard XmlSerializer), but we have decorated our classes with serialisation attributes so that the resultant XML is much smaller (storing elements as attributes and dropping vowels from field names, for example). This could be taken a stage further by compressing the XML if necessary.

  2. The object repository is accessed via a SQL view. The view fronts a number of tables that are identical in structure, each with a GUID appended to the table name. A new table is generated when the previous table has reached critical mass (a predetermined number of rows).

  3. We run a nightly archiving routine that generates the new tables and modifies the views accordingly so that calling applications do not see any differences.

  4. Finally, as part of the overnight routine we archive any old object instances that are no longer required to disk (and then tape).
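
The steps above can be roughly sketched with Python's sqlite3, zlib and uuid modules. This is only an illustration under assumptions: the names object_repository, objects_<guid>, MAX_ROWS, new_table, rebuild_view, nightly_roll and save are made up here, the XML is a stand-in for the real serialised objects, and the final archiving step to disk and tape is left out.

```python
import sqlite3
import uuid
import zlib

MAX_ROWS = 2  # hypothetical "critical mass"; a real value would be far larger

def new_table(conn):
    """Create an empty repository table whose name ends in a fresh GUID."""
    name = "objects_" + uuid.uuid4().hex
    conn.execute(
        f"CREATE TABLE {name} (object_id INTEGER, saved_at TEXT, body BLOB)"
    )
    return name

def rebuild_view(conn, tables):
    """Point the single view that callers query at every table."""
    conn.execute("DROP VIEW IF EXISTS object_repository")
    union = " UNION ALL ".join(f"SELECT * FROM {t}" for t in tables)
    conn.execute(f"CREATE VIEW object_repository AS {union}")

def nightly_roll(conn, tables):
    """Start a new table once the newest one holds MAX_ROWS rows."""
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {tables[-1]}").fetchone()
    if count >= MAX_ROWS:
        tables.append(new_table(conn))
        rebuild_view(conn, tables)

def save(conn, tables, object_id, saved_at, xml_text):
    """Write compressed XML into the newest table."""
    conn.execute(
        f"INSERT INTO {tables[-1]} VALUES (?, ?, ?)",
        (object_id, saved_at, zlib.compress(xml_text.encode("utf-8"))),
    )

conn = sqlite3.connect(":memory:")
tables = [new_table(conn)]
rebuild_view(conn, tables)
for day in range(1, 6):
    save(conn, tables, 7, f"2022-11-{day:02d}", f'<c fn="v{day}"/>')
    nightly_roll(conn, tables)

# Callers only ever see the view, never the individual tables.
versions = [
    zlib.decompress(body).decode("utf-8")
    for (body,) in conn.execute(
        "SELECT body FROM object_repository WHERE object_id = 7 ORDER BY saved_at"
    )
]
```

Because readers go through the view, the nightly routine can add tables without any change to the calling code.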

#2


0  

I've never found a great end-all solution for this type of problem. One thing you can try, if your DB supports partitioning (and even if it doesn't, you can implement the same concept yourself), is to partition this log table by object type; you can then partition further by date/time or by object ID (a numeric ID works nicely; I'm not sure how a GUID would partition).

This will help keep the size of each table manageable and keep all the transactions related to a single instance of an object together.
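
If your DB does not do the partitioning for you, the do-it-yourself version could look something like this Python sketch. Everything in it is an assumption for illustration: the monthly granularity, the log_<type>_<yyyy>_<mm> naming, and the in-memory PartitionedLog class standing in for real tables.

```python
from collections import defaultdict
from datetime import datetime

def partition_name(object_type, changed_at):
    """Name the partition for one log row: object type first, then month."""
    return f"log_{object_type.lower()}_{changed_at:%Y_%m}"

class PartitionedLog:
    """In-memory stand-in for a set of per-partition tables."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def append(self, object_type, object_id, changed_at, change):
        name = partition_name(object_type, changed_at)
        self.partitions[name].append((object_id, changed_at, change))

    def history(self, object_type, object_id, start, end):
        """Scan only partitions of this type whose month overlaps [start, end]."""
        rows = []
        year, month = start.year, start.month
        while (year, month) <= (end.year, end.month):
            name = partition_name(object_type, datetime(year, month, 1))
            rows += [r for r in self.partitions.get(name, [])
                     if r[0] == object_id and start <= r[1] <= end]
            year, month = (year + 1, 1) if month == 12 else (year, month + 1)
        return rows

log = PartitionedLog()
log.append("Customer", 42, datetime(2022, 10, 30), "created")
log.append("Customer", 42, datetime(2022, 11, 11), "renamed")
log.append("Order", 42, datetime(2022, 11, 11), "shipped")
found = log.history("Customer", 42, datetime(2022, 11, 1), datetime(2022, 12, 31))
```

The lookup never touches the Order partition or the October one, which is the whole point of partitioning by type and then by date.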

One idea you could explore: instead of storing each field in a name/value pair table, you could store the data as a blob (either text or binary). For example, serialize the object to XML and store it in a single field.
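
As a rough sketch of that idea in Python (not the .NET serializer itself): the Customer class, the snapshot table and the to_xml and save_snapshot helpers are hypothetical, and the object is assumed to be flat, with no nested objects.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass

@dataclass
class Customer:  # hypothetical object type
    id: int
    firstName: str
    lastName: str

def to_xml(obj):
    """Serialize a flat dataclass to a single XML element, fields as attributes."""
    attrs = {name: str(value) for name, value in asdict(obj).items()}
    return ET.tostring(ET.Element(type(obj).__name__, attrs), encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshot ("
    "object_type TEXT, object_id INTEGER, saved_at TEXT, body TEXT)"
)

def save_snapshot(conn, obj, saved_at):
    """One row per change, whatever the number of fields on the object."""
    conn.execute(
        "INSERT INTO snapshot VALUES (?, ?, ?, ?)",
        (type(obj).__name__, obj.id, saved_at, to_xml(obj)),
    )

save_snapshot(conn, Customer(42, "joshua", "boxer"), "2022-11-10")
save_snapshot(conn, Customer(42, "Josh", "Box"), "2022-11-11")

(latest_body,) = conn.execute(
    "SELECT body FROM snapshot WHERE object_type = 'Customer' AND object_id = 42 "
    "ORDER BY saved_at DESC LIMIT 1"
).fetchone()
latest = ET.fromstring(latest_body).attrib
```

Compared with the name/value table, each change costs one row instead of one row per field, at the price of storing unchanged fields again.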

The downside is that as your object changes, you have to consider how that affects all the historical data. If you're using XML, there are easy ways to update the historical XML structures; if you're using binary, there are ways, but you have to be more conscious of the effort.
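
To give one idea of what updating historical XML can mean, here is a small Python sketch that renames attributes in an old snapshot. The upgrade function and the fname/firstName rename are invented for the example; it assumes the fields are stored as attributes on a single element.

```python
import xml.etree.ElementTree as ET

def upgrade(xml_text, renames):
    """Return the snapshot with old attribute names replaced by new ones."""
    root = ET.fromstring(xml_text)
    for old, new in renames.items():
        if old in root.attrib:
            root.set(new, root.attrib.pop(old))
    return ET.tostring(root, encoding="unicode")

# A snapshot written before a hypothetical rename of fname to firstName.
old_snapshot = '<Customer id="1" fname="Josh"/>'
upgraded = ET.fromstring(upgrade(old_snapshot, {"fname": "firstName"})).attrib
```

A binary blob would need the old class definition (or a custom reader) to do the same thing, which is the extra effort mentioned above.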

I've had awesome success storing a rather complex object model with tons of interrelations as a blob (the XML serializer in .NET didn't handle the relationships between the objects). I could very easily see myself storing the binary data. A huge downside of storing it as binary data is that you have to take it out of the database to access it; with XML, if you're using a modern database like MSSQL, you can access the data directly in the database.

One last approach is a middle ground between the two patterns: you could define a difference schema (I assume more than one property changes at a time). For example, imagine storing this XML:

<objectDiff>
<field name="firstName" newValue="Josh" oldValue="joshua"/>
<field name="lastName" newValue="Box" oldValue="boxer"/>
</objectDiff>

This will help reduce the number of rows, and if you're using MSSQL you can define an XML Schema and get some rich querying ability around the object. You can still partition the table.
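
Producing and replaying a diff document like the one above could look roughly like this in Python. The object_diff and apply_diff helpers are hypothetical, objects are assumed to be flat dicts of strings, and a missing value is written as an empty string, so this sketch cannot tell a missing field from an empty one.

```python
import xml.etree.ElementTree as ET

def object_diff(old, new):
    """Build an <objectDiff> element listing every field whose value changed."""
    root = ET.Element("objectDiff")
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before != after:
            ET.SubElement(root, "field", {
                "name": name,
                "newValue": "" if after is None else str(after),
                "oldValue": "" if before is None else str(before),
            })
    return root

def apply_diff(state, diff):
    """Replay a diff on a dict of field values and return the new dict."""
    updated = dict(state)
    for field in diff.iter("field"):
        updated[field.get("name")] = field.get("newValue")
    return updated

old = {"firstName": "joshua", "lastName": "boxer", "city": "Leeds"}
new = {"firstName": "Josh", "lastName": "Box", "city": "Leeds"}
diff = object_diff(old, new)
changed = [field.get("name") for field in diff]
stored = ET.tostring(diff, encoding="unicode")  # the text that goes in the row
```

Both changed fields end up in one stored document, so one save is one row, and the unchanged city field is not stored at all.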

Josh

#3


0  

Depending on the characteristics of your specific application, an alternative approach is to keep revisions of the entities themselves in their respective tables, together with the who, what, why and when per revision. The who, what and when can still be foreign keys.

I would be very careful with this approach, though, since it is only viable for applications with a relatively small number of changes per entity/entity type.
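
A minimal sketch of the revision idea with Python's sqlite3, assuming a hypothetical customer entity. The customer_revision table, its columns and the two helpers are made up for illustration, and who/what are stored as plain text here rather than as foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer_revision (
    customer_id INTEGER NOT NULL,
    revision    INTEGER NOT NULL,
    first_name  TEXT,
    last_name   TEXT,
    who         TEXT NOT NULL,
    what        TEXT NOT NULL,
    why         TEXT,
    changed_at  TEXT NOT NULL,
    PRIMARY KEY (customer_id, revision)
)""")

def save_revision(conn, customer_id, first_name, last_name,
                  who, what, why, changed_at):
    """Insert the next revision instead of updating the row in place."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(revision), 0) FROM customer_revision "
        "WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO customer_revision VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (customer_id, latest + 1, first_name, last_name,
         who, what, why, changed_at),
    )
    return latest + 1

def current(conn, customer_id):
    """The current state of an entity is simply its highest revision."""
    return conn.execute(
        "SELECT revision, first_name, last_name FROM customer_revision "
        "WHERE customer_id = ? ORDER BY revision DESC LIMIT 1",
        (customer_id,),
    ).fetchone()

save_revision(conn, 42, "joshua", "boxer", "alice", "create", None, "2022-11-10")
save_revision(conn, 42, "Josh", "Box", "bob", "rename", "customer request", "2022-11-11")
now = current(conn, 42)
```

Every change copies the whole row, which is why this only stays manageable when entities change rarely.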

#4


0  

If querying the data is important, I would use true partitioning, which is available in SQL Server 2005 and above if you have the Enterprise edition. We have millions of rows partitioned by year, down to the day for the current month. You can be as granular as your application demands, up to a maximum of 1,000 partitions.
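
To show what "by year, down to the day for the current month" might mean in terms of boundary values, here is a Python sketch. It is one possible reading, not the actual setup: yearly boundaries for past years, monthly ones for the current year and daily ones for the current month are assumptions, as are the boundaries and partition_index helpers. Indexes here start at 0, whereas SQL Server numbers partitions from 1.

```python
from bisect import bisect_right
from datetime import date, timedelta

def boundaries(first_year, today):
    """Boundary dates: yearly for past years, monthly this year, daily this month."""
    points = [date(y, 1, 1) for y in range(first_year, today.year + 1)]
    points += [date(today.year, m, 1) for m in range(2, today.month + 1)]
    day = date(today.year, today.month, 2)
    while day <= today:
        points.append(day)
        day += timedelta(days=1)
    return points

def partition_index(points, when):
    """Index of the partition holding `when`.

    A boundary value belongs to the partition on its right,
    and N boundaries give N + 1 partitions.
    """
    return bisect_right(points, when)

points = boundaries(2018, date(2022, 11, 11))
partition_count = len(points) + 1
old_row = partition_index(points, date(2020, 6, 1))
todays_row = partition_index(points, date(2022, 11, 11))
```

With five years of history this comes to a few dozen partitions, well inside the limit mentioned above.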

Alternatively, if you are using SQL Server 2008, you could look into filtered indexes.
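
The idea behind a filtered index can be tried out with SQLite's partial indexes, which work on the same principle. This Python sketch uses SQLite, not SQL Server, and the txn table, its columns and the txn_customer index are assumptions carried over for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txn ("
    "id INTEGER PRIMARY KEY, foreign_table TEXT, foreign_key INTEGER, at TEXT)"
)
# The index only contains rows for one object type, so it stays small
# even though the table holds changes for every type.
conn.execute(
    "CREATE INDEX txn_customer ON txn (foreign_key, at) "
    "WHERE foreign_table = 'customer'"
)
conn.executemany(
    "INSERT INTO txn (foreign_table, foreign_key, at) VALUES (?, ?, ?)",
    [("customer", 42, "2022-11-10"), ("order", 42, "2022-11-10"),
     ("customer", 42, "2022-11-11"), ("customer", 7, "2022-11-11")],
)
query = (
    "SELECT at FROM txn "
    "WHERE foreign_table = 'customer' AND foreign_key = ? ORDER BY at"
)
changes = [at for (at,) in conn.execute(query, (42,))]
# The last column of each plan row describes how SQLite will run the query.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (42,))]
```

Note that the object type is written as a literal in the query so that it matches the index condition; printing plan shows whether the index is actually picked up on your version.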

These are solutions that will enable you to retain the simplified structure you have whilst providing the performance you need to query that data.

Splitting/archiving older changes should obviously be considered as well.
