表的垂直分区的优点

时间:2022-09-16 11:22:14

(Note that this situation isn't exactly how it is, but I made this as an example)

(请注意,这种情况并非如此,但我以此为例)

I have an entity in a table with data that is updated every 5 seconds (Kinematic Data: Speed, Heading, Lat, Long, and PositionTime), and other data that is updated hardly at all, if ever (Color, Make, OriginTime).

我在表中有一个实体,每5秒更新一次数据(运动数据:速度,标题,纬度,长度和位置时间),以及几乎没有更新的其他数据(如果有的话)(Color,Make,OriginTime) 。

alt text http://www.freeimagehosting.net/uploads/a67205e99e.jpg

alt text http://www.freeimagehosting.net/uploads/a67205e99e.jpg

Now my boss wants me to partition this data into separate tables in our database (With a One to One Relationship), as so:

现在我的老板要我将这些数据分成我们数据库中的单独表(具有一对一关系),如下所示:

alt text http://www.freeimagehosting.net/uploads/1c699bc3c5.jpg

alt text http://www.freeimagehosting.net/uploads/1c699bc3c5.jpg

He makes it sound "obvious" that it should be this way, but is there really any advantages to having this data separated as so for inserting and updating (For instance if I put an index on Color or Make)?

他让这听起来很“明显”应该是这样的,但将这些数据分开是为了插入和更新真的有什么好处(例如,如果我在Color或Make上放一个索引)?

4 个解决方案

#1


5  

It might make sense to do vertical partitioning like this. Or it might not.

像这样进行垂直分区可能是有意义的。或者它可能不会。

When you use a MVCC based engine, each time you update a row, it generally* copies the entire row and creates a new one with the modifications. This is so that other transactions which do not yet see the update can continue to read the original row if they need to.

当您使用基于MVCC的引擎时,每次更新行时,它通常*复制整行并创建一个带有修改的新行。这样,如果需要,其他尚未看到更新的事务可以继续读取原始行。

This means that updating a few small columns frequently in a very wide row causes the database to do a lot more writes than it needs to.

这意味着频繁地在非常宽的行中更新一些小列会导致数据库执行比它需要的更多写入。

But not that many, because generally the engine will only sync its transaction log, which will be the same size regardless of the size of non-updated columns, and also because the data rows are usually stored in blocks where a whole block needs to be written anyway, regardless of how much of it changed.

但不是很多,因为通常引擎只会同步其事务日志,无论非更新列的大小如何,它都将是相同的大小,并且因为数据行通常存储在需要整个块的块中无论如何,无论多少改变都写了。

So it sounds like a potentially pointless optimisation, which like any other, should be considered on the grounds of a) IS there really a performance problem (i.e. is ANY optimisation needed) and b) Is this particular optimisation the best way of fixing it?

因此,它听起来像一个潜在的无意义优化,与其他任何一样,都应该考虑以下因素:a)确实存在性能问题(即需要进行任何优化)和b)这种特殊优化是否是修复它的最佳方法?

I think the chance of a) is unlikely, and b) is also unlikely, so the chances of this being required is approximately unlikely-squared.

我认为a)的可能性不太可能,而且b)也不太可能,因此需要这种情况的可能性几乎不大。

* Some engines make an exception for very large columns such as big BLOBs or text columns, which are held elsewhere and not copied if other columns in the row are updated.

*某些引擎会对非常大的列(如大型BLOB或文本列)进行例外处理,这些列位于其他位置,如果更新了行中的其他列,则不会复制。

#2


1  

If the point of this design is to maintain a history of the kinematic data then the design makes sense. Although there doesn't seem to be a key in the CAR_KINEMATIC table which fits that usage. If on the other there is a one-to-one relationship between these two tables the division is meritless.

如果这个设计的目的是保持运动学数据的历史,那么设计是有意义的。虽然CAR_KINEMATIC表中似乎没有符合该用法的键。如果另一方在这两个表之间存在一对一的关系,则除法是无效的。

#3


0  

I'm not sure that the question is fully clear. If you want to main a history of the kinematics then the appropriate structure would be to normalise the data into the car data and the heading data. The car data can be updated independently and would probably be much smaller overall than the kinematics data.

我不确定这个问题是否完全清楚。如果您想要掌握运动学的历史,那么适当的结构就是将数据标准化为汽车数据和航向数据。汽车数据可以独立更新,并且可能比运动学数据整体小得多。

If you want to maintain a flat record with the current state of the car, rather than maintaining a history, then leaving the data as it is is likely to be faster. The reason for this is that writing the whole record will probably entail just a single write operation in the majority of cases. Splitting it out into two tables guarantees that there will be at least two write operations.

如果您希望保持与汽车当前状态的平坦记录,而不是维护历史记录,那么保留数据可能会更快。这样做的原因是在大多数情况下写入整个记录可能只需要一次写操作。将其拆分为两个表可确保至少有两个写操作。

In the first case you are just normalising the data; in the second case the current data structure is probably the most efficient.

在第一种情况下,您只是规范化数据;在第二种情况下,当前的数据结构可能是最有效的。

Vertical partitioning is actually not that commonly used (except when it is, see below). Some scenarios where you might want to use vertical partitioning are:

垂直分区实际上并不常用(除非它是,见下文)。您可能希望使用垂直分区的一些场景是:

  • The table is very wide and only some of it is used frequently. For example, if you have a table with 250 columns with 5 getting frequent state changes updated and a small subset of columns being used frequently by the application.

    该表非常宽,只有一些经常使用。例如,如果您有一个包含250列的表,其中5列表更新了频繁的状态更改,并且应用程序经常使用一小部分列。

  • For security reasons, you may have a mix of confidential and not-so-sensitive data that lives in a 1:1 relationship. You can move the confidential data into another table with a different set of permissions. Historically, not all DBMS platforms allowed you to set permissions at column level.

    出于安全原因,您可能会以1:1的关系混合使用机密和不那么敏感的数据。您可以将机密数据移动到具有不同权限集的另一个表中。从历史上看,并非所有DBMS平台都允许您在列级别设置权限。

  • A combination of the previous two, where changes to certain fields must be logged to an audit table, but other fields are updated very frequently without the requirement for logging. In order to avoid generating lots of spurious audit logging data the auditable fields can live in their own table with audit logging triggers.

    前两者的组合,其中必须将对某些字段的更改记录到审计表中,但其他字段会非常频繁地更新,而不需要记录。为了避免生成大量虚假审计日志记录数据,可审计字段可以使用审计日志记录触发器存在于自己的表中。

Finally, you do get vertical partitioning behind the scenes in certain circumstances (i.e. it is not explicit in the schema but the physical storage works in this way). For example, many DBMS platforms store LOBs separately from normal table data, which results in a sort of implicit vertical partitioning of the table.

最后,在某些情况下,您确实在后台进行垂直分区(即,它在模式中不明确,但物理存储以这种方式工作)。例如,许多DBMS平台将LOB与正常表数据分开存储,这导致表的一种隐式垂直分区。

In fact, this particular situation makes tables with LOB columns quite expensive to do operations on, so moving the LOB column off into a separate table might well be a good application for vertical partitioning.

实际上,这种特殊情况使得具有LOB列的表在进行操作时非常昂贵,因此将LOB列移到单独的表中可能是垂直分区的良好应用。

There are not many uses for vertical partitioning, and it always adds the overhead of additional I/O. You need to be avoiding a large overhead or have specific reasons such as security concerns for there to be much point in using it.

垂直分区的用途并不多,而且总是会增加额外I / O的开销。您需要避免大量开销或具有特定原因(例如安全性问题)才能使用它。

#4


0  

Your boss is right. And this has nothing to do with "partitioning", its called normalization.

你的老板是对的。这与“分区”无关,它称为规范化。

Read this article.

阅读这篇文章。

EDIT: Ok, "vertical partitioning" is a well-known term, and normalization is one method of vertical partitioning. But in this case, normalization seems to be the right answer, which explains the question (Quote: "... is there really any advantages to having this data separated as so for inserting and updating"). Advantages and disadvantages of normalization are very well-known. The wikipedia article is a good starting point.

编辑:好的,“垂直分区”是一个众所周知的术语,规范化是垂直分区的一种方法。但在这种情况下,规范化似乎是正确的答案,这解释了这个问题(引用:“......将这些数据分开以实现插入和更新真的有任何优势”)。归一化的优点和缺点是众所周知的。*的文章是一个很好的起点。

And BTW, to keep the flames of "Erwin Smout" burning: "vertical decomposition" does not seem to be a commonly used term here. Right?

而BTW,为了保持“Erwin Smout”燃烧的火焰:“垂直分解”似乎不是这里常用的术语。对?

#1


5  

It might make sense to do vertical partitioning like this. Or it might not.

像这样进行垂直分区可能是有意义的。或者它可能不会。

When you use a MVCC based engine, each time you update a row, it generally* copies the entire row and creates a new one with the modifications. This is so that other transactions which do not yet see the update can continue to read the original row if they need to.

当您使用基于MVCC的引擎时,每次更新行时,它通常*复制整行并创建一个带有修改的新行。这样,如果需要,其他尚未看到更新的事务可以继续读取原始行。

This means that updating a few small columns frequently in a very wide row causes the database to do a lot more writes than it needs to.

这意味着频繁地在非常宽的行中更新一些小列会导致数据库执行比它需要的更多写入。

But not that many, because generally the engine will only sync its transaction log, which will be the same size regardless of the size of non-updated columns, and also because the data rows are usually stored in blocks where a whole block needs to be written anyway, regardless of how much of it changed.

但不是很多,因为通常引擎只会同步其事务日志,无论非更新列的大小如何,它都将是相同的大小,并且因为数据行通常存储在需要整个块的块中无论如何,无论多少改变都写了。

So it sounds like a potentially pointless optimisation, which like any other, should be considered on the grounds of a) IS there really a performance problem (i.e. is ANY optimisation needed) and b) Is this particular optimisation the best way of fixing it?

因此,它听起来像一个潜在的无意义优化,与其他任何一样,都应该考虑以下因素:a)确实存在性能问题(即需要进行任何优化)和b)这种特殊优化是否是修复它的最佳方法?

I think the chance of a) is unlikely, and b) is also unlikely, so the chances of this being required is approximately unlikely-squared.

我认为a)的可能性不太可能,而且b)也不太可能,因此需要这种情况的可能性几乎不大。

* Some engines make an exception for very large columns such as big BLOBs or text columns, which are held elsewhere and not copied if other columns in the row are updated.

*某些引擎会对非常大的列(如大型BLOB或文本列)进行例外处理,这些列位于其他位置,如果更新了行中的其他列,则不会复制。

#2


1  

If the point of this design is to maintain a history of the kinematic data then the design makes sense. Although there doesn't seem to be a key in the CAR_KINEMATIC table which fits that usage. If on the other there is a one-to-one relationship between these two tables the division is meritless.

如果这个设计的目的是保持运动学数据的历史,那么设计是有意义的。虽然CAR_KINEMATIC表中似乎没有符合该用法的键。如果另一方在这两个表之间存在一对一的关系,则除法是无效的。

#3


0  

I'm not sure that the question is fully clear. If you want to main a history of the kinematics then the appropriate structure would be to normalise the data into the car data and the heading data. The car data can be updated independently and would probably be much smaller overall than the kinematics data.

我不确定这个问题是否完全清楚。如果您想要掌握运动学的历史,那么适当的结构就是将数据标准化为汽车数据和航向数据。汽车数据可以独立更新,并且可能比运动学数据整体小得多。

If you want to maintain a flat record with the current state of the car, rather than maintaining a history, then leaving the data as it is is likely to be faster. The reason for this is that writing the whole record will probably entail just a single write operation in the majority of cases. Splitting it out into two tables guarantees that there will be at least two write operations.

如果您希望保持与汽车当前状态的平坦记录,而不是维护历史记录,那么保留数据可能会更快。这样做的原因是在大多数情况下写入整个记录可能只需要一次写操作。将其拆分为两个表可确保至少有两个写操作。

In the first case you are just normalising the data; in the second case the current data structure is probably the most efficient.

在第一种情况下,您只是规范化数据;在第二种情况下,当前的数据结构可能是最有效的。

Vertical partitioning is actually not that commonly used (except when it is, see below). Some scenarios where you might want to use vertical partitioning are:

垂直分区实际上并不常用(除非它是,见下文)。您可能希望使用垂直分区的一些场景是:

  • The table is very wide and only some of it is used frequently. For example, if you have a table with 250 columns with 5 getting frequent state changes updated and a small subset of columns being used frequently by the application.

    该表非常宽,只有一些经常使用。例如,如果您有一个包含250列的表,其中5列表更新了频繁的状态更改,并且应用程序经常使用一小部分列。

  • For security reasons, you may have a mix of confidential and not-so-sensitive data that lives in a 1:1 relationship. You can move the confidential data into another table with a different set of permissions. Historically, not all DBMS platforms allowed you to set permissions at column level.

    出于安全原因,您可能会以1:1的关系混合使用机密和不那么敏感的数据。您可以将机密数据移动到具有不同权限集的另一个表中。从历史上看,并非所有DBMS平台都允许您在列级别设置权限。

  • A combination of the previous two, where changes to certain fields must be logged to an audit table, but other fields are updated very frequently without the requirement for logging. In order to avoid generating lots of spurious audit logging data the auditable fields can live in their own table with audit logging triggers.

    前两者的组合,其中必须将对某些字段的更改记录到审计表中,但其他字段会非常频繁地更新,而不需要记录。为了避免生成大量虚假审计日志记录数据,可审计字段可以使用审计日志记录触发器存在于自己的表中。

Finally, you do get vertical partitioning behind the scenes in certain circumstances (i.e. it is not explicit in the schema but the physical storage works in this way). For example, many DBMS platforms store LOBs separately from normal table data, which results in a sort of implicit vertical partitioning of the table.

最后,在某些情况下,您确实在后台进行垂直分区(即,它在模式中不明确,但物理存储以这种方式工作)。例如,许多DBMS平台将LOB与正常表数据分开存储,这导致表的一种隐式垂直分区。

In fact, this particular situation makes tables with LOB columns quite expensive to do operations on, so moving the LOB column off into a separate table might well be a good application for vertical partitioning.

实际上,这种特殊情况使得具有LOB列的表在进行操作时非常昂贵,因此将LOB列移到单独的表中可能是垂直分区的良好应用。

There are not many uses for vertical partitioning, and it always adds the overhead of additional I/O. You need to be avoiding a large overhead or have specific reasons such as security concerns for there to be much point in using it.

垂直分区的用途并不多,而且总是会增加额外I / O的开销。您需要避免大量开销或具有特定原因(例如安全性问题)才能使用它。

#4


0  

Your boss is right. And this has nothing to do with "partitioning", its called normalization.

你的老板是对的。这与“分区”无关,它称为规范化。

Read this article.

阅读这篇文章。

EDIT: Ok, "vertical partitioning" is a well-known term, and normalization is one method of vertical partitioning. But in this case, normalization seems to be the right answer, which explains the question (Quote: "... is there really any advantages to having this data separated as so for inserting and updating"). Advantages and disadvantages of normalization are very well-known. The wikipedia article is a good starting point.

编辑:好的,“垂直分区”是一个众所周知的术语,规范化是垂直分区的一种方法。但在这种情况下,规范化似乎是正确的答案,这解释了这个问题(引用:“......将这些数据分开以实现插入和更新真的有任何优势”)。归一化的优点和缺点是众所周知的。*的文章是一个很好的起点。

And BTW, to keep the flames of "Erwin Smout" burning: "vertical decomposition" does not seem to be a commonly used term here. Right?

而BTW,为了保持“Erwin Smout”燃烧的火焰:“垂直分解”似乎不是这里常用的术语。对?