增量列是否使列上的b-tree索引不平衡?

时间:2022-08-27 10:59:02

I have been thinking about two questions. Couldn't find any resources on the internet about this. How do dbms handle it ? Or do they ? Especially Oracle.

我一直在考虑两个问题。无法在互联网上找到任何关于此的资源。 dbms如何处理它?或者他们呢?特别是Oracle。

Before the questions, here is an example: Say I have a master table "MASTER" and slave table "SLAVE". Master table has an "ID" column which is the primary key and index is created by Oracle.Slave table has the foreign key "MASTER_ID" which refers to master table and "SLAVE_NO". These two together is the primary key of slave table, which is again indexed.

在提出问题之前,这里有一个例子:假设我有一个主表“MASTER”和从表“SLAVE”。主表有一个“ID”列,它是主键,索引由Oracle创建.Slave表具有引用主表和“SLAVE_NO”的外键“MASTER_ID”。这两个一起是slave表的主键,它再次被索引。

 **MASTER**  |  **SLAVE**
     (P) ID <------> (P)(F) MASTER_ID 
                     (P) SLAVE_NO

Now the questions;

现在的问题;

1- If MASTER_ID is an autoincremented column, and no record is ever deleted, doesn't this get the table's index unbalanced ? Does Oracle rebuilds indexes periodically ? As far as i know Oracle only balances index branches at build time. Does Oracle re-builds indexes Automatically ever ? Say if the level goes up to some high levels ?

1-如果MASTER_ID是一个自动增量列,并且没有删除任何记录,这是否会使表的索引不平衡? Oracle是否会定期重建索引?据我所知,Oracle只在构建时平衡索引分支。 Oracle是否会自动重建索引?如果水平上升到某个高水平?

2- Assuming Oracle does not rebuild automatically, apart from scheduling a job that rebuilds index periodically, would it be wiser to order SLAVE table's primary key columns reverse ? I mean instead of "MASTER_ID", "SLAVE_NO" ordering it as "SLAVE_NO", "MASTER_ID"i, would it help the slave table's b-tree index be more balanced ? (Well each master table might not have exact number of slave records, but still, seems better than reverse order)

2-假设Oracle不会自动重建,除了安排定期重建索引的作业之外,订购SLAVE表的主键列是否更明智?我的意思是代替“MASTER_ID”,“SLAVE_NO”将其命名为“SLAVE_NO”,“MASTER_ID”i,它会帮助奴隶表的b树索引更加平衡吗? (那么每个主表可能没有确切数量的从属记录,但仍然看起来好于逆序)

Anyone know anything about that ? Or opinions ?

任何人都知道这件事吗?还是意见?

2 个解决方案

#1


7  

If MASTER_ID is an autoincremented column, and no record is ever deleted, doesn't this get the table's index unbalanced ?

如果MASTER_ID是一个自动增量列,并且没有删除任何记录,这是否会使表的索引不平衡?

Oracle's indexes are never "unbalanced": every leaf in the index is at the same depth as any other leaf.

Oracle的索引永远不会“不平衡”:索引中的每个叶子都与任何其他叶子处于同一深度。

No page split introduces a new level by itself: a leaf page does not become a parent for new pages like it would be on a non-self-balancing tree.

没有页面拆分本身会引入一个新的级别:叶子页面不会成为新页面的父级,就像在非自平衡树上一样。

Instead, a sibling for the split page is made and the new record (plus possibly some of the records from the old page) go to the new page. A pointer to the new page is added to the parent.

相反,会生成拆分页面的兄弟,并且新记录(加上可能是旧页面中的一些记录)将转到新页面。指向新页面的指针将添加到父页面。

If the parent page is out of space too (can't accept the pointer to the newly created leaf page), it gets split as well, and so on.

如果父页面也是空间不足(无法接受指向新创建的叶页的指针),它也会被拆分,依此类推。

These splits can propagate up to the root page, whose split is the only thing which increases the index depth (and does it for all pages at once).

这些拆分可以传播到根页面,根页面的分割是增加索引深度的唯一因素(并且一次性对所有页面执行此操作)。

Index pages are additionally organized into double-linked lists, each list on its own level. This would be impossible if the tree were unbalanced.

索引页面另外被组织成双链表,每个列表都在其自己的级别上。如果树不平衡,这将是不可能的。

If master_id is auto-incremented this means that all splits occur at the end (such called 90/10 splits) which makes the most dense index possible.

如果master_id是自动递增的,则意味着所有拆分都在最后发生(例如称为90/10拆分),这使得最密集的索引成为可能。

Would it be wiser to order SLAVE table's primary key columns reverse?

订购SLAVE表的主键列是否更明智?

No, it would not, for the reasons above.

不,由于上述原因,它不会。

If you join slave to master often, you may consider creating a CLUSTER of the two tables, indexed by master_id. This means that the records from both tables, sharing the same master_id, go to the same or nearby data pages which makes a join between them very fast.

如果经常将slave连接到master,您可以考虑创建两个表的CLUSTER,由master_id索引。这意味着来自两个表的共享相同master_id的记录转到相同或附近的数据页面,这使得它们之间的连接非常快。

When the engine found a record from master, with an index or whatever, this also means it already has found the records from slave to be joined with that master. And vice versa, locating a slave also means locating its master.

当引擎从master发现一条记录,带有索引或其他内容时,这也意味着它已经找到了slave中与该master连接的记录。反之亦然,定位奴隶也意味着找到它的主人。

#2


4  

The b-tree index on MASTER_ID will remain balanced for most useful definitions of "balanced". In particular, the distance between the root block and any child block will always be the same and the amount of data in any branch will be at least roughly euqal to the amount of data in any other sibling branch.

对于“平衡”的大多数有用定义,MASTER_ID上的b树索引将保持平衡。特别是,根块和任何子块之间的距离将始终相同,并且任何分支中的数据量将至少与任何其他同级分支中的数据量大致相等。

As you insert data into an index on a sequence-generated column, Oracle will perform 90-10 block splits on the leaves when the amount of data in any particular level increases. If, for example, you have a leaf block that can hold 10 rows, when it is full and you want to add an 11th row, Oracle creates a new block, leaves the first 9 entries in the first block, puts 2 entries in the new block, and updates the parent block with the address of the new block. If the parent block needs to split because it is holding addresses for too many children, a similar process takes place. That leaves the indexes relatively balanced throughout their life. Richard Foote (the expert on Oracle indexes) has an excellent blog on When Does an Oracle Index Increase in Height that goes into much more detail about how this works.

当您将数据插入到序列生成列的索引中时,当任何特定级别的数据量增加时,Oracle将对叶子执行90-10个块分割。例如,如果您有一个可以容纳10行的叶子块,当它已满并且您想要添加第11行时,Oracle会创建一个新块,在第一个块中保留前9个条目,将2个条目放入新块,并使用新块的地址更新父块。如果父块需要拆分,因为它为太多子节点保存地址,则会发生类似的过程。这使得指数在其整个生命中相对平衡。 Richard Foote(Oracle索引专家)有一篇很棒的博客,关于何时Oracle索引增加高度,更详细地介绍了它的工作原理。

The only case that you potentially have to worry about an index becoming skewed is when you regularly delete most but not all blocks from the left-hand side of the index. If, for example, you decide to delete 90% of the data from the left-hand side of the index leaving one row in every leaf node pointing to data, your index can become unbalanced in the sense that some paths lead to vastly more data than other paths. Oracle doesn't reclaim index leaf nodes until they are completely empty so you can end up with the left-hand side of the index using a lot more space than is really necessary. This doesn't really affect the performance of the system much-- it is more of a space utilization issue-- but it can be fixed by coalescing the index after you've purged the data (or structuring your purges so that you don't leave lots of sparse blocks).

您可能不得不担心索引变得偏斜的唯一情况是,您经常从索引的左侧删除大多数但不是所有块。例如,如果您决定从索引的左侧删除90%的数据,而每个叶节点中的一行指向数据,那么在某些路径导致数据量大得多的意义上,您的索引可能会变得不平衡比其他路径。 Oracle不会回收索引叶子节点,直到它们完全为空,因此您可以使用比实际需要更多的空间来结束索引的左侧。这并不会真正影响系统的性能 - 它更多地是一个空间利用问题 - 但是可以通过在清除数据后合并索引来修复(或者构建清除以便你不要留下许多稀疏的块)。

#1


7  

If MASTER_ID is an autoincremented column, and no record is ever deleted, doesn't this get the table's index unbalanced ?

如果MASTER_ID是一个自动增量列,并且没有删除任何记录,这是否会使表的索引不平衡?

Oracle's indexes are never "unbalanced": every leaf in the index is at the same depth as any other leaf.

Oracle的索引永远不会“不平衡”:索引中的每个叶子都与任何其他叶子处于同一深度。

No page split introduces a new level by itself: a leaf page does not become a parent for new pages like it would be on a non-self-balancing tree.

没有页面拆分本身会引入一个新的级别:叶子页面不会成为新页面的父级,就像在非自平衡树上一样。

Instead, a sibling for the split page is made and the new record (plus possibly some of the records from the old page) go to the new page. A pointer to the new page is added to the parent.

相反,会生成拆分页面的兄弟,并且新记录(加上可能是旧页面中的一些记录)将转到新页面。指向新页面的指针将添加到父页面。

If the parent page is out of space too (can't accept the pointer to the newly created leaf page), it gets split as well, and so on.

如果父页面也是空间不足(无法接受指向新创建的叶页的指针),它也会被拆分,依此类推。

These splits can propagate up to the root page, whose split is the only thing which increases the index depth (and does it for all pages at once).

这些拆分可以传播到根页面,根页面的分割是增加索引深度的唯一因素(并且一次性对所有页面执行此操作)。

Index pages are additionally organized into double-linked lists, each list on its own level. This would be impossible if the tree were unbalanced.

索引页面另外被组织成双链表,每个列表都在其自己的级别上。如果树不平衡,这将是不可能的。

If master_id is auto-incremented this means that all splits occur at the end (such called 90/10 splits) which makes the most dense index possible.

如果master_id是自动递增的,则意味着所有拆分都在最后发生(例如称为90/10拆分),这使得最密集的索引成为可能。

Would it be wiser to order SLAVE table's primary key columns reverse?

订购SLAVE表的主键列是否更明智?

No, it would not, for the reasons above.

不,由于上述原因,它不会。

If you join slave to master often, you may consider creating a CLUSTER of the two tables, indexed by master_id. This means that the records from both tables, sharing the same master_id, go to the same or nearby data pages which makes a join between them very fast.

如果经常将slave连接到master,您可以考虑创建两个表的CLUSTER,由master_id索引。这意味着来自两个表的共享相同master_id的记录转到相同或附近的数据页面,这使得它们之间的连接非常快。

When the engine found a record from master, with an index or whatever, this also means it already has found the records from slave to be joined with that master. And vice versa, locating a slave also means locating its master.

当引擎从master发现一条记录,带有索引或其他内容时,这也意味着它已经找到了slave中与该master连接的记录。反之亦然,定位奴隶也意味着找到它的主人。

#2


4  

The b-tree index on MASTER_ID will remain balanced for most useful definitions of "balanced". In particular, the distance between the root block and any child block will always be the same and the amount of data in any branch will be at least roughly euqal to the amount of data in any other sibling branch.

对于“平衡”的大多数有用定义,MASTER_ID上的b树索引将保持平衡。特别是,根块和任何子块之间的距离将始终相同,并且任何分支中的数据量将至少与任何其他同级分支中的数据量大致相等。

As you insert data into an index on a sequence-generated column, Oracle will perform 90-10 block splits on the leaves when the amount of data in any particular level increases. If, for example, you have a leaf block that can hold 10 rows, when it is full and you want to add an 11th row, Oracle creates a new block, leaves the first 9 entries in the first block, puts 2 entries in the new block, and updates the parent block with the address of the new block. If the parent block needs to split because it is holding addresses for too many children, a similar process takes place. That leaves the indexes relatively balanced throughout their life. Richard Foote (the expert on Oracle indexes) has an excellent blog on When Does an Oracle Index Increase in Height that goes into much more detail about how this works.

当您将数据插入到序列生成列的索引中时,当任何特定级别的数据量增加时,Oracle将对叶子执行90-10个块分割。例如,如果您有一个可以容纳10行的叶子块,当它已满并且您想要添加第11行时,Oracle会创建一个新块,在第一个块中保留前9个条目,将2个条目放入新块,并使用新块的地址更新父块。如果父块需要拆分,因为它为太多子节点保存地址,则会发生类似的过程。这使得指数在其整个生命中相对平衡。 Richard Foote(Oracle索引专家)有一篇很棒的博客,关于何时Oracle索引增加高度,更详细地介绍了它的工作原理。

The only case that you potentially have to worry about an index becoming skewed is when you regularly delete most but not all blocks from the left-hand side of the index. If, for example, you decide to delete 90% of the data from the left-hand side of the index leaving one row in every leaf node pointing to data, your index can become unbalanced in the sense that some paths lead to vastly more data than other paths. Oracle doesn't reclaim index leaf nodes until they are completely empty so you can end up with the left-hand side of the index using a lot more space than is really necessary. This doesn't really affect the performance of the system much-- it is more of a space utilization issue-- but it can be fixed by coalescing the index after you've purged the data (or structuring your purges so that you don't leave lots of sparse blocks).

您可能不得不担心索引变得偏斜的唯一情况是,您经常从索引的左侧删除大多数但不是所有块。例如,如果您决定从索引的左侧删除90%的数据,而每个叶节点中的一行指向数据,那么在某些路径导致数据量大得多的意义上,您的索引可能会变得不平衡比其他路径。 Oracle不会回收索引叶子节点,直到它们完全为空,因此您可以使用比实际需要更多的空间来结束索引的左侧。这并不会真正影响系统的性能 - 它更多地是一个空间利用问题 - 但是可以通过在清除数据后合并索引来修复(或者构建清除以便你不要留下许多稀疏的块)。