Indexes in Oracle SQL, EXPLAIN PLAN, and record access

I have been learning about indexes in Oracle SQL, and I wanted to conduct a small experiment with a test table to see how indexes really worked. As I discovered from an earlier post made here, the best way to do this is with EXPLAIN PLAN. However, I am running into something which confuses me.

My sample table contains the attributes (EmpID, Fname, Lname, Occupation, .... etc). I populated it with 500,000 records using a Java program I wrote (random names, occupations, etc.). Now, here are some sample queries with and without indexes:

NO INDEX:

SELECT Fname FROM EMPLOYEE WHERE Occupation = 'DOCTOR';

EXPLAIN PLAN says:

OPERATION                         OPTIMIZER COST
TABLE ACCESS(FULL) TEST.EMPLOYEE  ANALYZED  1169

Now I create an index:

CREATE INDEX occupation_idx
    ON EMPLOYEE (Occupation);

WITH INDEX "occupation_idx":

SELECT Fname FROM EMPLOYEE WHERE Occupation = 'DOCTOR';

EXPLAIN PLAN says:

OPERATION                         OPTIMIZER COST
TABLE ACCESS(FULL) TEST.EMPLOYEE  ANALYZED  1169

So... the cost is STILL the same, 1169? Now I try this:

WITH INDEX "occupation_idx":

SELECT Occupation FROM EMPLOYEE WHERE Occupation = 'DOCTOR';

EXPLAIN PLAN says:

OPERATION                              OPTIMIZER COST
INDEX(RANGE SCAN) TEST.OCCUPATION_IDX  ANALYZED  67

So it appears that the index is only used when the indexed column is the only one I'm selecting. But I thought the point of an index was to unlock the entire record, using the indexed column as the key? The search above is a pretty pointless one... it searches for values you already know. The only worthwhile query I can think of that involves ONLY an indexed column's value (and not the rest of the record) would be an aggregate such as COUNT.

What am I missing?

4 Solutions

#1

Even with your index, Oracle decided to do a full scan for the second query.

Why did it do this? Oracle would have created two plans and come up with a cost for each:

1) Full scan

2) Index access

Oracle selected the plan with the lower cost. Evidently it costed the full scan lower than the index access.

If you want to see the cost of the index plan, you can run EXPLAIN PLAN with a hint like this to force the index usage:

SELECT /*+ INDEX(EMPLOYEE occupation_idx) */ Fname
FROM EMPLOYEE WHERE Occupation = 'DOCTOR';

If you run EXPLAIN PLAN on the above, you should see that its cost is greater than the full scan cost. This is why Oracle did not choose to use the index.
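One way to run that comparison is sketched below. It assumes the EMPLOYEE table and occupation_idx index from the question, and uses the standard DBMS_XPLAN.DISPLAY table function to print the most recently explained plan. The costs you get will depend on your own data and statistics.

```sql
-- Plan the optimizer chooses on its own
EXPLAIN PLAN FOR
SELECT Fname FROM EMPLOYEE WHERE Occupation = 'DOCTOR';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Same query, with a hint asking for the index
EXPLAIN PLAN FOR
SELECT /*+ INDEX(EMPLOYEE occupation_idx) */ Fname
FROM EMPLOYEE WHERE Occupation = 'DOCTOR';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Comparing the Cost column of the two outputs shows what the optimizer weighed against each other.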

A simple way to consider the cost of the index plan is:

1) The blevel of the index (how many blocks must be read from top to bottom)

2) The number of table blocks that must subsequently be read for the records matching in the index. This relies on Oracle's estimate of the number of employees that have an occupation of 'DOCTOR'. In your simple example, this would be:

number of rows / number of distinct values

More complicated considerations include the clustering factor and index cost adjustments, which both reflect the likelihood that a block being read is already in memory and hence does not need to be read from disk.
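The inputs to that rough estimate can be read from the data dictionary. The query below is a sketch, not the optimizer's actual formula: it assumes statistics have already been gathered on EMPLOYEE and its index, and that there is no histogram on Occupation, so the rows-per-value figure is the simple average described above.

```sql
-- Dictionary views store unquoted object names in upper case
SELECT i.blevel,
       t.num_rows,
       c.num_distinct,
       ROUND(t.num_rows / NULLIF(c.num_distinct, 0)) AS est_rows_per_value
FROM   user_tables t
JOIN   user_tab_col_statistics c ON c.table_name = t.table_name
JOIN   user_indexes i            ON i.table_name = t.table_name
WHERE  t.table_name  = 'EMPLOYEE'
AND    c.column_name = 'OCCUPATION'
AND    i.index_name  = 'OCCUPATION_IDX';
```

If est_rows_per_value is a large fraction of num_rows, each matching row may cost a separate table block visit, which is how the index plan can end up costed above the full scan.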

Perhaps you could update your question with the results from your query with the index hint, and also the results of this query:

SELECT COUNT(*), COUNT(DISTINCT( Occupation ))
FROM EMPLOYEE;

This will allow people to comment on the cost of the index plan.

#2

I think I see what's happening here.

When you have the index in place, and you do:

SELECT Occupation FROM EMPLOYEE WHERE Occupation = 'DOCTOR';

The execution plan will use the index. This is a no-brainer, because all the data required to satisfy the query is right there in the index, and Oracle never has to reference the table at all.
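The same reasoning suggests an experiment, offered here as an assumption rather than something proposed in this thread: if a composite index also contained Fname, the Fname query could in principle be answered from the index alone. The index name occupation_fname_idx is hypothetical.

```sql
-- Hypothetical composite index: Occupation for the lookup, Fname so the
-- selected column is also stored in the index entries
CREATE INDEX occupation_fname_idx
    ON EMPLOYEE (Occupation, Fname);

EXPLAIN PLAN FOR
SELECT Fname FROM EMPLOYEE WHERE Occupation = 'DOCTOR';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Whether the optimizer actually picks this index still depends on its cost estimates, so treat the resulting plan as something to check rather than a guarantee.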

However, when you do:

SELECT Fname FROM EMPLOYEE WHERE Occupation = 'DOCTOR';

then, if Oracle uses the index, it will do an INDEX RANGE SCAN followed by a TABLE ACCESS BY ROWID to look up the Fname for each matching row. Depending on how many rows have DOCTOR for Occupation, Oracle will have to make one or more trips to the table to look up the Fname. If, for example, every employee in the table has Occupation set to 'DOCTOR', the index isn't of much use, and Oracle will simply do a FULL TABLE SCAN. If there are 10,000 employees and only one is a DOCTOR, then again it's a no-brainer, and Oracle will use the index.

But there are some subtleties when you're somewhere between those two extremes. People like to talk about 'selectivity', i.e., how many rows are identified by the index versus the size of the table, when discussing whether the index will be used. But that's not really accurate. What Oracle really cares about is block selectivity: how many blocks does it have to visit to satisfy the query? First, how "wide" is the RANGE SCAN? The narrower the range of values specified by the predicate, the better. Second, when your query needs to do table lookups, how many different blocks will it have to visit to find all the data it needs? That is, how "random" is the data in the table relative to the index order? This is called the CLUSTERING_FACTOR. If you analyze the index to collect statistics and then look at USER_INDEXES, you'll see that the CLUSTERING_FACTOR is now populated.

So, what's CLUSTERING_FACTOR? CLUSTERING_FACTOR is the "orderedness" of the table with respect to the index's key column(s). Its value will normally fall between the number of blocks in the table and the number of rows in the table. A low CLUSTERING_FACTOR, that is, one very near the number of blocks in the table, indicates a table that is very ordered relative to the index. A high CLUSTERING_FACTOR, that is, one very near the number of rows in the table, indicates a table that is very unordered relative to the index.
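To see where a given index falls in that range, something like the following can be used. It is a sketch assuming the occupation_idx index from the question and that table statistics already exist (the plan in the question shows ANALYZED). The trailing slash is the SQL*Plus terminator for the PL/SQL block.

```sql
-- Gather statistics on the index so CLUSTERING_FACTOR is populated
BEGIN
    DBMS_STATS.GATHER_INDEX_STATS(ownname => USER, indname => 'OCCUPATION_IDX');
END;
/

-- Compare the clustering factor with the table's block and row counts
SELECT i.index_name,
       i.clustering_factor,
       t.blocks   AS table_blocks,
       t.num_rows AS table_rows
FROM   user_indexes i
JOIN   user_tables  t ON t.table_name = i.table_name
WHERE  i.index_name = 'OCCUPATION_IDX';
```

With randomly generated occupations, as in the question, one would expect the clustering factor to sit much closer to table_rows than to table_blocks.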

An important point is that the CLUSTERING_FACTOR describes the order of data in the table relative to the index. So rebuilding an index, for example, will not change the CLUSTERING_FACTOR. It's also important to understand that the same table could have two indexes, one with an excellent CLUSTERING_FACTOR and the other with an extremely poor one. The table itself can only be ordered in one way.

So, why have I spent so much time describing CLUSTERING_FACTOR? Because when you have an execution plan that does an INDEX RANGE SCAN followed by TABLE ACCESS BY ROWID, you can be sure that Oracle's optimizer considered the CLUSTERING_FACTOR when coming up with that plan.

For example, suppose you have a 10,000 row table, and 100 of the rows have Occupation = 'DOCTOR'. Let's also assume (for the purposes of illustration and simple math) that the table holds 100 rows per block, and so has 100 blocks. You write the query above, asking for the Fname of the employees whose occupation is DOCTOR. Oracle can very easily and efficiently determine the rowids of the rows where occupation is DOCTOR. But how many table blocks will Oracle need to visit to do the Fname lookup? It could be only 1 or 2 table blocks, if the data is clustered (ordered) by Occupation in the table. But it could be as many as 100, if the data is very unordered in the table! Depending on table order (i.e. CLUSTERING_FACTOR), the number of table block visits could be as few as 1 or as many as 100.
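The number of distinct table blocks holding the matching rows can also be measured directly on the test table. This is a sketch using the DBMS_ROWID package to decode each row's ROWID; it assumes the EMPLOYEE table from the question and runs the query for real rather than explaining it.

```sql
-- Number of DOCTOR rows versus the number of distinct table blocks they occupy
SELECT COUNT(*) AS matching_rows,
       COUNT(DISTINCT DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) || '.' ||
                      DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)) AS table_blocks_visited
FROM   EMPLOYEE
WHERE  Occupation = 'DOCTOR';
```

If table_blocks_visited is close to the total number of blocks in the table, an index lookup touches nearly every block anyway, and the full scan is likely to be costed lower.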

So, I hope this helps you understand why the optimizer may be reluctant to use an index in some cases.

#3

An index is a copy of part of the table, which stores only the following data:

1) The indexed field(s)

2) A pointer to the original row (rowid)

Say you have a table like this:

rowid    id  name  occupation
[1]      1   John  clerk
[2]      2   Jim   manager
[3]      3   Jane  boss

Then an index on occupation would look like this:

occupation  rowid
boss        [3]
clerk       [1]
manager     [2]

The records are sorted on occupation in a B-Tree.

As you can see, if you only select the indexed fields, you only need the index (the second table).

If you select anything other than occupation:

SELECT  *
FROM    mytable
WHERE   occupation = 'clerk'

then the engine has to do two things: first, find the relevant records in the index; second, find the records in the original table by rowid. It's as if you joined the two tables on rowid.
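Those two steps can be spelled out by hand with the ROWID pseudocolumn. This is only a conceptual sketch against the illustrative mytable above; the optimizer remains free to execute these statements however it likes.

```sql
-- Step 1: what the index lookup yields, i.e. the rowids of the matching rows
SELECT ROWID
FROM   mytable
WHERE  occupation = 'clerk';

-- Step 2: fetch the full rows from the table by those rowids,
-- written as an explicit "join on rowid"
SELECT t.*
FROM   mytable t
WHERE  t.ROWID IN (SELECT i.ROWID
                   FROM   mytable i
                   WHERE  i.occupation = 'clerk');
```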

Since the rowids in the index are not in order, the reads from the original table are not sequential and can be slow. It may be faster to read the original table in sequential order and just filter the records with occupation = 'clerk'.

The engine does not "unlock" the records: it just finds the rowid in the index, and if there is not enough data in the index itself, it looks up the data in the original table by the rowid found.

#4

As a WAG (wild guess): analyze the table and the index, then see if the plan changes.
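A minimal way to try this suggestion, assuming the EMPLOYEE table from the question and using the DBMS_STATS package rather than the older ANALYZE command:

```sql
-- Refresh statistics on the table; cascade => TRUE also covers its indexes
BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
        ownname => USER,
        tabname => 'EMPLOYEE',
        cascade => TRUE);
END;
/

-- Re-check the plan with the fresh statistics
EXPLAIN PLAN FOR
SELECT Fname FROM EMPLOYEE WHERE Occupation = 'DOCTOR';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

The plan may well stay a full scan; the point of the exercise is to rule out stale statistics as the cause.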

When you select just the occupation, the entire query can be satisfied from the index. The index literally has a copy of the occupation. The moment you add another column to the select, Oracle has to go to the data record to get it. The optimizer chooses to read all of the data rows instead of reading the index rows and then the data rows. It's cheaper.
