I had a query where an index was not used when I thought it could be, so I reproduced it out of curiosity:
我有一个查询,当我认为它没有使用索引时,所以我出于好奇而转载它:
Create a test_table
with 1.000.000 rows (10 distinct values in col
, 500 bytes of data in some_data
).
创建一个包含1.000.000行的test_table(col中有10个不同的值,some_data中有500个字节的数据)。
CREATE TABLE test_table AS (
SELECT MOD(ROWNUM,10) col, LPAD('x', 500, 'x') some_data
FROM dual
CONNECT BY ROWNUM <= 1000000
);
Create an index and gather table stats:
创建索引并收集表统计信息:
CREATE INDEX test_index ON test_table ( col );
EXEC dbms_stats.gather_table_stats( 'MY_SCHEMA', 'TEST_TABLE' );
Try to get distinct values of col
and the COUNT
:
尝试获取col和COUNT的不同值:
EXPLAIN PLAN FOR
SELECT col, COUNT(*)
FROM test_table
GROUP BY col;
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 30 | 15816 (1)| 00:03:10
| 1 | HASH GROUP BY | | 10 | 30 | 15816 (1)| 00:03:10
| 2 | TABLE ACCESS FULL| TEST_TABLE | 994K| 2914K| 15755 (1)| 00:03:10
---------------------------------------------------------------------------------
The index is not used, providing the hint does not change this.
未使用索引,前提是提示不会更改此设置。
I guess, the index can't be used in this case, but why?
我猜,在这种情况下不能使用索引,但为什么呢?
4 个解决方案
#1
5
I ran Peter's original stuff and reproduced his results. I then applied dcp's suggestion...
我跑了彼得原创的东西并复制了他的结果。然后我应用了dcp的建议......
SQL> alter table test_table modify col not null;
Table altered.
SQL> EXEC dbms_stats.gather_table_stats( user, 'TEST_TABLE' , cascade=>true)
PL/SQL procedure successfully completed.
SQL> EXPLAIN PLAN FOR
2 SELECT col, COUNT(*)
3 FROM test_table
4 GROUP BY col;
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 2099921975
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 30 | 574 (9)| 00:00:07 |
| 1 | HASH GROUP BY | | 10 | 30 | 574 (9)| 00:00:07 |
| 2 | INDEX FAST FULL SCAN| TEST_INDEX | 1000K| 2929K| 532 (2)| 00:00:07 |
------------------------------------------------------------------------------------
9 rows selected.
SQL>
The reason this matters, is because NULL values are not included in a normal B-TREE index, but the GROUP BY has to include NULL as a grouping "value" in your query. By telling the optimizer that there are no NULLs in col
it is free to use the much more efficient index (I was getting an elapsed time of almost 3.55 seconds with the FTS). This is a classic example of how metadata can influence the optimizer.
这很重要的原因是因为NULL值不包含在普通的B-TREE索引中,但GROUP BY必须在查询中包含NULL作为分组“值”。通过告诉优化器col中没有NULL,可以*地使用效率更高的索引(我用FTS得到的时间差不多是3.55秒)。这是元数据如何影响优化器的典型示例。
Incidentally, this is obviously a 10g or 11g database, because it uses the HASH GROUP BY algorithm, instead of the older SORT (GROUP BY) algorithm.
顺便说一下,这显然是一个10g或11g的数据库,因为它使用HASH GROUP BY算法,而不是旧的SORT(GROUP BY)算法。
#2
13
UPDATE: Try making the col column NOT NULL. That is the reason it's not using the index. When it's not null, here's the plan.
更新:尝试使col列不为NULL。这就是它没有使用索引的原因。当它不为空时,这是计划。
SELECT STATEMENT, GOAL = ALL_ROWS 69 10 30
HASH GROUP BY 69 10 30
INDEX FAST FULL SCAN SANDBOX TEST_INDEX 56 98072 294216
If the optimizer determines that it's more efficient NOT to use the index (maybe due to rewriting the query), then it won't. Optimizer hints are just that, namely, hints to tell Oracle an index you'd like it to use. You can think of them as suggestions. But if the optimizer determines that it's better not to use the index (again, as result of query rewrite for example), then it's not going to.
如果优化器确定不使用索引更有效(可能是由于重写查询),那么它就不会。优化器提示就是这样,即提示告诉Oracle您希望它使用的索引。您可以将它们视为建议。但是如果优化器确定最好不使用索引(例如,再次,作为查询重写的结果),那么它就不会。
Refer to this link: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/hintsref.htm "Specifying one of these hints causes the optimizer to choose the specified access path only if the access path is available based on the existence of an index or cluster and on the syntactic constructs of the SQL statement. If a hint specifies an unavailable access path, then the optimizer ignores it."
请参阅此链接:http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/hintsref.htm“指定其中一个提示会导致优化器仅在访问路径时选择指定的访问路径是否存在基于索引或集群的存在以及SQL语句的语法结构。如果提示指定了不可用的访问路径,则优化器会忽略它。“
Since you are running a count(*) operation, the optimizer has determined that it's more efficient to just scan the whole table and hash instead of using your index.
由于您正在运行count(*)操作,因此优化器已确定仅扫描整个表和哈希而不是使用索引更有效。
Here's another handy link on hints: http://www.dba-oracle.com/t_hint_ignored.htm
这是关于提示的另一个方便的链接:http://www.dba-oracle.com/t_hint_ignored.htm
#3
10
you forgot this really important information: COL is not null
你忘了这个非常重要的信息:COL不是空的
If the column is NULLABLE, the index can not be used because there might be unindexed rows.
如果列为NULLABLE,则无法使用索引,因为可能存在未编入索引的行。
SQL> ALTER TABLE test_table MODIFY (col NOT NULL);
Table altered
SQL> EXPLAIN PLAN FOR
2 SELECT col, COUNT(*) FROM test_table GROUP BY col;
Explained
SQL> SELECT * FROM table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1077170955
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 30 | 1954 (1)| 00:00:2
| 1 | SORT GROUP BY NOSORT| | 10 | 30 | 1954 (1)| 00:00:2
| 2 | INDEX FULL SCAN | TEST_INDEX | 976K| 2861K| 1954 (1)| 00:00:2
--------------------------------------------------------------------------------
#4
0
bitmap index will do as well
位图索引也可以
Execution Plan ---------------------------------------------------------- Plan hash value: 2200191467 --------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 30 | 15983 (2)| 00:03:12 | | 1 | HASH GROUP BY | | 10 | 30 | 15983 (2)| 00:03:12 | | 2 | TABLE ACCESS FULL| TEST_TABLE | 1013K| 2968K| 15825 (1)| 00:03:10 | --------------------------------------------------------------------------------- SQL> create bitmap index test_index on test_table(col); Index created. SQL> EXEC dbms_stats.gather_table_stats( 'MY_SCHEMA', 'TEST_TABLE' ); PL/SQL procedure successfully completed. SQL> SELECT col, COUNT(*) 2 FROM test_table 3 GROUP BY col 4 / Execution Plan ---------------------------------------------------------- Plan hash value: 238193838 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 30 | 286 (0)| 00:00:04 | | 1 | SORT GROUP BY NOSORT | | 10 | 30 | 286 (0)| 00:00:04 | | 2 | BITMAP CONVERSION COUNT| | 1010K| 2961K| 286 (0)| 00:00:04 | | 3 | BITMAP INDEX FULL SCAN| TEST_INDEX | | | | | ---------------------------------------------------------------------------------------
#1
5
I ran Peter's original stuff and reproduced his results. I then applied dcp's suggestion...
我跑了彼得原创的东西并复制了他的结果。然后我应用了dcp的建议......
SQL> alter table test_table modify col not null;
Table altered.
SQL> EXEC dbms_stats.gather_table_stats( user, 'TEST_TABLE' , cascade=>true)
PL/SQL procedure successfully completed.
SQL> EXPLAIN PLAN FOR
2 SELECT col, COUNT(*)
3 FROM test_table
4 GROUP BY col;
Explained.
SQL> select * from table(dbms_xplan.display)
2 /
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 2099921975
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 30 | 574 (9)| 00:00:07 |
| 1 | HASH GROUP BY | | 10 | 30 | 574 (9)| 00:00:07 |
| 2 | INDEX FAST FULL SCAN| TEST_INDEX | 1000K| 2929K| 532 (2)| 00:00:07 |
------------------------------------------------------------------------------------
9 rows selected.
SQL>
The reason this matters, is because NULL values are not included in a normal B-TREE index, but the GROUP BY has to include NULL as a grouping "value" in your query. By telling the optimizer that there are no NULLs in col
it is free to use the much more efficient index (I was getting an elapsed time of almost 3.55 seconds with the FTS). This is a classic example of how metadata can influence the optimizer.
这很重要的原因是因为NULL值不包含在普通的B-TREE索引中,但GROUP BY必须在查询中包含NULL作为分组“值”。通过告诉优化器col中没有NULL,可以*地使用效率更高的索引(我用FTS得到的时间差不多是3.55秒)。这是元数据如何影响优化器的典型示例。
Incidentally, this is obviously a 10g or 11g database, because it uses the HASH GROUP BY algorithm, instead of the older SORT (GROUP BY) algorithm.
顺便说一下,这显然是一个10g或11g的数据库,因为它使用HASH GROUP BY算法,而不是旧的SORT(GROUP BY)算法。
#2
13
UPDATE: Try making the col column NOT NULL. That is the reason it's not using the index. When it's not null, here's the plan.
更新:尝试使col列不为NULL。这就是它没有使用索引的原因。当它不为空时,这是计划。
SELECT STATEMENT, GOAL = ALL_ROWS 69 10 30
HASH GROUP BY 69 10 30
INDEX FAST FULL SCAN SANDBOX TEST_INDEX 56 98072 294216
If the optimizer determines that it's more efficient NOT to use the index (maybe due to rewriting the query), then it won't. Optimizer hints are just that, namely, hints to tell Oracle an index you'd like it to use. You can think of them as suggestions. But if the optimizer determines that it's better not to use the index (again, as result of query rewrite for example), then it's not going to.
如果优化器确定不使用索引更有效(可能是由于重写查询),那么它就不会。优化器提示就是这样,即提示告诉Oracle您希望它使用的索引。您可以将它们视为建议。但是如果优化器确定最好不使用索引(例如,再次,作为查询重写的结果),那么它就不会。
Refer to this link: http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/hintsref.htm "Specifying one of these hints causes the optimizer to choose the specified access path only if the access path is available based on the existence of an index or cluster and on the syntactic constructs of the SQL statement. If a hint specifies an unavailable access path, then the optimizer ignores it."
请参阅此链接:http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/hintsref.htm“指定其中一个提示会导致优化器仅在访问路径时选择指定的访问路径是否存在基于索引或集群的存在以及SQL语句的语法结构。如果提示指定了不可用的访问路径,则优化器会忽略它。“
Since you are running a count(*) operation, the optimizer has determined that it's more efficient to just scan the whole table and hash instead of using your index.
由于您正在运行count(*)操作,因此优化器已确定仅扫描整个表和哈希而不是使用索引更有效。
Here's another handy link on hints: http://www.dba-oracle.com/t_hint_ignored.htm
这是关于提示的另一个方便的链接:http://www.dba-oracle.com/t_hint_ignored.htm
#3
10
you forgot this really important information: COL is not null
你忘了这个非常重要的信息:COL不是空的
If the column is NULLABLE, the index can not be used because there might be unindexed rows.
如果列为NULLABLE,则无法使用索引,因为可能存在未编入索引的行。
SQL> ALTER TABLE test_table MODIFY (col NOT NULL);
Table altered
SQL> EXPLAIN PLAN FOR
2 SELECT col, COUNT(*) FROM test_table GROUP BY col;
Explained
SQL> SELECT * FROM table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1077170955
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 30 | 1954 (1)| 00:00:2
| 1 | SORT GROUP BY NOSORT| | 10 | 30 | 1954 (1)| 00:00:2
| 2 | INDEX FULL SCAN | TEST_INDEX | 976K| 2861K| 1954 (1)| 00:00:2
--------------------------------------------------------------------------------
#4
0
bitmap index will do as well
位图索引也可以
Execution Plan ---------------------------------------------------------- Plan hash value: 2200191467 --------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 30 | 15983 (2)| 00:03:12 | | 1 | HASH GROUP BY | | 10 | 30 | 15983 (2)| 00:03:12 | | 2 | TABLE ACCESS FULL| TEST_TABLE | 1013K| 2968K| 15825 (1)| 00:03:10 | --------------------------------------------------------------------------------- SQL> create bitmap index test_index on test_table(col); Index created. SQL> EXEC dbms_stats.gather_table_stats( 'MY_SCHEMA', 'TEST_TABLE' ); PL/SQL procedure successfully completed. SQL> SELECT col, COUNT(*) 2 FROM test_table 3 GROUP BY col 4 / Execution Plan ---------------------------------------------------------- Plan hash value: 238193838 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 30 | 286 (0)| 00:00:04 | | 1 | SORT GROUP BY NOSORT | | 10 | 30 | 286 (0)| 00:00:04 | | 2 | BITMAP CONVERSION COUNT| | 1010K| 2961K| 286 (0)| 00:00:04 | | 3 | BITMAP INDEX FULL SCAN| TEST_INDEX | | | | | ---------------------------------------------------------------------------------------