My question concerns Oracle 11g and the use of indexes in SQL queries.
我的问题涉及Oracle 11g和SQL查询中索引的使用。
In my database, there is a table that is structured as followed:
在我的数据库中,有一个表结构如下:
Table tab (
rowid NUMBER(11),
unique_id_string VARCHAR2(2000),
year NUMBER(4),
dynamic_col_1 NUMBER(11),
dynamic_col_1_text NVARCHAR2(2000)
) TABLESPACE tabspace_data;
I have created two indexes:
我创建了两个索引:
CREATE INDEX Index_dyn_col1 ON tab (dynamic_col_1, dynamic_col_1_text) TABLESPACE tabspace_index;
CREATE INDEX Index_unique_id_year ON tab (unique_id_string, year) TABLESPACE tabspace_index;
The table contains around 1 to 2 million records. I extract the data from it by executing the following SQL command:
这个表格包含了大约1到2百万条记录。我通过执行以下SQL命令从其中提取数据:
SELECT distinct
"sub_select"."dynamic_col_1" "AS_dynamic_col_1","sub_select"."dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM
(
SELECT "tab".* FROM "tab"
where "tab".year = 2011
) "sub_select"
Unfortunately, the query needs around 1 hour to execute, although I created the both indexes described above. The explain plan shows that Oracle uses a "Table Full Access", i.e. a full table scan. Why is the index not used?
不幸的是,查询需要大约1个小时来执行,尽管我创建了上面描述的两个索引。explain计划显示Oracle使用“表全访问”,即全表扫描。为什么不使用索引?
As an experiment, I tested the following SQL command:
作为一个实验,我测试了以下SQL命令:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
Even in this case, the index is not used and a full table scan is performed.
即使在这种情况下,也不使用索引并执行完整的表扫描。
In my real database, the table contains more indexed columns like "dynamic_col_1" and "dynamic_col_1_text". The whole index file has a size of about 50 GB.
在我的真实数据库中,该表包含更多索引列,如“dynamic_col_1”和“dynamic_col_1_text”。整个索引文件的大小约为50 GB。
A few more informations:
更多信息:
- The database is Oracle 11g installed on my local computer.
- 数据库是安装在我本地计算机上的Oracle 11g。
- I use Windows 7 Enterprise 64bit.
- 我使用Windows 7 Enterprise 64位。
- The whole index is split over 3 dbf files with about 50GB size.
- 整个索引被分割为3个dbf文件,大小约为50GB。
I would really be glad, if someone could tell me how to make Oracle use the index in the first query. Because the first query is used by another program to extract the data from the database, it can hardly be changed. So it would be good to tweak the table instead.
如果有人能告诉我如何让Oracle在第一个查询中使用索引,我将非常高兴。因为第一个查询是另一个程序用来从数据库中提取数据的,所以很难对它进行修改。所以最好还是调整一下表格。
Thanks in advance.
提前谢谢。
[01.10.2011: UPDATE]
(01.10.2011:更新)
I think I've found the solution for the problem. Both columns dynamic_col_1
and dynamic_col_1_text
are nullable. After altering the table to prohibit "NULL"-values in both columns and adding a new index solely for the column year
, Oracle performs a Fast Index Scan. The advantage is that the query takes now about 5 seconds to execute and not 1 hour as before.
我想我已经找到解决这个问题的办法了。dynamic_col_1和dynamic_col_1_text两列都是空的。在修改表以禁止两列中的“NULL”值并仅为列年添加新索引之后,Oracle执行快速索引扫描。这样做的好处是,现在执行查询需要5秒,而不像以前那样需要1小时。
6 个解决方案
#1
5
Are you sure that an index access would be faster than a full table scan? As a very rough estimate, full table scans are 20 times faster than reading an index. If tab
has more than 5% of the data in 2011 it's not surprising that Oracle would use a full table scan. And as @Dan and @Ollie mentioned, with year
as the second column this will make the index even slower.
您确定索引访问将比完整表扫描更快吗?作为一个非常粗略的估计,全表扫描比读取索引快20倍。如果tab在2011年拥有超过5%的数据,那么Oracle使用全表扫描也就不足为奇了。正如@Dan和@Ollie提到的,年作为第二列,这将使索引更加缓慢。
If the index really is faster, than the issue is probably bad statistics. There are hundreds of ways the statistics could be bad. Very briefly, here's what I'd look at first:
如果指数真的比实际更快,那么问题很可能是糟糕的统计数据。统计数据有上百种可能是糟糕的。简单地说,这是我首先看到的:
- Run an explain plan with and without and index hint. Are the cardinalities off by 10x or more? Are the times off by 10x or more?
- 使用和不使用以及索引提示运行解释计划。基数差了10倍或更多吗?时间间隔是10倍还是更多?
- If the cardinality is off, make sure there are up to date stats on the table and index and you're using a reasonable ESTIMATE_PERCENT (DBMS_STATS.AUTO_SAMPLE_SIZE is almost always the best for 11g).
- 如果基数为off,请确保表和索引上有最新的统计信息,并使用合理的ESTIMATE_PERCENT (DBMS_STATS)。AUTO_SAMPLE_SIZE几乎总是最适合11g)。
- If the time is off, check your workload statistics.
- 如果没有时间,请检查工作负载统计信息。
- Are you using parallelism? Oracle always assumes a near linear improvement for parallelism, but on a desktop with one hard drive you probably won't see any improvement at all.
- 你使用并行性吗?Oracle总是假定并行性会得到近乎线性的改善,但是在一个硬盘驱动的桌面上,你可能根本看不到任何改善。
Also, this isn't really relevant to your problem, but you may want to avoid using quoted identifiers. Once you use them you have to use them everywhere, and it generally makes your tables and queries painful to work with.
另外,这与您的问题并不相关,但是您可能希望避免使用引用的标识符。一旦您使用了它们,您就必须在任何地方使用它们,这通常会使您的表和查询难以使用。
#2
4
Your index should be:
索引应该是:
CREATE INDEX Index_year
ON tab (year)
TABLESPACE tabspace_index;
Also, your query could just be:
同样,您的查询可能是:
SELECT DISTINCT
dynamic_col_1 "AS_dynamic_col_1",
dynamic_col_1_text "AS_dynamic_col_1_text"
FROM tab
WHERE year = 2011;
If your index was created solely for this query though, you could create it including the two fetched columns as well, then the optimiser would not have to go to the table for the query data, it could retrieve it directly from the index making your query more efficient again.
如果您的索引是为这个查询单独创建的,那么您也可以创建它,包括两个提取的列,那么optimiser就不需要转到表中获取查询数据,它可以直接从索引中检索索引,从而使您的查询更加高效。
Hope it helps...
希望它能帮助……
#3
2
I don't have an Oracle instance on hand so this is somewhat guesswork, but my inclination is to say it's because you have the compound index in the wrong order. If you had year
as the first column in the index it might use it.
我手头没有Oracle实例,所以这有点猜测,但我倾向于说,这是因为复合索引的顺序不对。如果你有年份作为索引的第一列,它可能会使用它。
#4
1
Your second test query:
你的第二个测试查询:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
would not use the index because you have no WHERE clause, so you're asking Oracle to read every row in the table. In that situation the full table scan is the faster access method.
不会使用索引,因为您没有WHERE子句,所以您要求Oracle读取表中的每一行。在这种情况下,全表扫描是更快的访问方法。
Also, as other posters have mentioned, your index on YEAR has it in the second column. Oracle can use this index by performing a skip scan, but there is a performance hit for doing so, and depending on the size of your table Oracle may just decide to use the FTS again.
另外,正如其他海报所提到的,你的年度索引在第二列。Oracle可以通过执行跳过扫描来使用这个索引,但是这样做会影响性能,并且取决于您的表的大小,Oracle可能只是决定再次使用FTS。
#5
1
I don't know if it's relevant, but I tested the following query:
我不知道是否相关,但我测试了以下查询:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
WHERE "dynamic_col_1" = 123 AND "dynamic_col_1_text" = 'abc'
The explain plan for that query show that Oracle uses an index scan in this scenario.
该查询的解释计划表明,Oracle在此场景中使用了索引扫描。
The columns dynamic_col_1
and dynamic_col_1_text
are nullable. Does this have an effect on the usage of the index?
dynamic_col_1和dynamic_col_1_text列是可空的。这对索引的使用有影响吗?
01.10.2011: UPDATE]
01.10.2011更新):
I think I've found the solution for the problem. Both columns dynamic_col_1 and dynamic_col_1_text are nullable. After altering the table to prohibit "NULL"-values in both columns and adding a new index solely for the column year, Oracle performs a Fast Index Scan. The advantage is that the query takes now about 5 seconds to execute and not 1 hour as before.
我想我已经找到解决这个问题的办法了。dynamic_col_1和dynamic_col_1_text两列都是空的。在修改表以禁止两列中的“NULL”值并仅为列年添加新索引之后,Oracle执行快速索引扫描。这样做的好处是,现在执行查询需要5秒,而不像以前那样需要1小时。
#6
0
Try this:
试试这个:
1) Create an index on year field (see Ollie answer).
1)在year字段上创建一个索引(参见Ollie的回答)。
2) And then use this query:
2)然后使用这个查询:
SELECT DISTINCT
dynamic_col_1
,dynamic_col_1_text
FROM tab
WHERE ID (SELECT ID FROM tab WHERE year=2011)
or
或
SELECT DISTINCT
dynamic_col_1
,dynamic_col_1_text
FROM tab
WHERE ID (SELECT ID FROM tab WHERE year=2011)
GROUP BY dynamic_col_1, dynamic_col_1_text
Maybe it will help you.
也许这对你有帮助。
#1
5
Are you sure that an index access would be faster than a full table scan? As a very rough estimate, full table scans are 20 times faster than reading an index. If tab
has more than 5% of the data in 2011 it's not surprising that Oracle would use a full table scan. And as @Dan and @Ollie mentioned, with year
as the second column this will make the index even slower.
您确定索引访问将比完整表扫描更快吗?作为一个非常粗略的估计,全表扫描比读取索引快20倍。如果tab在2011年拥有超过5%的数据,那么Oracle使用全表扫描也就不足为奇了。正如@Dan和@Ollie提到的,年作为第二列,这将使索引更加缓慢。
If the index really is faster, than the issue is probably bad statistics. There are hundreds of ways the statistics could be bad. Very briefly, here's what I'd look at first:
如果指数真的比实际更快,那么问题很可能是糟糕的统计数据。统计数据有上百种可能是糟糕的。简单地说,这是我首先看到的:
- Run an explain plan with and without and index hint. Are the cardinalities off by 10x or more? Are the times off by 10x or more?
- 使用和不使用以及索引提示运行解释计划。基数差了10倍或更多吗?时间间隔是10倍还是更多?
- If the cardinality is off, make sure there are up to date stats on the table and index and you're using a reasonable ESTIMATE_PERCENT (DBMS_STATS.AUTO_SAMPLE_SIZE is almost always the best for 11g).
- 如果基数为off,请确保表和索引上有最新的统计信息,并使用合理的ESTIMATE_PERCENT (DBMS_STATS)。AUTO_SAMPLE_SIZE几乎总是最适合11g)。
- If the time is off, check your workload statistics.
- 如果没有时间,请检查工作负载统计信息。
- Are you using parallelism? Oracle always assumes a near linear improvement for parallelism, but on a desktop with one hard drive you probably won't see any improvement at all.
- 你使用并行性吗?Oracle总是假定并行性会得到近乎线性的改善,但是在一个硬盘驱动的桌面上,你可能根本看不到任何改善。
Also, this isn't really relevant to your problem, but you may want to avoid using quoted identifiers. Once you use them you have to use them everywhere, and it generally makes your tables and queries painful to work with.
另外,这与您的问题并不相关,但是您可能希望避免使用引用的标识符。一旦您使用了它们,您就必须在任何地方使用它们,这通常会使您的表和查询难以使用。
#2
4
Your index should be:
索引应该是:
CREATE INDEX Index_year
ON tab (year)
TABLESPACE tabspace_index;
Also, your query could just be:
同样,您的查询可能是:
SELECT DISTINCT
dynamic_col_1 "AS_dynamic_col_1",
dynamic_col_1_text "AS_dynamic_col_1_text"
FROM tab
WHERE year = 2011;
If your index was created solely for this query though, you could create it including the two fetched columns as well, then the optimiser would not have to go to the table for the query data, it could retrieve it directly from the index making your query more efficient again.
如果您的索引是为这个查询单独创建的,那么您也可以创建它,包括两个提取的列,那么optimiser就不需要转到表中获取查询数据,它可以直接从索引中检索索引,从而使您的查询更加高效。
Hope it helps...
希望它能帮助……
#3
2
I don't have an Oracle instance on hand so this is somewhat guesswork, but my inclination is to say it's because you have the compound index in the wrong order. If you had year
as the first column in the index it might use it.
我手头没有Oracle实例,所以这有点猜测,但我倾向于说,这是因为复合索引的顺序不对。如果你有年份作为索引的第一列,它可能会使用它。
#4
1
Your second test query:
你的第二个测试查询:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
would not use the index because you have no WHERE clause, so you're asking Oracle to read every row in the table. In that situation the full table scan is the faster access method.
不会使用索引,因为您没有WHERE子句,所以您要求Oracle读取表中的每一行。在这种情况下,全表扫描是更快的访问方法。
Also, as other posters have mentioned, your index on YEAR has it in the second column. Oracle can use this index by performing a skip scan, but there is a performance hit for doing so, and depending on the size of your table Oracle may just decide to use the FTS again.
另外,正如其他海报所提到的,你的年度索引在第二列。Oracle可以通过执行跳过扫描来使用这个索引,但是这样做会影响性能,并且取决于您的表的大小,Oracle可能只是决定再次使用FTS。
#5
1
I don't know if it's relevant, but I tested the following query:
我不知道是否相关,但我测试了以下查询:
SELECT DISTINCT
"dynamic_col_1" "AS_dynamic_col_1", "dynamic_col_1_text" "AS_dynamic_col_1_text"
FROM "tab"
WHERE "dynamic_col_1" = 123 AND "dynamic_col_1_text" = 'abc'
The explain plan for that query show that Oracle uses an index scan in this scenario.
该查询的解释计划表明,Oracle在此场景中使用了索引扫描。
The columns dynamic_col_1
and dynamic_col_1_text
are nullable. Does this have an effect on the usage of the index?
dynamic_col_1和dynamic_col_1_text列是可空的。这对索引的使用有影响吗?
01.10.2011: UPDATE]
01.10.2011更新):
I think I've found the solution for the problem. Both columns dynamic_col_1 and dynamic_col_1_text are nullable. After altering the table to prohibit "NULL"-values in both columns and adding a new index solely for the column year, Oracle performs a Fast Index Scan. The advantage is that the query takes now about 5 seconds to execute and not 1 hour as before.
我想我已经找到解决这个问题的办法了。dynamic_col_1和dynamic_col_1_text两列都是空的。在修改表以禁止两列中的“NULL”值并仅为列年添加新索引之后,Oracle执行快速索引扫描。这样做的好处是,现在执行查询需要5秒,而不像以前那样需要1小时。
#6
0
Try this:
试试这个:
1) Create an index on year field (see Ollie answer).
1)在year字段上创建一个索引(参见Ollie的回答)。
2) And then use this query:
2)然后使用这个查询:
SELECT DISTINCT
dynamic_col_1
,dynamic_col_1_text
FROM tab
WHERE ID (SELECT ID FROM tab WHERE year=2011)
or
或
SELECT DISTINCT
dynamic_col_1
,dynamic_col_1_text
FROM tab
WHERE ID (SELECT ID FROM tab WHERE year=2011)
GROUP BY dynamic_col_1, dynamic_col_1_text
Maybe it will help you.
也许这对你有帮助。