在多个列中,列顺序是否有惟一的约束,这有什么区别吗?有重复的索引是合理的吗?

时间:2021-06-03 22:33:20

In "SQL Certified Expert Exam Guide" by Steve O'Hearn I've found this paragraph:

在Steve O'Hearn的《SQL认证专家考试指南》中,我发现了这一段:

In the rare instance when you create a composite index along with multiple constraints that call on the same index, a special syntax is required. For example, if we decide to create a composite index on both of our columns in the INVOICES table, we can use this syntax:

在罕见的情况下,当您创建一个复合索引和多个约束的同时调用同一个索引时,需要一个特殊的语法。例如,如果我们决定在发票表中的两列上创建一个复合索引,我们可以使用以下语法:

CREATE TABLE invoices
(
    invoice_id    NUMBER(11),
    invoice_date  DATE,
    CONSTRAINT    un_invoices_invoice_id UNIQUE (invoice_id, invoice_date)
                  USING INDEX (CREATE INDEX ix_invoices
                                      ON invoices(invoice_id, invoice_date)),
    CONSTRAINT    un_invoices_invoice_date UNIQUE (invoice_date, invoice_id)
                  USING INDEX ix_invoices
);

And here're my questions:

这是我的问题:

  1. What's the point in creating two unique constraints changing only columns order in declaration?

    创建两个惟一的约束的意义是什么?

  2. We've create one multiple-column index: "invoice_id" as first column and "invoice_date" as second column. But let's assume that we really often run queries that relate to "invoice_date" itself, without "invoice_id" participation. Would it be a good idea to create second, single column index on "invoice_date"? I know that:

    我们创建了一个多列索引:“invoice_id”作为第一列,“invoice_date”作为第二列。但是让我们假设我们经常运行与“invoice_date”本身相关的查询,而没有“invoice_id”参与。在“invoice_date”上创建第二个、单个列索引是否是个好主意?我知道:

Because Oracle supports multi-column indexes, it’s easy to accidently create “duplicate” indexes, indexes that add overhead to DML and do not aid in speeding-up SQL execution. [Source]

因为Oracle支持多列索引,所以很容易意外地创建“重复”索引,这些索引会增加DML的开销,并且不会加速SQL执行。(来源)

and I also know that:

我也知道

Thanks to skip scanning a WHERE clause that references any columns within a composite index may invoke the index in its processing. [Steve O'Hearn]

由于跳过了扫描WHERE子句,该子句引用复合索引中的任何列,可能会在处理中调用索引。(史蒂夫·奥赫恩)

but also I know that:

但我也知道:

This isn't quite as beneficial as a simple one-column index, and its benefit varies, depending on the uniqueness of values in the first column. [Steve O'Hearn]

这并不像一个简单的单列索引那样有好处,它的好处是不同的,这取决于第一列中值的唯一性。(史蒂夫·奥赫恩)

So let's assume that we rarely use DML commands on this table and let's assume that we relate to both columns in SELECT's WHERE clause as often as to "index_date" or "index_id" separately. Would it be justified, in certain situations, to create two indexes? One, multiple column index, on (index_id, index_date) and second, single column index, on (index_date)?

假设我们很少在这个表上使用DML命令,假设我们经常在SELECT's WHERE子句中与“index_date”或“index_id”两个列关联。在某些情况下,创建两个索引是否合理?一个,多个列索引,on (index_id, index_date)和第二个,单列索引,on (index_date)?

2 个解决方案

#1


4  

Your question is:

你的问题是:

Would it be justified, in certain situations, to create two indexes? One, multiple column index, on (index_id, index_date) and second, single column index, on (index_date)?

在某些情况下,创建两个索引是否合理?一个,多个列索引,on (index_id, index_date)和第二个,单列索引,on (index_date)?

The answer is "yes". The first index will be used to satisfy queries with conditions like:

答案是“是的”。第一个索引将用于满足查询条件,例如:

  • filtering on index_id in the where clause
  • 在where子句中过滤index_id
  • filtering on index_id and index_date in the where clause
  • 在where子句中过滤index_id和index_date
  • filtering on index_id in the where clause and the ordering by index_date
  • 在where子句中对index_id进行过滤,并按index_date排序

The second index would not be used in these circumstance. It would be used for:

在这种情况下不使用第二个索引。它将被用于:

  • filtering on index_date in the where clause
  • 在where子句中过滤index_date

And the first index would not be used in this case.

第一个索引在这种情况下不会被使用。

The ordering of columns in indexes is important. They are used from the left to the right. So, these two indexes are useful. However, a third index on index_id alone would not be useful, because the first index already takes care of the same situations where that index would be used.

索引中列的顺序非常重要。它们是从左到右使用的。这两个指标很有用。但是,仅使用index_id中的第三个索引是没有用的,因为第一个索引已经处理了使用该索引的相同情况。

#2


3  

You asked

你问

"What's the point in creating two unique constraints changing only columns order in declaration?"

“创建两个惟一的约束的意义是什么?在声明中只更改列的顺序?”

There isn't any point. The order of columns in a composite constraint doesn't make any difference:

没有任何一点。复合约束中的列顺序没有任何区别:

SQL> select * from t23
  2  /

      COL1 COL
---------- ---
         1 WTF

SQL> create index t23_i on t23(col1, col2);

Index created.

SQL> alter table t23 add constraint t23_uk unique (col1 , col2) using index t23_i
  2  /

Table altered.

SQL> insert into t23 values (1, 'WTF')
  2  /
insert into t23 values (1, 'WTF')
*
ERROR at line 1:
ORA-00001: unique constraint (APC.T23_UK) violated


SQL> alter table t23 drop  constraint t23_uk
  2  /

Table altered.

SQL> alter table t23 add constraint t23_uk unique (col2, col1) using index t23_i
  2  /

Table altered.

SQL> insert into t23 values (1, 'WTF')
  2  /
insert into t23 values (1, 'WTF')
*
ERROR at line 1:
ORA-00001: unique constraint (APC.T23_UK) violated


SQL> 

That's the problem with exam-crammers: they often just say stuff, without providing explanation or context.

这就是考试填鸭式考试的问题:他们经常只说一些东西,不提供解释或上下文。

You also asked:

你还要求:

" Would it be a good idea to create second, single column index on invoice_date?"

“在invoice_date上创建第二个、单个列索引是不是一个好主意?”

Without knowing the data it's hard to tell, but I would expect a date column to be less selective than an ID column (especially if the time element is truncated), so generally I would expect an index to built as (invoice_date, invoice_id) anyway. That might allows us to use index compression.

在不知道数据的情况下,这是很难判断的,但是我希望日期列比ID列的选择性要少(特别是如果时间元素被截断),所以通常我希望索引被构建为(invoice_date, invoice_id)。这可以让我们使用索引压缩。

Skip-scanning doesn't quite work as Steve states: it starts by probing the leading edge of the index, but only if the second column in the composite index is referenced in the WHERE clause. The optimizer might choose a Full Fast Index Scan for searching on third columns or lower. Also it won't choose a Skip Scan path if the leading edge has too many distinct values: another good reason for leading with columns of low selectively.

skip -scan并不像Steve所说的那样有效:它从探查索引的前缘开始,但前提是复合索引中的第二列在WHERE子句中引用。优化器可以选择完全快速的索引扫描来搜索第三列或更低的列。如果前缘有太多不同的值,它也不会选择跳转扫描路径:这是有选择地使用低列进行导航的另一个好理由。

So, this doesn't exactly answer your question, but I think it does convey an important point: there are no universal rules governing creating indexes for performance. We need to understand the profile of the data - its values distribution and volumes - and also the most important queries which will use the table.

因此,这并不能确切地回答您的问题,但我认为它确实传达了一个重要的观点:没有统一的规则来管理为性能创建索引。我们需要了解数据的概要文件——它的值分布和卷——以及使用该表的最重要查询。

#1


4  

Your question is:

你的问题是:

Would it be justified, in certain situations, to create two indexes? One, multiple column index, on (index_id, index_date) and second, single column index, on (index_date)?

在某些情况下,创建两个索引是否合理?一个,多个列索引,on (index_id, index_date)和第二个,单列索引,on (index_date)?

The answer is "yes". The first index will be used to satisfy queries with conditions like:

答案是“是的”。第一个索引将用于满足查询条件,例如:

  • filtering on index_id in the where clause
  • 在where子句中过滤index_id
  • filtering on index_id and index_date in the where clause
  • 在where子句中过滤index_id和index_date
  • filtering on index_id in the where clause and the ordering by index_date
  • 在where子句中对index_id进行过滤,并按index_date排序

The second index would not be used in these circumstance. It would be used for:

在这种情况下不使用第二个索引。它将被用于:

  • filtering on index_date in the where clause
  • 在where子句中过滤index_date

And the first index would not be used in this case.

第一个索引在这种情况下不会被使用。

The ordering of columns in indexes is important. They are used from the left to the right. So, these two indexes are useful. However, a third index on index_id alone would not be useful, because the first index already takes care of the same situations where that index would be used.

索引中列的顺序非常重要。它们是从左到右使用的。这两个指标很有用。但是,仅使用index_id中的第三个索引是没有用的,因为第一个索引已经处理了使用该索引的相同情况。

#2


3  

You asked

你问

"What's the point in creating two unique constraints changing only columns order in declaration?"

“创建两个惟一的约束的意义是什么?在声明中只更改列的顺序?”

There isn't any point. The order of columns in a composite constraint doesn't make any difference:

没有任何一点。复合约束中的列顺序没有任何区别:

SQL> select * from t23
  2  /

      COL1 COL
---------- ---
         1 WTF

SQL> create index t23_i on t23(col1, col2);

Index created.

SQL> alter table t23 add constraint t23_uk unique (col1 , col2) using index t23_i
  2  /

Table altered.

SQL> insert into t23 values (1, 'WTF')
  2  /
insert into t23 values (1, 'WTF')
*
ERROR at line 1:
ORA-00001: unique constraint (APC.T23_UK) violated


SQL> alter table t23 drop  constraint t23_uk
  2  /

Table altered.

SQL> alter table t23 add constraint t23_uk unique (col2, col1) using index t23_i
  2  /

Table altered.

SQL> insert into t23 values (1, 'WTF')
  2  /
insert into t23 values (1, 'WTF')
*
ERROR at line 1:
ORA-00001: unique constraint (APC.T23_UK) violated


SQL> 

That's the problem with exam-crammers: they often just say stuff, without providing explanation or context.

这就是考试填鸭式考试的问题:他们经常只说一些东西,不提供解释或上下文。

You also asked:

你还要求:

" Would it be a good idea to create second, single column index on invoice_date?"

“在invoice_date上创建第二个、单个列索引是不是一个好主意?”

Without knowing the data it's hard to tell, but I would expect a date column to be less selective than an ID column (especially if the time element is truncated), so generally I would expect an index to built as (invoice_date, invoice_id) anyway. That might allows us to use index compression.

在不知道数据的情况下,这是很难判断的,但是我希望日期列比ID列的选择性要少(特别是如果时间元素被截断),所以通常我希望索引被构建为(invoice_date, invoice_id)。这可以让我们使用索引压缩。

Skip-scanning doesn't quite work as Steve states: it starts by probing the leading edge of the index, but only if the second column in the composite index is referenced in the WHERE clause. The optimizer might choose a Full Fast Index Scan for searching on third columns or lower. Also it won't choose a Skip Scan path if the leading edge has too many distinct values: another good reason for leading with columns of low selectively.

skip -scan并不像Steve所说的那样有效:它从探查索引的前缘开始,但前提是复合索引中的第二列在WHERE子句中引用。优化器可以选择完全快速的索引扫描来搜索第三列或更低的列。如果前缘有太多不同的值,它也不会选择跳转扫描路径:这是有选择地使用低列进行导航的另一个好理由。

So, this doesn't exactly answer your question, but I think it does convey an important point: there are no universal rules governing creating indexes for performance. We need to understand the profile of the data - its values distribution and volumes - and also the most important queries which will use the table.

因此,这并不能确切地回答您的问题,但我认为它确实传达了一个重要的观点:没有统一的规则来管理为性能创建索引。我们需要了解数据的概要文件——它的值分布和卷——以及使用该表的最重要查询。