Is a unique index better for column search performance? (PGSQL & MySQL)

I am curious as to whether

CREATE INDEX idx ON tbl (columns);

vs.

CREATE UNIQUE INDEX idx ON tbl (columns);

has a significant algorithmic performance benefit in PostgreSQL or MySQL implementations when scanning the indexed column(s), or whether the UNIQUE keyword simply introduces a unique constraint alongside the index.

I imagine it is probably fair to say that there is a marginal benefit, insofar as indexes are likely to be implemented internally as some sort of hash¹-like structure, and collision handling by definition results in something other than O(1) performance. Given this premise, it is likely that if a large percentage of values are identical, then the structure degenerates into something linear.

So, for purposes of my question, assume that the distribution of values is relatively discrete and uniform.

Thanks in advance!

¹ Which is a matter of pure speculation for me, as I am not familiar with RDBMS internals.

3 solutions

#1

If your data are unique, you should create a UNIQUE index on them.

This implies no additional overhead and affects the optimizer's decisions in certain cases, so that it can choose a better algorithm.

In SQL Server and in PostgreSQL, for instance, if you sort on a UNIQUE key, the optimizer ignores any ORDER BY expressions that follow it (since they are irrelevant), i.e. this query:

SELECT  *
FROM    mytable
ORDER BY
        col_unique, other_col
LIMIT 10

will use an index on col_unique and won't sort on other_col because it's useless.

This query:

SELECT  *
FROM    mytable
WHERE   mycol IN
        (
        SELECT  othercol
        FROM    othertable
        )

will also be converted into an INNER JOIN (as opposed to a SEMI JOIN) if there is a UNIQUE index on othertable.othercol.
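
Why this rewrite is safe can be seen in a small model outside the database. The Python sketch below is only an illustration with made-up data; semi_join and inner_join are hypothetical helpers, not anything the planners expose. When the right-hand values are unique, both produce the same rows; when duplicates are allowed, the inner join repeats rows and the two are no longer interchangeable.

```python
def semi_join(left, right):
    """Keep each left value once if it appears anywhere in right (IN semantics)."""
    right_set = set(right)
    return [x for x in left if x in right_set]


def inner_join(left, right):
    """Emit one output row per matching (left, right) pair."""
    return [x for x in left for y in right if x == y]


# Made-up sample data standing in for mytable.mycol and othertable.othercol.
mycol = [1, 2, 3, 4]
unique_othercol = [2, 3, 5]          # no duplicates, as a UNIQUE index guarantees
duplicated_othercol = [2, 2, 3, 5]   # duplicates allowed

same_when_unique = (
    semi_join(mycol, unique_othercol) == inner_join(mycol, unique_othercol)
)
same_when_duplicated = (
    semi_join(mycol, duplicated_othercol) == inner_join(mycol, duplicated_othercol)
)
print(same_when_unique, same_when_duplicated)
```

With uniqueness guaranteed, the planner is free to treat the subquery as an ordinary join and pick whichever join order and method it estimates to be cheapest.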

An index always contains some kind of a pointer to the row (ctid in PostgreSQL, row pointer in MyISAM, primary key/uniquifier in InnoDB) and the leaves are ordered on these pointers, so in fact every index leaf is unique in some way (though it may not be obvious).

See this article in my blog for performance details:

Making an index UNIQUE

#2

There is a small penalty during update/insert operations for having the unique constraint. The database has to search the index before the insert/update operation to make sure the uniqueness constraint isn't violated.
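
A rough way to picture that extra check is the toy model below. It is a sketch under loud assumptions: SortedIndex is a hypothetical class that keeps keys in a sorted Python list, and it ignores everything a real engine also has to deal with (locking, transaction visibility, page splits). It only shows that both kinds of index have to find the insert position, and that the unique variant additionally inspects the key found there.

```python
import bisect


class SortedIndex:
    """Toy stand-in for an ordered index: a sorted list of keys."""

    def __init__(self, unique=False):
        self.unique = unique
        self.keys = []

    def insert(self, key):
        # Both kinds of index must locate the insert position first.
        pos = bisect.bisect_left(self.keys, key)
        # The unique variant additionally looks at the key found there.
        if self.unique and pos < len(self.keys) and self.keys[pos] == key:
            raise ValueError(f"duplicate key: {key!r}")
        self.keys.insert(pos, key)


plain = SortedIndex()
uniq = SortedIndex(unique=True)
for k in (5, 1, 3):
    plain.insert(k)
    uniq.insert(k)

plain.insert(3)  # a non-unique index accepts the duplicate
try:
    uniq.insert(3)  # the unique index rejects it
    rejected = False
except ValueError:
    rejected = True
print(plain.keys, uniq.keys, rejected)
```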

#3

Well, usually indexes are B-trees, not hashes (there are hash-based indexes, but the most common index type, at least in PostgreSQL, is based on a B-tree).

As for speed - unique should be faster - when an index scan finds a row with the given value, it doesn't have to check whether there are any other rows with this value, and can finish scanning immediately.
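
To get a feel for how large that saving is, here is a minimal sketch, assuming the index behaves like a sorted list of keys. The function equality_lookup and its unique flag are hypothetical names for this illustration and do not describe how PostgreSQL or MySQL actually implement their scans. It counts how many keys are examined after the initial seek.

```python
import bisect


def equality_lookup(keys, target, unique):
    """Return (matching positions, number of keys examined after the seek)."""
    pos = bisect.bisect_left(keys, target)
    matches, examined = [], 0
    while pos < len(keys):
        examined += 1
        if keys[pos] != target:
            break  # ran past the last match
        matches.append(pos)
        if unique:
            break  # at most one match can exist, so stop here
        pos += 1
    return matches, examined


keys = [10, 20, 30, 40, 50]  # all values happen to be distinct
print(equality_lookup(keys, 30, unique=True))   # ([2], 1)
print(equality_lookup(keys, 30, unique=False))  # ([2], 2)
```

In this model, when the values are in fact distinct, the non-unique lookup examines just one extra key before it knows it is done, so the gain from early termination is small compared with the cost of the seek itself.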
