B-Tree和GiST索引方法之间有什么区别(在PostgreSQL中)?

时间:2022-08-27 11:04:14

I have been working on optimizing my Postgres databases recently, and traditionally, I've only ever use B-Tree indexes. However, I saw that GiST indexes suport non-unique, multicolumn indexes, in the Postgres 8.3 documentation.

我最近一直在努力优化我的Postgres数据库,传统上,我只使用B-Tree索引。但是,我在Postgres 8.3文档中看到GiST索引支持非唯一的多列索引。

I couldn't, however, see what the actual difference between them is. I was hoping that my fellow coders might beable to explain, what the pros and cons between them are, and more importantly, the reasons why I would use one over the other?

但是,我不能看出它们之间的实际区别是什么。我希望我的同事们能够解释一下,他们之间的利弊是什么,更重要的是,我之所以使用其中一个的原因是什么?

4 个解决方案

#1


In a nutshell: B-Tree indexes perform better, but GiST indexes are more flexible. Usually, you want B-Tree indexes if they'll work for your data type. There was a recent post on the PG lists about a huge performance hit for using GiST indexes; they're expected to be slower than B-Trees (such is the price of flexibility), but not that much slower... work is, as you might expect, ongoing.

简而言之:B-Tree索引的性能更好,但GiST索引更灵活。通常,如果B-Tree索引适用于您的数据类型,则需要它们。 PG名单上最近有一篇关于使用GiST索引的巨大性能损失的帖子;他们预计会比B-Trees慢(这就是灵活性的代价),但速度并不慢......正如你所料,工作正在进行中。

From a post by Tom Lane, a core PostgreSQL developer:

来自PostgreSQL核心开发人员Tom Lane的帖子:

The main point of GIST is to be able to index queries that simply are not indexable in btree. ... One would fully expect btree to beat out GIST for btree-indexable cases. I think the significant point here is that it's winning by a factor of a couple hundred; that's pretty awful, and might point to some implementation problem.

GIST的要点是能够索引在btree中根本不可索引的查询。 ...人们完全希望btree能够在btree-indexable案例中击败GIST。我认为这里的重点是它赢得了几百倍;这非常糟糕,可能会指出一些实施问题。

#2


Basically everybody's right - btree is default index as it performs very well. GiST are somewhat different beasts - it's more of a "framework to write index types" than a index type on its own. You have to add custom code (in server) to use it, but on the other hand - they are very flexible.

基本上每个人都是对的 - btree是默认索引,因为它表现得非常好。 GiST是一些不同的野兽 - 它更像是一个“编写索引类型的框架”,而不是一个索引类型。您必须添加自定义代码(在服务器中)才能使用它,但另一方面 - 它们非常灵活。

Generally - you don't use GiST unless the datatype you're using tell you to do so. Example of datatypes that use GiST: ltree (from contrib), tsvector (contrib/tsearch till 8.2, in core since 8.3), and others.

通常 - 除非您使用的数据类型告诉您这样做,否则不使用GiST。使用GiST的数据类型示例:ltree(来自contrib),tsvector(contrib / tsearch到8.2,自8.3以来的核心)以及其他。

There is well known, and pretty fast geographic extenstion to PostgreSQL - PostGIS (http://postgis.refractions.net/) which uses GiST for its purposes.

PostgreSQL有着众所周知且非常快速的地理扩展 - PostGIS(http://postgis.refractions.net/),它将GiST用于其目的。

#3


GiST indexes are lossy to an extent, meaning that the DBMS has to deal with false positives/negatives, i.e.:

GiST索引在某种程度上是有损的,这意味着DBMS必须处理误报/否定,即:

GiST indexes are lossy because each document is represented in the index by a fixed- length signature. The signature is generated by hashing each word into a random bit in an n-bit string, with all these bits OR-ed together to produce an n-bit document signature. When two words hash to the same bit position there will be a false match. If all words in the query have matches (real or false) then the table row must be retrieved to see if the match is correct. b-trees do not have this behavior, so depending on the data being indexed, there may be some performance difference between the two.

GiST索引是有损的,因为每个文档在索引中由固定长度签名表示。通过将每个字散列为n比特串中的随机比特来生成签名,所有这些比特一起进行OR运算以产生n比特文档签名。当两个单词散列到相同的位位置时,将存在错误匹配。如果查询中的所有单词都匹配(真或假),则必须检索表行以查看匹配是否正确。 b-tree没有这种行为,因此根据被索引的数据,两者之间可能存在一些性能差异。

See for text search behavior http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html and http://www.postgresql.org/docs/8.3/static/indexes-types.html for a general purpose comparison.

请参阅文本搜索行为http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html和http://www.postgresql.org/docs/8.3/static/indexes-types.html一般目的比较。

#4


GiST are more general indexes. You can use them for broader purposes that the ones you would use with B-Tree. Including the ability to build a B-Tree using GiST.

GiST是更通用的索引。您可以将它们用于与B-Tree一起使用的更广泛的目的。包括使用GiST构建B树的能力。

I.E.: you can use GiST to index on geographical points, or geographical areas, something you won't be able to do with B-Tree indexes, since the only thing that matter on a B-Tree is the key (or keys) you are indexing on.

IE:您可以使用GiST索引地理点或地理区域,这是B-Tree索引无法做到的事情,因为B树上唯一重要的是键(或键)你正在索引。

#1


In a nutshell: B-Tree indexes perform better, but GiST indexes are more flexible. Usually, you want B-Tree indexes if they'll work for your data type. There was a recent post on the PG lists about a huge performance hit for using GiST indexes; they're expected to be slower than B-Trees (such is the price of flexibility), but not that much slower... work is, as you might expect, ongoing.

简而言之:B-Tree索引的性能更好,但GiST索引更灵活。通常,如果B-Tree索引适用于您的数据类型,则需要它们。 PG名单上最近有一篇关于使用GiST索引的巨大性能损失的帖子;他们预计会比B-Trees慢(这就是灵活性的代价),但速度并不慢......正如你所料,工作正在进行中。

From a post by Tom Lane, a core PostgreSQL developer:

来自PostgreSQL核心开发人员Tom Lane的帖子:

The main point of GIST is to be able to index queries that simply are not indexable in btree. ... One would fully expect btree to beat out GIST for btree-indexable cases. I think the significant point here is that it's winning by a factor of a couple hundred; that's pretty awful, and might point to some implementation problem.

GIST的要点是能够索引在btree中根本不可索引的查询。 ...人们完全希望btree能够在btree-indexable案例中击败GIST。我认为这里的重点是它赢得了几百倍;这非常糟糕,可能会指出一些实施问题。

#2


Basically everybody's right - btree is default index as it performs very well. GiST are somewhat different beasts - it's more of a "framework to write index types" than a index type on its own. You have to add custom code (in server) to use it, but on the other hand - they are very flexible.

基本上每个人都是对的 - btree是默认索引,因为它表现得非常好。 GiST是一些不同的野兽 - 它更像是一个“编写索引类型的框架”,而不是一个索引类型。您必须添加自定义代码(在服务器中)才能使用它,但另一方面 - 它们非常灵活。

Generally - you don't use GiST unless the datatype you're using tell you to do so. Example of datatypes that use GiST: ltree (from contrib), tsvector (contrib/tsearch till 8.2, in core since 8.3), and others.

通常 - 除非您使用的数据类型告诉您这样做,否则不使用GiST。使用GiST的数据类型示例:ltree(来自contrib),tsvector(contrib / tsearch到8.2,自8.3以来的核心)以及其他。

There is well known, and pretty fast geographic extenstion to PostgreSQL - PostGIS (http://postgis.refractions.net/) which uses GiST for its purposes.

PostgreSQL有着众所周知且非常快速的地理扩展 - PostGIS(http://postgis.refractions.net/),它将GiST用于其目的。

#3


GiST indexes are lossy to an extent, meaning that the DBMS has to deal with false positives/negatives, i.e.:

GiST索引在某种程度上是有损的,这意味着DBMS必须处理误报/否定,即:

GiST indexes are lossy because each document is represented in the index by a fixed- length signature. The signature is generated by hashing each word into a random bit in an n-bit string, with all these bits OR-ed together to produce an n-bit document signature. When two words hash to the same bit position there will be a false match. If all words in the query have matches (real or false) then the table row must be retrieved to see if the match is correct. b-trees do not have this behavior, so depending on the data being indexed, there may be some performance difference between the two.

GiST索引是有损的,因为每个文档在索引中由固定长度签名表示。通过将每个字散列为n比特串中的随机比特来生成签名,所有这些比特一起进行OR运算以产生n比特文档签名。当两个单词散列到相同的位位置时,将存在错误匹配。如果查询中的所有单词都匹配(真或假),则必须检索表行以查看匹配是否正确。 b-tree没有这种行为,因此根据被索引的数据,两者之间可能存在一些性能差异。

See for text search behavior http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html and http://www.postgresql.org/docs/8.3/static/indexes-types.html for a general purpose comparison.

请参阅文本搜索行为http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html和http://www.postgresql.org/docs/8.3/static/indexes-types.html一般目的比较。

#4


GiST are more general indexes. You can use them for broader purposes that the ones you would use with B-Tree. Including the ability to build a B-Tree using GiST.

GiST是更通用的索引。您可以将它们用于与B-Tree一起使用的更广泛的目的。包括使用GiST构建B树的能力。

I.E.: you can use GiST to index on geographical points, or geographical areas, something you won't be able to do with B-Tree indexes, since the only thing that matter on a B-Tree is the key (or keys) you are indexing on.

IE:您可以使用GiST索引地理点或地理区域,这是B-Tree索引无法做到的事情,因为B树上唯一重要的是键(或键)你正在索引。