Is there a faster fuzzy match for Postgres than pg_trgm?

Time: 2021-09-15 19:26:03

I have a Postgres table with about 5 million records and I want to find the closest match to an input key. I tried using trigrams with the pg_trgm module, but it took roughly 5 seconds per query, which is too slow for my needs.

Is there a faster way to do fuzzy matching within Postgres?

3 answers

#1


It looks like the result-size estimates in your EXPLAIN output are way off. This is not unexpected, as it is very hard to estimate the result size of a full text search well.

This causes PostgreSQL to choose a bad query plan. Try disabling bitmap scans (set enable_bitmapscan=off) and run the query again.
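A minimal sketch of how that experiment could look in a single session. The table `records` and the column `item_key` are hypothetical names, since the question does not show the schema; `%` and `similarity()` are the similarity operator and function provided by pg_trgm.

```sql
-- Hypothetical schema: table "records" with a text column "item_key".
-- SET changes the setting for the current session only.
SET enable_bitmapscan = off;

-- EXPLAIN ANALYZE runs the query and shows estimated vs. actual row counts,
-- which makes a bad estimate easy to spot.
EXPLAIN ANALYZE
SELECT item_key, similarity(item_key, 'input key') AS sml
FROM records
WHERE item_key % 'input key'
ORDER BY sml DESC
LIMIT 1;

-- Put the planner setting back to its default afterwards.
RESET enable_bitmapscan;
```

If the plan and timing improve with bitmap scans disabled, that points to the misestimate as the cause; the setting is meant as a diagnostic here rather than something to leave switched off globally.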

#2


Soundex is an alternative fuzzy-matching method, but its matches can be very loose. I would stick with trigram matching if you can. Is there another criterion you could use to make the trigram search work on a smaller set of rows?
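A rough sketch of both ideas, assuming the fuzzystrmatch extension is available and using hypothetical names (`records`, `item_key`, `category`) that do not come from the question:

```sql
-- soundex() is provided by the fuzzystrmatch extension.
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

-- Soundex reduces a string to a short phonetic code, so quite different
-- spellings can collide. This is why it can be "very fuzzy".
SELECT soundex('Robert') AS code_a,
       soundex('Rupert') AS code_b;

-- Narrowing idea: filter on a cheap, selective criterion first
-- (here a hypothetical "category" column), then rank only the
-- remaining rows by trigram similarity.
SELECT item_key
FROM records
WHERE category = 'books'
ORDER BY similarity(item_key, 'input key') DESC
LIMIT 1;
```

Whether the second query helps depends on how selective the extra criterion is and on whether it is indexed.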

#3


Depending on what you are looking for, Postgres can also match on regular expressions instead of the standard LIKE syntax. That may be a better fit for you.
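As an illustration only, with the same hypothetical `records` table and `item_key` column, a regular-expression match looks like this. Note that it tests a pattern rather than ranking rows by closeness, so it only fits if the "fuzziness" you need can be written as a pattern.

```sql
-- ~  is a case-sensitive regular-expression match,
-- ~* is the case-insensitive variant.
SELECT item_key
FROM records
WHERE item_key ~* '^ab.*[0-9]{3}$';

-- A pg_trgm GIN index can also be used by the planner for
-- LIKE and regular-expression searches, not just similarity queries.
CREATE INDEX records_item_key_trgm_idx
    ON records USING gin (item_key gin_trgm_ops);
```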
