How to create a simple fuzzy search with PostgreSQL?

I have a small problem with the search functionality on my RoR-based site. I have many Products, each with a CODE. This code can be any string, like "AB-123-lHdfj". Currently I use the ILIKE operator to find products:

Product.where("code ILIKE ?", "%" + params[:search] + "%")

It works fine, but it can't find products with codes like "AB123-lHdfj" or "AB123lHdfj".

What should I do about this? Maybe PostgreSQL has a string normalization function, or some other method that could help me? :)

2 Solutions

#1

Postgres provides a module with several string comparison functions such as soundex and metaphone. But you will want to use the levenshtein edit distance function.

Example:

test=# SELECT levenshtein('GUMBO', 'GAMBOL');
 levenshtein
-------------
           2
(1 row)

The 2 is the edit distance between the two words. When you apply this against a number of words and sort by the edit distance result you will have the type of fuzzy matches that you're looking for.
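
To make the idea of edit distance concrete, here is a minimal Ruby sketch of the classic dynamic-programming algorithm. It is only an illustration of what the number means, not how Postgres implements levenshtein() internally; the method below is a hypothetical helper and is not needed to run the SQL in this answer.

```ruby
# Illustrative only: edit distance where each insertion, deletion,
# or substitution of a single character costs 1.
def levenshtein(a, b)
  # prev[j] = distance between the first i-1 chars of a and the first j chars of b
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i] # turning the first i chars of a into "" takes i deletions
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,           # delete ca
               curr[j - 1] + 1,       # insert cb
               prev[j - 1] + cost].min # substitute (or keep if equal)
    end
    prev = curr
  end
  prev.last
end

puts levenshtein('GUMBO', 'GAMBOL')            # => 2
puts levenshtein('AB-123-lHdfj', 'AB123lHdfj') # => 2
```

The second call shows why this fits the question: dropping the two hyphens counts as two edits.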

Try this query sample: (with your own object names and data of course)

SELECT * 
FROM some_table
WHERE levenshtein(code, 'AB123-lHdfj') <= 3
ORDER BY levenshtein(code, 'AB123-lHdfj')
LIMIT 10

This query says:

Give me the top 10 results from some_table where the edit distance between the code value and the input 'AB123-lHdfj' is at most 3, closest matches first. You will get back up to 10 rows whose code is within 3 single-character edits of 'AB123-lHdfj'.

Note: if you get an error like:

function levenshtein(character varying, unknown) does not exist

Install the fuzzystrmatch extension using:

test=# CREATE EXTENSION fuzzystrmatch;

#2

Paul told you about levenshtein(). That's a very useful tool, but it's also very slow with big tables. It has to calculate the Levenshtein distance from the search term for every single row, which is expensive.

First off, if your requirements are as simple as the example indicates, you can still use LIKE. Just replace any - in your search term with % to create the WHERE clause:

WHERE code LIKE '%AB%123%lHdfj%'

instead of

WHERE code LIKE '%AB-123-lHdfj%'
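
On the Rails side, the replacement could be sketched like this. The helper name like_pattern is hypothetical, and escaping the user's own %, _ and \ characters is an extra precaution that the answer above does not mention; it assumes the default backslash escape character for LIKE.

```ruby
# Hypothetical helper: build a LIKE/ILIKE pattern from a user-supplied code,
# treating hyphens as "anything may appear here".
def like_pattern(term)
  # Escape LIKE metacharacters typed by the user so they match literally.
  escaped = term.gsub(/[\\%_]/) { |ch| "\\#{ch}" }
  # Replace each run of hyphens with the % wildcard, then wrap in wildcards.
  "%#{escaped.gsub(/-+/, '%')}%"
end

puts like_pattern('AB-123-lHdfj') # => %AB%123%lHdfj%

# Possible use with the model from the question:
#   Product.where("code ILIKE ?", like_pattern(params[:search]))
```

Note that this only helps when the search term contains the hyphens and the stored code may not; it does not tell you where hyphens are missing from the search term.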

If your real problem is more complex and you need something faster then - depending on your requirements - there are several options.

There is full text search, of course. But this may be overkill in your case.

A more likely candidate is pg_trgm. Note that, as of PostgreSQL 9.1, you can combine it with LIKE. See this blog post by Depesz.
Also very interesting in this context: the similarity() function or % operator of that module. More:

- PostgreSQL LIKE query performance variations

Last but not least, you can implement a hand-knit solution with a function to normalize the strings to be searched. For instance, you could transform AB1-23-lHdfj -> ab123lhdfj, save it in an additional column, and search that column with search terms that have been transformed the same way.

Or use an index on an expression instead of the redundant column. (Involved functions must be IMMUTABLE.) And possibly combine that with pg_trgm from above.
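
As a rough sketch of the normalization idea in Ruby: the helper name normalize_code and the normalized_code column mentioned below are assumptions for illustration, and the rule used here (lowercase, keep only a-z and 0-9) is just one possible choice.

```ruby
# Hypothetical helper: reduce a code to lowercase ASCII letters and digits,
# so that differently punctuated spellings compare equal.
def normalize_code(code)
  code.downcase.gsub(/[^a-z0-9]/, '')
end

puts normalize_code('AB1-23-lHdfj') # => ab123lhdfj
puts normalize_code('AB-123-lHdfj') == normalize_code('AB123lHdfj') # => true

# Possible use, assuming an extra column normalized_code that is filled with
# normalize_code(code) whenever a product is saved:
#   Product.where("normalized_code LIKE ?", "%#{normalize_code(params[:search])}%")
```

If you go with the expression index instead, the same transformation has to be written as an IMMUTABLE SQL expression and repeated identically in the query.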

Overview of pattern-matching techniques:

Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL