Inexact full-text search in PostgreSQL and Django

Date: 2021-11-20 19:24:30

I'm new to PostgreSQL, and I'm not sure how to go about doing an inexact full-text search. Not that it matters too much, but I'm using Django. In other words, I'm looking for something like the following:

q = 'hello world'
# Keep rows whose precomputed tsvector column matches the plain-text query.
queryset = Entry.objects.extra(
    where=['body_tsv @@ plainto_tsquery(%s)'],
    params=[q])
for entry in queryset:
    print(entry.title)

where the list of entries should contain either exactly 'hello world' or something similar. The results should then be ordered by how far each value is from the specified string. For instance, I would like the query to include entries containing "Hello World", "hEllo world", "helloworld", "hell world", etc., with some sort of ranking indicating how far each item is from the exact, unchanged query string.

How would you go about doing this?

2 answers

#1


2  

Your best bet is to use Django raw querysets; I use them with MySQL to perform full-text matching. If the data is all in the database and Postgres provides the matching capability, then it makes sense to use it. Plus, Postgres offers some really useful features for full-text queries, such as stemming.

Basically, it lets you write the actual query you want and still get models back (as long as you are querying a model table, obviously).

The advantage this gives you is that you can first test the exact query you will be using directly in Postgres; the documentation covers full-text queries pretty well.
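
As a rough sketch of what that could look like (not tested against a real schema): the SQL below can be pasted into psql with a literal in place of `%s`, then handed to `raw()` unchanged. The `body_tsv` column comes from the question; the table name `blog_entry` and the helper `search_entries` are assumptions made up for illustration, so adjust them to your own models.

```python
# Assumption: the Entry model lives in a hypothetical table "blog_entry"
# and has the precomputed tsvector column "body_tsv" from the question.
FULL_TEXT_SQL = """
    SELECT e.*, ts_rank(e.body_tsv, query) AS rank
    FROM blog_entry AS e, plainto_tsquery(%s) AS query
    WHERE e.body_tsv @@ query
    ORDER BY rank DESC
"""


def search_entries(entry_model, q):
    """Return entry_model rows matching q, best-ranked first.

    entry_model is expected to be a Django model class; its manager's
    raw() method takes the SQL and a list of parameters, and the database
    driver handles the quoting of q.
    """
    return entry_model.objects.raw(FULL_TEXT_SQL, [q])
```

With a real model this would be called as `search_entries(Entry, 'hello world')`. Because the query selects every model column (including the primary key), `raw()` can build model instances from it, and the extra `rank` column should show up as an attribute on each one.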

The main gotcha with raw querysets at the moment is that they don't support count(). So if you will be returning lots of data and have memory constraints on your application, you might need to do something clever.

"Inexact" matching, however, isn't really part of the full-text search capabilities. Instead, you want the Postgres fuzzystrmatch contrib module. Its use is described here with indexes.
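
To give a feel for the kind of ranking the question asks for, here is a small pure-Python sketch of Levenshtein (edit) distance, which is one of the measures fuzzystrmatch exposes in SQL as `levenshtein(text, text)`. This is only an illustration of the idea, not the module's implementation; the function names and the sample titles are made up for the example.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def rank_by_distance(query, candidates):
    """Sort candidates by case-insensitive edit distance from query."""
    q = query.lower()
    return sorted((levenshtein(q, c.lower()), c) for c in candidates)


# Sample titles, loosely based on the variants listed in the question.
titles = ["hell world", "Hello World", "helloworld", "goodbye world"]
for distance, title in rank_by_distance("hello world", titles):
    print(distance, title)
```

In the database, the equivalent would be something along the lines of ordering by `levenshtein(lower(title), lower(%s))`, assuming the module is installed and `title` is the column being compared. Computing an edit distance for every row gets expensive on large tables, so it is probably worth narrowing the candidates first.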

#2


1  

The best approach would be to use a search engine for this purpose. Django-haystack supports integration with three different search engines.
