Combining a query that runs partly in Lucene and partly in a database (MySQL)

Time: 2021-01-09 03:08:55

I have an application that needs to filter a list of Articles and retrieve the results. The database is MySQL, with NHibernate as the ORM. The query also does a full-text search on keywords, and for that it uses Lucene.Net.

My problem is that the query spans both domains. For example, I might need all articles that contain the keywords 'traffic control' and have PublishedOn < 2012-10-01. The query is also paginated, for example page #2 with a page size of 50. How can I create a query that spans both MySQL (for the PublishedOn part) and Lucene.Net (for the full-text search)?

If I search MySQL first, I cannot just take the first 50 rows, because Lucene might filter some of them out and I need a full page of 50. The same applies if I start with Lucene.Net. Also, the results should preferably be ordered by relevance, which is something Lucene can do and MySQL cannot.

My current approach is to filter in MySQL first and retrieve ALL the primary keys of the matching records. Then I run a query in Lucene with term queries that match the primary key against that list. However, Lucene is very slow for such a query, and the database can contain over 200,000 articles. A query like this takes ages in Lucene, whereas full-text searches are blazingly fast.

Any ideas on how to address this?

1 solution

#1


Lucene isn't only about full-text search. You can add the PublishedOn property to each Lucene document as a field (stored in a sortable form such as yyyyMMdd) and run a query like this:

Text:"traffic control" AND PublishedOn:[00000000 TO 20121001]

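Check out the "Range Searches" section of the Lucene query syntax documentation. Note that square brackets make both bounds inclusive, so the range above also matches articles published on 2012-10-01 itself; for a strict PublishedOn < 2012-10-01, use 20120930 as the upper bound.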
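To make the idea concrete, here is a minimal sketch in Python (used only for illustration; the application itself is Lucene.Net) of how the range query string could be built. It assumes PublishedOn is indexed as a fixed-width yyyyMMdd string, so that string order matches date order. The helper names `date_key`, `published_before_query` and `page_window` are hypothetical and not part of any Lucene API; only the field names `Text` and `PublishedOn` come from the query above.

```python
from datetime import date, timedelta


def date_key(d):
    """Encode a date as a fixed-width yyyyMMdd string.

    Fixed width matters: with zero padding, comparing the strings
    character by character gives the same order as comparing the dates.
    """
    return f"{d.year:04d}{d.month:02d}{d.day:02d}"


def published_before_query(phrase, cutoff):
    """Build a query string for: phrase match AND PublishedOn < cutoff.

    Square-bracket ranges are inclusive, so the upper bound is the day
    before the cutoff. Assumption: `phrase` contains no characters that
    need escaping in the query syntax (this sketch does not escape it).
    """
    upper = date_key(cutoff - timedelta(days=1))
    return f'Text:"{phrase}" AND PublishedOn:[00000000 TO {upper}]'


def page_window(page, page_size):
    """Return (skip, fetch) for a 1-based page over relevance-ordered hits.

    The idea is to ask the searcher for the top `fetch` hits and drop the
    first `skip` of them, so page 2 with size 50 means hits 51 to 100.
    """
    skip = (page - 1) * page_size
    return skip, skip + page_size


query = published_before_query("traffic control", date(2012, 10, 1))
skip, fetch = page_window(2, 50)
print(query)
print(skip, fetch)
```

Because the date filter and the keyword match are evaluated in the same Lucene query, the hits come back already ordered by relevance, and the page can be cut out of the top results without a round trip to MySQL for filtering.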
