Can I make Lucene return an unlimited number of search results?

Time: 2022-09-12 23:49:04

I am using Lucene 3.0.1 in a Java 5 environment. I've been researching this issue a little bit, but the documentation hasn't given any direct answers.

Using the search method

TopFieldDocs search(Weight weight, Filter filter, int nDocs, Sort sort)

I always need to provide a maximum number of search results nDocs.

What if I want all matching results? Setting nDocs to Integer.MAX_VALUE feels like a hacky way to do this (and would it hurt speed and memory usage?).

Does anyone have any ideas?

1 solution

#1


6  

You are using a search method that returns the top n hits for a query.

There are other (more low-level) methods that do not have this limitation, and the documentation says that "applications should only use this if they need all of the matching documents. The high-level search API (search(Query, int)) is usually more efficient, as it skips non-high-scoring hits."
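
As a rough illustration of the low-level route, here is a minimal sketch of a collector that records every matching document. It assumes the Lucene 3.0 `Collector` abstract class and the `search(Query, Collector)` overload on the searcher; the class name `AllDocsCollector` and its `getDocIds()` accessor are hypothetical names chosen for this example, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

/** Hypothetical helper: records the index-wide id of every matching document. */
class AllDocsCollector extends Collector {
    private final List<Integer> docIds = new ArrayList<Integer>();
    private int docBase;

    @Override
    public void setScorer(Scorer scorer) {
        // Scores are ignored here; this collector only records document ids.
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase) {
        // The search proceeds segment by segment; remember this segment's offset.
        this.docBase = docBase;
    }

    @Override
    public void collect(int doc) {
        // doc is relative to the current segment, so rebase it before storing.
        docIds.add(Integer.valueOf(docBase + doc));
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        // Order does not matter when every hit is kept.
        return true;
    }

    public List<Integer> getDocIds() {
        return docIds;
    }
}
```

With a searcher and query already in hand, usage would look like `AllDocsCollector c = new AllDocsCollector(); searcher.search(query, c);`, after which `c.getDocIds()` holds one entry per match. Note that this sketch applies no sorting and keeps no scores, so it is not a drop-in replacement for the sorted search in the question.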

So if you really need all documents, you can use the low-level API. I doubt that it makes a big difference in performance compared to passing a really high limit to the high-level API. If you need all documents (and there really are a lot of them), it is going to be slow either way, especially if sorting is involved.
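
To make the trade-off concrete, the sketch below contrasts keeping only the top n scores with keeping every score. It is plain Java over a float array and is not Lucene's actual implementation; the class `TopNSketch` and its methods are hypothetical names used only for illustration. The point it shows is that a top-n pass can discard low-scoring hits immediately, while a collect-everything pass must store one entry per match.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative only: top-n selection versus collecting every hit. */
final class TopNSketch {

    /** Returns the n highest scores, best first; the heap never holds more than n entries. */
    static List<Float> topN(float[] scores, int n) {
        List<Float> result = new ArrayList<Float>();
        if (n <= 0) {
            return result;
        }
        // Min-heap: the weakest retained score sits at the head.
        PriorityQueue<Float> heap = new PriorityQueue<Float>();
        for (float score : scores) {
            if (heap.size() < n) {
                heap.add(score);
            } else if (score > heap.peek()) {
                heap.poll();      // drop the weakest retained score
                heap.add(score);
            }
            // Otherwise the hit is skipped without being stored.
        }
        result.addAll(heap);
        Collections.sort(result, Collections.reverseOrder());
        return result;
    }

    /** Returns every score in input order; memory grows with the number of matches. */
    static List<Float> all(float[] scores) {
        List<Float> result = new ArrayList<Float>(scores.length);
        for (float score : scores) {
            result.add(score);
        }
        return result;
    }
}
```

When n approaches the total number of matches, the heap ends up holding nearly everything anyway, which is consistent with the expectation above that a very high limit and a collect-all pass cost roughly the same.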
