I am using Lucene 3.0.1 in a Java 5 environment. I've been researching this issue a little bit, but the documentation hasn't given any direct answers.
Using the search method
TopFieldDocs search(Weight weight, Filter filter, int nDocs, Sort sort)
I always need to provide a maximum number of search results nDocs.
What if I want all matching results? Setting nDocs to Integer.MAX_VALUE feels like a hacky way to do this (and would it hurt speed and memory use?).
Does anyone have any ideas?
1 Solution
#1
6
You are using a search method that returns the top n hits for a query.
There are other, lower-level methods that do not have this limitation, but the documentation says about them that "applications should only use this if they need all of the matching documents. The high-level search API (search(Query, int)) is usually more efficient, as it skips non-high-scoring hits."
So if you really need all documents, you can use the low-level API. I doubt it performs much differently from passing a really high limit to the high-level API. If you need all documents (and there really are a lot of them), it is going to be slow either way, especially if sorting is involved.
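To make the trade-off concrete, here is a minimal plain-Java sketch. It is not Lucene code: HitSink, search, AllDocs and TopN are hypothetical names made up for illustration, and the float array stands in for the scores of the matching documents. It only shows the general idea that a top-n search keeps a bounded heap and discards weaker hits, while a collect-everything callback stores every match. If I recall correctly, the real low-level entry point in Lucene 3.0 is search(Query, Collector), but check the Javadoc for your version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class CollectStrategies {

    /** Hypothetical callback, invoked once per matching document. */
    interface HitSink {
        void collect(int doc, float score);
    }

    /** Stand-in for a search: scores[i] is the score of matching document i. */
    static void search(float[] scores, HitSink sink) {
        for (int doc = 0; doc < scores.length; doc++) {
            sink.collect(doc, scores[doc]);
        }
    }

    /** Keeps every matching document id, in index order, with no ranking. */
    static class AllDocs implements HitSink {
        final List<Integer> docs = new ArrayList<Integer>();

        public void collect(int doc, float score) {
            docs.add(doc);
        }
    }

    /** Keeps only the n best scores in a min-heap that never exceeds n entries. */
    static class TopN implements HitSink {
        private final int n;
        final PriorityQueue<Float> heap = new PriorityQueue<Float>();

        TopN(int n) {
            this.n = n;
        }

        public void collect(int doc, float score) {
            if (heap.size() < n) {
                heap.add(score);
            } else if (n > 0 && score > heap.peek()) {
                heap.poll();      // drop the current weakest hit
                heap.add(score);  // keep the better one instead
            }
            // otherwise the hit is skipped: it cannot make the top n
        }
    }

    public static void main(String[] args) {
        float[] scores = {0.2f, 0.9f, 0.5f, 0.7f};

        AllDocs all = new AllDocs();
        search(scores, all);

        TopN top2 = new TopN(2);
        search(scores, top2);

        System.out.println(all.docs.size() + " docs collected, top-2 heap holds "
                + top2.heap.size());
    }
}
```

With a very large n the heap in this sketch ends up holding every hit anyway, which is why I would not expect a big difference between the two approaches once you truly want everything.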