Full-text search: Whoosh vs. SOLR

Time: 2022-12-31 10:36:31

I am working on a Django project in which I need to implement full-text search. I have looked at SOLR and read good things about it, but because it is implemented in Java, it would require a Java environment to be installed on the system alongside Python. While looking for a Python equivalent of SOLR, I came across Whoosh, but I am not sure whether it is as efficient and robust as SOLR. Should I just go with SOLR, or are there better options than Whoosh and SOLR for use with Python?

Please suggest.

Thanks in advance

2 answers

#1


13  

Whoosh is actually very fast for a pure-Python implementation. That said, it is still at least an order of magnitude slower than SOLR. Depending on how much data you need to index and search, and on your requirements for maximum allowable latency and concurrent searches, it may not be an option.
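
Whether that slowdown matters for a given project is easiest to judge by measuring. The following is only a rough sketch of a timing harness that uses the standard library; `measure_latency` and `fake_search` are hypothetical names, and `fake_search` is a stand-in that would be replaced by a real Whoosh or SOLR query call over your own data.

```python
import math
import statistics
import time


def measure_latency(search, queries, repeats=5):
    """Call search(query) repeatedly and return latency statistics in milliseconds."""
    samples = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            search(query)
            samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank 95th percentile of the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "count": len(samples),
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


def fake_search(query):
    # Hypothetical stand-in: swap in a real Whoosh or SOLR query here.
    docs = ["django full text search", "whoosh pure python", "solr java"]
    return [doc for doc in docs if query in doc]


if __name__ == "__main__":
    print(measure_latency(fake_search, ["python", "java", "search"]))
```

Running the same queries against each candidate engine, with a realistic index size, gives numbers that can be compared directly with your latency budget. This sketch times one query at a time and does not model concurrent searches.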

SOLR is a bit of a complicated beast, but it is by far the most comprehensive search solution. Pair it with solrpy for stunning results. Yes, you will need Java hosting.
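
Because SOLR runs as a separate server and is queried over HTTP, the Python side can stay thin whichever client library is used. As an illustration only, here is a minimal standard-library sketch that builds a request against SOLR's select handler. The server address, the core name `articles`, and the `title` field are assumptions for the example and are not taken from the question; the function names are hypothetical as well.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_BASE = "http://localhost:8983/solr"  # assumption: a local SOLR server
CORE = "articles"  # hypothetical core name


def build_select_url(query, rows=10, base=SOLR_BASE, core=CORE):
    """Build the URL for a select request that asks for a JSON response."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base}/{core}/select?{params}"


def search(query, rows=10):
    """Send the query to a running SOLR server and return the matching documents."""
    with urlopen(build_select_url(query, rows)) as response:
        payload = json.load(response)
    return payload["response"]["docs"]


if __name__ == "__main__":
    # Only prints the URL; calling search() requires a running server.
    print(build_select_url("title:django", rows=5))
```

A client such as solrpy adds conveniences on top of requests like this one, so the sketch is meant to show how small the integration surface is, not to replace a client library.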

You might also want to check out the Python bindings for Xapian. Xapian is very, very fast, but less of a complete solution than SOLR. They are GPL-licensed, though, so they may or may not be viable for you.

#2


1  

I have used Lucene and Lucene-based projects such as SOLR and Nutch, and I found that Lucene pretty much satisfies my needs. I have only tried Whoosh once, but I chose Lucene because (1) I am using Java and (2) I had trouble making UTF-8 work with Whoosh (I am not sure whether it works out of the box now). With Lucene, I had no trouble working with Chinese characters.

If you are using Python as your programming language and Whoosh satisfies your needs, then I would suggest using it over the Java alternatives: you get better integration, avoid external dependencies, and can customize faster if you need to code additional functionality.

UPDATE: If you are interested in using Lucene, it has a Python wrapper; see http://lucene.apache.org/pylucene/
