Full-text search in NoSQL databases

Does anyone here have experience deploying a real production system that used full-text search in any of the NoSQL databases?
For example, how does full-text search compare in MongoDB, Riak and CouchDB?
Some of the metrics I am looking for are ease of deployment and maintenance and, of course, speed.
How mature are they? Are they any replacement for the Lucene infrastructure?

Thanks.


12 solutions

#1

None of the existing "NoSQL" databases provides a reasonable implementation of anything that could be called "full-text search". MongoDB in particular has almost nothing so far (matching with regular expressions is not full-text search, and searching with the $in or $all operators on a keyword list is only a very poor implementation of "full-text search"). Using Solr, ElasticSearch or Sphinx is straightforward: you implement and integrate it at the application level. Your choice depends largely on your requirements and current setup.

#2

Here are the details on Riak Search: http://wiki.basho.com/Riak-Search.html. There is a presentation on it as well.

#3

Yes. See CouchDB-Lucene, a CouchDB extension that supports full Lucene queries over the data.

#4

MarkLogic has better options for text search, if I recall correctly. Here is a discussion of the topic, though it is on their own blog and written by their own authors.

#5

I'm involved in the development of an application using Solandra (Cassandra-based Apache Solr). In my experience the system is quite stable and able to handle TB+ data. I'm personally quite happy with the software for the following reasons:
1. Automated partitioning of data, thanks to the Cassandra backend.
2. Rich querying capabilities (thanks to Solr and Lucene).
3. Fast reads and writes (writes are significantly faster than reads).

However, I believe Solandra currently does not support batch mutations. That is, I can insert 100 columns into Cassandra in a single insertion, but Solandra does not support this.

#6

For MongoDB, there isn't a complete full-text indexing feature yet; however, there may be one in the pipeline, perhaps due in v2.2.

In the meantime, you can create a simple inverted index by using a string array field and putting an index on it, as described here: Full Text Search in Mongo
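The string-array approach can be sketched roughly as follows. This is only a minimal illustration, not the linked article's code: the field names `body` and `_keywords`, the helper names, and the in-memory `matches` function (a stand-in for the server-side `$all` match) are all assumptions made for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into unique word tokens."""
    return sorted(set(re.findall(r"[a-z0-9]+", text.lower())))

def with_keywords(doc, source_field="body", keyword_field="_keywords"):
    """Return a copy of doc with a string-array field holding its tokens.

    Both field names are hypothetical; pick whatever fits your schema.
    """
    enriched = dict(doc)
    enriched[keyword_field] = tokenize(doc.get(source_field, ""))
    return enriched

def keyword_query(search_text, keyword_field="_keywords"):
    """Build a filter matching documents that contain every search term."""
    return {keyword_field: {"$all": tokenize(search_text)}}

def matches(doc, query):
    """In-memory stand-in for the $all match, for illustration only."""
    return all(
        set(cond["$all"]) <= set(doc.get(field, []))
        for field, cond in query.items()
    )

docs = [
    with_keywords({"_id": 1, "body": "Full text search in MongoDB"}),
    with_keywords({"_id": 2, "body": "Riak Search overview"}),
]
query = keyword_query("mongodb search")
hits = [d["_id"] for d in docs if matches(d, query)]
print(hits)
```

Against a real server you would presumably store the enriched documents, index the array field (with pymongo, `collection.create_index("_keywords")`), and pass the same filter to `collection.find(query)`. There is no stemming or relevance ranking here, which is the weakness the first answer points out.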

Or you could maintain a parallel full-text index in a dedicated Solr or Lucene index and, if you're feeling really ambitious, replicate directly to your full-text store from the Mongo oplog. Otherwise, populate both and keep them in sync from your application logic.
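To make the oplog idea a little more concrete, here is a rough sketch of mirroring oplog entries into a search store. It assumes the classic oplog entry layout (`op`, `ns`, `o`, `o2`), which may differ on your server version, and it uses a plain dict in place of Solr or Lucene. The namespace `mydb.articles` and the function name are hypothetical.

```python
def apply_oplog_entry(index, entry, namespace="mydb.articles"):
    """Mirror one oplog entry into a dict standing in for the search index.

    Assumes the classic entry layout: "op" is the operation type, "ns" the
    namespace, "o" the payload and "o2" the update selector.
    """
    if entry.get("ns") != namespace:
        return  # ignore other collections
    op = entry["op"]
    if op == "i":
        # insert: "o" holds the full document
        index[entry["o"]["_id"]] = dict(entry["o"])
    elif op == "d":
        # delete: "o" holds the _id of the removed document
        index.pop(entry["o"]["_id"], None)
    elif op == "u":
        # update: "o2" identifies the document, "o" describes the change
        doc_id = entry["o2"]["_id"]
        change = entry["o"]
        if any(key.startswith("$") for key in change):
            # modifier update: this sketch only understands $set
            target = index.setdefault(doc_id, {"_id": doc_id})
            target.update(change.get("$set", {}))
        else:
            # full-document replacement
            index[doc_id] = {"_id": doc_id, **change}

index = {}
entries = [
    {"op": "i", "ns": "mydb.articles", "o": {"_id": 1, "body": "draft"}},
    {"op": "u", "ns": "mydb.articles", "o2": {"_id": 1},
     "o": {"$set": {"body": "final"}}},
    {"op": "i", "ns": "mydb.articles", "o": {"_id": 2, "body": "temp"}},
    {"op": "d", "ns": "mydb.articles", "o": {"_id": 2}},
    {"op": "i", "ns": "mydb.other", "o": {"_id": 3}},
]
for entry in entries:
    apply_oplog_entry(index, entry)
print(index)
```

A real implementation would tail the oplog continuously, remember the last processed timestamp so it can resume, and handle modifiers other than `$set`. That extra work is why keeping both stores in sync from application logic is the simpler fallback.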

#7

I've just finished doing this with data stored in MongoDB and Sphinx Search as my full-text engine. I know Mongo has a votable issue for adding full-text search to a future release; however, at this point they don't have it.

There are several ways of getting your Mongo data into Sphinx; the one I've had the most luck with (and which has been extremely easy) is xmlpipe2. It took me a while to fully understand how to use it, but this article, Sphinx xmlpipe2 in PHP, has an outstanding walkthrough that shows (at least in PHP) how to build the document and then how to insert it into Sphinx.

Essentially my config ends up looking like this:


source my_source {
     # xmlpipe2 source: Sphinx runs the command and reads the XML from its stdout
     type = xmlpipe2
     xmlpipe_command = /usr/bin/php /www/generateSphinXml.php identifierForMyTable
}

and my index then looks like this:

index my_index {
     source = my_source
     path = /usr/local/sphinx/var/data/my_index
     docinfo = extern
     min_word_len = 1
     mlock = 0
     morphology = stem_en
     # This one is a requirement, however.
     charset_type = utf-8
     enable_star = 1
     html_strip = 0
     min_prefix_len = 2
}

I've had excellent success with this; hopefully you find it as useful.
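For readers not using PHP, the document that an xmlpipe2 command has to print can be sketched in Python along these lines. This is an illustrative sketch, not the linked article's script: the function name, the `title` and `body` fields, and the sample record are assumptions. xmlpipe2 expects each document id to be a unique positive integer, so mapping Mongo `_id` values to integers is left to the caller.

```python
from xml.sax.saxutils import escape, quoteattr

def build_xmlpipe2(records, fields):
    """Render records as an xmlpipe2 docset string.

    Each record needs a unique positive integer "id"; all other values
    are written out as full-text fields.
    """
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
    ]
    # Declare every full-text field once in the schema block.
    lines += [f"<sphinx:field name={quoteattr(name)}/>" for name in fields]
    lines.append("</sphinx:schema>")
    for record in records:
        lines.append(f'<sphinx:document id="{int(record["id"])}">')
        for name in fields:
            # escape() protects &, < and > inside the field text
            value = escape(str(record.get(name, "")))
            lines.append(f"<{name}>{value}</{name}>")
        lines.append("</sphinx:document>")
    lines.append("</sphinx:docset>")
    return "\n".join(lines)

xml = build_xmlpipe2(
    [{"id": 1, "title": "Riak & CouchDB", "body": "Full-text search notes"}],
    fields=["title", "body"],
)
print(xml)
```

A script like this would take the place of the PHP command in `xmlpipe_command`, with the records read from MongoDB instead of the hard-coded sample.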

#8

If you are using PHP, there is a great solution for full-text search in the NoSQL database MongoDB called Mongo*: http://sourceforge.net/projects/mongo*/

Previously I used Sphinx+MongoDB for full-text search; performance was great, but result quality was very poor. With Mongo* my search results have improved a lot.

Mongo* is also listed on the MongoDB site.

Please let me know if you try it yourself.

#9

There is the CLucene project. Xapian has also not been mentioned above. I use Sphinx; it's very good but somewhat clumsy to set up. I actually prefer piping data from Mongo into Sphinx via xmlpipe2 instead of using Sphinx's SQL sources in the sphinx.conf file.

#10

Solr can be used with 10gen's Mongo Connector, which can push data to it (among other targets):

https://github.com/10gen-labs/mongo-connector/tree/master/mongo-connector

From their example:


python mongo_connector.py -m localhost:27217 -t http://localhost:8080/solr

#11

Definitely Solr. It is NoSQL.


It has:


awesome performance
awesome storage options
stemmers
highlighting
faceting
distributed search (SolrCloud)
perfect API
web admin
HTML, PDF, DOC indexing
many other features
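As a small illustration of the highlighting and faceting items above, a Solr select request that turns both on could be built like this. The base URL, the field names `body` and `category`, and the helper name are assumptions for the example; `q`, `rows`, `wt`, `hl`, `hl.fl`, `facet` and `facet.field` are standard Solr query parameters. Recent Solr versions also expect a core or collection name in the path.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

def solr_select_url(base_url, text, highlight_field, facet_field, rows=10):
    """Build a select URL with highlighting and faceting enabled."""
    params = {
        "q": text,                  # the query itself
        "rows": rows,               # number of results to return
        "wt": "json",               # response format
        "hl": "true",               # switch highlighting on
        "hl.fl": highlight_field,   # field to produce snippets from
        "facet": "true",            # switch faceting on
        "facet.field": facet_field, # field to count values for
    }
    return f"{base_url}/select?{urlencode(params)}"

url = solr_select_url(
    "http://localhost:8080/solr",   # assumed base URL
    "body:search",
    highlight_field="body",
    facet_field="category",
)
print(url)

# Decode the query string again to show what Solr would receive.
sent = parse_qs(urlsplit(url).query)
print(sent["q"], sent["hl.fl"], sent["facet.field"])
```

Fetching the URL with any HTTP client should return the matching documents together with highlighting snippets and facet counts, provided the fields exist in your schema.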

#12

Couchbase 5.0 is releasing full-text search capabilities built on the open source Bleve engine. You enable full-text indexing and start using it against existing JSON documents in the database.

Some slides and a presentation video cover the topic, mentioning Elasticsearch and Lucene as well: https://www.slideshare.net/Couchbase/fulltext-search-how-it-works-and-what-it-can-do
