How to best search a DB with Lucene?

Date: 2022-09-03 20:22:33

I am looking into ways to get better search capabilities over our database. Search is currently a huge bottleneck: it causes long-running queries that are hurting our database performance.

My boss wanted me to look into Solr, but on closer inspection it seems we actually want some mechanism that integrates our database with Lucene directly.

The Lucene FAQ recommends Hibernate Search, Compass, and DBSight.

As background on our current technology stack: we use plain JSPs on Tomcat, with no Hibernate and no other frameworks on top, just plain Java, JSP, and JDBC against a DB2 database.

Given that, it seems Hibernate Search might be a bit more difficult to integrate into our system, though it might be nice to have the option of using Hibernate after such an integration.

Does anyone have experience with one of these tools (or other similar Lucene-based solutions) that might help in picking the right one?

It needs to be a FOSS solution, and ideally it will keep Lucene updated with changes from the database automagically (but efficiently), without extra effort on our part to notify the tool when changes are made (otherwise, rolling my own Lucene solution seems just as good). Also, we have multiple application servers and only one database (plus failover), so it would be nice if the solution can be used seamlessly from all application servers.
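
A common way to get the "no explicit notification" behaviour asked for above is to poll for rows whose last-modified value is newer than anything already indexed. The sketch below is a minimal illustration under stated assumptions: each searchable table is assumed to have a hypothetical `updated_at` (or version) column, the JDBC query is simulated with an in-memory list, and a plain map stands in for the Lucene index.

```java
import java.util.*;

// Hypothetical row shape: primary key, searchable text, and an assumed updated_at/version column.
record Row(long id, String text, long updatedAt) {}

class IncrementalIndexer {
    // Stand-in for a Lucene index: primary key -> indexed text.
    private final Map<Long, String> index = new HashMap<>();
    // Highest updated_at value already pushed into the index.
    private long highWaterMark = Long.MIN_VALUE;

    // Real code would run something like
    //   SELECT id, text, updated_at FROM docs WHERE updated_at > ? ORDER BY updated_at
    // over JDBC; here the "table" is an in-memory list.
    private List<Row> fetchChangedSince(List<Row> table, long mark) {
        List<Row> changed = new ArrayList<>();
        for (Row r : table) {
            if (r.updatedAt() > mark) changed.add(r);
        }
        changed.sort(Comparator.comparingLong(Row::updatedAt));
        return changed;
    }

    // One polling pass: upsert every changed row by primary key, then advance the mark.
    int poll(List<Row> table) {
        List<Row> changed = fetchChangedSince(table, highWaterMark);
        for (Row r : changed) {
            index.put(r.id(), r.text());
            highWaterMark = Math.max(highWaterMark, r.updatedAt());
        }
        return changed.size();
    }

    Map<Long, String> snapshot() { return Collections.unmodifiableMap(index); }
}
```

Two limitations are worth knowing: deleted rows never show up in such a query (a soft-delete flag or a change-log table is needed), and a transaction that commits late with an older `updated_at` value can be skipped, so real implementations often re-read a small overlap window.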

I am continuing to inspect the options now, but it would be really helpful to utilize other people's experiences.

5 answers

#1


3  

When you say "search against a DB", what do you mean?

Relational databases and information retrieval systems use very different approaches for good reason. What kind of data are you searching? What kind of queries do you perform?

If I were going to implement an inverted index on top of a database, as Compass does, I would not use Compass's approach, which is to implement Lucene's Directory abstraction with BLOBs. Rather, I'd implement Lucene's IndexReader abstraction.
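
For readers unfamiliar with the term, an inverted index maps each term to the set of record IDs that contain it (its postings), and a multi-term AND query is answered by intersecting those sets. The toy version below keeps the postings in memory; in the database-backed design the answer describes, the same term-to-postings mapping would presumably live in ordinary tables (say, a hypothetical `postings(term, doc_id)` table) and a custom `IndexReader` would expose them to Lucene. None of the classes here are Lucene APIs.

```java
import java.util.*;

class InvertedIndex {
    // term -> sorted doc IDs containing it. In a DB-backed variant this map would be
    // a hypothetical postings(term, doc_id) table rather than an in-memory structure.
    private final Map<String, TreeSet<Long>> postings = new HashMap<>();

    void add(long docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) continue; // a leading separator yields one empty string
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Conjunctive (AND) query: intersect the postings of every query term.
    SortedSet<Long> searchAll(String... terms) {
        TreeSet<Long> result = null;
        for (String term : terms) {
            TreeSet<Long> docs = postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```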

Relational databases are quite capable of maintaining indexes. The value that Lucene brings in this context is its analysis capabilities, which are most useful for unstructured text records. A good approach would leverage the strengths of each tool.
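
To make "analysis" concrete: before indexing, Lucene runs text through an analyzer that tokenizes and normalizes it (lowercasing, and optionally stop-word removal or stemming), and the same analyzer is applied to queries, so a search for `FAILOVER db2` can match `Configuring DB2 failover.` in a free-text column. A plain SQL `LIKE '%...%'` predicate does not do this normalization and usually forces a scan. The toy analyzer below is a stdlib-only approximation of that pipeline; its stop-word list is an arbitrary assumption, and it is not Lucene's `StandardAnalyzer`.

```java
import java.util.*;

class ToyAnalyzer {
    // Arbitrary stop-word list chosen for illustration (an assumption, not Lucene's default).
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of", "to", "is");

    // Split on anything that is not a letter or digit, lowercase, then drop stop words.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) continue; // a leading separator yields one empty string
            String token = raw.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(token)) tokens.add(token);
        }
        return tokens;
    }

    // Analyzing documents and queries the same way is what makes matching
    // insensitive to case and punctuation.
    static boolean matchesAll(String document, String query) {
        return new HashSet<>(analyze(document)).containsAll(analyze(query));
    }
}
```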

As updates are made to the index, Lucene creates more segments (additional files or BLOBs), which degrade performance until a costly "optimize" procedure is used. Most databases will amortize this cost over each index update, giving you more stable performance.
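
The segment behaviour described above can be pictured with a toy model: each flush of buffered updates writes a new immutable segment, a search has to consult every segment, and an "optimize" (renamed `forceMerge` in later Lucene versions) rewrites them all into one. This is a simplification for illustration only; the class is hypothetical, and real Lucene also merges segments in the background according to a merge policy, so segment counts usually do not grow without bound.

```java
import java.util.*;

class SegmentedIndex {
    // Each segment is an immutable term -> doc IDs map, loosely like a flushed Lucene segment.
    private final List<Map<String, Set<Long>>> segments = new ArrayList<>();
    // Postings buffered in memory until the next flush.
    private final Map<String, Set<Long>> buffer = new HashMap<>();

    void add(long docId, String term) {
        buffer.computeIfAbsent(term, t -> new HashSet<>()).add(docId);
    }

    // Every flush writes the buffer out as a NEW segment; existing segments are never modified.
    void flush() {
        if (buffer.isEmpty()) return;
        Map<String, Set<Long>> segment = new HashMap<>();
        buffer.forEach((term, ids) -> segment.put(term, Set.copyOf(ids)));
        segments.add(Collections.unmodifiableMap(segment));
        buffer.clear();
    }

    // A search has to visit every segment, so its cost grows with segmentCount().
    Set<Long> search(String term) {
        Set<Long> hits = new TreeSet<>();
        for (Map<String, Set<Long>> segment : segments) {
            hits.addAll(segment.getOrDefault(term, Set.of()));
        }
        return hits;
    }

    // "Optimize" (a full merge): copy every posting into a single segment, which is why it is costly.
    void optimize() {
        flush();
        Map<String, Set<Long>> merged = new HashMap<>();
        for (Map<String, Set<Long>> segment : segments) {
            segment.forEach((term, ids) -> merged.computeIfAbsent(term, t -> new HashSet<>()).addAll(ids));
        }
        segments.clear();
        if (!merged.isEmpty()) segments.add(Collections.unmodifiableMap(merged));
    }

    int segmentCount() { return segments.size(); }
}
```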

#2


2  

I have had good experiences with Compass. It integrates really well with Hibernate, and through its GPS devices it can mirror data changes made via Hibernate and JDBC directly to the Lucene indexes: http://www.compass-project.org/docs/1.2.2/reference/html/gps-jdbc.html.

Maintaining the Lucene indexes on all your application servers may be an issue. If you have multiple application servers updating the DB, you may hit problems keeping each index in sync with all the changes. Compass may have an alternate mechanism for handling this now.
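
One generic way to keep several per-server indexes consistent is to have every write also append a row to a change-log table with a monotonically increasing sequence number; each application server then replays the log from its own last-applied sequence number, which also captures deletes. The sketch below simulates the log in memory; the `change_log` layout and the class names are hypothetical, and the approach is not taken from Compass or Alfresco.

```java
import java.util.*;

// One entry of a hypothetical change_log table: sequence number, delete flag, key, new text.
record Change(long seq, boolean delete, long docId, String text) {}

class ReplicaIndex {
    private final Map<Long, String> docs = new HashMap<>(); // stand-in for this server's Lucene index
    private long lastApplied = 0;                           // this server's own cursor into the log

    // Apply every entry this replica has not seen yet. The list is assumed to be
    // ordered by seq, as a "SELECT ... WHERE seq > ? ORDER BY seq" query would return it.
    int catchUp(List<Change> log) {
        int applied = 0;
        for (Change c : log) {
            if (c.seq() <= lastApplied) continue; // already applied on an earlier pass
            if (c.delete()) docs.remove(c.docId());
            else docs.put(c.docId(), c.text());
            lastApplied = c.seq();
            applied++;
        }
        return applied;
    }

    Optional<String> get(long docId) { return Optional.ofNullable(docs.get(docId)); }
}
```

As with timestamp polling, a sequence value allocated by a transaction that commits late can be skipped by a cursor that has already moved past it, so a production version would need to handle such gaps.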

The Alfresco project (a CMS) also uses Lucene and has a mechanism for replicating Lucene index changes between servers that may be useful in handling these issues.

We started using Compass before Hibernate Search was really off the ground so I cannot offer any comparison with it.

#3


1  

LuSql (http://code.google.com/p/lusql/) lets you load the contents of a JDBC-accessible database into Lucene, making it searchable. It is highly optimized and multi-threaded. I am the author of LuSql and will be releasing a new version (re-architected around a new pluggable architecture) next month.
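
LuSql's internals are not described here, so the following is only a generic sketch of multi-threaded bulk loading: rows (standing in for a JDBC `ResultSet`) are split into batches, each batch is tokenized on a worker thread from an `ExecutorService`, and the partial term-to-IDs maps are merged on the calling thread. The class and method names are hypothetical, and a map stands in for the Lucene index.

```java
import java.util.*;
import java.util.concurrent.*;

class ParallelBulkLoader {
    // Tokenize one batch of (id, text) rows into a partial term -> ids map (runs on a worker thread).
    private static Map<String, Set<Long>> indexBatch(List<Map.Entry<Long, String>> batch) {
        Map<String, Set<Long>> partial = new HashMap<>();
        for (Map.Entry<Long, String> row : batch) {
            for (String term : row.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) partial.computeIfAbsent(term, t -> new TreeSet<>()).add(row.getKey());
            }
        }
        return partial;
    }

    static Map<String, Set<Long>> load(List<Map.Entry<Long, String>> rows, int batchSize, int threads) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Set<Long>>>> futures = new ArrayList<>();
            for (int i = 0; i < rows.size(); i += batchSize) {
                List<Map.Entry<Long, String>> batch = rows.subList(i, Math.min(i + batchSize, rows.size()));
                futures.add(pool.submit(() -> indexBatch(batch)));
            }
            // Merge on the calling thread, so workers never share mutable state.
            Map<String, Set<Long>> merged = new HashMap<>();
            for (Future<Map<String, Set<Long>>> f : futures) {
                Map<String, Set<Long>> partial;
                try {
                    partial = f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while loading", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("batch failed", e.getCause());
                }
                partial.forEach((term, ids) -> merged.computeIfAbsent(term, t -> new TreeSet<>()).addAll(ids));
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```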

#4


0  

For a pure search-performance boost, Lucene will certainly help a lot. Only index what you actually care about and need, and you should be fine. You could use Hibernate or some other layer if you like, but I don't think it is required.
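
A sketch of "only index what you need": build each search document from just the free-text columns users actually search, plus the primary key, so a hit can be mapped back to the full DB2 row with an ordinary keyed lookup. The column names (`TITLE`, `DESCRIPTION`, and the excluded `INTERNAL_NOTES`, `PRICE`) are hypothetical, and a plain map stands in for a Lucene `Document`.

```java
import java.util.*;

class SearchDocBuilder {
    // Hypothetical columns chosen for full-text search; every other column stays only in the DB.
    private static final List<String> SEARCHABLE = List.of("TITLE", "DESCRIPTION");

    // Build a minimal search document from a DB row given as column name -> value.
    static Map<String, String> toSearchDoc(String primaryKey, Map<String, String> row) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", primaryKey); // kept so a search hit can be fetched from the DB by primary key
        for (String column : SEARCHABLE) {
            String value = row.get(column);
            if (value != null && !value.isBlank()) doc.put(column.toLowerCase(Locale.ROOT), value);
        }
        return doc;
    }
}
```

Keeping the index this small means the database remains the source of truth for everything else, and the index only has to be refreshed when a searchable column changes.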

#5


0  

Well, it seems DBSight doesn't meet the FOSS requirement, so unless it is an absolutely stellar solution, it is not an option for me right now...
