I want to implement search functionality for a website (assume it is similar to SO). I don't want to use Google search or anything like that.
My question is: how do I implement this?
There are two methods I am aware of:
- Search all the databases in the application when the user submits a query.
- Index all the data I have, store it somewhere else, and query from there (like what Google does).
Can anyone tell me which way to go? What are the pros and cons?
Better yet, are there any better ways to do this?
7 answers
#1
34
Use Lucene:
http://lucene.apache.org/java/docs/
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
It is available in Java and .NET. It is also available in PHP in the form of a Zend Framework module.
Lucene does what you want (indexing the searchable items). You have to maintain a Lucene index, but it performs much better than a database search. BTW, SO search is powered by Lucene. :D
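To make the difference between the two approaches concrete, here is a minimal sketch in plain Python. It is not Lucene's API; the `POSTS` data and every function name are made up for illustration. `scan_search` looks at every document on each query, while `index_search` consults a prebuilt word-to-document map, which is the basic idea behind an inverted index.

```python
import re
from collections import defaultdict

# Hypothetical sample data standing in for rows from the site's database.
POSTS = {
    1: "How do I implement search for a website",
    2: "Lucene keeps an index of the searched items",
    3: "Database search scans every row for the query",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def scan_search(posts, query):
    """Approach 1: examine every document at query time."""
    terms = set(tokenize(query))
    if not terms:
        return []
    return sorted(pid for pid, body in posts.items()
                  if terms <= set(tokenize(body)))


def build_index(posts):
    """Approach 2, step 1: map each word to the ids of the posts containing it."""
    index = defaultdict(set)
    for pid, body in posts.items():
        for word in tokenize(body):
            index[word].add(pid)
    return index


def index_search(index, query):
    """Approach 2, step 2: intersect the id sets of the query terms."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


INDEX = build_index(POSTS)
```

Both functions return the same ids for the same query; the index just has to be rebuilt or updated whenever a post changes, which is the bookkeeping mentioned above. A real engine adds stemming, ranking, and on-disk storage on top of this.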
#2
33
It depends on how comprehensive your web site is and how much you want to do yourself.
If you are running a small website with no way to add a custom search, let Google do the work (maybe add a sitemap) and use Google Custom Search.
If you run a medium-sized site with an SQL engine, use the search features of your SQL engine.
If you run a heavier software stack like J2EE or .NET, use Lucene, a great, powerful search engine, or its .NET port, Lucene.Net.
If you want to abstract your search from your application and be able to query it in a language-neutral way with XML/HTTP and JSON APIs, have a look at Solr. Solr runs Lucene in the background but adds a nice web interface to it.
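As one illustration of the "search features of your SQL engine" option, here is a rough sketch using SQLite's FTS5 extension through Python's built-in `sqlite3` module. It assumes your SQLite build includes FTS5; the `posts` table, its columns, and the sample rows are invented for the example. Other engines (MySQL, PostgreSQL, SQL Server) have their own full-text syntax.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a full-text-indexed table."""
    conn = sqlite3.connect(":memory:")
    # FTS5 virtual table: SQLite maintains the full-text index itself.
    conn.execute("CREATE VIRTUAL TABLE posts USING fts5(title, body)")
    conn.executemany(
        "INSERT INTO posts (title, body) VALUES (?, ?)",
        [
            ("Implementing site search", "Index the data and query the index."),
            ("Using Lucene", "Lucene is a text search engine library."),
            ("Sitemaps", "A sitemap helps external crawlers find pages."),
        ],
    )
    return conn


def search(conn, query):
    """Return titles of matching posts, best match first."""
    rows = conn.execute(
        "SELECT title FROM posts WHERE posts MATCH ? ORDER BY rank",
        (query,),
    )
    return [title for (title,) in rows]
```

The database keeps the index in step with inserts and updates, so there is no separate index to maintain, at the cost of fewer tuning options than a dedicated engine.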
#3
4
You might want to have a look at Xapian and the Omega front end. It's essentially a toolkit on which you can build search functionality.
#4
1
The best way to approach this will depend on how you construct your pages.
If they're frequently composed from a lot of different records (as I imagine Stack Overflow pages are), the indexing approach is likely to give better results unless you put a lot of work into effectively reconstructing the pages on the database side.
The disadvantage of the indexing approach is the turnaround time before new or changed content shows up in the index. There are workarounds (like Google's sitemap stuff), but they're also complex to get right.
If you go the database route, also be aware that modern search engine systems function much better if they have link data to process, so finding a system that can understand links between 'pages' in the database will have a positive effect.
#5
1
If you are on a Microsoft platform, you could use the Indexing Service. This integrates very easily with IIS websites.
It has all the basic features, like full-text search, ranking, and excluding or including certain file types, and you can add your own meta information via meta tags in the HTML pages.
Do a Google search and you'll find tons of information!
#6
0
This is somewhat orthogonal to your question, but I highly recommend the idea of a RESTful search. That is, to perform a search that has never been performed, the website POSTs a query to /searches/. To re-run a search, the website GETs /searches/{some id}.
There are some good documents to be found regarding this.
(That said, I like indexing where possible, though it is an optimization, and thus can be premature.)
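A rough sketch of that idea, with no web framework involved: the `SearchStore` class and its method names are hypothetical, and the substring match stands in for whatever search backend you actually use. `post_search` plays the role of POST /searches/ and `get_search` the role of GET /searches/{some id}.

```python
import itertools


class SearchStore:
    """In-memory stand-in for the /searches/ resource collection."""

    def __init__(self, documents):
        self._documents = documents
        self._queries = {}
        self._ids = itertools.count(1)

    def _run(self, query):
        # Placeholder matching: case-insensitive substring search.
        q = query.lower()
        return [doc for doc in self._documents if q in doc.lower()]

    def post_search(self, query):
        """POST /searches/ : store the query, return (status, location)."""
        search_id = next(self._ids)
        self._queries[search_id] = query
        return 201, f"/searches/{search_id}"

    def get_search(self, search_id):
        """GET /searches/{id} : re-run the stored query against current data."""
        if search_id not in self._queries:
            return 404, None
        query = self._queries[search_id]
        return 200, {"query": query, "results": self._run(query)}
```

For example, `SearchStore(docs).post_search("lucene")` returns `(201, "/searches/1")`, and that location can then be fetched, bookmarked, or cached like any other resource.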
#7
-1
If your application uses the Java EE stack and you are using Hibernate, you can use the Compass Framework to maintain a searchable index of your database. The Compass Framework uses Lucene under the hood.
The only catch is that you cannot replicate your search index, so you need to use a clustered database to hold the index tables or use the newer grid-based index storage mechanisms that have been added in Compass Framework 2.x.