What are the main differences between search engines (DtSearch, Lucene.net, Sphinx, Google, etc.) that should influence the decision as to which to use to search proprietary data?
The data to be searched consists of presentation-free data that is marked up with metadata in the form of name/value pairs. We're not interested in the format-parsing abilities of the various tools. Also, the search results need to be well-structured, presentation-free data that is amenable to aggregating with search results from other (similarly structured) repositories.
Some relevant search engine characteristics that need to inform the decision are listed below. Further suggestions or descriptions of experiences are welcome.
• Cost
• Ease of use
• Can be configured to return specific tags only
• Can 'identify' specific terms and give results containing them a higher weighting
• Fast: < 0.3 seconds to return search results or %E6 records/documents
• Supports tags with types (find weather='sunny' but not personality='sunny')
• Supports weightings to give relevancy ranking
• Returns results in ranked order by relevancy
• Supports synonyms
• Supports stemming
• Supports stop words
• Supports spelling corrections
• Amenable to parallelisation of index building (if index based)
• Fast to reindex (if index based)
• Fast to update index (if index based)
• Combines results from multiple indexes (if index based)
• Proximity checks: gives higher relevance to words found close together
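To make the "tags with types" and "weightings" bullets concrete, here is a minimal in-memory sketch in Python. It is not how any of the engines above work internally; the record ids, tag names, and weights are all made up for illustration. It only shows the behaviour being asked for: a match on weather='sunny' must not pick up personality='sunny', and matches on more heavily weighted tags rank higher.

```python
from collections import defaultdict

# Hypothetical records: each one is a set of name/value metadata pairs.
RECORDS = {
    "r1": {"weather": "sunny", "location": "coast"},
    "r2": {"personality": "sunny", "name": "Alex"},
    "r3": {"weather": "sunny", "personality": "sunny"},
}

# Hypothetical per-tag weights used for relevancy ranking.
TAG_WEIGHTS = {"weather": 2.0, "personality": 1.0}


def build_index(records):
    """Map each (tag, value) pair to the ids of the records containing it."""
    index = defaultdict(set)
    for rec_id, tags in records.items():
        for tag, value in tags.items():
            index[(tag, value.lower())].add(rec_id)
    return index


def search(index, criteria, weights):
    """Score records by summing the weight of every matching (tag, value)
    criterion, then return (id, score) pairs, best score first."""
    scores = defaultdict(float)
    for tag, value in criteria:
        for rec_id in index.get((tag, value.lower()), ()):
            scores[rec_id] += weights.get(tag, 1.0)
    # Sort by descending score, then by id so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


INDEX = build_index(RECORDS)
```

Searching with `[("weather", "sunny")]` returns r1 and r3 but not r2, because the index key includes the tag name as well as the value. A real engine would add stemming, synonyms, and proximity scoring on top of this kind of fielded lookup.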
2 Answers
#1
2
I like Solr with the DataImportHandler. It supports most of your bullet points, and is not too difficult to set up, as long as you don't mind editing some XML configuration files. It's easier to set up than many enterprise-class search engines.
There is nothing wrong with GSA (Google Search Appliance), but for the amount of control that you desire, Solr is a better option.
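As a rough illustration of the kind of control Solr gives you, the sketch below builds a query URL using a few standard Solr request parameters: `fl` to return only specific fields, `fq` to filter on a typed tag, and `qf` (with the eDisMax parser) to weight fields for relevancy. The base URL, core name, and field names are assumptions made up for this example, and the helper `build_solr_query` is hypothetical, not part of any Solr client library.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, text, tag_filter, fields, field_weights):
    """Build a Solr /select URL (hypothetical helper, for illustration only).

    text          -- free-text query string
    tag_filter    -- (field, value) pair applied as a filter query
    fields        -- list of field names to return
    field_weights -- dict of field name -> boost used for ranking
    """
    params = {
        "q": text,
        "defType": "edismax",  # parser that understands qf boosts
        "qf": " ".join(f"{name}^{boost}" for name, boost in field_weights.items()),
        "fq": f'{tag_filter[0]}:"{tag_filter[1]}"',  # e.g. weather:"sunny"
        "fl": ",".join(fields),  # return only these fields
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Assumed local Solr instance and core name; adjust to your own setup.
URL = build_solr_query(
    "http://localhost:8983/solr/mycore",
    "beach holiday",
    ("weather", "sunny"),
    ["id", "title", "score"],
    {"title": 2.0, "body": 1.0},
)
```

The filter on `weather` does not match documents whose `personality` field is "sunny", which covers the typed-tag requirement. The response comes back as structured JSON, which should be straightforward to aggregate with results from other repositories.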
#2
1
In relation to relevancy, the Google Search Appliance allows a little tweaking. Google believes that allowing too much tweaking will give poor relevancy, and I do believe that Google knows relevancy.
It is unlikely that users will find a search engine other than Google easier to use.