How do I index and search for numbers using Lucene.NET?

Time: 2022-10-10 03:05:43

I've implemented full-text search for a web site using Lucene.NET (version 2.0). Indexing and searching work well, but I have one problem: if I use numbers (phone numbers, product numbers, etc.) as search terms, I don't get any matching documents.

I'm using the Lucene.Net.Analysis.SimpleAnalyzer class. I guess I have to change the Analyzer and/or the Tokenizer.

Any advice?

Thank you!

1 Solution

#1


13  

When you build a Lucene Document, you can choose different indexing options for each field. For fields you don't want tokenized, select the Field.Index.UN_TOKENIZED option. The value is then indexed as a single term without passing through the analyzer, which keeps your phone numbers and product numbers intact.
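
To illustrate the difference, here is a minimal Python sketch of what ends up in the index for a tokenized versus an untokenized field. It is not Lucene.NET code: `field_terms` is a hypothetical helper, and the letter-only regex is only a rough stand-in for what a letter-based analyzer does.

```python
import re

def field_terms(value, tokenized):
    """Return the terms a toy index would hold for one field value."""
    if tokenized:
        # Simplified letter-only analysis (an assumption, not Lucene's code):
        # digits and punctuation are discarded, letters are lowercased.
        return [t.lower() for t in re.findall(r"[^\W\d_]+", value)]
    # Untokenized: the whole value is kept as one single term.
    return [value]

phone = "555-1234"
print(field_terms(phone, tokenized=True))   # []
print(field_terms(phone, tokenized=False))  # ['555-1234']
```

Note the trade-off: an untokenized field only matches a query for the exact, complete value, so a search for "555" alone would not find "555-1234".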

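I would also advise using the StandardAnalyzer, as it doesn't strip numbers out like SimpleAnalyzer does. SimpleAnalyzer splits text at every non-letter character, so digits never make it into the index.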
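
The effect can be sketched in a few lines of Python. This is not Lucene's implementation: `letter_tokens` only mimics splitting at non-letters, and `alnum_tokens` is a simplified, hypothetical stand-in for an analyzer that keeps digits.

```python
import re

def letter_tokens(text):
    """Mimic a letter-only tokenizer: split at every non-letter, lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]

def alnum_tokens(text):
    """Simplified stand-in that keeps runs of letters and digits together."""
    return [t.lower() for t in re.findall(r"[^\W_]+", text)]

sample = "Call 5551234 about product AB42"
print(letter_tokens(sample))  # ['call', 'about', 'product', 'ab']
print(alnum_tokens(sample))   # ['call', '5551234', 'about', 'product', 'ab42']
```

With the letter-only version, the phone number disappears entirely and the product number is cut down to "ab", which is why a search for either one finds nothing.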

It is also important to use the same analyzer for both indexing and searching; otherwise the query terms may not match the terms stored in the index.
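
A small Python sketch of why a mismatch hurts, under the assumption that an index is just a map from terms to document ids. The names `build_index` and `search` and both tokenizers are hypothetical illustrations, not Lucene.NET APIs.

```python
import re

def letter_tokens(text):
    """Letter-only tokenization: digits are dropped."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]

def alnum_tokens(text):
    """Tokenization that keeps letters and digits together."""
    return [t.lower() for t in re.findall(r"[^\W_]+", text)]

def build_index(docs, tokenize):
    """Map each token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index.setdefault(token, set()).add(doc_id)
    return index

def search(index, query, tokenize):
    """Return ids of documents containing every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = None
    for token in tokens:
        hits = index.get(token, set())
        result = hits if result is None else result & hits
    return result

docs = {1: "Order line 5551234", 2: "Product AB42 manual"}
index = build_index(docs, alnum_tokens)

print(search(index, "AB42", alnum_tokens))   # {2}
print(search(index, "AB42", letter_tokens))  # set()
```

The second search fails even though the document was indexed correctly, because the query is reduced to "ab" while the index holds "ab42".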
