Preserving acronym dots when indexing in Lucene

Date: 2022-01-09 03:08:46

If I want Lucene to preserve the dots in acronyms (for example: U.K., U.S.A., etc.), which analyzer do I need to use, and how? I also want to supply a set of stop words to Lucene while doing this.

2 Answers

#1


A WhitespaceAnalyzer will preserve the dots. A StopFilter removes a list of stop words. You should define exactly the analysis you need, and then combine analyzers and token filters to achieve it, or write your own analyzer.
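A minimal sketch of such a custom analyzer, assuming a reasonably recent Lucene release (7.x or later; class packages move between major versions) and a placeholder stop-word list: it chains a WhitespaceTokenizer, which splits only on whitespace and therefore keeps "U.S.A." as a single token, with a LowerCaseFilter and a StopFilter.

```java
import java.util.Arrays;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;

public class WhitespaceStopAnalyzer extends Analyzer {

    // Example stop-word set; substitute your own list.
    private static final CharArraySet STOP_WORDS =
            new CharArraySet(Arrays.asList("a", "an", "the", "of"), true);

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Splits on whitespace only, so "U.S.A." survives as one token, dots included.
        Tokenizer source = new WhitespaceTokenizer();
        TokenStream result = new LowerCaseFilter(source);   // "U.S.A." -> "u.s.a."
        result = new StopFilter(result, STOP_WORDS);
        return new TokenStreamComponents(source, result);
    }
}
```

Note that whitespace tokenization also keeps trailing punctuation attached ("U.K.," stays one token), so you may want an extra filter if that matters for your data.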

#2


StandardTokenizer preserves dots occurring between letters. You can use StandardAnalyzer, which uses StandardTokenizer, or you could create your own analyzer with StandardTokenizer.

Correction: StandardAnalyzer will not help, as it uses StandardFilter, which removes the dots from acronyms. You can construct your own analyzer with StandardTokenizer plus additional filters (such as LowerCaseFilter), minus the StandardFilter.
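A sketch of that corrected suggestion, again assuming a recent Lucene (7.x or later) and a placeholder stop-word list: StandardTokenizer followed by LowerCaseFilter and StopFilter, deliberately omitting the classic StandardFilter step that stripped acronym dots in older releases.

```java
import java.util.Arrays;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class AcronymPreservingAnalyzer extends Analyzer {

    // Example stop-word set; substitute your own list.
    private static final CharArraySet STOP_WORDS =
            new CharArraySet(Arrays.asList("a", "an", "the"), true);

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // StandardTokenizer keeps dots that occur between letters;
        // the key point is NOT to add the (classic) StandardFilter that used to strip them.
        Tokenizer source = new StandardTokenizer();
        TokenStream result = new LowerCaseFilter(source);
        result = new StopFilter(result, STOP_WORDS);
        return new TokenStreamComponents(source, result);
    }
}
```

One caveat: the UAX#29-based StandardTokenizer keeps interior dots ("U.S.A" stays one token) but may drop a trailing dot, so verify the exact token output on your own data.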
