How to index a String in Lucene?

Time: 2021-04-17 03:06:33

I'm using Lucene to index strings that I read from a document. I'm not using the Reader class, since I need to index the strings into different fields.


document.add(new Field("FIELD1","string1", Field.Store.YES, Field.Index.UNTOKENIZED));
document.add(new Field("FIELD2","string2", Field.Store.YES, Field.Index.UNTOKENIZED));

This works for building the index, but searching with


QueryParser queryParser = new QueryParser("FIELD1", new StandardAnalyzer());
Query query = queryParser.parse(searchString);
Hits hits = indexSearcher.search(query);
System.out.println("Number of hits: " + hits.length());

doesn't return any results.


But when I index a sentence like this:


document.add(new Field("FIELD1","This is sentence to be indexed", Field.Store.YES, Field.Index.TOKENIZED));

searching works fine.


Thanks.


1 answer

#1



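You need to set the fields that hold the words to Field.Index.TOKENIZED as well. QueryParser runs your search string through StandardAnalyzer (lowercasing it, splitting it on whitespace and punctuation, and dropping stop words) and then looks up the resulting terms. A TOKENIZED field is analyzed at index time too (assuming the IndexWriter was created with the same StandardAnalyzer), so a value like "string1" is indexed as the term "string1" and the analyzed query term can match it. An UNTOKENIZED field is still indexed, but as a single term exactly as given, without analysis, so it only matches when the analyzed query term happens to be identical to the original value.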


Use this:


document.add(new Field("FIELD1","string1", Field.Store.YES, Field.Index.TOKENIZED));
document.add(new Field("FIELD2","string2", Field.Store.YES, Field.Index.TOKENIZED));
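The mismatch can be sketched with a toy in-memory inverted index. This is not Lucene's API: ToyIndex and its methods are hypothetical, and analyze() is a simplified stand-in for StandardAnalyzer that only lowercases and splits on non-alphanumeric characters. What it illustrates is that a lookup succeeds only when the processed query term and the indexed term are exactly identical.

```java
import java.util.*;

/**
 * Toy inverted index (NOT Lucene's API) showing why a field indexed without
 * analysis can miss a query that is analyzed before the lookup.
 * analyze() is a hypothetical stand-in for StandardAnalyzer: it only
 * lowercases and splits on anything that is not a letter or digit.
 */
final class ToyIndex {
    // term -> ids of documents containing that term (a single field)
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Hypothetical analyzer: lowercase, then split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Like UNTOKENIZED: the whole value becomes one term, exactly as given.
    void addUntokenized(int docId, String value) {
        postings.computeIfAbsent(value, k -> new HashSet<>()).add(docId);
    }

    // Like TOKENIZED: the value goes through the same analyzer as the query.
    void addTokenized(int docId, String value) {
        for (String term : analyze(value)) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
        }
    }

    // Like QueryParser with its default OR operator: analyze, then union postings.
    Set<Integer> search(String queryString) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyze(queryString)) {
            hits.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex untokenized = new ToyIndex();
        untokenized.addUntokenized(1, "String1");

        ToyIndex tokenized = new ToyIndex();
        tokenized.addTokenized(1, "String1");

        // The query is analyzed to "string1", which differs from the raw term "String1".
        System.out.println("untokenized hits: " + untokenized.search("String1"));
        System.out.println("tokenized hits: " + tokenized.search("String1"));
    }
}
```

Running main prints "untokenized hits: []" and then "tokenized hits: [1]": the raw term "String1" cannot match the lowercased query term "string1", while the analyzed field stores "string1" and matches.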

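When you want to index a string containing multiple words, e.g. "two words", as one searchable term instead of tokenizing it into two words, either use KeywordAnalyzer during indexing, which emits the whole string as a single token, or, in newer versions of Lucene, use the StringField class. The query side must not split the string either: for example, search with new TermQuery(new Term("FIELD1", "two words")) instead of passing the string through QueryParser with StandardAnalyzer.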
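A toy in-memory index can illustrate the multi-word case. This is not Lucene's API: KeywordToy and its methods are hypothetical, keywordAnalyze() only imitates KeywordAnalyzer's behavior of emitting the whole input as one token, and splitAnalyze() stands in for a word-splitting analyzer applied to the query. Indexing "two words" as one term only pays off if the lookup uses that exact term.

```java
import java.util.*;

/**
 * Toy model (NOT Lucene's API) of indexing "two words" as a single term.
 * keywordAnalyze() imitates KeywordAnalyzer (the whole input is one token);
 * splitAnalyze() is a hypothetical word-splitting analyzer for the query side.
 */
final class KeywordToy {
    // term -> ids of documents containing that term (a single field)
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // The whole input becomes one token, as with KeywordAnalyzer or a StringField.
    static List<String> keywordAnalyze(String text) {
        return Collections.singletonList(text);
    }

    // Hypothetical word splitter: lowercase, then split on whitespace.
    static List<String> splitAnalyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Index the value as a single keyword term.
    void add(int docId, String value) {
        for (String term : keywordAnalyze(value)) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
        }
    }

    // Like a TermQuery: look up the exact term, with no analysis.
    Set<Integer> termQuery(String term) {
        return new TreeSet<>(postings.getOrDefault(term, Collections.emptySet()));
    }

    // Like parsing the query with a word-splitting analyzer (OR of the pieces).
    Set<Integer> splitQuery(String queryString) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : splitAnalyze(queryString)) {
            hits.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    public static void main(String[] args) {
        KeywordToy index = new KeywordToy();
        index.add(1, "two words");

        // "two" and "words" were never indexed on their own, so this misses.
        System.out.println("split query hits: " + index.splitQuery("two words"));
        // The exact single term "two words" is in the index, so this hits.
        System.out.println("term query hits: " + index.termQuery("two words"));
    }
}
```

Running main prints "split query hits: []" and then "term query hits: [1]".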

