I am having trouble with phrase queries, so I wrote a small program to understand exactly how a phrase query works with slop:
I have a string, "abc institute of technology", and I indexed different combinations of it (much like shingles), like this:
Document doc = new Document();
ArrayList<String> sh = new ArrayList<String>();
sh.add("abc institute engineering technology");
sh.add("abc institute engineering");
sh.add("abc institute");
sh.add("abc");
sh.add("institute engineering technology");
sh.add("institute engineering");
sh.add("institute");
sh.add("engineering technology");
sh.add("engineering");
sh.add("technology");
// Add each shingle as a separate value of the same field.
for (String s : sh) {
    doc.add(new Field("insti_shingles", s.toLowerCase(), Field.Store.YES,
            Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
}
writer.addDocument(doc);
Now, when I read all the tokens from the index directory, I get this set of tokens:
engineering technology
abc
institute
abc institute engineering technology
technology
abc institute
abc institute engineering
institute engineering technology
engineering
institute engineering
Now I search for the term "abc institute technology":
IndexSearcher searcher = new IndexSearcher(dir);
BooleanQuery booleanQuery = new BooleanQuery();
PhraseQuery query = new PhraseQuery();
query.add(new Term("insti_shingles", "abc institute technology"));
query.setSlop(4);
booleanQuery.add(query, BooleanClause.Occur.SHOULD);
TopDocs hits = searcher.search(booleanQuery, 30);
According to the documentation for phrase queries with slop, I should get some results, but I get an empty result set. I do get a result when I search for a term that is exactly the same as an indexed token.
I think the term "abc institute technology" should be matched by the token "abc institute engineering technology" when a phrase query is used. Is that right?
Am I doing anything wrong? Please help.
1 Solution
#1
You don't need a special tokenizer to use phrase queries with slop. In fact, it causes these queries to fail, as you have noticed: a phrase query compares the positions of individual terms, and here each whole shingle has been indexed as a single term, so there are no word positions for the slop to work with.
Just tokenize using a StandardAnalyzer; there is no need for the custom shingle logic.
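To make the point concrete, here is a minimal sketch in plain Java of how slop behaves once the text is split into word-level tokens. It is a simplified model and not Lucene's actual implementation: the class `SlopSketch` and its methods are hypothetical names introduced only for illustration, the whitespace tokenizer is a rough stand-in for an analyzer, and repeated query terms get no special handling. The same idea applies on the query side in Lucene: add one `Term` per word to the `PhraseQuery` instead of a single `Term` holding the whole phrase.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
class SlopSketch {

    // Lowercase and split on whitespace: a rough stand-in for a word-level analyzer.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // Returns true if every query term occurs in the document and the terms
    // can be aligned so that their relative displacement is at most `slop`.
    static boolean sloppyMatch(List<String> docTokens, List<String> queryTerms, int slop) {
        if (queryTerms.isEmpty()) {
            return false;
        }
        return search(docTokens, queryTerms, 0, Integer.MAX_VALUE, Integer.MIN_VALUE, slop);
    }

    // Tries every occurrence of each query term. For a term at query index i
    // found at document position pos, the shift is (pos - i). An exact phrase
    // has the same shift for all terms; slop bounds the spread of the shifts.
    private static boolean search(List<String> doc, List<String> query,
                                  int i, int minShift, int maxShift, int slop) {
        if (i == query.size()) {
            return maxShift - minShift <= slop;
        }
        for (int pos = 0; pos < doc.size(); pos++) {
            if (doc.get(pos).equals(query.get(i))) {
                int shift = pos - i;
                if (search(doc, query, i + 1,
                        Math.min(minShift, shift), Math.max(maxShift, shift), slop)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

In this model, `sloppyMatch(tokenize("abc institute engineering technology"), tokenize("abc institute technology"), 1)` succeeds, because "technology" only has to move one position. If the document and the query are each passed as a one-element list holding the whole string, as in the question, the two strings are simply unequal terms and no slop value can make them match.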