Problem using a phrase query with slop in Lucene

Time: 2020-12-02 03:06:16

I am having some issues with phrase queries, so I wrote a small program to find out exactly how a phrase query works with slop:

I have the string "abc institute engineering technology", and I indexed different combinations of it (similar to shingles) like this:

import java.util.ArrayList;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
// Lucene 3.x-era API; `writer` is an IndexWriter opened elsewhere
Document doc = new Document();
ArrayList<String> sh = new ArrayList<String>();
sh.add("abc institute engineering technology");
sh.add("abc institute engineering");
sh.add("abc institute");
sh.add("abc");
sh.add("institute engineering technology");
sh.add("institute engineering");
sh.add("institute");
sh.add("engineering technology");
sh.add("engineering");
sh.add("technology");
// Add every combination as another value of the same field
for (String s : sh) {
    doc.add(new Field("insti_shingles", s.toLowerCase(), Field.Store.YES,
            Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
}
writer.addDocument(doc);
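The ten strings above are exactly the contiguous word sequences (word n-grams of every length) of the four-word string. As a minimal sketch, assuming plain whitespace splitting, the same list can be generated in the same order instead of being typed by hand; `Shingles.allShingles` is a hypothetical helper written for illustration, not a Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Shingles {

    // Hypothetical helper: returns every contiguous run of words,
    // longest first for each start word (same order as the list above).
    static List<String> allShingles(String text) {
        String[] words = text.toLowerCase().trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int start = 0; start < words.length; start++) {
            for (int end = words.length; end > start; end--) {
                out.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints the ten combinations, one per line
        for (String s : allShingles("abc institute engineering technology")) {
            System.out.println(s);
        }
    }
}
```

A string of n words yields n(n+1)/2 such sequences, which is why four words produce ten values.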

When I read all the tokens back from the index directory, I get this set of tokens:

engineering technology
abc
institute
abc institute engineering technology
technology
abc institute
abc institute engineering
institute engineering technology
engineering
institute engineering

Now, when I search for the phrase "abc institute technology" like this:

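import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
// Lucene 3.x-era API; `dir` is the Directory the document was written to
IndexSearcher searcher = new IndexSearcher(dir);
BooleanQuery booleanQuery = new BooleanQuery();
PhraseQuery query = new PhraseQuery();
// The whole phrase is added as a single Term
query.add(new Term("insti_shingles", "abc institute technology"));
query.setSlop(4); // allow up to 4 position moves
booleanQuery.add(query, BooleanClause.Occur.SHOULD);
TopDocs hits = searcher.search(booleanQuery, 30); // at most 30 hits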

According to the PhraseQuery documentation on slop, I should get some results, but I get an empty result set. I do get a result when I search for a term that is exactly the same as an indexed token.

I think the term "abc institute technology" should be matched by the token "abc institute engineering technology" when using a phrase query, shouldn't it?

Am I doing something wrong? Any help is appreciated.

1 solution

#1

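You don't need a special tokenizer to use phrase queries with slop; in fact, as you have noticed, it makes these queries fail. Slop is the number of position moves allowed between the separate terms of a phrase. Your shingle field stores each combination, such as "abc institute engineering technology", as one single term, and your query adds "abc institute technology" as one single term too. A phrase query with only one term can match only that exact term, so the slop never comes into play.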

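Just tokenize using a StandardAnalyzer, which splits the text into one term per word; there is no need for the custom shingle approach. Then add each word of the query to the PhraseQuery as its own Term.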
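To make the difference concrete, here is a small plain-Java model of sloppy phrase matching. It is not Lucene code: `words`, `index`, `minMatchLength` and `matches` are hypothetical helpers, and `words` (lowercase, split on whitespace) is only a rough stand-in for StandardAnalyzer. The sketch assumes a simplified rule: a query word at query position i found at document position p has offset p - i, and the phrase matches when the spread between the largest and smallest offset is at most the slop. Lucene's real sloppy phrase scoring also handles repeated terms and computes scores, which this model ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SloppyPhraseSketch {

    // Rough stand-in for a word tokenizer: lowercase, split on whitespace.
    static List<String> words(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // Gives tokens consecutive positions 0, 1, 2, ... and records
    // term -> positions (a tiny postings list).
    static Map<String, List<Integer>> index(List<String> tokens) {
        Map<String, List<Integer>> postings = new HashMap<>();
        for (int pos = 0; pos < tokens.size(); pos++) {
            postings.computeIfAbsent(tokens.get(pos), t -> new ArrayList<>()).add(pos);
        }
        return postings;
    }

    // Smallest spread (max offset - min offset) over all position choices,
    // where query term i at document position p has offset p - i.
    // Returns -1 if some query term does not occur at all.
    static int minMatchLength(List<String> phrase, Map<String, List<Integer>> postings) {
        if (phrase.isEmpty()) return -1;
        return search(phrase, postings, 0, Integer.MAX_VALUE, Integer.MIN_VALUE);
    }

    private static int search(List<String> phrase, Map<String, List<Integer>> postings,
                              int i, int minOff, int maxOff) {
        if (i == phrase.size()) return maxOff - minOff;
        List<Integer> positions = postings.get(phrase.get(i));
        if (positions == null) return -1; // term missing: no match possible
        int best = -1;
        for (int p : positions) {
            int off = p - i;
            int len = search(phrase, postings, i + 1,
                             Math.min(minOff, off), Math.max(maxOff, off));
            if (len >= 0 && (best < 0 || len < best)) best = len;
        }
        return best;
    }

    // The phrase matches when its smallest spread is within the slop.
    static boolean matches(List<String> phrase, Map<String, List<Integer>> postings, int slop) {
        int len = minMatchLength(phrase, postings);
        return len >= 0 && len <= slop;
    }

    public static void main(String[] args) {
        // Word-level index: one term per word.
        Map<String, List<Integer>> wordIndex = index(words("abc institute engineering technology"));
        List<String> query = words("abc institute technology");
        System.out.println(minMatchLength(query, wordIndex)); // 1
        System.out.println(matches(query, wordIndex, 0));     // false
        System.out.println(matches(query, wordIndex, 4));     // true
        // Swapping two words costs 2 under this rule.
        System.out.println(minMatchLength(words("institute abc"), wordIndex)); // 2

        // Shingle index: every combination is one token, as in the question.
        Map<String, List<Integer>> shingleIndex = index(Arrays.asList(
            "abc institute engineering technology", "abc institute engineering",
            "abc institute", "abc", "institute engineering technology",
            "institute engineering", "institute", "engineering technology",
            "engineering", "technology"));
        // The question's query is ONE term, which is not in the index.
        List<String> oneTerm = Arrays.asList("abc institute technology");
        System.out.println(matches(oneTerm, shingleIndex, 4)); // false
    }
}
```

With word-level terms, "abc institute technology" is one position away from "abc institute engineering technology", so any slop of 1 or more matches, while swapping two words needs a slop of 2, in line with the `PhraseQuery.setSlop` documentation. With the shingle field, the single query term simply does not exist in the index, so no slop value can help.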
