Lucene SpanNearQuery with composite words in Java

Date: 2021-04-17 03:06:39

I'm having a problem with the SpanNearQuery in Lucene 4.3. I'm trying to do a query like this:

SpanTermQuery fleeceQ = new SpanTermQuery(new Term("content", "golden fleece"));
SpanTermQuery blackQ = new SpanTermQuery(new Term("content", "black"));
SpanQuery[] clauses = {fleeceQ, blackQ};
SpanNearQuery nearQ = new SpanNearQuery(clauses, 10, false);

In the field "content" of my document I have: "History looks fondly upon the black story of the golden fleece, but most people don't agree"

The query returns nothing. If I change "golden fleece" to "fleece", it works, so I guess the problem is with the composite words.

I'm using the SpanNearQuery because I have to do a proximity search and I need to know how many times it occurs.

Anyone know how to fix this?

1 solution

#1


The problem is that "golden fleece" is not a term. It is two terms, golden and fleece. When you construct the term yourself, though, with:

new Term("content", "golden fleece")

Lucene will take your word for it and treat it as a single term. There are no matches because the single term "golden fleece" doesn't exist in your index: the text was split into separate tokens when it was indexed.
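To make the mismatch concrete, here is a minimal sketch that uses plain Java rather than Lucene. The class `TermLookupSketch` and the method `indexedTerms` are hypothetical names, and the lowercase-and-split tokenizer is only an assumed stand-in for whatever analyzer was used at index time. It shows that a lookup for the two-word string fails while each word on its own is found.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Toy model only: NOT Lucene code. It imitates an index that stores
// one term per token, which is why a two-word "term" is never found.
class TermLookupSketch {

    // Assumed stand-in for an analyzer: lowercase, split on non-letters.
    static Set<String> indexedTerms(String text) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> terms = indexedTerms(
            "History looks fondly upon the black story of the golden fleece, "
            + "but most people don't agree");

        System.out.println(terms.contains("golden fleece")); // false
        System.out.println(terms.contains("golden"));        // true
        System.out.println(terms.contains("fleece"));        // true
    }
}
```

The same reasoning applies to a SpanTermQuery: it looks up exactly the text it is given, without analyzing it first.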

There isn't a clear way to incorporate a PhraseQuery into a SpanNearQuery, so I think it makes sense to nest a second SpanNearQuery to get the behavior you are looking for:

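import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

// One SpanTermQuery per indexed token
SpanTermQuery goldenQ = new SpanTermQuery(new Term("content", "golden"));
SpanTermQuery fleeceQ = new SpanTermQuery(new Term("content", "fleece"));
SpanTermQuery blackQ = new SpanTermQuery(new Term("content", "black"));

// Inner query: no slop, in order, so it matches "golden fleece" as a phrase
SpanQuery[] subclauses = {goldenQ, fleeceQ};
SpanNearQuery goldfleeceQ = new SpanNearQuery(subclauses, 0, true);

// Outer query: as before, slop of 10, any order
SpanQuery[] mainclauses = {goldfleeceQ, blackQ};
SpanNearQuery finalQ = new SpanNearQuery(mainclauses, 10, false);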
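Since the question also asks how many times the match occurs, the nesting idea can be illustrated with a small plain-Java model. This is a simplified sketch, not Lucene's implementation: the class `SpanNearSketch`, its methods, and the pairwise matching rule (gap between two non-overlapping spans must not exceed the slop) are all assumptions made for illustration, and Lucene's own span enumeration may report matches differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model only: NOT Lucene code. Spans cover token positions [start, end).
class SpanNearSketch {

    static final class Span {
        final int start;
        final int end;
        Span(int start, int end) { this.start = start; this.end = end; }
        int length() { return end - start; }
    }

    // Assumed stand-in for an analyzer: lowercase, split on non-letters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Every position where a single token equals the term.
    static List<Span> term(List<String> tokens, String term) {
        List<Span> spans = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).equals(term)) spans.add(new Span(i, i + 1));
        }
        return spans;
    }

    // Simplified "near": pair up non-overlapping spans whose gap is <= slop.
    static List<Span> near(List<Span> a, List<Span> b, int slop, boolean inOrder) {
        List<Span> out = new ArrayList<>();
        for (Span x : a) {
            for (Span y : b) {
                boolean overlap = x.start < y.end && y.start < x.end;
                if (overlap || (inOrder && y.start < x.end)) continue;
                int start = Math.min(x.start, y.start);
                int end = Math.max(x.end, y.end);
                int gap = (end - start) - x.length() - y.length();
                if (gap <= slop) out.add(new Span(start, end));
            }
        }
        return out;
    }

    // Nested query: ("golden" then "fleece", slop 0) near "black", any order.
    static int countMatches(String text, int outerSlop) {
        List<String> tokens = tokenize(text);
        List<Span> goldenFleece =
            near(term(tokens, "golden"), term(tokens, "fleece"), 0, true);
        return near(goldenFleece, term(tokens, "black"), outerSlop, false).size();
    }

    public static void main(String[] args) {
        String doc = "History looks fondly upon the black story of the golden "
            + "fleece, but most people don't agree";
        System.out.println(countMatches(doc, 10)); // 1
        System.out.println(countMatches(doc, 2));  // 0
    }
}
```

In the sample sentence, "black" is at position 5 and "golden fleece" covers positions 9 to 10, leaving a gap of three tokens, so the pair matches with a slop of 10 but not with a slop of 2.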
