.NET Lucene: Why won't a query built with MultiFieldQueryParser and WhitespaceAnalyzer inside a BooleanQuery yield correct results, and how can it be fixed?

Date: 2021-06-18 03:10:05

I have a short query that I want to construct. I'm using a BooleanQuery to specify that the "type" field of any Document matched from the index must be "Idea", and I also have a search string given by a user that may be one or more words. I want to restrict the results programmatically so that the client only sees docs whose "type" field equals "Idea", but I also want any word in the user's search phrase to be able to match a word in the result. I think the code below shows exactly what I want.

WhitespaceAnalyzer analyzer = new WhitespaceAnalyzer();

MultiFieldQueryParser parser = new MultiFieldQueryParser(
    Version.LUCENE_30, new string[] { "company", "description", 
    "name", "posterName"},
    analyzer);

parser.AllowLeadingWildcard = true;

Lucene.Net.Search.Query query = parser.Parse(searchParam); 

BooleanQuery bq = new BooleanQuery(); 

TermQuery tQuery = new TermQuery(new Lucene.Net.Index.Term("type", "Idea"));

bq.Add(tQuery, Lucene.Net.Search.Occur.MUST);

bq.Add(query, Lucene.Net.Search.Occur.MUST);

The pertinent part of the code I use to index the data is below:

Document doc = new Document();
doc.Add(new Field("type",
    "Idea",
    Field.Store.YES,
    Field.Index.ANALYZED));
doc.Add(new Field("company",
    (_idea.Company==null ?
      "Company Not Set for Idea" 
      : _idea.Company.Name),
    Field.Store.YES,
    Field.Index.ANALYZED));
doc.Add(new Field("description",
    _idea.Description,
    Field.Store.YES,
    Field.Index.ANALYZED));
doc.Add(new Field("name",
    _idea.Name,
    Field.Store.YES,Field.Index.ANALYZED));
if (_idea.Poster != null)
{
    doc.Add(new Field("posterName",
      _idea.Poster.FirstName + " " + _idea.Poster.LastName,
      Field.Store.YES, Field.Index.ANALYZED));
}
doc.Add(new Field("ID",
    _idea.ID.ToString(), Field.Store.YES,
    Field.Index.NOT_ANALYZED));
iWriter.AddDocument(doc);

What I don't understand is that when I search for a word that I KNOW exists in the index, no results come back. Only when I search with a wildcard such as "*" do I get any results. If MultiFieldQueryParser does what its documentation says, I would expect a match whenever any of the fields passed to it (company, description, name, etc.) contains the search term. But that doesn't happen. For example, I know one of the docs has a name field of "Another Idea". Searching for "Another", "another", "Idea", etc. should return that doc, but it doesn't. It does, however, correctly filter the results by type.

What do I need to do to get this short code snippet to return matches that I want?
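
Editor's note, not part of the original post: the question does not show which analyzer the IndexWriter was created with, so one thing worth checking is whether the terms produced at query time line up with the terms stored in the index. The sketch below assumes Lucene.Net 3.0.3 and uses hypothetical names (AnalyzerProbe, Tokens) to print the terms two analyzers produce for the same text. WhitespaceAnalyzer keeps the original case, while StandardAnalyzer lowercases, so a mismatch between the two can make an exact word fail to match.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.Tokenattributes;
using Version = Lucene.Net.Util.Version;

// Hypothetical helper for illustration; not from the original post.
public static class AnalyzerProbe
{
    // Returns the terms the given analyzer emits for the given text.
    public static List<string> Tokens(Analyzer analyzer, string text)
    {
        var terms = new List<string>();
        // The field name only matters for per-field analyzers; "name" is arbitrary here.
        TokenStream stream = analyzer.TokenStream("name", new StringReader(text));
        ITermAttribute termAttr = stream.AddAttribute<ITermAttribute>();
        while (stream.IncrementToken())
        {
            terms.Add(termAttr.Term);
        }
        return terms;
    }

    public static void Main()
    {
        string text = "Another Idea";

        // Splits on whitespace only and keeps case: Another, Idea
        Console.WriteLine(string.Join(", ",
            Tokens(new WhitespaceAnalyzer(), text).ToArray()));

        // Tokenizes and lowercases: another, idea
        Console.WriteLine(string.Join(", ",
            Tokens(new StandardAnalyzer(Version.LUCENE_30), text).ToArray()));
    }
}
```

If the two lines differ for your data, using the same analyzer for indexing and for parsing the search string is the usual way to keep the terms consistent.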

2 Answers

#1


1  

I figured out how to solve this, and it turns out to be a no-brainer (depending on how much you know about Lucene and about Visual Studio ASP projects, which I'm not that familiar with). This is my first.

It turns out that you can use a BooleanQuery object to add different queries together and specify how they should operate together (MUST, SHOULD and so on). You then pass the combined query to the searcher.

The problem was that I wasn't splitting the search string into words and creating a query from each one. The sample solution that works for me is below:

    // Analyzer used to parse the user's search string
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
    MultiFieldQueryParser mfqp = new MultiFieldQueryParser(
        Version.LUCENE_30,
        new string[] { "company", "description", "name", "posterName" },
        analyzer);
    mfqp.DefaultOperator = MultiFieldQueryParser.OR_OPERATOR;
    mfqp.AllowLeadingWildcard = true;

    // Parse each word separately; a match on any word is enough (SHOULD)
    BooleanQuery innerExpr = new BooleanQuery();
    foreach (string s in searchParam.Split(new char[] { ' ' }))
    {
        innerExpr.Add(mfqp.Parse(s), Occur.SHOULD);
    }

    // Also try the whole search string as a wildcard pattern on each field
    innerExpr.Add(new WildcardQuery(new Term("company", searchParam)), Occur.SHOULD);
    innerExpr.Add(new WildcardQuery(new Term("description", searchParam)), Occur.SHOULD);
    innerExpr.Add(new WildcardQuery(new Term("name", searchParam)), Occur.SHOULD);
    innerExpr.Add(new WildcardQuery(new Term("posterName", searchParam)), Occur.SHOULD);

    // Restrict the results to "Idea" documents with a filter instead of a MUST clause
    TermQuery tQuery = new TermQuery(new Term("type", "Idea"));

    //bq.Add(mfqp.Parse(searchParam), Lucene.Net.Search.Occur.MUST);
    TopDocs hits = sharedIndex.Search(innerExpr,
        new QueryWrapperFilter(tQuery), 1000,
        new Sort(SortField.FIELD_DOC));

This entire route wasn't clear to me when I started on this.
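
Editor's note, not part of the original answer: the restriction on "type" can also be written as a MUST clause of an outer BooleanQuery, which is closer to what the question set out to do. The following is a minimal sketch under the assumption of Lucene.Net 3.0.3; IdeaQueryBuilder, Build and SearchFields are hypothetical names introduced for illustration. It nests the per-word SHOULD clauses inside one BooleanQuery and then requires both that group and the type term.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Version = Lucene.Net.Util.Version;

// Hypothetical helper for illustration; not from the original answer.
public static class IdeaQueryBuilder
{
    static readonly string[] SearchFields =
        { "company", "description", "name", "posterName" };

    public static BooleanQuery Build(string searchParam)
    {
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);
        var parser = new MultiFieldQueryParser(
            Version.LUCENE_30, SearchFields, analyzer);
        parser.DefaultOperator = MultiFieldQueryParser.OR_OPERATOR;
        parser.AllowLeadingWildcard = true;

        // One SHOULD clause per word: a document matching any word qualifies.
        // Empty entries (from repeated spaces) are skipped.
        var userTerms = new BooleanQuery();
        foreach (string word in searchParam.Split(
            new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            userTerms.Add(parser.Parse(word), Occur.SHOULD);
        }

        // Both parts are required: the type restriction and the user's terms.
        // Note: if searchParam contains no words, userTerms is empty and
        // the combined query matches nothing.
        var combined = new BooleanQuery();
        combined.Add(new TermQuery(new Term("type", "Idea")), Occur.MUST);
        combined.Add(userTerms, Occur.MUST);
        return combined;
    }
}
```

A call such as sharedIndex.Search(IdeaQueryBuilder.Build(searchParam), 1000) would then replace the filter-based search. Because a TermQuery is not analyzed, "Idea" must match the indexed term exactly, so a field used for exact matching like "type" is usually indexed with Field.Index.NOT_ANALYZED.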

#2


0  

One improvement you can make to that solution, in order to accommodate future changes to your index, would be to create a string array variable to hold your field names, e.g.:

string[] allFields = new string[] {"company", "description", 
     "name", "posterName"};

which in turn will give you a value to put into your parser:

MultiFieldQueryParser mfqp = new MultiFieldQueryParser(
     Version.LUCENE_30, allFields, analyzer);

and the ability to iterate through the fields and have a single line to add your wildcard queries:

foreach (string searchField in allFields) {
    innerExpr.Add(new WildcardQuery(new Term(searchField, searchParam)), Occur.SHOULD);
}

Then, in the future, you need only add, change or remove field names in the array, and you no longer have to manage your list of queries by hand.
