How would I take the following data and get the results below?
I've included a code sample, but I can't figure out how to search across multiple columns and apply the boost I need to each one.
Am I going about this the right way?
Boost / Weight for each column
First Name = 100
Last Name = 75
Bio = 50
Data
First Name, Last Name, Bio
Benny, Benson, This is a test
- "ben" appears in the first name AND last name
- Score = 175
Jim, Smith, Another test with the word ben
- "ben" appears in the bio
- Score = 50
John, Benson, And another test here
- "ben" appears in the last name
- Score = 75
Results
1. Benny
2. John
3. Jim
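To make the expected arithmetic concrete, here is a minimal sketch of the scoring described above, written without Lucene. It assumes a case-insensitive substring match and simply adds the weight of every column that contains the term. The `Person` and `ExpectedScore` types are hypothetical names introduced only for this illustration. Lucene's own relevance score also factors in term frequency and field length, so it will not produce these exact numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type mirroring the three columns in the sample data.
public sealed class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Bio { get; set; }
}

public static class ExpectedScore
{
    // Column weights taken from the question.
    private const int FirstNameWeight = 100;
    private const int LastNameWeight = 75;
    private const int BioWeight = 50;

    // Case-insensitive substring test (an assumption; Lucene matches on analyzed terms).
    private static bool Has(string value, string term)
    {
        return (value ?? string.Empty).IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Sum the weight of every column that contains the term.
    public static int Score(Person p, string term)
    {
        int score = 0;
        if (Has(p.FirstName, term)) score += FirstNameWeight;
        if (Has(p.LastName, term)) score += LastNameWeight;
        if (Has(p.Bio, term)) score += BioWeight;
        return score;
    }

    // Return the first names of matching rows, highest score first.
    public static List<string> Rank(IEnumerable<Person> people, string term)
    {
        return people
            .Select(p => new { Person = p, Score = Score(p, term) })
            .Where(x => x.Score > 0)
            .OrderByDescending(x => x.Score)
            .Select(x => x.Person.FirstName)
            .ToList();
    }
}
```

Calling `ExpectedScore.Rank(people, "ben")` on the three sample rows yields Benny (175), John (75), Jim (50), which is the ordering the search should reproduce.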
protected override void _addToLuceneIndex(dynamic item, IndexWriter writer)
{
    var user = item as UserTestItem;
    if (user == null) return;

    // remove older index entry
    var searchQuery = new TermQuery(new Term(USER_ID, user.UserID.ToString(CultureInfo.InvariantCulture)));
    writer.DeleteDocuments(searchQuery);

    // add new index entry
    var doc = new Document();

    // get fields
    var userId = new Field(USER_ID, user.UserID.ToString(CultureInfo.InvariantCulture), Field.Store.YES, Field.Index.NOT_ANALYZED);
    var firstName = new Field(FIRST_NAME, user.FirstName ?? string.Empty, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES);
    var lastName = new Field(LAST_NAME, user.LastName ?? string.Empty, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES);
    var bio = new Field(BIO, user.Bio ?? string.Empty, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES);

    // add boosts
    firstName.Boost = 100f;
    lastName.Boost = 75f;
    bio.Boost = 50f;

    // add lucene fields mapped to db fields
    doc.Add(userId);
    doc.Add(firstName);
    doc.Add(lastName);
    doc.Add(bio);

    // add entry to index
    writer.AddDocument(doc);
}
public string[] FieldsToSearch { get; set; } // e.g. "FirstName", "LastName", "Bio"

public UserSearchResults SearchUsers(string searchQuery, bool exact = false)
{
    var results = new UserSearchResults();
    if (!string.IsNullOrEmpty(searchQuery))
    {
        //searchQuery = PrepareInput(searchQuery, exact);
        try
        {
            using (var searcher = new IndexSearcher(IndexDirectory, false))
            {
                var analyzer = new StandardAnalyzer(LUCENE_VERSION);

                // Search by multiple fields (ordered by RELEVANCE)
                var parser = new MultiFieldQueryParser(LUCENE_VERSION, FieldsToSearch, analyzer);
                parser.AllowLeadingWildcard = true;
                parser.DefaultOperator = exact ? QueryParser.AND_OPERATOR : QueryParser.OR_OPERATOR;
                var multiFieldQuery = ParseQuery(searchQuery, parser);
                var hits = searcher.Search(multiFieldQuery, null, SearchResultLimit, Sort.RELEVANCE);
                var docs = hits.ScoreDocs;

                results.Items = _mapLuceneToDataList(docs, searcher).Cast<UserTestItem>().ToList();
                results.Total = results.Items.Count;
                results.RawQuery = LastUsedQuery.ToString();

                analyzer.Close();
                searcher.Dispose();
            }
        }
        catch (Exception ex)
        {
            Debug.WriteLine(ex.ToString());
        }
    }
    return results;
}
3 Answers
#1
1
You can pass boosts directly to MultiFieldQueryParser. The dictionary keys must match the indexed field names. For example:
var boosts = new Dictionary<string, float>
{
    { FIRST_NAME, 100f },
    { LAST_NAME, 75f },
    { BIO, 50f }
};
var parser = new MultiFieldQueryParser(LUCENE_VERSION, FieldsToSearch, analyzer, boosts);
#2
0
I believe I have the behavior I want now. I had to boost the query, as shown here, instead of boosting the fields at index time as I had been doing.
var mainQuery = new BooleanQuery();
var fnQuery = new BooleanQuery();
fnQuery.Add(new WildcardQuery(new Term(FIRST_NAME, searchQuery)), Occur.SHOULD);
fnQuery.Boost = 100f;
var lnQuery = new BooleanQuery();
lnQuery.Add(new WildcardQuery(new Term(LAST_NAME, searchQuery)), Occur.SHOULD);
lnQuery.Boost = 75f;
var bioQuery = new BooleanQuery();
bioQuery.Add(new WildcardQuery(new Term(BIO, searchQuery)), Occur.SHOULD);
bioQuery.Boost = 50f;
mainQuery.Add(fnQuery, Occur.SHOULD);
mainQuery.Add(lnQuery, Occur.SHOULD);
mainQuery.Add(bioQuery, Occur.SHOULD);
var hits = searcher.Search(mainQuery, null, SearchResultLimit, Sort.RELEVANCE);
Is there a more appropriate way to do this?
#3
0
The boost gets lost in Lucene's scoring algorithm, so the final score is a mix of the boost and how well the search terms match. Another option is to have Lucene return a set of matches (for example, I searched for Kevin Smith and got 50 matches based on similarity/Lucene's score), then compute an additional field in a database and use LINQ to sort the results from the Lucene collector by that field. This works a little differently from boosting, but it gives you more control over the exact score and sorting.
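A minimal sketch of that re-sorting step, assuming the Lucene hits have already been mapped to plain objects. The `Hit` type, its `CustomScore` property (standing in for the additional field calculated in the database), and `HitSorter` are hypothetical names for illustration only, not part of Lucene.Net.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a mapped search hit.
public sealed class Hit
{
    public string Name { get; set; }
    public float LuceneScore { get; set; }  // relevance score reported by Lucene
    public int CustomScore { get; set; }    // assumed to be computed elsewhere, e.g. in the database
}

public static class HitSorter
{
    // Order by the custom score first; fall back to Lucene's score to break ties.
    public static List<Hit> SortByCustomScore(IEnumerable<Hit> hits)
    {
        return hits
            .OrderByDescending(h => h.CustomScore)
            .ThenByDescending(h => h.LuceneScore)
            .ToList();
    }
}
```

Because the ordering no longer depends on Lucene's score alone, the custom field fully determines the ranking, and Lucene's score only matters between hits with equal custom scores.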