I'm trying to incorporate Lucene.net in my web search.
Currently I have a Lucene.net index that contains more than 1 million documents with 7 fields each. The last field is the "all" field, which holds the content of the previous fields concatenated. Searching the "all" field is just EXTREMELY fast :)
But I feel there is more to be found here. How can I make a search that searches one or more space separated strings over all the fields without using the "all" field?
I want to be able to give weights to certain fields. Furthermore it would be really nice if the search contained information on WHERE the hit took place so I can show it in the result.
I think this is all possible, but I don't immediately see how.
Any help?
3 Solutions
#1
I don't think you need to maintain an "all" field.
- Have a look into using a "MultiFieldQueryParser". Rather than taking a single default field to be used by the query parser, it accepts an array of field names (in addition to the index analyser).
- Term boost should work as per "QueryParser" (i.e. no special action required). I should add that I've found the standard scoring seems OK for me (length of field, number of matches etc) without using boosted terms.
- Lucene.Net (well, certainly the SVN 2.3 builds at the moment) includes a port of the Highlight package from the Java source. It does have a couple of quirks (not least of which is that it can be tricky to get going in the first place), but it basically works.
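To make the first bullet a bit more concrete: as far as I recall from the Lucene Javadoc for MultiFieldQueryParser, a two-term query over the fields title and body gets expanded into roughly (title:term1 body:term1) (title:term2 body:term2), with a ^boost appended per field if you pass boosts. The little Python sketch below only imitates that expansion as a plain string so you can see the shape of it. It is not the Lucene.Net API; the helper name expand_multifield and the field names are made up for illustration, and it does no analysis or escaping.

```python
def expand_multifield(user_input, fields, boosts=None):
    """Imitate, as a plain string, how a multi-field parser spreads each
    space-separated term over every field.

    Hypothetical helper for illustration only (not part of Lucene.Net).
    """
    boosts = boosts or {}
    groups = []
    for term in user_input.split():
        clauses = []
        for field in fields:
            clause = f"{field}:{term}"
            # Append a per-field boost only when one was supplied.
            if field in boosts:
                clause += f"^{boosts[field]}"
            clauses.append(clause)
        groups.append("(" + " ".join(clauses) + ")")
    return " ".join(groups)


print(expand_multifield("term1 term2", ["title", "body"]))
print(expand_multifield("term1", ["title", "body"], {"title": 5.0, "body": 10.0}))
```

In the real thing you hand the parser your field names and analyzer and it does this expansion for you, so there is no need for the concatenated "all" field.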
Good luck
#2
We do something similar; the trick is to specify fields in your query string:
(+Tier1:ribbon^1)^4 OR (+Tier2:ribbon^1)^4 OR (+Tier3:ribbon^1) OR (+Tier4:q*ribbon*^1)^12
In the above example, the user searched for "ribbon" in our application. We have different segments of data in different fields, and the final field "Tier4" contains all the previous terms concatenated together. We also prefix the content of that field with a "q", so the query term starts with a literal character and we can effectively do leading wildcards:
(+Tier4:q*ribbon*^1)^12
Lastly, we use boosts with the caret (^). This ends up weighting things differently. It took a while to get boosts right, and I'm still not 100% happy with them, but they do make a big impact.
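For what it's worth, here is a rough Python sketch of how a query string in that shape could be assembled from a search term. This is only my guess at the pattern, not the code we (or anyone) actually run: the function name build_tier_query, the TIERS list and the wildcard_field/prefix parameters are all hypothetical, and it does not escape special query characters in the user's input.

```python
def build_tier_query(term, field_boosts, wildcard_field=None, prefix="q"):
    """Build a boosted, per-field query string in the style shown above.

    field_boosts is a list of (field_name, group_boost) pairs; a boost of
    None means the clause gets no outer boost. Hypothetical helper.
    """
    clauses = []
    for field, group_boost in field_boosts:
        if field == wildcard_field:
            # The concatenated field is stored with a leading marker, so the
            # pattern starts with that marker instead of a bare "*".
            body = f"{prefix}*{term}*"
        else:
            body = term
        clause = f"(+{field}:{body}^1)"
        if group_boost is not None:
            clause += f"^{group_boost}"
        clauses.append(clause)
    return " OR ".join(clauses)


# Assumed weights, copied from the example query above.
TIERS = [("Tier1", 4), ("Tier2", 4), ("Tier3", None), ("Tier4", 12)]

print(build_tier_query("ribbon", TIERS, wildcard_field="Tier4"))
# prints: (+Tier1:ribbon^1)^4 OR (+Tier2:ribbon^1)^4 OR (+Tier3:ribbon^1) OR (+Tier4:q*ribbon*^1)^12
```

Keeping the weights in one list like this makes it easier to experiment with the boosts, which is where most of the tuning time goes.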
#3
You have to get Lucene in Action. Although it is about the original (that is, Java) Lucene implementation, it contains all the information you need: boosts, highlighters, query parsers, etc.