From "Optimizing Search Results in Elasticsearch with Scoring and Boosting" by Neil Alex, 2015/03/18
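Elasticsearch ships with efficient scoring functions, but in an e-commerce setting they are often not enough. Most users only look at the first few results, so a flexible scoring mechanism matters a great deal. If search results are presented according to what users actually need, conversion rates can improve considerably.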
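In this article we first look at Elasticsearch's default scoring setup and then customize a few scores. The material here should help you build scoring tailored to your users.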
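By default Elasticsearch uses Lucene's scoring formula and expresses relevance as a positive floating-point number, `_score`. The higher the `_score`, the more relevant the document. Each query clause produces a `_score` for every document, and how it is computed depends on the type of query clause.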
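Query clauses serve different purposes: the `_score` of a fuzzy query depends on how similar the spelling of the found term is to the original search term, while a terms query takes into account the proportion of terms that were found. In general, relevance means how similar the content of a full-text field is to the full-text query string.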
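The standard similarity algorithm used in Elasticsearch is term frequency/inverse document frequency (TF/IDF), which takes the following factors into account: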
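Factor | Description |
---|---|
tf | term frequency |
idf | inverse document frequency |
coord | how matches on several query terms are rewarded |
lengthnorm | how short fields are treated |
querynorm | query normalization factor |
boost(index) | boost applied at index time |
boost(query) | boost applied at query time |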
Together these factors determine the score of a document in Elasticsearch.
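Term frequency (TF): a measure of how many times a term occurs in the content of a document. The more often it occurs, the higher the score and the more likely the document is relevant to the query.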
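Inverse document frequency (IDF): a measure of how often the search term occurs across the whole document collection. A term that appears in many documents contributes a lower score. A rare term that occurs frequently within a document boosts the score more.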
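To make the two definitions concrete, here is a minimal Python sketch of the TF and IDF formulas used by Lucene's classic TF/IDF similarity. The collection size and document frequencies are made-up numbers chosen for illustration only.

```python
import math


def tf(freq: int) -> float:
    """Classic Lucene term frequency: square root of the raw count."""
    return math.sqrt(freq)


def idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical collection of 1000 documents (assumption for illustration).
common = idf(doc_freq=900, num_docs=1000)  # term found almost everywhere
rare = idf(doc_freq=9, num_docs=1000)      # term found in few documents

print(f"idf(common term) = {common:.3f}")
print(f"idf(rare term)   = {rare:.3f}")
print(f"tf(1) = {tf(1):.3f}, tf(4) = {tf(4):.3f}")
```

The rare term gets the larger IDF, and a term occurring four times gets twice the TF of a term occurring once, because the count is dampened by the square root.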
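Coordination factor (coord): rewards documents that contain more of the query terms; the more query terms co-occur in a document, the higher its overall score. Take a search for the two terms "woolen" and "jacket". Searching for both together is fine: internally the query is rewritten into a bool query and each term is searched separately. Documents that contain both terms score higher than documents that contain only one. If you give the query a weight of 2, the coord-weighted value for a document containing both terms is 2 * 2 = 4, and for a document containing only one term it is 2 * 1 = 2.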
Length normalization (lengthnorm): measures matches on short fields and gives them more weight.
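Translator's note: Lucene takes the length of the text into account, because a longer text contains more terms, and normalizes for it through lengthnorm. As a result Lucene favors short titles.
For example, for a title field the normalization factor of a short text might be 0.5, while for a longer title it might be only 0.01. So the normalization factor can affect the score dramatically.
For example, a search term that appears in the title is more relevant than one that appears in the content, and scores higher.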
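The effect of field length can be sketched in a few lines of Python. This assumes the classic Lucene length norm of one over the square root of the number of terms in the field, and ignores that Lucene stores norms in a compressed form that loses precision.

```python
import math


def length_norm(num_terms: int) -> float:
    """Classic Lucene length norm: 1 / sqrt(number of terms in the field)."""
    return 1.0 / math.sqrt(num_terms)


# Under this formula a norm of 0.5 corresponds to a 4-term field
# and a norm of 0.01 to a 10000-term field.
for n in (4, 100, 10000):
    print(f"{n:>6} terms -> norm {length_norm(n):.4f}")
```

A match in a four-term title is therefore weighted far more heavily than the same match in a body of thousands of terms.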
Query normalization (querynorm): although it is not directly related to document relevance, querynorm makes scores comparable when you combine different types of queries.
Index-time boost and query-time boost: a boost can be applied at index time and at query time. Boosting a specific field makes that field count for more in the score calculation.
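Lucene's score calculation: the default Elasticsearch scoring algorithm combines the Boolean model with the vector space model. Documents that pass the Boolean model are then scored with the vector space model.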
The scoring formula is:
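$$\mathrm{score}(q,d) = \mathrm{queryNorm}(q) \cdot \mathrm{coord}(q,d) \cdot \sum_{t \in q} \Big( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^{2} \cdot t.\mathrm{getBoost}() \cdot \mathrm{norm}(t,d) \Big)$$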
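As a rough illustration of how the factors multiply together, the following Python sketch evaluates the formula for a small query. It is a simplification under stated assumptions: the classic Lucene definitions of tf, idf, coord, queryNorm and the length norm, a query-level boost of 1, no index-time boost, and uncompressed norms. The class `QueryTerm` and all statistics are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class QueryTerm:
    freq: int           # occurrences of the term in this document's field
    doc_freq: int       # number of documents that contain the term
    boost: float = 1.0  # query-time boost, t.getBoost()


def tf(freq: int) -> float:
    return math.sqrt(freq)


def idf(doc_freq: int, num_docs: int) -> float:
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def score(terms: List[QueryTerm], num_docs: int, field_length: int) -> float:
    # queryNorm(q): 1 / sqrt(sum of squared term weights)
    sum_sq = sum((idf(t.doc_freq, num_docs) * t.boost) ** 2 for t in terms)
    query_norm = 1.0 / math.sqrt(sum_sq)
    # coord(q,d): fraction of query terms found in the document
    matched = [t for t in terms if t.freq > 0]
    coord = len(matched) / len(terms)
    # norm(t,d): length norm only, since no index-time boost is assumed
    norm = 1.0 / math.sqrt(field_length)
    total = sum(
        tf(t.freq) * idf(t.doc_freq, num_docs) ** 2 * t.boost * norm
        for t in matched
    )
    return query_norm * coord * total


# Query "woolen jacket" against two hypothetical 4-term titles.
both = score([QueryTerm(1, 50), QueryTerm(1, 20)], num_docs=1000, field_length=4)
one = score([QueryTerm(1, 50), QueryTerm(0, 20)], num_docs=1000, field_length=4)
print(f"both terms: {both:.4f}, one term: {one:.4f}")
```

The document that contains both terms wins twice: its sum has two entries, and its coord factor is 1 instead of 0.5.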
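Let's see how these factors are used to compute the score of a document. First, a tool for debugging queries in Elasticsearch: switch on `explain` in a query and you get a detailed breakdown of each of the scoring factors above, together with the final score of the document. We do not recommend using this output in production, but it is very useful for debugging and tuning queries during development.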
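As a small sketch of what such a debugging request could look like, the Python snippet below only builds and prints the JSON body; it does not contact a cluster. The helper name `explain_request` is made up, and the `name` field and the search text are taken from the sample data used later in this article.

```python
import json


def explain_request(field: str, text: str) -> dict:
    """Build a search body that asks Elasticsearch to explain each hit's score."""
    return {
        "explain": True,  # adds an _explanation section to every hit
        "query": {"match": {field: text}},
    }


body = explain_request("name", "MacBookPro")
print(json.dumps(body, indent=2))
```

The printed body would be sent as the payload of a `_search` request, for example to `/ecomercedata/gadgets/_search`.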
The most useful tool for adjusting `_score` is `function_score`.
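In fact Elasticsearch has offered many ways of computing a score for each match. You can use `custom_score` with a script to derive the score from a specific numeric field. For example: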
"script": "_score * doc['my_numeric_field'].value"
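Here the default `_score` is multiplied by `my_numeric_field` to weight it. You can also use the `custom_filters_score` query, which applies filters to restrict the result set and then assigns a boost to the filtered documents through a script or a boost value. Similarly, `custom_boost_factor` multiplies the default score by a boost value for the query. Elasticsearch 0.90.4 introduced a new query, `function_score`, that merges all of these into one. Combined with its built-in functions, scripts let you do even more. With `function_score` you first define a query and can then attach any filters you like. The filters restrict the functions to the results that meet your criteria, so no score is computed for results you do not care about. Depending on the boost mode you choose, your custom function decides the final score: you can replace the default score outright with the function result, or multiply the default score by it.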
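Elasticsearch can combine the results of several functions into one score. There are also many ways to use a `function_score` query to account for several factors at once, such as time, distance from a given point, popularity and so on. Let's now look more closely at tuning relevance for a data set.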
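`script_score` lets you define a scoring function with a script expression. With `field_value_factor` you read the value of a specific field and let it take part directly in the final score. The decay functions make a score fall off gradually.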
Example 1: you score by distance from a geographic point, and points within 5 km should weigh three times as much as points beyond 5 km.
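Example 2: you score documents by publication time, for example a document scores 7 during its first 15 days and 3 once it is 25 days old. Cases like these can be handled with the decay functions together with the weight function and the random function. You can also write a custom function that applies the functions above and build further customization on top of it.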
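Example 2 can be reproduced numerically. The Python sketch below reimplements the Gaussian decay curve as described in the Elasticsearch documentation; it does not call Elasticsearch. The parameter choice is an assumption made to match the numbers in the example: an `offset` of 15 days during which nothing decays, a `scale` of 10 days, a `decay` of 3/7, and a weight of 7.

```python
import math


def gauss_decay(value: float, origin: float, scale: float,
                offset: float = 0.0, decay: float = 0.5) -> float:
    """Gaussian decay: 1.0 within `offset` of `origin`, `decay` at offset + scale."""
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))


def age_score(age_days: float) -> float:
    # Hypothetical parameters: full score of 7 for the first 15 days,
    # falling to 3 at day 25 (offset 15 + scale 10, decay 3/7).
    return 7.0 * gauss_decay(age_days, origin=0.0, scale=10.0,
                             offset=15.0, decay=3.0 / 7.0)


for days in (0, 15, 25, 40):
    print(f"age {days:>2} days -> score {age_score(days):.3f}")
```

Inside the offset the score stays at the full weight; beyond it the score falls along a bell curve and keeps shrinking for older documents.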
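As described above, we can use several functions to compute score values and then merge their outputs with `score_mode` and `boost_mode`. `score_mode` defines how the individual functions you define are combined with each other. `boost_mode` defines how the combined function result is merged with the default query score, for example by replacing it, adding to it, or multiplying it. The options for `boost_mode` and `score_mode` are listed below.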
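boost_mode options
Mode | Effect |
---|---|
multiply | query score and function score are multiplied (default) |
sum | query score and function score are added |
avg | average of query score and function score |
replace | only the function score is used; the query score is ignored |
min | minimum of query score and function score |
max | maximum of query score and function score |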
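score_mode options
Mode | Effect |
---|---|
multiply | function scores are multiplied (default) |
sum | function scores are summed |
avg | function scores are averaged |
first | the first function whose filter matches is applied |
min | the minimum function score is used |
max | the maximum function score is used |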
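The two-stage combination can be mimicked in plain Python. This is only a model of the arithmetic, not Elasticsearch code: `combine_functions` and `apply_boost_mode` are hypothetical helper names, `avg` is treated as an unweighted mean, and the input values are made up.

```python
from functools import reduce
from operator import mul
from typing import List


def combine_functions(values: List[float], score_mode: str = "multiply") -> float:
    """Stage 1: merge the outputs of the functions whose filters matched."""
    if not values:
        raise ValueError("at least one function value is required")
    if score_mode == "multiply":
        return reduce(mul, values, 1.0)
    if score_mode == "sum":
        return float(sum(values))
    if score_mode == "avg":
        return sum(values) / len(values)
    if score_mode == "first":
        return float(values[0])
    if score_mode == "max":
        return float(max(values))
    if score_mode == "min":
        return float(min(values))
    raise ValueError(f"unknown score_mode: {score_mode}")


def apply_boost_mode(query_score: float, function_score: float,
                     boost_mode: str = "multiply") -> float:
    """Stage 2: merge the combined function score with the query score."""
    if boost_mode == "multiply":
        return query_score * function_score
    if boost_mode == "replace":
        return function_score
    if boost_mode == "sum":
        return query_score + function_score
    if boost_mode == "avg":
        return (query_score + function_score) / 2.0
    if boost_mode == "max":
        return max(query_score, function_score)
    if boost_mode == "min":
        return min(query_score, function_score)
    raise ValueError(f"unknown boost_mode: {boost_mode}")


# Hypothetical inputs: query score 1.5, two functions returning 9.0 and 2.0.
combined = combine_functions([9.0, 2.0], score_mode="multiply")
for mode in ("multiply", "replace", "sum", "avg", "max", "min"):
    print(f"{mode:>8}: {apply_boost_mode(1.5, combined, mode)}")
```

Keeping the two stages separate makes it easier to see which setting to change when a final score looks wrong.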
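Now for a complete example: we write our own ranking in descending order of popularity. We start from a score stored in the index and assume that the more popular an item is, the higher it should rank. The simplest way is to define a `function_score` query and use the built-in `field_value_factor` function to feed the field into the score: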
POST /ecomercedata/gadgets/_search
{
  "explain": true,
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "rating"
          }
        }
      ],
      "boost_mode": "multiply"
    }
  }
}
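Let's review the result of the request above. `t.getBoost()` and `boost` are not visible on their own, because both are folded into `queryNorm`. Try adding a boost to a specific field and you will see that matching documents show a higher `queryNorm` in the explanation.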
The response, with the first of the nine hits shown in full:
{
  "took": 22,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "failed": 0 },
  "hits": {
    "total": 9,
    "max_score": 9,
    "hits": [
      {
        "_shard": 0,
        "_node": "477kWUQVR2eiLpIQEN4vFw",
        "_index": "ecomercedata",
        "_type": "gadgets",
        "_id": "nKk9DfRnTDyUU80cepXRrw",
        "_score": 9,
        "_source": { "name": "MacBookPro", "category": "Laptop", "brand": "Apple", "rating": 9, "prize": 1299, "piecesSold": 9500, "dateOfRelease": "2005-02-01" },
        "_explanation": {
          "value": 9,
          "description": "function score, product of:",
          "details": [
            { "value": 1, "description": "ConstantScore(*:*), product of:", "details": [
              { "value": 1, "description": "boost" },
              { "value": 1, "description": "queryNorm" }
            ] },
            { "value": 9, "description": "Math.min of", "details": [
              { "value": 9, "description": "function score, score mode [multiply]", "details": [
                { "value": 9, "description": "function score, product of:", "details": [
                  { "value": 1, "description": "match filter: *:*" },
                  { "value": 9, "description": "field value function: (doc['rating'].value * factor=1.0)", "details": [
                    { "value": 1, "description": "ConstantScore(*:*), product of:", "details": [
                      { "value": 1, "description": "boost" },
                      { "value": 1, "description": "queryNorm" }
                    ] }
                  ] }
                ] }
              ] },
              { "value": 3.4028235e+38, "description": "maxBoost" }
            ] },
            { "value": 1, "description": "queryBoost" }
          ]
        }
      }
    ]
  }
}
The other eight hits share the same `_shard`, `_node`, `_index` and `_type`, and their `_explanation` has exactly the same shape, with the document's rating in place of 9. All nine hits, in the order returned:
_id | name | category | brand | rating | prize | piecesSold | dateOfRelease | _score |
---|---|---|---|---|---|---|---|---|
nKk9DfRnTDyUU80cepXRrw | MacBookPro | Laptop | Apple | 9 | 1299 | 9500 | 2005-02-01 | 9 |
KNycQwC5TcSmhXPBdKgW3g | Ipad | Tablet | Apple | 9 | 600 | 9500 | 2005-07-01 | 9 |
IfTr4n90Tbez-t26Iu6JCg | MacBookAir | Laptop | Apple | 8 | 1099 | 8700 | 2006-05-01 | 8 |
90hw7WyKSu2X0YAk3NBTMQ | ATIVBook | Laptop | Samsung | 8 | 1899 | 3500 | 2014-05-01 | 8 |
xquXInoJSSOnPwuzrOTO8A | GalaxyTab | Tablet | Samsung | 8 | 550 | 8500 | 2007-07-01 | 8 |
qnSLIKIWTsyjRdM0dNcrWg | Iphone | Mobile | Apple | 8 | 60 | 28000 | 2002-03-01 | 8 |
uq5kDPYlTQC6mRBJDtS2lQ | Xperia | Mobile | Sony | 8 | 70 | 24000 | 2004-03-01 | 8 |
YdtFFxICR-6nMxNcQmWoaQ | Inspiron | Laptop | Dell | 6 | 700 | 4600 | 2008-03-01 | 6 |
MjSPJ9hqTbu8U6PfyMrl4A | Lumia | Mobile | Nokia | 6 | 50 | 12000 | 2009-03-01 | 6 |
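Suppose you decide that the age of a product should also count in the score: newer products should perhaps get a higher weight than older ones, and this should be combined with the popularity factor above. We could write: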
POST /ecomercedata/gadgets/_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "rating"
          }
        },
        {
          "field_value_factor": {
            "field": "dateOfRelease"
          }
        }
      ],
      "boost_mode": "replace",
      "score_mode": "multiply"
    }
  }
}
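With this change the score rises to a very large value. There are several ways to avoid this, and we should take care to keep weights under control when assigning them. You can also compute the score with a custom script, which we will discuss in the next post.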
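One way to keep such a score in check is to dampen the large field before multiplying. The Python sketch below models the arithmetic of `field_value_factor` with and without a `log1p` modifier, following the documented behavior that the modifier is applied to the field value times `factor`. It is a sketch under assumptions: the date is assumed to be read as epoch milliseconds, the helper names are made up, and the sample values come from the first hit shown earlier.

```python
import math
from datetime import datetime, timezone


def field_value_factor(value: float, factor: float = 1.0,
                       modifier: str = "none") -> float:
    """Model of field_value_factor: modifier applied to factor * value."""
    scaled = factor * value
    if modifier == "none":
        return scaled
    if modifier == "log1p":
        return math.log10(scaled + 1.0)
    if modifier == "sqrt":
        return math.sqrt(scaled)
    raise ValueError(f"unknown modifier: {modifier}")


def epoch_millis(date_text: str) -> float:
    """Assumption: the date field is exposed to the function as epoch milliseconds."""
    parsed = datetime.strptime(date_text, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return parsed.timestamp() * 1000.0


rating = 9.0
released = epoch_millis("2005-02-01")

# score_mode "multiply": rating times the release-date value.
raw = field_value_factor(rating) * field_value_factor(released)
tamed = field_value_factor(rating) * field_value_factor(released, modifier="log1p")

print(f"without modifier: {raw:.3e}")
print(f"with log1p:       {tamed:.3f}")
```

Note that a logarithm over raw timestamps barely separates old products from new ones, so for recency a decay function on the date field is usually the better fit.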