[Elastic Knowledge Brief] The difference between normalizer and analyzer

Time: 2023-02-03 10:00:11


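1. The difference

A normalizer plays a role similar to an analyzer: both process a field's value before it is indexed. The difference is that a normalizer does not split the value into tokens. In other words, a normalizer has no tokenizer and always emits a single token.

That is why a normalizer applies to keyword fields: it is the tool to use when a keyword field needs some extra processing, such as being converted to lowercase.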

2. Can field types other than keyword set a normalizer?

No.
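
As a quick check, the request below tries to put a normalizer on a text field. The index name test_text_normalizer and the field name body are made up for this illustration, and my_normalizer is defined here the same way as in the experiment in section 4. Elasticsearch should reject the mapping with a mapping error, although the exact wording of the message depends on the version.

```console
# Hypothetical index and field names, for illustration only
PUT test_text_normalizer
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "normalizer": "my_normalizer"
      }
    }
  }
}
```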

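3. If an analyzer is set on a keyword field, will the field be tokenized?

No. In fact, a keyword field cannot take an analyzer at all: the type has no such parameter, and trying to set one fails with an error right away.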
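
A minimal way to reproduce this, assuming a throwaway index: the index name test_keyword_analyzer and the field name tag are hypothetical and chosen only for this sketch. The request is expected to fail at mapping time. The exact error text varies between Elasticsearch versions, so it is not quoted here.

```console
# Hypothetical index and field names, for illustration only
PUT test_keyword_analyzer
{
  "mappings": {
    "properties": {
      "tag": {
        "type": "keyword",
        "analyzer": "standard"
      }
    }
  }
}
```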

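4. The normalizer is also applied to the query text

When you query a keyword field that has a normalizer configured, the normalizer is applied to the query text as well.

The experiment below demonstrates this, and the setup also shows how similarly a normalizer and an analyzer are configured:

Set up the mappings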

PUT test1
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer",
        "fields": {
          "keyword": {
            "type": "keyword",
            "normalizer": "my_normalizer"
          }
        }
      },
      "title": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "filter": ["lowercase"],
          "char_filter": []
        }
      },
      "analyzer": {
        "my_analyzer": {
          "filter": ["lowercase"],
          "tokenizer": "standard"
        }
      }
    }
  }
}
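
Once the index exists, the difference between the two can be inspected directly with the _analyze API, which accepts either an analyzer or a normalizer name. This is a small sketch against the test1 index defined above. The sample text is simply one of the values used later in the experiment.

```console
# Analyzer: standard tokenizer + lowercase, so the text is split into words
GET test1/_analyze
{
  "analyzer": "my_analyzer",
  "text": "THIS is GOOD NEWS"
}

# Normalizer: lowercase only, no tokenizer, so the text stays one token
GET test1/_analyze
{
  "normalizer": "my_normalizer",
  "text": "THIS is GOOD NEWS"
}
```

The first request should return four tokens (this, is, good, news), while the second should return the single token this is good news.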

Index the data

POST test1/_bulk
{"index":{}}
{"name": "THIS is GOOD NEWS","title":"GOOD NEWS"}
{"index":{}}
{"name": "this is good news","title":"good news"}

Query

GET test1/_search
{
  "query": {
    "match": {
      "name.keyword": "THIS IS GOOD NEWS"
    }
  }
}

Result:
Both documents are returned. Because name.keyword has a normalizer (lowercase), the name.keyword values were lowercased at index time, and the query text is lowercased at search time as well. With both sides in lowercase, both documents match.

"hits" : [
  {
    "_index" : "test1",
    "_type" : "_doc",
    "_id" : "--IF9n0BcmNQdWdLpMWX",
    "_score" : 0.18232156,
    "_source" : {
      "name" : "THIS is GOOD NEWS",
      "title" : "GOOD NEWS"
    }
  },
  {
    "_index" : "test1",
    "_type" : "_doc",
    "_id" : "_OIF9n0BcmNQdWdLpMWX",
    "_score" : 0.18232156,
    "_source" : {
      "name" : "this is good news",
      "title" : "good news"
    }
  }
]
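
The Elasticsearch documentation for the normalizer parameter describes it as applying to term-level queries as well as to match queries. As a sketch, the same check can be repeated with a term query against the test1 index. If the normalizer is applied to the term as described, this should again return both documents.

```console
# Same search as above, expressed as a term-level query
GET test1/_search
{
  "query": {
    "term": {
      "name.keyword": "THIS IS GOOD NEWS"
    }
  }
}
```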

Query 2

GET test1/_search
{
  "query": {
    "match": {
      "title.keyword": "good news"
    }
  }
}

Result:
title.keyword has no normalizer, so only the document with the lowercase title is returned.

"hits" : [
  {
    "_index" : "test1",
    "_type" : "_doc",
    "_id" : "_OIF9n0BcmNQdWdLpMWX",
    "_score" : 0.6931471,
    "_source" : {
      "name" : "this is good news",
      "title" : "good news"
    }
  }
]
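
For contrast, the mirror-image query can be run as well. Since title.keyword stores each value exactly as it was indexed and leaves the query text untouched, searching for the uppercase form should match only the document whose title is GOOD NEWS. This is a sketch against the same test1 index and data.

```console
# Without a normalizer, only an exact, case-sensitive match is found
GET test1/_search
{
  "query": {
    "match": {
      "title.keyword": "GOOD NEWS"
    }
  }
}
```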