Elasticsearch (Part 5): Mapping and Common Field Types

Date: 2023-03-08 23:37:41

Course followed: 《Elasticsearch核心技术与实战》 (Elasticsearch Core Technology and Practice)

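## What is a Mapping
* A Mapping is similar to a schema definition in a database. It does the following:
  - defines the names of the fields in an index;
  - defines each field's data type, such as string, number, date, or boolean;
  - configures how each field is handled by the inverted index (analyzed or not analyzed, and which analyzer to use).
* A Mapping maps JSON documents into the flat format that Lucene requires.
* A Mapping belongs to a Type of an index:
  - every document belongs to a Type;
  - each Type has one Mapping definition;
  - starting with 7.0, type information no longer needs to be specified in the Mapping definition.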

## Field data types
* Simple types
  - Text
  - Date
  - Integer / Long / floating-point
  - Boolean
  - IPv4 & IPv6
  - Keyword
* Complex types
  - Object type
  - Nested type
* Special types
  - geo_point & geo_shape (geographic data), percolator

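## What is Dynamic Mapping
* When a document is written and the index does not exist yet, the index is created automatically.
* With Dynamic Mapping you do not have to define a Mapping by hand: Elasticsearch infers each field's type from the document's content.
* The inference is sometimes wrong, for example for geographic location data.
* When a type is set incorrectly, some features stop working properly, such as Range queries.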

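## Automatic type detection
JSON type|Elasticsearch type
---|---
String|Date if it matches a date format; Float or Long if it matches a number (this option is off by default); otherwise Text with a keyword sub-field
Boolean|Boolean
Floating-point number|Float
Integer|Long
Object|Object
Array|Determined by the type of the first non-null value
Null|Ignored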
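
The rules in the table can be sketched in a few lines of Python. This is only a simplified model for illustration, not how Elasticsearch implements detection: the function name `infer_type`, the `numeric_detection` flag, and the date regular expression (a rough stand-in for the default date formats) are all assumptions.

```python
import re

# Rough stand-in for the default date detection (assumption: the real
# formats are configurable and broader than this pattern).
ISO_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?)?$"
)
INTEGER = re.compile(r"^-?\d+$")
DECIMAL = re.compile(r"^-?\d+\.\d+$")

TEXT_WITH_KEYWORD = "text + keyword sub-field"


def infer_type(value, numeric_detection=False):
    """Return the field type the table predicts for a parsed JSON value."""
    if value is None:
        return None  # null values are ignored
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first non-null element.
        for item in value:
            if item is not None:
                return infer_type(item, numeric_detection)
        return None
    if isinstance(value, str):
        if ISO_DATE.match(value):
            return "date"
        if numeric_detection:  # off by default
            if INTEGER.match(value):
                return "long"
            if DECIMAL.match(value):
                return "float"
        return TEXT_WITH_KEYWORD
    raise TypeError(f"not a JSON value: {value!r}")


# Illustrative sample document (values chosen to exercise several rows).
doc = {
    "firstName": "Chan",
    "loginDate": "2018-07-24T10:29:48.103Z",
    "uid": "123",
    "isVip": False,
    "isAdmin": "true",
    "age": 19,
}
inferred = {field: infer_type(value) for field, value in doc.items()}
for field, field_type in inferred.items():
    print(f"{field}: {field_type}")
```

Note that `"uid": "123"` and `"isAdmin": "true"` stay text in this model, because string-to-number detection is off by default and there is no string-to-boolean rule in the table.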

```
# Write a document, then view its Mapping
PUT mapping_test/_doc/1
{
  "firstName": "Chan",
  "loginDate": "2018-07-24T10:29:48.103Z",
  "uid": "123",
  "isVip": false,
  "isAdmin": "true",
  "age": 19,
  "heigh": 180
}

# View the dynamically generated Mapping
GET mapping_test/_mapping

# Delete the index when finished
DELETE mapping_test
```

Result returned for the dynamic Mapping:

```
{
  "mapping_test" : {
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"      # "age": 19 is mapped to long
        },
        "firstName" : {
          "type" : "text",     # "firstName": "Chan" is mapped to text, with a keyword sub-field
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "heigh" : {
          "type" : "long"      # "heigh": 180 is mapped to long
        },
        "isAdmin" : {
          "type" : "text",     # "isAdmin": "true" is a string, so it is mapped to text with a keyword sub-field
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "isVip" : {
          "type" : "boolean"   # "isVip": false is mapped to boolean
        },
        "loginDate" : {
          "type" : "date"      # "loginDate": "2018-07-24T10:29:48.103Z" is mapped to date
        },
        "uid" : {
          "type" : "text",     # "uid": "123" is mapped to text with a keyword sub-field; string-to-number detection is off by default
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
```

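## Can a Mapping's field type be changed
There are two cases:
* Adding new fields
  - When dynamic is set to true, the Mapping is updated as soon as a document containing a new field is written.
  - When dynamic is set to false, the Mapping is not updated. The new field's data cannot be indexed, but it still appears in `_source`.
  - When dynamic is set to strict, the document write fails.
* Existing fields: once data has been written, the field definition can no longer be modified
  - The inverted index built by Lucene cannot be modified once it has been generated.
  - To change a field's type, you must rebuild the index with the Reindex API.
  - If a field's data type were changed in place, the data already indexed could no longer be searched.
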
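
As a minimal sketch of the rebuild step, the Python below only assembles the two requests involved: creating a new index with the corrected Mapping, then copying documents with the Reindex API. It does not contact a cluster. The helper `reindex_requests` and the index name `users_v2` are hypothetical, chosen for illustration.

```python
import json


def reindex_requests(old_index, new_index, new_properties):
    """Return (method, path, body) tuples for rebuilding an index."""
    create_index = (
        "PUT",
        f"/{new_index}",
        {"mappings": {"properties": new_properties}},
    )
    copy_documents = (
        "POST",
        "/_reindex",
        {"source": {"index": old_index}, "dest": {"index": new_index}},
    )
    return [create_index, copy_documents]


# Hypothetical example: rebuild "users" so that "mobile" becomes a keyword.
requests_to_send = reindex_requests(
    "users", "users_v2", {"mobile": {"type": "keyword"}}
)
for method, path, body in requests_to_send:
    print(method, path)
    print(json.dumps(body, indent=2))
```

The printed bodies can be pasted into a console or sent with any HTTP client.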
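## Controlling Dynamic Mappings
dynamic|true|false|strict
---|---|---|---
Document can be indexed|YES|YES|NO
New field can be indexed|YES|NO|NO
Mapping is updated|YES|NO|NO

* When dynamic is set to false and a document containing a new field is written, the document itself is indexed, but the new field is ignored: it is not indexed and cannot be searched.
* When dynamic is set to strict, the write fails with an error.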
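
The three modes in the table can be modeled as a small Python function. This is a toy model for illustration only: `write_document` is a hypothetical helper, the Mapping is reduced to a set of field names, and the mode is passed as the strings `"true"`, `"false"`, and `"strict"`.

```python
def write_document(mapped_fields, doc, dynamic="true"):
    """Model one write against an index whose Mapping has `mapped_fields`.

    Returns (searchable_fields, source, mapped_fields_after).
    Raises ValueError in strict mode when the document has unmapped fields.
    """
    new_fields = [name for name in doc if name not in mapped_fields]
    if new_fields and dynamic == "strict":
        raise ValueError(f"strict mapping rejects new fields: {new_fields}")

    mapped_after = set(mapped_fields)
    if dynamic == "true":
        mapped_after.update(new_fields)  # the Mapping grows with the document

    # Only mapped fields are indexed; _source always keeps the full document.
    searchable = {name for name in doc if name in mapped_after}
    return searchable, dict(doc), mapped_after


mapped = {"newField"}
doc = {"anotherField": "someValue"}
for mode in ("true", "false", "strict"):
    try:
        searchable, source, mapped_after = write_document(mapped, doc, mode)
        print(mode, sorted(searchable), source, sorted(mapped_after))
    except ValueError as error:
        print(mode, "write failed:", error)
```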

1. By default a Mapping is dynamic. Write a document that contains a new field:

```
PUT dynamic_mapping_test/_doc/1
{
  "newField": "someValue"
}
```

2. The field can be searched, and its data also appears in `_source`:

```
POST dynamic_mapping_test/_search
{
  "query": {
    "match": {
      "newField": "someValue"
    }
  }
}
```

Result:

```
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "dynamic_mapping_test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "newField" : "someValue"
        }
      }
    ]
  }
}
```

3. Change dynamic to false:

```
PUT dynamic_mapping_test/_mapping
{
  "dynamic": false
}
```

4. Add a document with a new field, anotherField:

```
PUT dynamic_mapping_test/_doc/10
{
  "anotherField": "someValue"
}
```

5. This field cannot be searched, because dynamic has been set to false:

```
POST dynamic_mapping_test/_search
{
  "query": {
    "match": {
      "anotherField": "someValue"
    }
  }
}
```

Result:

```
{
  "took" : 657,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
```

6. Change dynamic to strict:

```
PUT dynamic_mapping_test/_mapping
{
  "dynamic": "strict"
}
```

7. Writing data now fails with HTTP code 400:

```
PUT dynamic_mapping_test/_doc/12
{
  "lastField": "value"
}
```

Result:

```
{
  "error": {
    "root_cause": [
      {
        "type": "strict_dynamic_mapping_exception",
        "reason": "mapping set to strict, dynamic introduction of [lastField] within [_doc] is not allowed"
      }
    ],
    "type": "strict_dynamic_mapping_exception",
    "reason": "mapping set to strict, dynamic introduction of [lastField] within [_doc] is not allowed"
  },
  "status": 400
}
```

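## How to define a Mapping

```
PUT index_name
{
  "mappings": {
    "properties": {
      // define your mappings here
    }
  }
}
```

* You can consult the API reference and write the Mapping entirely by hand.
* To reduce typing and the chance of mistakes, you can follow these steps instead:
  - create a temporary index and write some sample data into it;
  - call the Mapping API to get the temporary index's dynamic Mapping definition;
  - modify that definition and use it to create your real index;
  - delete the temporary index.
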
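
The "modify, then create" step can be sketched in Python. The helper below takes a response shaped like the output of `GET <index>/_mapping` and produces a body shaped like the `PUT index_name` request above. `to_create_body` and the `overrides` parameter are hypothetical names, and the sample response is a trimmed, illustrative one.

```python
import copy
import json


def to_create_body(mapping_response, temp_index, overrides):
    """Turn a GET <temp_index>/_mapping response into a create-index body.

    `overrides` maps a field name to the definition that should replace
    the dynamically generated one.
    """
    # Copy first so the original response is left untouched.
    mappings = copy.deepcopy(mapping_response[temp_index]["mappings"])
    for field, definition in overrides.items():
        mappings["properties"][field] = definition
    return {"mappings": mappings}


# Trimmed sample of a dynamic Mapping response (illustrative values).
dynamic_mapping = {
    "mapping_test": {
        "mappings": {
            "properties": {
                "age": {"type": "long"},
                "uid": {
                    "type": "text",
                    "fields": {
                        "keyword": {"type": "keyword", "ignore_above": 256}
                    },
                },
            }
        }
    }
}

# Example tweak: store "uid" as a plain keyword instead of text + keyword.
body = to_create_body(dynamic_mapping, "mapping_test", {"uid": {"type": "keyword"}})
print(json.dumps(body, indent=2))
```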
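## Some Mapping configuration options
* `index` controls whether the field is indexed. It defaults to `true`. If it is set to `false`, the field cannot be searched.
* `index_options` controls what the inverted index records. There are four levels:
  - `docs` records the doc id
  - `freqs` records the doc id and term frequencies
  - `positions` records the doc id, term frequencies, and term positions
  - `offsets` records the doc id, term frequencies, term positions, and character offsets
* Text fields record `positions` by default; other types default to `docs`. The more that is recorded, the more storage is used.
* `null_value` makes null values searchable by indexing a substitute value in their place. The Text type does not support `null_value`; Keyword does, as do other exact-value types such as numeric, date, boolean, and ip.
* `copy_to` serves certain specific search needs. It copies a field's values into a target field, which gives an effect similar to `_all`. `_all` is no longer available in ES 7, and `copy_to` takes its place. The target field of `copy_to` does not appear in `_source`.
* Elasticsearch has no dedicated array type. Any field can hold multiple values of the same type.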
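
To make the four `index_options` levels concrete, here is a toy inverted index in Python that records progressively more per term. It is only a sketch: the `postings` helper is hypothetical, tokenization is a plain lowercase word split rather than a real analyzer, and the data layout has nothing to do with Lucene's on-disk format.

```python
import re

LEVELS = ["docs", "freqs", "positions", "offsets"]


def postings(docs, term, index_options="positions"):
    """Return what a toy inverted index would record for `term`.

    `docs` maps doc id -> text. Each level records everything the
    previous level does, plus one more piece of information.
    """
    level = LEVELS.index(index_options)  # ValueError on an unknown option
    result = {}
    for doc_id, text in docs.items():
        tokens = [
            (match.group().lower(), match.start(), match.end())
            for match in re.finditer(r"\w+", text)
        ]
        hits = [
            (position, start, end)
            for position, (token, start, end) in enumerate(tokens)
            if token == term
        ]
        if not hits:
            continue
        entry = {}  # "docs": the doc id alone (the dictionary key)
        if level >= 1:
            entry["freq"] = len(hits)
        if level >= 2:
            entry["positions"] = [position for position, _, _ in hits]
        if level >= 3:
            entry["offsets"] = [(start, end) for _, start, end in hits]
        result[doc_id] = entry
    return result


docs = {
    1: "Ruan Yiming",
    2: "Yiming likes music, Yiming likes reading",
}
for option in LEVELS:
    print(option, postings(docs, "yiming", option))
```

Each level stores strictly more per matching document, which is why richer settings cost more storage.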

Set `index` to false

```
DELETE users

PUT users
{
  "mappings" : {
    "properties" : {
      "firstName" : {
        "type" : "text"
      },
      "lastName" : {
        "type" : "text"
      },
      "mobile" : {
        "type" : "text",
        "index": false
      }
    }
  }
}
```

Insert data

```
PUT users/_doc/1
{
  "firstName": "Ruan",
  "lastName": "Yiming",
  "mobile": "12345678"
}
```

Query

```
# The mobile field cannot be searched
POST /users/_search
{
  "query": {
    "match": {
      "mobile": "12345678"
    }
  }
}
```

Query result:

```
{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "failed to create query: {\n \"match\" : {\n \"mobile\" : {\n \"query\" : \"12345678\",\n \"operator\" : \"OR\",\n \"prefix_length\" : 0,\n \"max_expansions\" : 50,\n \"fuzzy_transpositions\" : true,\n \"lenient\" : false,\n \"zero_terms_query\" : \"NONE\",\n \"auto_generate_synonyms_phrase_query\" : true,\n \"boost\" : 1.0\n }\n }\n}",
        "index_uuid": "1oB9dwY2TPq-9QjiaMaU7g",
        "index": "users"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "users",
        "node": "u-4S1mfbQiuA1Bqe-wfPJQ",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: {\n \"match\" : {\n \"mobile\" : {\n \"query\" : \"12345678\",\n \"operator\" : \"OR\",\n \"prefix_length\" : 0,\n \"max_expansions\" : 50,\n \"fuzzy_transpositions\" : true,\n \"lenient\" : false,\n \"zero_terms_query\" : \"NONE\",\n \"auto_generate_synonyms_phrase_query\" : true,\n \"boost\" : 1.0\n }\n }\n}",
          "index_uuid": "1oB9dwY2TPq-9QjiaMaU7g",
          "index": "users",
          "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Cannot search on field [mobile] since it is not indexed."   # the cause of the error
          }
        }
      }
    ]
  },
  "status": 400
}
```

Set `null_value`

```
DELETE users

PUT users
{
  "mappings" : {
    "properties" : {
      "firstName" : {
        "type" : "text"
      },
      "lastName" : {
        "type" : "text"
      },
      "mobile" : {
        "type" : "keyword",
        "null_value": "NULL"
      }
    }
  }
}
```

Insert data

```
PUT users/_doc/1
{
  "firstName": "Ruan",
  "lastName": "Yiming",
  "mobile": null
}
```

Insert data

```
PUT users/_doc/2
{
  "firstName": "Ruan2",
  "lastName": "Yiming2"
}
```

Query

```
GET users/_search
{
  "query": {
    "match": {
      "mobile": "NULL"
    }
  }
}
```

Query result. Only document 1 matches: its explicit null is indexed as `NULL`, while document 2 has no `mobile` field at all.

```
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "users",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "firstName" : "Ruan",
          "lastName" : "Yiming",
          "mobile" : null
        }
      }
    ]
  }
}
```

Set `copy_to`

```
DELETE users

PUT users
{
  "mappings": {
    "properties": {
      "firstName": {
        "type": "text",
        "copy_to": "fullName"
      },
      "lastName": {
        "type": "text",
        "copy_to": "fullName"
      }
    }
  }
}
```

Insert data

```
PUT users/_doc/1
{
  "firstName": "Ruan",
  "lastName": "Yiming"
}
```

Query, method 1

```
GET users/_search?q=fullName:(Ruan Yiming)
```

Query, method 2

```
POST users/_search
{
  "query": {
    "match": {
      "fullName": {
        "query": "Ruan Yiming",
        "operator": "and"
      }
    }
  }
}
```

Query result (note that `fullName` does not appear in `_source`):

```
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.5753642,
    "hits" : [
      {
        "_index" : "users",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "firstName" : "Ruan",
          "lastName" : "Yiming"
        }
      }
    ]
  }
}
```

Array type

```
PUT users/_doc/1
{
  "name": "twobirds",
  "interests": ["reading", "music"]
}

GET users/_mapping
```

Mapping returned:

```
{
  "users" : {
    "mappings" : {
      "properties" : {
        "firstName" : {
          "type" : "text",
          "copy_to" : [
            "fullName"
          ]
        },
        "fullName" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "interests" : {
          "type" : "text",     # array: the type follows the type of the values inside the array
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "lastName" : {
          "type" : "text",
          "copy_to" : [
            "fullName"
          ]
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
```