Elasticsearch Notes

1. Introduction
2. Concepts and tools
- 2.1 Basic concepts
- 2.2 Using Kibana
3. Working with indices and data
4. Search
5. Aggregations

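1. Introduction

Elasticsearch is a distributed full-text search engine. A typical use is searching product information in an online store.

This article uses three tools, and their version numbers must match. I used version 6.8 for all of them.

Elasticsearch itself

Linux download: https://www.elastic.co/cn/downloads/past-releases/elasticsearch-6-8-0

Kibana

A powerful visualization tool built on Node.js. It can send REST requests to Elasticsearch and visualize search data.

Download (Windows): https://www.elastic.co/cn/downloads/past-releases/kibana-6-8-0

IK analyzer

A Chinese word-segmentation plugin. It can split "我是中国人" into ["我","是","中国人","中国","国人"], that is, break a passage into individual words, which are then matched against the words of the user's search phrase.

Download: https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v6.8.0

I won't go through the installation. One thing to note: by default ES serves HTTP clients on port 9200 and uses port 9300 for communication between cluster nodes, so after installing, open the corresponding ports in the Linux firewall.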

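2. Concepts and tools

2.1 Basic concepts

Ruan Yifeng's blog explains these well: http://www.ruanyifeng.com/blog/2017/08/elasticsearch.html

My own understanding

ES storage structure: an index contains many documents, each document is one record, and a type groups documents logically.

Searching is the process of querying documents according to filter conditions.

I don't fully understand the concept of types yet and will come back to it after some practice. Note that types are deprecated from ES 7 onward.

Besides Node, Cluster, Index, Document, and Type, there are two more concepts: shard and replica.

Sharding splits the whole data set into several pieces so that load can be spread out when the data volume is large.

A replica is a backup copy of a shard; if data is lost unexpectedly, the copy can still be used.

The original figure (not reproduced here) shows 3 shards, each with 1 replica.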
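As a quick worked example of the shard and replica arithmetic described above, here is a minimal Python sketch. The function name is my own and is not part of any ES API; it only counts how many shard copies a given setting produces.

```python
def total_shard_copies(primary_shards: int, replicas_per_shard: int) -> int:
    """Each primary shard exists once, plus one extra copy per replica."""
    return primary_shards * (1 + replicas_per_shard)


# 3 shards with 1 replica each: 3 primaries + 3 replica copies
print(total_shard_copies(3, 1))
```

With 3 shards and 1 replica each, the cluster has to hold 6 shard copies in total.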

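2.2 Using Kibana

All ES APIs are REST-style; requests and responses are both JSON.

Kibana provides Dev Tools, which makes it easy to send requests and inspect responses, with autocompletion (screenshot not reproduced here).

All the following demonstrations are run in Kibana.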

3. Working with indices and data

2.3 Indices

2.3.1 Creating an index

Use a PUT request to create an index named test_index with 3 shards and 2 replicas.

PUT test_index

{

  "settings": {

    "number_of_shards": 3,

    "number_of_replicas": 2

  }

}

2.3.2 Viewing index settings

Send GET followed by the index name:

GET test_index

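The response (screenshot not reproduced here) includes the creation time, the shard and replica settings, and so on.

2.3.3 Deleting an index

Send DELETE followed by the index name: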

DELETE test_index

2.4 Mapping fields for documents

2.4.1 Creating a mapping

Syntax

PUT /index_name/_mapping/type_name

{

  "properties": {

    "field_name": {

      "type": "field type",

      "index": true,

      "store": true,

      "analyzer": "analyzer name"

    }

  }

}

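type_name: groups documents logically.
field_name: the name of a field in the document, such as title or price.
type: the field's data type, such as text, long, integer, or object.
index: whether the field is indexed; defaults to true.
store: whether the field is stored separately; defaults to false.
analyzer: the analyzer to use for tokenizing the field.

Example: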

PUT test_index/_mapping/goods

{

  "properties": {

    "title": {

      "type": "text",

      "analyzer": "ik_max_word"

    },

    "images": {

      "type": "keyword",

      "index": false

    },

    "price": {

      "type": "float"

    }

  }

}

2.4.2 Viewing the mapping

Request

GET /test_index/_mapping

Response

{

  "test_index" : {

    "mappings" : {

      "goods" : {

        "properties" : {

          "images" : {

            "type" : "keyword",

            "index" : false

          },

          "price" : {

            "type" : "float"

          },

          "title" : {

            "type" : "text",

            "analyzer" : "ik_max_word"

          }

        }

      }

    }

  }

}

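2.4.3 Field attributes in detail

2.4.3.1 type

String types, of which there are two:
- text: analyzed (tokenized); cannot be used in aggregations by default
- keyword: not analyzed; the value is matched as a whole and can be used in aggregations
Numeric types, in two groups:
- basic types: long, integer, short, byte, double, float, half_float
- high-precision floating-point type: scaled_float
  - requires a scaling factor such as 10 or 100. ES multiplies the real value by this factor before storing it and divides it back out on retrieval.
Date type

ES can store dates as formatted strings, but storing them as millisecond timestamps in a long field is recommended because it saves space.

Object fields

For example, {girl:{name:"rose", age:21}} is flattened into two fields: girl.name and girl.age.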
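Two of the points above, the scaled_float scaling factor and the flattening of object fields, can be sketched in plain Python. This is only an illustration under my own assumptions: the function names are hypothetical, Python's round() stands in for whatever rounding ES applies internally, and nothing here calls ES.

```python
def to_scaled(value: float, scaling_factor: int) -> int:
    # Multiply by the factor and round to an integer for storage.
    return round(value * scaling_factor)


def from_scaled(stored: int, scaling_factor: int) -> float:
    # Divide the factor back out when reading the value.
    return stored / scaling_factor


def flatten(obj: dict, prefix: str = "") -> dict:
    # Turn nested objects into dotted field names such as "girl.name".
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


stored = to_scaled(26.99, 100)
print(stored, from_scaled(stored, 100))
print(flatten({"girl": {"name": "rose", "age": 21}}))
```

The first line of output shows the integer that would be stored and the value read back; the second shows the two dotted field names.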

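2.4.3.2 index

index controls whether a field is indexed.

true: the field is indexed and can be searched. This is the default.
false: the field is not indexed and cannot be searched.

Because index defaults to true, every field is indexed unless you configure otherwise.

Some fields should not be indexed, such as a product's image URL; for those, set index to false explicitly.

A fair question: what is a non-searchable field doing in ES at all?

The answer is that ES is also acting as the data store here: the field still comes back in _source, so the search results can be used directly without a second lookup.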

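2.4.3.3 store

When Elasticsearch indexes a document, it keeps a copy of the original data in a field called _source, and we can filter _source to choose which fields to return.

Setting store to true stores an additional copy of the field outside _source, which is usually redundant, so store is generally left as false. In fact, false is the default.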

2.5 Adding data

The format is as follows. If no ID is given, ES generates a random one.

POST /index/type/id

{

    document body

}

Example

POST /test_index/goods/1

{

    "title":"小米手机",

    "images":"http://image.leyou.com/12479122.jpg",

    "price":2699.00

}

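2.5.1 Dynamically added fields

If the data being added contains fields that are not defined in the mapping, ES adds them to the mapping automatically.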

POST /test_index/goods/100

{

    "title":"超米手机",

    "images":"http://image.leyou.com/12479122.jpg",

    "price":2899.00,

    "stock": 200,

    "saleable":true

}

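Querying after the insert (screenshots not reproduced here) shows the following:

The new fields do not affect the _source of documents that already exist,

but the index mapping does change.

A document's _source contains exactly the fields that were stored with it.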

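2.6 Modifying data

Send the request with PUT and specify the ID to modify a document. The request body replaces the whole document.

If a document with that ID exists, it is replaced.
If no document with that ID exists, it is created.

Example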

PUT /test_index/goods/100

{

    "title":"超级大米手机",

    "images":"http://image.leyou.com/12479122.jpg",

    "price":2899.00,

    "stock": 200

}

2.7 Deleting data

Syntax

DELETE /index_name/type_name/id

Example

DELETE /test_index/goods/3

4. Search

First, index some data:

POST /test_index/goods/1

{

    "title":"小米手机",

    "images":"http://image.leyou.com/12479122.jpg",

    "price":2699.00

}

POST /test_index/goods/2

{

    "title":"大米手机",

    "images":"http://image.leyou.com/12479122.jpg",

    "price":2899.00

}

PUT /test_index/goods/3

{

    "title":"小米电视4A",

    "images":"http://image.leyou.com/12479122.jpg",

    "price":3899.00

}

Now for the main event: queries of every kind.

4.1 Queries

4.1.1 match_all (match everything)

GET /test_index/_search

{

    "query":{

        "match_all": {}

    }

}

Result

{

  "took" : 0,

  "timed_out" : false,

  "_shards" : {

    "total" : 3,

    "successful" : 3,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 3,

    "max_score" : 1.0,

    "hits" : [

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "2",

        "_score" : 1.0,

        "_source" : {

          "title" : "大米手机",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 2899.0

        }

      },

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "1",

        "_score" : 1.0,

        "_source" : {

          "title" : "小米手机",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 2699.0

        }

      },

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "3",

        "_score" : 1.0,

        "_source" : {

          "title" : "小米电视4A",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 3899.0

        }

      }

    ]

  }

}

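Note the _score in the results: it is the document's relevance score, and a higher score means a closer match to the search conditions.

4.1.2 match (full-text match)

Single-field query: OR

The query string 小米电视 is split into the two terms 小米 and 电视, which are searched separately; multiple terms are combined with OR.

This is roughly equivalent to title like '%小米%' or title like '%电视%'

The more terms a document matches, the higher its score and the earlier it appears in the results.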

GET /test_index/_search

{

    "query":{

        "match":{

            "title":"小米电视"

        }

    }

}

Result

{

  "took" : 0,

  "timed_out" : false,

  "_shards" : {

    "total" : 3,

    "successful" : 3,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 2,

    "max_score" : 0.77041245,

    "hits" : [

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "3",

        "_score" : 0.77041245,

        "_source" : {

          "title" : "小米电视4A",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 3899.0

        }

      },

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "1",

        "_score" : 0.21110918,

        "_source" : {

          "title" : "小米手机",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 2699.0

        }

      }

    ]

  }

}

Single-field query: AND

The two terms 小米 and 电视 are combined with AND, roughly equivalent to title like '%小米%' and title like '%电视%'

GET /test_index/_search

{

    "query":{

        "match": {

          "title": {

            "query": "小米电视",

            "operator": "and"

          }

        }

    }

}

Result

{

  "took" : 1,

  "timed_out" : false,

  "_shards" : {

    "total" : 3,

    "successful" : 3,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 1,

    "max_score" : 0.77041245,

    "hits" : [

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "3",

        "_score" : 0.77041245,

        "_source" : {

          "title" : "小米电视4A",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 3899.0

        }

      }

    ]

  }

}
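The difference between the OR and AND forms of match can be sketched without a running cluster. In the Python below, the token lists are written by hand and merely stand in for analyzer output (an assumption, not actual IK results), and scoring is ignored entirely; only the matching rule is shown.

```python
# Hand-written tokens standing in for analyzer output (an assumption,
# not actual IK results). Keys are document IDs.
DOCS = {
    1: ["小米", "手机"],
    2: ["大米", "手机"],
    3: ["小米", "电视", "4a"],
}


def match(query_terms, operator="or"):
    # "or": any query term may appear; "and": every query term must appear.
    combine = all if operator == "and" else any
    return [doc_id for doc_id, terms in DOCS.items()
            if combine(term in terms for term in query_terms)]


print(match(["小米", "电视"]))          # OR: documents 1 and 3
print(match(["小米", "电视"], "and"))   # AND: document 3 only
```

This agrees with the two responses above: the OR query returns documents 3 and 1, the AND query only document 3.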

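Single-field query: minimum match

With ik_max_word, "小米曲面电视" is split into three terms: 小米, 曲面, and 电视.

To return documents that match at least two of these terms, set the minimum match to at least 2/3.

In my tests, 67% returns only "小米电视4A", while 66% returns both "小米电视4A" and "小米手机".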

GET /test_index/_search

{

    "query":{

        "match":{

            "title":{

            	"query":"小米曲面电视",

            	"minimum_should_match": "67%"

            }

        }

    }

}

Result

{

  "took" : 1,

  "timed_out" : false,

  "_shards" : {

    "total" : 3,

    "successful" : 3,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 1,

    "max_score" : 0.77041245,

    "hits" : [

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "3",

        "_score" : 0.77041245,

        "_source" : {

          "title" : "小米电视4A",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 3899.0

        }

      }

    ]

  }

}
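Why 67% and 66% behave differently comes down to rounding. The sketch below assumes that the number computed from a percentage is rounded down, which is how the Elasticsearch documentation describes minimum_should_match; the function name is hypothetical.

```python
def required_matches(term_count: int, percent: int) -> int:
    # Integer division rounds the computed number of terms down.
    return (term_count * percent) // 100


print(required_matches(3, 67))  # 3 * 0.67 = 2.01, rounded down
print(required_matches(3, 66))  # 3 * 0.66 = 1.98, rounded down
```

With three query terms, 67% requires two matching terms, while 66% drops to one, so "小米手机" (which matches only 小米) is returned as well.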

4.1.3 multi_match (multi-field query)

Match the query against both the title and subTitle fields.

GET /test_index/_search

{

    "query":{

        "multi_match": {

            "query":    "小米",

            "fields":   [ "title", "subTitle" ]

        }

	}

}

4.1.4 term (exact match)

Find documents with price = 2699.00

GET /test_index/_search

{

    "query":{

        "term":{

            "price":2699.00

        }

    }

}

4.1.5 terms (exact match on multiple values)

Find documents whose price equals any value in the array

GET /test_index/_search

{

    "query":{

        "terms":{

            "price":[2699.00,2899.00,3899.00]

        }

    }

}

4.1.6 bool (boolean query)

must: AND
must_not: NOT
should: OR

The query below looks for documents whose title must contain "大米", must not contain "电视", and may optionally contain "手机".

GET /test_index/_search

{

    "query":{

        "bool":{

        	"must":     { "match": { "title": "大米" }},

        	"must_not": { "match": { "title":  "电视" }},

        	"should":   { "match": { "title": "手机" }}

        }

    }

}

Result

{

  "took" : 0,

  "timed_out" : false,

  "_shards" : {

    "total" : 3,

    "successful" : 3,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 1,

    "max_score" : 0.5753642,

    "hits" : [

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "2",

        "_score" : 0.5753642,

        "_source" : {

          "title" : "大米手机",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 2899.0

        }

      }

    ]

  }

}
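The must / must_not / should logic can be sketched in Python as follows. The token lists are hand-written stand-ins for analyzer output (an assumption), and the "bonus" count is only a crude placeholder for how should clauses raise a score; it is not the real scoring formula.

```python
# Hand-written tokens standing in for analyzer output (an assumption).
DOCS = {
    1: ["小米", "手机"],
    2: ["大米", "手机"],
    3: ["小米", "电视", "4a"],
}


def bool_query(must, must_not, should):
    results = []
    for doc_id, terms in DOCS.items():
        has_all_must = all(term in terms for term in must)
        has_forbidden = any(term in terms for term in must_not)
        if has_all_must and not has_forbidden:
            # should terms are optional; matching them only improves ranking.
            bonus = sum(term in terms for term in should)
            results.append((doc_id, bonus))
    return sorted(results, key=lambda item: item[1], reverse=True)


print(bool_query(must=["大米"], must_not=["电视"], should=["手机"]))
```

Only document 2 ("大米手机") passes, matching the response above.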

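4.1.7 range (range query)

Typically used for numeric and date ranges.

Operator	Meaning
gt	greater than
gte	greater than or equal to
lt	less than
lte	less than or equal to

Example: find documents with price >= 1000 and price < 2800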

GET /test_index/_search

{

    "query":{

        "range": {

            "price": {

                "gte":  1000.0,

                "lt":   2800.00

            }

    	}

    }

}
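The four range operators map directly onto ordinary comparisons. A minimal Python sketch, using the three prices indexed earlier and a hypothetical in_range helper:

```python
import operator

# Range operator names mapped to Python comparison functions.
OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    # Every bound given in the range clause must hold.
    return all(OPS[name](value, bound) for name, bound in bounds.items())


prices = [2699.0, 2899.0, 3899.0]
print([p for p in prices if in_range(p, {"gte": 1000.0, "lt": 2800.0})])
```

Only 2699.0 satisfies price >= 1000 and price < 2800.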

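4.1.8 fuzzy (fuzzy query)

Allows the input to deviate slightly from the indexed term and still return the right result: for example, typing appla still finds apple.

Example:

Add the product "apple手机"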

POST /test_index/goods/4

{

    "title":"apple手机",

    "images":"http://image.leyou.com/12479122.jpg",

    "price":6899.00

}

Fuzzy query with fuzziness set to 2, that is, an edit distance of at most 2

GET /test_index/_search

{

  "query": {

    "fuzzy": {

        "title": {

            "value":"appla",

            "fuzziness":2

        }

    }

  }

}

The query finds apple手机.
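The "deviation" that fuzziness limits is an edit distance. The sketch below computes the plain Levenshtein distance (insertions, deletions, substitutions) in Python. It is a simplification: by default ES can also count a transposition of two adjacent characters as a single edit, which this function does not do.

```python
def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming Levenshtein distance, one row at a time.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


print(edit_distance("appla", "apple"))
```

"appla" is one substitution away from "apple", which is within the allowed fuzziness of 2.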

4.1.9 Choosing which fields are returned

Return only the title and price fields

GET /test_index/_search

{

  "_source": ["title","price"],

  "query": {

    "term": {

      "price": 2699

    }

  }

}

Response

{

  "took" : 0,

  "timed_out" : false,

  "_shards" : {

    "total" : 3,

    "successful" : 3,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 1,

    "max_score" : 1.0,

    "hits" : [

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "1",

        "_score" : 1.0,

        "_source" : {

          "price" : 2699.0,

          "title" : "小米手机"

        }

      }

    ]

  }

}

Using includes and excludes

includes lists the fields to return; excludes lists the fields to leave out.

For example

GET /test_index/_search

{

  "_source": {

    "includes":["title","price"]

  },

  "query": {

    "term": {

      "price": 2699

    }

  }

}

GET /test_index/_search

{

  "_source": {

    "excludes": ["images"]

  },

  "query": {

    "term": {

      "price": 2699

    }

  }

}
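The effect of includes and excludes on a document's _source can be sketched in Python. The helper below is hypothetical and handles exact field names only.

```python
def filter_source(source, includes=None, excludes=None):
    # Keep only the included fields (all fields if includes is not given)...
    kept = {key: value for key, value in source.items()
            if includes is None or key in includes}
    # ...then drop anything that is explicitly excluded.
    return {key: value for key, value in kept.items()
            if not excludes or key not in excludes}


doc = {
    "title": "小米手机",
    "images": "http://image.leyou.com/12479122.jpg",
    "price": 2699.0,
}
print(filter_source(doc, includes=["title", "price"]))
print(filter_source(doc, excludes=["images"]))
```

For this document both calls return the same two fields, title and price, just as the two requests above do.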

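4.2 Filters

What is the difference between a query and a filter?

See this blog post: https://blog.csdn.net/en_joker/article/details/78017306

Every query clause affects a document's score and ranking. If we need to filter the results and don't want the filter condition to affect the score, the condition should not be written as a query clause; put it in filter instead: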

GET /test_index/_search

{

    "query":{

        "bool":{

        	"must":{ "match": { "title": "小米手机" }},

        	"filter":{

                "range":{"price":{"gt":2000.00,"lt":3800.00}}

        	}

        }

    }

}

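Note: a filter can itself contain another bool query to combine filter conditions.

If a search consists only of filters, with no query conditions and no need for scoring, constant_score can replace a bool query that has only a filter clause. Performance is identical, but the request is simpler and clearer.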

GET /test_index/_search

{

    "query":{

        "constant_score":   {

            "filter": {

            	 "range":{"price":{"gt":2000.00,"lt":3000.00}}

            }

        }

    }

}

Compared with filters, the defining feature of queries is that they care about relevance.

4.3 Sorting

4.3.1 Sorting by a single field

GET /test_index/_search

{

  "query": {

    "match": {

      "title": "小米手机"

    }

  },

  "sort": [

    {

      "price": {

        "order": "desc"

      }

    }

  ]

}

Result

{

  "took" : 3,

  "timed_out" : false,

  "_shards" : {

    "total" : 3,

    "successful" : 3,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 4,

    "max_score" : null,

    "hits" : [

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "4",

        "_score" : null,

        "_source" : {

          "title" : "apple手机",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 6899.0

        },

        "sort" : [

          6899.0

        ]

      },

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "3",

        "_score" : null,

        "_source" : {

          "title" : "小米电视4A",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 3899.0

        },

        "sort" : [

          3899.0

        ]

      },

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "2",

        "_score" : null,

        "_source" : {

          "title" : "大米手机",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 2899.0

        },

        "sort" : [

          2899.0

        ]

      },

      {

        "_index" : "test_index",

        "_type" : "goods",

        "_id" : "1",

        "_score" : null,

        "_source" : {

          "title" : "小米手机",

          "images" : "http://image.leyou.com/12479122.jpg",

          "price" : 2699.0

        },

        "sort" : [

          2699.0

        ]

      }

    ]

  }

}

4.3.2 Sorting by multiple fields

Sort the results by price first, then by relevance score

GET /test_index/_search

{

    "query":{

        "bool":{

        	"must":{ "match": { "title": "小米手机" }},

        	"filter":{

                "range":{"price":{"gt":2,"lt":300000}}

        	}

        }

    },

    "sort": [

      { "price": { "order": "desc" }},

      { "_score": { "order": "desc" }}

    ]

}

The result is omitted here.
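Since the response is not shown, here is a small Python sketch of what "price first, then score" means. The hits and scores below are made up for illustration and are not output from ES; two of them share a price so that the second sort key actually matters.

```python
# Made-up hits for illustration only (not real ES output).
hits = [
    {"title": "a", "price": 2899.0, "score": 0.2},
    {"title": "b", "price": 3899.0, "score": 0.1},
    {"title": "c", "price": 2899.0, "score": 0.7},
]

# Sort by price descending; ties on price are broken by score descending.
ordered = sorted(hits, key=lambda hit: (-hit["price"], -hit["score"]))
print([hit["title"] for hit in ordered])
```

The second key is only consulted when two documents have the same price.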

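5. Aggregations

Aggregations make it very easy to compute statistics over data and analyze it. For example:

Which phone brand is the most popular?
What are the average, highest, and lowest prices of these phones?
How do these phones sell month by month?

These statistics are much easier to express than the equivalent database SQL, and the queries are fast enough to feel near real time.

5.1 Basic concepts

Bucket

A bucket groups data in some way; each group is called a bucket in ES. For example, grouping people by nationality gives a China bucket, a UK bucket, a Japan bucket, and so on; grouping them by age range gives 0~10, 10~20, 20~30, 30~40, and so on.

Elasticsearch offers many ways to form buckets:

Date Histogram Aggregation: groups by date interval; for example, with an interval of one week, each week becomes a group
Histogram Aggregation: groups by numeric interval, analogous to the date version
Terms Aggregation: groups by term; documents with exactly the same term fall into one group
Range Aggregation: groups by numeric or date ranges; you specify each range's start and end
...

Metrics

Once the data is grouped, we usually run a calculation over each group, such as average, maximum, minimum, or sum. ES calls these metrics.

Some commonly used metric aggregations:

Avg Aggregation: average
Max Aggregation: maximum
Min Aggregation: minimum
Percentiles Aggregation: percentiles
Stats Aggregation: returns avg, max, min, sum, and count together
Sum Aggregation: sum
Top hits Aggregation: the top few matching documents
Value Count Aggregation: number of values
...

Note: in ES, fields used for aggregation, sorting, or filtering are handled specially and must not be analyzed. For strings this means the type must be keyword rather than text, because text is analyzed.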

5.2 Importing data

Import some car sales data

Create the index first

PUT /cars

{

  "settings": {

    "number_of_shards": 1,

    "number_of_replicas": 0

  },

  "mappings": {

    "transactions": {

      "properties": {

        "color": {

          "type": "keyword"

        },

        "make": {

          "type": "keyword"

        }

      }

    }

  }

}

Bulk-import the data

POST /cars/transactions/_bulk

{ "index": {}}

{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }

{ "index": {}}

{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }

{ "index": {}}

{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }

{ "index": {}}

{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }

{ "index": {}}

{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }

{ "index": {}}

{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }

{ "index": {}}

{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }

{ "index": {}}

{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }

5.3 Aggregating into buckets

The example below counts the cars sold per color.

GET /cars/_search

{

    "size" : 0,

    "aggs" : {

        "popular_colors" : {

            "terms" : {

              "field" : "color"

            }

        }

    }

}

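size: the number of hits to return. It is 0 here because we only care about the aggregation result, not the matched documents, and skipping them is more efficient
aggs: declares an aggregation; short for aggregations
- popular_colors: a name for this aggregation; any name will do
  - terms: how to form buckets, here by term
    - field: the field to bucket on

Result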

{

  "took" : 0,

  "timed_out" : false,

  "_shards" : {

    "total" : 1,

    "successful" : 1,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 8,

    "max_score" : 0.0,

    "hits" : [ ]

  },

  "aggregations" : {

    "popular_colors" : {

      "doc_count_error_upper_bound" : 0,

      "sum_other_doc_count" : 0,

      "buckets" : [

        {

          "key" : "red",

          "doc_count" : 4

        },

        {

          "key" : "blue",

          "doc_count" : 2

        },

        {

          "key" : "green",

          "doc_count" : 2

        }

      ]

    }

  }

}

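hits: empty, because size is 0
aggregations: the aggregation results
popular_colors: the aggregation name we defined
buckets: the buckets found; each distinct value of the color field forms one bucket
- key: the color value this bucket corresponds to
- doc_count: the number of documents in this bucket

The results show that red cars sell best.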
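What the terms aggregation computes here is a count per distinct value. A Python sketch over the color column of the eight imported records (in import order) gives the same numbers; it is an analogy in plain Python, not how ES implements it.

```python
from collections import Counter

# The color of each of the eight imported car records, in import order.
colors = ["red", "red", "green", "blue", "green", "red", "red", "blue"]

# One bucket per distinct color; the count plays the role of doc_count.
buckets = Counter(colors)
for color, doc_count in buckets.most_common():
    print(color, doc_count)
```

Red has four documents, blue and green two each, matching the buckets in the response.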

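5.4 Metrics within buckets

Section 5.3 only grouped the data into buckets, but usually a metric is computed after bucketing, for example the average price of the cars of each color.

Request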

GET /cars/_search

{

    "size" : 0,

    "aggs" : {

        "popular_colors" : {

            "terms" : {

              "field" : "color"

            },

            "aggs":{

                "avg_price": {

                   "avg": {

                      "field": "price"

                   }

                }

            }

        }

    }

}

Response

{

  "took" : 0,

  "timed_out" : false,

  "_shards" : {

    "total" : 1,

    "successful" : 1,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 8,

    "max_score" : 0.0,

    "hits" : [ ]

  },

  "aggregations" : {

    "popular_colors" : {

      "doc_count_error_upper_bound" : 0,

      "sum_other_doc_count" : 0,

      "buckets" : [

        {

          "key" : "red",

          "doc_count" : 4,

          "avg_price" : {

            "value" : 32500.0

          }

        },

        {

          "key" : "blue",

          "doc_count" : 2,

          "avg_price" : {

            "value" : 20000.0

          }

        },

        {

          "key" : "green",

          "doc_count" : 2,

          "avg_price" : {

            "value" : 21000.0

          }

        }

      ]

    }

  }

}
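The avg_price values can be checked by hand. The Python sketch below groups the eight imported (price, color) pairs by color and averages each group; the function name is hypothetical.

```python
from collections import defaultdict

# (price, color) for the eight imported car records.
SALES = [
    (10000, "red"), (20000, "red"), (30000, "green"), (15000, "blue"),
    (12000, "green"), (20000, "red"), (80000, "red"), (25000, "blue"),
]


def avg_price_by_color(sales):
    # Bucket the prices by color, then compute the metric per bucket.
    groups = defaultdict(list)
    for price, color in sales:
        groups[color].append(price)
    return {color: sum(prices) / len(prices) for color, prices in groups.items()}


print(avg_price_by_color(SALES))
```

For red this is (10000 + 20000 + 20000 + 80000) / 4 = 32500, in agreement with the response.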

5.5 Buckets within buckets

Building on the aggregation in 5.4, add another level: find which makes the cars of each color belong to.

GET /cars/_search

{

    "size" : 0,

    "aggs" : {

        "popular_colors" : {

            "terms" : {

              "field" : "color"

            },

            "aggs":{

                "avg_price": {

                   "avg": {

                      "field": "price"

                   }

                },

                "maker":{

                    "terms":{

                        "field":"make"

                    }

                }

            }

        }

    }

}

Result

{

  "took" : 1,

  "timed_out" : false,

  "_shards" : {

    "total" : 1,

    "successful" : 1,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 8,

    "max_score" : 0.0,

    "hits" : [ ]

  },

  "aggregations" : {

    "popular_colors" : {

      "doc_count_error_upper_bound" : 0,

      "sum_other_doc_count" : 0,

      "buckets" : [

        {

          "key" : "red",

          "doc_count" : 4,

          "maker" : {

            "doc_count_error_upper_bound" : 0,

            "sum_other_doc_count" : 0,

            "buckets" : [

              {

                "key" : "honda",

                "doc_count" : 3

              },

              {

                "key" : "bmw",

                "doc_count" : 1

              }

            ]

          },

          "avg_price" : {

            "value" : 32500.0

          }

        },

        {

          "key" : "blue",

          "doc_count" : 2,

          "maker" : {

            "doc_count_error_upper_bound" : 0,

            "sum_other_doc_count" : 0,

            "buckets" : [

              {

                "key" : "ford",

                "doc_count" : 1

              },

              {

                "key" : "toyota",

                "doc_count" : 1

              }

            ]

          },

          "avg_price" : {

            "value" : 20000.0

          }

        },

        {

          "key" : "green",

          "doc_count" : 2,

          "maker" : {

            "doc_count_error_upper_bound" : 0,

            "sum_other_doc_count" : 0,

            "buckets" : [

              {

                "key" : "ford",

                "doc_count" : 1

              },

              {

                "key" : "toyota",

                "doc_count" : 1

              }

            ]

          },

          "avg_price" : {

            "value" : 21000.0

          }

        }

      ]

    }

  }

}

This shows that Honda is the most common make among red cars.
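A nested bucket is simply a count inside a count. The sketch below reproduces the color-then-make grouping in Python from the eight imported records; it is an analogy, not the ES implementation, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

# (color, make) for the eight imported car records.
SALES = [
    ("red", "honda"), ("red", "honda"), ("green", "ford"), ("blue", "toyota"),
    ("green", "toyota"), ("red", "honda"), ("red", "bmw"), ("blue", "ford"),
]


def makes_by_color(sales):
    # Outer bucket: color. Inner bucket: make, with its document count.
    buckets = defaultdict(Counter)
    for color, make in sales:
        buckets[color][make] += 1
    return buckets


for color, makes in makes_by_color(SALES).items():
    print(color, dict(makes))
```

The red bucket contains three Hondas and one BMW, as in the response.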

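5.6 Other ways to form buckets

So far buckets were formed by term. There are many other bucketing methods, for example

Histogram buckets

A histogram's X axis is divided at fixed intervals, so we need a step value (interval) to specify that fixed interval.

Example

Group the cars by price with an interval of 5000, and hide buckets whose count is 0.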

GET /cars/_search

{

  "size":0,

  "aggs":{

    "price":{

      "histogram": {

        "field": "price",

        "interval": 5000,

        "min_doc_count": 1

      }

    }

  }

}

Result

{

  "took" : 0,

  "timed_out" : false,

  "_shards" : {

    "total" : 1,

    "successful" : 1,

    "skipped" : 0,

    "failed" : 0

  },

  "hits" : {

    "total" : 8,

    "max_score" : 0.0,

    "hits" : [ ]

  },

  "aggregations" : {

    "price" : {

      "buckets" : [

        {

          "key" : 10000.0,

          "doc_count" : 2

        },

        {

          "key" : 15000.0,

          "doc_count" : 1

        },

        {

          "key" : 20000.0,

          "doc_count" : 2

        },

        {

          "key" : 25000.0,

          "doc_count" : 1

        },

        {

          "key" : 30000.0,

          "doc_count" : 1

        },

        {

          "key" : 80000.0,

          "doc_count" : 1

        }

      ]

    }

  }

}

Price distribution (each bucket spans one interval of 5000 starting at its key):

[10000, 15000): 2
[15000, 20000): 1
[20000, 25000): 2
[25000, 30000): 1
[30000, 35000): 1
[80000, 85000): 1
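Each bucket key is the price rounded down to a multiple of the interval. A Python sketch of that rule over the eight imported prices, assuming no offset is configured (the function name is hypothetical):

```python
import math
from collections import Counter

# Prices of the eight imported car records.
PRICES = [10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000]


def histogram(values, interval):
    # Bucket key = value rounded down to the nearest multiple of interval.
    counts = Counter(math.floor(value / interval) * interval for value in values)
    # Only non-empty buckets exist here, like min_doc_count = 1.
    return dict(sorted(counts.items()))


print(histogram(PRICES, 5000))
```

The price 12000 falls into the bucket keyed 10000, which is why that bucket holds two documents.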
