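Elasticsearch from Basics to Depth (7) Search Engine: What _search Means, Multi-Index Search, Paged Search and the Deep Paging Performance Problem, Query String Search Syntax, and How the _all Metadata Field Works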

What _search means

Reading the result of a _search query

GET _search

{
  "took": ,
  "timed_out": false,
  "_shards": {
    "total": ,
    "successful": ,
    "failed":
  },
  "hits": {
    "total": ,
    "max_score": ,
    "hits": [
      {
        "_index": ".kibana",
        "_type": "config",
        "_id": "5.2.0",
        "_score": ,
        "_source": {
          "buildNum":
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "AWypxxLYFCl_S-ox4wvd",
        "_score": ,
        "_source": {
          "test_content": "my test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": ,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_doc",
        "_id": "",
        "_score": ,
        "_source": {
          "test_field": "test10 routing _id"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_doc",
        "_id": "",
        "_score": ,
        "_routing": "",
        "_source": {
          "test_field": "test routing not _id"
        }
      },
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "",
        "_score": ,
        "_source": {
          "name": "jiajieshi yagao",
          "desc": "youxiao fangzhu",
          "price": ,
          "producer": "jiajieshi producer",
          "tags": [
            "fangzhu"
          ]
        }
      },
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "",
        "_score": ,
        "_source": {
          "name": "special yagao",
          "desc": "special meibai",
          "price": ,
          "producer": "special yagao producer",
          "tags": [
            "meibai"
          ]
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": ,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": ,
        "_source": {
          "test_field": "test4"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": ,
        "_source": {
          "test_field": "replaces test2"
        }
      }
    ]
  }
}

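took: how many milliseconds the whole search request took
timed_out: whether the request timed out
hits.total: the total number of matching documents. In newer versions this is an object in which value is the count and relation says how to read it (usually eq, meaning the count is exact); in older versions, such as the sample above, it is a plain number
hits.max_score: the highest relevance score among all results of this search. The more relevant a document is to the search, the larger its _score and the higher it ranks
hits.hits: the set of documents returned by the query
_shards.total: the number of shards the request was sent to
_shards.successful: how many of those shards executed the query successfully
_shards.skipped: how many of those shards were skipped (reported by newer versions)
_shards.failed: how many of those shards failed to execute the query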
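
The field meanings listed above can be checked with a small helper. This is only a sketch: `summarize_search_response` is a hypothetical name, and the two sample responses contain invented numbers, not output from the cluster used in this article. It accepts both shapes of `hits.total` (plain number or `value`/`relation` object).

```python
def summarize_search_response(resp):
    """Pull the headline fields out of a _search response dict."""
    total = resp["hits"]["total"]
    # Newer releases report total as {"value": ..., "relation": ...};
    # older ones report a plain integer, which is always exact.
    if isinstance(total, dict):
        total_value, relation = total["value"], total["relation"]
    else:
        total_value, relation = total, "eq"
    shards = resp["_shards"]
    return {
        "took_ms": resp["took"],
        "timed_out": resp["timed_out"],
        "total": total_value,
        "relation": relation,
        "max_score": resp["hits"]["max_score"],
        "returned": len(resp["hits"]["hits"]),
        "all_shards_ok": shards["failed"] == 0,
    }


# Hypothetical responses; every number below is invented for illustration.
new_style = {
    "took": 5,
    "timed_out": False,
    "_shards": {"total": 3, "successful": 3, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "max_score": 1.0,
        "hits": [{"_id": "1"}, {"_id": "2"}],
    },
}
old_style = {
    "took": 7,
    "timed_out": False,
    "_shards": {"total": 3, "successful": 2, "failed": 1},
    "hits": {"total": 1, "max_score": 0.5, "hits": [{"_id": "9"}]},
}

new_summary = summarize_search_response(new_style)
old_summary = summarize_search_response(old_style)
print(new_summary)
print(old_summary)
```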

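The search timeout mechanism

By default ES has no timeout, so first picture the scenario. Some search applications are very time-sensitive. On an e-commerce site, for example, you cannot keep a user waiting for 10 minutes; they would have left long before, without buying anything.

So we need a timeout mechanism: each shard gets only the timeout window, and whatever part of the data it has found within that window (possibly all of it) is returned to the client straight away, instead of waiting until everything has been found. A response cut short this way carries "timed_out": true.

This ensures that a search request completes within the timeout the user specified, which serves time-sensitive search applications well.

Note: ES has no timeout by default. If your search is very slow and every shard needs several minutes to find all its data, your request will also wait several minutes before it returns.
[Figure: the timeout mechanism.]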

Syntax:

GET _search?timeout=10ms
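
To make the "return whatever was found in time" behaviour concrete, here is a toy model in Python. It is not how ES is implemented: the per-document cost in milliseconds, the shard names, and the function `search_with_timeout` are all assumptions made up for this illustration. Each shard gets its own time budget, as described above.

```python
def search_with_timeout(shards, timeout_ms):
    """Toy model: each shard collects hits until its own time budget runs out.

    `shards` maps a shard name to a list of (doc_id, cost_ms) pairs, where
    cost_ms is an invented per-document cost. Returns (hits, timed_out).
    """
    hits, timed_out = [], False
    for docs in shards.values():
        spent = 0  # every shard starts with a fresh budget
        for doc_id, cost_ms in docs:
            if spent + cost_ms > timeout_ms:
                timed_out = True  # this shard stops and returns what it has
                break
            spent += cost_ms
            hits.append(doc_id)
    return hits, timed_out


# Hypothetical data: shard0 is slow (4 ms per doc), shard1 is fast.
toy_shards = {
    "shard0": [("a", 4), ("b", 4), ("c", 4)],
    "shard1": [("d", 2), ("e", 2)],
}

partial_hits, partial_flag = search_with_timeout(toy_shards, timeout_ms=10)
full_hits, full_flag = search_with_timeout(toy_shards, timeout_ms=100)
print(partial_hits, partial_flag)
print(full_hits, full_flag)
```

With the 10 ms budget the slow shard drops its third document and the result is flagged as timed out; with 100 ms every document comes back.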

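multi-index and multi-type search

A note first: in older versions of ES one index could hold multiple types, which is why a multi-type search mode exists. It is not covered in detail here, because it works essentially the same way as multi-index search, and newer versions of ES deprecate types.

multi-index search

/_search: search all data in all indices
```
GET /_search
```
/{index}/_search: search all data in one specified index
```
GET /test/_search
```
/index1,index2/_search: search the data of two indices at once
```
GET /test_index,test/_search
```
/*1,*2/_search: match several indices with wildcards and search the data in all of them
```
GET /test*/_search
```
/_all/_search: _all stands for every index
```
GET /_all/_search
```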
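
The path patterns above all follow one rule: join the index names or patterns with commas and append `/_search`. A minimal Python sketch of that rule follows; `search_path` is a hypothetical helper, not part of any Elasticsearch client.

```python
def search_path(*indices):
    """Build the _search path for zero or more index names or wildcard patterns."""
    if not indices:
        return "/_search"  # no index given: search everything
    return "/" + ",".join(indices) + "/_search"


print(search_path())                      # all indices
print(search_path("test"))                # one index
print(search_path("test_index", "test"))  # two indices
print(search_path("test*"))               # wildcard pattern
print(search_path("_all"))                # explicit "all indices"
```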

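A brief look at how search works

When a client sends a search request to ES, the request has to reach every shard of the index, because each shard holds only part of the data and so any shard may hold matching results. For each shard the request can be served by the primary shard or, if the primary has replica shards, by one of its replicas.
[Figure: how a search request is routed to the shards.]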
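
The idea that each shard is searched once, on either its primary or a replica, can be sketched as follows. The round-robin choice between copies is an assumption made to keep the example simple; real shard selection in ES is more involved. The node names and the helper `make_router` are hypothetical.

```python
from itertools import cycle


def make_router(shard_copies):
    """Return a function that picks one copy of every shard per request.

    `shard_copies` maps a shard id to the nodes holding a copy of it
    (primary first, then replicas). Copies are picked round-robin, which
    is a simplification for illustration only.
    """
    pickers = {shard: cycle(copies) for shard, copies in shard_copies.items()}

    def route():
        # Exactly one copy per shard serves each search request.
        return {shard: next(picker) for shard, picker in pickers.items()}

    return route


# Hypothetical layout: three shards, each with one primary and one replica.
copies = {
    0: ["node1", "node2"],
    1: ["node2", "node3"],
    2: ["node3", "node1"],
}
route = make_router(copies)
first_request = route()
second_request = route()
third_request = route()
print(first_request)
print(second_request)
```

Every request touches all three shards, while consecutive requests alternate between the primary and the replica of each shard.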

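Paged search and the deep paging performance problem

Paging is indispensable in real applications; a front-end page, for example, almost always shows data to the user one page at a time.

Paged search in ES

Elasticsearch pages with from + size. from is the zero-based offset of the first result to return (default 0), and size is the number of documents to return starting at that offset (default 10).
Example: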

GET test_index/test_type/_search?from=&size=

{
  "took": ,
  "timed_out": false,
  "_shards": {
    "total": ,
    "successful": ,
    "failed":
  },
  "hits": {
    "total": ,
    "max_score": ,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "AWypxxLYFCl_S-ox4wvd",
        "_score": ,
        "_source": {
          "test_content": "my test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": ,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": ,
        "_source": {
          "test_field": "test test"
        }
      }
    ]
  }
}
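
Since from is a zero-based offset, a page number has to be converted before it goes into the request. A small sketch of that arithmetic, assuming pages are numbered from 1 (`page_to_from_size` is a hypothetical helper):

```python
def page_to_from_size(page, page_size=10):
    """Translate a 1-based page number into from/size parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}


print(page_to_from_size(1))     # first page starts at offset 0
print(page_to_from_size(3))     # third page of 10 starts at offset 20
print(page_to_from_size(1001))  # a very deep page
```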

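The deep paging performance problem

What is deep paging? Simply put, it means paging very deep into the results. Say there are 60000 documents in total on three primary shards, 20000 per shard, with 10 documents per page. To fetch page 1001 (from=10000, size=10) you need documents 10001~10010.

Do not take this to mean that each shard returns just 10 documents. That is wrong!

Here is a detailed analysis:
The request may first land on a node that holds no shard of this index. That node becomes the coordinating node and forwards the search request to the nodes holding the index's three shards. In the scenario above, no shard can know which of its own 20000 documents fall into global positions 10001~10010, so each shard has to return its top 10010 documents, not 10. The coordinating node receives 3 × 10010 = 30030 documents in total, sorts them by _score, and takes positions 10001~10010: the 10 documents of the page we asked for.
[Figure: deep paging across three shards and a coordinating node.]

So the deep paging problem is that when from + size reaches too deep, every shard has to return a large amount of data to the coordinating node, which costs a lot of bandwidth, memory, and CPU.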
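
The arithmetic can be reproduced with a small simulation. This is a sketch under stated assumptions, not ES internals: documents are `(score, doc_id)` pairs with random scores, the three "shards" are plain Python lists, and `deep_page` is a hypothetical function that mimics each shard returning its top from + size hits to a coordinator.

```python
import heapq
import random


def deep_page(shards, from_, size):
    """Simulate from+size paging over several shards.

    `shards` is a list of lists of (score, doc_id) pairs. Returns the
    requested page and the number of hits the coordinator had to receive.
    """
    k = from_ + size
    # Each shard can only contribute its own top-k; it cannot know which
    # of its documents land in the requested global window.
    per_shard = [heapq.nlargest(k, shard) for shard in shards]
    received = sum(len(top) for top in per_shard)
    # The coordinator merges everything, sorts by score, and cuts the page.
    merged = heapq.nlargest(k, (hit for top in per_shard for hit in top))
    return merged[from_:from_ + size], received


random.seed(0)  # fixed seed so the run is repeatable
docs = [(random.random(), doc_id) for doc_id in range(60000)]
shards = [docs[i::3] for i in range(3)]  # 3 shards, 20000 docs each

page, received = deep_page(shards, from_=10000, size=10)
print(len(page), received)
```

The page holds only 10 documents, yet the coordinator had to receive 30030 hits to produce it, which is where the bandwidth, memory, and CPU cost comes from.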

query string search syntax and how the _all metadata field works

Basic query string syntax

GET /test_index/test_type/_search?q=test_field:test

{
  "took": ,
  "timed_out": false,
  "_shards": {
    "total": ,
    "successful": ,
    "failed":
  },
  "hits": {
    "total": ,
    "max_score": 0.843298,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": 0.843298,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": 0.43445712,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": 0.25316024,
        "_source": {
          "test_field": "test client 1"
        }
      }
    ]
  }
}

GET /test_index/test_type/_search?q=+test_field:test

{
  "took": ,
  "timed_out": false,
  "_shards": {
    "total": ,
    "successful": ,
    "failed":
  },
  "hits": {
    "total": ,
    "max_score": 0.843298,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": 0.843298,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": 0.43445712,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": 0.25316024,
        "_source": {
          "test_field": "test client 1"
        }
      }
    ]
  }
}

GET /test_index/test_type/_search?q=-test_field:test

{
  "took": ,
  "timed_out": false,
  "_shards": {
    "total": ,
    "successful": ,
    "failed":
  },
  "hits": {
    "total": ,
    "max_score": ,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "AWypxxLYFCl_S-ox4wvd",
        "_score": ,
        "_source": {
          "test_content": "my test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": ,
        "_source": {
          "test_field": "test4"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": ,
        "_source": {
          "test_field": "replaces test2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": ,
        "_source": {
          "test_field1": "test field1",
          "test_field2": "partial updated test1"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": ,
        "_source": {
          "num": ,
          "tags": []
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": ,
        "_source": {
          "test_field": "test3"
        }
      }
    ]
  }
}

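For query string search, all you need to master is the q=field:search content syntax and the meaning of + and -.

+: the result must match this condition
-: the result must not match this condition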
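
One practical detail when sending these requests from your own HTTP client rather than a console: in a URL query string a literal + is normally decoded as a space, so the + prefix should be percent-encoded as %2B. The sketch below uses Python's standard `urllib.parse.quote`; the helper `query_string_path` and the choice to leave `:` unescaped are assumptions made for readability.

```python
from urllib.parse import quote


def query_string_path(prefix, query):
    """Build a q= search path, percent-encoding characters such as '+'."""
    # ':' is kept readable; '+' becomes %2B so it is not read as a space.
    return f"/{prefix}/_search?q={quote(query, safe=':')}"


must_match = query_string_path("test_index/test_type", "+test_field:test")
must_not_match = query_string_path("test_index/test_type", "-test_field:test")
plain = query_string_path("test_index/test_type", "test_field:test")
print(must_match)
print(must_not_match)
print(plain)
```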

_all metadata

GET /test_index/test_type/_search?q=test

{
  "took": ,
  "timed_out": false,
  "_shards": {
    "total": ,
    "successful": ,
    "failed":
  },
  "hits": {
    "total": ,
    "max_score": 0.843298,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": 0.843298,
        "_source": {
          "test_field": "test test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "AWypxxLYFCl_S-ox4wvd",
        "_score": 0.3794414,
        "_source": {
          "test_content": "my test"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": 0.31387395,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": 0.18232156,
        "_source": {
          "test_field": "test client 1"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "",
        "_score": 0.16203022,
        "_source": {
          "test_field1": "test field1",
          "test_field2": "partial updated test1"
        }
      }
    ]
  }
}

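In other words, when you use a query string without specifying a field, the search runs against _all by default. The _all metadata field is produced at indexing time: when we insert a document that contains several fields, ES automatically concatenates the values of all those fields into one long string. That long string is the value of the _all field, and it is indexed as well.
For example:
Given a document:

{
  "name": "jack",
  "age": 26,
  "email": "jack@sina.com",
  "address": "guangzhou"
}

the string "jack 26 jack@sina.com guangzhou" becomes the value of this document's _all field; it is then analyzed into terms and added to the inverted index.
Note that the query string style of search is generally not used in production.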
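
The concatenate-then-index idea can be sketched in a few lines of Python. This is a toy model only: it splits on whitespace, whereas a real ES analyzer tokenizes differently, and the function names `build_all_field` and `build_inverted_index` as well as the second document are made up for this illustration.

```python
def build_all_field(doc):
    """Join every field value of a document into one string, like _all."""
    return " ".join(str(value) for value in doc.values())


def build_inverted_index(docs):
    """Map each whitespace-separated token of the _all string to doc ids."""
    index = {}
    for doc_id, doc in docs.items():
        for token in build_all_field(doc).lower().split():
            index.setdefault(token, set()).add(doc_id)
    return index


docs = {
    1: {"name": "jack", "age": 26, "email": "jack@sina.com", "address": "guangzhou"},
    # Hypothetical second document, added so the index has two postings.
    2: {"name": "tom", "age": 30, "email": "tom@sina.com", "address": "guangzhou"},
}

all_value = build_all_field(docs[1])
index = build_inverted_index(docs)
print(all_value)
print(sorted(index["guangzhou"]))
print(sorted(index["jack"]))
```

A search without a field name then amounts to looking a term up in this combined index, which is why q=test can match a document through any of its fields.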
