ES 2.0 Cluster Operations Command Reference

Date: 2022-03-30 21:52:32


_cat commands

The _cat APIs report the current state of the cluster at three levels: shard, node, and cluster.

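Common parameters

  1. verbose: show column headers. Request parameter: v

    Example: curl localhost:9200/_cat/master?v

  2. help: describe each column of the command. Request parameter: help. Some commands hide certain columns by default; help lists every column the command can display

    Example: curl localhost:9200/_cat/master?help

  3. bytes: print numeric size columns as raw values. Sizes such as diskSize are rendered in kb/mb/gb by default; with this parameter they are printed as raw numbers

    Example: curl localhost:9200/_cat/indices?bytes=b

  4. header: show only the specified columns. Request parameter: h

    Example: curl localhost:9200/_cat/indices?h=i,tm (shows the memory used by each index in the cluster)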
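
The four parameters can be combined in a single request. As a minimal sketch of how the query string is assembled, the helper below builds a _cat URL; the function name cat_url and its argument names are made up for illustration and are not part of Elasticsearch.

```python
def cat_url(endpoint, host="localhost:9200", verbose=False,
            show_help=False, bytes_unit=None, columns=None):
    """Build a _cat URL from the parameters described above (hypothetical helper)."""
    flags = []
    if verbose:
        flags.append("v")                          # column headers
    if show_help:
        flags.append("help")                       # column descriptions
    if bytes_unit:
        flags.append("bytes=" + bytes_unit)        # raw sizes, e.g. bytes=b
    if columns:
        flags.append("h=" + ",".join(columns))     # only these columns
    url = "http://{}/_cat/{}".format(host, endpoint)
    return url + ("?" + "&".join(flags) if flags else "")


print(cat_url("master", verbose=True))
print(cat_url("indices", verbose=True, columns=["i", "tm"]))
```

The resulting strings can be passed to curl or to any HTTP client.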

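View segment details (/_cat/segments)

Shows segment details for every index: segment name, owning shard, memory and disk footprint, whether the segment has been committed to disk, whether it is stored as a compound file, and so on. To restrict the output to one index, use /_cat/segments/${index}. Example: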

> curl "localhost:9200/_cat/segments/idx1?v"
index shard prirep ip segment generation docs.count docs.deleted size size.memory committed searchable version compound
idx1 0 p 127.0.0.1 _a 10 17 0 3.7kb 2764 true true 5.2.1 false
idx1 0 p 127.0.0.1 _b 11 2 0 2.9kb 2764 true true 5.2.1 true
idx1 0 p 127.0.0.1 _c 12 2 0 2.9kb 2764 true true 5.2.1 true
idx1 0 r 127.0.0.1 _a 10 16 0 3.6kb 2764 true true 5.2.1 false
idx1 0 r 127.0.0.1 _b 11 3 0 2.9kb 2764 true true 5.2.1 true
idx1 0 r 127.0.0.1 _c 12 2 0 2.9kb 2764 true true 5.2.1 true
idx1 1 p 127.0.0.1 _a 10 17 0 3.7kb 2764 true true 5.2.1 false
idx1 1 p 127.0.0.1 _b 11 2 0 2.9kb 2764 true true 5.2.1 true
idx1 1 p 127.0.0.1 _c 12 2 0 2.9kb 2764 true true 5.2.1 true
idx1 1 r 127.0.0.1 _a 10 16 0 3.6kb 2764 true true 5.2.1 false
idx1 1 r 127.0.0.1 _b 11 3 0 2.9kb 2764 true true 5.2.1 true
idx1 1 r 127.0.0.1 _c 12 2 0 2.9kb 2764 true true 5.2.1 true

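View index details (/_cat/indices)

Shows details for every index in the cluster, including index status, shard counts (primary/replica), and document counts; see help for the full column list. To query a single index, use /_cat/indices/${index}. Example: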

> curl localhost:9200/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open idx2 5 1 100 0 92.5kb 32.6kb
green open idx1 5 1 100 0 97.7kb 51.6kb
green open customer 5 1 0 0 1.5kb 780b
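
Because the ?v output is a header row followed by whitespace-separated rows, it is easy to post-process in a script. The sketch below parses the sample output above into dictionaries; parse_cat is an illustrative name, and the approach assumes no cell is empty and no value contains spaces (which does not hold for every _cat command).

```python
def parse_cat(text):
    """Turn ?v output (header row plus whitespace-separated rows) into dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


# Output of `curl localhost:9200/_cat/indices?v`, copied from the example above.
SAMPLE = """\
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open idx2 5 1 100 0 92.5kb 32.6kb
green open idx1 5 1 100 0 97.7kb 51.6kb
green open customer 5 1 0 0 1.5kb 780b
"""

rows = parse_cat(SAMPLE)
not_green = [r["index"] for r in rows if r["health"] != "green"]
total_docs = sum(int(r["docs.count"]) for r in rows)
print(not_green, total_docs)  # [] 200
```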

View alias details (/_cat/aliases)

Shows every alias in the cluster, including the index each alias points to and its routing configuration. To query a single alias, use /_cat/aliases/${alias}. Example:

> curl '192.168.56.10:9200/_cat/aliases?v'
alias index filter routing.index routing.search
alias2 test1 * - -
alias4 test1 - 2 1,2
alias1 test1 - - -
alias3 test1 - 1 1

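View shard details (/_cat/shards)

Shows details for every shard: where it is placed, its current state (with the failure reason for shards that could not be allocated), document count, disk usage, and access statistics (for example the number of successful and failed get requests and the time they took). To limit the output to one index, use /_cat/shards/${index}. Example: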

> curl "localhost:9200/_cat/shards/idx1?v"
index shard prirep state docs store ip node
idx1 1 p STARTED 21 9.8kb 127.0.0.1 node-1
idx1 1 r STARTED 21 9.8kb 127.0.0.1 node-2
idx1 3 p STARTED 18 12.4kb 127.0.0.1 node-1
idx1 3 r STARTED 18 12.4kb 127.0.0.1 node-2
idx1 4 p STARTED 23 9.9kb 127.0.0.1 node-1
idx1 4 r STARTED 23 9.9kb 127.0.0.1 node-2
idx1 2 p STARTED 17 9.5kb 127.0.0.1 node-1
idx1 2 r STARTED 17 3.9kb 127.0.0.1 node-2
idx1 0 p STARTED 21 9.8kb 127.0.0.1 node-1
idx1 0 r STARTED 21 9.8kb 127.0.0.1 node-2
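
A quick way to spot shards that are not STARTED is to tally the state column. The sketch below reads only the first four columns (index, shard, prirep, state), on the assumption that later columns may be empty for unassigned shards and that node names may contain spaces; shard_states is an illustrative name.

```python
from collections import Counter


def shard_states(text):
    """Count shards per state from `_cat/shards?v` output."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # skip the header row printed by ?v
        fields = line.split()
        if len(fields) >= 4:
            counts[fields[3]] += 1      # 4th column is the shard state
    return counts


# First rows of the example output above.
SAMPLE = """\
index shard prirep state docs store ip node
idx1 1 p STARTED 21 9.8kb 127.0.0.1 node-1
idx1 1 r STARTED 21 9.8kb 127.0.0.1 node-2
idx1 3 p STARTED 18 12.4kb 127.0.0.1 node-1
"""

print(shard_states(SAMPLE))  # Counter({'STARTED': 3})
```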

For a RELOCATING shard, the command prints both the source node and the target node. Example from the official documentation:

> curl 192.168.56.10:9200/_cat/shards | fgrep RELO
wiki1 0 r RELOCATING 3014 31.1mb 192.168.56.20 Commander Kraken -> 192.168.56.30 Frankie Raye
wiki1 1 r RELOCATING 3013 29.6mb 192.168.56.10 Stiletto -> 192.168.56.30 Frankie Raye

View per-node allocation (/_cat/allocation)

Shows an overall view of shard allocation on each node. Example:

> curl localhost:9200/_cat/allocation?v

shards disk.used disk.avail disk.total disk.percent host ip node
5 20.3gb 302.1gb 322.5gb 6 127.0.0.1 127.0.0.1 node-1
5 20.3gb 302.1gb 322.5gb 6 127.0.0.1 127.0.0.1 node-2

Note: disk.used is the disk usage of the whole node, not just the size of its shards
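
When the output was captured without bytes=b, the human-readable sizes can be converted back to approximate byte counts. The sketch below assumes the units are 1024-based; to_bytes is an illustrative name, and because the displayed values are rounded, the result is only an approximation of the raw number.

```python
import re

# Assumption: units are powers of 1024.
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3,
         "tb": 1024 ** 4, "pb": 1024 ** 5}


def to_bytes(size):
    """Convert a size such as '20.3gb' or '780b' to an approximate byte count."""
    match = re.fullmatch(r"([\d.]+)(b|kb|mb|gb|tb|pb)", size.strip().lower())
    if match is None:
        raise ValueError("unrecognised size: {!r}".format(size))
    return int(float(match.group(1)) * UNITS[match.group(2)])


print(to_bytes("780b"))   # 780
print(to_bytes("1.5kb"))  # 1536
```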

View per-node custom attributes (/_cat/nodeattrs)

Shows the custom attributes of each node. Example:

> curl 192.168.56.10:9200/_cat/nodeattrs

node host ip attr value
Black Bolt epsilon 192.168.1.8 rack rack314
Black Bolt epsilon 192.168.1.8 azone us-east-1

View current cluster health (/_cat/health)

Shows the current state of the cluster, including basic figures such as the number of data nodes and primary shards. Example:

> curl localhost:9200/_cat/health?v
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1468399080 16:38:00 test-es green 2 2 30 15 0 0 0 0 - 100.0%

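A status of green means the cluster is healthy; yellow means every primary shard is allocated but some replicas are not; red means some primary shards are unallocated

Note: this command can also be used to follow the recovery that a node failure triggers. Example from the official documentation: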

> while true; do curl 192.168.56.10:9200/_cat/health; sleep 120; done
1384309446 18:24:06 foo red 3 3 20 20 0 0 1812 0
1384309566 18:26:06 foo yellow 3 3 950 916 0 12 870 0
1384309686 18:28:06 foo yellow 3 3 1328 916 0 12 492 0
1384309806 18:30:06 foo green 3 3 1832 916 4 0 0
^C
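
The polling loop above can also be written as a script that stops by itself once the cluster turns green. This is a minimal sketch: wait_for_green and fetch_health are illustrative names, the host is the one from the example, and the status is read from the fourth field of the headerless output. The dry run feeds in the four lines shown above instead of contacting a cluster.

```python
import time
from urllib.request import urlopen


def fetch_health(host="192.168.56.10:9200"):
    """Return one line of /_cat/health output (needs a reachable cluster)."""
    with urlopen("http://{}/_cat/health".format(host)) as resp:
        return resp.read().decode("utf-8")


def wait_for_green(fetch, interval=120, max_polls=30, sleep=time.sleep):
    """Poll until the status is green; return every status that was seen."""
    statuses = []
    for _ in range(max_polls):
        status = fetch().split()[3]  # epoch, timestamp, cluster, status, ...
        statuses.append(status)
        if status == "green":
            break
        sleep(interval)
    return statuses


# Dry run against the four lines from the official example, without sleeping.
canned = iter([
    "1384309446 18:24:06 foo red 3 3 20 20 0 0 1812 0",
    "1384309566 18:26:06 foo yellow 3 3 950 916 0 12 870 0",
    "1384309686 18:28:06 foo yellow 3 3 1328 916 0 12 492 0",
    "1384309806 18:30:06 foo green 3 3 1832 916 4 0 0",
])
print(wait_for_green(lambda: next(canned), sleep=lambda seconds: None))
# ['red', 'yellow', 'yellow', 'green']
```

Against a real cluster the call would be wait_for_green(fetch_health).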

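View the current state of each node (/_cat/nodes)

Shows the current state of every node in the cluster, including physical parameters (OS and JDK version, uptime, current memory/disk/file-descriptor usage) and request statistics (such as the number of successful and failed search/index requests). Example:

> curl "localhost:9200/_cat/nodes?v"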
host ip heap.percent ram.percent load node.role master name
127.0.0.1 127.0.0.1 5 93 0.26 d m node-2
127.0.0.1 127.0.0.1 9 93 0.26 d * node-1

View the master node (/_cat/master)

Shows which node is the cluster's master. Example:

> curl localhost:9200/_cat/master?v

id host ip node
i-SDbdpAQIaPv0J9SIoAvA 127.0.0.1 127.0.0.1 node-1

View fielddata memory usage (/_cat/fielddata)

Shows fielddata memory usage on each node of the cluster. Example:

> curl '192.168.56.10:9200/_cat/fielddata?v'
id host ip node total body text
c223lARiSGeezlbrcugAYQ myhost1 10.20.100.200 Jessica Jones 385.6kb 159.8kb 225.7kb
waPCbitNQaCL6xC8VxjAwg myhost2 10.20.100.201 Adversary 435.2kb 159.8kb 275.3kb
yaDkp-G3R0q1AJ-HUEvkSQ myhost3 10.20.100.202 Microchip 284.6kb 109.2kb 175.3kb

The total column is the memory that fielddata occupies on that node

View document counts (/_cat/count)

Shows the document count of the whole cluster, or of a single index with /_cat/count/${index}. Example:

> curl localhost:9200/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open idx2 5 1 100 0 92.5kb 32.6kb
green open idx1 5 1 100 0 97.7kb 51.6kb
green open customer 5 1 0 0 1.5kb 780b

> curl localhost:9200/_cat/count?v
epoch timestamp count
1468399423 16:43:43 200

> curl localhost:9200/_cat/count/idx1?v
epoch timestamp count
1468399428 16:43:48 100

> curl localhost:9200/_cat/count/idx2?v
epoch timestamp count
1468399430 16:43:50 100

The first two columns are the time at which the command ran; the third column, count, is the document count

View pending tasks (/_cat/pending_tasks)

Shows the cluster's pending tasks. Example:

> curl 'localhost:9200/_cat/pending_tasks?v'
insertOrder timeInQueue priority source
1685 855ms HIGH update-mapping [foo][t]
1686 843ms HIGH update-mapping [foo][t]
1693 753ms HIGH refresh-mapping [foo][[t]]
1688 816ms HIGH update-mapping [foo][t]
1689 802ms HIGH update-mapping [foo][t]
1690 787ms HIGH update-mapping [foo][t]
1691 773ms HIGH update-mapping [foo][t]

View plugins on each node (/_cat/plugins)

Shows the plugins installed on each node of the cluster. Example:

> curl "localhost:9200/_cat/plugins?v"
name component version type url
node-2 head master s /_plugin/head/
node-2 kopf 2.1.2 s /_plugin/kopf/

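View recovery progress (/_cat/recovery)

Shows the recovery progress of every shard in the cluster. Changing the replica count, restoring a snapshot, and starting a node all trigger shard recovery.

Example 1 (recovery on node startup, from the official docs):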

> curl -XGET 'localhost:9200/_cat/recovery?v'
index shard time type stage source target files percent bytes percent
wiki 0 73 store done hostA hostA 36 100.0% 24982806 100.0%
wiki 1 245 store done hostA hostA 33 100.0% 24501912 100.0%
wiki 2 230 store done hostA hostA 36 100.0% 30267222 100.0%

Example 2 (adding a replica, from the official docs):

> curl -XPUT 'localhost:9200/wiki/_settings' -d'{"number_of_replicas":1}'
{"acknowledged":true}

> curl -XGET 'localhost:9200/_cat/recovery?v'
index shard time type stage source target files percent bytes percent
wiki 0 1252 store done hostA hostA 4 100.0% 23638870 100.0%
wiki 0 1672 replica index hostA hostB 4 75.0% 23638870 48.8%
wiki 1 1698 replica index hostA hostB 4 75.0% 23348540 49.4%
wiki 1 4812 store done hostA hostA 33 100.0% 24501912 100.0%
wiki 2 1689 replica index hostA hostB 4 75.0% 28681851 40.2%
wiki 2 5317 store done hostA hostA 36 100.0% 30267222 100.0%

Example 3 (restoring a snapshot, from the official docs):

> curl -XPOST 'localhost:9200/_snapshot/imdb/snapshot_2/_restore'
{"acknowledged":true}

> curl -XGET 'localhost:9200/_cat/recovery?v'
index shard time type stage repository snapshot files percent bytes percent
imdb 0 1978 snapshot done imdb snap_1 79 8.0% 12086 9.0%
imdb 1 2790 snapshot index imdb snap_1 88 7.7% 11025 8.1%
imdb 2 2790 snapshot index imdb snap_1 85 0.0% 12072 0.0%
imdb 3 2796 snapshot index imdb snap_1 85 2.4% 12048 7.2%
imdb 4 819 snapshot init imdb snap_1 0 0.0% 0 0.0%

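View thread pool statistics on each node (/_cat/thread_pool)

Shows per-node statistics for the different thread pool types, covering every thread pool that serves external requests to ES. The available metrics include the pool type, thread keep-alive time, active and maximum thread counts, queue size, and the number of queued tasks. Example: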

> curl "localhost:9200/_cat/thread_pool?v"
host ip bulk.active bulk.queue bulk.rejected index.active index.queue index.rejected search.active search.queue search.rejected
127.0.0.1 127.0.0.1 0 0 0 0 0 0 0 0 0
127.0.0.1 127.0.0.1 0 0 0 0 0 0 0 0 0

The local machine was serving no index/search/bulk requests at the time, so the active/rejected/queue metrics in this example are all 0
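
On a busy cluster the rejected columns are usually the first thing to check. The sketch below scans `_cat/thread_pool?v` output for any non-zero *.rejected counter; rejected_pools is an illustrative name, and the sample is the all-zero output shown above, so the result is an empty list.

```python
def rejected_pools(text):
    """Return (ip, column, count) for every non-zero *.rejected counter."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    hits = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        for column, value in row.items():
            if column.endswith(".rejected") and int(value) > 0:
                hits.append((row["ip"], column, int(value)))
    return hits


# Output of `curl "localhost:9200/_cat/thread_pool?v"`, copied from above.
SAMPLE = """\
host ip bulk.active bulk.queue bulk.rejected index.active index.queue index.rejected search.active search.queue search.rejected
127.0.0.1 127.0.0.1 0 0 0 0 0 0 0 0 0
127.0.0.1 127.0.0.1 0 0 0 0 0 0 0 0 0
"""

print(rejected_pools(SAMPLE))  # []
```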


The _node and _cluster commands will be added later