Elasticsearch Study Notes: Tokenization and Analyzers

Date: 2021-08-09 06:56:16

1. Testing Elasticsearch's analyzers

Elasticsearch ships with several built-in analyzers (reference: https://www.jianshu.com/p/d57935ba514b).

Sample text: Set the shape to semi-transparent by calling set_trans(5)

(1) standard analyzer: the standard analyzer (the default). Tokens:
set, the, shape, to, semi, transparent, by, calling, set_trans, 5

(2) simple analyzer: the simple analyzer. Tokens:
set, the, shape, to, semi, transparent, by, calling, set, trans

(3) whitespace analyzer: splits on whitespace only; case, hyphens, underscores, and punctuation are left untouched. Tokens:
Set, the, shape, to, semi-transparent, by, calling, set_trans(5)

(4) language analyzer: language-specific analyzers, e.g. the english analyzer, which removes stopwords and stems words. Tokens:
set, shape, semi, transpar, call, set_tran, 5
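To get a feel for the differences above, here is a minimal Python sketch that mimics the simple and whitespace analyzers (this is an illustration only, not Elasticsearch's actual Lucene implementation):

```python
import re

def simple_analyzer(text):
    # Mimics ES's simple analyzer: lowercase the text, then split
    # on any run of non-letter characters and drop empty tokens.
    return [t for t in re.split(r"[^a-zA-Z]+", text.lower()) if t]

def whitespace_analyzer(text):
    # Mimics ES's whitespace analyzer: split on whitespace only;
    # case and punctuation are preserved as-is.
    return text.split()

text = "Set the shape to semi-transparent by calling set_trans(5)"
print(simple_analyzer(text))
# ['set', 'the', 'shape', 'to', 'semi', 'transparent', 'by', 'calling', 'set', 'trans']
print(whitespace_analyzer(text))
# ['Set', 'the', 'shape', 'to', 'semi-transparent', 'by', 'calling', 'set_trans(5)']
```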

2. Setting an analyzer on an Elasticsearch index

The following sets the default analyzer for every type in the index to simple:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": { "default": { "type": "simple" } }
    }
  }
}
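With that default in place, an _analyze request against the index that does not name an analyzer should use simple. A sketch in the same URL style as the tests below (my_index is the index created above):

```
http://localhost:9200/my_index/_analyze?pretty=true&text=test_测试
```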
Standard analyzer: standard

http://localhost:9200/_analyze?analyzer=standard&pretty=true&text=test测试

Result:

{
  "tokens" : [
    {
      "token" : "test",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "测",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "试",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    }
  ]
}

Simple analyzer: simple

http://localhost:9200/_analyze?analyzer=simple&pretty=true&text=test_测试

Result:

{
  "tokens" : [
    {
      "token" : "test",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "测试",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "word",
      "position" : 1
    }
  ]
}

IK analyzer: ik_max_word and ik_smart

It needs to be installed first:

https://github.com/medcl/elasticsearch-analysis-ik

Download the zip package and install it with elasticsearch-plugin install. The ES version on my machine is 5.6.10, so I installed the matching 5.6.10 release:

./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.10/elasticsearch-analysis-ik-5.6.10.zip

Then restart Elasticsearch and it is ready to use.

Test it:

http://localhost:9200/_analyze?analyzer=ik_max_word&pretty=true&text=test_tes_te测试

Result:

{
  "tokens" : [
    {
      "token" : "test_tes_te",
      "start_offset" : 0,
      "end_offset" : 11,
      "type" : "LETTER",
      "position" : 0
    },
    {
      "token" : "test",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "ENGLISH",
      "position" : 1
    },
    {
      "token" : "tes",
      "start_offset" : 5,
      "end_offset" : 8,
      "type" : "ENGLISH",
      "position" : 2
    },
    {
      "token" : "te",
      "start_offset" : 9,
      "end_offset" : 11,
      "type" : "ENGLISH",
      "position" : 3
    },
    {
      "token" : "测试",
      "start_offset" : 11,
      "end_offset" : 13,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}
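The plugin also provides ik_smart, which produces a coarser segmentation than ik_max_word (fewer, longer tokens). A common setup, following the pattern in the IK plugin's README, is to index with ik_max_word and search with ik_smart. A sketch (the index name ik_index and field name content are placeholders; the type name doc matches the 5.x single-type convention):

```
http://localhost:9200/_analyze?analyzer=ik_smart&pretty=true&text=test_tes_te测试

PUT ik_index
{
  "mappings": {
    "doc": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        }
      }
    }
  }
}
```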