Syncing a Database to Elasticsearch

Date: 2022-09-05 13:46:41

Syncing a database to Elasticsearch with elasticsearch-jdbc

I. Introduction to elasticsearch-jdbc
elasticsearch-jdbc is a plugin that syncs data from a database into Elasticsearch.
It was previously named elasticsearch-river-jdbc.

Download URL:
http://xbib.org/repository/org/xbib/elasticsearch/importer/elasticsearch-jdbc/<version>/elasticsearch-jdbc-<version>-dist.zip
Replace <version> with the release you need, for example:
http://xbib.org/repository/org/xbib/elasticsearch/importer/elasticsearch-jdbc/2.3.2.0/elasticsearch-jdbc-2.3.2.0-dist.zip
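The version substitution can be written as a small Python function. This is a sketch: `dist_url` is a hypothetical helper, and the URL pattern is inferred only from the 2.3.2.0 example, so other releases may be laid out differently.

```python
# Hypothetical helper: builds the elasticsearch-jdbc dist.zip URL for a version,
# following the layout of the 2.3.2.0 example URL (an inferred pattern).
BASE = "http://xbib.org/repository/org/xbib/elasticsearch/importer/elasticsearch-jdbc"


def dist_url(version: str) -> str:
    """Return the download URL of the dist.zip for the given version."""
    # The version appears twice: as a directory and inside the file name.
    return f"{BASE}/{version}/elasticsearch-jdbc-{version}-dist.zip"


if __name__ == "__main__":
    print(dist_url("2.3.2.0"))
```

Calling `dist_url("2.3.2.0")` reproduces the example URL exactly.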

Source code:
https://github.com/jprante/elasticsearch-jdbc

II. Using elasticsearch-jdbc
This section only covers syncing data from Microsoft SQL Server (MSSQL) into Elasticsearch with elasticsearch-jdbc.

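1. Create the index
curl -XPUT 'http://localhost:9200/baikeDb'
Note: Elasticsearch only accepts lowercase index names, so baikeDb is rejected as written; in practice use a lowercase name such as baikedb here and in the commands below.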
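Before creating an index, the name can be checked on the client side. The sketch below follows the restrictions listed in the Elasticsearch create-index documentation (lowercase only; none of \ / * ? " < > | space , #; no leading -, _ or +; not . or ..; at most 255 bytes). It is a convenience check, not a reimplementation of the server's validation, and `index_name_problems` is a hypothetical name.

```python
from typing import List

# Characters the Elasticsearch docs list as not allowed in index names.
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')
FORBIDDEN_PREFIXES = ("-", "_", "+")


def index_name_problems(name: str) -> List[str]:
    """Return reasons `name` would be rejected; an empty list means it looks valid."""
    if not name:
        return ["must not be empty"]
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("contains forbidden characters: " + " ".join(repr(c) for c in bad))
    if name.startswith(FORBIDDEN_PREFIXES):
        problems.append("must not start with '-', '_' or '+'")
    if name in (".", ".."):
        problems.append("must not be '.' or '..'")
    if len(name.encode("utf-8")) > 255:
        problems.append("must be at most 255 bytes")
    return problems


if __name__ == "__main__":
    for candidate in ("baikeDb", "baikedb"):
        print(candidate, index_name_problems(candidate) or "ok")
```

With the names from this article, `baikeDb` is reported as not lowercase while `baikedb` passes.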

2. Create the index mapping for the table

curl -XPUT 'http://localhost:9200/baikeDb/user/_mapping' -d '
{
  "user": {
    "properties": {
      "id": {
        "type": "string",
        "store": "yes"
      },
      "name": {
        "type": "string",
        "store": "yes"
      },
      "login_name": {
        "type": "string",
        "store": "yes"
      }
    }
  }
}'
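The mapping body in step 2 follows a simple pattern, one stored "string" field per column, so it can be generated from a column list. A minimal Python sketch; `string_mapping` is a hypothetical helper and the column names are the ones from the example. The "string" type matches the Elasticsearch 1.x/2.x versions this article targets; Elasticsearch 5.0 replaced it with "text" and "keyword".

```python
import json
from typing import Dict, List


def string_mapping(type_name: str, columns: List[str]) -> Dict:
    """Build a mapping body with one stored "string" field per column.

    Hypothetical helper that mirrors the hand-written body in step 2.
    """
    properties = {col: {"type": "string", "store": "yes"} for col in columns}
    return {type_name: {"properties": properties}}


if __name__ == "__main__":
    body = string_mapping("user", ["id", "name", "login_name"])
    # The printed JSON can be pasted after -d in the curl command above.
    print(json.dumps(body, indent=2))
```

Generating the body avoids hand-editing JSON when the table gains columns; the field order follows the list passed in.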

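3. Run a river to sync the data

Note: rivers were deprecated in Elasticsearch 1.5 and removed in 2.0, so the river mechanism used in this step and the next only works with Elasticsearch 1.x. The 2.x releases of elasticsearch-jdbc, such as 2.3.2.0, instead run as a standalone importer that is given a similar "type": "jdbc" JSON definition.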

curl -XPUT 'http://localhost:9200/_river/baikeDb/_meta' -d '{
  "type": "jdbc",
  "jdbc": {
    "url": "jdbc:sqlserver://localhost:1433;databaseName=baikeDb",
    "user": "sa",
    "password": "123456",
    "sql": "select id as _id, name, login_name from [user]",
    "index": "baikeDb",
    "type": "user",
    "bulk_size": 100,
    "max_bulk_requests": 30,
    "bulk_timeout": "10s",
    "flush_interval": "5s",
    "schedule": "0 0-59 0-23 ? * *"
  }
}'
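The schedule value uses Quartz-style cron syntax, whose fields are seconds, minutes, hours, day of month, month and day of week. "0 0-59 0-23 ? * *" therefore fires at second 0 of every minute of every hour, i.e. once per minute, and is equivalent to "0 * * ? * *". The Python sketch below expands only the three time fields to make that concrete. It supports `*`, `?`, single values, ranges and comma lists, and ignores step values (`/`), Quartz's `L`, `W` and `#`, and the date fields, so it is a reading aid rather than a scheduler; `expand` and `daily_fire_times` are hypothetical names.

```python
from typing import List


def expand(field: str, lo: int, hi: int) -> List[int]:
    """Expand a simple cron field (*, ?, N, A-B, comma lists) into sorted values."""
    if field in ("*", "?"):
        return list(range(lo, hi + 1))
    values = set()
    for part in field.split(","):
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            values.update(range(start, end + 1))
        else:
            values.add(int(part))
    return sorted(v for v in values if lo <= v <= hi)


def daily_fire_times(expr: str) -> List[str]:
    """Return the HH:MM:SS times a Quartz-style expression fires on a matching day."""
    seconds, minutes, hours = expr.split()[:3]
    # Hours outermost, seconds innermost, so the result is chronological.
    return [
        f"{h:02d}:{m:02d}:{s:02d}"
        for h in expand(hours, 0, 23)
        for m in expand(minutes, 0, 59)
        for s in expand(seconds, 0, 59)
    ]


if __name__ == "__main__":
    times = daily_fire_times("0 0-59 0-23 ? * *")
    print(len(times), times[:3])
```

For the article's schedule this yields 1440 firings per day, one at the start of each minute. A less aggressive schedule such as "0 0 9-17 ? * *" would sync once per hour during working hours.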

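4. Update the index incrementally
Incremental updates require the table to maintain a timestamp column (mytimestamp here). Each run selects only the rows whose timestamp is later than the river's last recorded run start ($river.state.last_active_begin), so only new or changed rows are re-indexed.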

curl -XPUT 'http://localhost:9200/_river/baikeDb/_meta' -d '{
  "type": "jdbc",
  "jdbc": {
    "url": "jdbc:sqlserver://localhost:1433;databaseName=baikeDb",
    "user": "sa",
    "password": "123456",
    "sql": [
      {
        "statement": "select id as _id, name, login_name from [user] where mytimestamp > ?",
        "parameter": [
          "$river.state.last_active_begin"
        ]
      }
    ],
    "index": "baikeDb",
    "type": "user",
    "bulk_size": 100,
    "max_bulk_requests": 30,
    "bulk_timeout": "10s",
    "flush_interval": "5s",
    "schedule": "0 0-59 0-23 ? * *"
  }
}'
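The incremental statement can be tried locally before pointing the river at SQL Server. This sketch swaps in Python's built-in sqlite3 module for SQL Server; the table contents and timestamps are made up for illustration, `last_run_start` stands in for $river.state.last_active_begin, and ORDER BY id is added only to make the output deterministic. `changed_rows` and `demo` are hypothetical names.

```python
import sqlite3
from typing import List, Tuple

# Same shape as the river statement; sqlite3 also binds '?' placeholders,
# and it accepts SQL Server-style [user] identifier quoting.
INCREMENTAL_SQL = (
    "SELECT id AS _id, name, login_name FROM [user] "
    "WHERE mytimestamp > ? ORDER BY id"
)


def changed_rows(conn: sqlite3.Connection, last_run_start: str) -> List[Tuple]:
    """Return rows whose timestamp is later than last_run_start."""
    return conn.execute(INCREMENTAL_SQL, (last_run_start,)).fetchall()


def demo() -> Tuple[List[Tuple], List[Tuple]]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE [user] (id INTEGER PRIMARY KEY, name TEXT, "
        "login_name TEXT, mytimestamp TEXT)"
    )
    # Made-up sample rows; ISO-8601 strings compare correctly as text.
    conn.executemany(
        "INSERT INTO [user] VALUES (?, ?, ?, ?)",
        [(1, "Alice", "alice", "2022-09-05 12:00:00"),
         (2, "Bob", "bob", "2022-09-05 13:30:00")],
    )
    first = changed_rows(conn, "2022-09-05 13:00:00")
    # An update must also bump the timestamp, or the next run will miss it.
    conn.execute(
        "UPDATE [user] SET name = ?, mytimestamp = ? WHERE id = ?",
        ("Alice B", "2022-09-05 13:45:00", 1),
    )
    second = changed_rows(conn, "2022-09-05 13:40:00")
    conn.close()
    return first, second


if __name__ == "__main__":
    print(demo())
```

The first run picks up only Bob's row; after Alice's row is updated with a new timestamp, the next run picks up only that row. This is also why the application (or a trigger) must keep mytimestamp current on every write.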

5. Delete the index

curl -XDELETE 'http://localhost:9200/baikeDb'
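The same delete can be issued from Python's standard library. This sketch only builds the request (sending it needs a running cluster) and, as a design choice of the sketch rather than anything Elasticsearch requires, refuses `_all`, wildcards and comma lists, because a DELETE on such targets can remove many indices at once unless the cluster sets action.destructive_requires_name. `delete_index_request` is a hypothetical name.

```python
import urllib.request


def delete_index_request(host: str, index: str) -> urllib.request.Request:
    """Build (without sending) a DELETE request for exactly one index."""
    # Guard against targets that could match several indices at once.
    if index in ("", "_all") or any(c in index for c in "*,"):
        raise ValueError(f"refusing to delete ambiguous target {index!r}")
    return urllib.request.Request(f"{host.rstrip('/')}/{index}", method="DELETE")


if __name__ == "__main__":
    req = delete_index_request("http://localhost:9200", "baikeDb")
    print(req.get_method(), req.full_url)
    # To send it to a running cluster:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status, resp.read().decode())
```

The printed method and URL match the curl command above.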

Reference:
http://blog.csdn.net/kingice1014/article/details/53492773