Can't force the grok parser to enforce integer/float types on haproxy logs

Time: 2022-01-16 08:45:15

Doesn't matter if it's integer/long or float; fields like time_duration (all the time_* fields, really) map as strings in the Kibana logstash index.

I tried using mutate (https://www.elastic.co/blog/little-logstash-lessons-part-using-grok-mutate-type-data), but that did not work either.

How can I correctly enforce numeric types instead of strings on these fields?

My /etc/logstash/conf.d/haproxy.conf:

input {
  # Receive haproxy logs over syslog on port 5515
  syslog {
    type => "haproxy"
    port => 5515
  }
}
filter {
  if [type] == "haproxy" {
    # Parse the haproxy HTTP log line with the custom HAPROXYHTTP pattern below
    grok {
      patterns_dir => "/usr/local/etc/logstash/patterns"
      match => ["message", "%{HAPROXYHTTP}"]
      named_captures_only => true
    }
    # Enrich with GeoIP data and build a [lon, lat] coordinates array
    geoip {
      source => "client_ip"
      target => "geoip"
      database => "/etc/logstash/GeoLiteCity.dat"
      add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
      add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}"  ]
    }
    mutate {
      convert => [ "[geoip][coordinates]", "float"]
    }
  }
}
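The mutate attempt was along the lines of the following sketch (a reconstruction, not the exact snippet; field names are taken from the HAPROXYHTTP pattern below). Note that this only casts the values inside the Logstash event; it cannot change an index whose mapping has already been created as string:

filter {
  if [type] == "haproxy" {
    mutate {
      # Cast the grok string captures to integers in the event.
      # This does not alter an existing Elasticsearch mapping.
      convert => [
        "time_request", "integer",
        "time_queue", "integer",
        "time_backend_connect", "integer",
        "time_backend_response", "integer",
        "time_duration", "integer"
      ]
    }
  }
}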

And my pattern for HAPROXYHTTP:

HAPROXYHTTP  %{IP:client_ip}:%{INT:client_port} \[%{HAPROXYDATE:accept_date}\] %{NOTSPACE:frontend_name} %{NOTSPACE:backend_name}/%{NOTSPACE:server_name} %{INT:time_request:int}/%{INT:time_queue:int}/%{INT:time_backend_connect:int}/%{INT:time_backend_response:int}/%{NOTSPACE:time_duration:int} %{INT:http_status_code} %{NOTSPACE:bytes_read:int} %{DATA:captured_request_cookie} %{DATA:captured_response_cookie} %{NOTSPACE:termination_state} %{INT:actconn:int}/%{INT:feconn:int}/%{INT:beconn:int}/%{INT:srvconn:int}/%{NOTSPACE:retries:int} %{INT:srv_queue:int}/%{INT:backend_queue:int} (\{%{HAPROXYCAPTUREDREQUESTHEADERS}\})?( )?(\{%{HAPROXYCAPTUREDRESPONSEHEADERS}\})?( )?"(<BADREQ>|(%{WORD:http_verb} (%{URIPROTO:http_proto}://)?(?:%{USER:http_user}(?::[^@]*)?@)?(?:%{URIHOST:http_host})?(?:%{URIPATHPARAM:http_request})?( HTTP/%{NUMBER:http_version})?))?"
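For clarity, the :int suffix used throughout the pattern (e.g. %{INT:time_duration:int}) is grok's built-in type cast, which turns the capture into an integer within the Logstash event. A stripped-down illustration (the message format here is hypothetical):

filter {
  grok {
    # Without ":int", grok captures are always strings.
    match => ["message", "%{INT:time_duration:int} %{INT:http_status_code}"]
  }
}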

1 Solution

#1


It's quite possible that Logstash is doing the right thing here (your configuration looks correct), but how Elasticsearch maps the fields is another matter. If a field in an Elasticsearch index has at some point been dynamically mapped as a string, subsequent documents added to the same index will also have that field mapped as a string, even if it's an integer or a floating-point number in the source document. To change this you have to reindex, but with time-series-based Logstash indexes you can simply wait until the next day, when a new index is created.
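If you want to verify or fix the mapping rather than wait for the next index, something along these lines should work. Treat it as a sketch: the index name is an example, and the exact API is version-dependent (newer Elasticsearch versions require a Content-Type: application/json header and replace the legacy _template endpoint with _index_template):

# Inspect how the fields are currently mapped
curl -s 'http://localhost:9200/logstash-2022.01.16/_mapping?pretty'

# Legacy index template applying numeric mappings to future logstash-* indices
curl -XPUT 'http://localhost:9200/_template/haproxy_numeric' -d '{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "properties": {
        "time_request":  { "type": "integer" },
        "time_duration": { "type": "integer" }
      }
    }
  }
}'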
