Handling Data Write Failures After Upgrading Elasticsearch from 5.x to 6.x

Original article by zjiekou, published 2022-11-30 14:52:07

1. Problem Background

After upgrading their cloud Elasticsearch cluster from 5.6.4 to 6.8.2, a customer started seeing write failures and data loss, and asked us to help resolve the issue urgently.

The customer's ingest pipeline is filebeat → logstash → Elasticsearch.

2. Root Cause Analysis

The Logstash logs contained many errors like the following:

Could not index event to Elasticsearch. {:status=>400, :action=>"index", {:_id=>nil, :_index=>"logstash-f1-hq-access-2022.09.02.12", :routing=>nil, :_type=>"doc"}, #<LogStash::Event:0x38205eef>, :response=>{"index"=>{"index"=>"logstash-f1-hq-access-2022.09.02.12", "_type"=>"doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"Failed to parse mapping [_default]: include_in_all is not allowed for indices created on or after version 6.0.0 as _all is deprecated. As a replacement, you can use an copy_to on mapping fields to create your own catch all field.", "caused_by"=>{"type"=>"mapper_parsing_exception", "reason"=>"include_in_all is not allowed for indices created on or after version 6.0.0 as _all is deprecated. As a replacement, you can use an copy_to on mapping fields to create your own catch all field."}}}}}

The error says that the mapping parameter include_in_all is not allowed in indices created on or after version 6.0 (indices created on 5.x with this setting remain compatible after the upgrade to 6.x). For details, see "The include_in_all mapping parameter is now disallowed".
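As an illustration of the copy_to replacement that the error message recommends, a catch-all field can be built explicitly (this snippet is our own sketch, not part of the customer's template; the index and field names are made up):

```
PUT demo-index
{
  "mappings": {
    "doc": {
      "properties": {
        "message":   { "type": "text",    "copy_to": "catch_all" },
        "host":      { "type": "keyword", "copy_to": "catch_all" },
        "catch_all": { "type": "text" }
      }
    }
  }
}
```

Queries that previously relied on _all can then target catch_all instead.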

The customer creates indices and writes data to ES through a Logstash index template, so we started from the template to pin down the problem.

The customer's Logstash index template:

{
  "order": 0,
  "version": 50001,
  "index_patterns": [
    "logstash-*"
  ],
  "settings": {
    "index": {
      "refresh_interval": "5s"
    }
  },
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "message_field": {
            "path_match": "message",
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "norms": false
            }
          }
        },
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "norms": false,
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ],
      "properties": {
        "@timestamp": {
          "type": "date",
          "include_in_all": true
        },
        "@version": {
          "type": "keyword",
          "include_in_all": true
        },
        "geoip": {
          "dynamic": true,
          "properties": {
            "ip": {
              "type": "ip"
            },
            "location": {
              "type": "geo_point"
            },
            "latitude": {
              "type": "half_float"
            },
            "longitude": {
              "type": "half_float"
            }
          }
        }
      }
    }
  },
  "aliases": {}
}

3. Solution

1. The @timestamp and @version fields in the template both used include_in_all. After a call with the customer, we changed "include_in_all": true to "include_in_all": false, and writes to ES succeeded again.

The adjusted template:

{
  "order": 0,
  "version": 50001,
  "index_patterns": [
    "logstash-*"
  ],
  "settings": {
    "index": {
      "refresh_interval": "5s"
    }
  },
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "message_field": {
            "path_match": "message",
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "norms": false
            }
          }
        },
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "text",
              "norms": false,
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ],
      "properties": {
        "@timestamp": {
          "type": "date",
          "include_in_all": false
        },
        "@version": {
          "type": "keyword",
          "include_in_all": false
        },
        "geoip": {
          "dynamic": true,
          "properties": {
            "ip": {
              "type": "ip"
            },
            "location": {
              "type": "geo_point"
            },
            "latitude": {
              "type": "half_float"
            },
            "longitude": {
              "type": "half_float"
            }
          }
        }
      }
    }
  },
  "aliases": {}
}
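For templates with many more fields, editing the JSON by hand is error-prone. As a sketch (our own code, not part of the original fix), the offending parameter can also be stripped programmatically before re-uploading the template via the PUT _template API:

```python
import json

def strip_include_in_all(node):
    """Recursively remove every `include_in_all` key from a mapping tree."""
    if isinstance(node, dict):
        node.pop("include_in_all", None)
        for value in node.values():
            strip_include_in_all(value)
    elif isinstance(node, list):
        for item in node:
            strip_include_in_all(item)
    return node

# A trimmed-down stand-in for the customer's template shown above.
template = {
    "mappings": {
        "_default_": {
            "properties": {
                "@timestamp": {"type": "date", "include_in_all": True},
                "@version": {"type": "keyword", "include_in_all": True},
            }
        }
    }
}

cleaned = strip_include_in_all(template)
print(json.dumps(cleaned))
```

The cleaned document can then be PUT back to `_template/logstash` with any HTTP client; removing the key entirely sidesteps the deprecated parameter altogether.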

2. Customer question: new data is now being written successfully, but how do we recover the data that was lost?

The customer's Filebeat sends data continuously, and Filebeat persists its delivery state, so the data is not actually lost on its side; the missing slice just needs to be re-sent.

How Filebeat ensures file contents are not lost (at-least-once delivery)

The registry records the last offset each harvester has read in every file, and an offset is only recorded once the data has been acknowledged as sent. If a send fails, Filebeat keeps retrying.

If Filebeat needs to be shut down while running, it does not wait for all receivers to acknowledge outstanding events; it closes immediately. On the next start, the unacknowledged events are sent again (at-least-once delivery).

We therefore backfilled the missing data as follows:

  1. Run a second Filebeat instance whose configuration points at the files to be replayed, shipping them to a new index, say A1.
  2. Use the _reindex API to copy the documents from A1 within the missing window (for example, 19:01:03 to 21:20:04) into the target index A2. A reference command:

POST _reindex
{
  "source": {
    "index": "A1",
    "query": {
      "range": {
        "@timestamp": {
          "from": "2022-09-02 19:00:00.001",
          "to": "2022-09-02 20:40:00.001",
          "include_lower": true,
          "include_upper": true,
          "time_zone": "+08:00",
          "format": "yyyy-MM-dd HH:mm:ss.SSS"
        }
      }
    }
  },
  "dest": {
    "index": "A2"
  }
}
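Step 1 above might look roughly like the following second Filebeat configuration (a sketch only; all paths, hosts, and the index name are placeholders, and on Filebeat 6.3+ `filebeat.prospectors` becomes `filebeat.inputs`):

```yaml
# Replay instance: reads the backlog files from offset 0 and ships them to a
# temporary index "A1". Every path and host below is a placeholder.
path.data: /tmp/filebeat-replay        # fresh registry, so files are re-read
filebeat.prospectors:
  - type: log
    paths:
      - /data/logs/access-2022-09-02.log
output.elasticsearch:
  hosts: ["es-host:9200"]
  index: "A1"
setup.template.name: "A1"              # required when overriding the index
setup.template.pattern: "A1*"
```

Pointing `path.data` at a fresh directory gives this instance its own empty registry, which is what forces the files to be re-read from the beginning.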

4. Upgrade Considerations

When upgrading across major versions, compatibility must be handled carefully: every warning and error reported by the upgrade check should be reviewed and resolved before proceeding.

Reference: ES version upgrade check
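With X-Pack installed, Elasticsearch 5.6 and later expose a deprecation-check API that lists exactly this kind of incompatibility before an upgrade; querying it (from Kibana Dev Tools, for example) is a quick first step:

```
GET /_xpack/migration/deprecations
```

The response groups issues by cluster settings, node settings, and index settings; upgrading is safest once that list is clean.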

Original content statement: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.

For infringement concerns, contact cloudcommunity@tencent.com for removal.

