
A Lost ES Node Made Real-Time Data Ingestion Extremely Slow

Author: YG
Published 2018-05-23 17:17:32
Included in the column: YG小书屋

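One node crashed and could not restart on its own. Data was being imported through Logstash, and because the index receiving that day's data had 0 replicas, losing the node meant losing some of its shards, which kept the cluster in red status. From that point on, the ingestion rate for that index dropped sharply.

Testing showed that the cause lay in Logstash. Its input stage runs in one thread, while filter and output share another thread, and the two stages hand data over through a synchronous queue. When something goes wrong during output, the failed data is put back into the synchronous queue with no retry limit. The queued data is then routed to shards and imported again; whatever lands on a lost shard fails again and goes back into the queue. The data therefore cycles endlessly between the synchronous queue and Elasticsearch bulk requests, slowing down ingestion for the entire index.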
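
The feedback loop described above can be illustrated with a toy model. This is a deliberate simplification, not Logstash's real code: the batch size, the modulo routing (Elasticsearch actually routes on a hash of the document's routing value, by default its `_id`), and the rule that requeued events go out before fresh ones are all assumptions made for illustration. Only the 8 primary shards and the two lost shards (1 and 5) come from the test below.

```python
from typing import List

# Toy model of the retry loop (assumption, not Logstash internals):
# failed events are requeued forever and always go out first in the next batch.
NUM_SHARDS = 8        # the test index had 8 primary shards (0-7)
DEAD_SHARDS = {1, 5}  # shards left UNASSIGNED after the node was shut down


def route(doc_id: int) -> int:
    """Hypothetical routing: doc_id modulo the shard count.

    Elasticsearch hashes the routing value instead; plain modulo keeps
    the toy model deterministic and easy to follow.
    """
    return doc_id % NUM_SHARDS


def simulate(rounds: int, batch_size: int = 8) -> List[int]:
    """Return how many events were indexed successfully in each bulk round."""
    retry: List[int] = []  # failed events waiting to be resent
    next_id = 0            # id of the next fresh event from the input stage
    indexed: List[int] = []
    for _ in range(rounds):
        # Requeued events take priority; fresh events fill the rest of the batch.
        batch = retry[:batch_size]
        retry = retry[batch_size:]
        while len(batch) < batch_size:
            batch.append(next_id)
            next_id += 1
        ok = [d for d in batch if route(d) not in DEAD_SHARDS]
        # Events routed to a lost primary fail and are requeued without limit.
        retry.extend(d for d in batch if route(d) in DEAD_SHARDS)
        indexed.append(len(ok))
    return indexed


if __name__ == "__main__":
    print(simulate(12))
```

Running it prints `[6, 4, 3, 3, 2, 1, 1, 1, 1, 0, 0, 0]`: the healthy shards receive fewer and fewer events as doomed events crowd each batch. The model exaggerates the effect, since dead-shard events never leave and throughput eventually reaches zero; in the real test below, ingestion became very slow rather than stopping.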

The results reproduced on test machines were as follows.

1. Normal ingestion:

xxx-20170925              1     p      STARTED   24713  24.7mb xxx.7.67   node-xxx.7.67-performance_test
xxx-20170925              5     p      STARTED   24256  33.7mb xxx.7.67   node-xxx.7.67-performance_test
xxx-20170925              2     p      STARTED   24702  24.2mb xxx.11.131 node-xxx.11.131-performance_test
xxx-20170925              3     p      STARTED   24626  24.2mb xxx.7.81   node-xxx.7.81-performance_test
xxx-20170925              7     p      STARTED   24916  34.2mb xxx.7.81   node-xxx.7.81-performance_test
xxx-20170925              4     p      STARTED   23970  38.2mb xxx.6.105  node-xxx.6.105-performance_test
xxx-20170925              6     p      STARTED   24786    24mb xxx.11.131 node-xxx.11.131-performance_test
xxx-20170925              0     p      STARTED   24824  34.4mb xxx.6.105  node-xxx.6.105-performance_test

2. Shut down one node (xxx.7.67, which hosted primaries 1 and 5):

xxx-20170925              6     p      STARTED     128179 110.8mb xxx.11.131 node-xxx.11.131-performance_test
xxx-20170925              1     p      UNASSIGNED                                
xxx-20170925              4     p      STARTED     128263 108.1mb xxx.6.105  node-xxx.6.105-performance_test
xxx-20170925              7     p      STARTED     128593 109.3mb xxx.7.81   node-xxx.7.81-performance_test
xxx-20170925              2     p      STARTED     128613 112.8mb xxx.11.131 node-xxx.11.131-performance_test
xxx-20170925              5     p      UNASSIGNED                                
xxx-20170925              3     p      STARTED     127969 115.6mb xxx.7.81   node-xxx.7.81-performance_test
xxx-20170925              0     p      STARTED     128322 110.3mb xxx.6.105  node-xxx.6.105-performance_test
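
The red status follows from Elasticsearch's health rule: red when at least one primary shard is unassigned, yellow when all primaries are assigned but at least one replica is not, green otherwise. Because the index had 0 replicas, there was no copy of shards 1 and 5 to promote to primary when their node went down. Below is a minimal sketch of that rule applied to the step-2 shard list; `index_health` and the tuple layout are hypothetical helpers for illustration, not an Elasticsearch API, and real cluster health handles further shard states that this sketch ignores.

```python
from typing import List, Tuple

# (shard number, "p" for primary or "r" for replica, state), as in _cat/shards.
Row = Tuple[int, str, str]


def index_health(rows: List[Row]) -> str:
    """Simplified health rule; only STARTED/UNASSIGNED states are modeled."""
    unassigned = {prirep for _, prirep, state in rows if state == "UNASSIGNED"}
    if "p" in unassigned:
        return "red"     # some primary has no active copy: its data is unavailable
    if "r" in unassigned:
        return "yellow"  # all primaries assigned, some replicas missing
    return "green"


# Step 2 above: 0 replicas, primaries 1 and 5 lost with the node.
after_shutdown: List[Row] = [
    (6, "p", "STARTED"), (1, "p", "UNASSIGNED"), (4, "p", "STARTED"),
    (7, "p", "STARTED"), (2, "p", "STARTED"), (5, "p", "UNASSIGNED"),
    (3, "p", "STARTED"), (0, "p", "STARTED"),
]
print(index_health(after_shutdown))  # red
```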

3. Checking the shards again after a while showed that the remaining shards were growing very slowly:

xxx-20170925              6     p      STARTED     128436 111.1mb xxx.11.131 node-xxx.11.131-performance_test
xxx-20170925              5     p      UNASSIGNED                                
xxx-20170925              3     p      STARTED     128231 110.9mb xxx.7.81   node-xxx.7.81-performance_test
xxx-20170925              7     p      STARTED     128814 109.6mb xxx.7.81   node-xxx.7.81-performance_test
xxx-20170925              1     p      UNASSIGNED                                
xxx-20170925              2     p      STARTED     128871 182.6mb xxx.11.131 node-xxx.11.131-performance_test
xxx-20170925              4     p      STARTED     128502 108.5mb xxx.6.105  node-xxx.6.105-performance_test
xxx-20170925              0     p      STARTED     128568 109.1mb xxx.6.105  node-xxx.6.105-performance_test
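
To put a number on "growing very slowly", the step-2 and step-3 snapshots can be diffed shard by shard. The sketch below assumes the default `_cat/shards` column order (index, shard, prirep, state, docs, store, ip, node) and that UNASSIGNED rows carry no doc count, as in the output above; `parse_cat_shards` and `growth` are hypothetical helper names.

```python
from typing import Dict, Optional

# Snapshots copied from steps 2 and 3 above.
AFTER_SHUTDOWN = """
xxx-20170925              6     p      STARTED     128179 110.8mb xxx.11.131 node-xxx.11.131-performance_test
xxx-20170925              1     p      UNASSIGNED
xxx-20170925              4     p      STARTED     128263 108.1mb xxx.6.105  node-xxx.6.105-performance_test
xxx-20170925              7     p      STARTED     128593 109.3mb xxx.7.81   node-xxx.7.81-performance_test
xxx-20170925              2     p      STARTED     128613 112.8mb xxx.11.131 node-xxx.11.131-performance_test
xxx-20170925              5     p      UNASSIGNED
xxx-20170925              3     p      STARTED     127969 115.6mb xxx.7.81   node-xxx.7.81-performance_test
xxx-20170925              0     p      STARTED     128322 110.3mb xxx.6.105  node-xxx.6.105-performance_test
"""

LATER = """
xxx-20170925              6     p      STARTED     128436 111.1mb xxx.11.131 node-xxx.11.131-performance_test
xxx-20170925              5     p      UNASSIGNED
xxx-20170925              3     p      STARTED     128231 110.9mb xxx.7.81   node-xxx.7.81-performance_test
xxx-20170925              7     p      STARTED     128814 109.6mb xxx.7.81   node-xxx.7.81-performance_test
xxx-20170925              1     p      UNASSIGNED
xxx-20170925              2     p      STARTED     128871 182.6mb xxx.11.131 node-xxx.11.131-performance_test
xxx-20170925              4     p      STARTED     128502 108.5mb xxx.6.105  node-xxx.6.105-performance_test
xxx-20170925              0     p      STARTED     128568 109.1mb xxx.6.105  node-xxx.6.105-performance_test
"""


def parse_cat_shards(text: str) -> Dict[int, Optional[int]]:
    """Map shard number -> doc count (None when the row has no doc count)."""
    docs: Dict[int, Optional[int]] = {}
    for line in text.strip().splitlines():
        cols = line.split()
        shard, state = int(cols[1]), cols[3]
        # STARTED rows carry docs in the 5th column; UNASSIGNED rows end after the state.
        docs[shard] = int(cols[4]) if state == "STARTED" else None
    return docs


def growth(before: str, after: str) -> Dict[int, int]:
    """Doc-count change for every shard that has a count in both snapshots."""
    b, a = parse_cat_shards(before), parse_cat_shards(after)
    return {
        shard: a[shard] - b[shard]
        for shard in sorted(b)
        if b[shard] is not None and a.get(shard) is not None
    }


if __name__ == "__main__":
    print(growth(AFTER_SHUTDOWN, LATER))
```

This prints `{0: 246, 2: 258, 3: 262, 4: 239, 6: 257, 7: 221}`: each healthy primary gained only a few hundred documents, 1,483 in total. The article does not record the time between the two snapshots, so these are absolute deltas, not rates.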

The Logstash log:

[2017-11-21T11:04:26,780][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[xxx-20170925][5] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [xxx-20170925] containing [19] requests]"})
[2017-11-21T11:04:26,780][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[xxx-20170925][5] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [xxx-20170925] containing [19] requests]"})
[2017-11-21T11:04:26,780][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[xxx-20170925][1] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [xxx-20170925] containing [15] requests]"})
[2017-11-21T11:04:26,780][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[xxx-20170925][5] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [xxx-20170925] containing [19] requests]"})
[2017-11-21T11:04:26,784][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[xxx-20170925][5] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [xxx-20170925] containing [19] requests]"})
[2017-11-21T11:04:26,784][ERROR][logstash.outputs.elasticsearch] Retrying individual actions
[2017-11-21T11:04:26,784][ERROR][logstash.outputs.elasticsearch] Action
[2017-11-21T11:04:26,784][ERROR][logstash.outputs.elasticsearch] Action
[2017-11-21T11:04:26,784][ERROR][logstash.outputs.elasticsearch] Action
[2017-11-21T11:04:26,784][ERROR][logstash.outputs.elasticsearch] Action
[2017-11-21T11:04:26,784][ERROR][logstash.outputs.elasticsearch] Action
[2017-11-21T11:04:26,784][ERROR][logstash.outputs.elasticsearch] Action
[2017-11-21T11:04:26,784][ERROR][logstash.outputs.elasticsearch] Action
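
The log shows that only the per-shard bulk requests for shards 1 and 5 are rejected with 503, while the other shards' requests go through. A small sketch that tallies those retries per shard; the regular expression is written against this excerpt only and is an assumption about the message format, and each count is a retry attempt, not a distinct document.

```python
import re
from collections import Counter

# Matches e.g. "[xxx-20170925][5] primary shard is not active" inside the reason text.
UNAVAILABLE = re.compile(r"\[([^\]]+)\]\[(\d+)\] primary shard is not active")

LOG_LINES = [
    '[2017-11-21T11:04:26,780][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[xxx-20170925][5] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [xxx-20170925] containing [19] requests]"})',
    '[2017-11-21T11:04:26,780][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[xxx-20170925][5] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [xxx-20170925] containing [19] requests]"})',
    '[2017-11-21T11:04:26,780][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[xxx-20170925][1] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [xxx-20170925] containing [15] requests]"})',
    '[2017-11-21T11:04:26,780][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[xxx-20170925][5] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [xxx-20170925] containing [19] requests]"})',
    '[2017-11-21T11:04:26,784][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[xxx-20170925][5] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [xxx-20170925] containing [19] requests]"})',
    '[2017-11-21T11:04:26,784][ERROR][logstash.outputs.elasticsearch] Action',
]


def count_unavailable(lines):
    """Count retries per (index, shard) whose primary shard was not active."""
    counts = Counter()
    for line in lines:
        match = UNAVAILABLE.search(line)
        if match:  # lines without the message (e.g. "Action") are skipped
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    print(count_unavailable(LOG_LINES))
```

On this excerpt it reports four retries for shard 5 and one for shard 1, exactly the two primaries shown as UNASSIGNED.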

4. After the data recovered:

xxx-20170925              4     p      STARTED     154764 125.3mb xxx.6.105  node-xxx.6.105-performance_test
xxx-20170925              5     p      STARTED     157936 126.4mb xxx.7.67   node-xxx.7.67-performance_test
xxx-20170925              2     p      STARTED     154945 138.9mb xxx.11.131 node-xxx.11.131-performance_test
xxx-20170925              7     p      STARTED     155224 156.8mb xxx.7.81   node-xxx.7.81-performance_test
xxx-20170925              1     p      STARTED     158080 124.8mb xxx.7.67   node-xxx.7.67-performance_test
xxx-20170925              3     p      STARTED     154243 153.8mb xxx.7.81   node-xxx.7.81-performance_test
xxx-20170925              6     p      STARTED     154909 146.9mb xxx.11.131 node-xxx.11.131-performance_test
xxx-20170925              0     p      STARTED     154681   127mb xxx.6.105  node-xxx.6.105-performance_test
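
This article is part of the Tencent Cloud self-media sharing program and was shared from the author's personal site/blog.
Originally published 2017.11.22. For infringement concerns, contact cloudcommunity@tencent.com for removal.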
