How to Monitor Zookeeper

heidsoft
Published 2018-10-16 11:11:42

Monitoring Zookeeper: Metrics and Alerts

As in previous articles, our general rule of thumb is “collect every reasonable metric that can help when troubleshooting, but alert only on those that require action from you”. The list of Zookeeper metrics that satisfies these criteria is not that long.

Zookeeper process is running

| Metric | Comments | Suggested Alert |
| --- | --- | --- |
| Zookeeper process | Is the right binary daemon process running? | When the process list does not contain a match for the regexp /usr/bin/java.*org\.apache\.zookeeper. |

You can also use the following script to check if the server is running:

$INSTALL_PREFIX/zk-server-3/bin/zkServer.sh status

Or, if you run Zookeeper via supervisord (recommended), you can alert on the supervisord resource instead.
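The process check can also be scripted. The following is a minimal sketch, assuming a Java binary at /usr/bin/java and a command line that mentions the org.apache.zookeeper package; the function name zookeeper_running is hypothetical, and the pattern will need adjusting to your installation.

```python
import re

# Assumed pattern: the Java binary path followed, somewhere later on the
# command line, by a class in the org.apache.zookeeper package.
ZK_PROCESS_RE = re.compile(r"/usr/bin/java.*org\.apache\.zookeeper")


def zookeeper_running(process_lines):
    """Return True if any command line looks like a Zookeeper server."""
    return any(ZK_PROCESS_RE.search(line) for line in process_lines)
```

You could feed it the output of `ps -eo args` split into lines, and raise an alert when it returns False.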

System Metrics

| Metric | Comments | Suggested Alert |
| --- | --- | --- |
| Memory usage | Zookeeper should run entirely in RAM. To avoid swapping, the JVM heap size should not exceed your available RAM. | None |
| Swap usage | Watch for swap usage, as it degrades Zookeeper performance and leads to operations timing out (set vm.swappiness = 0). | When used swap is > 128MB. |
| Network bandwidth | Zookeeper servers can incur high network usage. Keep an eye on this, especially if you notice any performance degradation, and look out for dropped packet errors. A typical Zookeeper workload is 20% writes and 80% reads. More nodes result in more writes and higher overall traffic. | None |
| Disk usage | Zookeeper data is usually ephemeral and small. Still, we recommend putting dataLogDir on a dedicated partition and watching disk usage. Use the purge task to clean up dataDir and dataLogDir. | When disk usage is > 85%. |

Zookeeper writes snapshots asynchronously, but it syncs the transaction log to disk before acknowledging a write, so a slow or contended disk shows up directly as request latency. Keep an eye on IO, especially if your server is shared with other services, say Kafka.
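The swap and disk thresholds from the System Metrics table can be checked with a short script. This is a sketch only, assuming a Linux host where /proc/meminfo is available; the function names and the data_log_dir parameter are hypothetical, and the percentage is computed from raw used/total bytes, which may differ slightly from what df reports.

```python
import shutil

SWAP_ALERT_BYTES = 128 * 1024 * 1024  # "used swap > 128MB"
DISK_ALERT_PERCENT = 85.0             # "disk usage > 85%"


def used_swap_bytes(meminfo_text):
    """Compute used swap from the text of /proc/meminfo (values are in kB)."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return (fields["SwapTotal"] - fields["SwapFree"]) * 1024


def disk_used_percent(path):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def system_alerts(meminfo_text, data_log_dir):
    """Return the names of the system alerts that should fire."""
    alerts = []
    if used_swap_bytes(meminfo_text) > SWAP_ALERT_BYTES:
        alerts.append("swap")
    if disk_used_percent(data_log_dir) > DISK_ALERT_PERCENT:
        alerts.append("disk")
    return alerts
```

A caller would read /proc/meminfo into a string and pass the path of the partition that holds dataLogDir.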

Here is how Server Density graphs disk usage and memory usage. Note the up and down curves created by the purge task:

And here are some Zookeeper alerts configured in Server Density:

Zookeeper Metrics

| Metric | Comments | Suggested Alert |
| --- | --- | --- |
| Request Avg/Max Latency | Time the server takes to respond to a client request, measured since the server was started. | When latency > 10 (Ticks). |
| Outstanding Requests | Number of queued requests in the server. This goes up when the server receives more requests than it can process. | When count > 10. |
| Received | Number of client requests (typically operations) received. | None |
| Sent | Number of client packets sent (responses and notifications). | None |
| File Descriptors | Number of file descriptors in use, relative to the limit. | When FD percentage > 85%. |
| Mode | Serving mode: leader or follower, or standalone if not running in an ensemble. | None |
| Pending syncs | (Only exposed by the leader) Number of pending syncs from the followers. | When pending > 10. |
| Followers | (Only exposed by the leader) Number of followers within the ensemble. You can deduce the number of servers from the MBean Quorum Size. | When followers != (number of ensemble servers - 1). |
| Node count | Number of znodes in the Zookeeper namespace. | None |
| Watch count | Number of watches set on Zookeeper nodes. | None |
| Heap Memory Usage | Memory allocated dynamically by the Java process, Zookeeper in this case. | None |
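The suggested alerts in this table reduce to a handful of comparisons. The sketch below assumes the metrics arrive as a dict keyed by the names that the mntr command prints (shown later in this article); the function name zookeeper_alerts and the ensemble_size parameter are hypothetical, and the latency threshold is applied to the reported value as-is, without unit conversion.

```python
def zookeeper_alerts(metrics, ensemble_size):
    """Return the names of the suggested alerts that should fire."""
    alerts = []
    if metrics.get("zk_avg_latency", 0) > 10:
        alerts.append("latency")
    if metrics.get("zk_outstanding_requests", 0) > 10:
        alerts.append("outstanding_requests")

    # File descriptor counts are only reported on Unix platforms.
    open_fds = metrics.get("zk_open_file_descriptor_count")
    max_fds = metrics.get("zk_max_file_descriptor_count")
    if open_fds is not None and max_fds:
        if 100.0 * open_fds / max_fds > 85:
            alerts.append("file_descriptors")

    # Pending syncs and follower count are only exposed by the leader.
    if metrics.get("zk_server_state") == "leader":
        if metrics.get("zk_pending_syncs", 0) > 10:
            alerts.append("pending_syncs")
        if metrics.get("zk_followers") != ensemble_size - 1:
            alerts.append("followers")
    return alerts
```

Restricting the leader-only checks to the leader avoids false follower-count alerts on every other server in the ensemble.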

Here is a Zookeeper monitoring graph including Latency average and Outstanding requests:

Zookeeper Monitoring Tools

The simplest way to monitor Zookeeper and collect these metrics is by using the commands known as “4 letter words” within the ZK community. You can run these using telnet or netcat directly:

$ echo ruok | nc 127.0.0.1 5111
imok
 
$ echo mntr | nc localhost 5111
zk_version  3.4.0
zk_avg_latency  0
zk_max_latency  0
zk_min_latency  0
zk_packets_received 70
zk_packets_sent 69
zk_outstanding_requests 0
zk_server_state leader
zk_znode_count   4
zk_watch_count  0
zk_ephemerals_count 0
zk_approximate_data_size    27
zk_followers    4                   - only exposed by the Leader
zk_synced_followers 4               - only exposed by the Leader
zk_pending_syncs    0               - only exposed by the Leader
zk_open_file_descriptor_count 23    - only available on Unix platforms
zk_max_file_descriptor_count 1024   - only available on Unix platforms
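If netcat is not available, the same commands can be sent from a script. This is a minimal sketch, assuming the server closes the connection after replying (which is how the netcat examples above terminate); the function names four_letter_word and parse_mntr are hypothetical.

```python
import socket


def four_letter_word(host, port, command, timeout=5.0):
    """Send a four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text):
    """Parse mntr output into a dict, converting numeric values."""
    metrics = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # metric name, then the rest of the line
        if len(parts) != 2:
            continue
        key, value = parts[0], parts[1].strip()
        for convert in (int, float):
            try:
                value = convert(value)
                break
            except ValueError:
                continue
        metrics[key] = value
    return metrics
```

For example, `parse_mntr(four_letter_word("127.0.0.1", 5111, "mntr"))` would return the metrics above as a dict, with values such as the version string left as text.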

We’ve looked at mytop for MySQL, and memcache-top for Memcached. Well, Zookeeper has one too, zktop:

$ ./zktop.py --servers "localhost:2181,localhost:2182,localhost:2183"
Ensemble -- nodecount:10 zxid:0x1300000001 sessions:4
SERVER           PORT M      OUTST    RECVD     SENT CONNS MINLAT AVGLAT MAXLAT
localhost        2181 F          0       93       92     2      2      7     13
localhost        2182 F          0       37       36     1      0      0      0
localhost        2183 L          0       36       35     1      0      0      0

CLIENT           PORT I   QUEUE RECVD  SENT
127.0.0.1       34705 1       0    56    56
127.0.0.1       35943 1       0     1     0
127.0.0.1       33999 1       0     1     0
127.0.0.1       37988 1       0     1     0
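A much smaller version of the server table can be built from per-server metrics. The sketch below is not how zktop works internally; it assumes you already hold one dict of mntr-style metrics per server with numeric values, and the function name ensemble_summary is hypothetical. It also flags the case where the ensemble does not report exactly one leader.

```python
def ensemble_summary(metrics_by_server):
    """Summarise per-server metrics and check for exactly one leader."""
    leaders = sorted(
        server
        for server, metrics in metrics_by_server.items()
        if metrics.get("zk_server_state") == "leader"
    )
    rows = []
    for server in sorted(metrics_by_server):
        metrics = metrics_by_server[server]
        # Columns: server, mode, outstanding requests, average latency.
        rows.append("%-20s %-10s %6d %6d" % (
            server,
            metrics.get("zk_server_state", "unknown"),
            metrics.get("zk_outstanding_requests", 0),
            metrics.get("zk_avg_latency", 0),
        ))
    return {
        "leaders": leaders,
        "single_leader": len(leaders) == 1,
        "rows": rows,
    }
```

Printing the rows gives a quick per-server view, and single_leader being False is worth investigating in any multi-server ensemble.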

If you are after more detailed metrics, you can access those through JMX. You could also take the DIY road and go for JMXTrans and Graphite, or use Nagios/Cacti/Ganglia with check_zookeeper.py. Alternatively, you can save time (and preserve your sanity) by choosing a hosted service like Server Density (that’s us!).

If you want to test the quality and performance of your Zookeeper ensemble, then zk-smoketest with zk-smoketest.py and zk-latencies.py are great tools to check out.

Zookeeper Management Tools

There are not many management options out there. The folks at Netflix have released Exhibitor, a tool that provides some basic monitoring, log cleanup (for old versions), backup/restore, ensemble configuration and node visualization. There is also zookeeper_dashboard, but it hasn’t been updated in years.

Further reading

Did this article pique your interest in Zookeeper? Nice, keep reading. We found Scott Leberknight’s Zookeeper series of blog posts to be worthwhile. We also like these presentations:

  • Building an Impenetrable Zookeeper (includes video).
  • Apache Zookeeper is a long presentation covering some prerequisite concepts of distributed systems.
  • Zookeeper in the Wild goes straight to the point on operating a Zookeeper ensemble.
Originally published 2018-08-08; shared from the 云数智圈 WeChat official account.
