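Because of application instability or limited server resources, an application may crash unexpectedly and then has to be brought back up automatically.
In production, a process monitoring and auto-restart mechanism is usually set up so that an unexpected crash does not cause a prolonged service outage.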
Monit - utility for monitoring services on a Unix system
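Monit is a service monitoring tool for Unix systems. It can monitor and manage processes, programs, files, directories, and devices.
Advantages
Disadvantages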
# Install the EPEL repository
$ yum -y install epel-release

# Install monit
$ yum -y install monit

# Verify the installation
$ monit -V
This is Monit version 5.26.0
Built with ssl, with ipv6, with compression, with pam and with large files
Copyright (C) 2001-2019 Tildeslash Ltd. All Rights Reserved.

# Start the service
$ systemctl start monit

# Start the monit daemon directly (not needed if the systemd service is already running)
$ monit
Official manual: https://mmonit.com/monit/documentation/monit.html
Command format: monit [options]+ [command]
# Show help
$ monit -h
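As a rough illustration of the `[command]` part of that format, the sketch below lists a few commands described in the Monit manual. The `run` helper and the `DRY_RUN` variable are hypothetical names introduced here: by default the helper only prints each command, so the snippet is safe to run on a machine without Monit. `nexus` is the service name used later in this article.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run monit -t                # syntax-check the control file
run monit summary           # short status for every monitored service
run monit status nexus      # detailed status for one service
run monit restart nexus     # restart one service
run monit unmonitor nexus   # temporarily stop monitoring it
run monit monitor nexus     # resume monitoring
run monit reload            # re-read the control file
```

Setting `DRY_RUN=0` makes the helper execute the commands for real, which requires a running Monit daemon.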
After installing with yum, the default file locations are:
Global configuration file: /etc/monitrc
Service monitoring configuration directory: /etc/monit.d
Log file: /var/log/monit.log
# Configuration file
$ grep -v "^#" /etc/monitrc
# Check the status of monitored services every 5 seconds
set daemon 5 # check services at 5-second intervals
set log syslog

# Enable the built-in web server
set httpd port 2812 and
    use address 10.0.0.2 # bind the web server to this address
    allow localhost # allow localhost to connect to the server
    # Fixes the local command error: Error receiving data -- Connection reset by peer
    allow 10.0.0.2
    # Allow access from an external IP
    allow x.x.x.x
    # Username and password for the web login
    allow admin:monit # require user 'admin' with password 'monit'
    #with ssl { # enable SSL/TLS and set path to server certificate
    # pemfile: /etc/ssl/certs/monit.pem
    #}

# Directory of service monitoring configuration files
include /etc/monit.d/*
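Monit refuses to load a control file that is accessible to group or others (the permissions may be at most 0700), which matters here because the file carries the web login password. The sketch below is one way to check this before reloading; `monitrc_mode_ok` is a hypothetical helper name, and `stat -c` assumes GNU coreutils as found on CentOS.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if FILE grants no permissions to group or others.
monitrc_mode_ok() {
    # GNU stat prints the mode in octal without leading zeros, e.g. 600.
    mode="$(stat -c '%a' "$1")" || return 2
    case "$mode" in
        0|*00) return 0 ;;   # group and other digits are both 0
        *)     return 1 ;;
    esac
}

# Usage sketch (needs root and an installed Monit):
#   monitrc_mode_ok /etc/monitrc || chmod 600 /etc/monitrc
#   monit -t && monit reload
```

Running `monit -t` before `monit reload` keeps a syntax error in the edited file from reaching the running daemon.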
# View the nexus monitoring file
$ cat /etc/monit.d/nexus
check process nexus
    matching "org.sonatype.nexus.karaf.NexusMain"
    start program = "/root/nexus3/nexus-3.12.1-01/bin/nexus start"
    stop program = "/root/nexus3/nexus-3.12.1-01/bin/nexus stop"
    if failed port 18081 then restart

# Check the nexus monitoring status
$ monit status nexus
Monit 5.26.0 uptime: 3h 48m

Process 'nexus'
  status                       OK
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  pid                          15191
  parent pid                   1
  uid                          0
  effective uid                0
  gid                          0
  uptime                       1m
  threads                      96
  children                     0
  cpu                          0.2%
  cpu total                    0.2%
  memory                       14.3% [1.1 GB]
  memory total                 14.3% [1.1 GB]
  security attribute           -
  disk read                    0 B/s [1.6 MB total]
  disk write                   0 B/s [232.5 MB total]
  port response time           1.756 ms to localhost:18081 type TCP/IP protocol DEFAULT
  data collected               Wed, 13 May 2020 14:36:27

# Verify that nexus is restarted automatically after it stops
$ kill -9 15191

# Within the check interval, the process has not been restarted yet
$ monit status nexus
Monit 5.26.0 uptime: 3h 48m

Process 'nexus'
  status                       Does not exist
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  data collected               Wed, 13 May 2020 14:36:42

# Check the nexus monitoring status after the automatic restart
$ monit status nexus
Monit 5.26.0 uptime: 3h 48m

Process 'nexus'
  status                       OK
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  pid                          15830
  parent pid                   1
  uid                          0
  effective uid                0
  gid                          0
  uptime                       0m
  threads                      52
  children                     0
  cpu                          64.0%
  cpu total                    64.0%
  memory                       4.5% [349.2 MB]
  memory total                 4.5% [349.2 MB]
  security attribute           -
  disk read                    0 B/s [84 kB total]
  disk write                   0 B/s [36.9 MB total]
  port response time           -
  data collected               Wed, 13 May 2020 14:36:45

# View the log of the whole process
$ tail -n 20 -f /var/log/monit.log
......
[CST May 13 14:35:09] error    : 'nexus' process is not running
[CST May 13 14:35:09] info     : 'nexus' trying to restart
[CST May 13 14:35:09] info     : 'nexus' start: '/root/nexus3/nexus-3.12.1-01/bin/nexus start'
[CST May 13 14:35:17] info     : Reinitializing monit daemon
[CST May 13 14:35:17] info     : Reinitializing Monit -- control file '/etc/monitrc'
[CST May 13 14:35:17] info     : 'VM_0_2_centos' Monit reloaded
[CST May 13 14:36:42] error    : 'nexus' process is not running
[CST May 13 14:36:42] info     : 'nexus' trying to restart
[CST May 13 14:36:42] info     : 'nexus' start: '/root/nexus3/nexus-3.12.1-01/bin/nexus start'
[CST May 13 14:36:45] info     : 'nexus' process is running with pid 15830
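The status taken right after the restart shows the process running while the port is not answering yet. If the port check ever fires during that startup window, or if a broken service keeps being restarted in a loop, a more tolerant check might help. The sketch below writes such a variant to a scratch file for review; the draft path, the 120-second timeout, and the cycle thresholds are all assumed values, not settings from the original setup.

```shell
#!/bin/sh
# Scratch location chosen for this sketch; review the file before copying it
# to /etc/monit.d/nexus.
draft="${NEXUS_CHECK_DRAFT:-/tmp/nexus.monit.draft}"

cat > "$draft" <<'EOF'
check process nexus
    matching "org.sonatype.nexus.karaf.NexusMain"
    # Allow the JVM some time to start (120 seconds is an assumed value).
    start program = "/root/nexus3/nexus-3.12.1-01/bin/nexus start" with timeout 120 seconds
    stop program = "/root/nexus3/nexus-3.12.1-01/bin/nexus stop"
    # Restart only after the port check has failed 3 cycles in a row.
    if failed port 18081 for 3 cycles then restart
    # Give up if the service keeps dying (thresholds are assumed values).
    if 3 restarts within 10 cycles then unmonitor
EOF

echo "wrote $draft"
```

After copying the reviewed draft into place, `monit -t` followed by `monit reload` would apply it. With `set daemon 5`, three cycles correspond to roughly 15 seconds.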
Web console URL: http://10.0.0.2:2812/
Home page:
Monit runtime information:
System monitoring information:
Process monitoring information: