前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >EM agent无法启动的原因及分析 (r6笔记第75天)

EM agent无法启动的原因及分析 (r6笔记第75天)

作者头像
jeanron100
发布2018-03-16 16:21:52
1.2K0
发布2018-03-16 16:21:52
举报

昨天收到一条报警短信,短信内容大体如下: Agent is Unreachable(REASON=javax.net.ssl.SSLPeerUnverifiedException:xxxx.com:cn=xxxxx).Host is unreachable. 看着短信内容,应该是agent罢工了。可能出现了网络问题。 结果不一会儿就接到了同事的电话,让我看看是不是有问题。 登录到agent所在的服务器,查看agent进程还是存在的。 这个时候尝试agent的upload操作失败,就准备重新启动一下agent试试,但是尝试重启的时候报了下面的错误。 $ ./emctl start agent Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0. Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved. Starting agent ....... failed. Failed to start HTTP listener. Consult the log files in: /U01/app/oracle/product/agent10g/sysman/log 在指定的目录下查看日志,里面提示进程自动退出,也没有给出很明确的信息来 41921 :: Wed Sep 30 09:07:34 2015::AgentLifeCycle.pm: Exited loop with retCode=1 9380 :: Wed Sep 30 10:06:27 2015::AgentLifeCycle.pm: Processing status agent 9380 :: Wed Sep 30 10:06:27 2015::AgentStatus.pm:Processing status agent 9380 :: Wed Sep 30 10:06:27 2015::AgentStatus.pm:emdctl status returned 1 22086 :: Wed Sep 30 11:06:50 2015::AgentLifeCycle.pm: Processing status agent 22086 :: Wed Sep 30 11:06:50 2015::AgentStatus.pm:Processing status agent 22086 :: Wed Sep 30 11:06:50 2015::AgentStatus.pm:emdctl status returned 1 861 :: Wed Sep 30 11:40:27 2015::AgentLifeCycle.pm: Processing start agent 861 :: Wed Sep 30 11:40:27 2015::AgentLifeCycle.pm: EMHOME is /U01/app/oracle/product/agent10g 861 :: Wed Sep 30 11:40:27 2015::AgentLifeCycle.pm: service name is 861 :: Wed Sep 30 11:40:28 2015::AgentLifeCycle.pm:status agent returned with retCode=1 861 :: Wed Sep 30 11:40:32 2015::AgentLifeCycle.pm:Watch dog processs id: 906 exited with an exit code of 55 861 :: Wed Sep 30 11:40:32 2015::AgentLifeCycle.pm: Exited loop with retCode=1 查看历史日志,发现这个agent已经已经很久没有使用了,可能这个问题还没有想象的那么简单。算是个遗留问题吧。 ----- Sat Jun 27 15:09:49 2015::Checking status of EMAgent : 24232 ----- -- Timestamp (2014,07,01,14,53,38) of file /U01/app/oracle/product/agent10g/sysman/emd/upload/rawdata4.dat is more than 24 hours old. Current Time is Sat Jun 27 15:09: 52 2015----- Sat Jun 27 15:09:52 2015::Received restart request from EMAgent : 24232 ----- ----- Sat Jun 27 15:09:52 2015::Stopping EMAgent : 24232 ----- ----- Sat Jun 27 15:10:02 2015::Failed to reap child process EMAgent : 24232 ----- ----- Sat Jun 27 15:10:02 2015::Agent Launched with PID 20174 at time Sat Jun 27 15:10:02 2015 ----- ----- Sat Jun 27 15:10:02 2015::Execing EMAgent process is taking longer than expected 120 secs. ----- ----- Sat Jun 27 15:10:02 2015::Time elapsed between Launch of Watchdog process and execing EMAgent is 15996749 secs ----- (pid=20174): starting emagent version 10.2.0.5.0 (pid=20174): emagent now exiting abnormally - initialization failure. Consult '.trc' and '.log' files. ----- Sat Jun 27 15:10:05 2015::Checking status of EMAgent : 20174 ----- ----- Sat Jun 27 15:10:05 2015::EMAgent exited at Sat Jun 27 15:10:05 2015 with return value 55. ----- ----- Sat Jun 27 15:10:05 2015::EMAgent has exited due to initialization failure. ----- ----- Sat Jun 27 15:10:05 2015::Stopping other components. ----- ----- Sat Jun 27 15:10:05 2015::Commiting Process death. ----- ----- Sat Jun 27 15:10:05 2015::Exiting watchdog loop 查看最近的日志 --- Standalone agent ----- Wed Sep 30 11:44:40 2015::Agent Launched with PID 3354 at time Wed Sep 30 11:44:40 2015 ----- ----- Wed Sep 30 11:44:40 2015::Time elapsed between Launch of Watchdog process and execing EMAgent is 0 secs ----- (pid=3354): starting emagent version 10.2.0.5.0 (pid=3354): emagent now exiting abnormally - initialization failure. Consult '.trc' and '.log' files. ----- Wed Sep 30 11:44:43 2015::Checking status of EMAgent : 3354 ----- ----- Wed Sep 30 11:44:43 2015::EMAgent exited at Wed Sep 30 11:44:43 2015 with return value 55. ----- ----- Wed Sep 30 11:44:43 2015::EMAgent has exited due to initialization failure. ----- ----- Wed Sep 30 11:44:43 2015::Stopping other components. ----- ----- Wed Sep 30 11:44:43 2015::Commiting Process death. ----- ----- Wed Sep 30 11:44:43 2015::Exiting watchdog loop ----- --- Standalone agent ----- Wed Sep 30 11:51:57 2015::Agent Launched with PID 6076 at time Wed Sep 30 11:51:57 2015 ----- ----- Wed Sep 30 11:51:57 2015::Time elapsed between Launch of Watchdog process and execing EMAgent is 1 secs ----- (pid=6076): starting emagent version 10.2.0.5.0 (pid=6076): emagent now exiting abnormally - initialization failure. Consult '.trc' and '.log' files. ----- Wed Sep 30 11:52:00 2015::Checking status of EMAgent : 6076 ----- ----- Wed Sep 30 11:52:00 2015::EMAgent exited at Wed Sep 30 11:52:00 2015 with return value 55. ----- ----- Wed Sep 30 11:52:00 2015::EMAgent has exited due to initialization failure. ----- ----- Wed Sep 30 11:52:00 2015::Stopping other components. ----- ----- Wed Sep 30 11:52:00 2015::Commiting Process death. ----- ----- Wed Sep 30 11:52:00 2015::Exiting watchdog loop 查看其它相关的日志,发现了这么一段内容。 2015-09-30 13:53:08,557 Thread-37161840 ERROR engine: [oracle_emd,sadb.com:3872,EMDUploadStats] : nmeegd_GetMetricData failed : em_error=failed to get upload statistics: Inappropriate ioctl for device Error connecting to https://sadbcom:3872/emd/main/ 2015-09-30 13:53:08,557 Thread-37161840 WARN collector: <nmecmc.c> Error exit. Error message: em_error=failed to get upload statistics: Inappropriate ioctl for device Error connecting to https://sadb.com:3872/emd/main/ 这部分日志算是唯一能够看明白,而且相关的了。那么到底是怎么回事,为什么尝试连接的时候出错了? 排查了EM sever段的网络连接情况,是没有问题的。 那么我们还是按部就班来看看这个地址为什么不通。试试老办法ping # ping sadb.com PING sadb.com (10.127.xxx.34) 56(84) bytes of data. 64 bytes from sadb.com (10.127.xxx.34): icmp_seq=1 ttl=64 time=0.185 ms 64 bytes from sadb.com (10.127.xxx.34): icmp_seq=2 ttl=64 time=0.182 ms 查看网卡配置的IP情况 # ifconfig eth0 Link encap:Ethernet HWaddr C8:1F:66:B8:97:xxxx inet addr:10.127.xxx.71 Bcast:10.127.xxx.255 Mask:255.255.255.0 inet6 addr: fe80::ca1f:66ff:feb8:975a/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:7609183417 errors:0 dropped:0 overruns:0 frame:0 TX packets:6700541034 errors:0 dropped:162 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:957747424255 (891.9 GiB) TX bytes:1573971126949 (1.4 TiB) Interrupt:35 这个时候就有些奇怪了,ping的主机名应该是本机,怎么ping出来的IP不同了。 看看配置文件里怎么写的。 # cat /etc/hosts #127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 #::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 10.11.154.95 xxx.no.xxxxx.com 10.127.xxxx.34 sadb.com 127.0.0.1 localhost localhost.localdomain localhost ::1 localhost6.localdomain6 localhost6 10.127.xxx.134 adb.com 10.127.xxx.88 sadb3.com 10.127.2.128 sadb2.com 10.127.xxx.71 sadb.com 10.127.2.85 newcomm.com 10.25.36.117 xxx.no.xxx.com 这段内容还是蛮考验眼力的。 如果眼尖还是会发现配置上有问题,那就是adb.com出现了两个IP,一个是xxx.34 一个是xxx.71 这个错误还是一个很常规的错误,但是结合具体的问题分析就很容易被误导。 再次尝试ping就会发现IP地址已经指向了正确的地址了。 # ping sadb.com PING sadb.com (10.127.xxx.71) 56(84) bytes of data. 64 bytes from sadb.com (10.127.xxx.71): icmp_seq=1 ttl=64 time=0.026 ms 这个时候再没有尝试启动,只是做了一个简单验证,发现upload已经没有问题了。 $ ./emctl status agent Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0. Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved. --------------------------------------------------------------- Agent Version : 10.2.0.5.0 OMS Version : 10.2.0.5.0 Protocol Version : 10.2.0.5.0 Agent Home : /U01/app/oracle/product/agent10g Agent binaries : /U01/app/oracle/product/agent10g Agent Process ID : 24232 Parent Process ID : 39174 Agent URL : https://sadb.com:3872/emd/main/ Repository URL : https://emserver.com:1159/em/upload Started at : 2015-06-27 13:09:43 Started by user : oracle Last Reload : 2015-06-27 13:09:43 Last successful upload : 2015-09-30 14:08:02 Total Megabytes of XML files uploaded so far : 897.22 Number of XML files pending upload : 0 Size of XML files pending upload(MB) : 0.00 Available disk space on upload filesystem : 72.83% Last successful heartbeat to OMS : 2015-09-30 08:17:43 --------------------------------------------------------------- Agent is Running and Ready

明白了问题的原因,其实发现还是一个蛮低级的错误,但是为什么会出现这种情况呢,还是因为做了故障转移,主备进行了切换,结果/etc/hosts里面的主机信息只是做了追加。 所以这些潜在的问题细小,但是出现问题的时候又很容易被误导。尤其需要注意一下。

本文参与 腾讯云自媒体分享计划,分享自微信公众号。
原始发表:2015-10-01,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 杨建荣的学习笔记 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
相关产品与服务
数据库
云数据库为企业提供了完善的关系型数据库、非关系型数据库、分析型数据库和数据库生态工具。您可以通过产品选择和组合搭建,轻松实现高可靠、高可用性、高性能等数据库需求。云数据库服务也可大幅减少您的运维工作量,更专注于业务发展,让企业一站式享受数据上云及分布式架构的技术红利!
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档