DBA必备技能:通过truss跟踪解决监听无法启动案例

作者简介:刘斌,云和恩墨高级技术专家,擅长数据库故障诊断分析,数据库性能优化,自动化运维开发,坚持学习、写作、分享,

在Oracle DBA的日常工作中,通过各种跟踪手段,从数据库内外部发现问题,最终找到解决方案,是必备的重要技能。

以下这则案例,就是通过OS系统级别的跟踪,快速定位并解决问题的一个例证。在Oracle数据库的跟踪时,OS上Truss是非常重要的工具。

登录数据库主机发现一个节点监听异常:

尝试手工启动,一样报错:

grid@xxxxdbb:/home/grid $lsnrctl statuss LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.4.0 - Production on 04-MAR-2016 01:23:34 Copyright (c) 1991, 2013, Oracle. All rights reserved. Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER))) TNS-12541: TNS:no listener TNS-12560: TNS:protocol adapter error TNS-00511: No listener IBM/AIX RISC System/6000 Error: 79: Connection refused

第一怀疑就是hosts文件被改了,查看host没发现修改过。从上面看出是使用ipc方式启动,尝试truss看看能不能有信息。

在Linux上调用truss跟踪一个操作非常简单,以下跟踪了lsnrctl start操作的过程:

truss lsnrctl start

execve("/usr/bin/lsnrctl", 0x2FF22BF8, 0x200138A8) Err#2 ENOENT execve("/etc/lsnrctl", 0x2FF22BF8, 0x200138A8) Err#2 ENOENT execve("/usr/sbin/lsnrctl", 0x2FF22BF8, 0x200138A8) Err#2 ENOENT execve("/usr/ucb/lsnrctl", 0x2FF22BF8, 0x200138A8) Err#2 ENOENT execve("/home/grid/bin/lsnrctl", 0x2FF22BF8, 0x200138A8) Err#2 ENOENT execve("/usr/bin/X11/lsnrctl", 0x2FF22BF8, 0x200138A8) Err#2 ENOENT execve("/sbin/lsnrctl", 0x2FF22BF8, 0x200138A8) Err#2 ENOENT execve("./lsnrctl", 0x2FF22BF8, 0x200138A8) Err#2 ENOENT execve("/u01/app/11.2.0.4/grid/bin/lsnrctl", 0x2FF22BF8, 0x200138A8) argc: 2 kusla(2, 0x09FFFFFFF0001170) Err#1 EPERM read_sysconfig(0x09001000A07D1550,, 0x09001000A0806E68) = 0x0000000000000000 sbrk(0x0000000000000000) = 0x00000001101D54D8 vmgetinfo(0x0FFFFFFFFFFFF140, 7, 16) = 0 sbrk(0x0000000000000000) = 0x00000001101D54D8 sbrk(0x0000000000000008) = 0x00000001101D54D8 __libc_sbrk(0x0000000000010020) = 0x00000001101D54E0 thread_init(0x0900000000520760, 0x09001000A0888470) = sbrk(0x0000000000000000) = 0x00000001101E5500 vmgetinfo(0x0FFFFFFFFFFFF780, 7, 16) = 0 smcr_procattr(0, 1, 0x0FFFFFFFFFFFF778) Err#109 ENOSYS getrpid(-1, -1, 648535941220675944) = 8912948 _getpid() = 8912948 getprocs64(0x00000001101D7BF0, 5024, , 1) = 1 appulimit(1005, 0) = 0x0FFFFFFFFE000000 _thread_self() = 66453675 thread_setmystate(0x0000000000000000, 0x0FFFFFFFFFFFF2A0) = 0 .....省略.... _getpid() = 8912948 accessx("/etc/secvars.cfg", 04, 0) = 0 statx("/etc/secvars.cfg", 0x0FFFFFFFFFFF6FB0, 176, 0) = 0 _getpid() = 8912948 accessx("/etc/passwd", 04, 0) = 0 statx("/etc/passwd", 0x0FFFFFFFFFFF6E20, 176, 0) = 0 accessx("/etc/security/passwd", 04, 0) Err#13 EACCES accessx("/etc/passwd", 04, 0) = 0 statx("/etc/passwd", 0x0FFFFFFFFFFF6B50, 176, 0) = 0 accessx("/etc/passwd", 04, 0) = 0 statx("/etc/passwd", 0x0FFFFFFFFFFF6C20, 176, 0) = 0 _getpid() = 8912948 accessx("/etc/passwd", 04, 0) = 0 statx("/etc/passwd", 0x0FFFFFFFFFFF6E20, 176, 0) = 0 accessx("/etc/security/passwd", 04, 0) Err#13 EACCES accessx("/etc/passwd", 04, 0) = 0 statx("/etc/passwd", 0x0FFFFFFFFFFF6B50, 176, 0) = 0 accessx("/etc/passwd", 04, 0) = 0 statx("/etc/passwd", 0x0FFFFFFFFFFF6C20, 176, 0) = 0 accessx("/etc/security/passwd", 04, 0) Err#13 EACCES close(5) = 0 gethostname(0x0FFFFFFFFFFF9C00, 512) = 0 access("/tmp/.oracle", 0) = 0 chmod("/tmp/.oracle", 01777) Err#1 EPERM socket(1, 1, 0) = 5 access("/tmp/.oracle/sLISTENER", 0) = 0 connext(5, 0x0FFFFFFFFFFF5BA8, 1025) Err#79 ECONNREFUSED -->注意这里,发现要读取/tmp/.oracle/sLISTENER access("/tmp/.oracle/sLISTENER", 0) = 0 _nsleep(0x0FFFFFFFFFFF5580, 0x0FFFFFFFFFFF5650) = 0 close(5) = 0 socket(1, 1, 0) = 5 connext(5, 0x0FFFFFFFFFFF5BA8, 1025) Err#79 ECONNREFUSED access("/tmp/.oracle/sLISTENER", 0) = 0 -->注意,这里大量读取出错的信息抛出 _nsleep(0x0FFFFFFFFFFF5580, 0x0FFFFFFFFFFF5650) = 0 close(5) = 0 socket(1, 1, 0) = 5 connext(5, 0x0FFFFFFFFFFF5BA8, 1025) Err#79 ECONNREFUSED access("/tmp/.oracle/sLISTENER", 0) = 0 _nsleep(0x0FFFFFFFFFFF5580, 0x0FFFFFFFFFFF5650) = 0 close(5) = 0 。。。。。 socket(1, 1, 0) = 5 connext(5, 0x0FFFFFFFFFFF5BA8, 1025) Err#79 ECONNREFUSED access("/tmp/.oracle/sLISTENER", 0) = 0 _nsleep(0x0FFFFFFFFFFF5580, 0x0FFFFFFFFFFF5650) = 0 close(5) = 0 socket(1, 1, 0) = 5 connext(5, 0x0FFFFFFFFFFF5BA8, 1025) Err#79 ECONNREFUSED access("/tmp/.oracle/sLISTENER", 0) = 0 close(5) = 0 kopen("/u01/app/11.2.0.4/grid/network/mesg/tnsus.msb", O_RDONLY) = 5 kfcntl(5, F_SETFD, 0x0000000000000001) = 0 lseek(5, 0, 0) = 0 kread(5, "1513 "011303\t\t\0\0\0\0".., 256) = 256 lseek(5, 512, 0) = 512 kread(5, " 19B\0\0\0\0\0\0\0\0\0\0".., 512) = 512 lseek(5, 1024, 0) = 1024 kread(5, "\0\t\012\01A\0 &\0 -\0 F".., 172) = 172 lseek(5, 19456, 0) = 19456 kread(5, "\00F04 $\0\0\0 b04 %\0\0".., 512) = 512 Starting /u01/app/11.2.0.4/grid/bin/tnslsnr: please wait... kwrite(1, " S t a r t i n g / u 0".., 61) = 61 kfcntl(1, F_GETFL, 0x0000000000000008) = 67110914 pipe(0x0FFFFFFFFFFF80E0) = 0 pipe(0x0FFFFFFFFFFF80D8) = 0 sigprocmask(0, 0x09001000A090DB70, 0x09001000A090DB90) = 0 kfork() = 19988596 kread(8, " N T P 0 1 4 4 8 3 5 7".., 64) = 14 _getpid() = 8912948 kfcntl(8, F_SETFD, 0x0000000000000001) = 0 kwrite(7, "\0 \001\0\0\001 :01 ,".., 179) = 179 kread(8, "\0\f\0\004\0\0\0 "\001 ., 8208) = 492 close(7) = 0 close(8) = 0 --> 在经过了连续的尝试无法锁定文件后,出错,提示无法启动: TNSLSNR for IBM/AIX RISC System/6000: Version 11.2.0.4.0 - Production System parameter file is /u01/app/11.2.0.4/grid/network/admin/listener.ora Log messages written to /u01/app/grid/diag/tnslsnr/xxxxdbb/listener/alert/log.xml Error listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER))) TNS-12555: TNS:permission denied TNS-12560: TNS:protocol adapter error TNS-00525: Insufficient privilege for operation IBM/AIX RISC System/6000 Error: 1: Not owner kwrite(1, " T N S L S N R f o r ".., 470) = 470 kwrite(1, "\n", 1) = 1 lseek(5, 19968, 0) = 19968 kread(5, "\0\r04 5\0\0\0 V04 6\0\0".., 512) = 512 Listener failed to start. See the error message(s) above... kwrite(1, " L i s t e n e r f a i".., 61) = 61 kfcntl(1, F_GETFL, 0x0000000000000008) = 67110914 close(4) = 0 kfcntl(1, F_GETFL, 0x0000000000000008) = 67110914 close(5) = 0 kfcntl(1, F_GETFL, 0x0000000013F600AB) = 67110914 kfcntl(2, F_GETFL, 0x0000000013F600AB) = 67110914 _exit(1)

尝试删除/tmp/.oracle/sLISTENER后监听正常启动.

事实上,在 /tmp/.oracle 目录,或者有些平台在 /var/tmp/.oracle 目录,存放一些数据库运行时的临时文件,包括基于socket的协议监听临时文件。在某些异常情况下,数据库关闭时没有及时清理这些文件,导致启动出现问题。这个案例就是这样的情况之一。

以下是一段相关描述:

The hidden directory '/var/tmp/.oracle' (or /tmp/.oracle on some platforms) or its content was removed while instances & the CRS stack were up and running. Typically this directory contains a number of "special" socket files that are used by local clients to connect via the IPC protocol (sqlnet) to various Oracle processes including the TNS listener, the CSS, CRS & EVM daemons or even database or ASM instances. These files are created when the "listening" process starts.

这是应用truss的诊断案例一则,与大家分享。


原文发布于微信公众号 - 数据和云(OraNews)

原文发表时间:2017-06-26

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

发表于

我来说两句

0 条评论
登录 后参与评论

相关文章

来自专栏乐沙弥的世界

publickey,gssapi-with-mic,Unspecified GSS failure

        最近的MHA配置时碰到了Permission denied (publickey,gssapi-with-mic,password)这个错误提示...

582
来自专栏Netkiller

Spring Cloud Config

摘要: 本文节选自《Netkiller Java 手札》 Spring Cloud Config 本文节选自《Netkiller Java 手札》 https:...

3507
来自专栏后端技术探索

使用Swoole异步发送邮件

最近做公司的一个管理系统,需要把每天的统计信息发送到领导的邮箱。由于使用SMTP协议发送邮件的速度太慢,所以只能异步发送。刚开始实现了一个基于php-reque...

551
来自专栏c#开发者

数据驱动开发For Silverlight WCF RIA1.0 三步曲

数据驱动开发For Silverlight WCF RIA1.0 三步曲 WCF RIA 1.0的正式发布,让Silverlight开发业务应用系统变得更加简...

2474
来自专栏名山丶深处

springboot集成rabbitmq(实战)

952
来自专栏ASP.NETCore

.NET Core使用NLog通过Kafka实现日志收集

NET Core越来越受欢迎,因为它具有在多个平台上运行的原始.NET Framework的强大功能。Kafka正迅速成为软件行业的标准消息传递技术。这篇文章简...

525
来自专栏杨建荣的学习笔记

dg broker校验失败的一个奇怪问题(二) (r8笔记第51天)

对昨天提出的问题做了一个简单的分析和排查,也算是有了一个交代,上一篇文章在 dg broker校验失败的一个奇怪问题 我查看了最近的日志,发现在半个月以前有一...

3275
来自专栏智能计算时代

Microservices Ecosystem Transit Map

…we assembled a map of the ecosystem to help guide practitioners, vendors, inves...

2984
来自专栏逸鹏说道

★Kali信息收集★8.Nmap :端口扫描

突然发现,微信一次最多推送8篇 参数:(Zenmap是Nmap图形化工具,不想打指令的可以直接使用) 详细:https://nmap.org/man/zh/in...

3494
来自专栏喵了个咪的博客空间

EMQ百万级MQTT消息服务(TLS Docker Golang)

1553

扫描关注云+社区