DBA必备技能:通过truss跟踪解决监听无法启动案例

作者简介:刘斌,云和恩墨高级技术专家,擅长数据库故障诊断分析,数据库性能优化,自动化运维开发,坚持学习、写作、分享,

在Oracle DBA的日常工作中,通过各种跟踪手段,从数据库内外部发现问题,最终找到解决方案,是必备的重要技能。

以下这则案例,就是通过OS系统级别的跟踪,快速定位并解决问题的一个例证。在Oracle数据库的跟踪时,OS上Truss是非常重要的工具。

登录数据库主机发现一个节点监听异常:

尝试手工启动,一样报错:

grid@xxxxdbb:/home/grid $lsnrctl statuss LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.4.0 - Production on 04-MAR-2016 01:23:34 Copyright (c) 1991, 2013, Oracle. All rights reserved. Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER))) TNS-12541: TNS:no listener TNS-12560: TNS:protocol adapter error TNS-00511: No listener IBM/AIX RISC System/6000 Error: 79: Connection refused

第一怀疑就是hosts文件被改了,查看host没发现修改过。从上面看出是使用ipc方式启动,尝试truss看看能不能有信息。

在Linux上调用truss跟踪一个操作非常简单,以下跟踪了lsnrctl start操作的过程:

truss lsnrctl start

execve("/usr/bin/lsnrctl", 0x2FF22BF8, 0x200138A8) Err#2 ENOENT execve("/etc/lsnrctl", 0x2FF22BF8, 0x200138A8) Err#2 ENOENT execve("/usr/sbin/lsnrctl", 0x2FF22BF8, 0x200138A8) Err#2 ENOENT execve("/usr/ucb/lsnrctl", 0x2FF22BF8, 0x200138A8) Err#2 ENOENT execve("/home/grid/bin/lsnrctl", 0x2FF22BF8, 0x200138A8) Err#2 ENOENT execve("/usr/bin/X11/lsnrctl", 0x2FF22BF8, 0x200138A8) Err#2 ENOENT execve("/sbin/lsnrctl", 0x2FF22BF8, 0x200138A8) Err#2 ENOENT execve("./lsnrctl", 0x2FF22BF8, 0x200138A8) Err#2 ENOENT execve("/u01/app/11.2.0.4/grid/bin/lsnrctl", 0x2FF22BF8, 0x200138A8) argc: 2 kusla(2, 0x09FFFFFFF0001170) Err#1 EPERM read_sysconfig(0x09001000A07D1550,, 0x09001000A0806E68) = 0x0000000000000000 sbrk(0x0000000000000000) = 0x00000001101D54D8 vmgetinfo(0x0FFFFFFFFFFFF140, 7, 16) = 0 sbrk(0x0000000000000000) = 0x00000001101D54D8 sbrk(0x0000000000000008) = 0x00000001101D54D8 __libc_sbrk(0x0000000000010020) = 0x00000001101D54E0 thread_init(0x0900000000520760, 0x09001000A0888470) = sbrk(0x0000000000000000) = 0x00000001101E5500 vmgetinfo(0x0FFFFFFFFFFFF780, 7, 16) = 0 smcr_procattr(0, 1, 0x0FFFFFFFFFFFF778) Err#109 ENOSYS getrpid(-1, -1, 648535941220675944) = 8912948 _getpid() = 8912948 getprocs64(0x00000001101D7BF0, 5024, , 1) = 1 appulimit(1005, 0) = 0x0FFFFFFFFE000000 _thread_self() = 66453675 thread_setmystate(0x0000000000000000, 0x0FFFFFFFFFFFF2A0) = 0 .....省略.... _getpid() = 8912948 accessx("/etc/secvars.cfg", 04, 0) = 0 statx("/etc/secvars.cfg", 0x0FFFFFFFFFFF6FB0, 176, 0) = 0 _getpid() = 8912948 accessx("/etc/passwd", 04, 0) = 0 statx("/etc/passwd", 0x0FFFFFFFFFFF6E20, 176, 0) = 0 accessx("/etc/security/passwd", 04, 0) Err#13 EACCES accessx("/etc/passwd", 04, 0) = 0 statx("/etc/passwd", 0x0FFFFFFFFFFF6B50, 176, 0) = 0 accessx("/etc/passwd", 04, 0) = 0 statx("/etc/passwd", 0x0FFFFFFFFFFF6C20, 176, 0) = 0 _getpid() = 8912948 accessx("/etc/passwd", 04, 0) = 0 statx("/etc/passwd", 0x0FFFFFFFFFFF6E20, 176, 0) = 0 accessx("/etc/security/passwd", 04, 0) Err#13 EACCES accessx("/etc/passwd", 04, 0) = 0 statx("/etc/passwd", 0x0FFFFFFFFFFF6B50, 176, 0) = 0 accessx("/etc/passwd", 04, 0) = 0 statx("/etc/passwd", 0x0FFFFFFFFFFF6C20, 176, 0) = 0 accessx("/etc/security/passwd", 04, 0) Err#13 EACCES close(5) = 0 gethostname(0x0FFFFFFFFFFF9C00, 512) = 0 access("/tmp/.oracle", 0) = 0 chmod("/tmp/.oracle", 01777) Err#1 EPERM socket(1, 1, 0) = 5 access("/tmp/.oracle/sLISTENER", 0) = 0 connext(5, 0x0FFFFFFFFFFF5BA8, 1025) Err#79 ECONNREFUSED -->注意这里,发现要读取/tmp/.oracle/sLISTENER access("/tmp/.oracle/sLISTENER", 0) = 0 _nsleep(0x0FFFFFFFFFFF5580, 0x0FFFFFFFFFFF5650) = 0 close(5) = 0 socket(1, 1, 0) = 5 connext(5, 0x0FFFFFFFFFFF5BA8, 1025) Err#79 ECONNREFUSED access("/tmp/.oracle/sLISTENER", 0) = 0 -->注意,这里大量读取出错的信息抛出 _nsleep(0x0FFFFFFFFFFF5580, 0x0FFFFFFFFFFF5650) = 0 close(5) = 0 socket(1, 1, 0) = 5 connext(5, 0x0FFFFFFFFFFF5BA8, 1025) Err#79 ECONNREFUSED access("/tmp/.oracle/sLISTENER", 0) = 0 _nsleep(0x0FFFFFFFFFFF5580, 0x0FFFFFFFFFFF5650) = 0 close(5) = 0 。。。。。 socket(1, 1, 0) = 5 connext(5, 0x0FFFFFFFFFFF5BA8, 1025) Err#79 ECONNREFUSED access("/tmp/.oracle/sLISTENER", 0) = 0 _nsleep(0x0FFFFFFFFFFF5580, 0x0FFFFFFFFFFF5650) = 0 close(5) = 0 socket(1, 1, 0) = 5 connext(5, 0x0FFFFFFFFFFF5BA8, 1025) Err#79 ECONNREFUSED access("/tmp/.oracle/sLISTENER", 0) = 0 close(5) = 0 kopen("/u01/app/11.2.0.4/grid/network/mesg/tnsus.msb", O_RDONLY) = 5 kfcntl(5, F_SETFD, 0x0000000000000001) = 0 lseek(5, 0, 0) = 0 kread(5, "1513 "011303\t\t\0\0\0\0".., 256) = 256 lseek(5, 512, 0) = 512 kread(5, " 19B\0\0\0\0\0\0\0\0\0\0".., 512) = 512 lseek(5, 1024, 0) = 1024 kread(5, "\0\t\012\01A\0 &\0 -\0 F".., 172) = 172 lseek(5, 19456, 0) = 19456 kread(5, "\00F04 $\0\0\0 b04 %\0\0".., 512) = 512 Starting /u01/app/11.2.0.4/grid/bin/tnslsnr: please wait... kwrite(1, " S t a r t i n g / u 0".., 61) = 61 kfcntl(1, F_GETFL, 0x0000000000000008) = 67110914 pipe(0x0FFFFFFFFFFF80E0) = 0 pipe(0x0FFFFFFFFFFF80D8) = 0 sigprocmask(0, 0x09001000A090DB70, 0x09001000A090DB90) = 0 kfork() = 19988596 kread(8, " N T P 0 1 4 4 8 3 5 7".., 64) = 14 _getpid() = 8912948 kfcntl(8, F_SETFD, 0x0000000000000001) = 0 kwrite(7, "\0 \001\0\0\001 :01 ,".., 179) = 179 kread(8, "\0\f\0\004\0\0\0 "\001 ., 8208) = 492 close(7) = 0 close(8) = 0 --> 在经过了连续的尝试无法锁定文件后,出错,提示无法启动: TNSLSNR for IBM/AIX RISC System/6000: Version 11.2.0.4.0 - Production System parameter file is /u01/app/11.2.0.4/grid/network/admin/listener.ora Log messages written to /u01/app/grid/diag/tnslsnr/xxxxdbb/listener/alert/log.xml Error listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER))) TNS-12555: TNS:permission denied TNS-12560: TNS:protocol adapter error TNS-00525: Insufficient privilege for operation IBM/AIX RISC System/6000 Error: 1: Not owner kwrite(1, " T N S L S N R f o r ".., 470) = 470 kwrite(1, "\n", 1) = 1 lseek(5, 19968, 0) = 19968 kread(5, "\0\r04 5\0\0\0 V04 6\0\0".., 512) = 512 Listener failed to start. See the error message(s) above... kwrite(1, " L i s t e n e r f a i".., 61) = 61 kfcntl(1, F_GETFL, 0x0000000000000008) = 67110914 close(4) = 0 kfcntl(1, F_GETFL, 0x0000000000000008) = 67110914 close(5) = 0 kfcntl(1, F_GETFL, 0x0000000013F600AB) = 67110914 kfcntl(2, F_GETFL, 0x0000000013F600AB) = 67110914 _exit(1)

尝试删除/tmp/.oracle/sLISTENER后监听正常启动.

事实上,在 /tmp/.oracle 目录,或者有些平台在 /var/tmp/.oracle 目录,存放一些数据库运行时的临时文件,包括基于socket的协议监听临时文件。在某些异常情况下,数据库关闭时没有及时清理这些文件,导致启动出现问题。这个案例就是这样的情况之一。

以下是一段相关描述:

The hidden directory '/var/tmp/.oracle' (or /tmp/.oracle on some platforms) or its content was removed while instances & the CRS stack were up and running. Typically this directory contains a number of "special" socket files that are used by local clients to connect via the IPC protocol (sqlnet) to various Oracle processes including the TNS listener, the CSS, CRS & EVM daemons or even database or ASM instances. These files are created when the "listening" process starts.

这是应用truss的诊断案例一则,与大家分享。


原文发布于微信公众号 - 数据和云(OraNews)

原文发表时间:2017-06-26

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

发表于

我来说两句

0 条评论
登录 后参与评论

相关文章

来自专栏源哥的专栏

SaaS行业命名规范

    很多企业在启动软件开发的时候,完成没有命名规范,导致代码的可读性极差。而业界对于命名,却没有一个统一的命名规范,比如说,获取客户列表,Java类的方法是...

1333
来自专栏生信技能树

【直播】我的基因组72:把基因检测芯片数据转为vcf格式

这个需求比较少见,主要是因为有很多朋友都做了基因检测芯片数据,而芯片检测的结果只有4列,分别是dbSNP数据库ID号,染色体,坐标,还有基因型。如下: head...

43511
来自专栏杨建荣的学习笔记

dg的奇怪问题终结和分区问题答疑 (r7笔记第77天)

今天来说几个问题,一个是对昨天《让我焦灼的四个问题》的升华,不能起博眼球的题目,技术分析给大家兜底了,你们看看有没有类似的问题。 还有几个小问题说说今天的感受和...

3435
来自专栏生信技能树

SNV突变(96种)频谱的制作

昨天我们学习了正常情况下,6种SNV(C>A, C>G, C>T, T>A, T>C, T>G)突变频谱的制作,但是如果考虑到突变的上下文,就可以变成96种(如...

3205
来自专栏杨建荣的学习笔记

GoldenGate数据迁移的问题总结(二)(r10笔记第85天)

昨天使用GoldenGate同步数据,数据量玩得有些大了。最后发现很多小问题变得更加严峻,比如空间问题。 而且由于没有更多的经验,导致这个问题被我引入了另外一个...

3239
来自专栏生信技能树

【直播】我的基因组46:SNV突变(96种)频谱的制作

昨天我们学习了正常情况下,6种SNV(C>A, C>G, C>T, T>A, T>C, T>G)突变频谱的制作,但是如果考虑到突变的上下文,就可以变成96种(如...

3778
来自专栏生信技能树

HiC数据分析实战(一)

来自于Tung B. K. Le et al. Science 2013 :https://www.ncbi.nlm.nih.gov/sra/?term=srr...

2603
来自专栏Android群英传

Badge分析&如何逼死处女座

1553
来自专栏吾真本

Release It! 第2版目录(中英文对照)

注:其中,最后17个小节标题尚未译完,其余译完的标题尚未定稿,会与正式版有出入。仅供参考。

2762
来自专栏雪胖纸的玩蛇日常

Vue+Django2.0 REST framework打造前后端分离的生鲜电商项目(三)设计数据库以及导入原始数据

4005

扫码关注云+社区