对于存在NAT或防火墙的RAC数据库,在启用了服务器端的load balance后,经常会碰到ORA-12545连接错误,这是因为服务器端转发客户端连接请求到其它节点后,客户端使用返回的IP再次发出连接请求而出现不可识别的IP地址或主机名而造成的。本文描述了这个问题并给出了解决方案。
有关RAC监听配置请参考 ORACLE RAC 监听配置 (listener.ora tnsnames.ora) ORACLE RAC 下非缺省端口监听配置(listener.ora tnsnames.ora)
一、NAT下监听描述图
二、配置情况
1、服务器端的配置情况
################Server node 1 listener.ora########################
LISTENER_VMDB01P =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = vmdb01pvip)(PORT = 1314)(IP = FIRST))
(ADDRESS = (PROTOCOL = TCP)(HOST = vmdb01p)(PORT = 1314)(IP = FIRST))
)
)
SID_LIST_LISTENER_VMDB01P =
(SID_LIST =
(SID_DESC =
(SID_NAME = PLSExtProc)
(ORACLE_HOME = /u01/oracle/db)
(PROGRAM = extproc)
)
)
################Server node 2 listener.ora########################
LISTENER_VMDB02P =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = vmdb02pvip)(PORT = 1314)(IP = FIRST))
(ADDRESS = (PROTOCOL = TCP)(HOST = vmdb02p)(PORT = 1314)(IP = FIRST))
)
)
SID_LIST_LISTENER_VMDB02P =
(SID_LIST =
(SID_DESC =
(SID_NAME = PLSExtProc)
(ORACLE_HOME = /u01/oracle/db)
(PROGRAM = extproc)
)
)
################Server side tnsnames.ora, same for both nodes########################
remote_lsnr_lm5330 =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = vmdb01pvip)(PORT = 1314))
(ADDRESS = (PROTOCOL = TCP)(HOST = vmdb02pvip)(PORT = 1314))
)
local_lsnr_lm5330a =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = vmdb01pvip)(PORT = 1314))
)
local_lsnr_lm5330b =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = vmdb02pvip)(PORT = 1314))
)
################Server side host table, same for both nodes########################
127.0.0.1 localhost
10.200.48.17 vmdb01p.oradb.com vmdb01p
10.200.48.18 vmdb02p.oradb.com vmdb02p
10.200.48.15 vmdb01pvip.oradb.com vmdb01pvip
10.200.48.16 vmdb02pvip.oradb.com vmdb02pvip
192.168.48.17 vmdb01pph.oradb.com vmdb01pph
192.168.48.18 vmdb02pph.oradb.com vmdb02pph
SQL> show parameter instance_name
NAME TYPE VALUE
------------------------------------ ------------------- ------------------
instance_name string LM5330A
SQL> show parameter listener
NAME TYPE VALUE
------------------------------------ ------------------- -------------------
local_listener string local_lsnr_lm5330a
remote_listener string remote_lsnr_lm5330
SQL> show parameter instance_name
NAME TYPE VALUE
------------------------------------ ------------------- ------------------
instance_name string LM5330B
SQL> show parameter listener
NAME TYPE VALUE
------------------------------------ ------------------- ------------------
local_listener string local_lsnr_lm5330b
remote_listener string remote_lsnr_lm5330
2、客户端tnsnames.ora配置
#For NAT client
LM5330TKO =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.255.48.15)(PORT = 1314))
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.255.48.16)(PORT = 1314))
(LOAD_BALANCE = yes)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = LM5330)
(FAILOVER_MODE =
(TYPE = SELECT)
(METHOD = BASIC)
(RETRIES = 180)
(DELAY = 5)
)
)
)
#For non NAT client
LM5330VIP =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.200.48.15)(PORT = 1314))
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.200.48.16)(PORT = 1314))
(LOAD_BALANCE = yes)
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = LM5330)
(FAILOVER_MODE =
(TYPE = SELECT)
(METHOD = BASIC)
(RETRIES = 180)
(DELAY = 5)
)
)
)
3、客户端hosts文件配置
c:\windows\system32\drivers\etc\hosts
10.200.48.15 vmdb01pvip.oradb.com vmdb01pvip
10.200.48.16 vmdb02pvip.oradb.com vmdb02pvip
三、ORA-12545错误信息
C:\Users\Administrator>sqlplus system/system@lm5330tko
SQL*Plus: Release 10.2.0.3.0 - Production on 星期二 11月 27 16:01:23 2012
Copyright (c) 1982, 2006, Oracle. All Rights Reserved.
ERROR:
ORA-12545: Connect failed because target host or object does not exist
对于客户端发起的连接请求,有时候可以成功建立连接,有时候无法建立连接并伴随给出ORA-12545错误
oracle@bo2dbp:~> oerr ora 12545
12545, 00000, "Connect failed because target host or object does not exist"
// *Cause: The address specified is not valid, or the program being
// connected to does not exist.
// *Action: Ensure the ADDRESS parameters have been entered correctly; the
// most likely incorrect parameter is the node name. Ensure that the
// executable for the server exists (perhaps "oracle" is missing.)
// If the protocol is TCP/IP, edit the TNSNAMES.ORA file to change the
// host name to a numeric IP address and try again.
从ORA-12545错误描述来看,主要是由于解析的问题产生该类错误
四、分析与解决
1、启用sqlnet trace
为客户端的sqlnet.ora文件添加下列项来启用sqlnet trace,trace后,建议移除这些选项以避免额外的网络开销
Trace_level_client=16
Trace_directory_client=<path_to_the_trace_directory> # use the full path to the trace directory
Trace_unique_client=on
Trace_timestamp_client=on
Diag_adr_enabled=off
#Author : Robinson
#Blog : http://blog.csdn.net/robinson_0612
下面是trace文件提供的信息,关于如何做 oracle net trace 请参考:http://blog.csdn.net/robinson_0612/article/details/8254720
文件名:cli_10648_1.trc
[27-11-2012 17:59:01:694] nsc2addr: (ADDRESS=(PROTOCOL=TCP)(HOST=vmdb02pvip)(PORT=1314))
[27-11-2012 17:59:01:694] nttbnd2addr: entry
[27-11-2012 17:59:01:694] snlinGetAddrInfo: entry
[27-11-2012 17:59:01:694] snlinGetAddrInfo: Invalid IP address string vmdb02pvip
[27-11-2012 17:59:01:694] snlinFreeAddrInfo: entry
[27-11-2012 17:59:01:694] snlinFreeAddrInfo: exit
[27-11-2012 17:59:01:694] snlinGetAddrInfo: exit
[27-11-2012 17:59:01:694] nttbnd2addr: looking up IP addr for host: vmdb02pvip
[27-11-2012 17:59:01:694] snlinGetAddrInfo: entry
[27-11-2012 17:59:04:259] snlinGetAddrInfo: Name resolution failed for vmdb02pvip
[27-11-2012 17:59:04:259] snlinFreeAddrInfo: entry
[27-11-2012 17:59:04:259] snlinFreeAddrInfo: exit
[27-11-2012 17:59:04:259] snlinGetAddrInfo: exit
[27-11-2012 17:59:04:259] nttbnd2addr: *** hostname lookup failure! ***
[27-11-2012 17:59:04:259] nttbnd2addr: exit
[27-11-2012 17:59:04:259] nserror: entry
[27-11-2012 17:59:04:259] nserror: nsres: id=0, op=77, ns=12545, ns2=12560; nt[0]=515, nt[1]=1004, nt[2]=0;
ora[0]=0, ora[1]=0, ora[2]=0
[27-11-2012 17:59:04:259] nsc2addr: error exit
2、故障分析
#分析上面的trace信息,trace信息中指出了无法解析主机vmdb02pvip
#客户端显示的12545错误的发生正式由于这个vmdb02pvip无法解析。
#那有时候可以连接成功是怎么一回事呢?
#当客户端发起连接请求,由于客户端配置了load balance,因此客户端会随机从ADDRESS列表中挑选一个IP进行连接。
#假定当前选择从NAT IP 10.255.48.15发出连接请求,10.255.48.15会被自动映射到相应的virtual ip,即10.200.48.15。
#当10.200.48.15上的监听器捕获到该请求,服务器端的load balance生效,pmon进程根据监听信息发现当前节点并不繁忙,直接建立连接。
#上面的描述就是可以成功建立连接的情形。
#当服务器的监听发现其余的节点空闲时,而当前节点繁忙,监听器将实现转发。
#也就是说服务器端返回一个新的IP地址给客户端,说我比较忙,你连接到10.200.48.16吧。
#于是客户端根据所提供的IP地址重新发起连接请求,而此时客户端到10.200.48.16根本不可达,即使将ADDRESS列表改为该IP。
#所以上面的分析是产生ORA-12545的原因。
3、解决
#下面通过tracert命令来检查节点是否可达
C:\Users\Administrator>tracert vmdb02pvip
Tracing route to vmdb02pvip [10.200.48.16]
over a maximum of 30 hops:
1 1 ms 1 ms 1 ms 192.168.9.254
2 <1 ms <1 ms <1 ms 10.97.5.1
3 * * * Request timed out. #超时,节点不可达
4 * * * Request timed out.
5 * * * Request timed out.
6 * * * Request timed out.
#考虑将客户端hosts文件作下列更改,即将映射关系的IP直接改为NAT的IP
c:\windows\system32\drivers\etc\hosts
#10.200.48.15 vmdb01pvip.oradb.com vmdb01pvip
#10.200.48.16 vmdb02pvip.oradb.com vmdb02pvip
10.255.48.15 vmdb01pvip.oradb.com vmdb01pvip
10.255.48.16 vmdb02pvip.oradb.com vmdb02pvip
#再次tracert
C:\Users\Administrator>tracert vmdb02pvip
Tracing route to v2048db02pvip [10.255.48.16]
over a maximum of 30 hops:
1 <1 ms 1 ms 1 ms 192.168.9.254
2 <1 ms <1 ms <1 ms 10.97.5.1
3 8 ms 5 ms 4 ms 10.100.34.4
4 6 ms 4 ms 4 ms vmdb02pvip [10.255.48.16]
5 8 ms 4 ms 4 ms vmdb02pvip [10.255.48.16]
Trace complete.
#再次建立连接时,不再出现ORA-12545
C:\Users\Administrator>netstat
Active Connections
Proto Local Address Foreign Address State
TCP 192.168.9.12:59541 vmdb01pvip:1314 ESTABLISHED
TCP 192.168.9.12:59558 vmdb02pvip:1314 ESTABLISHED
其次也可以通过禁用服务器端的load balance来避免该错误,当然此方法不被推荐。
alter system set remote_listener='' sid='*';
也可以通过配置CMAN(Oracle Connection Manager)来解决该问题