使用 SQLNET.EXPIRE_TIME 清除僵死连接

    数据库连接的客户端异常断开后,其占有的相应并没有被释放,如从v$session视图中依旧可以看到对应的session处于inactive,且对应的服务器进程也没有释放,导致资源长时间地被占用,对于这种情形开该如何处理呢?SQLNET.EXPIRE_TIME对于这个问题我们提供了解决方案,专门用于清理那些异常断开的情形,如网络异常中断,客户端异常掉电,异常重启等。本文描述了设置SQLNET.EXPIRE_TIME参数以及演示死连接以及资源被释放的情形。

1、理解SQLNET.EXPIRE_TIME参数    Use parameter SQLNET.EXPIRE_TIME to specify a the time interval, in minutes, to send a probe to verify that client/server    connections are active.    Setting a value greater than 0 ensures that connections are not left open indefinitely, due to an abnormal client termination.    If the probe finds a terminated connection, or a connection that is no longer in use, it returns an error, causing the    server process to exit.    This parameter is primarily intended for the database server,which typically handles multiple connections at any one time.    通过设定参数为非零值(分钟)来发送探测包以检查客户端的异常断开。一旦探测包找到了异常的连接将返回错误,清除对应的server process    下面是参数使用的一些限制。(缺省值为0,最小值0,建议值10。SQLNET.EXPIRE_TIME=10)    Limitations on using this terminated connection detection feature are:       It is not allowed on bequeathed connections.       Though very small, a probe packet generates additional traffic that may downgrade network performance.       Depending on which operating system is in use, the server may need to perform additional processing to distinguish       the connection probing event from other events that occur. This can also result in degraded network performance.

2、Dead Connection Detection (DCD)与Inactive Sessions

Dead connections:    These are previously valid connections with the database but the connection between the client and server processes has    terminated abnormally.    Examples of a dead connection:    - A user reboots/turns-off their machine without logging off or disconnecting from the database.    - A network problem prevents communication between the client and the server.    In these cases, the shadow process running on the server and the session in the database may not terminate.    Implemented by          * adding SQLNET.EXPIRE_TIME = <MINUTES> to the sqlnet.ora file    With DCD is enabled, the Server-side process sends a small 10-byte packet to the client process after the duration of    the time interval specified in minutes by the SQLNET.EXPIRE_TIME parameter.    If the client side connection is still connected and responsive, the client sends a response packet back to the database    server, resetting the timer..and another packet will be sent when next interval expires (assuming no other activity on    the connection).    If the client fails to respond to the DCD probe packet         * the Server side process is marked as a dead connection and         * PMON performs the clean up of the database processes / resources         * The client OS processes are terminated    NOTE: SQLNET.RECV_TIMEOUT can be set on the SERVER side sqlnet.ora file. This will set a timeout for the server process          to wait for data from the client process.

Inactive Sessions:    These are sessions that remain connected to the database with a status in v$session of INACTIVE.    Example of an INACTIVE session:    - A user starts a program/session, then leaves it running and idle for an extended period of time.

3、配置SQLNET.EXPIRE_TIME

#对于SQLNET.EXPIRE_TIME的配置,需要修改sqlnet.ora,然后添加SQLNET.EXPIRE_TIME项
[oracle@orasrv admin]$ more sqlnet.ora
sqlnet.expire_time = 1     #仅仅需要配置此项,后面的各项仅仅是为了生成跟踪日志,可省略
TRACE_LEVEL_SERVER = 16 
TRACE_FILE_SERVER = SERVER
TRACE_DIRECTORY_SERVER= /u01/app/oracle/network/trace 
TRACE_TIMESTAMP_ SERVER = ON 
TRACE_UNIQUE_SERVER = ON
DIAG_ADR_ENABLED=OFF

4、模拟及测试DCD连接

C:\Users\robinson.cheng>sqlplus scott/tiger@ora11g    --->从windows客户端发起连接

SQL*Plus: Release 11.2.0.1.0 Production on Tue Jun 25 09:57:59 2013

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

---Issued the sql to hold a lock
SQL> update emp set sal=sal*1.1 where deptno=20;   

5 rows updated.

--disabled the network adapter in VM setting
SQL> select * from dual;
select * from dual
       *
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 29522
Session ID: 15 Serial number: 447

--服务器端环境   
SQL> select * from v$version where rownum<2;  
  
BANNER  
--------------------------------------------------------------------------------   
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production  

--在服务器端查看session的情况,SCOTT用户的session状态为INACTIVE
SQL> @comm_sess_users;

+----------------------------------------------------+
| User Sessions (All)                                |
+----------------------------------------------------+
Instance     SID Serial ID    Status Oracle User     O/S User O/S PID Session Program              Terminal       Machine
--------- ------ --------- --------- ----------- ------------ ------- -------------------------- ---------- -------------
ora11g        15       447  INACTIVE       SCOTT Robinson.Che   29522 sqlplus.exe                      PC39  TRADESZ\PC39
             125         5  INACTIVE         SYS       oracle    4734 sqlplus@orasrv.com (TNS V1      pts/0    orasrv.com
             139         9    ACTIVE         SYS       oracle   29447 sqlplus@orasrv.com (TNS V1      pts/4    orasrv.com

--Get the spid for user scott by SID
SQL> @my_spid_from_sid
Enter value for input_sid: 15
old   4: AND s.sid = &input_sid
new   4: AND s.sid = 15

   SID    SERIAL# SPID
------ ---------- ------------------------
    15        447 29522

--To find the locked object
SQL> @lock_obj

OBJECT_NAME||''||LOCKED_MODE||''||CTIME||''||C.SID||''||SERIAL#
------------------------------------------------------------------
EMP   3   14   15  447
EMP   3   83   15  447

--The trace file exists
SQL> ho ls -hltr /u01/app/oracle/network/trace/s*29522*
-rw-r----- 1 oracle oinstall 241K Jun 25 09:59 /u01/app/oracle/network/trace/server_29522.trc

--->try to issue another sql. the sql is blocked
SQL> set time on;
10:03:46 SQL> delete scott.emp where deptno=20;  
delete scott.emp where deptno=20
             *
ERROR at line 1:
ORA-01013: user requested cancel of current operation

--Check the server process for scott 
10:04:37 SQL> ho ps -ef | grep 29522 | grep -v grep
oracle   29522     1  0 09:58 ?        00:00:00 oracleora11g (LOCAL=NO)

--Could not reach to client from server.
10:06:51 SQL> ho ping 192.168.7.133
PING 192.168.7.133 (192.168.7.133) 56(84) bytes of data.
From 192.168.7.40 icmp_seq=2 Destination Host Unreachable
From 192.168.7.40 icmp_seq=3 Destination Host Unreachable
From 192.168.7.40 icmp_seq=4 Destination Host Unreachable
From 192.168.7.40 icmp_seq=6 Destination Host Unreachable
From 192.168.7.40 icmp_seq=7 Destination Host Unreachable
From 192.168.7.40 icmp_seq=8 Destination Host Unreachable

--此时总进程数为27个
10:15:08 SQL> select count(*) from v$process;

  COUNT(*)
----------
        27

--从09:58进程启动开始到10:17:59进程依旧没有被释放
10:17:59 SQL> ho ps -ef | grep 29522 | grep -v grep
oracle   29522     1  0 09:58 ?        00:00:00 oracleora11g (LOCAL=NO)

-->At this time the server process was released
10:18:08 SQL> ho ps -ef | grep 29522 | grep -v grep

--进程释放后此时进程总数变为26个
10:19:45 SQL> select count(*) from v$process;

  COUNT(*)
----------
        26

-->the lock was released
10:19:54 SQL> @lock_obj

no rows selected

--Author : Robinson
--Blog   : http://blog.csdn.net/robinson_0612

--scott用户的session已经从v$session中被移除
10:20:03 SQL> @comm_sess_users;

+----------------------------------------------------+
| User Sessions (All)                                |
+----------------------------------------------------+

Instance     SID Serial ID    Status    Oracle User     O/S User O/S PID Session Program            Terminal    Machine
--------- ------ --------- --------- -------------- ------------ ------- -------------------------- -------- ----------
ora11g       125         5  INACTIVE            SYS       oracle    4734 sqlplus@orasrv.com (TNS V1    pts/0 orasrv.com
             139         9    ACTIVE            SYS       oracle   29447 sqlplus@orasrv.com (TNS V1    pts/4 orasrv.com

5、查看SQLNET.EXPIRE_TIME是否启用

#下面对跟踪日志过滤,可以看到09:58:02:853中提示开启dead connection detection
[oracle@orasrv trace]$ cat -n server_29522.trc |grep dead
    78  [25-JUN-2013 09:58:02:853] niotns: Enabling dead connection detection (1 min)

#下面的查询中,在09:58:03 timer被启动,10:18:26后,连接被彻底关闭(包括server process)    
[oracle@orasrv trace]$ cat -n server_29522.trc |grep timer
   447  [25-JUN-2013 09:58:03:050] nstimstart: starting timer at 25-JUN-2013 09:58:03
   451  [25-JUN-2013 09:58:03:051] nsconbrok: timer created for connection
  4092  [25-JUN-2013 10:18:26:173] nstimarmed: timer is armed, with value 3833

#下面是starting timer的详细信息  
[oracle@orasrv trace]$ head -451 server_29522.trc | tail -5
[25-JUN-2013 09:58:03:050] nstimstart: starting timer at 25-JUN-2013 09:58:03
[25-JUN-2013 09:58:03:051] nstimset: entry
[25-JUN-2013 09:58:03:051] nstimset: normal exit
[25-JUN-2013 09:58:03:051] nstimstart: normal exit
[25-JUN-2013 09:58:03:051] nsconbrok: timer created for connection 

#下面是timer被清除后的详细信息nstimclear: normal exit
[oracle@orasrv trace]$ head -4097 server_29522.trc | tail -7
[25-JUN-2013 10:18:26:173] nstimarmed: entry
[25-JUN-2013 10:18:26:173] nstimarmed: timer is armed, with value 3833
[25-JUN-2013 10:18:26:173] nstimarmed: normal exit
[25-JUN-2013 10:18:26:173] nstimclear: entry
[25-JUN-2013 10:18:26:173] nstimclear: normal exit
[25-JUN-2013 10:18:26:173] nttctl: entry
[25-JUN-2013 10:18:26:173] nttctl: entry 

6、小结 a、DCD连接通常指用户没有正常断开连接而重启客户端,关机以及网络问题导致客户端无法与服务器正常通信所致的连接 b、相对于DCD连接,INACTIVE session则是用户建立连接之后,尚未执行任何操作或操作已经完成但没有断开,等同于与处于idle状态 c、无论是DCD连接,还是出于idle状态的INACTIVE session,在v$session视图呈现的都是INACTIVE状态 d、对于使用resource_limit及profile配置后用户session超出idle_time的情形,在v$session视图呈现sniped状态 e、当在sqlnet.ora配置文件中设置了SQLNET.EXPIRE_TIME参数为非零值时,僵死连接在EXPIRE_TIME指定的时间后被清除 f、演示中仅仅设定EXPIRE_TIME为1分钟,而实际的释放时间接近20分钟左右,什么原因尚不清楚,有待进一步测试 g、设定SQLNET.EXPIRE_TIME为非零值之后,系统需要产生而外的开销以及带来网络性能的下降 h、对于需要及时释放OS及DB资源的情形,Oracle建议使用resource_limit 及 profile 限制用户连接的同时并设定SQLNET.EXPIRE_TIME为非零值 i、Reference: [ID 206007.1] [ID 395505.1] [ID 601605.1] [ID 151972.1]

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

发表于

我来说两句

0 条评论
登录 后参与评论

相关文章

来自专栏乐沙弥的世界

自定义 RMAN 显示的日期时间格式

      缺省情况下,使用RMAN备份与恢复界面仅仅显示的是日期,而没有具体的时间。有时候需要查看具体的时间,如查看当天多个Incarnation的情形。对于...

11340
来自专栏FreeBuf

EE 4GEE Mini本地提权漏洞(CVE-2018-14327)分析

我在前段时间从买了一个4G调制解调器。这是一个便携式4G WiFi移动宽带调制解调器。有一天,我查看了安装在电脑上的用于故障排除的服务,我看到了一个奇怪的服务,...

9430
来自专栏linux驱动个人学习

基于input子系统的sensor驱动调试(二)

继上一篇:https://cloud.tencent.com/developer/article/1054078 一、驱动流程解析: 1、模块加载: 1 st...

64770
来自专栏Kotlin入门系列

win7基础 cmd 查看当前已经启动的服务列表

67660
来自专栏技术专栏

慕课网Spark SQL日志分析 - 4.从Hive平滑过渡到Spark SQL

老版本文档:http://spark.apache.org/docs/1.6.1/

28230
来自专栏移动端周边技术扩展

iOS打开系统功能对应的URL

18830
来自专栏吴伟祥

Linux命令缩写英文对照记忆(〇) 转

13320
来自专栏FreeBuf

SniffAir:无线渗透测试框架

SniffAir是一个开源的无线安全框架,可帮助你轻松解析被动收集的无线数据并发起复杂的无线渗透测试。此外,它还可以处理大型的或多个pcap文件,执行交叉检查和...

10620
来自专栏乐沙弥的世界

ORA-32004 的错误处理

启动数据库时,收到了ORA-32004 的错误,错误多是一些过时且在当前版本中不在使用的参数,如果碰到类似的错误,只需要将其

12120
来自专栏向治洪

android PakageManagerService启动流程分析

PakageManagerService的启动流程图 ? 1.PakageManagerService概述 PakageManagerService是andro...

552100

扫码关注云+社区

领取腾讯云代金券