前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >一封备库报警邮件的分析(r6笔记第82天)

一封备库报警邮件的分析(r6笔记第82天)

作者头像
jeanron100
发布2018-03-16 16:38:15
6640
发布2018-03-16 16:38:15
举报

对于orabbix报出的警报,自己都是心怀敬畏,因为这些表面的现象如果深入分析没准就能有所收获,当然目的还是解决问题,不是一味追求指标。 今天收到的报警邮件如下,然后在两分钟后问题自动恢复。 ### ZABBIX-监控系统: ------------------------------------ 报警内容: DG_issue ------------------------------------ 报警级别: PROBLEM ------------------------------------ 监控项目: dg_issue:2015-10-08 03:59:38.0Log Transport ServicesErrorPING[ARC1]: Error 1089 when pinging standby (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=adb1.xxxx.com)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=sadb1_XPT)(INSTANCE_NAME=adb1)(SERVER=dedicated))). ------------------------------------ 报警时间:2015.10.08-04:00:22 这是个一主两备的环境,因为这个问题是在主库的v$dataguard_status报出,而且对于1089的错误,oracle的解释如下: $ oerr ora 1089 01089, 00000, "immediate shutdown in progress - no operations are permitted" // *Cause: The SHUTDOWN IMMEDIATE command was used to shut down // a running ORACLE instance, so your operations have been // terminated. // *Action: Wait for the instance to be restarted, or contact your DBA. 所以还是分析了一下备库在问题时间段内的负载情况,没有发现明显的抖动。 sar的结果如下: 03:50:01 AM all 0.06 0.00 0.03 0.00 0.00 99.91 04:00:01 AM all 0.06 0.00 0.04 0.00 0.00 99.90 04:10:01 AM all 0.11 0.00 0.05 0.23 0.00 99.61 04:20:01 AM all 0.07 0.00 0.04 0.00 0.00 99.90 04:30:01 AM all 0.06 0.00 0.03 0.00 0.00 99.90 04:40:01 AM all 0.06 0.00 0.03 0.00 0.00 99.91 04:50:01 AM all 0.06 0.00 0.03 0.00 0.00 99.91 那么错误是从主库抛出的,来看看主库中的错误日志。就会发现在问题时间段尝试连接出错了,可能备库正在做shutdown immediate的操作。 Thu Oct 08 02:19:38 CST 2015 Thread 1 advanced to log sequence 7659 (LGWR switch) Current log# 1 seq# 7659 mem# 0: /U01/app/oracle/oradata/adb1/redo01.log Thu Oct 08 03:59:38 CST 2015 ARC1: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (1089) ARC1: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned PING[ARC1]: Error 1089 when pinging standby (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=xxxx.com)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=sadb1_XP T)(INSTANCE_NAME=adb1)(SERVER=dedicated))). 好了,问题的背景就是这样,那么是否有备库中的crontab来触发这类操作呢。 直白的看,好像没有和shutdown immediate相关的操作。 $ crontab -l 2-9,12-29,31-59 * * * * . $HOME/.adb1profile;$HOME/dbadmin/scripts/lsnr_check.sh 0 1 * * 1 . $HOME/.adb1profile;$HOME/dbadmin/scripts/lsnr_log.sh 43 1 * * * . $HOME/.adb1profile;$HOME/dbadmin/scripts/expdpfull.sh #20 1 * * * . $HOME/.adb1profile;$HOME/dbadmin/scripts/rman_full_backup.sh 因为这是一个10g的备库,所以尝试做expdp还是需要把备库置为read-only状态才可以做的。开启日志应用还是会有shutdown的操作,所以这个还是一个可能的疑点。 另外一个备库,db uniq name指定错误,应该是s2accdb1 查看逻辑备库的脚本,看到确实有dgmgrl 做的read-only的操作,这样问题就对上了。 备库1中的备份脚本如下: $ less $HOME/dbadmin/scripts/expdpfull.sh #!/bin/bash 。。。 export expfile dgmgrl / "edit database sadb1 set state='READ-ONLY'" if [ $? -ne 0 ]; then echo "set read-only failed!" exit 1 fi ... 备库2中的备份脚本如下: $ less $HOME/dbadmin/scripts/expdpfull.sh #!/bin/bash 。。。 export expfile dgmgrl / "edit database sadb1 set state='READ-ONLY'" if [ $? -ne 0 ]; then echo "set read-only failed!" exit 1 fi ... 如果仔细查看就会发现问题,两个脚本的内容是一致的,里面所指的database是同一个备库,也就意味着在同样的时间里,两个客户端同时发出请求,对同一个备库做read-only的操作,这种操作还是有很大的风险的。 那么备库2运行脚本的时候,把备库1置为read-only,然后尝试在备库2做逻辑备份,会不会成功呢,结果肯定是不行。 可以在备库2中看到备份脚本运行了,但是没有生成备份,备份日志中只有一条错误警告。 -rw-r--r-- 1 oracle oinstall 159 Oct 8 01:44 accdb1_20151008.log $ less accdb1_20151008.log EXP-00008: ORACLE error 1219 encountered ORA-01219: database not open: queries allowed on fixed tables/views only EXP-00000: Export terminated unsuccessfully 问题到此确实发现了备库中的一个大的问题,通过一个小小的报警就能有如此的收获,其实还有更多的隐患,既然查看了备库的日志,那么其中的ora错误是格外需要关注的。 结果发现备库中竟然每天都会有大量的ora错误,是和redo相关的错误。 Errors in file /U01/app/oracle/admin/adb1/bdump/adb1_mrp0_3644.trc: ORA-00313: open failed for members of log group 1 of thread 1 ORA-00312: online log 1 thread 1: '/U01/app/oracle/oradata/adb1/redo01.log' ORA-27037: unable to obtain file status Linux-x86_64 Error: 2: No such file or directory Additional information: 3 Thu Oct 08 03:24:43 CST 2015 Errors in file /U01/app/oracle/admin/adb1/bdump/adb1_mrp0_3644.trc: ORA-19527: physical standby redo log must be renamed ORA-00312: online log 1 thread 1: '/U01/app/oracle/oradata/adb1/redo01.log' Clearing online redo logfile 1 complete Media Recovery Log /U01/app/oracle/admin/adb1/arch/1_7658_782788423.dbf Thu Oct 08 03:24:44 CST 2015 Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE THROUGH ALL SWITCHOVER DISCONNECT NODELAY Thu Oct 08 03:24:45 CST 2015 Media Recovery Waiting for thread 1 sequence 7659 Thu Oct 08 03:25:29 CST 2015 Redo Shipping Client Connected as PUBLIC -- Connected User is Valid RFS[1015]: Assigned to RFS process 3928 RFS[1015]: Identified database type as 'physical standby' RFS[1015]: Archived Log: '/U01/app/oracle/admin/adb1/arch/1_7659_782788423.dbf' Thu Oct 08 03:25:35 CST 2015 Media Recovery Log /U01/app/oracle/admin/adb1/arch/1_7659_782788423.dbf Media Recovery Waiting for thread 1 sequence 7660 尽管有大量的ora错误,但是依然可以正常接收日志,使用dg broker来查看 DGMGRL> show configuration; Configuration Name: adb1 Enabled: YES Protection Mode: MaxPerformance Fast-Start Failover: DISABLED Databases: adb1 - Primary database sadb1 - Physical standby database s2adb1 - Physical standby database Current status for "accdb1": SUCCESS 可以看到验证是正常的,但是还是有潜在的问题,那么备库中的错误怎么和redo有关系呢。 如果使用show database verbose查看,是能发现一个潜在的问题的。 DGMGRL> show database verbose sadb1 Database Name: sadb1 Role: PHYSICAL STANDBY Enabled: YES Intended State: ONLINE Instance(s): adb1 Properties: InitialConnectIdentifier = 'sadb1' ObserverConnectIdentifier = '' LogXptMode = 'ARCH' Dependency = '' DelayMins = '0' Binding = 'OPTIONAL' MaxFailure = '0' MaxConnections = '1' ReopenSecs = '300' NetTimeout = '180' LogShipping = 'ON' PreferredApplyInstance = '' ApplyInstanceTimeout = '0' ApplyParallel = 'AUTO' StandbyFileManagement = 'auto' ArchiveLagTarget = '0' LogArchiveMaxProcesses = '2' LogArchiveMinSucceedDest = '1' DbFileNameConvert = '' LogFileNameConvert = '' FastStartFailoverTarget = '' StatusReport = '(monitor)' InconsistentProperties = '(monitor)' InconsistentLogXptProps = '(monitor)' SendQEntries = '(monitor)' LogXptStatus = '(monitor)' RecvQEntries = '(monitor)' HostName = 'xxxx.com' SidName = 'adb1' LocalListenerAddress = '(ADDRESS=(PROTOCOL=TCP)(HOST=xxxx.com)(PORT=1521))' StandbyArchiveLocation = '/U01/app/oracle/admin/adb1/arch' AlternateLocation = '' LogArchiveTrace = '0' LogArchiveFormat = '%t_%s_%r.dbf' LatestLog = '(monitor)' TopWaitEvents = '(monitor)' Current status for "sadb1": SUCCESS 这个时候可以尝试使用edit语句修改这些配置 edit database sadb1 set property LogFileNameConvert='/U01/app/oracle/oradata/adb1,/U01/app/oracle/oradata/adb1'; edit database sadb1 set property DbFileNameConvert='/U01/app/oracle/oradata/adb1,/U01/app/oracle/oradata/adb1'; 在spfile层级修改还是需要启动备库的,现在spfile定义一下。 alter system set db_file_name_convert='/U01/app/oracle/oradata/adb1','/U01/app/oracle/oradata/adb1' scope=spfile; alter system set log_file_name_convert='/U01/app/oracle/oradata/adb1','/U01/app/oracle/oradata/adb1' scope=spfile; 这个时候检查dg broker就会抛出下面的错误。 DGMGRL> show configuration; Configuration Name: adb1 Enabled: YES Protection Mode: MaxPerformance Fast-Start Failover: DISABLED Databases: adb1 - Primary database sadb1 - Physical standby database s2adb1 - Physical standby database Current status for "adb1": Warning: ORA-16608: one or more databases have warnings 对于这一类问题,如果不是很清楚,可以按照下面的思路来查。 DGMGRL> show database verbose sadb1; Database Name: sadb1 Role: PHYSICAL STANDBY Enabled: YES Intended State: ONLINE Instance(s): adb1 Properties: InitialConnectIdentifier = 'sadb1' ObserverConnectIdentifier = '' LogXptMode = 'ARCH' Dependency = '' DelayMins = '0' Binding = 'OPTIONAL' MaxFailure = '0' MaxConnections = '1' ReopenSecs = '300' NetTimeout = '180' LogShipping = 'ON' PreferredApplyInstance = '' ApplyInstanceTimeout = '0' ApplyParallel = 'AUTO' StandbyFileManagement = 'auto' ArchiveLagTarget = '0' LogArchiveMaxProcesses = '2' LogArchiveMinSucceedDest = '1' DbFileNameConvert = '/U01/app/oracle/oradata/adb1,/U01/app/oracle/oradata/adb1' LogFileNameConvert = '/U01/app/oracle/oradata/adb1,/U01/app/oracle/oradata/adb1' FastStartFailoverTarget = '' StatusReport = '(monitor)' InconsistentProperties = '(monitor)' InconsistentLogXptProps = '(monitor)' SendQEntries = '(monitor)' LogXptStatus = '(monitor)' RecvQEntries = '(monitor)' HostName = 'xxxx.com' SidName = 'adb1' LocalListenerAddress = '(ADDRESS=(PROTOCOL=TCP)(HOST=xxxx.com)(PORT=1521))' StandbyArchiveLocation = '/U01/app/oracle/admin/adb1/arch' AlternateLocation = '' LogArchiveTrace = '0' LogArchiveFormat = '%t_%s_%r.dbf' LatestLog = '(monitor)' TopWaitEvents = '(monitor)' Current status for "sadb1": Warning: ORA-16792: configuration property value is inconsistent with database setting 然后使用下面的命令来查看不匹配的地方。 DGMGRL> show database sadb1 'InconsistentProperties'; INCONSISTENT PROPERTIES INSTANCE_NAME PROPERTY_NAME MEMORY_VALUE SPFILE_VALUE BROKER_VALUE adb1 DbFileNameConvert /U01/app/oracle/oradata/adb1,/U01/app/oracle/oradata/adb1 /U01/app/oracle/oradata/adb1,/U01/app/oracle/oradata/adb1 adb1 LogFileNameConvert /U01/app/oracle/oradata/adb1,/U01/app/oracle/oradata/adb1 /U01/app/oracle/oradata/adb1,/U01/app/oracle/oradata/adb1 可以看到memory_value,spfile_value,broker_value还是不匹配的。 所以还是需要启动备库的。启动之后问题就会自然修复,对于这类ora错误可以参考mos文档。 ORA-19527 reported in Standby Database when starting Managed Recovery (文档 ID 352879.1) 所以经过以上处理,两个备库的处理思路就清晰了,备份也存在,日志中的ora错误也能够基本消除。

本文参与 腾讯云自媒体分享计划,分享自微信公众号。
原始发表:2015-10-08,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 杨建荣的学习笔记 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体分享计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
相关产品与服务
数据库
云数据库为企业提供了完善的关系型数据库、非关系型数据库、分析型数据库和数据库生态工具。您可以通过产品选择和组合搭建,轻松实现高可靠、高可用性、高性能等数据库需求。云数据库服务也可大幅减少您的运维工作量,更专注于业务发展,让企业一站式享受数据上云及分布式架构的技术红利!
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档