
GoldenGate processes fail to start after the disk fills up (even once space has been freed)

徐靖
Published 2020-08-03 23:50:27
Included in the column: DB说

【Background】

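Recently a friend told me that the disk holding OGG had filled up. After he freed space by hand, the OGG processes would not start. My first reaction was that this should not happen. I have hit full disks many times before. Once space was cleared, the processes came back in an instant, whether MGR was configured to start them automatically or I started them by hand. My friend said that even after stopping MGR and restarting the processes, they stayed in ABEND status. Yet the process reports showed no output at all.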

1. 【OGG processes cannot be started from GGSCI, yet no error is reported】

GGSCI (TEST) 1> start DUMPTEST

Sending START request to MANAGER ...

EXTRACT DUMPTEST starting

GGSCI (TEST) 2> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING

JAGENT RUNNING

EXTRACT STOPPED DUMPTEST 00:00:00 00:06:39

EXTRACT STOPPED EXTTEST 00:00:02 00:00:08

GGSCI (TEST) 9> view report DUMPTEST

--No output at all. In addition, no ALTER command and no attempt to add the Extract process had any effect.
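With many groups configured, a STOPPED row in `info all` is easy to overlook. The sketch below flags every process that is not RUNNING. It assumes the `info all` output has been captured as plain text in the column layout shown above (Program, Status, Group, Lag at Chkpt, Time Since Chkpt). The helper name `non_running` is hypothetical.

```python
from typing import List, Tuple

# Program names that appear in the first column of GGSCI `info all` output.
PROGRAMS = {"MANAGER", "JAGENT", "EXTRACT", "REPLICAT"}


def non_running(info_all_text: str) -> List[Tuple[str, str, str]]:
    """Return (program, status, group) for every process not in RUNNING state.

    MANAGER and JAGENT rows carry no group name, so group is "" for them.
    Header lines, GGSCI prompts and blank lines are skipped.
    """
    result = []
    for line in info_all_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] not in PROGRAMS:
            continue
        program, status = fields[0], fields[1]
        group = fields[2] if len(fields) > 2 else ""
        if status != "RUNNING":
            result.append((program, status, group))
    return result


sample = """Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
JAGENT RUNNING
EXTRACT STOPPED DUMPTEST 00:00:00 00:06:39
EXTRACT STOPPED EXTTEST 00:00:02 00:00:08"""

for program, status, group in non_running(sample):
    print(program, status, group)
```

Parsing on whitespace works because group names contain no spaces. The lag and checkpoint columns are ignored since only the status matters here.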

2. 【Suspecting a problem with the process files】

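This usually happens after an abnormal OS reboot or a full disk, when an OGG process is left in a hung (zombie) state. Each OGG process writes a file when it starts, something like a lock file. Deleting that file by hand did not help either, which essentially ruled out a hung process as the cause.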

3. 【Yet OGG can be started with OS commands (under the hood, GGSCI also calls OS commands)】

extract PARAMFILE /ogg/dirprm/dumptest.prm REPORTFILE /ogg/dirrpt/DUMPHXTEST.rpt

extract PARAMFILE /ogg/dirprm/exttest.prm REPORTFILE /ogg/dirrpt/EXTTEST.rpt
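Starting a group from the OS just means running the `extract` executable with a parameter file and a report file, as in the two commands above. The following rough Python version of that manual start makes these assumptions:

- OGG is installed in `/ogg` and `extract` sits in that directory.
- Files follow the EXTTEST naming: a lowercase `.prm` and an uppercase `.rpt`.

The helpers `extract_argv` and `start_extract` are hypothetical. The process is started with the OGG home as its working directory because parameters such as `./dirdat/...` are relative paths.

```python
import subprocess
from pathlib import Path
from typing import List

OGG_HOME = Path("/ogg")  # assumption: OGG home directory, as in the commands above


def extract_argv(group: str, ogg_home: Path = OGG_HOME) -> List[str]:
    """Build: extract PARAMFILE <home>/dirprm/<group>.prm REPORTFILE <home>/dirrpt/<GROUP>.rpt"""
    return [
        str(ogg_home / "extract"),
        "PARAMFILE", str(ogg_home / "dirprm" / f"{group.lower()}.prm"),
        "REPORTFILE", str(ogg_home / "dirrpt" / f"{group.upper()}.rpt"),
    ]


def start_extract(group: str, ogg_home: Path = OGG_HOME) -> subprocess.Popen:
    """Launch the Extract in the background; a temporary workaround only."""
    return subprocess.Popen(extract_argv(group, ogg_home), cwd=ogg_home)


print(" ".join(extract_argv("exttest")))
```

The printed command matches the EXTTEST line above apart from the absolute path to the executable. As the author notes, this bypasses GGSCI and MGR, so it is only a stopgap.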

Checking the OGG status again:

GGSCI (test) 3> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING

JAGENT RUNNING

EXTRACT RUNNING DUMPHXTEST 00:00:00 00:00:08

EXTRACT RUNNING EXTTEST 00:00:01 00:00:05

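Starting OGG processes with OS-level commands is only a temporary workaround, though. Day-to-day management still has to go through the GGSCI interface.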

4. 【Analyzing ggserr.log】

A first pass over the process reports and ggserr.log turned up nothing obviously wrong. A closer look at ggserr.log, however, revealed one very inconspicuous entry:

2018-09-14 23:08:52 WARNING OGG-01934 Oracle GoldenGate Manager for Oracle, mgr.prm: Datastore repair failed.

2018-09-14 23:09:01 ERROR OGG-01098 Oracle GoldenGate Capture for Oracle, exttest.prm: Could not flush "./dirdat/tt001535" (error 28, No space left on device).

2018-09-14 23:09:02 WARNING OGG-01934 Oracle GoldenGate Manager for Oracle, mgr.prm: Datastore repair failed.

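"Datastore repair failed." Why would the datastore come up here? On closer analysis, this OGG installation runs a JAgent process. JAgent is the Java agent created when EM monitors or manages OGG, and it stores the data it collects in the datastore. But JAgent appeared to be running normally, so how could it affect the OGG processes? Rather odd.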
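The decisive clue was a single WARNING buried among other entries, so filtering ggserr.log by message code is worth the effort. The helper `find_codes` below is hypothetical. It assumes the entry layout shown in the excerpt above: date, time, severity, OGG message code, then the text. Lines that do not begin with a timestamp, such as the tail of a wrapped message, are skipped rather than misparsed.

```python
import re
from typing import Iterable, List, Set, Tuple

# Assumed ggserr.log entry layout: "YYYY-MM-DD HH:MM:SS SEVERITY OGG-NNNNN message..."
ENTRY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(INFO|WARNING|ERROR)\s+(OGG-\d{5})\s+(.*)$"
)


def find_codes(lines: Iterable[str], codes: Set[str]) -> List[Tuple[str, ...]]:
    """Return (timestamp, severity, code, message) for entries whose code is in `codes`."""
    hits = []
    for line in lines:
        m = ENTRY_RE.match(line.strip())
        if m and m.group(3) in codes:
            hits.append(m.groups())
    return hits


sample = [
    "2018-09-14 23:08:52 WARNING OGG-01934 Oracle GoldenGate Manager for Oracle, mgr.prm: Datastore repair failed.",
    '2018-09-14 23:09:01 ERROR OGG-01098 Oracle GoldenGate Capture for Oracle, exttest.prm: Could not flush "./dirdat/tt001535" (error 28, No space',
    "left on device).",
    "2018-09-14 23:09:02 WARNING OGG-01934 Oracle GoldenGate Manager for Oracle, mgr.prm: Datastore repair failed.",
]

for ts, severity, code, message in find_codes(sample, {"OGG-01934"}):
    print(ts, severity, code, message)
```

To scan a real log, pass an open file handle instead of the sample list. For example, use `open(path, encoding="utf-8", errors="replace")` on your installation's ggserr.log, since the function only needs an iterable of lines.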

Running the repair manually in GGSCI, however, reported no error:

GGSCI (test) 4> REPAIR DATASTORE

Datastore repaired

GGSCI (TEST) 5> start DUMPTEST

Sending START request to MANAGER ...

EXTRACT DUMPTEST starting

GGSCI (TEST) 6> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING

JAGENT RUNNING

EXTRACT STOPPED DUMPTEST 00:00:00 00:06:39

EXTRACT STOPPED EXTTEST 00:00:02 00:00:08

GGSCI (TEST) 7> view report DUMPTEST

--Still will not start. Is the datastore beyond repair?

5. 【Working on JAgent to verify whether it is involved】

【Stop the JAgent process】--still cannot start

GGSCI (TEST) 1> stop JAGENT

GGSCI (TEST) 1> start DUMPTEST

Sending START request to MANAGER ...

EXTRACT DUMPTEST starting

GGSCI (TEST) 2> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING

JAGENT STOPPED

EXTRACT STOPPED DUMPTEST 00:00:00 00:06:39

EXTRACT STOPPED EXTTEST 00:00:02 00:00:08

GGSCI (TEST) 9> view report DUMPTEST

【Temporarily rename the directory used by JAgent】--surprisingly, the processes now start

cd $OGG

mv dirbdb dirbdb.old

GGSCI (TEST) 1> start extract *

GGSCI (TEST) 2> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING

JAGENT STOPPED

EXTRACT RUNNING DUMPTEST 00:00:00 00:06:39

EXTRACT RUNNING EXTTEST 00:00:02 00:00:08

--This confirms that the OGG processes' failure to start is directly related to JAgent.
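The rename in step 5 is a quick isolation test: with the datastore directory out of the way, the Extracts start. Below is a Python sketch of the same `mv dirbdb dirbdb.old`, assuming:

- the datastore lives in `$OGG/dirbdb`, as in this installation;
- JAgent is stopped first, as in the test above.

The function name is hypothetical. The timestamp suffix is my own choice, not the author's, so that a second attempt cannot collide with an earlier `dirbdb.old`.

```python
import shutil
import tempfile
import time
from pathlib import Path


def move_datastore_aside(ogg_home: str, name: str = "dirbdb") -> Path:
    """Rename <ogg_home>/<name> to <name>.old.<timestamp> and return the new path.

    Same effect as `mv dirbdb dirbdb.old`, but refuses to overwrite an
    existing backup and fails loudly if the directory is missing.
    """
    src = Path(ogg_home) / name
    if not src.is_dir():
        raise FileNotFoundError(f"no datastore directory at {src}")
    dst = src.with_name(f"{name}.old.{time.strftime('%Y%m%d%H%M%S')}")
    if dst.exists():
        raise FileExistsError(f"{dst} already exists")
    shutil.move(str(src), str(dst))
    return dst


# Demonstration on a scratch directory, not a real OGG home.
demo_home = tempfile.mkdtemp()
(Path(demo_home) / "dirbdb").mkdir()
moved = move_datastore_aside(demo_home)
print(moved.name)
```

Keeping the old directory rather than deleting it leaves the original datastore available for later inspection.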

【Check the JAgent report file】--errors retrieving messages, with a retry at each polling interval

2018-09-17 12:56:59 [MessageCollector] INFO MessageCollector - Flushing messages for EXTTEST

2018-09-17 12:56:59 [MessageCollector] INFO MessageCollector - Flushing messages for DUMTEST

2018-09-17 12:56:59 [MessageCollector] INFO MessageCollector - Flushing messages for MGR

2018-09-17 12:56:59 [MessageCollector] ERROR MessageCollector - Error retrieveing messages. Try again in the next polling interval.

2018-09-17 12:57:04 [MessageCollector] ERROR MessageCollector - Error retrieveing messages. Try again in the next polling interval.

2018-09-17 12:57:09 [MessageCollector] ERROR MessageCollector - Error retrieveing messages. Try again in the next polling interval.

2018-09-17 12:57:14 [MessageCollector] ERROR MessageCollector - Error retrieveing messages. Try again in the next polling interval.
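The JAgent log above shows MessageCollector failing again at every poll, five seconds apart. Counting those errors and measuring the gaps between them makes the pattern easy to spot in a long log. This sketch assumes each line begins with a 19-character `YYYY-MM-DD HH:MM:SS` timestamp, as in the excerpt. It matches the message text exactly as logged, including its spelling. The function names are hypothetical.

```python
from datetime import datetime
from typing import Iterable, List

# Message text copied verbatim from the JAgent log above (spelling included).
MARKER = "Error retrieveing messages"


def polling_errors(lines: Iterable[str], marker: str = MARKER) -> List[datetime]:
    """Timestamps of ERROR lines that contain `marker`."""
    stamps = []
    for line in lines:
        line = line.strip()
        if " ERROR " in line and marker in line:
            stamps.append(datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S"))
    return stamps


def gaps_seconds(stamps: List[datetime]) -> List[float]:
    """Seconds between consecutive errors; a steady, unending gap suggests every poll fails."""
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]


log = [
    "2018-09-17 12:56:59 [MessageCollector] INFO MessageCollector - Flushing messages for MGR",
    "2018-09-17 12:56:59 [MessageCollector] ERROR MessageCollector - Error retrieveing messages. Try again in the next polling interval.",
    "2018-09-17 12:57:04 [MessageCollector] ERROR MessageCollector - Error retrieveing messages. Try again in the next polling interval.",
    "2018-09-17 12:57:09 [MessageCollector] ERROR MessageCollector - Error retrieveing messages. Try again in the next polling interval.",
    "2018-09-17 12:57:14 [MessageCollector] ERROR MessageCollector - Error retrieveing messages. Try again in the next polling interval.",
]

errors = polling_errors(log)
print(len(errors), gaps_seconds(errors))
```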

6. 【The datastore is broken; the only fix is to rebuild it】

--Stop all processes, including MGR and JAgent

GGSCI (TEST) 1>stop *

GGSCI (TEST) 2>stop jagent

GGSCI (TEST) 3>stop mgr

--Rebuild the JAgent datastore

GGSCI (TEST) 4> create datastore mmap

Datastore created

GGSCI (TEST) 5> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER STOPPED

JAGENT STOPPED

EXTRACT ABENDED DUMPTEST 00:00:00 00:13:06

EXTRACT ABENDED EXTTEST 00:00:03 00:12:57

GGSCI (TEST) 6> start mgr

Manager started.

GGSCI (TEST) 7> start jagent

GGSCI (TEST) 8> start *

GGSCI (TEST) 9> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING

JAGENT RUNNING

EXTRACT RUNNING DUMPTEST 00:00:00 00:05:31

EXTRACT RUNNING EXTTEST 00:00:03 00:05:22
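The rebuild in step 6 is a fixed sequence of GGSCI commands, so it can be kept on hand as a script. The Python sketch below records the author's commands in order and can write them one per line to a file for review. Such a file could then be run inside GGSCI with `OBEY <file>`. Note that `stop mgr` normally asks for a y/n confirmation, so check how your version behaves before running it unattended.

- The function names are hypothetical.
- The `datastore_type` parameter is my addition. The author used `mmap`.

```python
from pathlib import Path
from typing import List


def rebuild_commands(datastore_type: str = "mmap") -> List[str]:
    """GGSCI commands from step 6, in the order the author ran them."""
    return [
        "stop *",                              # stop all Extract/Replicat groups
        "stop jagent",                         # stop the agent that uses the datastore
        "stop mgr",                            # stop Manager last
        f"create datastore {datastore_type}",  # recreate the datastore
        "start mgr",
        "start jagent",
        "start *",
        "info all",                            # confirm everything is RUNNING again
    ]


def write_obey_file(path: str, datastore_type: str = "mmap") -> Path:
    """Write the commands one per line so they can be reviewed or run with OBEY."""
    p = Path(path)
    p.write_text("\n".join(rebuild_commands(datastore_type)) + "\n", encoding="utf-8")
    return p


print("\n".join(rebuild_commands()))
```

The ordering is the important part: everything that touches the datastore is stopped before it is recreated, and Manager is started before JAgent and the groups.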

This article was shared from a WeChat official account as part of the Tencent Cloud self-media sync program.
Originally published: 2018-09-17.

Shared from the udapp WeChat official account.
