Oracle 19.8 New Feature: "asmcmd credverify" and "asmcmd credfix"

数据和云
Published 2021-05-31 15:36:11
Included in the column: 数据和云

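Platform: Red Hat Linux 7.6

Database version: Oracle 19.10

Problem description: In an experiment that replaced the OCR disks and corrupted them with dd, the original OCR disk group was deleted. Along the way, the spfile and the ASM password file were not migrated to the new OCR disk group. As a result, one of the nodes in the two-node CRS cluster could only be started manually.

Note: While simulating the OCR disk corruption, we found that the OCR backup was stored in the OCR disk group itself. Once the OCR is damaged, the backup inside that disk group cannot be read during recovery, so a regular restore is not possible. (See MOS notes 2717306.1, 2256199.1, 1929673.1, 2264675.1.)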

1. Check the cluster status with the following command: crsd and ora.storage are abnormal, everything else is normal

crsctl stat res -t -init
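
To spot the abnormal components without reading the whole table, the status check can be scripted. The following is a minimal sketch, not part of the original procedure. It assumes the key=value output that `crsctl stat res -init` prints when `-t` is omitted (blocks of `NAME=`, `TYPE=`, `TARGET=`, `STATE=` lines); the function name `offline_resources` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list init resources whose STATE is not ONLINE.
# Assumption: stdin is the key=value output of `crsctl stat res -init`
# (without -t), i.e. blocks of NAME=..., TYPE=..., TARGET=..., STATE=... lines.
offline_resources() {
  awk -F= '
    $1 == "NAME"  { name = $2 }                        # remember the current resource
    $1 == "STATE" && $2 !~ /^ONLINE/ { print name }    # report anything not ONLINE
  '
}

# Typical use on a cluster node (not run here):
#   crsctl stat res -init | offline_resources
```

In the situation described here, such a filter would be expected to report ora.crsd and ora.storage.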

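2. Check the crsd log to find the problem

This problem was the first concrete cause of the incident.

3. Inspect the GPnP profile to check the spfile location

The profile shows that the spfile location is correct.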
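
The article does not show the command used for this check. One way to read the spfile location from the GPnP profile is sketched below, under two assumptions: that the profile is dumped with `gpnptool get`, and that the ASM profile element carries an attribute of the form `SPFile="..."`. The function name `gpnp_spfile` is hypothetical.

```shell
# Sketch: print the SPFile attribute from a GPnP profile read on stdin.
# Assumption: the profile XML contains an attribute written as SPFile="...".
gpnp_spfile() {
  sed -n 's/.*SPFile="\([^"]*\)".*/\1/p'
}

# Typical use as the grid user (not run here):
#   gpnptool get 2>/dev/null | gpnp_spfile
```

If the printed path points at a disk group that no longer exists, the profile still refers to the old location.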

4. Bring up the ASM instance using an init.ora file

[grid@bjyctzdb10 ~]$ cat init.ora
+ASM1.__large_pool_size=20971520
+ASM2.__large_pool_size=20971520
+ASM1.__oracle_base='/u02/app/grid'#ORACLE_BASE set from in memory value
+ASM2.__oracle_base='/u02/app/grid'#ORACLE_BASE set from in memory value
+ASM1._asm_max_connected_clients=4
+ASM2._asm_max_connected_clients=4
*.asm_diskgroups='ARCH','DATA','OCR_NEW'#Manual Mount
*.asm_diskstring='/dev/oracleasm/disks/*'
*.asm_power_limit=1
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'
SQL> startup pfile='/home/grid/init.ora';
ASM instance started
Total System Global Area 1137173312 bytes
Fixed Size 8905536 bytes
Variable Size 1103101952 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
SQL>

5. Check the spfile location as seen by the ASM instance

No spfile location is set here. This may be why the spfile cannot be found and the cluster fails to start.

SQL> show parameter spfile
NAME TYPE VALUE
-------------------------------------------
spfile string

Set the spfile location.

SQL> alter system set spfile='+OCR_NEW/bjyctzd-cluster/ASMPARAMETERFILE/registry.253.1072795999';
System altered.
SQL> show parameter spfile
NAME TYPE VALUE
--------------------------------------------
spfile string +OCR_NEW/bjyctzd-cluster/ASMPA
RAMETERFILE/registry.253.10727
95999

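6. Restart the cluster and observe again (the ASM instance has to be restarted for the change to take effect)

The problem is the same as before.

7. Start the ASM instance with init.ora again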

SQL> startup pfile='/home/grid/init.ora';
ASM instance started
Total System Global Area 1137173312 bytes
Fixed Size 8905536 bytes
Variable Size 1103101952 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
SQL> show parameter spfile
NAME TYPE VALUE
--------------------------------------------
spfile string
SQL> show parameter spfile
NAME TYPE VALUE
--------------------------------------------
spfile string
SQL> alter system set spfile='+OCR_NEW/bjyctzd-cluster/ASMPARAMETERFILE/registry.253.1072795999';
System altered.
SQL> create pfile='/home/grid/init.ora' from spfile;
File created.
SQL> shutdown immediate
ASM diskgroups dismounted
ASM instance shutdown
SQL> startup
ASM instance started
Total System Global Area 1137173312 bytes
Fixed Size 8905536 bytes
Variable Size 1103101952 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
SQL> show parameter spfile
NAME TYPE VALUE
--------------------------------------------
spfile string +OCR_NEW/bjyctzd-cluster/ASMPA
RAMETERFILE/registry.253.10727
95999
SQL>

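8. Restarting the cluster again shows the same problem: the ASM instance has to be started manually, and crsd only starts once the ASM instance is up. Check the log to find the problem

9. Check the ASM password file location

Both the password file and its backup are still in the old OCR disk group, but that disk group has already been dismounted.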

[grid@bjyctzdb10 ~]$ srvctl config asm
ASM home:
Password file: +OCR/orapwASM
Backup of Password file: +OCR/orapwASM_backup
ASM listener: LISTENER
ASM instance count: 3
Cluster ASM listener: ASMNET1LSNR_ASM
[grid@bjyctzdb10 ~]$

Change the locations of the ASM password file and its backup.

[root@bjyctzdb10 ~]# srvctl modify asm -pwfile OCR_NEW
[root@bjyctzdb10 ~]# srvctl config asm
ASM home:
Password file: OCR_NEW
Backup of Password file: +OCR/orapwASM_backup
ASM listener: LISTENER
ASM instance count: 3
Cluster ASM listener: ASMNET1LSNR_ASM
[root@bjyctzdb10 ~]#
[root@bjyctzdb10 ~]# srvctl modify asm -pwfilebackup ocr_new
[root@bjyctzdb10 ~]# srvctl config asm
ASM home:
Password file: OCR_NEW
Backup of Password file: ocr_new
ASM listener: LISTENER
ASM instance count: 3
Cluster ASM listener: ASMNET1LSNR_ASM
[root@bjyctzdb10 ~]#

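10. Try stopping and starting the cluster again

11. The log shows that the ASM disk group still cannot be opened because the password file cannot be found

12. Take a closer look at the ASM password file

Here we can see that "Password file:" is followed only by a disk group name, not by an actual password file.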

[grid@bjyctzdb09 ~]$ srvctl config asm
ASM home:
Password file: OCR_NEW
Backup of Password file: ocr_new
ASM listener: LISTENER
ASM instance count: 3
Cluster ASM listener: ASMNET1LSNR_ASM
[grid@bjyctzdb09 ~]$ exit
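
The check made by eye in this step can be automated. The sketch below reads `srvctl config asm` output and flags a "Password file:" value that does not look like a full ASM path (a leading `+` followed by a disk group and a file name). The function name `check_asm_pwfile` and the "looks like a full path" rule are assumptions made for illustration, not an Oracle-defined validation.

```shell
# Sketch: flag a "Password file:" entry that is not a full ASM file path.
# Reads `srvctl config asm` output on stdin.
# Prints "ok <path>" or "suspect <value>"; returns non-zero when suspect.
check_asm_pwfile() {
  local value
  # The anchor ^ skips the "Backup of Password file:" line.
  value=$(sed -n 's/^Password file:[[:space:]]*//p' | head -n 1)
  case "$value" in
    +*/*) echo "ok $value" ;;                               # e.g. +OCR_NEW/orapwasm
    *)    echo "suspect ${value:-<empty>}"; return 1 ;;     # bare name or empty
  esac
}

# Typical use (not run here):
#   srvctl config asm | check_asm_pwfile
```

Against the output above it would print "suspect OCR_NEW", which matches the observation that no concrete file is registered.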

13. Go into asmcmd and check whether a password file exists

The listing shows that there is no ASM password file.

ASMCMD> cd ocr_new
ASMCMD> ls
bjyctzd-cluster/

14. Generate a new password file

The tool reports that a password file already exists.

[grid@bjyctzdb09 ~]$ orapwd file='+ocr_new/orapwasm' asm=y
Enter password for SYS:
OPW-00010: Could not create the password file. This resource has a Password File.

From this we can see that the registered password file is OCR_NEW.

[grid@bjyctzdb09 ~]$ srvctl config asm
ASM home:
Password file: OCR_NEW
Backup of Password file: ocr_new
ASM listener: LISTENER
ASM instance count: 3
Cluster ASM listener: ASMNET1LSNR_ASM
[grid@bjyctzdb09 ~]$ exit

15. Based on the message above, reset the ASM password file location to empty

[grid@bjyctzdb09 ~]$ srvctl modify asm -pwfile
[grid@bjyctzdb09 ~]$ srvctl config asm
ASM home:
Password file:
Backup of Password file: ocr_new
ASM listener: LISTENER
ASM instance count: 3
Cluster ASM listener: ASMNET1LSNR_ASM

16. Regenerate the password file and check its location again

Now we can see that the password file is registered on both node 1 and node 2.

[grid@bjyctzdb09 ~]$ orapwd file='+ocr_new/orapwasm' asm=y
Enter password for SYS:
[grid@bjyctzdb09 ~]$ srvctl config asm
ASM home:
Password file: +OCR_NEW/orapwasm
Backup of Password file: ocr_new
ASM listener: LISTENER
ASM instance count: 3
Cluster ASM listener: ASMNET1LSNR_ASM
[grid@bjyctzdb10 ~]$ srvctl config asm
ASM home:
Password file: +OCR_NEW/orapwasm
Backup of Password file: ocr_new
ASM listener: LISTENER
ASM instance count: 3
Cluster ASM listener: ASMNET1LSNR_ASM

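At this point the password file is fully back in place.

17. Restart the cluster again and see whether the crsd resource comes up automatically

It fails, for the same reason as before.

18. The specific error messages this time

Errors in alert.log.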

2021-05-17 19:09:52.326 [CRSD(123152)]CRS-0804: Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage Storage layer error [Insufficient quorum to open OCR devices] [0]]. Details at (:CRSD00111:) in /u02/app/grid/diag/crs/bjyctzdb09/crs/trace/crsd.trc.
2021-05-17 19:09:52.533 [CRSD(123176)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 123176
2021-05-17 19:09:58.341 [CRSD(123176)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /u02/app/grid/diag/crs/bjyctzdb09/crs/trace/crsd.trc.
2021-05-17 19:09:58.346 [CRSD(123176)]CRS-0804: Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage Storage layer error [Insufficient quorum to open OCR devices] [0]]. Details at (:CRSD00111:) in /u02/app/grid/diag/crs/bjyctzdb09/crs/trace/crsd.trc.
2021-05-17 19:09:58.555 [CRSD(123220)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 123220

Errors in ohasd_orarootagent_root.trc.

2021-05-17 18:58:56.791 : USRTHRD:2394859264: [ INFO] {0:5:3} [ora.storage] Error [kgfoAl06] in [kgfokge] at kgfo.c:3180
2021-05-17 18:58:56.791 : USRTHRD:2394859264: [ INFO] {0:5:3} [ora.storage] ORA-01017: invalid username/password; logon denied
2021-05-17 18:58:56.791 : USRTHRD:2394859264: [ INFO] {0:5:3} [ora.storage] Category: 7
2021-05-17 18:58:56.791 : USRTHRD:2394859264: [ INFO] {0:5:3} [ora.storage] DepInfo: 1017
2021-05-17 18:58:56.791 : USRTHRD:2394859264: [ INFO] {0:5:3} [ora.storage] -- trace dump end --
2021-05-17 18:58:56.791 : USRTHRD:2394859264: [ INFO] {0:5:3} Thread:kgfoCheckMountExt isRunning is reset to false here
2021-05-17 18:58:56.791 : USRTHRD:2394859264: [ INFO] {0:5:3} Thread:kgfoCheckMountExt isFinished set to true
2021-05-17 18:58:56.791 : USRTHRD:3566778112: [ INFO] {0:5:3} Thread:kgfoCheckMountExt Tasklet::doTask m_cv.timewait returned
2021-05-17 18:58:56.791 : USRTHRD:3566778112: [ INFO] {0:5:3} Thread:kgfoCheckMountExt Tasklet::doTask(kgfoCheckMountExt) executed in 3 seconds
2021-05-17 18:58:56.791 :CLSDYNAM:3566778112: [ora.storage]{0:5:3} [start] StorageAgent::parsekgforetcodes retcode = 7, kgfoCheckMount(OCR_NEW), flag 2
2021-05-17 18:58:56.791 :CLSDYNAM:3566778112: [ora.storage]{0:5:3} [start] (null) category: 7, operation: kgfoAl06, loc: kgfokge, OS error: 1017, other: ORA-01017: invalid username/password; logon denied
2021-05-17 18:58:56.791 :CLSDYNAM:3566778112: [ora.storage]{0:5:3} [start] StorageAgent::check 260 kgfo returncode 1
2021-05-17 18:58:56.791 :CLSDYNAM:3566778112: [ora.storage]{0:5:3} [start] (:CLSN00140:)StorageAgent::check 300 parsekgforretcodes OCR dgName OCR_NEW state 1
2021-05-17 18:58:56.791 :CLSDYNAM:3566778112: [ora.storage]{0:5:3} [start] Storage::start waiting for check to not return PARTIAL or UNPLANNED_OFFLINE 1
2021-05-17 18:58:57.791 :CLSDYNAM:3566778112: [ora.storage]{0:5:3} [start] StorageAgent::check 160 NODEROLE_HUB getOCRdetails
2021-05-17 18:58:57.793 : USRTHRD:3566778112: [ INFO] {0:5:3} Thread:kgfoCheckMountExt thread constructor exit this:9c07c820 m_pThnd:0 m_thndMX:9c07c840, m_tintMX:9c07c890 &m_postMX:0x7f0a9c07c870
2021-05-17 18:58:57.793 : USRTHRD:3566778112: [ INFO] {0:5:3} TaskletObjManager::manageTasklets START
2021-05-17 18:58:57.793 : USRTHRD:3566778112: [ INFO] {0:5:3} Thread:kgfoCheckMountExt thread destructor entry { this:0x7f0a9c03a8e0 m_pThnd:0x7f0a9c0851f0 m_thndMX:0x7f0a9c03a900, m_tintMX:0x7f0a9c03a950 &m_postMX:0x7f0a9c03a930
2021-05-17 18:58:57.793 : USRTHRD:3566778112: [ INFO] {0:5:3} Thread:kgfoCheckMountExt thread destructor stop
2021-05-17 18:58:57.793 : USRTHRD:3566778112: [ INFO] {0:5:3} Thread:kgfoCheckMountExt stop { m_pThnd:0x7f0a9c0851f0 tid:2394859264 running:0 alive:0
2021-05-17 18:58:57.793 : USRTHRD:3566778112: [ INFO] {0:5:3} Thread:kgfoCheckMountExt Thread stop:thread is not running:0 or alive:0
2021-05-17 18:58:57.793 : USRTHRD:3566778112: [ INFO] {0:5:3} Thread:kgfoCheckMountExt Thread stop:thread acquire m_pThnd:0x7f0a9c0851f0 m_thndMX:0x7f0a9c03a900
2021-05-17 18:58:57.793 : USRTHRD:3566778112: [ INFO] {0:5:3} Thread:kgfoCheckMountExt Thread stop:thread sltstjn
2021-05-17 18:58:57.793 : USRTHRD:3566778112: [ INFO] {0:5:3} Thread:kgfoCheckMountExt stop release m_thndMX:0x7f0a9c03a900 m_tid:2394859264}
2021-05-17 18:58:57.793 : USRTHRD:3566778112: [ INFO] {0:5:3} Thread:kgfoCheckMountExt thread destructor m_pThnd:(nil) exit }
2021-05-17 18:58:57.793 : USRTHRD:3566778112: [ INFO] {0:5:3} TaskletObjManager::manageTasklets END
2021-05-17 18:58:57.793 : USRTHRD:3566778112: [ INFO] {0:5:3} Thread:kgfoCheckMountExt Thread::start { acquire thndMX:9c07c840
2021-05-17 18:58:57.793 : USRTHRD:3566778112: [ INFO] {0:5:3} Thread:kgfoCheckMountExt Thread::start spawn pThnd:0x7f0a9c04dff0 thndType:1
2021-05-17 18:58:57.793 : USRTHRD:3566778112: [ INFO] {0:5:3} Thread:kgfoCheckMountExt Thread::start thread spawned tid:2394859264
2021-05-17 18:58:57.793 : USRTHRD:3566778112: [ INFO] {0:5:3} Thread:kgfoCheckMountExt Thread::start spawned release thndMX:9c07c840 }
2021-05-17 18:58:57.799 : CLSNS:2394859264: clsns_SetTraceLevel:trace level set to 1.
2021-05-17 18:58:57.827 : USRTHRD:2394859264: [ INFO] {0:5:3} [ora.storage] 9607 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS
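
Most of this trace is thread bookkeeping; the line that matters is the ORA-01017 raised for ora.storage. A small filter can pull such errors out of a long trace. This is a minimal sketch, assuming the trace tags its lines with `[ora.storage]` as in the excerpt above; the function name `storage_ora_codes` is hypothetical.

```shell
# Sketch: list the distinct ORA- error codes logged for ora.storage.
# Reads an agent trace (e.g. ohasd_orarootagent_root.trc) on stdin.
storage_ora_codes() {
  grep -F '[ora.storage]' |          # keep only lines tagged with the resource
    grep -o 'ORA-[0-9][0-9]*' |      # extract the error codes
    sort -u                          # de-duplicate
}

# Typical use (not run here; the trace path is site-specific):
#   storage_ora_codes < ohasd_orarootagent_root.trc
```

For the excerpt above the only code reported is ORA-01017, which points at credentials rather than at the disk group itself.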

19. Check the user names and privileges in the ASM instance

Here we find the second concrete cause of the incident.

[root@bjyctzdb09 dbs]# asmcmd lspwusr
Username sysdba sysoper sysasm
SYS TRUE TRUE TRUE

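By comparison, a healthy ASM instance also has the ASMSNMP and CRSUSER__ASM_001 users (this holds for 12c and later; 11g was not verified).

20. Add the missing users and grant the corresponding privileges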

[root@bjyctzdb09 dbs]# asmcmd orapwusr --add ASMSNMP
Enter password: *******
[root@bjyctzdb09 dbs]# asmcmd orapwusr --grant sysdba ASMSNMP
[root@bjyctzdb09 dbs]# asmcmd lspwusr
Username sysdba sysoper sysasm
SYS TRUE TRUE TRUE
ASMSNMP TRUE FALSE FALSE
[root@bjyctzdb09 dbs]# orapwusr --add CRSUSER__ASM_001
-bash: orapwusr: command not found
[root@bjyctzdb09 dbs]# asmcmd orapwusr --add CRSUSER__ASM_001
Enter password: ******
[root@bjyctzdb09 dbs]# asmcmd orapwusr --grant sysdba CRSUSER__ASM_001
[root@bjyctzdb09 dbs]# asmcmd orapwusr --grant sysasm CRSUSER__ASM_001

We can see that all the users are now present and their privileges look correct.

[root@bjyctzdb09 dbs]# asmcmd lspwusr
Username sysdba sysoper sysasm
SYS TRUE TRUE TRUE
ASMSNMP TRUE FALSE FALSE
CRSUSER__ASM_001 TRUE FALSE TRUE
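
The comparison done by eye in steps 19 and 20 can be sketched as a script. It reads `asmcmd lspwusr` output and prints which of the expected users are missing. The expected set (SYS, ASMSNMP, and at least one CRSUSER__ASM_ user) is taken from this article; the function name `missing_pw_users` is hypothetical. The presence of the users is only a necessary condition: as the next steps show, it does not prove that the stored credentials match.

```shell
# Sketch: report expected users missing from `asmcmd lspwusr` output (stdin).
missing_pw_users() {
  awk '
    NR > 1 {                                   # skip the header line
      seen[$1] = 1
      if ($1 ~ /^CRSUSER__ASM_/) crs = 1       # any CRS user counts
    }
    END {
      if (!seen["SYS"])     print "SYS"
      if (!seen["ASMSNMP"]) print "ASMSNMP"
      if (!crs)             print "CRSUSER__ASM_*"
    }
  '
}

# Typical use (not run here):
#   asmcmd lspwusr | missing_pw_users
```

Run against the output from step 19 it lists ASMSNMP and CRSUSER__ASM_*; against the output above it prints nothing.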

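21. Restart CRS again and see whether it starts automatically

It fails with the same error as before.

22. Further reading of MOS documents reveals the real problem

Oracle 19.8 and later RUs introduce a new feature: the "asmcmd credverify" and "asmcmd credfix" commands, which are used when the ASM password file is re-created.

Run the following command. It checks whether the credentials in the OCR, the OLR, and the password file are consistent, and verifies that credentials for connecting to the cluster exist.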

[grid@bjyctzdb10 ~]$ asmcmd --nocp credverify
credverify: Cannot connect using credentials for CRSUSER__ASM_001, please run 'credfix' to fix the credentials.

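According to this message, the credentials for CRSUSER__ASM_001 cannot be used to connect, and the following command has to be run to fix them.

The output shows that CRSUSER__ASM_001 was deleted, a new user CRSUSER__ASM_002 was created, and finally the credentials for connecting to the cluster were repaired.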

[grid@bjyctzdb10 ~]$ asmcmd --nocp credfix
credfix: Failed to connect using credentials for CRSUSER__ASM_001.
op=addcrscreds wrap=/tmp/creds0.xml
credfix: Creating new credentials, no valid credentials in OCR.
credfix: New user CRSUSER__ASM_002 created.
credfix: Starting SSH session on node bjyctzdb09.
credfix: OLR for bjyctzdb09 has been fixed if credentials were created incorrectly. Exiting SSH session.
op=credimport wrap=/tmp/creds0.xml olr=true force=true
credfix: OLR for bjyctzdb10 has been fixed if credentials were created incorrectly.
op=delcrscreds crs_user=CRSUSER__ASM_001
credfix: Deleted CRSUSER__ASM_001 from OCR.
credverify: starting SSH session on node bjyctzdb09
credverify: Credentials created correctly on bjyctzdb09. Exiting SSH session.
credverify: Credentials created correctly on bjyctzdb10.
credfix: Credentials have been fixed if they were created incorrectly
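
The verify-then-fix sequence used in this step can be wrapped in a small helper. This is a sketch only: it decides from the text of the credverify output, since the article does not say whether the command also signals failure through its exit status. It assumes a failing check prints a message containing "please run" and "credfix", as in the transcript above. The function name `needs_credfix` is hypothetical.

```shell
# Sketch: succeed (exit 0) when credverify output recommends running credfix.
# Assumption: the failure message contains "please run ... credfix",
# as seen in the transcript above; other wordings are not handled.
needs_credfix() {
  grep -q "please run.*credfix"
}

# Typical use as the grid user (not run here):
#   if asmcmd --nocp credverify 2>&1 | needs_credfix; then
#     asmcmd --nocp credfix
#   fi
```

Because credfix creates and deletes CRS users and rewrites credentials in the OCR and OLR, it is probably safer to review the credverify message by hand before letting a script run the fix.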

23. Check the users in the ASM password file again

[grid@bjyctzdb10 ~]$ asmcmd lspwusr
Username sysdba sysoper sysasm
SYS TRUE TRUE TRUE
ASMSNMP TRUE FALSE FALSE
CRSUSER__ASM_002 TRUE FALSE TRUE

24. Restart the cluster again and check whether CRS starts automatically

With that, the problem is fully resolved.

Original article on 墨天轮: https://www.modb.pro/db/65395

Originally published on 2021-05-20 on the 数据和云 WeChat official account and shared through the Tencent Cloud self-media syndication program. For infringement concerns, contact cloudcommunity@tencent.com.
