How to recover OCR from loss and recreate Votedisk?
Last modified: August 2016
What if we lose the diskgroup +OCRDG, which holds our OCR, voting disk, ASM spfile, and the _mgmtdb database and its spfile, due to an underlying ASM disk failure?
There are a few things to remember:
  • If the OCR is stored in ASM, we cannot restore it directly from a manual or automatic backup
  • For ASM to start, the CRS stack must be up
  • To restore the OCR, it must not be in use; in other words, the CRS daemon must not be running
So there is a circular dependency between CRS and ASM. However, there is a way to break it.
Let's look at what the ASM diskgroup +OCRDG contains; this is the diskgroup whose underlying disk failure I'm going to simulate and then recover. This assumes that we have a valid OCR backup (either manual or automatic).
Oracle Cluster Registry (OCR)
[grid@ol-alpha ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :       1640
         Available space (kbytes) :     407928
         ID                       :  989059424
         Device/File Name         :     +OCRDG
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check bypassed due to non-privileged user
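To check the same thing from a script, for example before and after the recovery, a minimal Python sketch like the one below can pull the OCR location(s) and the two integrity flags out of the `ocrcheck` text. It assumes only the output layout shown above; `parse_ocrcheck` is a hypothetical helper, and in practice you would pass it the captured output of `ocrcheck` (for example from `subprocess.run`).

```python
import re


def parse_ocrcheck(output: str) -> dict:
    """Extract OCR locations and integrity flags from `ocrcheck` output.

    Hypothetical helper; assumes the text layout shown in this article.
    """
    # Each configured OCR location appears on a "Device/File Name : <loc>" line.
    locations = re.findall(r"Device/File Name\s*:\s*(\S+)", output)
    return {
        "locations": locations,
        "device_ok": "Device/File integrity check succeeded" in output,
        "registry_ok": "Cluster registry integrity check succeeded" in output,
    }


sample = """\
         Device/File Name         :     +OCRDG
                                    Device/File integrity check succeeded

                                    Device/File not configured

         Cluster registry integrity check succeeded
"""

print(parse_ocrcheck(sample))
```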
Voting Disk
[grid@ol-alpha ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   0b682e4b9ba24f57bfa766f5b5a8f609 (ORCL:ASMDISK_SDB1) [OCRDG]
Located 1 voting disk(s).
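It is worth recording the File Universal Id before a failure, because a recreated voting disk gets a new one. A minimal Python sketch for pulling the rows out of this output, assuming the column layout shown above (the class and function names are hypothetical):

```python
import re
from typing import NamedTuple


class VotingDisk(NamedTuple):
    state: str
    fuid: str       # File Universal Id
    path: str
    diskgroup: str


# Matches numbered rows such as:
#  1. ONLINE   0b682e4b9ba24f57bfa766f5b5a8f609 (ORCL:ASMDISK_SDB1) [OCRDG]
VOTEDISK_ROW = re.compile(
    r"^\s*\d+\.\s+(\w+)\s+([0-9a-f]{32})\s+\(([^)]*)\)\s+\[([^\]]*)\]"
)


def parse_votedisks(output: str) -> list[VotingDisk]:
    """Return one VotingDisk per numbered row of `crsctl query css votedisk`."""
    disks = []
    for line in output.splitlines():
        match = VOTEDISK_ROW.match(line)
        if match:
            disks.append(VotingDisk(*match.groups()))
    return disks


sample = """\
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   0b682e4b9ba24f57bfa766f5b5a8f609 (ORCL:ASMDISK_SDB1) [OCRDG]
Located 1 voting disk(s).
"""

for disk in parse_votedisks(sample):
    print(disk.state, disk.fuid, disk.path, disk.diskgroup)
```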
ASM SPFILE
The ASM spfile is also located in this diskgroup.
[grid@ol-alpha ~]$ sqlplus / as sysasm

SQL> SHOW PARAMETER SPFILE

NAME                                 TYPE        VALUE
------------------------------------ ----------- -------------------------------
spfile                               string      +OCRDG/shanma-cluster/ASMPARAME
                                                 TERFILE/registry.253.933541259
In addition, we can see that there are two other diskgroups called FRADG and DATADG.
SQL>
SQL> select name, state from v$asm_diskgroup ;

NAME                           STATE
------------------------------ -----------
FRADG                          MOUNTED
OCRDG                          MOUNTED
DATADG                         MOUNTED
SQL> select a.name, path from v$asm_diskgroup a, v$asm_disk b where a.group_number=b.group_number;

NAME                           PATH
------------------------------ --------------------------
DATADG                         ORCL:ASMDISK_SDF1
                               ORCL:ASMDISK_SDG1
FRADG                          ORCL:ASMDISK_SDE1
                               ORCL:ASMDISK_SDD1
OCRDG                          ORCL:ASMDISK_SDB1
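This shows that +OCRDG sits on a single disk, ORCL:ASMDISK_SDB1, so losing that one disk loses the whole diskgroup. The output leaves NAME blank on rows that belong to the previous diskgroup; a hypothetical Python sketch that folds it into a name-to-disks mapping, assuming exactly that layout:

```python
def disks_by_diskgroup(output: str) -> dict[str, list[str]]:
    """Map each diskgroup NAME to its disk PATHs.

    Assumes rows with a blank NAME belong to the previous diskgroup,
    as in the query output above.
    """
    groups: dict[str, list[str]] = {}
    current = None
    for line in output.splitlines():
        if not line.strip() or line.startswith(("NAME", "-")):
            continue  # skip blank, header and underline rows
        fields = line.split()
        if line[0].isspace():
            path = fields[0]  # continuation row: same diskgroup as before
        else:
            current, path = fields[0], fields[1]
        if current is not None:
            groups.setdefault(current, []).append(path)
    return groups


sample = """\
NAME                           PATH
------------------------------ --------------------------
DATADG                         ORCL:ASMDISK_SDF1
                               ORCL:ASMDISK_SDG1
FRADG                          ORCL:ASMDISK_SDE1
                               ORCL:ASMDISK_SDD1
OCRDG                          ORCL:ASMDISK_SDB1
"""

for name, paths in disks_by_diskgroup(sample).items():
    print(name, paths)
```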
Diskgroup +OCRDG (the first diskgroup, created during the Grid Infrastructure installation) also holds the _mgmtdb database files and its SPFILE.
ASMCMD
ASMCMD [+] > find +OCRDG *
+OCRDG/_MGMTDB/
+OCRDG/_MGMTDB/CONTROLFILE/
+OCRDG/_MGMTDB/CONTROLFILE/Current.259.935350005
+OCRDG/_MGMTDB/DATAFILE/
+OCRDG/_MGMTDB/DATAFILE/SYSAUX.256.935349941
+OCRDG/_MGMTDB/DATAFILE/SYSTEM.257.935349951
+OCRDG/_MGMTDB/DATAFILE/UNDOTBS1.258.935349967
+OCRDG/_MGMTDB/ONLINELOG/
+OCRDG/_MGMTDB/ONLINELOG/group_1.260.935350007
+OCRDG/_MGMTDB/ONLINELOG/group_2.261.935350007
+OCRDG/_MGMTDB/ONLINELOG/group_3.262.935350009
+OCRDG/_MGMTDB/PARAMETERFILE/
+OCRDG/_MGMTDB/PARAMETERFILE/spfile.264.935350021
+OCRDG/_MGMTDB/TEMPFILE/
+OCRDG/_MGMTDB/TEMPFILE/TEMP.263.935350011
+OCRDG/shanma-cluster/
+OCRDG/shanma-cluster/ASMPARAMETERFILE/
+OCRDG/shanma-cluster/ASMPARAMETERFILE/REGISTRY.253.935269459
+OCRDG/shanma-cluster/OCRFILE/
+OCRDG/shanma-cluster/OCRFILE/REGISTRY.255.935269621
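So a single diskgroup holds the cluster's OCR and ASM spfile (under the cluster name) as well as every file of the management database. A short hypothetical Python helper that groups `asmcmd find` output by owner and file type, assuming the four-level `+DG/<owner>/<FILETYPE>/<file>` layout shown above, makes that inventory explicit:

```python
from collections import defaultdict


def summarize_asm_files(find_output: str) -> dict[str, list[str]]:
    """Group files listed by `asmcmd find +DG *` by owner and file type.

    Assumes paths of the form +DG/<owner>/<FILETYPE>/<file>, where <owner>
    is a database name (e.g. _MGMTDB) or the cluster name.
    """
    summary = defaultdict(list)
    for line in find_output.splitlines():
        path = line.strip()
        if not path or path.endswith("/"):
            continue  # skip blank lines and directory entries
        parts = path.split("/")
        if len(parts) != 4:
            continue  # not the layout this sketch expects
        _, owner, file_type, name = parts
        summary[f"{owner}/{file_type}"].append(name)
    return dict(summary)


sample = """\
+OCRDG/_MGMTDB/
+OCRDG/_MGMTDB/DATAFILE/
+OCRDG/_MGMTDB/DATAFILE/SYSAUX.256.935349941
+OCRDG/_MGMTDB/DATAFILE/SYSTEM.257.935349951
+OCRDG/_MGMTDB/PARAMETERFILE/spfile.264.935350021
+OCRDG/shanma-cluster/ASMPARAMETERFILE/REGISTRY.253.935269459
+OCRDG/shanma-cluster/OCRFILE/REGISTRY.255.935269621
"""

for key, files in summarize_asm_files(sample).items():
    print(f"{key}: {len(files)} file(s)")
```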
OCR backups
Check the most recent OCR backups (on all cluster nodes):
[root@ol-alpha ~]# ocrconfig -showbackup
PROT-24: Auto backups for the Oracle Cluster Registry are not available

ol-alpha     2017/01/25 17:54:37     /u01/app/12.1.0/grid/cdata/shanma-cluster/backup_20170125_175437.ocr     0
ol-alpha     2017/01/23 05:19:05     /u01/app/12.1.0/grid/cdata/shanma-cluster/backup_20170123_051905.ocr     0
[root@ol-alpha ~]# ocrconfig -manualbackup

ol-beta     2017/02/07 20:26:42     /u01/app/12.1.0/grid/cdata/shanma-cluster/backup_20170207_202642.ocr     0
ol-alpha     2017/01/25 17:54:37     /u01/app/12.1.0/grid/cdata/shanma-cluster/backup_20170125_175437.ocr     0
ol-alpha     2017/01/23 05:19:05     /u01/app/12.1.0/grid/cdata/shanma-cluster/backup_20170123_051905.ocr     0
Note: Since my cluster isn't running around the clock, I took a manual backup using 'ocrconfig -manualbackup'. I will use the most recent backup, 'backup_20170207_202642.ocr', located on node ol-beta, to demonstrate the recovery.
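As the listing shows, each backup is kept under the Grid home on the node that took it, so choosing the backup also chooses the node to run the recovery from. A minimal Python sketch of that choice, assuming the row layout printed above (node, date, time, backup file, plus a trailing column it ignores); the helper names are hypothetical:

```python
from datetime import datetime
from typing import NamedTuple, Optional


class OcrBackup(NamedTuple):
    node: str
    taken_at: datetime
    path: str


def parse_showbackup(output: str) -> list[OcrBackup]:
    """Parse backup rows from `ocrconfig -showbackup` / `-manualbackup` output."""
    backups = []
    for line in output.splitlines():
        fields = line.split()
        # Expected row: node, date, time, backup file, trailing column (ignored).
        if len(fields) < 4 or not fields[3].endswith(".ocr"):
            continue  # skips blank lines and PROT-nn messages
        taken_at = datetime.strptime(f"{fields[1]} {fields[2]}", "%Y/%m/%d %H:%M:%S")
        backups.append(OcrBackup(fields[0], taken_at, fields[3]))
    return backups


def latest_backup(output: str) -> Optional[OcrBackup]:
    """Return the most recent backup, or None if none is listed."""
    return max(parse_showbackup(output), key=lambda b: b.taken_at, default=None)


sample = """\
ol-beta     2017/02/07 20:26:42     /u01/app/12.1.0/grid/cdata/shanma-cluster/backup_20170207_202642.ocr     0
ol-alpha     2017/01/25 17:54:37     /u01/app/12.1.0/grid/cdata/shanma-cluster/backup_20170125_175437.ocr     0
ol-alpha     2017/01/23 05:19:05     /u01/app/12.1.0/grid/cdata/shanma-cluster/backup_20170123_051905.ocr     0
"""

best = latest_backup(sample)
print(best.node, best.path)
```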
Simulate a disk failure
With everything looking healthy, I'm going to forcefully overwrite the start of the underlying ASM disk '/dev/oracleasm/disks/ASMDISK_SDB1' of diskgroup +OCRDG with zeros:
[root@ol-alpha ~]# dd if=/dev/zero of=/dev/oracleasm/disks/ASMDISK_SDB1 bs=1024 count=100
100+0 records in
100+0 records out
102400 bytes (102 kB) copied, 0.00419669 s, 24.4 MB/s
All of its contents (OCR, voting disk, SPFILE, etc.) are now lost.
[root@ol-alpha ~]# crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
The cluster has now crashed. Stop the CRS stack forcefully on both nodes:
[root@ol-alpha ~]# crsctl stop crs -f
[root@ol-beta ~]# crsctl stop crs -f
Next, I recreate the ASM disk to simulate that the failed disk has been replaced with a fresh one:
[root@ol-alpha ~]# oracleasm deletedisk ASMDISK_SDB1
Disk "ASMDISK_SDB1" defines an unmarked device
Dropping disk: done
[root@ol-alpha ~]# oracleasm createdisk ASMDISK_SDB1 /dev/sdb1
Writing disk header: done
Instantiating disk: done
Now, on the node where the latest OCR backup exists (ol-beta in my case), start CRS in exclusive mode:
[root@ol-beta dbs]# crsctl start crs -excl
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.evmd' on 'ol-beta'
CRS-2672: Attempting to start 'ora.mdnsd' on 'ol-beta'
CRS-2676: Start of 'ora.evmd' on 'ol-beta' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'ol-beta' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'ol-beta'
CRS-2676: Start of 'ora.gpnpd' on 'ol-beta' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'ol-beta'
CRS-2672: Attempting to start 'ora.gipcd' on 'ol-beta'
CRS-2676: Start of 'ora.cssdmonitor' on 'ol-beta' succeeded
CRS-2676: Start of 'ora.gipcd' on 'ol-beta' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'ol-beta'
CRS-2672: Attempting to start 'ora.diskmon' on 'ol-beta'
CRS-2676: Start of 'ora.diskmon' on 'ol-beta' succeeded
CRS-2676: Start of 'ora.cssd' on 'ol-beta' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'ol-beta'
CRS-2672: Attempting to start 'ora.ctssd' on 'ol-beta'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'ol-beta'
CRS-2676: Start of 'ora.crf' on 'ol-beta' succeeded
CRS-2676: Start of 'ora.ctssd' on 'ol-beta' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'ol-beta' succeeded
CRS-2679: Attempting to clean 'ora.asm' on 'ol-beta'
CRS-2681: Clean of 'ora.asm' on 'ol-beta' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'ol-beta'
CRS-2676: Start of 'ora.asm' on 'ol-beta' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'ol-beta'     /* it hangs here for some time while searching for the storage */
diskgroup OCRDG not mounted ()
CRS-5017: The resource action "ora.storage start" encountered the following error:
Storage agent start action aborted. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/ol-beta/crs/trace/ohasd_orarootagent_root.trc".
CRS-2674: Start of 'ora.storage' on 'ol-beta' failed
CRS-2679: Attempting to clean 'ora.storage' on 'ol-beta'
CRS-2681: Clean of 'ora.storage' on 'ol-beta' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'ol-beta'
CRS-2677: Stop of 'ora.asm' on 'ol-beta' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'ol-beta'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'ol-beta' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'ol-beta'
CRS-2677: Stop of 'ora.ctssd' on 'ol-beta' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'ol-beta'
CRS-2677: Stop of 'ora.crf' on 'ol-beta' succeeded
CRS-4000: Command Start failed, or completed with errors.
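In a long start log, the line that matters is the CRS-2674 message: ora.storage fails because diskgroup OCRDG cannot be mounted, and the stack is then torn down again. A small Python sketch that picks such lines out, assuming only the message format shown above (`failed_starts` is a hypothetical helper):

```python
import re

# Matches the "Start of '<resource>' on '<node>' failed" message seen above.
FAILED_START = re.compile(r"CRS-2674: Start of '([^']+)' on '([^']+)' failed")


def failed_starts(log: str) -> list[tuple[str, str]]:
    """Return (resource, node) pairs for every failed start in a crsctl log."""
    return FAILED_START.findall(log)


sample = """\
CRS-2672: Attempting to start 'ora.storage' on 'ol-beta'
CRS-2674: Start of 'ora.storage' on 'ol-beta' failed
CRS-4000: Command Start failed, or completed with errors.
"""

for resource, node in failed_starts(sample):
    print(f"{resource} failed to start on {node}")
```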
Start ASM with PFILE
Since we have lost the ASM spfile, ASM cannot start, and hence the CRS stack cannot start either. Let's start the ASM instance manually with a minimal set of parameters:
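If you prefer to script this step rather than edit the file in vi, a minimal Python sketch can write the same five parameters to `$ORACLE_HOME/dbs/init<SID>.ora`, the location the `startup pfile=` command uses in this article. Run it as the grid owner; the helper names are hypothetical, and the values are written verbatim in init.ora syntax.

```python
from pathlib import Path

# Minimal ASM parameters from this article, kept in init.ora syntax:
# strings are already quoted, numbers and sizes are written as-is.
MINIMAL_ASM_PARAMS = {
    "asm_power_limit": "1",
    "diagnostic_dest": "'/u01/app/grid/diag'",
    "instance_type": "'asm'",
    "large_pool_size": "12M",
    "remote_login_passwordfile": "'EXCLUSIVE'",
}


def render_pfile(params: dict[str, str]) -> str:
    """Render one '*.name=value' line per parameter (applies to all instances)."""
    return "".join(f"*.{name}={value}\n" for name, value in params.items())


def write_pfile(oracle_home: str, sid: str, params: dict[str, str]) -> Path:
    """Write $ORACLE_HOME/dbs/init<SID>.ora and return its path."""
    path = Path(oracle_home) / "dbs" / f"init{sid}.ora"
    path.write_text(render_pfile(params))
    return path


print(render_pfile(MINIMAL_ASM_PARAMS), end="")
```

For the node in this article, the call would be `write_pfile("/u01/app/12.1.0/grid", "+ASM2", MINIMAL_ASM_PARAMS)`.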

[grid@ol-beta dbs]$ vi init+ASM2.ora
[grid@ol-beta dbs]$ cat init+ASM2.ora
*.asm_power_limit=1
*.diagnostic_dest='/u01/app/grid/diag'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'
[grid@ol-beta dbs]$ export ORACLE_SID=+ASM2
[grid@ol-beta dbs]$ sqlplus / as sysasm

SQL*Plus: Release 12.1.0.2.0 Production on Tue Feb 7 21:35:24 2017

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL>
SQL>
SQL>  startup pfile=/u01/app/12.1.0/grid/dbs/init+ASM2.ora
ASM instance started

Total System Global Area 1140850688 bytes
Fixed Size                  2933400 bytes
Variable Size            1112751464 bytes
ASM Cache                  25165824 bytes
ORA-15032: not all alterations performed
ORA-15017: diskgroup "OCRDG" cannot be mounted
ORA-15040: diskgroup is incomplete
ORA-09968: unable to lock file
If you encounter the following error while starting up the ASM instance, there is an easy way to overcome it:

SQL> startup pfile=/u01/app/12.1.0/grid/dbs/init+ASM1.ora
ORA-10997: another startup/shutdown operation of this instance inprogress
ORA-09968: unable to lock file
Linux-x86_64 Error: 11: Resource temporarily unavailable
Additional information: 10969

Solution: remove the lock file $ORACLE_HOME/dbs/lkinst+ASM1 and then retry the startup.
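The same cleanup as a small Python sketch, assuming the lock file name given in the solution above and that `$ORACLE_HOME` points at the Grid home (the helper names are hypothetical). Only remove the file when you are sure the instance is not actually running; the sketch itself does not check that.

```python
from pathlib import Path


def stale_lock_path(oracle_home: str, sid: str) -> Path:
    """Path of the instance lock file named in the solution above."""
    return Path(oracle_home) / "dbs" / f"lkinst{sid}"


def remove_stale_lock(oracle_home: str, sid: str) -> bool:
    """Delete the lock file if it exists; return True if one was removed.

    This does NOT check whether an instance with this SID is running;
    confirm that first (e.g. no asm_pmon_<SID> process) before calling it.
    """
    path = stale_lock_path(oracle_home, sid)
    if path.exists():
        path.unlink()
        return True
    return False


print(stale_lock_path("/u01/app/12.1.0/grid", "+ASM1"))
```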
The other diskgroups remain dismounted.
SQL>
SQL> select name, state from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
FRADG                          DISMOUNTED
DATADG                         DISMOUNTED
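These diskgroups are intact and only need to be mounted once OCRDG exists again. To generate the mount statements from the listing rather than type them, a hypothetical Python sketch could look like this (it assumes the two-column NAME/STATE output shown above):

```python
def dismounted_diskgroups(output: str) -> list[str]:
    """Names whose STATE column reads DISMOUNTED in v$asm_diskgroup output."""
    names = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == "DISMOUNTED":
            names.append(fields[0])
    return names


def mount_statements(names: list[str]) -> list[str]:
    """One ALTER DISKGROUP ... MOUNT statement per diskgroup."""
    return [f"alter diskgroup {name.lower()} mount;" for name in names]


sample = """\
NAME                           STATE
------------------------------ -----------
FRADG                          DISMOUNTED
DATADG                         DISMOUNTED
"""

for statement in mount_statements(dismounted_diskgroups(sample)):
    print(statement)
```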
Recreate OCRDG diskgroup
I recreate diskgroup OCRDG on the fresh disk and then mount the other diskgroups:
SQL> create diskgroup OCRDG external redundancy disk 'ORCL:ASMDISK_SDB1' attribute 'COMPATIBLE.ASM'='12.1';
Diskgroup created.

SQL> alter diskgroup fradg mount;
Diskgroup altered.

SQL> alter diskgroup datadg mount;
Diskgroup altered.
Alternatively, you can mount all diskgroups with a single command: 'SQL> alter diskgroup all mount;'
SQL>
SQL> select name, state from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
FRADG                          MOUNTED
OCRDG                          MOUNTED
DATADG                         MOUNTED
Now let's create the ASM SPFILE in +OCRDG from the PFILE:
SQL>
SQL> create spfile='+OCRDG' from pfile ;
File created.
OCR Restore from backup
With ASM up and running and +OCRDG available again, we can restore the OCR from the manual backup:
[root@ol-beta shanma-cluster]# ocrconfig -restore /u01/app/12.1.0/grid/cdata/shanma-cluster/backup_20170207_202642.ocr
Next, recreate the voting disk in +OCRDG, and then stop the CRS stack that is running in exclusive mode:
[root@ol-beta shanma-cluster]# crsctl replace votedisk +OCRDG
Successful addition of voting disk 275a0fed6ed64f58bffc232858329972.
Successfully replaced voting disk group with +OCRDG.
CRS-4266: Voting file(s) successfully replaced
[root@ol-beta shanma-cluster]# crsctl stop crs -f
Now I can start OHASD and the CRS stack on both nodes:
[root@ol-beta ~]# crsctl start has
[root@ol-alpha ~]# crsctl start has
Let's take a quick look at the OCR and the voting disk:
[grid@ol-alpha ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :       1480
         Available space (kbytes) :     408088
         ID                       :  989059424
         Device/File Name         :     +OCRDG
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check bypassed due to non-privileged user
[grid@ol-alpha ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   275a0fed6ed64f58bffc232858329972 (ORCL:ASMDISK_SDB1) [OCRDG]
Located 1 voting disk(s).
[grid@ol-alpha ~]$ scripts/crs_res.sh

RESOURCE NAME                 RESOURCE TYPE                 TARGET              STATE
============================= ============================= =================== =============================
ora.DATADG.dg                 ora.diskgroup.type            ONLINE              ONLINE on ol-alpha
ora.DATADG.dg                 ora.diskgroup.type            ONLINE              ONLINE on ol-beta
ora.FRADG.dg                  ora.diskgroup.type            ONLINE              ONLINE on ol-alpha
ora.FRADG.dg                  ora.diskgroup.type            ONLINE              ONLINE on ol-beta
ora.LISTENER.lsnr             ora.listener.type             ONLINE              ONLINE on ol-alpha
ora.LISTENER.lsnr             ora.listener.type             ONLINE              ONLINE on ol-beta
ora.LISTENER_SCAN1.lsnr       ora.scan_listener.type        ONLINE              ONLINE on ol-beta
ora.LISTENER_SCAN2.lsnr       ora.scan_listener.type        ONLINE              ONLINE on ol-alpha
ora.LISTENER_SCAN3.lsnr       ora.scan_listener.type        ONLINE              ONLINE on ol-alpha
ora.MGMTLSNR                  ora.mgmtlsnr.type             ONLINE              ONLINE on ol-beta
ora.OCRDG.dg                  ora.diskgroup.type            ONLINE              ONLINE on ol-alpha
ora.OCRDG.dg                  ora.diskgroup.type            ONLINE              ONLINE on ol-beta
ora.asm                       ora.asm.type                  ONLINE              ONLINE on ol-alpha
ora.asm                       ora.asm.type                  ONLINE              ONLINE on ol-beta
ora.cvu                       ora.cvu.type                  ONLINE              ONLINE on ol-alpha
ora.mgmtdb                    ora.mgmtdb.type               ONLINE              OFFLINE
ora.net1.network              ora.network.type              ONLINE              ONLINE on ol-alpha
ora.net1.network              ora.network.type              ONLINE              ONLINE on ol-beta
ora.oc4j                      ora.oc4j.type                 ONLINE              ONLINE on ol-alpha
ora.ol-alpha.vip              ora.cluster_vip_net1.type     ONLINE              ONLINE on ol-alpha
ora.ol-beta.vip               ora.cluster_vip_net1.type     ONLINE              ONLINE on ol-beta
ora.ons                       ora.ons.type                  ONLINE              ONLINE on ol-alpha
ora.ons                       ora.ons.type                  ONLINE              ONLINE on ol-beta
ora.scan1.vip                 ora.scan_vip.type             ONLINE              ONLINE on ol-beta
ora.scan2.vip                 ora.scan_vip.type             ONLINE              ONLINE on ol-alpha
ora.scan3.vip                 ora.scan_vip.type             ONLINE              ONLINE on ol-alpha
ora.shannura.db               ora.database.type             OFFLINE             OFFLINE
ora.shannura.db               ora.database.type             OFFLINE             OFFLINE
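The rows worth attention are the ones where STATE does not match TARGET. `scripts/crs_res.sh` is a custom formatting script rather than a standard Oracle tool, so its column layout is specific to it; assuming that layout, a hypothetical Python sketch that flags such rows:

```python
from typing import NamedTuple, Optional


class Resource(NamedTuple):
    name: str
    res_type: str
    target: str
    state: str
    node: Optional[str]


def parse_resources(table: str) -> list[Resource]:
    """Parse rows like 'ora.asm  ora.asm.type  ONLINE  ONLINE on ol-alpha'."""
    rows = []
    for line in table.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith("ora."):
            continue  # header, separator and blank lines
        node = fields[5] if len(fields) >= 6 and fields[4] == "on" else None
        rows.append(Resource(fields[0], fields[1], fields[2], fields[3], node))
    return rows


def not_in_target_state(rows: list[Resource]) -> list[Resource]:
    """Resources whose current STATE differs from their TARGET."""
    return [r for r in rows if r.target != r.state]


sample = """\
ora.OCRDG.dg                  ora.diskgroup.type            ONLINE              ONLINE on ol-beta
ora.mgmtdb                    ora.mgmtdb.type               ONLINE              OFFLINE
ora.shannura.db               ora.database.type             OFFLINE             OFFLINE
"""

for r in not_in_target_state(parse_resources(sample)):
    print(r.name, r.target, r.state)
```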
Rebuilding _mgmtdb
All resources are now up except ora.mgmtdb and ora.shannura.db. There is no need to worry about ora.shannura.db: its TARGET is OFFLINE, so we can simply start it manually later. However, we do need to fix the management database (_mgmtdb), because its datafiles and spfile were lost when we destroyed ASMDISK_SDB1.
[grid@ol-alpha ~]$ srvctl status mgmtdb
Database is enabled
Database is not running.
[grid@ol-alpha ~]$ srvctl disable mgmtdb
[grid@ol-alpha ~]$ srvctl remove mgmtdb
Remove the database _mgmtdb? (y/[n]) y
[grid@ol-alpha ~]$ $ORACLE_HOME/bin/dbca -silent -createDatabase -templateName MGMTSeed_Database.dbc -sid -MGMTDB -gdbName _mgmtdb -storageType ASM -diskGroupName +OCRDG -datafileJarLocation $ORACLE_HOME/assistants/dbca/templates -characterset AL32UTF8 -autoGeneratePasswords -oui_internal
Registering database with Oracle Grid Infrastructure
5% complete
Copying database files
7% complete
9% complete
16% complete
23% complete
30% complete
37% complete
41% complete
Creating and starting Oracle instance
43% complete
48% complete
53% complete
57% complete
58% complete
59% complete
62% complete
64% complete
Completing Database Creation
68% complete
78% complete
89% complete
100% complete
Look at the log file "/u01/app/grid/cfgtoollogs/dbca/_mgmtdb/_mgmtdb2.log" for further details.
[grid@ol-alpha ~]$ srvctl status mgmtdb
Database is enabled
Instance -MGMTDB is running on node ol-alpha
Now everything is back up and running, except the database resource, which I haven't started yet:
[grid@ol-alpha ~]$ srvctl start database -database shannura
[grid@ol-alpha ~]$ srvctl status database -database shannura
Instance shannura1 is running on node ol-alpha
Instance shannura2 is running on node ol-beta
[grid@ol-alpha ~]$ scripts/crs_res.sh

RESOURCE NAME                 RESOURCE TYPE                 TARGET              STATE
============================= ============================= =================== =============================
ora.DATADG.dg                 ora.diskgroup.type            ONLINE              ONLINE on ol-alpha
ora.DATADG.dg                 ora.diskgroup.type            ONLINE              ONLINE on ol-beta
ora.FRADG.dg                  ora.diskgroup.type            ONLINE              ONLINE on ol-alpha
ora.FRADG.dg                  ora.diskgroup.type            ONLINE              ONLINE on ol-beta
ora.LISTENER.lsnr             ora.listener.type             ONLINE              ONLINE on ol-alpha
ora.LISTENER.lsnr             ora.listener.type             ONLINE              ONLINE on ol-beta
ora.LISTENER_SCAN1.lsnr       ora.scan_listener.type        ONLINE              ONLINE on ol-beta
ora.LISTENER_SCAN2.lsnr       ora.scan_listener.type        ONLINE              ONLINE on ol-alpha
ora.LISTENER_SCAN3.lsnr       ora.scan_listener.type        ONLINE              ONLINE on ol-alpha
ora.MGMTLSNR                  ora.mgmtlsnr.type             ONLINE              ONLINE on ol-alpha
ora.OCRDG.dg                  ora.diskgroup.type            ONLINE              ONLINE on ol-alpha
ora.OCRDG.dg                  ora.diskgroup.type            ONLINE              ONLINE on ol-beta
ora.asm                       ora.asm.type                  ONLINE              ONLINE on ol-alpha
ora.asm                       ora.asm.type                  ONLINE              ONLINE on ol-beta
ora.cvu                       ora.cvu.type                  ONLINE              ONLINE on ol-alpha
ora.mgmtdb                    ora.mgmtdb.type               ONLINE              ONLINE on ol-alpha
ora.net1.network              ora.network.type              ONLINE              ONLINE on ol-alpha
ora.net1.network              ora.network.type              ONLINE              ONLINE on ol-beta
ora.oc4j                      ora.oc4j.type                 ONLINE              ONLINE on ol-alpha
ora.ol-alpha.vip              ora.cluster_vip_net1.type     ONLINE              ONLINE on ol-alpha
ora.ol-beta.vip               ora.cluster_vip_net1.type     ONLINE              ONLINE on ol-beta
ora.ons                       ora.ons.type                  ONLINE              ONLINE on ol-alpha
ora.ons                       ora.ons.type                  ONLINE              ONLINE on ol-beta
ora.scan1.vip                 ora.scan_vip.type             ONLINE              ONLINE on ol-beta
ora.scan2.vip                 ora.scan_vip.type             ONLINE              ONLINE on ol-alpha
ora.scan3.vip                 ora.scan_vip.type             ONLINE              ONLINE on ol-alpha
ora.shannura.db               ora.database.type             ONLINE              ONLINE on ol-alpha
ora.shannura.db               ora.database.type             ONLINE              ONLINE on ol-beta
Please leave a comment if you found this article useful.

Shannura
