During instance recovery, mounting a diskgroup can fail with ORA-600 [KFCEMA02].
The FCN recorded in the block does not match the FCN recorded in the ACD:
the block FCN is lower than the ACD FCN.
The top functions in the call stack are:
kfgInitCache -> kfcMount -> kfrcrv -> kfrPass2 -> kfcema
The trace file contains the FCN for the block currently being recovered.
For example:
kfbh_kfcbh.fcn_kfbh = 0.5538283
BH: (0x3807959c0) bnum=13 type=FILEDIR state=rcv chgSt=not modifying
flags=0x00000000 pinmode=excl lockmode=null bf=0x38040c000
kfbh_kfcbh.fcn_kfbh = 0.5538283 lowAba=0.0 highAba=0.0
last kfcbInitSlot return code=null cpkt lnk is null
The ACD FCN is the second argument of the ORA-600 [KFCEMA02] error.
This patch does not fix a diskgroup with the error already introduced.
It will prevent future occurrences.
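The comparison described above can be sketched in a few lines of Python. This is only an illustration, not an Oracle tool: the function names are made up, the regular expression assumes the `fcn_kfbh = <n>.<n>` form seen in the trace excerpt, and treating the two numbers as a pair compared left to right is an assumption. The ACD FCN has to be taken by hand from the ORA-600 arguments; the value used in the example call is invented.

```python
import re

# Matches the FCN field as it appears in the trace excerpt,
# e.g. "kfbh_kfcbh.fcn_kfbh = 0.5538283".
FCN_RE = re.compile(r"fcn_kfbh\s*=\s*(\d+)\.(\d+)")


def parse_block_fcn(trace_line):
    """Return the FCN found in a trace line as a pair of ints, or None."""
    match = FCN_RE.search(trace_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def block_is_behind(block_fcn, acd_fcn):
    """True when the block FCN is lower than the ACD FCN.

    Assumption: the two parts of an FCN compare like a tuple,
    first part first, then second part.
    """
    return block_fcn < acd_fcn


line = "kfbh_kfcbh.fcn_kfbh = 0.5538283 lowAba=0.0 highAba=0.0"
block_fcn = parse_block_fcn(line)

# Hypothetical ACD FCN, invented for this example only.
acd_fcn = (0, 5538290)
print(block_fcn, block_is_behind(block_fcn, acd_fcn))
```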
Hdr: 6163771 10.2.0.3 RDBMS 10.2.0.3 ASM PRODID-5 PORTID-23
Abstract: CANNOT MOUNT DISKGROUP DUE TO ORA-600 [KFCEMA02]
PROBLEM:
--------
The customer had a maintenance window (for something else) this morning on
this development RAC. We could not shut down cleanly. Then, after the
maintenance window, the FRA diskgroup would not mount.
WORKAROUND:
-----------
N/A
REPRODUCIBILITY:
----------------
At will
STACK TRACE:
------------
ksedmp kgerinv kgeasnmierr kfcema kfrPass2 kfrcrv
kfcMount kfgInitCache kfgFinalizeMount 3088 kfgscFinalize kfgForEachKfgsc
kfgsoFinalize kfgFinalize kfxdrvMount kfxdrvEntry opiexe opiosq0
kpooprx kpoal8 opiodr ttcpip opitsk opiino
opiodr opidrv sou2o opimai_real...
SUPPORTING INFORMATION:
-----------------------
Alert log and trace file uploaded
PROGRAMMING DETAILS:
-----------------------
Development has found a bug in the way checkpoints are maintained and this
bug is the probable cause of the kfcema02 assert these customers are seeing.
We have a high degree of confidence that the bug we found is the cause of
the customer issues because of what we saw in the AMDU dumps.
The problem is that buffers on the ping queue are not sorted in any
particular order. The fix is for kfrbCkpt to scan the entire ping queue to
find the oldest buffer when computing the new checkpoint. kfcbDriver is
also updated to scan the entire ping queue when computing the targetAba for
kfcbCkpt, but that code change is not critical because the only effect of
having the targetAba be higher than it should be was that DBWR would write
more dirty buffers than it really needed to.
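To make the ordering problem concrete, here is a small Python sketch. It is not Oracle's code: the `Buffer` class, the single-integer `low_aba` field and both function names are hypothetical simplifications, and the "take the first entry" behaviour is only an assumed stand-in for any logic that relies on the queue being ordered. It shows why a checkpoint computed from an unsorted queue without a full scan can land ahead of the oldest dirty buffer.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    bnum: int      # block number (illustrative)
    low_aba: int   # oldest change address still needed for this buffer (simplified to one int)


def checkpoint_from_head(ping_queue):
    """Assumed order-dependent behaviour: trust the first queue entry."""
    return ping_queue[0].low_aba


def checkpoint_from_scan(ping_queue):
    """Behaviour described by the fix: scan the whole queue for the oldest buffer."""
    return min(buf.low_aba for buf in ping_queue)


# The queue is in no particular order; the oldest buffer is in the middle.
queue = [Buffer(bnum=13, low_aba=30), Buffer(bnum=7, low_aba=10), Buffer(bnum=21, low_aba=20)]

print(checkpoint_from_head(queue))  # checkpoint is past buffer 7's oldest change
print(checkpoint_from_scan(queue))  # checkpoint covers every buffer in the queue
```

With the order-dependent version, recovery would start after a change that buffer 7 still needs, which fits the kind of block/ACD mismatch reported by the assert.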
After reading this blog post from a while ago on ORACLE-L, I was not encouraged, to say the least.
Reaching out to Oracle Support helped solve the problem by employing an 11g tool (which can also run on 10g) called facp, together with AMDU. AMDU was released with 11g and is used to get the location of the ASM metadata across the disks. Like many other tools released with 11g, it can be used in 10g environments. Note 553639.1 is the placeholder for the different platforms; the note also includes configuration instructions. AMDU only needs to be configured (not run) for this fix, since facp calls AMDU itself.
Steps taken to resolve:
Transfer amdu and facp to a working directory and add that directory to LD_LIBRARY_PATH, PATH and other relevant variables.
Download the facp script from the SR attachment.
Then scan the ACD and generate the pertinent files:
$./facp '/dev/oracleasm/disks*' 'DG6' ALL
This generates files named like facp* in the same directory.
Then try to adjust all checkpoints by 10 blocks:
$./facp_adjust -10
After adjusting the checkpoints, verify that they are valid:
$./facp_check
If you adjusted too much, facp_check will not print "Valid Checkpoint". Try adjusting less,
until you get "Valid Checkpoint" for both threads.
Once facp_check reports "Valid Checkpoint" for all threads, proceed with the
real patching, that is, updating the ACD records.
Write the ASM metadata with the new data:
$./facp_patch
Then try to mount this diskgroup manually:
SQL> alter diskgroup DG6 mount; --------->> ASM sqlplus
SQL> select name,state from v$asm_diskgroup; --------->> ASM sqlplus
Everything showed MOUNTED and we were able to bring up our Production DB.
If you experience this issue, log an SR with Oracle Support to obtain these tools if they are not already on your system.