Friday 30 November 2012

Nasty bug with Data Guard setup via RMAN!

There is a very nasty bug that you will hit if you set up Data Guard on 11.2 using RMAN duplication from an active database. This setup method is described in detail in My Oracle Support Doc ID 1075908.1.

The main work is done by the RMAN command:

duplicate target database for standby from active database;

This runs an Oracle-supplied procedure that, among other things, creates a controlfile copy. Unfortunately, this controlfile copy is given the same name as the snapshot controlfile that RMAN takes as part of your routine backups.
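To see which file is at stake, it helps to know where the snapshot controlfile normally lives. This minimal shell sketch assumes the usual Linux/UNIX default of $ORACLE_HOME/dbs/snapcf_<ORACLE_SID>.f; check your own setting with the RMAN command SHOW SNAPSHOT CONTROLFILE NAME. The SID ORCL in the usage line is a made-up example.

```shell
#!/bin/sh
# Sketch: print the default snapshot controlfile path on Linux/UNIX.
# Assumption: the default is $ORACLE_HOME/dbs/snapcf_<ORACLE_SID>.f;
# confirm with the RMAN command SHOW SNAPSHOT CONTROLFILE NAME.
default_snapcf() {
  # $1 = ORACLE_HOME, $2 = ORACLE_SID
  printf '%s/dbs/snapcf_%s.f\n' "$1" "$2"
}

# Hypothetical SID "ORCL"; prints .../dbhome_1/dbs/snapcf_ORCL.f
default_snapcf /u01/app/oracle/product/11.2.0/dbhome_1 ORCL
```

This is the file name the duplicate's controlfile copy ends up sharing, which is why the later delete trips over it.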

So what is the problem?

The problem, though tiny, has big implications for your databases, especially production ones. It affects you if you use RMAN to manage obsolete/redundant backup sets, in which case you will be deleting backups that fall outside the retention policy with the following RMAN command:

delete obsolete;

Bad luck, I'm afraid! This command will fail with:

RMAN-06207: WARNING: 1 objects could not be deleted for DISK channel(s) due
RMAN-06208:          to mismatched status.  Use CROSSCHECK command to fix status

This is because RMAN has that controlfile recorded as a snapshot, not a copy. Unfortunately, it is the first object the command tries to delete, so the command fails and quits. Unless you are super vigilant, your backups will fill up the flash recovery area until your database comes to a halt. VERY BAD!
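Since the danger is that this failure goes unnoticed, one option is to have the backup job check its own RMAN log. A minimal sketch, assuming your job already writes RMAN output to a log file (for example with rman's log= argument); the function name and log path are hypothetical.

```shell
#!/bin/sh
# Sketch: return non-zero if an RMAN log contains the
# "could not be deleted ... mismatched status" warnings.
# Assumption: the backup job writes RMAN output to a readable log file.
check_rman_delete_log() {
  log_file="$1"
  if [ ! -r "$log_file" ]; then
    echo "cannot read RMAN log: $log_file" >&2
    return 2
  fi
  # RMAN-06207 / RMAN-06208 are the codes reported when objects are skipped
  if grep -Eq 'RMAN-0620[78]' "$log_file"; then
    echo "WARNING: delete obsolete skipped objects in $log_file" >&2
    return 1
  fi
  return 0
}
```

Usage might look like check_rman_delete_log /var/log/rman/nightly.log || your_alert_command, where both the path and the alert command are placeholders for whatever your site uses.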

Solution

I raised a ticket with Oracle about this and there is currently no patch. So the way I work around it is as follows. Just before running the duplicate command, I change the snapshot controlfile name to a temporary one. Then, once the Data Guard setup is complete, I set the snapshot controlfile name back to its default and delete the controlfile copy:

CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/snapcf_temp.f';

duplicate target database for standby from active database;

CONFIGURE SNAPSHOT CONTROLFILE NAME CLEAR;
crosscheck controlfilecopy '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/snapcf_temp.f';
delete force noprompt expired controlfilecopy '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/snapcf_temp.f';
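If you repeat this setup on several hosts, the same steps can be written out to an RMAN command file so the temporary path follows ORACLE_HOME instead of being typed by hand. This is a sketch under assumptions: Linux/UNIX paths, the temporary name snapcf_temp.f used above, CLEAR as the way to return the setting to its default, and a DUPLICATE line that may need extra clauses for your environment. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: write the snapshot-controlfile workaround to an RMAN command file.
# Assumptions: Linux/UNIX paths; snapcf_temp.f is the temporary name;
# add any DUPLICATE clauses your environment needs before running it.
write_dg_workaround_cmdfile() {
  oracle_home="$1"   # e.g. /u01/app/oracle/product/11.2.0/dbhome_1
  out_file="$2"      # RMAN command file to create
  tmp_snapcf="$oracle_home/dbs/snapcf_temp.f"
  # Unquoted EOF so $tmp_snapcf is expanded into the generated commands
  cat > "$out_file" <<EOF
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '$tmp_snapcf';
duplicate target database for standby from active database;
CONFIGURE SNAPSHOT CONTROLFILE NAME CLEAR;
crosscheck controlfilecopy '$tmp_snapcf';
delete force noprompt expired controlfilecopy '$tmp_snapcf';
EOF
}
```

The file could then be run with something like rman target sys@prim auxiliary sys@stby cmdfile=/tmp/dg_setup.rcv log=/tmp/dg_setup.log, where the connect identifiers prim and stby and the file paths are placeholders.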

This steers clear of the original snapshot controlfile, and my RMAN deletes continue to operate as normal.