Tuesday, August 30, 2011

Reinstate after failover due to lost of current redo log

I'd say that the most catastrofic situation is the lost of current redo log file group.
If we don't use Data Guard, then all thing we can do is incomplete recover, of cause if we have made backup.
However if we use Data Guard with Real-Time Log Apply - it's OK. All we need to do is Failover to standby database.
After we have database repaired, we may wish to return it to Data Guard as standby.
In this post I'll try to show lost of current redo log, failover, and getting back failed database to Data Guard
as new standby database and then as primary. So at the end we get back our databasesc as it was no corrupted redo.

As in my previouse posts we're on:
AIX 5.3
Oracle 10.2.0.5 EE
primary host - prod2
standby host - ilstal
primary database - T
standby database - TSTB
on both database I use Flash Recovery Area, Flashback, Data Guard with Real-Time Log-Apply

1. For the first I want to get current redo log file damaged. Let's make it!
prod2 $ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.5.0 - Production on Tue Aug 30 16:14:18 2011

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select group#, status from v$log;

    GROUP# STATUS
---------- ----------------
         1 INACTIVE
         2 INACTIVE
         3 CURRENT

SQL> !cat /dev/null > /oracle/oradata/T/redo03.log

SQL> insert into scott.t values('record maden after current redo have damaged');

1 row created.

SQL> commit;

Commit complete.

SQL> select * from scott.t;

A
--------------------------------------------------------------------------------
record maden after current redo have damaged

SQL> alter system archive log current;
alter system archive log current
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel

-- at this point database is aborted
SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

Let's look in the alert...

prod2 $ tail /oracle/admin/T/bdump/alert_T.log
Errors in file /oracle/admin/T/bdump/t_lgwr_762024.trc:
ORA-00316: Message 316 not found; No message file for product=RDBMS, facility=ORA; arguments: [3] [1] [0]
ORA-00312: Message 312 not found; No message file for product=RDBMS, facility=ORA; arguments: [3] [1] [/oracle/oradata/T/redo03.log]
Tue Aug 30 16:15:08 EEST 2011
Errors in file /oracle/admin/T/bdump/t_lgwr_762024.trc:
ORA-00316: Message 316 not found; No message file for product=RDBMS, facility=ORA; arguments: [3] [1] [0]
ORA-00312: Message 312 not found; No message file for product=RDBMS, facility=ORA; arguments: [3] [1] [/oracle/oradata/T/redo03.log]
Tue Aug 30 16:15:08 EEST 2011
LGWR: terminating instance due to error 316
Instance terminated by LGWR, pid = 762024
Great!We screwed up current redo! :)

2. Now we have failover to standby database. Go to standby host (ilstal):

ilstal $ dgmgrl sys/password@TSTB
DGMGRL for IBM/AIX RISC System/6000: Version 10.2.0.5.0 - 64bit Production

Copyright (c) 2000, 2005, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL> failover to 'TSTB';
Performing failover NOW, please wait...
Failover succeeded, new primary is "TSTB"
DGMGRL>

3. What now? Now we need to know SCN to what we have to flashback our failed database.
It might be SCN just before damaged redo log.  Let's check it. Go on prod2:

prod2 $ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.5.0 - Production on Tue Aug 30 16:20:41 2011

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

Connected to an idle instance.

SQL> startup mount
ORACLE instance started.

Total System Global Area  612368384 bytes
Fixed Size                  2098312 bytes
Variable Size             171969400 bytes
Database Buffers          432013312 bytes
Redo Buffers                6287360 bytes
Database mounted.
SQL> select group#, status, FIRST_CHANGE# from v$log;

    GROUP# STATUS           FIRST_CHANGE#
---------- ---------------- -------------
         1 UNUSED                       0
         3 CURRENT                 950891
         2 INACTIVE                950812

-- SCN we need is 950891 - 2 =  950889
SQL> flashback database to scn 950889;

Flashback complete.

-- clear failed redo group

SQL> alter database clear logfile group 3;

Database altered.

4. Let's now go to standby host and get our flailed database back to Data Guard as standby database

ilstal $ dgmgrl sys/password@TSTB

DGMGRL for IBM/AIX RISC System/6000: Version 10.2.0.5.0 - 64bit Production

Copyright (c) 2000, 2005, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL> show configuration;

Configuration
  Name:                DG_T
  Enabled:             YES
  Protection Mode:     MaxPerformance
  Fast-Start Failover: DISABLED
  Databases:
    T    - Physical standby database (disabled)
    TSTB - Primary database

Current status for "DG_T":
SUCCESS

DGMGRL> show database 'T';

Database
  Name:            T
  Role:            PHYSICAL STANDBY
  Enabled:         NO
  Intended State:  ONLINE
  Instance(s):
    T

Current status for "T":
Error: ORA-16661: the standby database needs to be reinstated

-- It means we need to reinstate our failed (and now flashed back) primary database, other words we can return it to Data Guard as new standby database.

5. So let's reinstate.

DGMGRL> reinstate database 'T';
Reinstating database "T", please wait...
Reinstatement of database "T" succeeded
DGMGRL>
DGMGRL> show configuration;

Configuration
  Name:                DG_T
  Enabled:             YES
  Protection Mode:     MaxPerformance
  Fast-Start Failover: DISABLED
  Databases:
    T    - Physical standby database
    TSTB - Primary database

Current status for "DG_T":
SUCCESS
DGMGRL> enable database 'T';
Enabled.

6. Now we can make thing as it was no failed redo log

DGMGRL> switchover to 'T';
Performing switchover NOW, please wait...
Operation requires shutdown of instance "TSTB" on database "TSTB"
Shutting down instance "TSTB"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires shutdown of instance "T" on database "T"
Shutting down instance "T"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "TSTB" on database "TSTB"
Starting instance "TSTB"...
ORACLE instance started.
Database mounted.
Operation requires startup of instance "T" on database "T"
Starting instance "T"...
ORACLE instance started.
Database mounted.
Switchover succeeded, new primary is "T"
DGMGRL> show configuration;

Configuration
  Name:                DG_T
  Enabled:             YES
  Protection Mode:     MaxPerformance
  Fast-Start Failover: DISABLED
  Databases:
    T    - Primary database
    TSTB - Physical standby database

Current status for "DG_T":
SUCCESS

DGMGRL> show database 'T';

Database
  Name:            T
  Role:            PRIMARY
  Enabled:         YES
  Intended State:  ONLINE
  Instance(s):
    T

Current status for "T":
SUCCESS

DGMGRL> show database 'TSTB';

Database
  Name:            TSTB
  Role:            PHYSICAL STANDBY
  Enabled:         YES
  Intended State:  ONLINE
  Instance(s):
    TSTB

Current status for "TSTB":
SUCCESS

DGMGRL>

7. Check if we have no lost data.
prod2 $ echo $ORACLE_SID
T
prod2 $ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.5.0 - Production on Wed Aug 17 08:01:12 2011

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> select * from scott.t;

A
--------------------------------------------------------------------------------
record maden after current redo have damaged

That's all!


No comments: