Link to home
Start Free TrialLog in
Avatar of pendell2
pendell2Flag for United States of America

asked on

DB2 9.0 recovery failure SQL1042C

So I had a customer complain that their DB2 Express-C database (version 9.0) wouldn't talk to them any more.  When I attempted a direct connection to the database, I found that it was suffering from 'unexpected system error'.  

So I attempted to restart the database.

db2 => restart database [dbname]
SQL1042C  An unexpected system error occurred.  SQLSTATE=58004

Puzzled, I attempted to restore an old backup of the database -- just throw away the log files and start back at a point when the database was good.

db2 => restore database [dbname] from d:\db2_backups taken at 20101001020000 with
out rolling forward
SQL2537N  Roll-forward is required following the Restore.

db2 => rollforward database [dbname] complete
SQL1015N  The database is in an inconsistent state.  SQLSTATE=55025

My next step was to check the db2diag.log file.   It told me that DB2 had attempted to recover from a crash and had failed to do so.  

I will include some relevant snippets below.

I am frankly out of ideas. Is there any way to make this system functional again?  Or should I start building a new database from scratch?

Respectfully,

Brian P.

Db2diag.log snippets:

2010-11-27-00.49.58.712000-480 I34928586H457      LEVEL: Warning
PID     : 2080                 TID  : 2880        PROC : db2syscs.exe
INSTANCE: DB2                  NODE : 000         DB   : [database]
APPHDL  : 0-7                  APPID: *LOCAL.DB2.101127084938
AUTHID  : [user]
FUNCTION: DB2 UDB, recovery manager, sqlprecm, probe:4000
MESSAGE : DIA2051W Forward phase of crash recovery has completed.  Next LSN is
          "000002A01DF9FFD1".

2010-11-27-00.49.58.760000-480 I34929045H399      LEVEL: Severe
PID     : 2080                 TID  : 2880        PROC : db2syscs.exe
INSTANCE: DB2                  NODE : 000         DB   : [database]
APPHDL  : 0-7                  APPID: *LOCAL.DB2.101127084938
AUTHID  : [user]
FUNCTION: DB2 UDB, recovery manager, sqlprecm, probe:4110
MESSAGE : Recovery did not reach the end of the logs!

2010-11-27-00.49.58.760000-480 I34929446H161      LEVEL: Severe
PID:2080 TID:2880 NODE:000 Title: SQLP_DBCB
Dump File:C:\PROGRA~1\IBM\SQLLIB\DB2\20802880.000

2010-11-27-00.49.58.760000-480 I34929609H515      LEVEL: Error
PID     : 2080                 TID  : 2880        PROC : db2syscs.exe
INSTANCE: DB2                  NODE : 000         DB   : MINIBAR
APPHDL  : 0-7                  APPID: *LOCAL.DB2.101127084938
AUTHID  : [user]
FUNCTION: DB2 UDB, recovery manager, sqlpresr, probe:640
DATA #1 : String, 142 bytes
Crash recovery failed with rc -2045771763
LowtranLSN 000002A01DF9F1FE MinbuffLSN 000002A0190F2F38
Recovery started on log file S0172056.LOG.

2010-11-27-00.49.58.823000-480 I34930126H433      LEVEL: Severe
PID     : 2080                 TID  : 2880        PROC : db2syscs.exe
INSTANCE: DB2                  NODE : 000         DB   : MINIBAR
APPHDL  : 0-7                  APPID: *LOCAL.DB2.101127084938
AUTHID  : [user]
FUNCTION: DB2 UDB, base sys utilities, recover, probe:16
DATA #1 : Hexdump, 4 bytes
0x04DE9704 : 0D00 1086                                  ....

2010-11-27-00.49.58.871000-480 I34930561H432      LEVEL: Error
PID     : 2080                 TID  : 2880        PROC : db2syscs.exe
INSTANCE: DB2                  NODE : 000         DB   : [database]
APPHDL  : 0-7                  APPID: *LOCAL.DB2.101127084938
AUTHID  : [database]
FUNCTION: DB2 UDB, base sys utilities, recover, probe:16
DATA #2 : Hexdump, 4 bytes
0x04DE9704 : 0D00 1086            



2010-11-27-00.49.58.871000-480 E34931408H392      LEVEL: Severe
PID     : 2080                 TID  : 2880        PROC : db2syscs.exe
INSTANCE: DB2                  NODE : 000         DB   : [database]
APPHDL  : 0-7                  APPID: *LOCAL.DB2.101127084938
AUTHID  : [user]
FUNCTION: DB2 UDB, base sys utilities, sqleMarkDBad, probe:10
MESSAGE : ADM7518C  "[database] " marked bad.

2010-11-27-00.49.58.871000-480 I34931802H405      LEVEL: Severe
PID     : 2080                 TID  : 2880        PROC : db2syscs.exe
INSTANCE: DB2                  NODE : 000         DB   : [database]
APPHDL  : 0-7                  APPID: *LOCAL.DB2.101127084938
AUTHID  : [user]
FUNCTION: DB2 UDB, base sys utilities, sqleMarkDBad, probe:210
MESSAGE : Database logging stopped due to mark db bad.
SOLUTION
Avatar of Kent Olsen
Kent Olsen
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of pendell2

ASKER

Kdo:  I'm not sure I understand your question. I believe it was a standard offline backup, much like the one done through DB2 control center.   I followed the standard protocol to restore a backup, but was unsuccessful.  I tried multiple backups going back a period of three months. Not one of them recovered cleanly.

I did indeed review the logs, but didn't see any obvious indication of a hardware error. Do you have a suggestion for a keyword I should be looking for?

Respectfully,

Brian P.
Okay, I found the script which was executed by Hibernate to perform the backup.

'backup db " + dbName + " to " + backupPath + " compress util_impact_priority 1 include logs'


It SHOULD have been on offline backup, but I am not able to confirm.

Respectfully,

Brian P.
Hi Brian,

What else runs on that machine?  Using the "util_impact_priority" clause suggests that other jobs may be running.


Kent
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
@KDO: Nothing else runs on the machine at all. It's a standard Windows 2003 Server machine with DB2, our software that runs on top of DB2, and that's it.  

See the previous comment:  Modifying the restore command slightly allowed me to restore the database, although I still don't understand why "include logs" in the backup script worked!

Respectfully,

Brian P.
Hi Brian,

This seems like odd behavior.  It has been quite a while since I was forced to restore a DB2 database.  Testing the backup/restore process is somewhat clinical and all of the possible scenarios can never be scripted.

Since you're using DB2 Express-C, I doubt that you have a maintenance contract with IBM.  Let me dig around and see if I can find a better description.  I'd like to know too, as what you've described seems to be unintended behavior.


Kent
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
@Mustaccio:

"The error "SQL2537N  Roll-forward is required following the Restore." indicates that you were indeed restoring an online backup. You must roll forward the database restored from an online backup at least to a point in time following the end of backup, so ROLLFORWARD .. COMPLETE understandably failed."

I'm willing to believe that it is an online backup. HOWEVER,

1) I am certain -- 100% positive -- that the script I posted is indeed the script being used to cause the backup.  

2)  According to the DB2 manual

http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/core/r0001933.htm

"ONLINE
    Specifies online backup. The default is offline backup. Online backups are only available for databases configured with logretain or userexit enabled. During an online backup, DB2 obtains IN (Intent None) locks on all tables existing in SMS table spaces as they are processed and S (Share) locks on LOB data in SMS table spaces. " 

So -- if I'm reading this right, since the ONLINE keyword is absent, this should be an offline backup, and since it's an offline backup, we should get an error message when 'include logs' is being sent.

But that isn't happening.  Instead, we are getting an online backup.  

Is the DB2 manual out of date? Does 'include logs' now implicitly specify an online backup in DB2 version 9.0, which is the version I'm using?

Respectfully,

Brian P.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
@Mustaccio:

I've just done what you've suggested.   I think you'll find this very interesting, because the same command, done two different ways, results in either an online or an offline backup!

Here is what happened:

I logged on to db2 through the command line db2cmd and executed this command.  

'backup db [dbname] to d:\db2_backups compress util_impact_priority 1 include logs'

This failed with the error message "-ioptions not valid".

I trimmed off "include logs" and re-ran the command. The backup completed successfully, and LIST HISTORY verified that it was an offline backup. Just as we expected.

So the next thing I did was to start up the application software and schedule an immediate backup. I enabled debugging output to capture the exact command it would send to DB2.  That command was:

''backup db [db name] to d:\db2_backups compress util_impact_priority 1 include logs''

After the command completed,  I ran LIST HISTORY again. The last backup took place at 11 AM and was ONLINE.  

The difference between the two is that the first command was executed via IBM's command-line utility DB2 while the second was executed programmatically through Hibernate (http://www.hibernate.org/)

Curious, isn't it? Is that a problem with Hibernate or with DB2? Should someone be informed of that? IBM? Or the makers of Hibernate?

Respectfully,

Brian P.
I think it's the problem with Hibernate. BACKUP is not something you can execute as a regular SQL statement, because it is not SQL; it is a DB2 command line processor (CLP) command, which is only interpreted by CLP. I believe the Hibernate debugging output does not reflect the actual command that is being issued.
Are you suggesting that the script sent from hibernate to DB2 is not the same as the script sent from the source program to hibernate?  That hibernate has added additional arguments to the script to make it an online backup before actually sending it on to DB2?

Respectfully,

Brian P.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
@mustaccio: I just did so, and you are correct. Executing the script as you describe it results in an online backup.

That is our answer, then. The admin command can only be performed as an online backup, and therefore 'include logs' does not result in an error. This is not unexpected behavior either by DB2 or by Hibernate;  the software is functioning as designed.

Respectfully,

Brian P.
Question successfully closed.