Saturday, January 26, 2008

Debugging DB2 Programs

You are sound asleep only to be awoken by your beeper in the middle of the night with a critical job being down. Before you start panicking, what steps would you follow to get it resolved, so you can go to bed sooner?

The first part is to narrow down the problem to where the error occurred. If this were an IMS batch job (a BMP) using COBOL, DB2 and MQ, the error can be in any of the following – COBOL, DB2, MQ or in IMS or data issues.

The first step is to look in SDSF or in JHS and to locate the job step and proc step that abended. This will tell you the driver Then if there is a SYSOUT corresponding to this step, take a look to see if there is anything that can give you a clue as to where the error might be. If the error is the COBOL program, usually S0C4 or S0C7 abends, you would see the program where the abend occurred, followed by the offset or the line number causing the abend. You can then identify what happened by taking a look at the compile listing and the offending line number.

Another good location to start digging around for information is the CEEDUMP. This will give you a dump of all the working storage and other variables that were in play when the abend happened. This is a good place to stop and search since it can give you pretty much everything, if not all, to debug the issue.

If this were a DB2 related error, one of the first steps to look for is the SQLCA statement in the CEEDUMP. As you may remember, the SQLCA (SQL Communication Area) contains a set of variables that are populated after each and every SQL call. The SQLCODE will give you the result of the SQL statement that was executed. A zero return code indicates that the SQL was executed successfully, a negative return code indicates there was an error, and a positive return code indicates that the SQL returned a warning. A +100 indicates that there are no more records to be selected or fetched. For a detailed list of the different SQLCODEs, please click this link, which will take you to the IBM DB2 website.

Another good place to look for detailed information, especially if you have a resource unavailable condition (-904), a deadlock situation (-911), or a timeout situation (-913), is the DB2 Master Log. You can see this on SDSF subsystem, and the naming convention of the DB2 Master Log is +MSTR (four letter DB2 subsystem + MSTR for Master). If you are in a data sharing system, you can identify the LPAR where the abending job ran, and then identify the corresponding DB2 Master Log by going to SDSF to identify the *MSTR log corresponding to the LPAR. Once you are in the MSTR log, you can scroll down to the time when the job abended to find out more details about the abending job.

If your job was down with a deadlock (-911 SQLCODE), you would see a message/messages similar to the one below. In the example below, the job (or in this example the IMS started task) IMSXMX01 ended abnormally because the plan QREMLPCU was deadlocked with another plan QREMLPC1 causing the job/started task IMSXMX01 to go down. The job IMSXMX01 would have received a -911 SQLCODE causing it to abnormally end. If you restart the abended job when the job using plan QREMLPC1 is complete, it should go to completion without any issues, unless of course, someone else started to use it, and the cycle continues.

DSNT375I -DB2T PLAN=PLAN0001 WITH 824

CORRELATION-ID=0064PLAN0001

CONNECTION-ID=IMSX

LUW-ID=CDN.DXXXX511.C1D49A76FB35=76163

THREAD-INFO=CSQQTRMN:*:*:*

IS DEADLOCKED WITH PLAN=PLAN0002 WITH

CORRELATION-ID=0008PLAN0002

CONNECTION-ID=IMSX

LUW-ID=CDN.DXXXX511.C1D49A76F1D7=76162

THREAD-INFO=CSQQTRMN:*:*:*

ON MEMBER DB2T

DSNT501I -DB2T DSNILMCL RESOURCE UNAVAILABLE 825

CORRELATION-ID=0064PLAN0002

CONNECTION-ID=IMSX

LUW-ID=CDN.DXXXX511.C1D49A76FB35=0

REASON 00C90088

TYPE 00000302

NAME Database.SpaceNam .X'700338'

DSN3201I -DB2T ABNORMAL EOT IN PROGRESS FOR 834

USER=CSQQTRMN CONNECTION-ID=IMSQ CORRELATION-ID=0064PLAN0002

JOBNAME=IMSXMX01 ASID=01A9 TCB=007B9490

If your job was down with a timeout (usually a -913 SQLCODE), you would see a message/messages similar to the one below. This error should disappear once you restart your abended job when the plan is released.

DSNT376I -DB2T PLAN=DSNRRSAF WITH 896

CORRELATION-ID=CORRELATION0

CONNECTION-ID=RRSAF

LUW-ID=CDN.DXXXX511.C1D49662F1CF=74270

THREAD-INFO=AB00002:*:*:*

IS TIMED OUT. ONE HOLDER OF THE RESOURCE IS PLAN=RRSAF WITH

CORRELATION-ID=CORRELATION0

CONNECTION-ID=RRSAF

LUW-ID=CDN.DHXXX511.C1D4961EF89E=71580

THREAD-INFO=AB00001:*:*:*

ON MEMBER DDT2

DSNT501I -DB2T DSNILMCL RESOURCE UNAVAILABLE 897

CORRELATION-ID=CORRELATION0

CONNECTION-ID=RRSAF

LUW-ID=CDN.DXXXX511.C1D49662F1CF=286685

REASON 00C9008E

TYPE 00000302

NAME DSNDB06 .SYSDBASE.X'014A7E'

If your job was down with a resource unavailable (-904 SQLCODE), you would see the detailed information on what resource was not available at the time of execution. In the example below, the job shown by the correlation ID is down with a -904. The method to fix this will depend on the REASON code. In the example below, 00C90097 means the tablespace is COPY PENDING, so if you take an image copy of the Tablespace under the database (Database.SpaceNam), and restart your abended job, the error should disappear.

DSNT501I -DB2T DSNIDBET RESOURCE UNAVAILABLE 557

CORRELATION-ID=IMSX

CONNECTION-ID=DB2CALL

LUW-ID=CDN.DHIPX511.C1D4BC05AA17=0

REASON 00C90097

TYPE 00000200

NAME Database.SpaceNam

There are multiple reasons for a -904. For each Reason Code, there are different steps to be followed. Usually, a Quick Reference (QW) on the Reason Code, or a Google search on the Reason Code will give you pointers on the detailed steps to fix the error situation.

The system log is another place to get a chronological sequence of information leading up to your abend. You can browse the system log by using the keyword “LOG” on the SDSF command line.

I hope the above set of examples have given some insight into debugging DB2 problems. In my next blog, I will cover basic IMS debugging techniques.

If you want additional information, or would like to see specific topics to be covered, please contact me via email at pkganapathi@yahoo.com.

1 comment:

Elav Ariv said...

Highly informative and well explained! I am happy that i stumbled upon this article. I got to share this with my teammates.