Thursday, February 21, 2008

The COBOL Compiler – More hidden benefits

As we all know, every COBOL program, when compiled, produces a compile output. The compile output lists the severe errors, warnings and informational messages for your program. Severe errors fail the compilation with a Return Code RC=12. Warnings produce a Return Code RC=4 (informational messages on their own leave it at RC=0), and the job proceeds to link-edit and generates an executable load module.

So, do you think your program is ready to test just yet? If you answered yes, then you may be missing a very crucial check: reviewing the compiler output for any warnings and informational messages.

A healthy habit after creating a new COBOL program, or making changes to an existing program, is to look closely at the COBOL compiler output to ensure that all the warnings have been handled.
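
One way to make this habit routine is to scan a saved copy of the listing for compiler message IDs. The sketch below is in Python rather than COBOL and assumes the listing has already been copied into a text string. The function names are hypothetical. The pattern relies only on the message ID shape seen in the listings quoted in this post (IGY, a two-letter code, four digits, a dash and a severity letter).

```python
import re
from collections import Counter

# Message IDs in the quoted listings look like IGYOP3094-W:
# "IGY", two letters, four digits, "-", and a severity letter.
MESSAGE_ID = re.compile(r"\b(IGY[A-Z]{2}\d{4})-([IWESU])\b")

def summarize_messages(listing_text):
    """Count compiler messages in a listing, grouped by severity letter."""
    counts = Counter()
    for match in MESSAGE_ID.finditer(listing_text):
        counts[match.group(2)] += 1
    return dict(counts)

def warnings_in(listing_text):
    """Return the distinct warning-level message IDs, in order of appearance."""
    seen = []
    for match in MESSAGE_ID.finditer(listing_text):
        if match.group(2) == "W" and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

A listing that returns anything from warnings_in deserves a read before the program goes to test.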

In my previous blog, I showed how the compiler can be used to remove dead code from your programs. In this article, I will attempt to show how a closer look at the compiler warnings could have prevented abends and promoted stability in our production regions.

Case 1:
One of the production support groups supporting an application got paged for a S0C4. A standard analysis of the dump revealed that an array index held a value higher than it was supposed to. The group was initially very confident that they had checks in multiple places to prevent this from happening.

After an initial investigation of the compiler output for warnings, my eye caught the PERFORM statement shown below.

1 XXXXXX PERFORM 2005-MOVE-INPUT
IGYOP3094-W There may be a loop from the "PERFORM" statement at "PERFORM
(line 7069.01)" to itself. "PERFORM" statement optimization was
not attempted.
1 XXXXXX THRU 2500-EXIT


At first glance, this statement looks correct, but if you take a much closer look, you will notice that the programmer, intending to use 2005-EXIT, mistakenly coded 2500-EXIT. As a result, the PERFORM ran every paragraph from 2005-MOVE-INPUT through 2500-EXIT, so an index in one of the paragraphs between 2005-EXIT and 2500-EXIT was incremented again and again until it went outside its array bounds and brought the program down with a S0C4 in production. Had the compiler warning been heeded, this S0C4 could have been prevented.

Case 2:
Another production support group was paged to a bridge because users were experiencing heavy clocking in the IMS On-lines. The IMS support group noticed one IMS MPP that had been running for more than 20 minutes without ending (either successfully or with a U0240). Because it never ended, it kept holding all its database locks, preventing other transactions from accessing those databases, which in turn compounded the clocking.

A Strobe report on the MPP revealed that it was looping in COBOL code and pointed to the set of statements enveloped by the PERFORM statement below.

PERFORM VARYING WS FROM 0 BY 1
UNTIL WS > 99
……. Additional statements
END-PERFORM
IGYPG3173-W The result of the comparison of operands WS and 99 is known at
compile time, based on the attributes of the operands. Unconditional
code was generated.

Even though nothing obvious jumps out from this innocent-looking PERFORM statement, a closer look at the PIC clause of the WS variable reveals that it is a 9(02) variable. A two-digit field cannot hold a value above 99, so no matter how often you increment it, WS will NEVER be greater than 99, and the PERFORM statement goes into an endless loop.

Because this loop was executing in an IMS MPP (online transaction), the transaction did not get a U0240 timeout abend, since the loop was in COBOL code. Since IMS does not know that you are looping in COBOL code, you cannot ABDUMP the transaction, which leaves an Omegamon KILL as the only option to free it.

The solution in this case was to use a PIC 9(03) variable to allow the variable to “grow” beyond 99.
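
To see why the extra digit matters, here is a small Python model of the loop. It is only a sketch: it assumes the counter is an unsigned field that drops its high-order digit on overflow, and the function name and the iteration cap are my own additions so that the demonstration itself cannot hang.

```python
def perform_varying(digits, limit=99, max_iterations=1000):
    """Model PERFORM VARYING WS FROM 0 BY 1 UNTIL WS > limit.

    WS is modeled as an unsigned field of `digits` decimal digits whose
    high-order digit is dropped on overflow (an assumption). Returns the
    number of iterations, or None if the loop is still running after
    max_iterations.
    """
    modulus = 10 ** digits
    ws = 0
    iterations = 0
    while not ws > limit:          # UNTIL is tested before each iteration
        if iterations >= max_iterations:
            return None            # gave up: the loop never terminates
        iterations += 1            # the "additional statements" run here
        ws = (ws + 1) % modulus    # BY 1, truncated to the field size
    return iterations
```

With digits=2 the model gives up and returns None, because the counter goes from 99 back to 0. With digits=3 it returns after 100 iterations, which matches the fix described above.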

Had the developer paid close attention to the compiler output, such situations could have been prevented.

I could go on and on about the beauty of the COBOL compiler. The above two cases are just a fraction of what the COBOL compiler can do for you.

So, the next time you are paged to a bridge, take a look at the compiler listing of the programs that are involved. Your solution might lie in what the soothsayer had “warn”ed you about!

Happy compiling!

PS: If you want additional information, or would like to see specific topics to be covered here, please contact me via email at pkganapathi@yahoo.com .

Thursday, February 14, 2008

The Scoop on -804 SQLCODEs

Ever wonder why your test programs are getting an SQLCODE of -804?


The Book Manager definition for -804 says that the "call parameter list or the SQLDA is invalid." However, the SQLDA is used by programs that have dynamic SQL, so why do you still get a -804 in programs that have only static SQL?


Well, a -804 SQLCODE in a program with static SQL is analogous to a S0C7 situation. In addition to regular working storage variables being corrupted with invalid data, the SQLDA also gets corrupted. This surfaces as an SQL -804 at the next SQL statement that executes, leading you on a wild goose chase in the belief that it is a DB2 problem.


So, the next time you get a -804, look closely at the working storage variables and any arrays that you may have. An array may be overflowing and creeping its way into the SQLDA, which is added just before the linkage section by the system. You can use the compiler option SSRANGE to detect array overflows in the test system. Do not use SSRANGE for production compiles, since it adds overhead (the default compile option is NOSSRANGE).
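
The effect of an unchecked subscript can be pictured with a toy model in Python. This is not how COBOL or DB2 lay out storage. It is a sketch that assumes a table sits directly in front of some other data in one contiguous area, with a hypothetical check flag standing in for the kind of range checking that SSRANGE provides.

```python
class RangeError(Exception):
    """Raised by the checked store, standing in for a range-check failure."""

def store(storage, table_start, entry_len, entries, index, value, check):
    """Write `value` into entry `index` (1-based) of a table in `storage`.

    With check=True an out-of-range index is rejected before anything is
    written. With check=False the write lands wherever the arithmetic
    points, even if that is outside the table. `value` is expected to be
    exactly entry_len bytes long.
    """
    if check and not 1 <= index <= entries:
        raise RangeError(f"index {index} outside 1..{entries}")
    offset = table_start + (index - 1) * entry_len
    storage[offset:offset + entry_len] = value
```

For example, with storage = bytearray(b"AAAABBBBPARM"), a two-entry table of four bytes each, and index 3, the unchecked call silently overwrites the trailing PARM bytes, while the checked call raises RangeError and leaves storage untouched.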

Wednesday, February 13, 2008

The COBOL Compiler – The hidden benefits

As programmers, all of us have the responsibility of removing non-executable code (“dead code”) from our COBOL programs. Even though non-executable code does not affect processing, it still takes up DASD space and clutters the program with paragraphs that are never going to be executed. It also confuses others who are trying to debug the program or make changes to it. If you are wondering how to find out which paragraphs and lines of code are never executed in a program that runs into thousands of lines, then this article is for you.

As we all know, every COBOL program, when compiled, produces a compile output. The compile output lists the severe errors, warnings and informational messages for your program. Severe errors fail the compilation with a Return Code RC=12. Warnings produce a Return Code RC=4 (informational messages on their own leave it at RC=0), and the job proceeds to link-edit and generates an executable load module.

So, how can we identify non-executable code in a program?

The easiest way would be to look at the end of the COBOL compile listing for the IGYOP3091-W message. The message looks similar to the one shown below.

15147 IGYOP3091-W Code from "procedure name 01000-UPDATE-DATABASES" to
"EXIT (line 15299.01)" can never be executed and was
therefore discarded.


The message tells you the lines of code that are never going to be executed. As you can see, the code between the lines 15147 and 15299 (over 150 lines!) is never going to be executed, and you can safely delete these lines from your program. You can be proud of yourself for saving valuable DASD space.
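
If a listing contains many of these messages, the line ranges can be pulled out with a short script. The Python sketch below assumes the layout of the sample above: a listing line number in front of the message ID, and a "(line N.NN)" token marking the end of the range. The function name is hypothetical, and other message layouts are not handled.

```python
import re

# A leading listing line number, the message ID, and the first later
# "(line N.NN)" token, as in the sample message quoted above.
DEAD_CODE = re.compile(
    r"(\d+)\s+IGYOP3091-W\b.*?\(line\s+(\d+)\.\d+\)", re.DOTALL
)

def dead_code_ranges(listing_text):
    """Return (first_line, last_line, line_count) for each IGYOP3091-W."""
    ranges = []
    for match in DEAD_CODE.finditer(listing_text):
        first, last = int(match.group(1)), int(match.group(2))
        ranges.append((first, last, last - first + 1))
    return ranges
```

Run against the sample message, this reports the range 15147 to 15299, which is 153 lines.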

Have you ever had to solve a production abend caused by an infinite execution of a paragraph? Have you wondered whether there is an easy way to find out if certain paragraphs could cause infinite looping?

If you answered yes, the answer is again in the COBOL compiler output. During compilation, the compiler detects paragraphs that could loop repeatedly. Look for the warning message IGYOP3094-W, similar to the one below:

60707 IGYOP3094-W There may be a loop from the "PERFORM" statement at
"PERFORM (line 60707.01)" to itself. "PERFORM" statement
optimization was not attempted.

The compiler is attempting to warn you that you may be trying to execute the PERFORM statement (in line 60707) from within the same paragraph, thereby causing a potential looping situation. This requires a closer look at the lines in and around the lines mentioned in the compile listing to see if there is a looping situation.

I could go on and on about the beauty of the COBOL compiler. The above two cases are just a fraction of what the COBOL compiler can do for you.

So, the next time your COBOL compilation ends successfully with a RC=4, don’t just stop. Take a closer look at all the warnings generated by the compiler. You might save yourself a phone call in the middle of the night!

Happy compiling!

Saturday, January 26, 2008

Debugging DB2 Programs

You are sound asleep only to be awoken by your beeper in the middle of the night with a critical job being down. Before you start panicking, what steps would you follow to get it resolved, so you can go to bed sooner?

The first part is to narrow down the problem to where the error occurred. If this were an IMS batch job (a BMP) using COBOL, DB2 and MQ, the error can be in any of the following – COBOL, DB2, MQ or in IMS or data issues.

The first step is to look in SDSF or in JHS to locate the job step and proc step that abended. This will tell you the driver. Then, if there is a SYSOUT corresponding to this step, take a look to see if anything in it gives you a clue as to where the error might be. If the error is in the COBOL program, usually a S0C4 or S0C7 abend, you will see the program where the abend occurred, followed by the offset or the line number causing the abend. You can then identify what happened by looking at the compile listing and the offending line number.

Another good place to start digging for information is the CEEDUMP. It gives you a dump of all the working storage and other variables that were in play when the abend happened. It is a good place to stop and search, since it can give you most, if not all, of what you need to debug the issue.

If this were a DB2-related error, one of the first things to look for is the SQLCA in the CEEDUMP. As you may remember, the SQLCA (SQL Communication Area) contains a set of variables that are populated after each and every SQL call. The SQLCODE gives you the result of the SQL statement that was executed. A zero indicates that the SQL executed successfully, a negative value indicates an error, and a positive value indicates a warning. A +100 indicates that there are no more rows to be selected or fetched. For a detailed list of the different SQLCODEs, see the SQL codes reference on the IBM DB2 website.
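
The SQLCODE rules above fit in a few lines. This Python sketch only restates them, and the function name and category labels are my own.

```python
def classify_sqlcode(sqlcode):
    """Map an SQLCODE to the outcome categories described above."""
    if sqlcode == 0:
        return "success"
    if sqlcode == 100:
        return "no more rows"
    if sqlcode > 0:
        return "warning"
    return "error"
```

So classify_sqlcode(-911) reports an error and classify_sqlcode(100) reports the end of the result set. What to do about a particular negative code still depends on the code itself.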

Another good place to look for detailed information, especially if you have a resource unavailable condition (-904), a deadlock situation (-911), or a timeout situation (-913), is the DB2 Master Log. You can see it in SDSF, and the naming convention of the DB2 Master Log is ssidMSTR (the four-letter DB2 subsystem name followed by MSTR for Master). If you are in a data sharing system, identify the LPAR where the abending job ran, and then find the *MSTR log in SDSF corresponding to that LPAR. Once you are in the MSTR log, scroll down to the time when the job abended to find more details about the abending job.

If your job went down with a deadlock (-911 SQLCODE), you will see one or more messages similar to the ones below. In this example, the job (here, the IMS started task) IMSXMX01 ended abnormally because plan PLAN0001 was deadlocked with another plan, PLAN0002. IMSXMX01 would have received a -911 SQLCODE, causing it to end abnormally. If you restart the abended job after the job using plan PLAN0002 completes, it should run to completion without any issues, unless of course someone else has started to use the resource, in which case the cycle continues.

DSNT375I -DB2T PLAN=PLAN0001 WITH 824
    CORRELATION-ID=0064PLAN0001
    CONNECTION-ID=IMSX
    LUW-ID=CDN.DXXXX511.C1D49A76FB35=76163
    THREAD-INFO=CSQQTRMN:*:*:*
    IS DEADLOCKED WITH PLAN=PLAN0002 WITH
    CORRELATION-ID=0008PLAN0002
    CONNECTION-ID=IMSX
    LUW-ID=CDN.DXXXX511.C1D49A76F1D7=76162
    THREAD-INFO=CSQQTRMN:*:*:*
    ON MEMBER DB2T
DSNT501I -DB2T DSNILMCL RESOURCE UNAVAILABLE 825
    CORRELATION-ID=0064PLAN0002
    CONNECTION-ID=IMSX
    LUW-ID=CDN.DXXXX511.C1D49A76FB35=0
    REASON 00C90088
    TYPE 00000302
    NAME Database.SpaceNam .X'700338'
DSN3201I -DB2T ABNORMAL EOT IN PROGRESS FOR 834
    USER=CSQQTRMN CONNECTION-ID=IMSQ CORRELATION-ID=0064PLAN0002
    JOBNAME=IMSXMX01 ASID=01A9 TCB=007B9490
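
When the MSTR log is long, the deadlocked pair and the abended job can be picked out with a small script. The Python sketch below assumes the log text has been copied into a string and that the messages follow the layout of the example above. The function names are hypothetical.

```python
import re

# DSNT375I names the first plan on its opening line and the second plan
# after the words "IS DEADLOCKED WITH".
DEADLOCK = re.compile(
    r"DSNT375I\s+\S+\s+PLAN=(\S+)\s+WITH.*?"
    r"IS\s+DEADLOCKED\s+WITH\s+PLAN=(\S+)",
    re.DOTALL,
)
JOBNAME = re.compile(r"JOBNAME=(\S+)")

def deadlocked_plans(log_text):
    """Return (plan, other_plan) pairs, one per DSNT375I message."""
    return DEADLOCK.findall(log_text)

def abended_jobs(log_text):
    """Return every job name that appears as JOBNAME=... in the text."""
    return JOBNAME.findall(log_text)
```

Applied to the example above, this yields the pair PLAN0001 and PLAN0002 and the job name IMSXMX01.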

If your job went down with a timeout (usually a -913 SQLCODE), you will see one or more messages similar to the ones below. This error should disappear if you restart your abended job once the holding plan has released the resource.

DSNT376I -DB2T PLAN=DSNRRSAF WITH 896
    CORRELATION-ID=CORRELATION0
    CONNECTION-ID=RRSAF
    LUW-ID=CDN.DXXXX511.C1D49662F1CF=74270
    THREAD-INFO=AB00002:*:*:*
    IS TIMED OUT. ONE HOLDER OF THE RESOURCE IS PLAN=RRSAF WITH
    CORRELATION-ID=CORRELATION0
    CONNECTION-ID=RRSAF
    LUW-ID=CDN.DHXXX511.C1D4961EF89E=71580
    THREAD-INFO=AB00001:*:*:*
    ON MEMBER DDT2
DSNT501I -DB2T DSNILMCL RESOURCE UNAVAILABLE 897
    CORRELATION-ID=CORRELATION0
    CONNECTION-ID=RRSAF
    LUW-ID=CDN.DXXXX511.C1D49662F1CF=286685
    REASON 00C9008E
    TYPE 00000302
    NAME DSNDB06 .SYSDBASE.X'014A7E'

If your job was down with a resource unavailable (-904 SQLCODE), you would see the detailed information on what resource was not available at the time of execution. In the example below, the job shown by the correlation ID is down with a -904. The method to fix this will depend on the REASON code. In the example below, 00C90097 means the tablespace is COPY PENDING, so if you take an image copy of the Tablespace under the database (Database.SpaceNam), and restart your abended job, the error should disappear.

DSNT501I -DB2T DSNIDBET RESOURCE UNAVAILABLE 557
    CORRELATION-ID=IMSX
    CONNECTION-ID=DB2CALL
    LUW-ID=CDN.DHIPX511.C1D4BC05AA17=0
    REASON 00C90097
    TYPE 00000200
    NAME Database.SpaceNam
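
Since the fix depends on the REASON code, it helps to extract REASON, TYPE and NAME from the DSNT501I message first. The Python sketch below assumes each field sits on its own line, as in the example above. The function name is hypothetical, and the small lookup table holds only the three reason codes that appear in this post, described the way this post reads them.

```python
import re

# Only the reason codes that appear in this post. Anything else should be
# looked up in the DB2 codes documentation.
KNOWN_REASONS = {
    "00C90088": "deadlock",
    "00C9008E": "timeout",
    "00C90097": "copy pending",
}

# One field per line: the keyword, some spacing, then the value.
FIELD = re.compile(r"^\s*(REASON|TYPE|NAME)\s+(.+?)\s*$", re.MULTILINE)

def resource_unavailable(message_text):
    """Pull REASON, TYPE and NAME out of one DSNT501I message."""
    fields = {key.lower(): value for key, value in FIELD.findall(message_text)}
    fields["meaning"] = KNOWN_REASONS.get(fields.get("reason"), "look it up")
    return fields
```

For the example above, the result carries reason 00C90097, type 00000200, name Database.SpaceNam and the meaning "copy pending".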

There are multiple reasons for a -904. For each Reason Code, there are different steps to be followed. Usually, a Quick Reference (QW) on the Reason Code, or a Google search on the Reason Code will give you pointers on the detailed steps to fix the error situation.

The system log is another place to get a chronological sequence of information leading up to your abend. You can browse the system log by using the keyword “LOG” on the SDSF command line.

I hope the above set of examples has given some insight into debugging DB2 problems. In my next blog, I will cover basic IMS debugging techniques.
