How do I find the field with bad data when a job goes to MSGW?

I am getting decimal data errors when submitting a few batch jobs, and because of these errors the jobs go to MSGW. Whenever a job is in MSGW, we respond with 'G' (Getin) so that it keeps processing. After this the program generates a dump spool file. The dump shows that a variable does not contain valid data, but I am not able to find out which variable it is. I know that decimal data errors occur when invalid data has been keyed, but the statement number reported in the dump spool file is misleading. Is there any way to find the invalid data in the dump spool file, or any other way to find the root cause? Please help me find the exact field where the data was wrong.
Gary Patterson, VP Technology / Senior Consultant, commented:
First of all, for most modern RPG programs, responding with a "G" is almost ALWAYS the wrong thing to do.  

In an old RPG cycle program ("P"rimary file declared), this tells the program to just stop whatever it is doing, jump to the RPG cycle *GETIN routine, and read the next record from the primary file.

Since most modern programs don't use the RPG cycle and don't have a Primary file defined, what ends up happening is that the program stops whatever it is doing, jumps to the first line of your "C" specs, and starts running from there.  Perhaps this particular program continues long enough to read another record from the problem file, or does some initialization that corrupts your dump.

Most programs aren't designed to just jump to the start of the "C" specs when an error has occurred, so there is no telling what happens next.

When a program throws an error, terminate it immediately with a "D"ump (best choice, if available), or a "C"ancel (if Dump is not an option).  This stops the current program immediately and returns an escape message to the program that called it.  That program can either handle the error and continue (if it is designed to do that), terminate gracefully, or throw an unhandled exception of its own.  Repeat the "D" or "C" until you have unwound the whole call stack, or until you hit a program that was properly written to handle errors.

Now you can inspect your job logs and dumps and hopefully find some useful information.

If this is a recurring problem, there are some things you can do.  Make sure the program is compiled with a debugging view.  If you are allowed to debug in your production environment, you can use the STRSRVJOB command on the hung job (while it is still in MSGW status) and then start debugging (STRDBG) to inspect the contents of program variables.

You can also use the WRKJOB command, option 14 (again, while the program is still in MSGW), to view open files and see the relative record number of the last record read for each file.  Be aware that if the file is opened for sequential access only, the record number shown might not be the record number that the program is processing, due to record blocking.  In this case, use STRSRVJOB + STRDBG or dump the program and inspect the dump.

Of course, an even better approach is to modify the program so that it terminates gracefully and produces a dump when an error occurs, but, based on my experience, that is apparently asking too much of some programmers.

- Gary Patterson

Compile the program and get yourself a listing.  Online or on paper.

The dump will tell you the statement number in error.

Find that statement on the compile listing.

Find the fields referenced on that line of code.

Find those fields in the dump.

One or more of them have data that is not numeric.
Probably 040404&
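
Once you have the hex values of the candidate fields from the dump, checking them by eye gets tedious. The following is only a rough off-platform sketch in Python (not RPG), assuming the fields are packed decimal; the field names and hex strings are made up for illustration. It applies the packed-decimal rule: every half-byte except the last must be a digit 0-9, and the last half-byte must be a sign, A through F. A field full of EBCDIC blanks (X'40' bytes) fails that rule because its last half-byte is 0.

```python
def is_valid_packed(raw: bytes) -> bool:
    """Return True if raw is a well-formed packed-decimal value."""
    if not raw:
        return False
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)      # high half-byte
        nibbles.append(byte & 0x0F)    # low half-byte
    *digits, sign = nibbles
    # Every nibble except the last must be a digit 0-9;
    # the last nibble is the sign and must be A-F.
    return all(d <= 9 for d in digits) and sign >= 0xA


def find_bad_fields(dump_hex: dict) -> list:
    """Given field name -> hex string copied from the dump, list the invalid ones."""
    return [name for name, hex_value in dump_hex.items()
            if not is_valid_packed(bytes.fromhex(hex_value))]


# Hypothetical field names and hex values, as they might be copied from a dump.
fields = {
    "ORDQTY": "00125F",   # +125
    "ORDAMT": "404040",   # EBCDIC blanks, not packed data
    "DISCPC": "00050D",   # -50
}
print(find_bad_fields(fields))   # ['ORDAMT']
```

Zoned decimal fields follow different rules, so this check only applies to fields the compile listing shows as packed.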

The previous replies are already going in a good direction, so I'll go in a different one.

One potential way to reduce the number of decimal data errors is to have better control over the entry of the data. You might not be able to change any of that programming, but you _might_ be able to recreate the files that your programs are reading. If those files were created with DDS, consider recreating them with SQL DDL instead. It might be possible to create essentially exact new copies.

One particular advantage of SQL DDL-defined tables is that column/data validation occurs at write-time, which makes it very difficult to get invalid decimal data into the table in the first place. For a DDS-defined file, field/data validation happens at read-time, which might be too late in your procedures.

Maybe that's all you need.

Gary Patterson, VP Technology / Senior Consultant, commented:
As usual, Tom's got a great point:  "Garbage in, garbage out".  Now I want to amend my earlier comment:  "fix the data at the point of capture or at a system boundary, and then add error handling to deal with the unexpected".

- Gary Patterson
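
As an illustration of checking data at a system boundary, here is a small Python sketch (again off-platform, not RPG) that scans fixed-length records before they reach the batch job and reports the record number and field name of anything that is not valid packed decimal. The record layout, field names, and sample records are all hypothetical; a real check would take the layout from the file's field definitions.

```python
# Hypothetical record layout: (field name, offset, length in bytes).
LAYOUT = [("CUSTNO", 0, 3), ("ORDAMT", 3, 4)]


def packed_ok(raw: bytes, length: int) -> bool:
    """True if raw has the expected length and is valid packed decimal."""
    if len(raw) != length or length == 0:
        return False               # truncated record or empty field
    nibbles = [n for b in raw for n in (b >> 4, b & 0x0F)]
    *digits, sign = nibbles
    # Digits must be 0-9; the final nibble is the sign (A-F).
    return all(d <= 9 for d in digits) and sign >= 0xA


def scan(records, layout=LAYOUT):
    """Return (record number, field name, hex value) for each bad field."""
    problems = []
    for rrn, rec in enumerate(records, start=1):
        for name, offset, length in layout:
            raw = rec[offset:offset + length]
            if not packed_ok(raw, length):
                problems.append((rrn, name, raw.hex().upper()))
    return problems


# Two made-up records: the second has blanks (X'40') in ORDAMT.
records = [
    bytes.fromhex("00123F" "0001500F"),
    bytes.fromhex("00124F" "40404040"),
]
print(scan(records))   # [(2, 'ORDAMT', '40404040')]
```

Reporting the record number alongside the field name gives you the same two facts you would otherwise dig out of WRKJOB and the dump.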