bhagatali (United States of America)

asked:

Will SAVEPOINT impact my application performance?

Hi,

We are developing a program which reads from a table, processes the data in a COBOL/RPG program, and stores the records in an array; once a certain record count is reached (let's say 2,500 records), we INSERT those records as a block into the target table. We COMMIT after every block INSERT.

For error handling, if one record fails during the COBOL/RPG processing of a block of 2,500 (say, for instance, that the 2,200th record fails), we want to COMMIT all prior records and not COMMIT the record that failed. To do this efficiently, I was hoping to set a SAVEPOINT after every successfully processed record. When a record in a block fails, I would COMMIT up to the last SAVEPOINT.
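Roughly, the pattern I have in mind looks like this (only a sketch: the savepoint name is a placeholder and the real logic sits inside the COBOL/RPG processing loop):

-- Sketch of the per-record SAVEPOINT idea (names are placeholders).
-- Inside the processing loop, running under commitment control:

-- after each record is processed and inserted successfully:
SAVEPOINT last_good ON ROLLBACK RETAIN CURSORS;

-- if the next record fails during processing:
ROLLBACK TO SAVEPOINT last_good;   -- back out only the failed record's work
COMMIT;                            -- keep everything processed before it

-- if the whole block (e.g. 2,500 records) completes cleanly:
COMMIT;                            -- this also releases the savepoint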

My question is: will it impact my program's performance if I add a SAVEPOINT after every successfully processed record? We anticipate this program will process around 200 million records.

An alternative approach would be to not use SAVEPOINT at all. I would COMMIT after every block INSERT and also when there is an error in processing an enrollee; in that case, I would have to go back into the target table and remove the changes for the last, partially processed record.
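The cleanup in that case would be something along these lines (again only a sketch; the table and column names are placeholders):

-- Hypothetical cleanup when a block fails partway through a record
-- (rpt_target, enrollee_id, and :failed_enrollee_id are placeholders).
-- The earlier block INSERTs are already committed, so only the rows for
-- the partially processed enrollee need to be backed out before that
-- enrollee is reprocessed.
DELETE FROM rpt_target
 WHERE enrollee_id = :failed_enrollee_id;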

Any pointers?

Regards
Ali.
DB2 | IBM System i | COBOL

momi_sabag (United States of America):

The best approach will probably be to load those records using the LOAD utility.
You might also want to look into the multi-row INSERT feature of DB2. You can declare the operation non-atomic, which means that if some rows fail, all the other rows are still inserted; that sounds like what you are looking for.
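For reference, the embedded multi-row INSERT looks roughly like this in its DB2 for z/OS form (the table, columns, and host-variable arrays below are placeholders, and the exact syntax and clause support differ on DB2 for i, so treat this only as a sketch):

-- Multi-row INSERT sketch (embedded SQL; names are placeholders).
-- :hva_id, :hva_cd, :hva_amt are host-variable arrays filled by the
-- COBOL/RPG program; :num_rows says how many entries are populated.
INSERT INTO target_table (enrollee_id, plan_cd, amount)
  VALUES (:hva_id, :hva_cd, :hva_amt)
  FOR :num_rows ROWS
  NOT ATOMIC CONTINUE ON SQLEXCEPTION;
-- With NOT ATOMIC CONTINUE ON SQLEXCEPTION, rows that fail are skipped,
-- the rest of the block is still inserted, and GET DIAGNOSTICS can be
-- used afterward to report which rows failed.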
Member_2_2484401:

I think your final approach will work best. Doing a SAVEPOINT after every successful INSERT would most likely take far too long.

HTH,
DaveSlash
ASKER CERTIFIED SOLUTION
Gary Patterson, CISSP (United States of America)

bhagatali (ASKER):

Hi Gary,

Thanks for the response. You have raised some very interesting points, things I would never have thought of. Turning commitment control off is a fantastic option.

The program we are developing is a reporting program used to generate financial auditing reports. The report is data-intensive, as it calculates details for every enrollee in our system starting from the year 2006.

Our design approach is to have key columns from specific application tables populated into an MQT. This MQT becomes the base for our report creation. The COBOL/RPG program that we are developing reads data from this MQT, performs the financial calculations, and then INSERTs into a target table. As you rightly said, it is more of a "bulk insert". We are only using the bulk-insert model because of the performance gains we have seen with it. The records being inserted do not have an "all-or-none" requirement. We are still working out the optimal block size (I used 2,500 merely as an example).
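In rough terms, the MQT is defined along these lines (a simplified sketch only; the library, table, and column names are placeholders rather than our real schema, and the clause order may need adjusting for DB2/400):

-- Placeholder sketch of a user-maintained MQT over the production data.
CREATE TABLE rptlib.rpt_enrollee_base AS
  (SELECT e.enrollee_id, e.plan_cd, e.eff_date, e.prem_amt
     FROM prodlib.enrollment e
    WHERE e.eff_date >= '2006-01-01')
  DATA INITIALLY DEFERRED
  REFRESH DEFERRED
  ENABLE QUERY OPTIMIZATION
  MAINTAINED BY USER;

-- Populated by the batch job before the report program runs:
INSERT INTO rptlib.rpt_enrollee_base
  SELECT e.enrollee_id, e.plan_cd, e.eff_date, e.prem_amt
    FROM prodlib.enrollment e
   WHERE e.eff_date >= '2006-01-01';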

These programs run in batch mode on the AS/400 platform, using DB2/400 as the database. We read the data from the existing production schema, store it in the MQT, and continue processing the reports from that MQT.

Please feel free to share any more expertise that you may have on this subject.

Regards
Ali.
Member_2_276102:

I tend to agree with Gary. A 200-million-row insert process sounds more like database population than simply processing batches of "transactions".

It seems as if simply getting the load completed and reserving time for potential cleanup afterward will work as well as things possibly can.

Do you expect to do this on a regular basis with this many rows?

What is the OS version on your AS/400? That may enable different options.

Tom
Gary Patterson, CISSP:

My specialty is performance: system, application, and network. I work primarily with clients that are experiencing performance problems in applications and systems containing AS/400s.

One thing I've learned is that there is no single "best strategy" for developing applications like this. Optimizing usually forces trade-offs between coding ease, maintainability, and performance. For big, batch-oriented, data-intensive applications like this, though, I can give you some general rules if performance is a major concern:

In developing this type of application, the goal is always to minimize IO.  Minimize trips to disk, and minimize sending data over a network.  Block reads and writes, when possible.

Process files only once, if possible, and don't be afraid to resort to "old school" techniques: physically sorting input files into the correct sequence (I still occasionally use the FMTDTA command!) to allow blocked sequential read processing in your application can produce amazing performance results, especially when your application needs to process more than about 80% of the records in a given table.  Use blocked writes to optimize write performance.  I have to admit that I occasionally strip SQL out of applications and revert to record level access when I need to optimize application performance.  (Please don't tell my friends!)

Reduce the IO burden on the system as a whole and minimize synchronous IO during very large-scale batch processing by avoiding commitment control, disabling journaling for the batch-inserted files, using blocked writes, temporarily removing indexes over output files, temporarily disabling trigger programs (and allowing your batch process to efficiently perform the trigger logic if needed), etc.
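As a rough illustration of the index piece of that list (object names below are placeholders; on the AS/400 the journaling piece is handled with CL commands such as ENDJRNPF and STRJRNPF rather than with SQL):

-- Illustrative only; library, index, and table names are placeholders.
-- Before the batch load: remove secondary indexes over the output file.
DROP INDEX rptlib.ix_target_enrollee;

-- Run the batch insert program with blocked writes, without commitment
-- control, and with journaling ended on the target file.

-- After the load (and any cleanup) completes, rebuild the index once.
CREATE INDEX rptlib.ix_target_enrollee
  ON rptlib.rpt_target (enrollee_id, plan_cd);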

An MQT can also serve many of these functions, if properly implemented. Here's a nice write-up on MQTs on the AS/400:

http://public.dhe.ibm.com/partnerworld/pub/pdf/courses/438a.pdf

- Gary Patterson


