• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1855

Write Batch Process using Java

Hi, I am currently working on an application where the requirement is to write a Java batch program that extracts about 37,000 records every week from DB2 on OS/390, validates each record against an Oracle table, and then inserts the record into another table in Oracle. The Oracle database is on Unix. The job has to run weekly on Sunday night; Control-M is used for triggering it. I am trying to work out a solution for the extract and load part.
Would it not be a burden on DB2 to hold a connection open for 37,000 records and do validation against each one? I am concerned about any database locks that might be caused.

It would be really great if any of you could assist me with this requirement. This is kind of an urgent situation.
Thanks in advance.
Asked by: gatamaneni
2 Solutions
 
Tommy BraasCommented:
There won't be any locks taken by the select query that retrieves the data from DB2 unless you specifically create one. What row size do you have?

Reading everything record by record over the same connection could result in connection timeouts. Have you done a dry run to estimate the time, or do you have any other time estimates for that matter?

What kind of validation are you doing on the records?

Additional concerns for the solution should be as follows:
1. Data growth. Current data size is 37,000 records. Will that remain true?
2. Character encoding issues (EBCDIC vs. UTF-8 vs. ISO vs. CP).

 
Mayank S (Associate Director - Product Engineering) Commented:
This is more of a DB question than a Java question :) post a link to it on the Database forums and you might get some more replies.
 
Mayank S (Associate Director - Product Engineering) Commented:
BTW, you can just write a Java program to do what you want and, using the OS scheduler, have it run at a specific time or interval.
 
javaoptimizerCommented:
Hi gatamaneni,

It will be a burden on DB2, but the isolation level matters; if you set it to dirty read (uncommitted read), the performance will be higher.

Performance also depends on the validation criteria, so you can build your validation logic in a PL/SQL block so that it runs at the database level. But make sure that you are not validating any extra fields; if possible, base the validation on the primary key itself.

One more idea: instead of running one batch process over all 37,000 records, split them into two or three parts based on some criterion such as date, and then run the batch process on each split set of records (you can have multiple batch processes executed one after another with a predefined delay).

Cheers
 
mrigankCommented:
Since the number of rows is very high, Java might not be the ideal choice.

Here is a way to go about it:
a) Extract the data from DB2 to a CSV/Excel file; there are tools to do this.
b) Load that file into an Oracle TEMP table using SQL*Loader.
c) Do the validation in PL/SQL (stored procedures).
d) Insert the validated records into the new table.

All of this can be combined in a batch file to automate the entire process.

I have done the same, and this way is much easier and faster than going through Java code. As the number of records grows, using Java will lead to problems.
 
mrigankCommented:
And if you need to use Java, use a CachedRowSet rather than a ResultSet.

Pull the entire data set with a single query and load it into a CachedRowSet. The CachedRowSet lives on even after the connection is closed, so you will not hog the connection; see the sketch below.
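
A minimal sketch of that idea, assuming Sun's reference implementation com.sun.rowset.CachedRowSetImpl (the rowset.jar download, bundled with JDK 5); the URL, user, and SQL are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.sql.rowset.CachedRowSet;
import com.sun.rowset.CachedRowSetImpl;

public class DisconnectedFetch {
    // Runs one query against DB2, copies the rows into a CachedRowSet,
    // and releases the connection immediately.
    public static CachedRowSet fetch(String url, String user, String pwd,
                                     String sql) throws Exception {
        Connection con = DriverManager.getConnection(url, user, pwd);
        try {
            Statement stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery(sql);
            CachedRowSet crs = new CachedRowSetImpl();
            crs.populate(rs);   // copies every row into memory
            rs.close();
            stmt.close();
            return crs;         // still usable after con.close()
        } finally {
            con.close();        // DB2 connection is held only for the fetch
        }
    }
}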
 
mrigankCommented:
Do the validations while iterating through the CachedRowSet (it works like a ResultSet) and insert the data into the new table either by
a) using stored procedures, or
b) using batch inserts with PreparedStatements to enhance performance (see the sketch below).
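
A minimal sketch of option b), batched inserts through a single PreparedStatement; the target table and column names (TARGET_TABLE, PO_I, DEPT_I) are illustrative, not the real schema:

import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.sql.rowset.CachedRowSet;

public class BatchInsert {
    // Inserts every row of an already populated CachedRowSet into Oracle,
    // sending the statements in batches of 50.
    static void insertAll(Connection oracleCon, CachedRowSet crs) throws Exception {
        oracleCon.setAutoCommit(false);          // commit once at the end
        PreparedStatement ps = oracleCon.prepareStatement(
                "INSERT INTO TARGET_TABLE (PO_I, DEPT_I) VALUES (?, ?)");
        int batchSize = 50;
        int count = 0;
        crs.beforeFirst();
        while (crs.next()) {
            ps.setString(1, crs.getString("PO_I"));
            ps.setString(2, crs.getString("DEPT_I"));
            ps.addBatch();
            if (++count % batchSize == 0) {
                ps.executeBatch();               // one round trip per batch
            }
        }
        ps.executeBatch();                       // flush the remainder
        ps.close();
        oracleCon.commit();
    }
}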
 
gatamaneniAuthor Commented:
Hi mrigank, thanks for the input, but I need to achieve this using Java only. To be clearer, let me explain how it needs to function:

1. The base query is executed (select 24 columns from 9 tables in DB2).
2. Around 37,000 records are retrieved.
3. The 24 columns are assigned to local variables.
4. Using this information, I need to run some more queries to get the remaining column information from the Oracle tables.
5. After getting the other columns, the record is inserted into the Oracle table.
6. The next record is taken and the process repeats from step 3 to 5.
7. After all the records are inserted, I commit the table.

When I do this procedure in Java with all the above steps, it takes almost one hour to complete. My main worry is the impact if it takes this long in the production environment.
Can you please give me a solution that I can implement without much change to the technology? I don't have much time to implement it either. I would really appreciate it if any of you experts could dig into this.
 
mrigankCommented:
The first thing you should do is run a profiler over the code and find the exact place that takes the maximum amount of time.

I think the maximum time is consumed when you fetch the remaining column information from Oracle for each row. That means 37,000 queries, one per row, and you need to optimise that part. On top of that, you perform about 37,000 inserts, so that is 37,000 * 2 queries in total. Correct me if I am wrong, but that is a huge number of queries.

You need to minimise these.
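
If a profiler is not at hand, even crude wall-clock timing around each phase will show where the hour goes. A sketch, where the three stub methods stand in for your real extract, lookup, and insert code:

public class PhaseTimer {
    public static void main(String[] args) {
        long t0 = System.currentTimeMillis();
        extract();             // placeholder: the DB2 select
        long t1 = System.currentTimeMillis();
        lookupAndValidate();   // placeholder: the per-row Oracle queries
        long t2 = System.currentTimeMillis();
        insert();              // placeholder: the Oracle inserts
        long t3 = System.currentTimeMillis();
        System.out.println("extract=" + (t1 - t0) + "ms  lookups=" + (t2 - t1)
                + "ms  inserts=" + (t3 - t2) + "ms");
    }

    // Stubs standing in for the real phases.
    static void extract() {}
    static void lookupAndValidate() {}
    static void insert() {}
}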
 
mrigankCommented:
The first step could be to get the remaining column data from Oracle in one query only: pull the entire data set into Java with a single query rather than querying for each row.

The other way is to embed the select in the insert query itself, for example:

INSERT INTO table_name (a, b, c)
SELECT 1, 2, some_col FROM some_table WHERE some_clause

That will reduce the number of queries to 37,000.
 
mrigankCommented:
Now, to further reduce the round trips, you can use batch updates:
http://java.sun.com/docs/books/tutorial/jdbc/jdbc2dot0/batchupdates.html

Depending on the batch size, the number of queries will come down to (37,000 / batch_size).
 
mrigankCommented:
I do not know if you can write stored procedures in your Oracle database; if yes, that would cut the time by a huge margin.
 
mrigankCommented:
And instead of creating 37,000 in-memory Java objects after the select query to DB2, use a CachedRowSet to hold the values.

http://java.sun.com/j2se/1.5.0/docs/api/javax/sql/rowset/CachedRowSet.html
 
mrigankCommented:
So finally, here is what we have:

a) Execute the select query.
b) Get the 37,000 rows and put them in a CachedRowSet (creating individual objects for each row could lead to memory issues).
c) Iterate through the CachedRowSet.
d) Create the insert query with the select query that fetches the remaining data embedded into it.
e) Put the query in a batch.
f) Execute the batch when it is full.
g) Go to c.

So starting from 74,000 queries, we have come down to 37,000 / 10 = 3,700 queries if the batch size is 10. Let me know if it helps the performance. You might need a bit of trial and error to find the best batch size; a very big batch size is not helpful either. A sketch of the whole pipeline follows.
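
A minimal sketch of steps a) to g), assuming a hypothetical TARGET_TABLE whose remaining columns come from a hypothetical ORACLE_LOOKUP table; every table and column name here is illustrative:

import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.sql.rowset.CachedRowSet;

public class WeeklyLoad {
    static void run(Connection oracleCon, CachedRowSet crs) throws Exception {
        oracleCon.setAutoCommit(false);
        // (d) the insert embeds the select that pulls the remaining columns
        PreparedStatement ps = oracleCon.prepareStatement(
                "INSERT INTO TARGET_TABLE (PO_I, DEPT_I, EXTRA_COL) " +
                "SELECT ?, ?, O.EXTRA_COL FROM ORACLE_LOOKUP O WHERE O.PO_I = ?");
        final int batchSize = 10;       // tune by trial and error, as noted above
        int pending = 0;
        crs.beforeFirst();
        while (crs.next()) {            // (c) iterate the cached DB2 rows
            ps.setString(1, crs.getString("PO_I"));
            ps.setString(2, crs.getString("DEPT_I"));
            ps.setString(3, crs.getString("PO_I"));
            ps.addBatch();              // (e) queue the statement
            if (++pending == batchSize) {
                ps.executeBatch();      // (f) one round trip per batch
                pending = 0;
            }
        }
        if (pending > 0) {
            ps.executeBatch();          // flush the last partial batch
        }
        ps.close();
        oracleCon.commit();             // single commit at the end
    }
}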
 
mrigankCommented:
Also, if you can ask your DBA to get the DB2 and Oracle tables properly indexed, that would further enhance performance.

The columns used in the WHERE clauses are the ones to index first.
 
gatamaneniAuthor Commented:
Wow mrigank, thanks a lot for your prompt reply. I am actually getting some confidence after hearing your approach, and I follow the points you mentioned.

Up to c) it is fine, iterating through the CachedRowSet. The real problem is in d). I cannot do what you mentioned, because it is not just one select query that I can embed in the insert statement; it is more than that.

There are actually 4 queries to be executed, and apart from those there are some basic validations, like validating the format of a retrieved date, or validating that the length of a particular field is exactly 3 characters, etc. Based on the validations, a field is set with an error code. Only after this whole process is done are all the columns finally inserted.

So it was my mistake to make it sound like a single select statement executed for the validation. I actually have a pseudocode document, but it would be too much to paste into this thread.

Can we still follow the process you described above? And how do we optimise the big validation block? In the last phase, this process was done in COBOL (to extract the data with the big query from DB2 on the OS/390 host) and Perl (to validate and load into Oracle on Unix), since it is a batch job. Now they want Java to take that place.
Am I clear at least this time? I think I am poor at expressing the problem; sorry for that, mrigank.
 
mrigankCommented:
Validation can be done while iterating through the results and preparing the insert statement; that will not be a performance bottleneck.

The issue is the 4 queries that need to be performed, because that means 4 queries to Oracle for each row, i.e. 37,000 * 4 extra queries.

a) Can't you combine these 4 queries into 1? That alone would cut 37,000 * 3 queries, and it shouldn't be that difficult to do.

Now, do these 4 Oracle queries take any data related to the DB2 row you have fetched, an id or something? I think the answer has to be yes, or there would be no need to run them for each row.

b) After combining the 4 select queries into 1, run the query once per batch of rows, e.g. SELECT * FROM whatever WHERE key IN (the DB2 keys for 10 rows). Then, as you build the insert batch, add the data from the result set you get here (see the chunked-lookup sketch below).

This approach would need about 7,400 queries, whereas currently you have about
1 + 37,000 * 4 + 37,000 = 185,000 queries.
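
A minimal sketch of the chunked lookup in b), building an IN list for one chunk of DB2 keys; the table and column names (ORACLE_LOOKUP, PO_I, EXTRA_COL) are illustrative:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ChunkedLookup {
    // Fetches the extra Oracle columns for a whole chunk of keys in a
    // single IN-list query instead of one query per row.
    static Map<String, String> lookupChunk(Connection oracleCon,
                                           List<String> keys) throws Exception {
        StringBuffer sql = new StringBuffer(
                "SELECT PO_I, EXTRA_COL FROM ORACLE_LOOKUP WHERE PO_I IN (");
        for (int i = 0; i < keys.size(); i++) {
            sql.append(i == 0 ? "?" : ",?");
        }
        sql.append(")");
        PreparedStatement ps = oracleCon.prepareStatement(sql.toString());
        for (int i = 0; i < keys.size(); i++) {
            ps.setString(i + 1, keys.get(i));
        }
        ResultSet rs = ps.executeQuery();
        Map<String, String> extras = new HashMap<String, String>();
        while (rs.next()) {
            extras.put(rs.getString("PO_I"), rs.getString("EXTRA_COL"));
        }
        rs.close();
        ps.close();     // close per chunk so cursors do not pile up
        return extras;
    }
}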
 
mrigankCommented:
Even if you can only do a) as mentioned in my previous post, you will have approx 1 + 37,000 + 3,700 = 40,000-odd queries; 15 minutes to run everything ;).
 
gatamaneniAuthor Commented:
mrigank, I cannot combine the 4 queries into one. Two queries are against Oracle, one is against DB2, and the other is against Oracle again but a different database, so I cannot join them. Is there any way I can mail you or show you the pseudocode I wrote, so that you get a better idea of what I am trying to do?
 
mrigankCommented:
Then you need to pull the data from those tables in multiples, e.g. pull the data for 10 rows together in one shot.
 
mrigankCommented:
You need to increase the max heap size of your JVM as well.
-Xmx512m
should be fine and not give any OutOfMemory issues.
 
gatamaneniAuthor Commented:
Hi mrigank, you mentioned using a CachedRowSet to hold the values from the first base DB2 query, right? Can you please let me know if there are any cons to this alternative approach: fire the base query against DB2, retrieve all the records, insert them into a hash table as databeans, then close the connection and use the hash table as my holder from then on. Do you see any issue with this? This is for the first part, not for the validation part above.
 
mrigankCommented:
If you use a CachedRowSet, you will have one object and the data (37,000 rows) will be part of it.

With your approach, you will have 37,000 databean objects.

The only issue (and it could be a big one) is the amount of memory those 37,000 databean objects will take; it could lead to memory problems, and object creation is itself an expensive process.

If you do not want to use a CachedRowSet, then at least do not create 37,000 databean objects. Instead, keep a 2-d array inside one object and copy the result set into that 2-d array (like a table); see the sketch below.
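
A minimal sketch of that 2-d holder, copying any ResultSet into one list of Object[] rows instead of a bean per row:

import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.util.ArrayList;
import java.util.List;

public class RowHolder {
    // One List plus one Object[] per row, instead of 37,000 databeans.
    static List<Object[]> copy(ResultSet rs) throws Exception {
        ResultSetMetaData md = rs.getMetaData();
        int cols = md.getColumnCount();
        List<Object[]> rows = new ArrayList<Object[]>();
        while (rs.next()) {
            Object[] row = new Object[cols];
            for (int i = 0; i < cols; i++) {
                row[i] = rs.getObject(i + 1);   // JDBC columns are 1-based
            }
            rows.add(row);
        }
        return rows;
    }
}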
 
gatamaneniAuthor Commented:
Thanks, that makes it clear that a Hashtable might not be a good idea, so I shall use the CachedRowSet approach. Is there any jar file or anything else I need to add to my code base before using it?
 
Mayank S (Associate Director - Product Engineering) Commented:
You need to download the rowset.jar from Sun.
 
gatamaneniAuthor Commented:
Thanks for the update. Do you really think that replacing the COBOL and Perl jobs with a Java rewrite is advisable when around 40,000 records are involved? Java is not built for batch jobs, right? Is there any document that explains why COBOL is better than Java for batch-related work? I foresee a long construction phase, since I need to develop everything from scratch, whereas COBOL is robust and most of the functions for huge data extraction and FTP are built into the mainframe, right? My boss often asks why my estimate is so high, and I am not able to defend it fully.
 
mrigankCommented:
Bulk updates of such huge amounts of data are not a good fit for Java. If performance is not an issue, then go ahead with Java, but do not expect really quick batch updates from it. PL/SQL can ease that a bit. Java can be used for the validations, but that would make the process disjoint. Try what has been mentioned above and see how much you gain.

Java is easier to maintain, though, unlike COBOL and Perl jobs.

I am not sure about COBOL performance.
 
gatamaneniAuthor Commented:
I tried using the CRS, and it takes 6 minutes. Is that not too much time?
 
gatamaneniAuthor Commented:
See the code I executed to find out the time:

import javax.sql.rowset.CachedRowSet;
import com.sun.rowset.CachedRowSetImpl;
import java.sql.*;
import java.util.Date;

public class MyClass {
    public static void main(String arg[]) {
        try {
            System.out.println("The start time is " + new Date());
            // Load the DB2 app driver.
            Class.forName("COM.ibm.db2.jdbc.app.DB2Driver");
            CachedRowSet crs = new CachedRowSetImpl();
            System.out.println("Cached rowset created");
            crs.setUrl("jdbc:db2:db2t");
            crs.setUsername("Z018905");
            crs.setPassword("nov2005");
            // The SQL must be a single Java string; the original
            // multi-line literal would not compile.
            crs.setCommand(
                "SELECT AP.PO_I, AP.DEPT_I, SH.SCAC_C, SH.PRO_I, SH.DEST_LOC_I, SH.TCSS_SHIP_I " +
                "FROM db2prod.TCSS_DLVRY_MNFST DM, db2prod.FMS_CARRIER FC, db2prod.TCSS_SHIP SH, " +
                "db2prod.FMS_CARRIER F2, db2prod.TCSS_BOL_PO AP " +
                "WHERE DM.APPL_TYPE_C = 'CONS' " +
                "AND DM.LAST_UPDATE_D BETWEEN '2005-11-20' AND '2005-11-26' " +
                "AND FC.CARR_1 = DM.SCAC_C " +
                "AND FC.SHIP_METH_C = 'CON' " +
                "AND SH.CONS_SCAC_C = DM.SCAC_C " +
                "AND SH.DLVRY_MNFST_I = DM.DLVRY_MNFST_I " +
                "AND SH.DEST_LOC_I = DM.DEST_LOC_I " +
                "AND F2.CARR_1 = SH.SCAC_C " +
                "AND F2.SHIP_METH_C IN ('TRK','XTL','NMF','DED','PFL','LTL') " +
                "AND AP.TCSS_SHIP_I = SH.TCSS_SHIP_I " +
                "AND AP.DEST_LOC_I = SH.DEST_LOC_I");

            crs.execute();
            System.out.println("After executing");
            System.out.println("The size of the fetch is " + crs.size());
            System.out.println("The end time is " + new Date());
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
 
Mayank S (Associate Director - Product Engineering) Commented:
Try with a result set first and then populate a CRS from it:

ResultSet rs = stmt.executeQuery ( "SELECT...." ) ;
CachedRowSet crs = new CachedRowSetImpl () ; // CachedRowSet is an interface; instantiate the Sun implementation
crs.populate ( rs ) ;
rs.close () ;
 
gatamaneniAuthor Commented:
Is anything possible with an ETL extract the DataStage way? This option was mentioned by one of my team mates for extracting the data.
 
gatamaneniAuthor Commented:
Hi mrigank, thanks for your constant help. A couple of updates for you: I tried it as mentioned above. First I execute the ResultSet, and it takes around 5 minutes to get the RS; then I populate all the ResultSet data into the CRS, which takes around 2 minutes. Is this fine or not? Now I need to add the validation points and see how long the complete process takes.
Please let me know your feedback.
 
mrigankCommented:
Is it possible to get your DB2 tables indexed?
If not, then that is about the best you will get using Java, I guess :(.
 
gatamaneniAuthor Commented:
Hi mrigank, I am experiencing a new problem. I was implementing the above approach and was trying to insert into the Oracle table after my validations against the other tables were done.
But now I am running into a different problem: I am getting the exception "ORA-01000: maximum open cursors exceeded".

What should I do? Can anyone help me?
 
mrigankCommented:
ORA-01000: maximum open cursors exceeded
Cause: A host language program attempted to open too many cursors. The initialization parameter OPEN_CURSORS determines the maximum number of cursors per user.
Action: Modify the program to use fewer cursors. If this error occurs often, shut down Oracle, increase the value of OPEN_CURSORS, and then restart Oracle.
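
Besides raising OPEN_CURSORS, the usual culprit in a loop like this is creating a new Statement or PreparedStatement per row without closing it. A minimal sketch of the fix, preparing once and closing everything deterministically; the SQL and names here are illustrative:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CursorSafeLookup {
    // Prepare once, reuse for every row, close in finally blocks: the number
    // of open Oracle cursors stays constant instead of growing per row.
    static void processAll(Connection oracleCon, String[] keys) throws Exception {
        PreparedStatement ps = oracleCon.prepareStatement(
                "SELECT EXTRA_COL FROM ORACLE_LOOKUP WHERE PO_I = ?");
        try {
            for (int i = 0; i < keys.length; i++) {
                ps.setString(1, keys[i]);
                ResultSet rs = ps.executeQuery();
                try {
                    while (rs.next()) {
                        // ... validate / collect the row here ...
                    }
                } finally {
                    rs.close();     // releases the cursor immediately
                }
            }
        } finally {
            ps.close();
        }
    }
}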