juliakir

asked on

Sybase, SQL Server, 11.9, large data load

Hello experts,

I need to load 1.9 million records nightly from DB2 to Sybase.
To avoid losing data if the load fails, I would like to have two tables: an active table and a staging table. During the data load I would populate the staging table; once the load is done, I would move all the data to the main table.

What is the best approach for this? What is the fastest way to move 1.9 million records in Sybase from one table to another without logging?

Appreciate all help,
Thanks
grant300

I have used this method before with good success.  There is, however, probably no way to get the rows into the production table without logging unless you are completely wiping out the active table each night.

I assume you are getting a flat file (tab delimited) dump of the data from DB2.  I also assume you are on a fairly small machine or this wouldn't be an issue, since 1.9 million records is not very much by today's standards.

The problem with BCP is that you lose an entire batch, potentially 1000 or more records, any time there is an error on one record in the batch.  This usually happens when there is an invalid value for the datatype you are trying to stuff it into.  Dates are notorious for this as are numeric values that are stored in character fields in the source system.  To get around that I typically define all the fields in the staging table as varchar() so that BCP should never encounter an error due to data conversion.
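A minimal sketch of that all-varchar staging table; the table name, column names, and widths here are hypothetical, not from this thread:

```sql
-- Staging table: every column is varchar so BCP never fails on
-- a data conversion error.  Target types are applied later, in the
-- staging-to-active procedure.
create table stg_customer
(
    cust_id     varchar(20)  null,
    cust_name   varchar(100) null,
    created_dt  varchar(30)  null,   -- converted to datetime later
    balance     varchar(30)  null    -- converted to numeric later
)
go
```

You would then load it with something like `bcp mydb..stg_customer in custfile.txt -c -t"\t" -b 1000 -Uuser -Sserver` (character mode, tab-delimited, batches of 1000); exact flags depend on your environment.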

The stored proc that moves the data from staging to active has to do so a record at a time in order to catch single-row errors.  There is no way to avoid logging those inserts, though you can minimize the impact by dropping the indexes on the target table and then rebuilding them after the process is complete.  This only works if you don't need them for your staging-to-active load process.
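The drop/rebuild step looks roughly like this; the index and table names are made up for illustration:

```sql
-- Drop nonclustered indexes before the row-by-row load, then rebuild
-- them and refresh statistics afterwards.  Names are hypothetical.
drop index customer.ix_customer_name
go

-- ... run the staging-to-active load here ...

create nonclustered index ix_customer_name on customer (cust_name)
go
update statistics customer
go
```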

I have built the staging-to-active procedure a couple of different ways but the common thread is that you need to open a cursor on the staging table and perform an insert into the active table for each row.  That is the point at which you are doing all the type conversions and can deal with each error and log it if required.  Normally you want to stay away from cursors and use set operations but this is one case where you need to touch each row individually.
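A sketch of that cursor-based procedure, assuming hypothetical `stg_customer`, `customer`, and `load_errors` tables; note that whether a failed `convert()` aborts the whole batch or just the statement depends on your server settings, so test the error path on your version:

```sql
-- Staging-to-active: open a cursor on the staging table, convert
-- types one row at a time, and log any row that fails the insert.
create procedure move_staging_to_active
as
begin
    declare @cust_id varchar(20), @created_dt varchar(30)

    declare stg_cur cursor for
        select cust_id, created_dt from stg_customer
    open stg_cur
    fetch stg_cur into @cust_id, @created_dt

    -- @@sqlstatus = 0 means the fetch succeeded
    while @@sqlstatus = 0
    begin
        insert into customer (cust_id, created_dt)
        values (convert(int, @cust_id), convert(datetime, @created_dt))

        if @@error != 0
            insert into load_errors (cust_id, err_dt)
            values (@cust_id, getdate())

        fetch stg_cur into @cust_id, @created_dt
    end

    close stg_cur
    deallocate cursor stg_cur
end
go
```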

I recently wrote a benchmark program for a client that read 150,000 rows from a cursor and called a stored procedure that did some minor stuff and then performed the Insert.  This was Sybase 15 on a 2-CPU Dual-Core Opteron machine running Linux.  I was attempting to figure out where the time was going for one of their load processes that called the save procedure from a C++ program.  The results were very interesting.  Called from C++, inserting 150,000 rows took 120 seconds.  My benchmark program took 50 seconds.  The benchmark program without the call to the save procedure took 12 seconds.  The benchmark program calling the save procedure but with the Insert statement commented out took 35 seconds.  Based on that, the cursor loop and the Insert statement itself took just 27 seconds.  That is equal to 5500+ rows per second for a proc such as you will create.  BTW, the target table in this test had 62 columns and 5 indexes, so it was not a trivial example.

Assuming your system is only 1/5th as fast, you can still count on 1000 rows a second or so in your staging-to-active procedure, or just over half an hour of run time.

BTW, the fastest way to move records from one table to another without logging (or rather with minimal logging) is to do a SELECT/INTO and have the "Select Into Bulk Copy" option turned on for your database.  The problem is that SELECT/INTO wants to create a new table; it doesn't work for existing tables.
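Enabling that option and using SELECT/INTO looks like this; the database and table names are hypothetical, and on some ASE versions the option is listed as "select into/bulkcopy/pllsort":

```sql
-- SELECT/INTO is only minimally logged when the database option is on.
use master
go
exec sp_dboption 'mydb', 'select into/bulkcopy', true
go
use mydb
go
checkpoint
go

-- Creates customer_new as a brand-new table; this does NOT work
-- for inserting into an existing table.
select * into customer_new from stg_customer
go
```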

Regards,
Bill
If you have network access to the DB2 system, another alternative is to use Sybase ETL Small Business Edition.  You could pull the rows directly from the source table, do any type conversions you need to, handle any errors, and load the target table directly.

Sybase ETL SBE runs on Windows and has a list price of $2995, making it very affordable.  You will undoubtedly find other uses for it as well.

Regards,
Bill
juliakir

ASKER

grant300

Thank you very much.  I am actually using Informatica ETL to move the data from DB2 to Sybase.
In my script I drop the indexes, truncate the table, run the Informatica job, and then create the indexes and update statistics on the table.
The whole script takes ~50 min to run.  During this time the table is not available to users.
I would like to :
1. minimize time table is unavailable to users
2. in case load fails , still have data for the previous day
 
Your help is greatly appreciated,
Julia
ASKER CERTIFIED SOLUTION
grant300

Right now I am trying solution #4, where I will have a status table with one row that indicates that the second table is loading.
Once it is done I will update the flag.

I will create a view that points to a different table depending on the flag.  Will let you know how it goes.
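A sketch of that flag-table/view approach, with hypothetical names; the view always reads whichever copy of the table is currently live, and the loader flips the flag only after a successful load:

```sql
-- One-row status table: which copy of the table is live (1 or 2).
create table load_status (active_copy tinyint not null)
go
insert into load_status values (1)
go

-- The view users query; it resolves to customer_1 or customer_2
-- depending on the flag.
create view customer_v
as
select c.* from customer_1 c, load_status s where s.active_copy = 1
union all
select c.* from customer_2 c, load_status s where s.active_copy = 2
go

-- After customer_2 loads successfully, switch users over atomically:
-- update load_status set active_copy = 2
```

If the load fails, the flag is never updated, so users keep seeing the previous day's data in the other copy.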

Thanks a lot