Importing DB2 BLOBs into SQL 2016 database very slow

Jim Youmans
We have a DB2 10.5 system (on Windows) that we export to a SQL Server 2016 data warehouse daily using SSIS 2013.  The issue I am running into is that some of the tables in DB2 have BLOB columns (picture data), and when I use SSIS to import them into SQL Server it is very slow.  The main DB2 table gets about 25,000 new records per day, and it takes about 3 hours to move that from DB2 to SQL Server.

The main bottleneck is that SSIS has to write each BLOB to disk, then read it back and store it in SQL Server.  I am averaging about 130 rows per second with both systems on solid state drives.
 
My SSIS data flow is set up like so...

OLE DB Data Source connected to DB2 10.5
Data Conversion task to change the DB2 BLOB datatype to Unicode Text Stream (DT_NTEXT)
Derived Column to fix the DB2-date-to-SQL-date issue
OLE DB Destination to SQL Server 2016

I had the SSIS BLOB storage on C:\temp at first, which was even slower (about 75 rows per second), and then read a blog post about why not to do that.  Moving the BLOB storage to a different drive roughly doubled the speed, but at 130 rows per second it still takes about 3 hours to process 25,000 rows.

Is there any way to speed this up?  Is there any way to use a memory location to read and write the BLOBs to?

Any suggestions would be much appreciated!

Thanks!!!
Gary Patterson, VP Technology / Senior Consultant

Commented:
You might want to try using a RAM disk for your temp storage location for this process.  You allocate some memory to emulate a disk drive.  We have used this technique from time to time to improve the performance of certain types of processes like this.
Gary Patterson, VP Technology / Senior Consultant
Commented:
There are lots of other approaches to reducing the load time of a problem table or data element.  For example, we've set up triggers in the source DB2 database to asynchronously push data to SQL Server in near real time, eliminating the need for a bulk load.  Alternatively, you can stage to staging tables in SQL Server throughout the day and then bulk load from the staging tables to the DW tables.
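The staging idea can be sketched in Python as a watermark-based incremental extract: instead of reloading the whole table nightly, pull only rows changed since the last successful load into a staging table, then merge.  This is a hedged sketch, not a working solution — the table names, the `UPDATED_TS` watermark column, and the `dbo.Photos_Staging` target are all placeholder assumptions about the schema:

```python
# Sketch of watermark-based incremental extraction into a staging table.
# All table and column names below are placeholder assumptions.

def delta_query(table, ts_column):
    """Parameterized query that fetches only rows newer than the watermark."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {ts_column} > ? "
        f"ORDER BY {ts_column}"
    )

def load_delta(src_conn, dst_conn, last_watermark):
    """Copy only the delta from DB2 into a SQL Server staging table."""
    read = src_conn.cursor()
    read.execute(delta_query("SCHEMA1.PHOTOS", "UPDATED_TS"), (last_watermark,))
    write = dst_conn.cursor()
    rows = read.fetchall()
    if rows:
        placeholders = ", ".join("?" for _ in rows[0])
        write.executemany(
            f"INSERT INTO dbo.Photos_Staging VALUES ({placeholders})", rows
        )
        dst_conn.commit()
    return len(rows)
```

The nightly job then only has to merge the staging table into the warehouse table, which shrinks the BLOB movement to the day's delta rather than the full table.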
Jim Youmans, Sr Database Administrator

Author

Commented:
Thank you for your suggestions.  I was excited about the RAM disk until I spoke to our server team: they told me that while setting up a RAM disk on a cluster member is possible, that configuration is not recommended or supported while the cluster service is running.  So that will not work.

As for real-time replication from DB2 to staging tables on SQL Server, that is something I will look into, but I still need to do the initial load in less than a week.

I was hoping to use DefaultBufferSize and make it large enough (I think 100 MB is the limit) that there would be no need to spool to disk, but from what I can tell, SSIS will always write BLOB data to disk because it cannot tell the size beforehand.  Is that correct?

Any other suggestions?
Gary Patterson, VP Technology / Senior Consultant

Commented:
Hi Jim.  There are lots of possible solutions, but nothing I can think of that doesn't require some development work.  In a cluster environment, a RAM disk is definitely a problem, since it is instance-specific.
Jim Youmans, Sr Database Administrator

Author

Commented:
I have pretty much given up on using SSIS for large tables with BLOB fields.  It runs OK for a bit, then starts throwing warnings about locked buffers and slows down to around 300 rows per second.  When you have 55 million rows, that is 48 hours or more.  I had set the buffer size to 100 MB (the SSIS limit, I believe) but still hit the same issues.  It ends up writing thousands of small files to disk no matter what I do.

Jim
Gary Patterson, VP Technology / Senior Consultant

Commented:
Ultimately, the performance trick is to keep the BLOB in memory, without an intermediate trip to disk.

I'm not an SSIS expert, so I'm going to go an alternate route.

I would write an application that retrieves the BLOB from DB2 into a variable and inserts or updates it directly into SQL Server, without intermediate storage to disk.  Assuming there is enough memory available to your app, performance should improve significantly.
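That kind of application can be sketched in Python with pyodbc.  This is a hedged sketch under stated assumptions: the connection strings, the `SCHEMA1.PHOTOS` source table, the `dbo.Photos` target, and the column names are all placeholders, and you would need working DB2 and SQL Server ODBC drivers installed.  The key point it illustrates is that each BLOB travels through a Python variable in batches and never touches disk:

```python
# Sketch: stream BLOBs from DB2 straight into SQL Server through memory,
# with no intermediate files. Connection strings and table/column names
# below are placeholder assumptions, not the actual schema.

def batched(cursor, size):
    """Yield fetched rows in batches of `size` to bound memory use."""
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            return
        yield rows

def copy_blobs(db2_conn_str, sql_conn_str, batch_size=500):
    import pyodbc  # requires a DB2 ODBC driver and a SQL Server ODBC driver
    src = pyodbc.connect(db2_conn_str)
    dst = pyodbc.connect(sql_conn_str)
    read = src.cursor()
    write = dst.cursor()
    read.execute("SELECT ID, PICTURE FROM SCHEMA1.PHOTOS")  # placeholder query
    total = 0
    for rows in batched(read, batch_size):
        write.executemany(
            "INSERT INTO dbo.Photos (ID, Picture) VALUES (?, ?)",
            [(r[0], bytes(r[1])) for r in rows],
        )
        dst.commit()  # commit per batch: bounds memory and transaction log growth
        total += len(rows)
    return total
```

Batch size is the main tuning knob: large enough to amortize round trips, small enough that a batch of pictures fits comfortably in the app's memory.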
