

Temp tables vs. physical temp tables in the same DB: which is better for 30 GB data transformation?

Posted on 2004-11-18
Medium Priority
Last Modified: 2010-05-18
Hello Experts,

I need your expertise. There is a huge 30-200 GB flat (text) file which I need to upload into SQL Server every week. A bulk copy statement is used to bring this data into a SQL table for temporary use; the data is then transformed according to the DB schema and put into the right tables.

Currently, the data is brought into a physical temporary table in the same database, and the ETL operation is then done there. Because the data is huge (30-200 GB), that temporary table is deleted once the job is over, so it is a two-step process.

I have two questions.
1. Should I use DTS instead of a bulk copy import followed by ETL? I am trying to build a package that can do it in a single step. As I understand it, DTS is slower than bulk copy. How much benefit will I get from eliminating one step if it means using the slower process (DTS) on 30-200 GB of data per week?

2. Temporary tables (the ones SQL Server offers via the # prefix) are stored in tempdb. Is it OK instead to create ordinary tables in my own database, import the data, and then delete those tables? Which is better: Microsoft's built-in temp tables, or custom tables deleted later?

Is there any performance gain?

What else can I do to improve the import's performance? It normally takes 5-8 hours now.
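For reference, the two-step process described above can be sketched in T-SQL roughly like this (the file path, table names, and column names are made up for illustration; the real delimiters would depend on the flat file's format):

```sql
-- Step 1: bulk load the flat file into a staging (working) table.
BULK INSERT dbo.StagingImport
FROM 'D:\feeds\weekly_extract.txt'      -- hypothetical path
WITH (
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR   = '\n',
    TABLOCK                              -- allows a minimally logged load
);

-- Step 2: transform into the real schema, then drop the staging table.
INSERT INTO dbo.TargetTable (Col1, Col2)
SELECT SomeCol, OtherCol
FROM dbo.StagingImport;

DROP TABLE dbo.StagingImport;
```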
Question by:ExpertPro

Accepted Solution

Duane Lawrence earned 400 total points
ID: 12616274
1.  No benefit; plus, DTS in SQL Server 2000 is not forward compatible with DTS in 2005.

2. As long as you declare all temp tables at the beginning of the procedure, the overhead is comparable.  To get better information you would have to write it both ways and look at the execution plans to see which is better.
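One way to run the "write it both ways" comparison suggested above is to time each variant with I/O and duration statistics turned on (a sketch only; the table definitions are placeholders):

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Variant A: tempdb-backed temp table, declared up front.
CREATE TABLE #Staging (Col1 INT, Col2 VARCHAR(100));
-- ... bulk load into #Staging, run the transformation, then:
DROP TABLE #Staging;

-- Variant B: permanent working table in the user database.
CREATE TABLE dbo.Staging (Col1 INT, Col2 VARCHAR(100));
-- ... same load and transformation, then:
DROP TABLE dbo.Staging;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Comparing the logical/physical reads and elapsed times reported for each variant, together with the execution plans, shows which approach wins on the actual hardware.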

LVL 16

Assisted Solution

muzzy2003 earned 400 total points
ID: 12616326
1. No, stick with a bulk copy. You can do this as a DTS step anyway, if you want a single DTS package managing the whole thing.

2. If you are writing to the same database you are reading from, then both operations hit the same disk files. If you can get the tempdb files and your own database files onto different physical devices, then a table in tempdb should run quicker, I would have thought. Apart from that, I don't know of any performance difference between temporary and semi-permanent tables.

What else can you do? How about getting your data provider to give you incremental updates rather than the whole lot every week? I can't really give you any more advice without knowing more about what you are doing.
LVL 18

Assisted Solution

ShogunWade earned 400 total points
ID: 12616638
How many physical records are we talking about here?  What is the "ETL operation"?

There is no performance gain in using temp tables.   In fact, with such large files it is a disadvantage to use temp tables.   Why?   Quite simply because you will undoubtedly be using a BATCHSIZE on the bulk insert, so it imports, say, 10,000 rows and commits, then the next 10,000 and commits, and so on.   Let's say your server goes down in the middle.    With the temp table option you've got to start all over again.

With the working-table option as you have it, you can restart the bulk insert from the last row that was committed (semi-recovery).
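The restart trick described here uses BULK INSERT's BATCHSIZE and FIRSTROW options. A sketch, with a hypothetical file path and staging table, and assuming the file has one row per line with no header:

```sql
-- Initial load: commit every 10,000 rows, so a failure loses at most one batch.
BULK INSERT dbo.StagingImport
FROM 'D:\feeds\weekly_extract.txt'
WITH (BATCHSIZE = 10000);

-- After a crash: count the committed rows, then resume from the next one.
DECLARE @done INT;
SELECT @done = COUNT(*) FROM dbo.StagingImport;

-- FIRSTROW is 1-based, so resume at @done + 1. BULK INSERT does not accept
-- variables in its options, so the restart statement is built dynamically.
DECLARE @sql NVARCHAR(500);
SET @sql = N'BULK INSERT dbo.StagingImport
             FROM ''D:\feeds\weekly_extract.txt''
             WITH (FIRSTROW = ' + CAST(@done + 1 AS NVARCHAR(20))
         + N', BATCHSIZE = 10000);';
EXEC (@sql);
```

This only works with a permanent working table: a # temp table created by the failed session would have been dropped when the connection died, which is exactly the disadvantage noted above.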
LVL 32

Assisted Solution

by:Brendt Hess
Brendt Hess earned 800 total points
ID: 12616678
(1) Quite honestly, it depends on how much massaging the incoming text file needs.  In almost all cases (as noted above), bulk load then ETL would be preferred, but in some special cases massaging the data during the import step can save you time.  Normally, these will be cases where most of the bulk-loaded data will be discarded, or where the data will mainly update existing rows in your tables.  Without a detailed analysis, the recommendations above hold.

(2) If tempdb is on a separate fast disk array, and your import table is not on a separate disk array from the data tables you are importing into, then you may see performance increases by importing into #tmp tables.

Personally, I would consider going to an external ETL application for this case, as the bulk load and then process is just an additional step.  I like the Data Integrator in the Pervasive Integration Suite myself, but there are others out there as well.

LVL 18

Expert Comment

ID: 12655288
Do you need any more assistance with this question ?

Question has a verified solution.
