Temp tables vs. physical staging tables in the same DB: which is better for a 30 GB data transformation?

Hello Experts,

I need your expertise. There is a huge 30-200 GB flat (text) file that I need to load into SQL Server every week. A bulk copy statement is used to bring this data into a SQL table for temporary use; the data is then transformed according to the database schema and put into the right tables.



Currently: the data is brought into a physical temporary table in the same database, the ETL operation is run against it (the data is huge, 30-200 GB), and that temporary table is deleted once the job is over. So it's a two-step process.
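To illustrate, the current job is roughly the following two steps (the table names, file path, delimiters, and columns below are simplified placeholders, not the real schema):

    -- Step 1: bulk load the weekly flat file into a physical staging table.
    BULK INSERT dbo.WeeklyImport
    FROM '\\fileserver\drops\weekly_extract.txt'
    WITH (
        FIELDTERMINATOR = '|',
        ROWTERMINATOR = '\n',
        TABLOCK            -- allows a minimally logged load into an empty table
    );

    -- Step 2: transform according to the target schema, then clear the staging table.
    INSERT INTO dbo.Orders (OrderID, CustomerID, OrderDate, Amount)
    SELECT CAST(col1 AS int), CAST(col2 AS int),
           CAST(col3 AS datetime), CAST(col4 AS money)
    FROM dbo.WeeklyImport;

    TRUNCATE TABLE dbo.WeeklyImport;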

I have two questions.
1. Should I use DTS instead of the bulk copy import followed by ETL? I am trying to build a package that can do it in a single step. As far as I know, DTS is slower than bulk copy. How much benefit will I get from removing one step if, on the other hand, I am implementing the slower process (DTS) on 30-200 GB of data per week?

2. Temporary tables (the ones SQL Server offers via #) are stored in tempdb. Is it OK to instead create ordinary tables in my own database, import the data, and then delete those tables? Which is better: the temp tables SQL Server offers, or custom tables that I drop later?

Is there any performance gain either way?

What else can I do to improve the import performance? It normally takes 5-8 hours now.
ExpertPro asked:

Duane Lawrence commented:
1. No benefit, plus DTS in SQL Server 2000 is not forward compatible with DTS in 2005.

2. As long as you declare all temp tables at the beginning of the procedure, the overhead is comparable. To get better information you would have to write it both ways and look at the execution plan to see which is better.

Duane
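A minimal sketch of that "write it both ways" test, assuming a simple staging layout (all table and column names below are illustrative only):

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- Variant A: a # temp table (created in tempdb)
    CREATE TABLE #Staging (RowID int, Payload varchar(500));
    -- ... run the same bulk load / transform against #Staging ...
    DROP TABLE #Staging;

    -- Variant B: a physical working table in the user database
    CREATE TABLE dbo.Staging_Work (RowID int, Payload varchar(500));
    -- ... run the same bulk load / transform against dbo.Staging_Work ...
    DROP TABLE dbo.Staging_Work;

    -- Compare the logical reads, elapsed times, and execution plans
    -- of the transform step in each variant.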

muzzy2003 commented:
1. No, stick with a bulk copy. You can do this as a DTS step anyway, if you want a single DTS package managing the whole thing.

2. If you are writing to the same database you are reading from, then both operations hit the same disk files. If you can get the tempdb files and your own database files onto different physical devices, then a table in tempdb should run quicker, I would have thought. Apart from that, I don't know of any performance difference between temporary and semi-permanent tables.
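As a quick check of where the files actually live, something like this should list the physical paths (substitute your own database name for the placeholder):

    -- Lists the physical file locations for tempdb and for your own database,
    -- so you can see whether they share the same disks.
    EXEC tempdb..sp_helpfile;
    EXEC YourDatabase..sp_helpfile;   -- placeholder: your actual database name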

What else can you do? How about getting your data provider to give you incremental updates rather than the whole lot every week? Can't really give you any more advice without knowing more about what you are doing.
ShogunWade commented:
How many physical records are we talking about here? What is the "ETL operation"?

There is no performance gain in using temp tables. In fact, with such large files it is a disadvantage to use temp tables. Why? Quite simply because you will undoubtedly be using a BATCHSIZE on the bulk insert, so it imports, say, 10,000 rows and commits, then the next 10,000 and commits, and so on. Let's say your server goes down in the middle. With the temp table option you've got to start all over again.

With the working-table option you have now, you can restart the bulk insert from the last row that was committed (semi-recovery).
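A rough sketch of that restartable pattern, assuming BULK INSERT against a pipe-delimited file (the paths, names, and 10,000-row batch size are illustrative):

    -- Initial load: each 10,000-row batch commits on its own.
    BULK INSERT dbo.Staging_Work
    FROM '\\fileserver\drops\weekly_extract.txt'
    WITH (
        FIELDTERMINATOR = '|',
        ROWTERMINATOR = '\n',
        BATCHSIZE = 10000
    );

    -- If the server fails mid-load, find out how many rows were committed ...
    -- SELECT COUNT(*) FROM dbo.Staging_Work;    -- suppose it returns 250000

    -- ... and restart from the next row of the file instead of from scratch.
    BULK INSERT dbo.Staging_Work
    FROM '\\fileserver\drops\weekly_extract.txt'
    WITH (
        FIELDTERMINATOR = '|',
        ROWTERMINATOR = '\n',
        BATCHSIZE = 10000,
        FIRSTROW = 250001    -- skip the rows that were already committed
    );

With a # temp table, the rows committed before the failure would disappear along with the failed session, so there would be nothing to resume from.
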
Brendt Hess, Senior DBA, commented:
(1) Quite honestly, it depends on how much massaging you need to perform on the incoming text file. In almost all cases (as noted above) bulk load then ETL is preferred, but there are special cases where massaging the data in the import step can save you time. Normally, these are cases where most of the bulk-loaded data will be discarded, or where the data mainly updates existing rows in your tables. Without a detailed analysis, the recommendations above hold.

(2) If tempdb is on a separate fast disk array, and if your import table is not on a separate disk array from the data tables you are importing into, then you may see performance increases by importing to #tmp tables.

Personally, I would consider going to an external ETL application for this case, since the bulk-load-then-process approach adds an extra step. I like the Data Integrator in the Pervasive Integration Suite myself, but there are others out there as well.

http://www.pervasive.com/dataintegrator/index.asp
ShogunWade commented:
Do you need any more assistance with this question?