Temp tables vs. physical staging tables in the same DB: which will be better for a 30 GB data transformation?

Posted on 2004-11-18
Last Modified: 2010-05-18
Hello Experts,

I need your expertise. There is a huge 30-200 GB flat (text) file which I need to load into SQL Server every week. A bulk copy statement is used to bring this data into a SQL table for temporary use, and then that data is transformed according to the database schema and put into the right tables.

Currently: the data is brought into a physical staging table in the same database and the ETL operation is done from there. Because the data is huge (30-200 GB), that staging table is deleted once the job is over, so it is a two-step process.
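For reference, the current two-step process looks roughly like this. The "bulk copy statement" is BULK INSERT (bcp from the command line would be the equivalent); the file path, table, and column names below are just placeholders for illustration, not the real objects.

-- Step 1: bulk copy the weekly flat file into a staging table in the same database
BULK INSERT StagingDB.dbo.RawImport
FROM '\\fileserver\feeds\weekly_extract.txt'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', BATCHSIZE = 10000, TABLOCK);

-- Step 2: transform into the real schema, then drop the staging table
INSERT INTO dbo.Customers (CustomerID, CustomerName, Region)
SELECT CustomerID, CustomerName, Region
FROM StagingDB.dbo.RawImport
WHERE CustomerID IS NOT NULL;

DROP TABLE StagingDB.dbo.RawImport;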

I have two questions.
1. Should I use DTS instead of a bulk copy import followed by ETL? I am trying to build a package that can do it in a single step. As I understand it, DTS is slower than bulk copy, so how much benefit will I get from removing one step while, on the other hand, using the slower process (DTS) on 30-200 GB of data per week?

2. Temporary tables (the # tables offered by SQL Server) are stored in tempdb. Is it OK to create staging tables in a custom database, import the data, and then delete those tables afterwards? Which is better: the temp tables Microsoft offers, or custom tables that are dropped later?

Is there any performance gain either way?

What else can I do to improve the import performance? Right now it normally takes 5-8 hours.
Question by:ExpertPro
    LVL 6

    Accepted Solution

    1. No benefit; plus, DTS in SQL Server 2000 is not forward compatible with DTS in 2005.

    2. As long as you declare all temp tables at the beginning of the procedure, the overhead is comparable. To get better information you would have to write it both ways and look at the execution plans to see which is better.
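    For example, a minimal way to do that comparison (table names, file path, and columns below are made up for illustration) is to run the same load both ways with statistics switched on and compare the timings and plans:

    -- Show elapsed/CPU time and I/O per statement for both variants
    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;

    -- Variant A: # temp table (lives in tempdb)
    CREATE TABLE #Staging (CustomerID INT, CustomerName VARCHAR(100), Region VARCHAR(50));
    BULK INSERT #Staging FROM 'C:\feeds\sample_extract.txt'
    WITH (FIELDTERMINATOR = '\t', TABLOCK);
    DROP TABLE #Staging;

    -- Variant B: permanent staging table in the user database
    CREATE TABLE dbo.Staging (CustomerID INT, CustomerName VARCHAR(100), Region VARCHAR(50));
    BULK INSERT dbo.Staging FROM 'C:\feeds\sample_extract.txt'
    WITH (FIELDTERMINATOR = '\t', TABLOCK);
    DROP TABLE dbo.Staging;

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;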

    LVL 16

    Assisted Solution

    1. No, stick with a bulk copy. You can do this as a DTS step anyway, if you want a single DTS package managing the whole thing.

    2. If you are writing to the same database you are reading from, then both operations hit the same disk files. In particular, if you can get the tempdb files and your own database files stored on different physical devices, then going for a table in tempdb should run quicker, I would have thought. Apart from that, I don't know of any difference in performance between temporary and semi-permanent tables.
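    A quick way to check that file placement (and relocate tempdb if both databases share the same array) is something like this; the drive paths are placeholders, the logical names tempdev/templog are the SQL Server defaults, and moving tempdb only takes effect after a restart of the SQL Server service:

    -- See which physical files each database sits on
    USE tempdb;
    EXEC sp_helpfile;
    GO
    USE YourCustomDB;   -- the database holding your staging table (placeholder name)
    EXEC sp_helpfile;
    GO

    -- Relocate tempdb to a separate physical device if needed
    ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'E:\tempdb\tempdb.mdf');
    ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'E:\tempdb\templog.ldf');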

    What else can you do? How about getting your data provider to give you incremental updates rather than the whole lot every week? Can't really give you any more advice without knowing more about what you are doing.
    LVL 18

    Assisted Solution

    How many physical records are we talking about here? And what exactly is the "ETL operation"?

    There is no performance gain in using temp tables. In fact, with files this large it is a disadvantage to use them. Why? Quite simply because you will undoubtedly be using a BATCHSIZE on the bulk insert, so it imports, say, 10,000 rows and commits, then the next 10,000 and commits, and so on. Let's say your server goes down in the middle: with the temp table option you've got to start all over again.

    With the working table option as you have it now, you can restart the bulk insert from the last row that was committed (semi-recovery).
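    Roughly, the restart looks like this (paths and names are placeholders; FIRSTROW counts rows in the source file, so add 1, plus any header rows, to the committed count, and note that BULK INSERT will not take a variable for FIRSTROW without dynamic SQL):

    -- Initial load: commits every 10,000 rows, so a crash loses at most one batch
    BULK INSERT dbo.WorkingStage
    FROM '\\fileserver\feeds\weekly_extract.txt'
    WITH (FIELDTERMINATOR = '\t', BATCHSIZE = 10000, TABLOCK);

    -- After a failure, see how many rows were already committed
    SELECT COUNT(*) AS CommittedRows FROM dbo.WorkingStage;

    -- Re-run from the next data row; 2340001 stands in for CommittedRows + 1
    BULK INSERT dbo.WorkingStage
    FROM '\\fileserver\feeds\weekly_extract.txt'
    WITH (FIELDTERMINATOR = '\t', BATCHSIZE = 10000, FIRSTROW = 2340001, TABLOCK);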
    LVL 32

    Assisted Solution

    (1) Quite honestly, it depends on how much data massaging you need to perform on the incoming text file. In almost all cases (as noted above) bulk load then ETL would be preferred, but in some special cases, massaging the data in the import step can save you time. Normally, these are cases where most of the bulk-loaded data will be discarded, or where the data mainly updates existing rows in your tables. Without a detailed analysis, the recommendations above hold.

    (2) If tempdb is on a separate fast disk array, and if your import table is not on a separate disk array from the data tables you are importing into, then you may see performance increases by importing to #tmp tables.
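    Alternatively (a sketch only; the drive letter, filegroup, and table definition are made up), the permanent import table can be given its own filegroup on separate disks, so the bulk load and the reads of your real tables do not fight over the same spindles:

    -- Add a filegroup on a separate physical array for staging
    ALTER DATABASE YourCustomDB ADD FILEGROUP StagingFG;
    ALTER DATABASE YourCustomDB
    ADD FILE (NAME = StagingData, FILENAME = 'F:\staging\StagingData.ndf', SIZE = 51200MB)
    TO FILEGROUP StagingFG;

    -- Create the import table on that filegroup
    CREATE TABLE dbo.RawImport (
        CustomerID   INT,
        CustomerName VARCHAR(100),
        Region       VARCHAR(50)
    ) ON StagingFG;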

    Personally, I would consider going to an external ETL application for this case, as the load-then-process approach is just an additional step. I like the Data Integrator in the Pervasive Integration Suite myself, but there are others out there as well.
    LVL 18

    Expert Comment

    Do you need any more assistance with this question?
