SSIS Import from flat-file into multiple tables

Posted on 2007-10-16
Last Modified: 2013-11-30
I have a massive text file with consultant data for my company that I have to synchronize with records in SQL 2005.  The text file uses ~ as the column delimiter and carriage returns for the row delimiter.  There is no header row in the text file.  This file has around 180,000 records in it, but I only need about 40,000 of them that meet certain criteria for import.  I'll share all of the fields in the flat file if completely necessary, but in the meantime I'll tell you that it basically has all pertinent info for a person's billing and shipping information (Name, Address, City, State, Zip, Phone, etc.) as well as some internal information such as an ID number, signup dates, and the like.

I only need to pull the records from the flat-file that have all of the following things supplied:  Internal ID, First Name, Last Name, Phone, and Email.  The address information is only partly necessary in the overall scheme of things.

The two tables in the database that I need to import into are a Customer table, which holds the necessary info above, and an Address table, which will hold both of the addresses for that record (assuming they are present) as two separate records: one for shipping, and the other for billing.
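
For reference, here's a minimal sketch of the two target tables as I've described them. The ConsultantID/CustomerID columns and the Customer columns match what I use further down; the Address columns, their types, and the AddressType codes are just assumptions for illustration:

-- One row per consultant; ConsultantID matches [Column 1] in the flat file.
CREATE TABLE dbo.Customer (
    CustomerID   int IDENTITY(1,1) PRIMARY KEY,
    ConsultantID nvarchar(50)  NOT NULL,
    SaltKey      int           NOT NULL,
    FirstName    nvarchar(100) NOT NULL,
    LastName     nvarchar(100) NOT NULL,
    Phone        nvarchar(50)  NOT NULL,
    Email        nvarchar(255) NOT NULL
);

-- Up to two rows per customer: one shipping, one billing.
CREATE TABLE dbo.Address (
    AddressID   int IDENTITY(1,1) PRIMARY KEY,
    CustomerID  int NOT NULL REFERENCES dbo.Customer (CustomerID),
    AddressType char(1)       NOT NULL,   -- 'S' = shipping, 'B' = billing (assumed)
    Street      nvarchar(255) NULL,
    City        nvarchar(100) NULL,
    State       nvarchar(50)  NULL,
    Zip         nvarchar(20)  NULL
);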

So, here's what has to happen:

For each record that qualifies for import we check and see if a record exists for them in the Customer table.  The fields to match are [Column 1] from the flat file (remember, no headers), and [ConsultantID] in the Customer table.  If they exist, the record is updated with whatever information is in the flat file.  If they don't exist a new record is inserted.  We then do the same thing for the Address table.  The records in the Address table are attached to the Customer table by a FK [CustomerID] which is the PK of the Customer table.

Now, what I've tried so far is to use SSIS to import the flat file into a table in the database and do the synch for just the user data only (no addresses) using a stored procedure.  The problem is that the stored procedure takes over half an hour to run which seems excessive considering the relatively small number of records.  When I throw in the address data as well, the whole process could very well take over an hour.  I don't know if I need to continue down my current path and just accept the fact that it's a long-running operation, or if the whole thing should be done in SSIS (hopefully speeding things up), which I would need your help with.  I'm pretty confident in my abilities with SQL Server overall, but SSIS is still relatively new to me beyond simple imports and exports.

So, let's get started.  Fire away.
Question by:AshleyBryant
    LVL 25

    Expert Comment

    A Profiler trace can teach you something about the problem: which step in your process takes more time than you think is 'normal'.
    In general: always try to exclude unneeded items as early as possible.

    Large import:
    - Check that there is enough free space so the import won't slow down; if the transaction log fills up, there also has to be enough free disk space for the transaction log file.
    - Indexes, especially clustered indexes, can slow down inserts.

    - Always avoid unneeded updates (updating something to the value it already had).
    - If there are many rows that already exist and don't need to change, try not to touch them; it can be faster to bulk-import into a working table and start from there (see the sketch below).
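
    A minimal sketch of that staging load, assuming the dbo.MEMDAT holding table that appears later in the thread and a hypothetical file path. For brevity the table is shown with only the five mandatory fields; the real table needs one column per ~-delimited field in the file, in file order:

    -- Holding table mirroring the flat file (column positions taken from
    -- the author's later UPDATE; sizes are assumptions).
    CREATE TABLE dbo.MEMDAT (
        [Column 1]  nvarchar(50)  NULL,  -- internal consultant ID
        [Column 4]  nvarchar(100) NULL,  -- first name
        [Column 5]  nvarchar(100) NULL,  -- last name
        [Column 6]  nvarchar(50)  NULL,  -- phone
        [Column 17] nvarchar(255) NULL   -- email
    );

    -- Load the whole file in one minimally logged operation.
    BULK INSERT dbo.MEMDAT
    FROM 'C:\data\consultants.txt'       -- hypothetical path
    WITH (FIELDTERMINATOR = '~',         -- ~ column delimiter from the question
          ROWTERMINATOR   = '\r',        -- carriage-return row delimiter
          TABLOCK);                      -- table lock speeds up the bulk load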

    LVL 25

    Assisted Solution

    <For each record that qualifies>
    If this means you really treat each record separately, you will access your database 40,000 times just to check, and then do 5,000 inserts and 3,000 updates on a table with constraints, indexes and foreign keys, which all take a little processing time.
    The bulk import into a working table, without all of that, can be followed by 2 statements:
    - UPDATE ... the changed records
    - INSERT ... the new ones
    LVL 12

    Author Comment

    Based on suggestions so far, here's what I've done:

    SSIS pulls the flat file into a holding table.

    SP does the following:
    - Deletes from the holding table any records we don't want (a sketch of this step follows the list)
    - Using a CURSOR (see my note below) I loop through all records remaining in the holding table
    - I pull the mandatory contact info into a handful of variables
    - If there is a match on the ID fields mentioned in the original post, I then check to see if the remaining data that needs to be synched matches up or not.
    - If the data doesn't match for an existing record, the record is updated in the Customer table.
    - If the record from the holding table is new, it is inserted into the Customer table.
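
    Here's roughly what the delete step looks like, assuming the dbo.MEMDAT holding table and that the mandatory fields sit in the column positions used in my UPDATE further down (treat the positions as assumptions):

    -- Drop rows missing any of the mandatory fields:
    -- internal ID, first name, last name, phone, email.
    DELETE FROM dbo.MEMDAT
    WHERE [Column 1]  IS NULL OR LTRIM(RTRIM([Column 1]))  = ''
       OR [Column 4]  IS NULL OR LTRIM(RTRIM([Column 4]))  = ''
       OR [Column 5]  IS NULL OR LTRIM(RTRIM([Column 5]))  = ''
       OR [Column 6]  IS NULL OR LTRIM(RTRIM([Column 6]))  = ''
       OR [Column 17] IS NULL OR LTRIM(RTRIM([Column 17])) = '';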

    The only things I changed from what I originally was doing in my stored procedure are the deletion of the records that don't apply, and storing the data in variables before my checks for the record's existence.  Before I was only loading the data when it was needed, but I realized that it was going to be needed no matter what so I changed the execution.

    * CURSOR - I used the cursor for my loop because it was running faster than my original method, using a WHILE loop.  The procedure is going to be run in the wee hours of the morning, so I wasn't as concerned about the overhead generated by the CURSOR as I normally would be.
    LVL 12

    Author Comment

    Oh, with my changes above, the total execution time for the current items was cut from about 45 minutes to 17.
    LVL 30

    Expert Comment

    You can use the Conditional Split transformation to drop the records that you're not interested in.

    The jury is out on whether or not it's faster than using a stored proc, but it's worth a try.

    If you want to go the SP route, then follow jogos' advice and don't use a loop. Use a bulk update/insert.
    LVL 25

    Accepted Solution

    Already got rid of 2/3 of the execution time; fine, but not good enough :)
    < run in the wee hours of the morning, so I wasn't as concerned about the overhead >
    Many have started with that attitude, and after 2 years there are many more records, a larger database and many other processes like that, and the night fills up.

    < Deletes from the holding table any records we don't want >
    Even better would be to prevent those records from being uploaded at all.
    < CURSOR - I used the cursor for my loop because it was running faster than my original method, using a WHILE loop >
    A cursor loop still generates one database call (plus locks) per record; it's much more efficient if you can do that in one action for all the records to be treated.

    Basically it comes down to expanding the WHERE clause of the INSERT and UPDATE with:
    - the WHERE of your cursor
    - the SELECT you use to decide whether that one record should be inserted or updated (INSERT ... WHERE NOT EXISTS (...))

    LVL 12

    Author Comment

    Good stuff guys.  I've still only tested synchronizing the base contact data, but with these latest results I can now work on synching up the addresses.  Here's what I'm doing now:

    DECLARE @updated_ids TABLE(id nvarchar(50))

    UPDATE C
    SET FirstName = M.[Column 4], LastName = M.[Column 5], Phone = M.[Column 6], Email = M.[Column 17]
    OUTPUT inserted.ConsultantID INTO @updated_ids
    FROM Customer C
    JOIN MEMDAT M ON C.ConsultantID = M.[Column 1]

    INSERT INTO Customer (SaltKey, FirstName, LastName, Phone, Email, ConsultantID)
    SELECT 0, [Column 4], [Column 5], [Column 6], [Column 17], [Column 1]
    FROM MEMDAT M
    WHERE NOT EXISTS(SELECT 1 FROM @updated_ids U WHERE U.id = M.[Column 1])

    38,800 rows updated
    213 rows inserted
    3 minutes, 7 seconds.

    The UPDATE breezes by in about a second, but the INSERTs still take a few minutes.  The good news is that there are no more loops being made with this format, and the number of INSERTs that need to be done on a nightly basis will be small.  I'll throw in the address updates and then let you all know how that goes.  Then we break out the points.
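
    The address sync isn't written yet, but following the same set-based pattern it will probably look something like this. The address column positions ([Column 7] through [Column 10]) and the AddressType codes are placeholders until I check the real file layout:

    -- Refresh existing shipping addresses from the holding table.
    UPDATE A
    SET Street = M.[Column 7], City = M.[Column 8], State = M.[Column 9], Zip = M.[Column 10]
    FROM Address A
    JOIN Customer C ON C.CustomerID = A.CustomerID
    JOIN MEMDAT M ON M.[Column 1] = C.ConsultantID
    WHERE A.AddressType = 'S'

    -- Add shipping addresses that don't exist yet; repeat both statements
    -- with the billing columns and AddressType = 'B'.
    INSERT INTO Address (CustomerID, AddressType, Street, City, State, Zip)
    SELECT C.CustomerID, 'S', M.[Column 7], M.[Column 8], M.[Column 9], M.[Column 10]
    FROM MEMDAT M
    JOIN Customer C ON C.ConsultantID = M.[Column 1]
    WHERE NOT EXISTS(SELECT 1 FROM Address A WHERE A.CustomerID = C.CustomerID AND A.AddressType = 'S')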
    LVL 8

    Expert Comment

    Have you tried using the Lookup option in the SSIS tool to update the table you want? You can easily match the columns from the two tables and map the related columns. It's pretty cool stuff.

    Please let me know if this helps.

    LVL 25

    Expert Comment

    Spread the word: 'long cursors can kill your server'!
    Can you imagine tripling your input in the original 45-minute situation?

    But the same goes for smaller loops: in a 3,000-user environment, it matters whether each user calls the server 365 times for his calendar or it's compressed into 2 database calls.
    LVL 12

    Author Comment

    Alright. I think I still need to do a few tweaks to the tables to make this run a bit faster, but I'm happy with what I've achieved so far. Thanks to all who helped!
