

SSIS Import from flat-file into multiple tables

Posted on 2007-10-16
Medium Priority
Last Modified: 2013-11-30
I have a massive text file with consultant data for my company that I have to synchronize with records in SQL Server 2005. The text file uses ~ as the column delimiter and carriage returns as the row delimiter. There is no header row in the text file. This file has around 180,000 records in it, but I only need about 40,000 of them that meet certain criteria for import. I'll share all of the fields in the flat file if completely necessary, but in the meantime I'll tell you that it basically has all the pertinent info for a person's billing and shipping information (Name, Address, City, State, Zip, Phone, etc.) as well as some internal information such as an ID number, signup dates, and the like.

I only need to pull the records from the flat-file that have all of the following things supplied:  Internal ID, First Name, Last Name, Phone, and Email.  The address information is only partly necessary in the overall scheme of things.
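Here is a minimal Python sketch of that filter. It assumes each line is split on ~. It also assumes the required fields sit at the positions the author's later SQL calls [Column 1] (ID), [Column 4] and [Column 5] (names), [Column 6] (phone) and [Column 17] (email), read here as zero-based indexes; that reading is an assumption. The REQUIRED mapping and the make_line helper are hypothetical. The same test could equally live in an SSIS Conditional Split or a SQL WHERE clause.

```python
# Minimal sketch: keep only rows where every required field is non-blank.
# Field positions are an assumption based on the [Column N] names used later
# in the thread, read here as zero-based indexes.
REQUIRED = {"ConsultantID": 1, "FirstName": 4, "LastName": 5, "Phone": 6, "Email": 17}


def qualifying_rows(lines):
    """Yield the split fields of each ~-delimited line that has all required values."""
    for line in lines:
        fields = line.rstrip("\r\n").split("~")
        # A short line cannot contain every required column.
        if len(fields) <= max(REQUIRED.values()):
            continue
        if all(fields[i].strip() for i in REQUIRED.values()):
            yield fields


def make_line(cid, first, last, phone, email):
    """Build a hypothetical 18-field test line with values at the assumed positions."""
    fields = [""] * 18
    fields[1], fields[4], fields[5], fields[6], fields[17] = cid, first, last, phone, email
    return "~".join(fields)


sample = [
    make_line("1001", "Ann", "Lee", "555-0100", "ann@example.com"),
    make_line("1002", "Bob", "", "555-0101", "bob@example.com"),  # no last name
    "1003~short line\r\n",                                         # too few fields
]
kept = [f[1] for f in qualifying_rows(sample)]
print(kept)  # ['1001']
```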

The two tables in the database that I need to import into are a Customer table which holds the necessary info above, and an address table that will hold both of the addresses for that record assuming they are present, but in two separate records.  One record will be for shipping, and the other for billing.
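Here is a minimal Python sketch of that one-to-two split. It assumes an address counts as present when its street line is non-blank. The dictionary keys (ship_street, bill_city, and so on) and the 'Shipping'/'Billing' labels are hypothetical, since the post doesn't list the address columns.

```python
# Minimal sketch: turn one flat-file record into zero, one or two Address rows.
# The dictionary keys and the "Shipping"/"Billing" labels are hypothetical;
# the thread does not list the address columns.
from typing import Dict, List, Tuple

AddressRow = Tuple[int, str, str, str, str, str]


def address_rows(customer_id: int, record: Dict[str, str]) -> List[AddressRow]:
    """Return one row per address type whose street line is present."""
    rows: List[AddressRow] = []
    for prefix, address_type in (("ship", "Shipping"), ("bill", "Billing")):
        street = record.get(f"{prefix}_street", "").strip()
        if not street:
            continue  # address not supplied: no Address row for this type
        rows.append((
            customer_id,  # FK to Customer.CustomerID
            address_type,
            street,
            record.get(f"{prefix}_city", "").strip(),
            record.get(f"{prefix}_state", "").strip(),
            record.get(f"{prefix}_zip", "").strip(),
        ))
    return rows


rec = {"ship_street": "1 Main St", "ship_city": "Austin", "ship_state": "TX",
       "ship_zip": "78701", "bill_street": ""}
print(address_rows(42, rec))
# [(42, 'Shipping', '1 Main St', 'Austin', 'TX', '78701')]
```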

So, here's what has to happen:

For each record that qualifies for import we check and see if a record exists for them in the Customer table.  The fields to match are [Column 1] from the flat file (remember, no headers), and [ConsultantID] in the Customer table.  If they exist, the record is updated with whatever information is in the flat file.  If they don't exist a new record is inserted.  We then do the same thing for the Address table.  The records in the Address table are attached to the Customer table by a FK [CustomerID] which is the PK of the Customer table.
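The per-record rules above can be sketched with Python's built-in sqlite3 module standing in for SQL Server. The table layouts are simplified and hypothetical; the real Customer table has more columns, such as SaltKey. An AddressType column plus a UNIQUE (CustomerID, AddressType) constraint is assumed as the way to tell the shipping and billing rows apart. This is deliberately the one-record-at-a-time shape of the logic, which is what the replies in this thread argue against at scale.

```python
# Row-by-row sketch of the sync rules (SQLite stand-in for SQL Server).
# Table and column names beyond ConsultantID/CustomerID are simplified or hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Customer (
    CustomerID   INTEGER PRIMARY KEY,
    ConsultantID TEXT NOT NULL UNIQUE,
    FirstName TEXT, LastName TEXT, Phone TEXT, Email TEXT);
CREATE TABLE Address (
    AddressID   INTEGER PRIMARY KEY,
    CustomerID  INTEGER NOT NULL REFERENCES Customer(CustomerID),
    AddressType TEXT NOT NULL,          -- 'Shipping' or 'Billing'
    Street      TEXT,
    UNIQUE (CustomerID, AddressType));
""")


def sync_record(rec):
    """Update the customer if ConsultantID matches, else insert; then do the same for addresses."""
    contact = (rec["first"], rec["last"], rec["phone"], rec["email"])
    row = conn.execute("SELECT CustomerID FROM Customer WHERE ConsultantID = ?",
                       (rec["id"],)).fetchone()
    if row:
        customer_id = row[0]
        conn.execute("UPDATE Customer SET FirstName=?, LastName=?, Phone=?, Email=? "
                     "WHERE CustomerID=?", contact + (customer_id,))
    else:
        cur = conn.execute("INSERT INTO Customer (FirstName, LastName, Phone, Email, ConsultantID) "
                           "VALUES (?, ?, ?, ?, ?)", contact + (rec["id"],))
        customer_id = cur.lastrowid
    for address_type, street in rec["addresses"].items():
        updated = conn.execute("UPDATE Address SET Street=? WHERE CustomerID=? AND AddressType=?",
                               (street, customer_id, address_type)).rowcount
        if updated == 0:  # no existing row of this type: insert it
            conn.execute("INSERT INTO Address (CustomerID, AddressType, Street) VALUES (?, ?, ?)",
                         (customer_id, address_type, street))
    conn.commit()
    return customer_id


rec = {"id": "1001", "first": "Ann", "last": "Lee", "phone": "555-0100",
       "email": "ann@example.com", "addresses": {"Shipping": "1 Main St", "Billing": "PO Box 9"}}
sync_record(rec)                      # first run: inserts
rec["first"] = "Anne"
sync_record(rec)                      # second run: updates the same rows
print(conn.execute("SELECT COUNT(*), MAX(FirstName) FROM Customer").fetchone())  # (1, 'Anne')
print(conn.execute("SELECT COUNT(*) FROM Address").fetchone())                 # (2,)
```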

Now, what I've tried so far is to use SSIS to import the flat file into a table in the database and do the synch for just the user data only (no addresses) using a stored procedure.  The problem is that the stored procedure takes over half an hour to run which seems excessive considering the relatively small number of records.  When I throw in the address data as well, the whole process could very well take over an hour.  I don't know if I need to continue down my current path and just accept the fact that it's a long-running operation, or if the whole thing should be done in SSIS (hopefully speeding things up), which I would need your help with.  I'm pretty confident in my abilities with SQL Server overall, but SSIS is still relatively new to me beyond simple imports and exports.

So, let's get started.  Fire away.
Question by:Ashley Bryant

Expert Comment

ID: 20086698
A Profiler trace can show you where the problem is: which step in your process takes longer than you think is 'normal'.
In general: always try to exclude unneeded items as early as possible.

Large import:
- make sure there is enough free disk space so the import won't slow down; if the recovery model is FULL, also make sure there is enough free disk space for the transaction log file
- indexes, especially clustered indexes, can slow down inserts

- always avoid unneeded updates (updating something to the value it already has)
- if there are many items that already exist and don't need to change, try not to touch them; it could be faster to bulk-import into a working table and start from there.
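Here is a minimal sketch of "avoid unneeded updates", using SQLite through Python's sqlite3 as a stand-in. The UPDATE only touches rows whose value actually differs. SQLite's IS NOT is a NULL-safe inequality. SQL Server 2005 has no NULL-safe comparison operator, so there you would spell out the NULL cases explicitly. Table names are hypothetical.

```python
# Sketch: only UPDATE rows whose values actually differ (SQLite stand-in).
# 'IS NOT' is SQLite's NULL-safe inequality; in T-SQL on SQL Server 2005 you would
# need explicit NULL handling instead. Table names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (ConsultantID TEXT PRIMARY KEY, Phone TEXT);
CREATE TABLE Staging  (ConsultantID TEXT PRIMARY KEY, Phone TEXT);
INSERT INTO Customer VALUES ('1001', '555-0100'), ('1002', '555-0101'), ('1003', NULL);
INSERT INTO Staging  VALUES ('1001', '555-0100'), ('1002', '555-9999'), ('1003', '555-0103');
""")

cur = conn.execute("""
UPDATE Customer
SET Phone = (SELECT s.Phone FROM Staging s WHERE s.ConsultantID = Customer.ConsultantID)
WHERE EXISTS (
    SELECT 1 FROM Staging s
    WHERE s.ConsultantID = Customer.ConsultantID
      AND s.Phone IS NOT Customer.Phone   -- skip rows that would not change
)
""")
conn.commit()
print(cur.rowcount)  # 2: '1002' changed and '1003' went from NULL to a value; '1001' untouched
```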


Assisted Solution

by:jogos
ID: 20086738
<For each record that qualifies >
If this means you really treat each record separately, you will access your database 40,000 times just to check, and then do, say, 5,000 inserts and 3,000 updates on a table with constraints, indexes and foreign keys, which all take a little processing time.
A bulk import into a working table without all that overhead can be followed by just 2 statements:
- UPDATE ... the changed records
- INSERT ... the new ones
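Those two statements can be sketched as follows, again with SQLite via Python's sqlite3 standing in for SQL Server and with hypothetical Customer/Working tables. The UPDATE touches only IDs that already exist and whose data changed. The INSERT adds only IDs that do not exist yet. Each statement runs once for the whole set instead of once per record.

```python
# Sketch of the two set-based statements: one UPDATE for existing, changed IDs,
# one INSERT for new IDs. SQLite stand-in; names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY,
                       ConsultantID TEXT NOT NULL UNIQUE, Email TEXT);
CREATE TABLE Working  (ConsultantID TEXT PRIMARY KEY, Email TEXT);
INSERT INTO Customer (ConsultantID, Email) VALUES ('1001', 'old@example.com'), ('1002', 'b@example.com');
INSERT INTO Working VALUES ('1001', 'new@example.com'), ('1002', 'b@example.com'), ('1003', 'c@example.com');
""")

# 1) UPDATE the existing customers whose data changed.
updated = conn.execute("""
UPDATE Customer
SET Email = (SELECT w.Email FROM Working w WHERE w.ConsultantID = Customer.ConsultantID)
WHERE EXISTS (SELECT 1 FROM Working w
              WHERE w.ConsultantID = Customer.ConsultantID
                AND w.Email IS NOT Customer.Email)
""").rowcount

# 2) INSERT the IDs that do not exist yet.
inserted = conn.execute("""
INSERT INTO Customer (ConsultantID, Email)
SELECT w.ConsultantID, w.Email FROM Working w
WHERE NOT EXISTS (SELECT 1 FROM Customer c WHERE c.ConsultantID = w.ConsultantID)
""").rowcount
conn.commit()
print(updated, inserted)  # 1 1
```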

Author Comment

by:Ashley Bryant
ID: 20087397
Based on suggestions so far, here's what I've done:

SSIS pulls the flat file into a holding table.

SP does the following:
-  Deletes from the holding table any records we don't want
-  Using a CURSOR (See my note below) I loop through all records remaining in the holding table
-  I pull the mandatory contact info into a handful of variables
-  If there is a match on the ID records mentioned in the original post, I then check to see if the remaining data that needs to be synched matches up or not.
-  If the data doesn't match for an existing record, that record is updated in the Customer table.
-  If the record in the holding table is new, it is inserted into the Customer table.

The only things I changed from what I originally was doing in my stored procedure are the deletion of the records that don't apply, and storing the data in variables before my checks for the record's existence.  Before I was only loading the data when it was needed, but I realized that it was going to be needed no matter what so I changed the execution.

* CURSOR - I used the cursor for my loop because it was running faster than my original method, using a WHILE loop.  The procedure is going to be run in the wee hours of the morning, so I wasn't as concerned about the overhead generated by the CURSOR as I normally would be.


Author Comment

by:Ashley Bryant
ID: 20087407
Oh, with my changes above, the total execution time for the current items was cut from about 45 minutes to 17.

Expert Comment

ID: 20090876
You can use the Conditional Split transformation to drop the records that you're not interested in.

The jury is out on whether or not it's faster than using a stored proc, but it's worth a try.

If you want to go the SP route, then follow jogos' advice and don't use a loop. Use a set-based UPDATE and INSERT instead.

Accepted Solution

by:jogos
ID: 20091536
You already got rid of 2/3 of the execution time: fine, but not good enough :)
< run in the wee hours of the morning, so I wasn't as concerned about the overhead >
Many have started with that attitude, and after 2 years there are many more records, a larger database and many other processes like that, and the night fills up.

<  Deletes from the holding table any records we don't want>
It would be even better if you could prevent those records from being uploaded at all.
<CURSOR - I used the cursor for my loop because it was running faster than my original method, using a WHILE loop>
A cursor loop still generates as many calls (plus locks) to the database as there are records; it's much more efficient to do it in one action for all the records being processed.

Basically, it comes down to expanding the WHERE clause of the INSERT and UPDATE with:
- the WHERE of your cursor
- the SELECT you use to decide whether a record should be inserted or updated (INSERT ... WHERE NOT EXISTS (...))
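To make the difference concrete, here is a sketch comparing a per-record loop with two set-based statements whose WHERE clauses carry the loop's filter. It uses SQLite via Python's sqlite3 and hypothetical tables. It also uses a hypothetical Counted wrapper that just counts the statements sent. Both versions leave the Customer table in the same state. The loop issues 1 + 2n statements for n qualifying records, while the set-based version always issues 2.

```python
# Sketch: per-record loop vs. two set-based statements that carry the loop's
# filter in their WHERE clauses. SQLite stand-in; all names are hypothetical.
import sqlite3

SETUP = """
CREATE TABLE Customer (ConsultantID TEXT PRIMARY KEY, Email TEXT);
CREATE TABLE Working  (ConsultantID TEXT, Email TEXT);
INSERT INTO Customer VALUES ('1001', 'old@example.com');
INSERT INTO Working  VALUES ('1001', 'new@example.com'), ('1002', 'b@example.com'), ('1003', '');
"""
# The cursor's filter, reused as a predicate on the working table (alias w).
QUALIFIES = "w.ConsultantID <> '' AND w.Email <> ''"


class Counted:
    """Hypothetical wrapper: a fresh in-memory database that counts statements sent to it."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(SETUP)
        self.calls = 0

    def run(self, sql, params=()):
        self.calls += 1
        return self.conn.execute(sql, params)


def row_by_row(db):
    rows = db.run(f"SELECT w.ConsultantID, w.Email FROM Working w WHERE {QUALIFIES}").fetchall()
    for cid, email in rows:  # one existence check plus one write per record
        if db.run("SELECT 1 FROM Customer WHERE ConsultantID = ?", (cid,)).fetchone():
            db.run("UPDATE Customer SET Email = ? WHERE ConsultantID = ?", (email, cid))
        else:
            db.run("INSERT INTO Customer (ConsultantID, Email) VALUES (?, ?)", (cid, email))
    db.conn.commit()


def set_based(db):
    db.run(f"""UPDATE Customer
               SET Email = (SELECT w.Email FROM Working w
                            WHERE w.ConsultantID = Customer.ConsultantID AND {QUALIFIES})
               WHERE EXISTS (SELECT 1 FROM Working w
                             WHERE w.ConsultantID = Customer.ConsultantID AND {QUALIFIES})""")
    db.run(f"""INSERT INTO Customer (ConsultantID, Email)
               SELECT w.ConsultantID, w.Email FROM Working w
               WHERE {QUALIFIES}
                 AND NOT EXISTS (SELECT 1 FROM Customer c
                                 WHERE c.ConsultantID = w.ConsultantID)""")
    db.conn.commit()


loop_db, set_db = Counted(), Counted()
row_by_row(loop_db)
set_based(set_db)
print(loop_db.calls, set_db.calls)  # 5 2
final = "SELECT * FROM Customer ORDER BY ConsultantID"
print(loop_db.conn.execute(final).fetchall() == set_db.conn.execute(final).fetchall())  # True
```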


Author Comment

by:Ashley Bryant
ID: 20093993
Good stuff guys.  I've still only tested synchronizing the base contact data, but with these latest results I can now work on synching up the addresses.  Here's what I'm doing now:

-- HoldingTable is a placeholder: the post omits the holding table's name
DECLARE @updated_ids TABLE(id nvarchar(50))

UPDATE C
SET FirstName = M.[Column 4], LastName = M.[Column 5], Phone = M.[Column 6], Email = M.[Column 17]
OUTPUT inserted.ConsultantID INTO @updated_ids (id)
FROM Customer C
INNER JOIN HoldingTable M ON C.ConsultantID = M.[Column 1]

-- Insert only the holding-table rows that the UPDATE did not match
INSERT INTO Customer (SaltKey, FirstName, LastName, Phone, Email, ConsultantID)
SELECT 0, M.[Column 4], M.[Column 5], M.[Column 6], M.[Column 17], M.[Column 1]
FROM HoldingTable M
WHERE NOT EXISTS (SELECT 1 FROM @updated_ids U WHERE U.id = M.[Column 1])

38,800 rows updated
213 rows inserted
3 minutes, 7 seconds.

The UPDATE breezes by in about a second, but the INSERTs still take a few minutes.  The good news is that there are no more loops being made with this format, and the number of INSERTs that need to be done on a nightly basis will be small.  I'll throw in the address updates and then let you all know how that goes.  Then we break out the points.
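The thread never pins down why the INSERT is the slow half. One possibility worth checking is an assumption, not something confirmed here. @updated_ids is a table variable declared without a key, so each NOT EXISTS probe may end up scanning up to 38,800 IDs. Declaring id as the table variable's PRIMARY KEY would give the optimizer an index to seek on. Since every existing ID has already been updated by that point, probing Customer.ConsultantID directly (if it is indexed) is equivalent and would do the same. The sketch below shows a NOT EXISTS probe against the indexed target table. It uses SQLite via Python's sqlite3 and a hypothetical index name. EXPLAIN QUERY PLAN is used to confirm that the probe uses the index.

```python
# Sketch: check that the NOT EXISTS probe hits an indexed key (SQLite stand-in).
# idx_customer_consultant and the table layouts are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, ConsultantID TEXT, Email TEXT);
CREATE INDEX idx_customer_consultant ON Customer (ConsultantID);
CREATE TABLE Holding (ConsultantID TEXT, Email TEXT);
INSERT INTO Customer (ConsultantID, Email) VALUES ('1001', 'a@example.com');
INSERT INTO Holding VALUES ('1001', 'a@example.com'), ('1002', 'b@example.com');
""")

INSERT_NEW = """
INSERT INTO Customer (ConsultantID, Email)
SELECT m.ConsultantID, m.Email FROM Holding m
WHERE NOT EXISTS (SELECT 1 FROM Customer c WHERE c.ConsultantID = m.ConsultantID)
"""

# EXPLAIN QUERY PLAN reports how each part of the statement is executed;
# the last column of each row holds the human-readable detail.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + INSERT_NEW)]
print(any("idx_customer_consultant" in step for step in plan))  # True

inserted = conn.execute(INSERT_NEW).rowcount
conn.commit()
print(inserted)  # 1
```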

Expert Comment

ID: 20094486
Have you tried using the Lookup transformation in SSIS to update the table you want? You can easily match the columns from the two tables and map the corresponding columns. It's pretty cool stuff.

Please let me know if this helps.
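Conceptually, a Lookup loads the reference keys and routes each incoming row according to whether it finds a match. Here is a minimal Python illustration of that routing. It is not the SSIS API, and the cache contents and row shape are hypothetical. Matched rows pick up their CustomerID and would feed an UPDATE path, while unmatched rows would feed the INSERT path. In a package, the unmatched rows are sent down a separate output by configuring the Lookup to redirect them.

```python
# Sketch of what a Lookup does conceptually: load the reference keys once,
# then route each input row to a match or no-match output.
# Plain Python illustration, not the SSIS API; names are hypothetical.
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def lookup_split(rows: Iterable[Row], cache: Dict[str, int]) -> Tuple[List[Row], List[Row]]:
    """Return (matched rows with CustomerID attached, unmatched rows)."""
    matched: List[Row] = []
    unmatched: List[Row] = []
    for row in rows:
        customer_id = cache.get(row["ConsultantID"])
        if customer_id is None:
            unmatched.append(row)  # would feed the INSERT path
        else:
            matched.append({**row, "CustomerID": str(customer_id)})  # feeds the UPDATE path
    return matched, unmatched


# Reference data: ConsultantID -> CustomerID, as read once from the Customer table.
cache = {"1001": 7, "1002": 8}
incoming = [{"ConsultantID": "1001", "Email": "a@example.com"},
            {"ConsultantID": "1003", "Email": "c@example.com"}]
matched, unmatched = lookup_split(incoming, cache)
print([r["CustomerID"] for r in matched], [r["ConsultantID"] for r in unmatched])
# ['7'] ['1003']
```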


Expert Comment

ID: 20094599
Spread the word: 'long cursors can kill your server'!
Can you imagine tripling your input in the original 45-minute situation?

But the same goes for smaller loops: in a 3,000-user environment, it makes a big difference whether each user calls the server 365 times for his calendar or that is compressed into 2 database calls.

Author Comment

by:Ashley Bryant
ID: 20096852
Alright.  I still need to do a few tweaks to the tables I think to make this run a bit faster, but I'm happy with what I've achieved so far.  Thanks to all that helped!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Why is this different from all of the other step by step guides?  Because I make a living as a DBA and not as a writer and I lived through this experience. Defining the name: When I talk to people they say different names on this subject stuff l…
Ready to get certified? Check out some courses that help you prepare for third-party exams.
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties
Viewers will learn how to use the SELECT statement in SQL to return specific rows and columns, with various degrees of sorting and limits in place.
Suggested Courses

873 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question