May 17, 2008 10:42am pdt
10.16.2007 at 07:54AM PDT, ID: 22896324
SSIS Import from flat-file into multiple tables
Tags: ssis, file, import, flat, multiple
I have a massive text file with consultant data for my company that I have to synchronize with records in SQL 2005.  The text file uses ~ as the column delimiter and carriage returns for the row delimiter.  There are no header columns in the text file.  This file has around 180,000 records in it, but I only need about 40,000 of them that meet certain criteria for import.  I'll share all of the fields in the flat file if completely necessary, but in the meantime I'll tell you that it basically has all pertinent info for a person's billing and shipping information (Name, Address, City, State, Zip, Phone, etc.) as well as some internal information such as an ID number, signup dates, and the like.

I only need to pull the records from the flat-file that have all of the following things supplied:  Internal ID, First Name, Last Name, Phone, and Email.  The address information is only partly necessary in the overall scheme of things.

The two tables in the database that I need to import into are a Customer table which holds the necessary info above, and an address table that will hold both of the addresses for that record assuming they are present, but in two separate records.  One record will be for shipping, and the other for billing.

So, here's what has to happen:

For each record that qualifies for import we check and see if a record exists for them in the Customer table.  The fields to match are [Column 1] from the flat file (remember, no headers), and [ConsultantID] in the Customer table.  If they exist, the record is updated with whatever information is in the flat file.  If they don't exist a new record is inserted.  We then do the same thing for the Address table.  The records in the Address table are attached to the Customer table by a FK [CustomerID] which is the PK of the Customer table.

Now, what I've tried so far is to use SSIS to import the flat file into a table in the database and do the synch for just the user data only (no addresses) using a stored procedure.  The problem is that the stored procedure takes over half an hour to run which seems excessive considering the relatively small number of records.  When I throw in the address data as well, the whole process could very well take over an hour.  I don't know if I need to continue down my current path and just accept the fact that it's a long-running operation, or if the whole thing should be done in SSIS (hopefully speeding things up), which I would need your help with.  I'm pretty confident in my abilities with SQL Server overall, but SSIS is still relatively new to me beyond simple imports and exports.

So, let's get started.  Fire away.
Question Stats
Zone: Microsoft
Question Asked By: AshleyBryant
Solution Provided By: jogos
Participating Experts: 3
Solution Grade: A
Views: 242
10.16.2007 at 09:05AM PDT, ID: 20086698
A Profiler trace can teach you something about the problem: which step in your process takes longer than you think is 'normal'.
In general, always try to exclude unneeded items as early as possible.

Large import:
- make sure there is enough free space so the import won't slow down; with full transaction logging, also make sure there is enough free disk space for the transaction log file
- indexes, especially clustered indexes, can slow down inserts

Updates:
- always avoid unneeded updates (updating something to the value it already had)
- if there are many items that already exist and don't need to change, try not to touch those; it could be faster to bulk-import into a working table and start from there.
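
The "avoid unneeded updates" point could be sketched in T-SQL roughly as follows. This is only an illustration under assumptions: it borrows the working table name MEMDAT and the column mapping ([Column 1] = ConsultantID, [Column 4] = FirstName, [Column 5] = LastName, [Column 6] = Phone, [Column 17] = Email) from the asker's later post, and it assumes all of these are string columns.

```sql
-- Sketch only: update just the Customer rows whose values actually differ
-- from the working table, so unchanged rows are never touched.
-- Table and column names are taken from the asker's later post;
-- treating NULL and '' as equal is an assumption made for this example.
UPDATE C
SET    C.FirstName = M.[Column 4],
       C.LastName  = M.[Column 5],
       C.Phone     = M.[Column 6],
       C.Email     = M.[Column 17]
FROM   Customer AS C
       INNER JOIN MEMDAT AS M
               ON C.ConsultantID = M.[Column 1]
WHERE  ISNULL(C.FirstName, '') <> ISNULL(M.[Column 4], '')
   OR  ISNULL(C.LastName,  '') <> ISNULL(M.[Column 5], '')
   OR  ISNULL(C.Phone,     '') <> ISNULL(M.[Column 6], '')
   OR  ISNULL(C.Email,     '') <> ISNULL(M.[Column 17], '');
```

Note that string comparison follows the column collation, so under a case-insensitive collation a change in letter case alone would not count as a difference.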


 
10.16.2007 at 09:11AM PDT, ID: 20086738
<For each record that qualifies >
If this means you really treat each record separately, you will access your database 40,000 times just to check, and then do 5000 inserts and 3000 updates on a table with constraints, indexes and foreign keys, which all take a little processing time.
The bulk import into a working table without all of that can be followed by two statements:
- update .... the changed records
- insert .... the new ones
Assisted Solution
 
10.16.2007 at 10:34AM PDT, ID: 20087397
Based on suggestions so far, here's what I've done:

SSIS pulls the flat file into a holding table.

SP does the following:
-  Deletes from the holding table any records we don't want
-  Using a CURSOR (See my note below) I loop through all records remaining in the holding table
-  I pull the mandatory contact info into a handful of variables
-  If there is a match on the ID records mentioned in the original post, I then check to see if the remaining data that needs to be synched matches up or not.
-  If the data doesn't match for the existing records, the record is updated in the holding table.
-  If the record is new in the holding table, then it is inserted.

The only things I changed from what I originally was doing in my stored procedure are the deletion of the records that don't apply, and storing the data in variables before my checks for the record's existence.  Before I was only loading the data when it was needed, but I realized that it was going to be needed no matter what so I changed the execution.

* CURSOR - I used the cursor for my loop because it was running faster than my original method, using a WHILE loop.  The procedure is going to be run in the wee hours of the morning, so I wasn't as concerned about the overhead generated by the CURSOR as I normally would be.
 
10.16.2007 at 10:35AM PDT, ID: 20087407
Oh, with my changes above, the total execution time for the current items was cut from about 45 minutes to 17.
 
10.16.2007 at 09:45PM PDT, ID: 20090876

Rank: Master

You can use the Conditional Split transformation to drop the records that you're not interested in.

The jury is out on whether or not it's faster than using a stored proc, but it's worth a try.

If you want to go the SP route, then follow jogos's advice and don't use a loop. Use a bulk update/insert.
 
10.17.2007 at 01:10AM PDT, ID: 20091536
You already got rid of 2/3 of the execution time. Fine, but not good enough :)
< run in the wee hours of the morning, so I wasn't as concerned about the overhead >
Many have started with that attitude, and after two years there are many more records, a larger database and many other processes like that, and the night fills up.

<  Deletes from the holding table any records we don't want>
It would be even better if you could prevent those records from being uploaded in the first place.
<CURSOR - I used the cursor for my loop because it was running faster than my original method, using a WHILE loop>
A cursor loop still generates at least as many calls (plus locks) to the database as there are records. It's much more efficient if you can do that in one action for all the records that will be treated.

Basically it comes down to expanding the WHERE clause of the insert and the update with
- the WHERE of your cursor
- the select you use to decide whether that one record should be inserted or updated (insert ... where not exists (....) )
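
As a rough sketch of what that could look like for the insert side: the first group of conditions stands in for the cursor's WHERE (the "all five fields supplied" rule from the question), and NOT EXISTS stands in for the per-record existence check. The table name MEMDAT, the column mapping and the SaltKey value of 0 are borrowed from the asker's next post. Treating blank or whitespace-only strings as "not supplied" is an assumption of this example.

```sql
-- Sketch only: one set-based INSERT instead of a cursor loop.
-- Names (Customer, MEMDAT, [Column n], SaltKey) come from the asker's
-- next post; the "blank means missing" rule is an assumption.
INSERT INTO Customer (SaltKey, FirstName, LastName, Phone, Email, ConsultantID)
SELECT 0, M.[Column 4], M.[Column 5], M.[Column 6], M.[Column 17], M.[Column 1]
FROM   MEMDAT AS M
WHERE  NULLIF(LTRIM(RTRIM(M.[Column 1])),  '') IS NOT NULL   -- internal ID supplied
  AND  NULLIF(LTRIM(RTRIM(M.[Column 4])),  '') IS NOT NULL   -- first name supplied
  AND  NULLIF(LTRIM(RTRIM(M.[Column 5])),  '') IS NOT NULL   -- last name supplied
  AND  NULLIF(LTRIM(RTRIM(M.[Column 6])),  '') IS NOT NULL   -- phone supplied
  AND  NULLIF(LTRIM(RTRIM(M.[Column 17])), '') IS NOT NULL   -- email supplied
  AND  NOT EXISTS (SELECT 1
                   FROM   Customer AS C
                   WHERE  C.ConsultantID = M.[Column 1]);     -- not in Customer yet
```

With the filter folded into the statement, the separate DELETE on the holding table is no longer needed for this step. If the file can contain the same ID more than once, the duplicates would have to be removed first, since this statement would insert each of them.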
 




Accepted Solution
 
10.17.2007 at 07:51AM PDT, ID: 20093993
Good stuff guys.  I've still only tested synchronizing the base contact data, but with these latest results I can now work on synching up the addresses.  Here's what I'm doing now:

DECLARE @updated_ids TABLE(id nvarchar(50))

UPDATE Customer
SET FirstName = M.[Column 4], LastName = M.[Column 5], Phone = M.[Column 6], Email = M.[Column 17]
OUTPUT inserted.consultantid INTO @updated_ids
FROM Customer C, MEMDAT M
WHERE C.ConsultantID = M.[Column 1]


INSERT INTO Customer (SaltKey, FirstName, LastName, Phone, Email, ConsultantID)
SELECT 0, [Column 4], [Column 5], [Column 6], [Column 17], [Column 1]
FROM MEMDAT M
WHERE NOT EXISTS(SELECT id FROM @updated_ids U WHERE U.id = M.[Column 1])

38,800 rows updated
213 rows inserted
3 minutes, 7 seconds.

The UPDATE breezes by in about a second, but the INSERTs still take a few minutes.  The good news is that there are no more loops being made with this format, and the number of INSERTs that need to be done on a nightly basis will be small.  I'll throw in the address updates and then let you all know how that goes.  Then we break out the points.
 
10.17.2007 at 08:41AM PDT, ID: 20094486
Have you tried using the Lookup option in the SSIS tool to update the table you want? You can easily match the columns from two tables and map the related columns. It's pretty cool stuff.

Please let me know if this helps.

Aash
 
10.17.2007 at 08:54AM PDT, ID: 20094599
Spread the word: 'long cursors can kill your server'!
Can you imagine tripling your input in the original 45-minute situation?

But the same goes for smaller loops: in a 3000-user environment, either each user calls the server 365 times for his calendar, or it's compressed into 2 database calls.
 
10.17.2007 at 01:36PM PDT, ID: 20096852
Alright.  I still need to do a few tweaks to the tables I think to make this run a bit faster, but I'm happy with what I've achieved so far.  Thanks to all that helped!
 
 