Link to home
Start Free TrialLog in
Avatar of dabdowb
dabdowb

asked on

Import a fixed width file into SQL Server 2005

I am attempting to import a fixed width text file into SQL Server 2005.  I have had my fair share of challenges with such an easy task even on delimited files due to issues with converting different types of data between source and destination.

If you can't tell, I am new to SQL Server in general, so everything seems challenging, even most basic things.  I managed to finally get pipe delimited files into SQL using the BULK INSERT command, but this file is not delimited, but very sizable (about 3.5GB in size and roughly 600 characters wide at some points) so looking for a solution that will get the data in quickly.

I tried the import wizard, and really have interest right now in only about the first 100 characters of the file, but when I setup the mapping to specifically grab so many characters into this field, and so many into this field, etc., and then got to a point where the rest of the file could all just come down as one big mess, it didn't work right.

In fact, if I looked at the preview, instead of the data showing horizontal, it seemed to switch stuff to a vertical view instead.  I could expand, shrnk the column lengths and then more data would start showing horizontal again, but never got it all to sink in correctly.

Any thoughts on how to get this file in?  I figure SSIS would be a great route, but not familiar with it at all at this point.  I have attached a small sample of the file below so the basic structure can be seen.  Just to save alot of questions, you will notice that most items have two lines, and about 43 characters in or so, you will see one row has the letter 'A' and the next one for the same material, has the letter 'I'.  I do not care about pulling in the 'I' lines at this point, only the 'A' lines, but I understand that I need to bring them all in.  As such, I worked as though only the 'A' line existed, knowing that that one field, of one character length, I could simply go in and delete the lines with 'I' in them after the fact.

Geez, I better shut up now.  Let me know of more questions.

Thanks,
Matt
FixedWidthTest.txt
Avatar of Banthor
Banthor
Flag of United States of America image

Your data file is not exaclty fixed width, look for Ragged Right
I have been doing this kind of stuff for almost 30 years. I still use the wizard to make the base import. and then edit in SSIS if needed
I always save the end package in file format rather than in SQL server.
 
Avatar of dabdowb
dabdowb

ASKER

True, not exactly fixed width since one line has one set of data for the product number and then the next line has a different set of data for the same product.  However, it is all in a set specific format to load into our master software system.
The only line I am interested in dissecting is the 'A' line, and only about the first 50 - 100 characters...can something be setup to grab those and leave the remainder as whatever it is until end of line?
I can't get the wizard to come close to what I need at all, so at a loss there, and don't really know a thing about SSIS yet.  Just started using this software.  
Avatar of dabdowb

ASKER

In fact, I can tell you the breakdown of the first few parts of the line I need.  I will get this posted tomorrow am.

Thanks,
Matt
You have to declare a column for all data. You can delete the leftovers afterword.
Avatar of dabdowb

ASKER

Sorry for the delayed response.  I tried to use the wizard again, this time, choosing ragged right, instead of fixed width, and I thought I was going to be successful.  Problem for me is even in trying the wizard, it doesn't seem to bring the data in correctly.  I get one row of data where the column 3, has the letter 'A' in it.  This is a key field for me, as the rows I want to ignore have the letter 'I' in that field.  Out of roughly 4.5 million records, half of them should be with the letter 'A' and the other half the letter 'I'.  It is almost like after the very first row, which had the 'A' in it, the column somehow get off.  Maybe because of the whole ragged right thing or how I setup the columns?  I must be doing something wrong, but can't figure out where?
It is one of the reasons why I wanted to try fixed width, and just tell it to grab the first few columns...which should be something like this (all string or nvarchar):
Column 0 - 6
Column 1 - 18
Column 2 - 18
Column 3 - 1
The rest of the data on any of those lines I don't care about at this point....maybe some other time, but I really need this part to work first.  I have tried to do the import the way above, and it keeps giving me error messages on the remainder of the column data.  Won't even import.
Thoughts?
Avatar of dabdowb

ASKER

sorry I didn't specify clearer.
Column 0 should be 6 characters long, Column 1 should be 18 characters long, etc.
 
Thanks,
ASKER CERTIFIED SOLUTION
Avatar of Banthor
Banthor
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of dabdowb

ASKER

Varchar(max) sounds like a great solution for the Column 4, but it is not available that I see when using the import wizard.  Do I need to setup the table first and that would somehow help?  I know I did that when I would import delimited files and use the BULK INSERT command, but I don't know how to use the BULK INSERT command with fixed width, or ragged right options, only delimited (which maybe that is the only way).
I understand what you are saying conceptually, and have been trying to do that for weeks now with no success, hence my post.  If I could successfully get all the data to come in, I have no issue at all cleaning it up afterward...I just can't get the data in correctly.
I go to the Import Wizard, I choose flat file for my source, I point to the file.  Unicode box is unchecked, Code Page shows 1252 (ANSI - Latin 1), Format I show Ragged Right, Text Qualifier set to None, Header Row Delimiter shows {CR}{LF}, Header Rows to Skip shows Zero, Column Names in the first row data row is unchecked.
The next screen, I choose my lines (field seperators) as shown above in previous post, and Row Delimiter is set to {LF}.  I then go to Advanced, to setup the data types.  Columns 0 through 3 are all set correct field lengths, data type is string [DT_STR] and text qualified shows as False.  All three of those columns show they column type as Fixed Width (which is greyed out).  
Then comes the difficulty, Column 4, that should just have all the rest of the data for that line until it sees the end of line.  It shows a column type of Delimited (which too is greyed out), Column Delimiter is set to {LF}, Data Type is set to string [DT_STR], OutputColumnWidth defaults to 50, but I set it to 500...have even tried 1000, and TextQualified is set to True (have tried False with no success either).
Now I check the Preview, and it looks great!!  Exactly how I want the end result to be, but if I follow through the rest of the screens with the default options set, it either does not work at all, or if it does actually build the table, only one row shows up correctly, the first row, with all the remaining rows imported incorrectly.  Thus, I cannot clean the data because I don't have all the data I need to do so.
It is maddening how something SO simple, will not cooperate.  Have you tried to do the import on the sample text I provided?  Did you have Column 3 showing A, then I, then A then I then A then I, for each row of data in the end table?  If so, what did I miss in my above steps??
Thanks
Avatar of dabdowb

ASKER

One thing I did notice, that I thought odd, when you go to select the column seperators, the screen you see with a few lines of data to help you choose where to put the seperators, was 32005 column lengths wide, even though the data only goes out about 600 column lengths.  One of the error messages I get on column 4 frequently is that it thinks it will lose some data somehow.  
varchar(max) should be available in SQL 2005 Standard and above.
I may have access to a SQL 2005 machine Thursday to check on the column widths
I am running 2008 here in the lab
Avatar of dabdowb

ASKER

Any luck with the 2005 version?
Thanks,
Matt
Avatar of dabdowb

ASKER

Update:
I decided to delve back in and try this again today.  That is when I realized that the table I was trying to insert to was not setup to receive the same number of characters I was trying to load...hence the truncation errors.  Also, saw the varchar(MAX) option on the table in SQL...I thought this was supposed to be an option on the Import Wizard, hence my confusion.
Anyway, I made the changes, and success!
Thanks for the help....and patience with this newbie!
Cheers,
Matt