• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1129
  • Last Modified:

Converting flat file to SQL 2005 database

**UPDATE:  I don't know if this will help, but after doing some research I think this is what you would call a parent child or master detail flat file.  Here's a link to an article that seems to explain exactly what I need to do, but it's way over my head in terms of understanding exactly what to do.  http://www.sqlis.com/54.aspx

First, a couple of important bits of information. Until last week, I had never touched SISS, and therefore, I know very little about it. I just never had the need to use it...until now. I was able to convert my first 3 flat files to SQL2005 tables by right clicking on "SISS Package" and choosing "SISS Import and Export Wizard". That is the extent of my knowledge! So please, please, please be patient with me and be as descriptive as possible.  

I've attached to files to this question.  The first attachment is the flat file specifications and the second one is a sample single line flat file similar to what I'm dealing with (the real flat file is over 2 gigs).  

As you can see, each entry could have any number of records and multiples of some of the record types, with one exception, every entry must have a "01" record and can only have one "01" record.  Oh, and each record has a length of 400.

I need to get this information into a SQL 2005 database so I can create a front end for accessing the data.  Originally, I wanted one line for each account and have null values listed for entries that don't have a specific record.  Now that I've looked at the data again, that doesn't look like a good idea.  I think a better way to do it would be to create 5 different tables, one for each record type.  However, records 2 through 5 don't have anything I can make a primary key.  So here are my questions...

1.  Is it possible to make 5 tables from this one file, one table for each of the record types?
2.  If so, can I copy the Account number in record 01, position 13-19 in each of the subsequent record types (that way I could link the tables as needed)?
3.  Can this be done using the SISS Import and Export Wizard to create the package?  If not, I'm going to need some very basic step by step instructions on how to create the package.
4.  Is SISS the best way to do this conversion or is there another program that would be better to use?

Assuming this can be done, I really need someone to either walk me through the process or give me link to another site that will.  I'm even willing to accept a reference to a book I need to purchase with a VERY VERY good description.

I know this is a huge question and I appreciate the help of anyone who boldly decides to help me!  Thank you in advance and I welcome anyone's suggestions!

Flat-File-Specifications.txt
TestFlatFile.txt
0
elavelle
Asked:
elavelle
1 Solution
 
ZberteocCommented:
First thing first, I think it would be useful for you to get an introductory overview about SSIS, how it works and how is organized. You have a good 1h video here:

http://www.canitpro.ca/technet_monthlies/Intro_to_ssis.asx

It will help you understand easier what they are talking about in the article you sent in the link.

On the other hand I didn't really underestand how your flat file data is organized. Are the 01, 02,...,05 data types part of the same row or are they coming each on a sepparate row. I am asking because in the file you attached as an example they are all on the same row.
0
 
nmcdermaidCommented:
1.  Is it possible to make 5 tables from this one file, one table for each of the record types?

Yes And that is definitely the way I would do it.

2.  If so, can I copy the Account number in record 01, position 13-19 in each of the subsequent record types (that way I could link the tables as needed)?

Yes. You'd probbly use a variable to remember what the last account number was, and reuse that when writing the other record types

3.  Can this be done using the SISS Import and Export Wizard to create the package?  If not, I'm going to need some very basic step by step instructions on how to create the package.

No. You'll need some very complicated step by step instructions

4.  Is SISS the best way to do this conversion or is there another program that would be better to use?

In theory SSIS is definitely the appropriate tool. Its concievable that it may be easier to run some VBScript over your file beforehand to split it into 5 files, or import it into a single table and use T-SQL to split it out.

But it would be much more interesting and challenging to do it in SSIS. SSIS is an ETL tool which is built for importing data, particularly data like this.


The first challenge is finding a connection manager that understands that there is no row delimiter - every record is every 41 chars (is that also record tpe 5, is it 41 wide or 23 wide?)

You may need to write your own connection manager, or maybe load the entire row and then pivot it somehow.


.... interesting I will do some research and get back to you.


What system is providing you with this file?
0
 
nmcdermaidCommented:
OK, here is a start:

1. Create a new SSIS
2. Drag a data flow task on to the page and double click it. This skips yu over to the data flow page
3. Right click in connection managers (at the bottom) and create a new Flat File connection.

4. On the General tab:
  Browse to your file and select it
  Format = Fixed Width

5. On the Columns Tab:
Set Row Width to 41

6. Go to Preview and note that you can now see your records in as records on each line. Press OK

The next steps are:

1. Create a destination connection manager
2. Extract the record type using a 'Derived Column'
3. Split the data out to five tables using a 'Conditional Split'

Just have to work out how to remember the account number from record 1 and reuse it..... it shouldn't be too hard


I am limited for time but have a go at that and get back to me.
0
[Webinar] Improve your customer journey

A positive customer journey is important in attracting and retaining business. To improve this experience, you can use Google Maps APIs to increase checkout conversions, boost user engagement, and optimize order fulfillment. Learn how in this webinar presented by Dito.

 
elavelleAuthor Commented:
nmcdermaid, I really appreciate you taking on my question.  To answer the question from your first post, yes, all records are 41 characters long including record number 05.  You were also wondering what horrible program output this file, it's an antiquated credit and collections program from Ontario Systems called FACS.  We've terminated the contract with them and we need to put the data into a SQL database so I can write a simple front end to view it.

I've followed your first 6 instructions.  I have a Flat file connection manager that is 1 column fixed width at 41 characters wide.  

For your next step, you said I need to create a destination connection manager, but when I right click in the connection managers box I don't have that option.

As for steps number 2 and 3 (the derived column and conditional split) I definitely don't know where to start with that.

I look forward to you next set of instructions.  I know this is time consuming, so I want you to know how grateful I am for your help!  Thank you, thank you. thank you!
0
 
nmcdermaidCommented:
OK.

Just to double check, in your text file connection manager, when you go to 'preview', can you see all your data laid out down the page like this:

01Site1....
02NY1155 12th Street
03ELL.....

Baically we want to always see the record identifier as the two leftmost characters.


Anyway, to create the destination connection manager... forget I said destination, just create a connection manager for your destination (SQL Server). And the tpe for SQL Server is called...... OLE DB... logical huh?


Right click in 'Connection Managers' and press 'New OLE DB Connection'

Press the New button

Type in your SQL Server connection info and press Test Connection. Press OK on the test connection dialog
Press OK ono the Connection Manager dialog and you'll see a new entry in the 'Data Connections' list on the OLE DB Connection Manager screen
Press OK on this dialoog to get back to the SSIS editor


So now you should have two connection managers, which are simply definitions of connections to something.


We will now use these within the data flow


Make sure you're on the data flow tab. Drag on 'Flat File Source' from the toolbox. Double click it and ensure that the 'Flat File Connection Manager' has the same name as the one that you created earlier.

Press preview. Again you should see your source file with records going down the page. Press OK


Now, just as an illustration (and because I should be at work right now and I have to cut this short) we'll load that data into a table This is not what you want but we'll get there eventually.


In the toolbox, under data flow destinations, drag on a SQL Server Destination.
Click your flat file source, drag the green arrow (data flow) on to the SQL Server destination so it links up
Double click the SQL Server destination.
Select your OLE DB connection manager
Select a destination table (its should have just one column)
Click the Mappings tab and it will automatically map through
Press OK


You should be back at the SSIS designer screen and there should be no red X's anywhere.

Press Play and it should looad your data into a table.


I realise that this is not what you want, but after you get this part right, let me know and I will help you load it properly.








0
 
elavelleAuthor Commented:
Yep,  Everything worked as expected.  Between the intro to SSIS video that Zberteoc and the directions you've given me I feel much more comfortable creating basic packages and moving around in SISS.  I'm definitely ready to the next steps.  I have a feeling the next part of this will be far from easy or intuitive.

I know I've said thank you before, but I feel like I need to say it again....Thank you very much for your patient and understanding help!  
0
 
Anthony PerkinsCommented:
>>Thank you very much for your patient and understanding help!<<
The way you do that here is by accepting the solution(s) that best suited you and awarding points.
0
 
nmcdermaidCommented:
Well thats some of the original question covered anyway. I haven't answered the whole question yet, but we could be here a while cause I have some other pressing work commitments at the moment....

So for speeds sake I will dispense with detailed instructions and give you the general idea.

First you need to remove your excisting data flow, and instead feed it int a 'Derived Column' (which you need to drop on the page from the toolbox).

Use the derived column transformation to split the record identifier out from the rest of the record. Therefore your derived column transformation should have one data flow going into it (with one column) from the flat file source, and one data flow (with two columns) coming out of the dervived column. Column 1 is the record type, and the other column is the rest of the record.

Now you feed that into a a conditional split transformation. This allows you to spit out five data flows (one for each record type)

Each of those five outputs goes into five other 'Derived Column' transformations. These derived columns split the single field into a number of fields according to the spec you have provided.

Then you can feed those five into five tables.


You may have noticed that the big thing I'm missing here is that we need to capture the account from record type 01, and apply it to any subsequent record types..... I think. maybe you can clarify exactly what you want.


Anyway try getting that working and hopefully by then I (or someone else) will have worked out how to apply the account number.

We'll probably need to capture it in a variable and use it.... something like that.
0
 
elavelleAuthor Commented:
Wow, it works, at least everything we've done up to this point!  The only part I'm missing is the insertion of the account number into each row.  Obviously record type 01 already contains the account number so I only need to insert it into record types 2 through 5.  I know nmcdermaid is very busy, so if someone else has this answer I would really appreciate it.

Thanks again!
0
 
nmcdermaidCommented:
OK, I am some way along the path of doing this differently

1. Load your flat file into a ADO Recordset object
2. Use an iterator to iterate over the records and apply the account number as required.

I will post back when its completed.
0
 
nmcdermaidCommented:
So far I have discovered that either way its complicated to do the entire thing in SSIS.

So hopefully soon I will post a solution that involves simply importing the file in SSIS, then using T-SQL to clean it.

Sorry I keep promising but not delivering but I need to set it up in order to explain it.

I will try to post back within a day or two


0
 
nmcdermaidCommented:
1. Create a destination table in SQL with three columns:

RowNumber        Int
Data                   Varchar(41)
AccountNumber  Varchar(7)

For this purpose, call the table tblImportData

Against the RowNumber column, go to the Column properties window, open out "Identity Specification" and set IsIdentity=Yes

2. Now set up your IIS to import your file into this table.

Within the data flow, drop a SQL Server Destination. Drag your source file to it.
Map your source column to the 'data' column. This is the only column that we will be populating.


Now right click and execute your data flow it should do two things:

1. Populate the Data column with your data
2. Number each row automatically in the RowNumber column


Now open up management studio and run this:

SELECT RowNumber, SUBSTRING(Data,13,7)
FROM tblImportData
WHERE LEFT(Data,2) = '01'

This query shows each row with an account number, and the account number.

We also want to show the last row.... so run this:


SELECT
SUBSTRING(F.Data,13,7) As AccountNumber,
F.RowNumber As FirstAccountRow,
(
SELECT MIN(RowNumber) - 1 As LastAccountRow
FROM tblImportData
WHERE LEFT(Data,2) = '01'
AND RowNumber > F.RowNumber
)
FROM tblImportData F
WHERE LEFT(Data,2) = '01'

That gives us a range of rownumber, and the accounts that those row numbers should be. Now we just do a (complicated) update based on that:



UPDATE tblImportData
SET AccountNumber = U.AccountNumber
FROM
(
SELECT
SUBSTRING(F.Data,13,7) As AccountNumber,
F.RowNumber As FirstAccountRow,
(
SELECT MIN(RowNumber) - 1
FROM tblImportData
WHERE LEFT(Data,2) = '01'
AND RowNumber > F.RowNumber
) As LastAccountRow
FROM tblImportData F
WHERE LEFT(Data,2) = '01'
) U
INNER JOIN
tblImportData D
ON D.RowNumber BETWEEN U.FirstAccountRow AND (ISNULL(U.LastAccountRow,(SELECT MAX(RowNumber) FROM tblImportData)))




Now if you check the data with

SELECT *
FROM dbo.tblImportData

You'll see that each line has an account number assigned.


You paste that last update into an execute SQL task in your SSIS to get the whole thing running in SSIS. You will also want to run a "TRUNCATE tblImportData" beforehand to clear out the table.

There is probably a way to do that horrible update in SSIS with merge joins and stuff but I don't think it would be any simpler.



The rest is hopefully easy - you just split up the Data field as defined by your spec and farm it out to tables with the correct account number.


Post back and let me know if this thread is still of interest to you and I can help some more. There are amny optimisations we can do, such as splitting the account and record type out beforehand in the SSIS. If you are importing a lot of data we'll want to do some optimisations.
0

Featured Post

2018 Annual Membership Survey

Here at Experts Exchange, we strive to give members the best experience. Help us improve the site by taking this survey today! (Bonus: Be entered to win a great tech prize for participating!)

Tackle projects and never again get stuck behind a technical roadblock.
Join Now