Solved

DTS: Non-normalized Flat-file to Normalized SQL Table

Posted on 2002-05-18
7
427 Views
Last Modified: 2013-11-30
Hi.

I have a need to import a pipe-delimited flat-file into SQL, but the problem is that there are a varying number of fields in each row of the flat-file, although field 3 onwards are all the same type of data. For those who know it, my file is the output from the NTResKit ADDUSERS.EXE which enumerates NT Groups and their memberships. For those who don't know it, the file is pretty much this kind of format:

Group1|Description1|Member1|Member2|...|MemberN
Group2|Description2|Member1|Member2|...|MemberN

What I need to be able to do is to import each Group-Member combination as a separate row into SQL so I would then have:

Group1 Description1 Member1
Group1 Description1 Member2
Group1 Description1 Member3
Group1 Description1 MemberN
Group2 Description2 Member1
Group2 Description2 Member2

Or, if you want to get really fancy, then one table for the Groups and their Descriptions and another table for Members and link the two by some ID key would be fantastic!

And finally, for some extra points, if you could get it to repeat this script for ALL files in a directory (all same format) that really would be the Holy Grail!

But the main thing is at least importing ONE file. Any help much appreciated - especially welcomed as VBScript for DTS since I could then understand what's going on and adapt it next time this crops up!!

Thanks,
Calum.
0
Comment
Question by:Calum2
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
7 Comments
 
LVL 5

Expert Comment

by:spcmnspff
ID: 7029930
Okay I know this is a delayed response but I had allocate time to think about this one.

Setting up DTS to import the file with a pipe column delimiter and an LF row delimiter will give us the denormalized table with the total number of columns (N) and the maximun number of users in a given group (u) where
N = u + 2 In other words the total number of columns will be 2 (one for the name and one for the desription) plus the total number of users from the group that has the most users.  In the other groups that have less users, the remaining fields after the last user listed are null.

Like this:
Col1    Col2   Col3    Col4    Col5
Name1   Desc1  User1   User3  <Null>
Name2   Desc2  User2  <Null>  <Null>
Name3   Desc3  User1   User2   User3


Now if we import this table into a temp staging table in an SQL server database with an Auto-increment ID field we have this:

GroupID  Col1    Col2   Col3    Col4    Col5
1        Name1   Desc1  User1   User3  <Null>
2        Name2   Desc2  User2  <Null>  <Null>
3        Name3   Desc3  User1   User2   User3


Now we can write N - 2 queries to place the users in a table with the groupid and user name

Create Table UserTable
GroupID Int,
UserName VarChar(100)
Primary Key NonClustered (GroupID, UserName)

Insert Into UserTable
Select GroupID, Col3 From TempGroupTable Where Col3 Is Not Null

Insert Into UserTable
Select GroupID, Col4 From GroupTable Where Col4 Is Not Null

Insert Into UserTable
Select GroupID, Col5 From GroupTable Where Col5 Is Not Null

And a last query to create the Master GroupTable

Create Table GroupTable
GroupID Int,
Name VarChar(15),
Description VarChar(100),
Primary Key Clustered (GroupID)

Insert Group Table
Select GroupID, Name, Description
From TempGroupTable

Drop table TempGroupTable

Okay Stay tuned for the automation of this int a single DTS package . . .
0
 
LVL 5

Accepted Solution

by:
spcmnspff earned 200 total points
ID: 7029984
The trick is to find out N so you know how many iterations you need of the UserTable insert.  After you import the data into a temporary staging table this can be found by Counting the records in syscolumns.

Create Proc UserInsert As

Declare @N Int

Select @N = Count(*)
from syscolumns SC, sysobects SO
Where SC.ID = AO.ID and SO.Name = 'TempGroupTable'


/*The next challange is to iterate N - 3 insert queries:
I would do this using dynamic SQL.  If we name thecolumns in our staging table col1, col2, col2, coln . . .
*/


Decare @i int, @SQL VarChar(500)
Set @i = 3

While @i <= @N
Begin
    Set @SQL = 'Insert Into UserTable Select GroupID, Col'
    + Cast(@i as VarChar(10)) +
    ' From TempGroupTable Where Col'
    + Cast(@i as VarChar(10)) +
    ' Is Not Null'

    Exec (@SQL)
    Set @i = @i + 1
End


So here's the summary for the DTS package:
1.) Import file into a staging table that has an autoid
2.) Perform an executeSQL task to Truncate the user table and Call the insert user sp  
4.) Perform a second executeSQL task to truncate the group table and insert the new data.
5.) Truncate the staging table.

It's going to be a little tricky but I think I've laid out an adequate plan here . . . let me know how it works.  Have Fun!  
0
 
LVL 5

Expert Comment

by:spcmnspff
ID: 7029998
A couple of minor changes . . . In my example Set '@i=3' should be set '@i=2' And 'While @i<=@N' should be 'While @i<=@N-1'.  Thanks =)

0
Enterprise Mobility and BYOD For Dummies

Like “For Dummies” books, you can read this in whatever order you choose and learn about mobility and BYOD; and how to put a competitive mobile infrastructure in place. Developed for SMBs and large enterprises alike, you will find helpful use cases, planning, and implementation.

 
LVL 5

Expert Comment

by:spcmnspff
ID: 7031072
One more typo . . . to find @N:

Select @N = Count(*)
from syscolumns SC, sysobects SO
Where SC.ID = AO.ID and SO.Name = 'TempGroupTable'

Should read:

Select @N = Count(*)
from syscolumns SC, sysobects SO
Where SC.ID = SO.ID and SO.Name = 'TempGroupTable'


Oh well I'm sure you get the idea! =)
0
 
LVL 5

Expert Comment

by:spcmnspff
ID: 7068987
Were you able to try this?
0
 

Author Comment

by:Calum2
ID: 7074626
Sorry - have been off on a completely IT-free vacation!

I ended up looping through my files with a VBScript, parsing out the data and loading it up into SQL using ADO - not fast, not cool, but worked.

However, I would like to award you the points for taking the trouble to present this suggested solution - very much appreciate your time and hard work.

Thanks.
0
 
LVL 5

Expert Comment

by:spcmnspff
ID: 7075292
Thanks, I apperciate it.  That did present an interestin challenge. =)
0

Featured Post

On Demand Webinar: Networking for the Cloud Era

Did you know SD-WANs can improve network connectivity? Check out this webinar to learn how an SD-WAN simplified, one-click tool can help you migrate and manage data in the cloud.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Slowly Changing Dimension Transformation component in data task flow is very useful for us to manage and control how data changes in SSIS.
I have a large data set and a SSIS package. How can I load this file in multi threading?
Using examples as well as descriptions, and references to Books Online, show the documentation available for date manipulation functions and by using a select few of these functions, show how date based data can be manipulated with these functions.
Via a live example, show how to extract insert data into a SQL Server database table using the Import/Export option and Bulk Insert.

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question