DTS: Non-normalized Flat-file to Normalized SQL Table

Posted on 2002-05-18
Last Modified: 2013-11-30

I have a need to import a pipe-delimited flat-file into SQL, but the problem is that there are a varying number of fields in each row of the flat-file, although field 3 onwards are all the same type of data. For those who know it, my file is the output from the NTResKit ADDUSERS.EXE which enumerates NT Groups and their memberships. For those who don't know it, the file is pretty much this kind of format:


What I need to be able to do is to import each Group-Member combination as a separate row into SQL so I would then have:

Group1 Description1 Member1
Group1 Description1 Member2
Group1 Description1 Member3
Group1 Description1 MemberN
Group2 Description2 Member1
Group2 Description2 Member2

Or, if you want to get really fancy, then one table for the Groups and their Descriptions and another table for Members and link the two by some ID key would be fantastic!

And finally, for some extra points, if you could get it to repeat this script for ALL files in a directory (all same format) that really would be the Holy Grail!

But the main thing is at least importing ONE file. Any help much appreciated - especially welcomed as VBScript for DTS since I could then understand what's going on and adapt it next time this crops up!!

Question by:Calum2
  • 6

Expert Comment

ID: 7029930
Okay I know this is a delayed response but I had allocate time to think about this one.

Setting up DTS to import the file with a pipe column delimiter and an LF row delimiter will give us the denormalized table with the total number of columns (N) and the maximun number of users in a given group (u) where
N = u + 2 In other words the total number of columns will be 2 (one for the name and one for the desription) plus the total number of users from the group that has the most users.  In the other groups that have less users, the remaining fields after the last user listed are null.

Like this:
Col1    Col2   Col3    Col4    Col5
Name1   Desc1  User1   User3  <Null>
Name2   Desc2  User2  <Null>  <Null>
Name3   Desc3  User1   User2   User3

Now if we import this table into a temp staging table in an SQL server database with an Auto-increment ID field we have this:

GroupID  Col1    Col2   Col3    Col4    Col5
1        Name1   Desc1  User1   User3  <Null>
2        Name2   Desc2  User2  <Null>  <Null>
3        Name3   Desc3  User1   User2   User3

Now we can write N - 2 queries to place the users in a table with the groupid and user name

Create Table UserTable
GroupID Int,
UserName VarChar(100)
Primary Key NonClustered (GroupID, UserName)

Insert Into UserTable
Select GroupID, Col3 From TempGroupTable Where Col3 Is Not Null

Insert Into UserTable
Select GroupID, Col4 From GroupTable Where Col4 Is Not Null

Insert Into UserTable
Select GroupID, Col5 From GroupTable Where Col5 Is Not Null

And a last query to create the Master GroupTable

Create Table GroupTable
GroupID Int,
Name VarChar(15),
Description VarChar(100),
Primary Key Clustered (GroupID)

Insert Group Table
Select GroupID, Name, Description
From TempGroupTable

Drop table TempGroupTable

Okay Stay tuned for the automation of this int a single DTS package . . .

Accepted Solution

spcmnspff earned 200 total points
ID: 7029984
The trick is to find out N so you know how many iterations you need of the UserTable insert.  After you import the data into a temporary staging table this can be found by Counting the records in syscolumns.

Create Proc UserInsert As

Declare @N Int

Select @N = Count(*)
from syscolumns SC, sysobects SO
Where SC.ID = AO.ID and SO.Name = 'TempGroupTable'

/*The next challange is to iterate N - 3 insert queries:
I would do this using dynamic SQL.  If we name thecolumns in our staging table col1, col2, col2, coln . . .

Decare @i int, @SQL VarChar(500)
Set @i = 3

While @i <= @N
    Set @SQL = 'Insert Into UserTable Select GroupID, Col'
    + Cast(@i as VarChar(10)) +
    ' From TempGroupTable Where Col'
    + Cast(@i as VarChar(10)) +
    ' Is Not Null'

    Exec (@SQL)
    Set @i = @i + 1

So here's the summary for the DTS package:
1.) Import file into a staging table that has an autoid
2.) Perform an executeSQL task to Truncate the user table and Call the insert user sp  
4.) Perform a second executeSQL task to truncate the group table and insert the new data.
5.) Truncate the staging table.

It's going to be a little tricky but I think I've laid out an adequate plan here . . . let me know how it works.  Have Fun!  

Expert Comment

ID: 7029998
A couple of minor changes . . . In my example Set '@i=3' should be set '@i=2' And 'While @i<=@N' should be 'While @i<=@N-1'.  Thanks =)

Optimizing Cloud Backup for Low Bandwidth

With cloud storage prices going down a growing number of SMBs start to use it for backup storage. Unfortunately, business data volume rarely fits the average Internet speed. This article provides an overview of main Internet speed challenges and reveals backup best practices.


Expert Comment

ID: 7031072
One more typo . . . to find @N:

Select @N = Count(*)
from syscolumns SC, sysobects SO
Where SC.ID = AO.ID and SO.Name = 'TempGroupTable'

Should read:

Select @N = Count(*)
from syscolumns SC, sysobects SO
Where SC.ID = SO.ID and SO.Name = 'TempGroupTable'

Oh well I'm sure you get the idea! =)

Expert Comment

ID: 7068987
Were you able to try this?

Author Comment

ID: 7074626
Sorry - have been off on a completely IT-free vacation!

I ended up looping through my files with a VBScript, parsing out the data and loading it up into SQL using ADO - not fast, not cool, but worked.

However, I would like to award you the points for taking the trouble to present this suggested solution - very much appreciate your time and hard work.


Expert Comment

ID: 7075292
Thanks, I apperciate it.  That did present an interestin challenge. =)

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Let's review the features of new SQL Server 2012 (Denali CTP3). It listed as below: PERCENT_RANK(): PERCENT_RANK() function will returns the percentage value of rank of the values among its group. PERCENT_RANK() function value always in be…
This article shows gives you an overview on SQL Server 2016 row level security. You will also get to know the usages of row-level-security and how it works
Via a live example combined with referencing Books Online, show some of the information that can be extracted from the Catalog Views in SQL Server.
Using examples as well as descriptions, and references to Books Online, show the documentation available for datatypes, explain the available data types and show how data can be passed into and out of variables.

786 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question