Link to home
Start Free TrialLog in
Avatar of richxyz
richxyzFlag for United States of America

asked on

SSIS 2008 import text and XML

Hello Experts,

I have log files in text file format that are formatted such that I don't know how to import them.  It is created this way by an application and I need ideas on how to import them into SQL 2008, and parse out all the data into columns.

The text file does not have any row or column delimiters - it is one long string with no line feeds / carriage returns.   Each record starts with a date time stamp (outside the XML) which I need to import, and then an XML string that I also need to import into the same record.

Attached is a sample of 2 records.

My question is:   what tools in SSIS would I need to import this?  If I have to write a VB program to parse the date/timestamp from the XML string for each record, how would I do this?  Would I need to create a new file that has row and column delimiters?  

Any ideas would be greatly appreciated!
20110131:140630.123:<?xml version="1.0" encoding="UTF-8"?><Message xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://info.lalala.com/messagepayload.xsd"><EventIdentification EventActionCode="U" EventDateTime="20110131140521" EventID="100_ChangePatientContext" EventOutcomeIndicator="0"></EventIdentification><UserIdentification RoleID="404" UserID="m1636"></UserIdentification><SourceIdentification SourceID="WP Portal" SourceType="3"></SourceIdentification><NetworkAccessPointIdentification NetworkAccessPointID="167.1.1.1" NetworkAccessPointTypeCode="1"></NetworkAccessPointIdentification><NetworkAccessPointIdentification NetworkAccessPointID="167.1.1.1 : null" NetworkAccessPointTypeCode="2"></NetworkAccessPointIdentification><ParticipantObjectIdentification ParticipantObjectDataLifeCycle="1" ParticipantObjectID="0000827576" ParticipantObjectITypeCode="1" ParticipantObjectName="DOE, JOHN" ParticipantObjectTypeCode="1"></ParticipantObjectIdentification><ParticipantObjectIdentification ParticipantObjectDataLifeCycle="1" ParticipantObjectID="1563890" ParticipantObjectITypeCode="2" ParticipantObjectName="DOE, JOHN" ParticipantObjectTypeCode="1"></ParticipantObjectIdentification><ParticipantObjectIdentification ParticipantObjectDataLifeCycle="1" ParticipantObjectID="1101600033" ParticipantObjectITypeCode="5" ParticipantObjectName="DOE, JOHN" ParticipantObjectTypeCode="1"></ParticipantObjectIdentification></Message>

Open in new window

Avatar of 8080_Diver
8080_Diver
Flag of United States of America image

For openers, you will probably need to remove the datetime stamp (20110131:140630.123: in your example) because the <?XML version ... information needs to be the first thing in the XML file for it to be valid.  Then, if you don't have access to the xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" link, you will probably want to remove that, too.

I was able to get the information to present as an XML file after taking those two corrective actions.

Now, on to the SSIS process, Are you using SSIS 2005 or SSIS 2008?  (It makes a bit of difference.)
Avatar of richxyz

ASKER

I'm using SSIS 2008 and SQL 2008 R2.  

Unfortunately these files are generated by another application and this is how they will always be created - and I have to deal with it!

So are you saying I will need to write a VB program to parse it out?    I'm curious if I could do this with a Script Task - and would I parse the date time stamp and the XML directly into SQL or would I have to put it into another temporary file before importing?
I would first try the Script Task.  I think you can provide an ConnectionManager that will let you access the flat file and then manipulate it using VB.Net or (my personal preference ;-) C#.  

The other option would be to create a VB.net or C# aplication that can be called from the SSIS package and have the application perform the manipulations.  The app could archive the original file and then create the preprocessed file either with a known name or in a known directory.  Once the app completes its processing, control would return to the SSIS psckage where the package would either process the single file or, if several files were processed and put in a specific folder, all of the files in the folder.  (If you go this route, I would suggest the use of a folder and embedding the datetime stamp in the filename so that you can retrieve it if needed.)
ASKER CERTIFIED SOLUTION
Avatar of 8080_Diver
8080_Diver
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of richxyz

ASKER

I just realized that importing it as a text file using the ragged right option will work.  (I didn't see the line feeds when viewing the text files before.)   I can put the date time stamp into one column and the rest into an XML data type.  Thanks for your help, I'll give you the points now :)