?
Solved

SSIS Importing Files delimited by a pipe - haveing problems

Posted on 2009-04-15
19
Medium Priority
?
1,257 Views
Last Modified: 2013-11-10
Hi, i am working on a project where i am getting a flat file that i need to import. I need to split the file rows on the delimited, in this instance a pipe |. The file looks like this

Header
ABCD= 12334
EFGH=some other data

START-OF-FILE
START ITEM|ITEM ABC
ABC|TIMESTAMP|VALUE1|VALUE2|VALUE3
ABC|TIMESTAMP|VALUE1|VALUE2|VALUE3|VALUE4|VALUE5|VALUE6|VALUE7
ABC|TIMESTAMP|VALUE1|VALUE2|VALUE3
ABC|TIMESTAMP|VALUE1|VALUE2|VALUE3|VALUE4|VALUE5|VALUE6|VALUE7
ABC|TIMESTAMP|VALUE1|VALUE2|VALUE3
ABC|TIMESTAMP|VALUE1|VALUE2|VALUE3
END ITEM|ITEM ABC
START ITEM|ITEM 123
123|TIMESTAMP|VALUE1|VALUE2|VALUE3|VALUE4|VALUE5|VALUE6|VALUE7
123|TIMESTAMP|VALUE1|VALUE2|VALUE3
123|TIMESTAMP|VALUE1|VALUE2|VALUE3|VALUE4|VALUE5|VALUE6|VALUE7
END ITEM|ITEM 123
END-OF-FILE

I only need to pull the values within the start item and end item records. So i started doing this by loading this via SSIS into a single-column table, then stripping out everything that wasnt data i wanted. Now i am trying to transform the data by splitting it, and sending it to its final table destination. Thats where i am stuck. Is there a SSIS or SQL  function that lets me split columnns by the pipe, so i can get to the data i want? Or should i be approaching this a totally different way?

One piece i forgot - the column count in every row isnt always the same. some rows have 8 fields while others have 11. Last bit - I have 3 million records to deal with, so looping thru each row is probably not an option.

Thanks
Mike
0
Comment
Question by:mikegrad7
  • 9
  • 9
19 Comments
 
LVL 12

Expert Comment

by:Einstine98
ID: 24153916
You could use a VBScript step and use one of the VB function to explode the line.  
You can write a select statement that uses substring and charindex

You could also use ParseName function as explainedin this article.
http://www.sqlteam.com/article/using-the-parsename-function-to-split-delimited-data
0
 
LVL 1

Author Comment

by:mikegrad7
ID: 24154323
Thanks Einstine, but Parsename appears to be limited to 4 delimited fields and i have up to 12. I already tried using script tasks but with 3 million lines in the test data and probably double that for production,  the performance was not acceptable. I have had to move to other resources outside of SSIS to accomplish this.
0
 
LVL 22

Expert Comment

by:PedroCGD
ID: 24155977
If you still need to do that inside SSIS, let me know!
Regards,
Pedro
www.pedrocgd.blogspot.com
0
Get quick recovery of individual SharePoint items

Free tool – Veeam Explorer for Microsoft SharePoint, enables fast, easy restores of SharePoint sites, documents, libraries and lists — all with no agents to manage and no additional licenses to buy.

 
LVL 1

Author Comment

by:mikegrad7
ID: 24167215
Pedro - i would like to hear your solution to keep this all inside SSIS - right now i had to manipulate the file outside in PERL and i would prefer to not use perl.
0
 
LVL 22

Expert Comment

by:PedroCGD
ID: 24167222
ok... re-open the question and attach an example file and tell which rows you want.
Regards,
Pedro
0
 
LVL 1

Author Comment

by:mikegrad7
ID: 24167293
i cant attach a file - its propritary information - the best i can do is the example above. the rows i need are everything between the START ITEM and END ITEM rows. It looks like we will be getting around 4-5 million rows every day, so it needs to be fast - this process cant take more than 10-20 seconds. PERL already does it in about 10 seconds so i need to stay around that for it to be viable.
0
 
LVL 22

Expert Comment

by:PedroCGD
ID: 24167320


Ok
In This:

Header
ABCD= 12334
EFGH=some other data

START-OF-FILE
START ITEM|ITEM ABC
ABC|TIMESTAMP|VALUE1|VALUE2|VALUE3
ABC|TIMESTAMP|VALUE1|VALUE2|VALUE3|VALUE4|VALUE5|VALUE6|VALUE7
ABC|TIMESTAMP|VALUE1|VALUE2|VALUE3
ABC|TIMESTAMP|VALUE1|VALUE2|VALUE3|VALUE4|VALUE5|VALUE6|VALUE7
ABC|TIMESTAMP|VALUE1|VALUE2|VALUE3
ABC|TIMESTAMP|VALUE1|VALUE2|VALUE3
END ITEM|ITEM ABC
START ITEM|ITEM 123
123|TIMESTAMP|VALUE1|VALUE2|VALUE3|VALUE4|VALUE5|VALUE6|VALUE7
123|TIMESTAMP|VALUE1|VALUE2|VALUE3
123|TIMESTAMP|VALUE1|VALUE2|VALUE3|VALUE4|VALUE5|VALUE6|VALUE7
END ITEM|ITEM 123
END-OF-FILE

You only want this:

ABC|TIMESTAMP|VALUE1|VALUE2|VALUE3
ABC|TIMESTAMP|VALUE1|VALUE2|VALUE3|VALUE4|VALUE5|VALUE6|VALUE7
ABC|TIMESTAMP|VALUE1|VALUE2|VALUE3
ABC|TIMESTAMP|VALUE1|VALUE2|VALUE3|VALUE4|VALUE5|VALUE6|VALUE7
ABC|TIMESTAMP|VALUE1|VALUE2|VALUE3
ABC|TIMESTAMP|VALUE1|VALUE2|VALUE3

123|TIMESTAMP|VALUE1|VALUE2|VALUE3|VALUE4|VALUE5|VALUE6|VALUE7
123|TIMESTAMP|VALUE1|VALUE2|VALUE3
123|TIMESTAMP|VALUE1|VALUE2|VALUE3|VALUE4|VALUE5|VALUE6|VALUE7


and considering when dont have values as empty:
VALUE4|VALUE5|VALUE6|VALUE7
correct?
0
 
LVL 1

Author Comment

by:mikegrad7
ID: 24167358
thats right Pedro - the big issue i was runnign into with SSIS was that i couldnt use a file connection correctly, because of the different number of columns in the rows. The file we get doesnt have empty columns so SSIS starts to run the columns toghether on me.  I need to import just that data you identified above to a table for further manipulation.

Thanks
Mike
0
 
LVL 22

Expert Comment

by:PedroCGD
ID: 24167371
ok... I will do it for you.
re-open the question.
0
 
LVL 1

Author Comment

by:mikegrad7
ID: 24167457
i cant find a way to re-open the question - it is still shwoing as open but pending deletion. I can still accept solutions at this point for each comment posted.
0
 
LVL 22

Expert Comment

by:PedroCGD
ID: 24167997
OK.. here you have.
Add the attached package to an existen SSIS solution.
Follow the images and try it!

Helped?
Regards,
Pedro
www.pedrocgd.blogspot.com
SSIS-Interface.JPG
SSIS-Interface2.JPG
SSIS-ConfigureScriptOutput.JPG
SSIS-ScriptCode.JPG
0
 
LVL 22

Accepted Solution

by:
PedroCGD earned 2000 total points
ID: 24168001
also the texfile I used:
sampleFile.txt
0
 
LVL 1

Author Comment

by:mikegrad7
ID: 24168740
Thanks Pedro, this looks promising. Ill put this together and check the performance.
0
 
LVL 22

Expert Comment

by:PedroCGD
ID: 24175075
ok... give feeback as soon as you can!
Pedro
0
 
LVL 1

Author Comment

by:mikegrad7
ID: 24186047
thanks for the solution. This works the way i need it to, and performance is where it needs to be too.
0
 
LVL 22

Expert Comment

by:PedroCGD
ID: 24186063
Do you have better performance with my solution or is worst but works? :-)
0
 
LVL 1

Author Comment

by:mikegrad7
ID: 24186088
worse than manipulating the file with PERL first, but only by maybe 10-15 seconds per 500,000 rows. Its within an acceptable range to give us the benefit to throw PERL out and stay within SSIS.
0
 
LVL 22

Expert Comment

by:PedroCGD
ID: 24186105
I can help you improving it...
One suggestion is to define the lenght of the columns properly... I defined all the columns as string with 20 lenght... if you reduze it, could improve a lot your performance!
Good!
pedro
www.pedrocgd.blogspot.com
0
 
LVL 1

Author Comment

by:mikegrad7
ID: 24186246
i actually have to use values between 25-75 for my data. i minimized every column but the performance increase was neglegible. Ill take the hit and keep all my columns wide to prevent against truncation errors. It looks like this works though, thanks for the solution.
0

Featured Post

Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Ever wondered why sometimes your SQL Server is slow or unresponsive with connections spiking up but by the time you go in, all is well? The following article will show you how to install and configure a SQL job that will send you email alerts includ…
This shares a stored procedure to retrieve permissions for a given user on the current database or across all databases on a server.
This video shows, step by step, how to configure Oracle Heterogeneous Services via the Generic Gateway Agent in order to make a connection from an Oracle session and access a remote SQL Server database table.
Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.
Suggested Courses

850 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question