Link to home
Start Free TrialLog in
Avatar of raphil
raphil

asked on

SSIS determining flat file delimiter by reading first line of file

I have a package which bulk loads data from a 40 field delimited flat file into Sql Server.  I am now running into a complication, where some of the files are comma delimited and some are tilda (~) delimted.  Some of the files include column names on the first line and some do not.
I need some way of determining these properties at run time.
What I would think would work is, at run time to read the first line and count the amount of tilda's found.  If the amount > x then set the file delimiter to tilda otherwise to comma.  I would also like to read the first word of the file to determine if there are column headers in the file.
I believe that this can be accomplished with some script.  Is this so?  If yes, how can this be done and do the scripts need to be written with VBS or can I use C#?
Is there a better and easier alternative to accomplishing this task?
Avatar of Anthony Perkins
Anthony Perkins
Flag of United States of America image

You need to split the file and import them separately.
Avatar of raphil
raphil

ASKER

They are in separate files.  However, some files are comma delimited and some are tilda delimited and some files have a header and some do not.
The only way for me to know is to open the file and look.
If you have the two delimiters in the same file and you want to automate this process than the best you can do is split the file in two, based on some reliable condition.  If you do not have a reliable condition than you up the proverbial creek without a paddle.
Avatar of raphil

ASKER

As stated above TWICE, some files are all comma delimited and some files are all Tilda delmited!
The only way for me to know which are comma delimited and which are tilda delimited is by opening the file and browsing the data.
I need a way (using a script task?) to open each file and read the first line so I can determine this.
Can you please tell me how to do this and if possible provide some script for accomplishing this.  I tried to add a script tag, but I don't know which function to place the code in, or how to read the lines of the file being loaded.  I also don't know how to specify in SSIS at runtime what type of delimiter to use to parse the file.
Any help in this area will be greatly appreciated.

Thank you
>>As stated above TWICE, some files are all comma delimited and some files are all Tilda delmited!<<
I understood that plainly, there was no need for the repetition.  It is obvious that I am not communicating.

I wish you the best of luck.
May use a script task to convert your delimiter in one that you choose, and next import data...
Regards,
Pedro
Avatar of raphil

ASKER

Thanks Pedro.
My question is more specific to details on how to do that.
I have actually played around a bit and was able to create an OnPoseExecute event handler with a script task to read the file.
I am now running into a brick wall, where I can not access the delimiter field for the flat file at runtime.  For some reason I can not add a reference to the microsoft.sqlserver.dtsruntimewrap.dll.  The only place I can find it on my machine is in the GAC, but when I add this using clause using Microsoft.SqlServer.Dts.Runtime.Wrapper; It is not recognized.  Please correct me if I am wrong, but I believe that I need to reference that assembly so that I can set the delimiter for the file programtically.
Do you know what I need to do?
Ok Raphil,
I made an example for you.
Update and customize to resolve your issue.

1. Open a SSIS Package in Visual Studio
2. Add the attached SSIS package (after rename ti to dtsx)
3. Update the uvFolder variable and paste their the textfiles attached for the example.
4. Execute the package and give feedback.

Helped?
Regards,
Pedro

www.pedrocgd.blogspot.com


SSIS-Interface.JPG
SSIS-ScriptCode.JPG
Package-EE90-dtsx.txt
test.txt
testUpdated.txt
Avatar of raphil

ASKER

Thanks Pedro.
While this solution may work, I can not change the file.  I guess I could technically make a copy of that file and replace the comma's.  However, I would prefer to just read the first line and determine the delimiter being used in the file and then simply set the ColumnDelimiter for the flat file so that SSIS will parse using that delimiter.  I have written the code for that (below).
The issue I am now having is that I can not add this using clause using RuntimeWrapper = Microsoft.SqlServer.Dts.Runtime.Wrapper to my script b/c for some reason I can not find a reference for the Runtime.wrapper.  I don't know why I can't find it.  I have seen a number of code samples online that all make reference to this file.  :(


ConnectionManager responseFile = Dts.Connections["FlatFile"];
            //RunttimeWrapper.ConnectionManagerFlatFileClass responseFile = Dts.Connections["EcInfo814Response"] as RunttimeWrapper.ConnectionManagerFlatFileClass;
            if (responseFile != null)
            {
                string theFile = responseFile.ConnectionString.ToString();
                if (theFile != string.Empty)
                {
                    FileStream fileStream = File.OpenRead(theFile);
                    StreamReader streamReader = new StreamReader(fileStream);
                    string firstLine = streamReader.ReadLine();
                    if (firstLine != null && firstLine.Trim() != string.Empty)
                    {
                        if (firstLine.Contains("~"))
                            Dts.Variables["varDelimiter"].Value = "~";
                        else
                            Dts.Variables["varDelimiter"].Value = ",";
                        //responseFile.ColumnNamesInFirstDataRow = true;
                        //responseFile.delimiter = Dts.Variables["varDelimiter"].Value.ToString();
                    }
                }
            }
Avatar of raphil

ASKER

BTW,
  I see this file in the GAC.
Assebly Name = Microsoft.SqlServer.DTSRuntimeWrap
Version = 10.0.0.0
Public Key Token = 89845dcd8080cc91
Processor Architecture = x86

I am running this on a 32 bit install.
Just a question... did you test the package I attached here? Run ok?
If true, I will update it and forget RunttimeWrapper.
regards,
Pedro
Avatar of raphil

ASKER

Yes Pedro I ran it and it worked.  Thanks for the sample.
However I can't change the commas to a ~ in a ~ delimited file, as there may be commas in the strings.

I need to be able to parse the files as is.  Do you know have any ideas why I can not access the Runtimewrapper.dll?
Avatar of raphil

ASKER

The truth is that I don't know if I necessarily need to access that assembly.  I just need to figure out how to set the file delimiter and ColumnNamesInFirstDataRow properties for the source file at runtime.
Maybe there is another way to do this.
1." However I can't change the commas to a ~ in a ~ delimited file, as there may be commas in the strings" why not? Only need to customize a little the VBCode

2.  "Do you know have any ideas why I can not access the Runtimewrapper.dll?", but why you insist on this if you are having so many problems?

Regards,
Pedro
You can know which files has "," or "~" and create 2 differents dataflows
OR
You can replace one file that has "~" by "," and has only one dataflow.
Avatar of raphil

ASKER

Changing the data is not an option for me.  I am getting this data from another Vendor and it needs to be parsed as is.
If I have this line of code which I need to get into the db, changing the , to a ~ will bread the script.

Pedro~very smart, and very quick

I don't know which field will have the comma and if it will at all.
Avatar of raphil

ASKER

Maybe I should use the 2 different data flows approach.
I just don't see why I would need to do that.  It just means more changes if anything needs to change.  There just needs to be a way to do this the correct way without making any bandaids.

:(
Avatar of raphil

ASKER

As you can see in the code I wrote above, I am able to determine the delimiter at runtime and save it in a variable.  The question is how can I use that variable at runtime to set the delimiter for the flat file (this delimiter is set in the BIDS GUI and I want to be able to set it at runtime.
If this is not possible then this is a major limitation of SSIS, but I can hardly believe that it is not possible, since I can technically created the whole package at runtime with out using the BIDS IDE, so I believe that there is a way for me to set this at runtime.
of... I can customize a little the package I made to read the dlimiter that exist on each file and store it in a variavle and I execute Dataflow1 (Accept ",") or Dataflow2 (Accept "~") based on that.

You need to add a ForEachLoop container to read each file.
A script task to check the first row in the file and get delimiter.
based on the delimiter and "Expression and Constrint" in the row that links between task and apply the delimiter condition that will decide the dataflow to execute
Avatar of raphil

ASKER

I already have a foreach loop which is looping through the files and storing them in the db.
I have added a script (in an eventhandler for the postexecute event) to read the first line and determine the delimiter and store it in a variable.
I just can't figure out how to set the columnDelimiter to use that variable.
ok... let me think...:-)
just a moment!
Avatar of raphil

ASKER

LOL
Avatar of raphil

ASKER

Something has to be wrong with my environment and I am not sure what it is.  Maybe I need to reinstall SSIS or maybe this dll is not available in the Standard version of SS and only in the enterprise.
I have read many articles online which make reference to this using clause.  But for some reason it is not working for me.  I can't make sense out of it.
just a moment...
You delimited your textfile with "~" ??!
Avatar of raphil

ASKER

Some files are "~" delimited and some files are ",".  I get these files from an outside source.  The only way for me to tell what the delimiter is, is by eyeballing the file.
ok..  the question is... Did you tried to manually condigure the flat file source with "~" delimiter?!
Avatar of raphil

ASKER

I don't understand your question?
ok... I know that your files could be delimited by "," or by "~", correct?

You want to dinamically use the some FlatFile Source for both textfiles sources, correct?

The question is:
"Didi you tried to configure manually the flatfile source with ~ delimiter? I cant...

regards!
Pedro
Avatar of raphil

ASKER

Oh, ok.  I didn't try that.  I will try that now.  One sec.
Avatar of raphil

ASKER

Yes I was able to set it manually.
You need to open the FlatFile click on the Columns tab and type the tilda character.  The combo box allows you to type eventhough it is not in the dropdown list.
Avatar of raphil

ASKER

There is a rowDelimiter property in the properties collection, but there is no column delimiter property which is what I am looking to set.
There is a columns property, but I can't figure out how to loop through the columns.
Avatar of raphil

ASKER

I see that I can set an expression for the RowDelimiter for the flat file in the IDE, but I don't see any place for setting an expression for the column delimiter.  :(
I'mn here again... I'll focus a little bit again...
just a moment
raphil,
I can configure the flatfile to get data using "~" delimiter. The problem is to set it using an expression, because you have the headerRowDelimiter, but not the columnDelimiter to change as expression. So, you have these options:
1. Change the delimiter of the file to only one delimiter (",")
2. Create two different dataflows to import data from files, one for delimiter "," and other for delimiter "~"
3. In spite of using FlatFile Source, use script component as source and use some code to split columns.

What you Choose?
I attached my last progress in the file... but because we cannot set the column delimiter dinamically, it ignores one delimiter. :-(

I can helçp you in all the 3 options you select.
Regards,
Pedro
www.pedrocgd.blogspot.com

Package-EE90b-dtsx.txt
test1.txt
test2.txt
Avatar of raphil

ASKER

Thanks Pedro.
Option 1 is not an option.  We are getting these files from outside vendors and they are not all willing to change to tilda delimited, which is our preference (especially considering that there are commas in some of the string values, which really screws everything up.)

Option2 will mean maintaining 2 separate Dataflows, which I woiuld really prefer not to do, as any time a change is made, it will require changing it in a number of places.

I don't understand what you are suggesting in option 3.  Can you expound a little.

I have written similar code to yours, however I can't either get the delimiter variable to work.  I also tried adding a script component to the dataflow and added the attached code snippet, but I can't get it to work.
ConnectionManagerFlatFileClass responseFile = Connections.ResponseFile as ConnectionManagerFlatFileClass;
        string theFile = Variables.varFileName.ToString();
        if (responseFile != null)
        {
            //theFile = responseFile.ConnectionString.ToString();
            if (theFile != string.Empty)
            {
                FileStream fileStream = File.OpenRead(theFile);
                StreamReader streamReader = new StreamReader(fileStream);
                string firstLine = streamReader.ReadLine();
                if (firstLine != null && firstLine.Trim() != string.Empty)
                {
                    if (firstLine.ToLower().IndexOf("ldc") == 0)  //this is header record
                        Variables.varColumnNamesInFirstDataRow = true;
                    else
                        Variables.varColumnNamesInFirstDataRow = false;
 
                    if (firstLine.Contains("~"))
                        Variables.varDelimiter = "~";
                    else
                        Variables.varDelimiter = ",";
 
                    responseFile.ColumnNamesInFirstDataRow = true;
                    foreach (IDTSConnectionManagerFlatFileColumn100 column in responseFile.Columns)
                        column.ColumnDelimiter = Variables.varDelimiter;
                }
            }
        }

Open in new window

ok.. I will follow option 3.
Confirm me this:
The schema is always teh same for all the files independently of the delimiter be "," or "~", correct?
Avatar of raphil

ASKER

yes, correct.  The only issue I will still face, is 1) some files have a column name header and some strings values have commas in them which causes a nightmare which I guess I will need to deal with separately.
ok... could you send me the schema?
Avatar of raphil

ASKER

We can use any simple schema.  Let's work with this.  Always have 3 string fields.

File 1
Test1,test2,test3
 
File 2
Test1~test2~test3
 
File 3
Column1~Column2~Column3
Test1~test2~test3,4,5
ok... If you prefer, I will use the test files created before...
just a moment
Avatar of raphil

ASKER

Pedro,
  I just figured out how to add a reference to the Microsoft.SqlServer.DTSRuntimeWrap.  This .dll is loaded into the GAC on my machine and I could not navigate to the .dll to add a reference to it.  However, when I created a Script Component, a reference to it was automatically loaded, so I copied the path to the dll for that reference, and I can not add a using clause to use the Runtime.wrapper namespace.

  So I can now add a Script Task to my Control Flow foreach file loop before calling the dataflow, with the below code.  The issue, I am now facing is that the responseFile is always null.  Any ideas how I can get a handle to the connection manager flat file so that I can set it's ColumnDelimiter?
RuntimeWrapper.ConnectionManagerFlatFileClass responseFile = Dts.Connections["ResponseFile"].InnerObject as RuntimeWrapper.ConnectionManagerFlatFileClass;
            if (responseFile != null)
            {
                string theFile = responseFile.ConnectionString.ToString();
                if (theFile != string.Empty)
                {
                    FileStream fileStream = File.OpenRead(theFile);
                    StreamReader streamReader = new StreamReader(fileStream);
                    string firstLine = streamReader.ReadLine();
                    if (firstLine != null && firstLine.Trim() != string.Empty)
                    {
                        if (firstLine.ToLower().IndexOf("ldc") == 0)  //this is header record
                            Dts.Variables["varColumnNamesInFirstDataRow"].Value = true;
                        else
                            Dts.Variables["varColumnNamesInFirstDataRow"].Value = false;
 
                        if (firstLine.Contains("~"))
                            Dts.Variables["varDelimiter"].Value = "~";
                        else
                            Dts.Variables["varDelimiter"].Value = ",";
 
                        responseFile.ColumnNamesInFirstDataRow = true;
                        foreach (RuntimeWrapper.IDTSConnectionManagerFlatFileColumn100 column in responseFile.Columns)
                            column.ColumnDelimiter = Dts.Variables["varDelimiter"].Value.ToString();
                    }
                }
            }

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of PedroCGD
PedroCGD
Flag of Portugal image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of raphil

ASKER

Great!  Looks nice.
I am not to familiar with VB, so to confirm what you did was?
1) In the Control flow add a Script Task to read the filename in a variable
  1a) read the first line and determine the delimiter and store in a variable
2) Add a Script Component to the DataFlow and open the file to read again, and for every line in the file
  2a) Add a new row to an in Memory Buffer
  2b) use the .Net split function to split the line read into an array using the delimiter in the variable set above
  2b) add the columns to the in memory row

Is this correct?  
Does this mean that SSIS will be trying to parse the file before creating the output rows?
Is it possible for you to write this using C#, so that I can get a better handle (if not I understand and I will work on decoding it into C#)?

Thanks Pedro!


It's exactly that!
I'm not familary with C# :-)
take a look on the web, there is a lot of pages that convert code from VB to C#
Cheers!
Pedro
Avatar of raphil

ASKER

Ok, thanks for all your help on this one.

Does this mean that SSIS will be trying to parse the file before creating the output rows?
SSIS will be creating the output row while read each row from source.
I think is the best solution for your case!
I'll be back in some hours.
If this resolve your pronlem, make the question as resolved!
Visit my blog! :-)
Regards,
Pedro
www.pedrocgd.blogspot.com
Avatar of raphil

ASKER

Ok, I am back.
It seems that I need to manually add all of the output columns to the Script Component.
Is there another option?  I need to do this for a large amount of columns, and I will need to create a few different packages and data flows for different file types which will share the same type of structure, but has a different schema, and is loaded into different tables.
Avatar of raphil

ASKER

I have actually just got what I was trying to do, to work.  I was so close, yet I was using one wrong type.
This is what I have done to accomplish what I need.  As is usually the case, once you know how to do this, it is very easy implement.
All I did was add a Script Task to the foreach flat file loop.
I added the below code which reads the first line of the file and determines the delimiter and as well determines if there is a column header record and sets 2 class variables.
It then loops through the columns and sets the delimiter for all of the columns accordingly.  Ensures that the last column on the row has a new line delimiter.
The rest of the data flow is normal.
The trick here for me was that the Microsoft.SqlServer.DTSRuntimeWrap assembly was installed in the GAC and only existed in some GAC directory, which can not be navigated to using the browse window in the add reference.  However, when I added a script componenet, I noticed that SSIS automatically added a reference to this library.  So I copied the path to the dll (which in my case is C:\WINDOWS\assembly\GAC_32\Microsoft.SqlServer.DTSRuntimeWrap\10.0.0.0__89845dcd8080cc91\Microsoft.SqlServer.DTSRuntimeWrap.dll) from the properties of that reference and pasted it into the browse windows text box for the Script Task add reference.  That added the reference to the assembly.
Once I had that reference I just needed to create an instance of the IDTSConnectionManagerFlatFile100 interface using the DTS.Connections innerObject property and casting it to IDTSConnectionManagerFlatFile100.  From there it was quite simple.  I was using the DTSConnectionManagerFlatFile100Class object instead, and that did not work.  With a little help form Ivan Peev from Cozyroc.com, I was able to figure out that this is what I needed to do.  Thanks Ivan!  Check out the SSIS+ Toolkit which they have made.  A number of great add ons.

The code is posted below for the Script Task.  And I have as posted 2 screenshots.  1 of the Control Flow and one of the Data Flow.

Thank you Pedro for all of your help on this and your great response time.  I will be checking your blogsite.
....
 
using Microsoft.SqlServer.Dts;
using RuntimeWrapper = Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.IO;
 
....
 
private string delimiter = ",";
        private bool ColumnNamesInFirstDataRow = false;
 
        private void SetVariables(string file)
        {
            if (file != string.Empty)
            {
                FileStream fileStream = File.OpenRead(file);
                StreamReader streamReader = new StreamReader(fileStream);
                string firstLine = streamReader.ReadLine();
                if (firstLine != null && firstLine.Trim() != string.Empty)
                {
                    if (firstLine.ToLower().IndexOf("thisisacolumnheader") == 0)  //this is header record
                        //Dts.Variables["varColumnNamesInFirstDataRow"].Value = true;
                        ColumnNamesInFirstDataRow = true;
                    else
                        //Dts.Variables["varColumnNamesInFirstDataRow"].Value = false;
                        ColumnNamesInFirstDataRow = false;
                    
                    if (firstLine.Contains("~"))
                        //Dts.Variables["varDelimiter"].Value = "~";
                        delimiter = "~";
                    else
                        //Dts.Variables["varDelimiter"].Value = ",";
                        delimiter = ",";
                }
            }
        }
 
        public void Main()
        {
            try
            {
                RuntimeWrapper.IDTSConnectionManagerFlatFile100 responseFile = Dts.Connections["ResponseFile"].InnerObject as RuntimeWrapper.IDTSConnectionManagerFlatFile100;
                string theFile = Dts.Variables["varFileName"].Value.ToString();
                SetVariables(theFile);
                if (responseFile != null)
                {
                    responseFile.ColumnNamesInFirstDataRow = ColumnNamesInFirstDataRow;
                    int totalColumns = responseFile.Columns.Count;
                    foreach (RuntimeWrapper.IDTSConnectionManagerFlatFileColumn100 column in responseFile.Columns)
                        if (column == responseFile.Columns[totalColumns - 1])
                            column.ColumnDelimiter = "\r\n";
                        else
                            column.ColumnDelimiter = delimiter;
 
                    Dts.TaskResult = (int)ScriptResults.Success;
                }
                else
                    Dts.TaskResult = (int)ScriptResults.Failure;
            }
            catch
            {
                Dts.TaskResult = (int)ScriptResults.Failure;
            }
        }

Open in new window

ControlFlow.bmp
DataFlow.bmp
ok...uufff....:-)
Finished? ehehehe
Pedro
>>I added the below code which reads the first line of the file and determines the delimiter<<
That is exactly what I told you and how we do it in our shop.  I am sorry it took you so long.
Sorry raphil,
I made a hard work with you presenting several solutions to your question, and you close the question?! I think you should respect the people that are here for free, wasting some time helping people like you.
I 'm a little disapointed with your atitude, and is for that reason that experts are leaving the support for some people.
Regards,
Pedro
Avatar of raphil

ASKER

Thanks Pedro.  This is my first time really using Experts-Exchange and I did not realize that you needed to get the points.
I absolutely appreciate what you have done, and will award you all of the points for helping me.
I dont need points, I'm not here for that. But I like to support people resolving their issues and recognize the effort of experts that are here for free. Only that.
Regards,
Pedro
Avatar of raphil

ASKER

No a problem.  I do support you and really do appreciate ALL of your help.  I had mentioned that in my comment and apologize if it hadn't come off across that way.
Although I think that your solution would work it was not exactly what I was looking for or what I had asked in the initial post.  Either way, I don't think that I would have figured this all out without your involvement and quick response time and I do recognize how much time you have given of yourself to help me out. ALL FOR FREE!
Thank you very much and you can be sure, that I will be reading your blogs.  I have already went to your blogspot to check it out.  I will be back and I know that I will be looking for you to help me with further questions.  Is there a way that I can request that you help me with further ETL related questions?
Really Man.  I sincerely apologize.
OK raphil! Friends again! :-)
Be free to contact me for some issue.
Pedro
Avatar of raphil

ASKER

Much Better.

:)