We help IT Professionals succeed at work.

Problem Importing Flat File delimited with Commas using SQL Server Import Wizard

294 Views
Last Modified: 2017-03-14
I have a text file I'm trying to import that contains 14 million records, but it's delimited with commas and there is one field that also contains commas, and there is no text qualifer to denote that the field has multiple values separated by commas.  Does anyone know a way import with this scenario, or to correct delimited file for import?

The import is supposed to look like this:

RecordID   Status          Role           Category
-------------    --------    ------------------   -------------
12342          Active  CLT, ACT, DIS    Yellow
45544          Active  Act, CLT, DOS    Orange
45465          Pend    CLT, ACT, DES   Green

However, it actually imports like this.

RecordID   Status          Role           Category
-------------    --------    ------------------   -------------
12342          Active  CLT                   ACT
45544          Active  Act                    CLT
45465          Pend    CLT                  ACT
Comment
Watch Question

Nitin SontakkeDeveloper
CERTIFIED EXPERT

Commented:
Not sure how exactly you will achieve it but simple answer is to enclose the contents of Role in double quotes as "CLT, ACT, DIS".

Just as I was typing this, while it is hugely unlikely, does this column always has 3 values? If yes, it is simpler then.

Should you need any assistance putting those double quotes with the help of tiny C# program, please do let me know.
prodempseyBI Director
CERTIFIED EXPERT
Top Expert 2016

Author

Commented:
Hi Nitin,
The Role column doesn't always have 3 values.  Sometimes there is one two or no values.   I'm open to using C# to insert the double quotes entirely around the Role values, and I could definitely use some assistance as my C# skills are limited.
Nitin SontakkeDeveloper
CERTIFIED EXPERT

Commented:
To the best of my knowledge, putting double quotes around is the best (and simplest) way out.

Paste few lines (10-15 maybe, if more attach) here and will be happy to paste the code snippet here.
Nitin SontakkeDeveloper
CERTIFIED EXPERT

Commented:
While off the topic and despite of no knowledge of context, here is a piece of unasked advice, you are better off putting Role values in a separate table (may be after import!) from strict relational point of view.
prodempseyBI Director
CERTIFIED EXPERT
Top Expert 2016

Author

Commented:
Is there a way to skip importing of the Role field?  I believe I tried this, but although most values were null, it still read the values with commas as a new column.
Nitin SontakkeDeveloper
CERTIFIED EXPERT

Commented:
The trouble is other way around. The import code is not been able to understand WHERE Role column starts and ends. This is a pre-condition to ignore it.
Nitin SontakkeDeveloper
CERTIFIED EXPERT

Commented:
On a lighter side, I can now ignore any more of your comments on this thread, after I have participated on thread. Me ignoring the comments on any thread I am not participating is (i don't know the appropriate word)!
Ryan ChongSoftware Tead Lead / Business Analyst / System Analyst / Data Engineer / Applications Consultant
CERTIFIED EXPERT

Commented:
so you need to either insert the proper text qualifer or use a unique delimiter in your source file.

Alternatively, try to re-engineer at the source base, so that the output can be properly formatted, if that's feasible.
prodempseyBI Director
CERTIFIED EXPERT
Top Expert 2016

Author

Commented:
Nitin,
Attached is a text file using 3 lines of sample data.  This is the same number of columns, and the format of my data is very similar.  Do you think you could use this to write a code snippet to wrap quotes around the Role values?
sample-data-03_09.txt
prodempseyBI Director
CERTIFIED EXPERT
Top Expert 2016

Author

Commented:
Ryan,
I agree a unique delimiter should have been used in the source file, something like ^ instead of commas, but I'm stuck with what I have.
Ryan ChongSoftware Tead Lead / Business Analyst / System Analyst / Data Engineer / Applications Consultant
CERTIFIED EXPERT

Commented:
and... do you have a list of values for column Role?
CLT, ACT, DIS ...

Open in new window

if the answer is yes, that would be good so that we can either insert the text qualifer or use a unique delimiter in your source file.

depends on how huge is your source file, it could be a painful job to manipulate it to the format that SSIS can recognised.
Nitin SontakkeDeveloper
CERTIFIED EXPERT

Commented:
Even programatically which is the best way to identify the end of role column? Because this condition will be hard coded. For example, what would be the ALL possible values of the next column? Or what would be the ALL possible values of Role column? So any other value indicates next column.
prodempseyBI Director
CERTIFIED EXPERT
Top Expert 2016

Author

Commented:
The values for the Role field can only be the following:

NULL
                                                                 -Intentially left blank
AGT
AGT, CLT
AGT, CLT, CTL, ISS
AGT, CLT, ISS
AGT, CLT, ISS, CTL
AGT, CLT, ISS, EXCH
AGT, CTL
AGT, CTL, CLT, ISS
AGT, ISS, CLT
BROKDEAL
BROKDEAL, CLT
Client Role
Dual Role
Non Client Role
prodempseyBI Director
CERTIFIED EXPERT
Top Expert 2016

Author

Commented:
All possible values of the column before Role is:

Active
Deactivated
Merged
Prospective

All possible values of the column after Role is:

NULL
1.1. Active Rem
1.1. Rem - Active Rem
1.3.4. Become Inactive
Nitin SontakkeDeveloper
CERTIFIED EXPERT

Commented:
Sample given by you also have values such as THR, Dual Client, Client Active. Do they belong to next column?

If yes, we would expect one more comma indicating a empty column.

I hope empty column does have two consecutive commas.
Ryan ChongSoftware Tead Lead / Business Analyst / System Analyst / Data Engineer / Applications Consultant
CERTIFIED EXPERT

Commented:
@prodempsey

your sample data file shows that:

2017-02-04 00:00:00,1252,DIB,Active,CLT, DLT, THR,1. Transact,L,1. Transact,1. Transact,1. Transact,H,Jan 2016,Pend,Pending,1. Transact,1. Transact

NULL
AGT
AGT, CLT
AGT, CLT, CTL, ISS
AGT, CLT, ISS
AGT, CLT, ISS, CTL
AGT, CLT, ISS, EXCH
AGT, CTL
AGT, CTL, CLT, ISS
AGT, ISS, CLT
BROKDEAL
BROKDEAL, CLT
Client Role
Dual Role
Non Client Role 

Open in new window


but this value is not in your Role list?

can you clarify?
Nitin SontakkeDeveloper
CERTIFIED EXPERT

Commented:
Furthermore, all possible values of NEXT column missing 1. Transact, which is there in sample data. Will next column always have "1."? Just trying to establish a pattern, which is a must for identification of end of previous column.
John TsioumprisSoftware & Systems Engineer
CERTIFIED EXPERT
Distinguished Expert 2019

Commented:
I wanted to ask if the text file is "consistent"..regardless of the multiple Roles it has the same amount of Columns even if they are empty or there are cases that it has missing data ...
If it is consistent then probably you could get away with some comma counting ....
From Start of line to Roles how many commas : I count 3
From End of line to Roles how many commas : i count 9
Then you will start extracting the data from Start up to Roles...then from the End to Roles ...Process the Roles and output the final data.
Ryan ChongSoftware Tead Lead / Business Analyst / System Analyst / Data Engineer / Applications Consultant
CERTIFIED EXPERT

Commented:
@prodempsey

you may do some test and see if this is helpful.

private Boolean UpdateFileContent(String SourceFile, String DestinationFile, List<String> ReplaceList)
        {
            try
            {
                //Proper handling to check if source file exists

                List<String> tmpList = (from line in System.IO.File.ReadAllLines(SourceFile)
                                           .Where(l => !String.IsNullOrWhiteSpace(l))
                                        select line).ToList<String>();
                for(int i = 0; i < tmpList.Count; i++)
                {
                    foreach (String t in ReplaceList)
                    {
                        if (tmpList[i].IndexOf(t) > 0)
                        {
                            tmpList[i] = tmpList[i].Replace(t, @"""" + t + @"""");
                            break;
                        }
                    }
                }
                //Recompile data to target file
                using (var stream = System.IO.File.OpenWrite(DestinationFile))
                {
                    using (var writer = new System.IO.StreamWriter(stream))
                    {
                        foreach (string line in tmpList)
                        {
                            writer.WriteLine(line);
                        }
                    }
                }

                //Proper handling to check if destination file exists, yes to delete or exit?
                return true;
            }
            catch (Exception)
            {
                return false;
            }
        }

Open in new window

use it like:

List<String> replaceList = new List<string>();
            replaceList.Add("AGT, CLT");
            replaceList.Add("AGT, CLT, CTL, ISS");
            replaceList.Add("AGT, CLT, ISS");
            replaceList.Add("AGT, CLT, ISS, CTL");
            replaceList.Add("AGT, CLT, ISS, EXCH");
            replaceList.Add("AGT, CTL");
            replaceList.Add("AGT, CTL, CLT, ISS");
            replaceList.Add("AGT, ISS, CLT");
            replaceList.Add("CLT, DLT, THR");
            
            if (UpdateFileContent(@"D:\yourPath\sample-data-03_09.txt", @"D:\yourPath\sample-data-03_09b.txt", replaceList))
            {
                //...
            }

Open in new window


this program will compare with the Role list you had provided and adding the text qualifier.

customization may be needed.
Developer
CERTIFIED EXPERT
Commented:
Unlock this solution and get a sample of our free trial.
(No credit card required)
UNLOCK SOLUTION
John TsioumprisSoftware & Systems Engineer
CERTIFIED EXPERT
Distinguished Expert 2019

Commented:
This is a quick and dirty little method based on the comma counting assumption (just count the necessary commas)
		private static IEnumerable<string> deflateString(string inputLine)
		{
			List<string> startToRoles = new  List<string>();
			List<string> EndToRoles = new  List<string>();
			int i;
			var arr = inputLine.Split(',');
			for( i = 0;i<3;i++)
				startToRoles.Add(arr[i]);
			for(i=arr.Length-1;i>0;i--){
				if(i==9)
					break;
				EndToRoles.Add(arr[i]);
			}
			EndToRoles.Reverse();

			return startToRoles.Union(EndToRoles);
			
		}

Open in new window

Sharath SData Engineer
CERTIFIED EXPERT

Commented:
prodempseyBI Director
CERTIFIED EXPERT
Top Expert 2016

Author

Commented:
Nitin, should I save the text file to the bin folder in the project?
Nitin SontakkeDeveloper
CERTIFIED EXPERT

Commented:
I guess, if you provide FULL path as an input then it should work...for example D:\foldername\someotherfolder\abcd.txt

It's console application. Compile and execute in separate command window.

If it suites you, feel free to hard coded values and an appropriate places.
Nitin SontakkeDeveloper
CERTIFIED EXPERT

Commented:
FYI: I will soon be leaving for home from office and may not access this for about one hour+.
CERTIFIED EXPERT

Commented:
The only way to workaround this is to use a script that will read each line from the source file and count the commas. A line that has only 1 value in the Role field will have a total of, let's say, 3 comas (for the example with 4 fields). All the rows that have 3 commas you will import and the ones that have a different number, more or less, you will write into another file called exceptions.csv. After you imported the "regular" lines you will take care of the exception lines by adding the delimitation quotes where needed. Hopefully they will be not too many to the point it will be not manageable. There is no other way to do this.

You can write a script in VBScript, Python, Perl, JavaScript or whatever language you prefer.
prodempseyBI Director
CERTIFIED EXPERT
Top Expert 2016

Author

Commented:
Nitin,
Do you think it's possible to run this script from a Script Task in SSIS?
prodempseyBI Director
CERTIFIED EXPERT
Top Expert 2016

Author

Commented:
Zberteoc, I don't think counting the commas could work, because the amount of values with commas in that field varies, and there are 14 million rows, so it would be unmanageable.
Nitin SontakkeDeveloper
CERTIFIED EXPERT

Commented:
Not sure, to be honest. But, if I may ask, why? Personally, I believe that it would to lot quicker, to run the program once, have everything sorted out at source and import.

Here is a note on @Zberteoc's comment: All he meant was out of your 14mn rows what if trouble rows are just about 3-4mn? If so, process normal rows (80-90%) first, by putting them in a separate file and then put commas and process (10-20%) rows later. Smaller file to put commas in it.

Do you want a piece of code, counting commas and tell normal and abnormal rows count?
CERTIFIED EXPERT

Commented:
At contrary, counting the commas is THE ONLY way to deal with this in order to differentiate between the "regular" lines, with only one value in the Role field that can be imported with no issues, and the ones with multiple values in the Role field, which will break your import or do it in a faulty way! How else will you do it? You have to IDENTIFY the "broken" lines and that can be done ONLY by counting the commas!

Let's exemplify. Lets say you have only 4 fields as in your sample:

RecordID   Status          Role           Category
-------------    --------    ------------------   -------------
12342          Active  CLT, ACT, DIS    Yellow              -- "bad" line
45544          Active  Act, CLT, DOS    Orange            -- "bad" line
45465          Pend    CLT, ACT, DES   Green            -- "bad" line
5678           Pend     CLT                     Blue            -- "good" line
5678           Pend     DIS                    Red            -- "good" line
5678           Pend     ACT                     Purple            -- "good" line

The good lines will all have only 3 commas(between any 2 felds)  while the bad lines will have any number not equal to 3! You will have to build a script or a piece of code that will check each line and if comma_count=3 import else write the line to a "exception.csv" file! After the good lines are imported you will deal wit the exceptions.csv file. The only problem I see here is if the"bad" lines are frequent within the file in which case you will have to contact your source ad let them know to export a proper CSV file with enclosed fields if they have commas!

Any flat file import process should be build anyway with the possibility of errors lines in mind lines and to capture them in a sort of log file.
prodempseyBI Director
CERTIFIED EXPERT
Top Expert 2016

Author

Commented:
My version of Visual Studio only contains the Business Intelligence templates, so I can't use the C# template.   Is there a way to do this in VBA?
CERTIFIED EXPERT

Commented:
C# is fine.
Nitin SontakkeDeveloper
CERTIFIED EXPERT

Commented:
I am happy to supply with .exe as well, all you have to do is trust me! No sure if it is acceptable to attach .zip containing a .exe file or a .exe itself.
John TsioumprisSoftware & Systems Engineer
CERTIFIED EXPERT
Distinguished Expert 2019

Commented:
VBA would be more than fine...just read the text line by line and applying splitting...
prodempseyBI Director
CERTIFIED EXPERT
Top Expert 2016

Author

Commented:
I found a way to download Visual Studio at work with the C# template, I'm testing now.
prodempseyBI Director
CERTIFIED EXPERT
Top Expert 2016

Author

Commented:
Nitin, your script runs, but I'm not seeing any changes on the existing file that I choose.  Should it wrap all values in the Role column with quotation marks?
Nitin SontakkeDeveloper
CERTIFIED EXPERT

Commented:
I am not modifying the original file. Are you seeing another file with .out extension added to your original filename? For example, your original file name abcd.txt, there should be another file abcd.txt.out.

Always believed in commandment : Thou shall never contaminate the source!
Nitin SontakkeDeveloper
CERTIFIED EXPERT

Commented:
Furthermore, should you see my code failed in enclosing the values in double quotes as expected, check the value of the next column. It might be one not listed in expected values of next column.

I have placed a comment in code where you should be adding those values enclosed within pipe | character. Add that value there and re-run the code. You may have to delete the .out file created from earlier run.
prodempseyBI Director
CERTIFIED EXPERT
Top Expert 2016

Author

Commented:
Nitin,
I think your script is working.  I can see it's adding quotation marks around the values in the Role column.  I'm running through all 14 million rows now, then I'll try to import it.  If I run into any more issues on the import I'll raise a separate question.  

Thanks to everyone for your help!  I really appreciate it.

Jeremy
prodempseyBI Director
CERTIFIED EXPERT
Top Expert 2016

Author

Commented:
Thanks again!
prodempseyBI Director
CERTIFIED EXPERT
Top Expert 2016

Author

Commented:
I'm having an issue importing the file into SQL Server after running the txt file through Nitin's script.  For only rows where the Role is blank, it's throwing off the commas for the other commas.  Does anyone know what could be causing that to happen?
Nitin SontakkeDeveloper
CERTIFIED EXPERT

Commented:
Please post the set of sample lines. If there is pattern, it can definitely be handled. Even better if you can identify the pattern as you are familiar with the data.

I hope empty column does have two consecutive commas.
is what I had mentioned in my post earlier.

Gain unlimited access to on-demand training courses with an Experts Exchange subscription.

Get Access
Why Experts Exchange?

Experts Exchange always has the answer, or at the least points me in the correct direction! It is like having another employee that is extremely experienced.

Jim Murphy
Programmer at Smart IT Solutions

When asked, what has been your best career decision?

Deciding to stick with EE.

Mohamed Asif
Technical Department Head

Being involved with EE helped me to grow personally and professionally.

Carl Webster
CTP, Sr Infrastructure Consultant
Empower Your Career
Did You Know?

We've partnered with two important charities to provide clean water and computer science education to those who need it most. READ MORE

Ask ANY Question

Connect with Certified Experts to gain insight and support on specific technology challenges including:

  • Troubleshooting
  • Research
  • Professional Opinions
Unlock the solution to this question.
Thanks for using Experts Exchange.

Please provide your email to receive a sample view!

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.