Avatar of Jimbo99999
Jimbo99999
Flag for United States of America asked on

VB.Net - Import CSV File using Split

Good Day Experts!

I am using Split in my Import of CSV file to seperate the string of data by the commas.  My problem is that within the "cell" of data is a comma.  

How do I replace the comma with a space before I do the Split?

Thanks,
jimbo99999
Visual Basic.NET

Avatar of undefined
Last Comment
Jimbo99999

8/22/2022 - Mon
Daniel Kline

"CSV" is "comma separated values"  There is no way to differentiate which commas are cell borders and which a content.  Is it possible to save the file with a different delimiter "~", tilde, is usually pretty safe.
Jimbo99999

ASKER
Ok, here is what I found out.  I am using the following line:

For Each sLine As String In stream.ReadToEnd.Split(Environment.NewLine)

I was trying to then do a Split on a comma.

When I look at sLine in the Immediate window,  it appears there are is a a dobule quote around the cell data that contains a comma...here is a sample:

,Maria Korakidou,"10-12 Nikodemou Milona, Paphos",P.o.x.60040,

So, I guess I cannot use Split. I have to parse apart "manually" somehow.  Or maybe replace the  comma if found between the double quotes then do a Split.

What do you all think how I can approach this?

Thanks,
jimbo99999
Daniel Kline

Take the File back into Excel and save it as a "Text Tab delimited ("*.txt")".

Now, split on a tab character and you should be on your way.
Experts Exchange is like having an extremely knowledgeable team sitting and waiting for your call. Couldn't do my job half as well as I do without it!
James Murphy
SOLUTION
Göran Andersson

Log in or sign up to see answer
Become an EE member today7-DAY FREE TRIAL
Members can start a 7-Day Free trial then enjoy unlimited access to the platform
Sign up - Free for 7 days
or
Learn why we charge membership fees
We get it - no one likes a content blocker. Take one extra minute and find out why we block content.
Not exactly the question you had in mind?
Sign up for an EE membership and get your own personalized solution. With an EE membership, you can ask unlimited troubleshooting, research, or opinion questions.
ask a question
Fernando Soto

Hi jimbo99999;

Visual Basic has a class called TextFieldRarser which can handle the comma within quote. You configure the parser as needed and then read the file line by line. Each line is returned as an array of cells. Please check out the link above.
Jimbo99999

ASKER
Good Day Experts

Goran:

I am unfamiliar with everything including and after .Cast.  The "function" term is causing an error.  

Fernando Soto:

I have tried the TextFieldParser and it looked reall promising.  But I found another issue in the data.  I have this scenario when I interrogate the sLine string:

,8839094,"Wood Gallery Oversized Picture Frame, 25 x 25", Espresso stain",8.90

The ItemDescription is ,"Wood Gallery Oversized Picture Frame, 25 x 25", Espresso stain",    But the TextFieldParser is grabbing  "Wood Gallery Oversized Picture Frame, 25 x 25" as the ItemDescription and Espresso stain" is getting treated as the next field which is ItemWeight.  

Do you see anything else I can do?  This is feeling like the Customer sending the data needs to get it cleaned up.

Thanks,
jimbo99999
Daniel Kline

You could still use Excel to clean it up yourself using my earlier suggestion of saving it as a tab delimitted file.
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Jimbo99999

ASKER
I totally forgot to go back to that.  I tried opening and resaving and then was distracted with all the other User needs.  

I was menaing to ask do I have to open it and resave manually or is there a way to do it programatically?

Thansk,
jimbo9999
ASKER CERTIFIED SOLUTION
Log in to continue reading
Log In
Sign up - Free for 7 days
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Jimbo99999

ASKER
Hello:

It is not fixed width fields.  

I am open formatting modification suggestions as I am just about out of options here.  I just tried saving seperated by Tab and it seprated like 95% of the lines and then there is groups of 5-10 that still have comma seperating the fields!

What is your suggestion?

I am concerned with the issue of the dbl quote within the quoted string:

 ,"Wood Gallery Oversized Picture Frame, 25 x 25", Espresso stain",    

Thanks,
jimbo99999
Fernando Soto

Hi jimbo99999;

Being open to modifying the file if you escape the double quote inside a quoted field like so :

 ,"Wood Gallery Oversized Picture Frame, 25 x 25"", Espresso stain",

Then the TextFieldParser will correctly interpret the double quote and insert a single quote in the field so that in your program the text will be :

 ,"Wood Gallery Oversized Picture Frame, 25 x 25", Espresso stain",
Experts Exchange has (a) saved my job multiple times, (b) saved me hours, days, and even weeks of work, and often (c) makes me look like a superhero! This place is MAGIC!
Walt Forbes
Jimbo99999

ASKER
Hello:

I apologize but I am not quite sure what you are meaning by "escaping" the the double quote".

Does that mean to just put another double quote right after the already present double quote?

Thanks,
jimbo99999
Fernando Soto

Yes, as stated,  ,"Wood Gallery Oversized Picture Frame, 25 x 25"", Espresso stain", note the double quote after the second 25 it has "" and NOT ". That double quote is taken literally and gives it no special meaning.
Jimbo99999

ASKER
Thanks...trying right now.
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Göran Andersson

It seems that you are using framework 2, that doesn't have LINQ methods...

Here is a version of the code that uses a plain loop instead:

Dim matches = Regex.Matches(sLine, "\s*""(?'v'[^""]*)""\s*|\s*(?'v'[^,]+)\s*")
Dim values As New List(Of String)
For Each m As Match In matches
  values.Add(m.Groups("v").Value)
Next

Open in new window

Jimbo99999

ASKER
Good Day Experts!

Ok, with the TextFileParser I came across 52 records that would not load...the rest of the 21,000 were ok and imported into my SQL Table.

I then saved the file as Tab delimited and came across 63 records that would not load...the rest of of the 21,000 were ok and imported into my SQL Table.

Goran: I will try your method next.

Thanks,
jimbo99999
Fernando Soto

Do you know what the issue was with the lines that did not load?
All of life is about relationships, and EE has made a viirtual community a real community. It lifts everyone's boat
William Peck
SOLUTION
Log in to continue reading
Log In
Sign up - Free for 7 days
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Jimbo99999

ASKER
On the CSV attempt, it was like there were special characters making the row height is Excel really large on some of them...never seen this before.  


On the Tab delimited ones, the records still had commas seperating the "cell Values". So for some reason when I opened the CSV and resaved with a Tab delimiter those did not get converted.

In my code here is how I am checking hte end of the line:

Using file As New IO.FileStream(Path, FileMode.Open, FileAccess.Read)
           Using streamDup As New IO.StreamReader(file)
                     For Each sLine As String In streamDup.ReadToEnd.Split(Environment.NewLine)
                                Dim trimChars As Char() = {Chr(10), Chr(13)}
                                sLine = sLine.Trim(trimChars)
                      Next
            End Using
End Using

Then again that just is taking the Char off the end.  Is that ok or should I be doing it a different way?

Thanks,
jimbo99999
Daniel Kline

The CSV special character were probably Control Characters (non printing).  They indicate careless entry.  For the Tab Delimited I'm confused.  I can't recreate the issue.

Is there sensitive information in the file, or can you upload a sample of it.  I'd like to see what the data looks like.  I've done these conversions many times and it should be very easy.  If you can get a good sampling of the error rows, that would be helpful.
Jimbo99999

ASKER
Hello:

It is a bit sensitive. I am pressing on and trying again here.  I did notice that when I open the CSV file to save as Tab delimited text file another issue has popped up that I did not notice.  2.0016E+15 is getting saved to the database instead of 2001601005905940.  I have tried opening Excel and setting the columns to Text before opening the file but that does not work.  

If I just run the CSV file without opening to re-save as Tab delimeted and use TextFilePArser to split on  comma 2001601005905940 gets saved to the database.  

Is it possible to save as Tab delimited Text without opening CSV?

Thanks,
jimbo99999
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Daniel Kline

So, if I understand you, you are opening Excel, importing the data using the Data Menu.  Try Setting the column type as number with 0 decimal points.  It works for me.  You can see the text in the screen shot.  When you view your "txt" file does it look like a string of numeric digits or in scientific notation.

If it's a numeric digit string, make sure you are casting to a string in TextFieldParser.  It could be that VB in interpreting the value as numeric.
DecZeroExample.png
Göran Andersson

You said that you would try my method. Any progress on that?
Jimbo99999

ASKER
Good Day Everyone

Goran:
 I have tried your suggestion but am getting an error. The values in values.Add is causing this error:

"Object reference not set to an instance of an object"

Do I need to add something else to make this work?

Daniel:
I am opening the CSV just to save it as tab delimited text file.  This is causing the 2001601005905940 to be saved 2.0016E+15.  So, when I open the Tab delimited text file, 2.0016E+15 is in there not 2001601005905940. Then when I read the file 2.0016E+15 is retrieved and subsequently saved to the db.  If I never open the CSV file and use the following code the data is saved properly to the db:

                                        Using rdr As New StringReader(sLine)
                                            Using parser As New TextFieldParser(rdr)
                                                parser.TextFieldType = FieldType.Delimited
                                                parser.Delimiters = New String() {","}
                                                parser.HasFieldsEnclosedInQuotes = True
                                                sRecords = parser.ReadFields
                                            End Using
                                        End Using

Thanks,
jimbo99999
I started with Experts Exchange in 2004 and it's been a mainstay of my professional computing life since. It helped me launch a career as a programmer / Oracle data analyst
William Peck
Göran Andersson

If the variable 'values' causes that error, then there is something missing from the code. The variable is assigned a list when it is declared.

The only other way that I managed to repeat the expection is it the variable 'sLine' is null.
Daniel Kline

Hi jimbo99999,

It is indeed strange that excel is working differently for you.  I need to chew on this one and see if there is another approach.
Daniel Kline

In the interim check out this approach to stripping special characters out of the original file.  It may be that embedded control codes are raising hell with excel and your code.  http://www.freevbcode.com/ShowCode.asp?ID=31
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Jimbo99999

ASKER
Thanks for the efforts.  I am back to this project for the week.  Reports are that the Customer will not be approached to do any kind of data cleanup prior to sending it to us.  So, I need to find the best way to handle this.  

I will keep trying all the approaches everyone has offered.

Thanks for your continued assistance,
jimbo99999
Jimbo99999

ASKER
Good Day
While I have not found a programmatic solution to this issue, I am going to go ahead and close .  All three of you have provided excellent information that I was not aware  of and will keep it in my knowledge base.

Thanks again,jimbo99999