Problem with comma in data itself of csv file

Posted on 2004-04-27
Medium Priority
Last Modified: 2007-12-19
I'm writing a program which read in a csv file and do process of data. I am using VB and Split function with comma with delimiter. The result is incorrect when the data in the column itself contain a comma.

(If I export the csv to txt file, I can see that there will be a double quote added to those field containing comma.)

May I know how do u guys solve this problem, able to ignore the comma in inside the data itself when I do csv processing?
Question by:biao81
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
LVL 86

Accepted Solution

Mike Tomlinson earned 150 total points
ID: 10935559
One possible solution is to check for the presence of double quotes in your input line.  If they are found then, find the next occurence of double quotes after that.  Now search for any commas that exist inbetween the double quotes.  If any are found, then replace them with some kind of unique marker that doesn't contain a comma.  You could use something like <<comma>>.  Now perform your split as usual using the comma as the delimiter.  After the split, you must check each string in the resulting array and change your <<comma>> markers back to an actual comma.  

These operations can be achieved using the Instr() and Replace() functions.


Author Comment

ID: 10935767
This sounds feasible, but what the data itself also contain double quote, what is the senario?
LVL 86

Expert Comment

by:Mike Tomlinson
ID: 10935848
Then you have to walk the entire input line character by character and count each time you encounter double quotes to keep track of whether you are inside a variable or outside a variable.  If inside, change any commas found to something else, otherwise leave them as is.  After making a complete pass through the input string making any necessary replacements, then you can use the split() function and change the commas back in the resulting array.

Could you post a sample input line so we can better help you solve the problem?

I'm going to bed now, however, so I won't be able to take a gander until tomorrow morning.


Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

LVL 10

Expert Comment

ID: 10937173
hi there.. u can use ADO if u wan....

below is the code... simple and reliable...

 cn.Open "Provider=Microsoft.Jet" _
                    & ".OLEDB.4.0;Data Source=" & fpath _
                    & ";Extended Properties='text;HDR=YES;" _
                    & "FMT=Delimited'"
            rsTxt.Open "select * from [" & fName & "#csv]", cn, adOpenStatic, _
                  adLockReadOnly, adCmdText

where fpath is the File Path and Fname name of the CSV file to open...

rest u can understand from the code itself..


LVL 26

Expert Comment

ID: 10938203
Can you split with

arr = split(str, """,")

Author Comment

ID: 10945509
Thanks Idle_Mind, ur first idea give me great help, while a lot more complicated calculation need to be considere to achieve this.
This is because the data itself can actually contain also multiple comma or double quote, anyway all this has been solved too :)

Author Comment

ID: 10945534
Maybe I can post my solution here for those who would like to tackle this in future. By the way, this code only include FROM the part after the csv line is stored into an array strLine() and place back comma and quote part is after the line is being Split.

'In order to deal with the invereted comma in data itself, as it will confuse comma replacement
'Take note that " in the data itself will be shown as "" in text file
'Having done this, there is only one potential error found currently which is when the data itself contain
'only invereted comma, which is not very likely to be happen in the data we have :)
Private Function ReplaceInvertedComma()
    Dim i As Integer
    For i = 0 To UBound(strLine) - 1
        strLine(i) = Replace(strLine(i), """""", quoteString)

End Function

'In order to deal with comma in data itself, replace comma with <<comma>>
Private Function ReplaceComma()

    Dim i As Integer
    Dim invertedCommaPos As Integer
    Dim firstOccur, secondOccur As Integer
    Dim occurCount As Integer
    Dim startPos As Integer
    startPos = 1
    firstOccur = 0
    secondOccur = 0
    invertedCommaPos = 1
    'commaString = "<<comma>>"
    For i = 0 To UBound(strLine) - 1
        invertedCommaPos = 1
        While InStr(invertedCommaPos, strLine(i), """") <> 0
            occurCount = 0
            startPos = 1
            firstOccur = InStr(invertedCommaPos, strLine(i), """")
            secondOccur = InStr(firstOccur + 1, strLine(i), """")
            'To replace multiple comma found inside two inverted comma, keep track of the number of occurance
            While InStr(startPos, Mid(strLine(i), firstOccur, secondOccur - firstOccur + 1), ",") <> 0
                 startPos = InStr(startPos, Mid(strLine(i), firstOccur, secondOccur - firstOccur + 1), ",") + 1
                 occurCount = occurCount + 1
            'Replace all occurance of comma within the inverted comma
            strLine(i) = Mid(strLine(i), 1, firstOccur - 1) & Replace(Mid(strLine(i), firstOccur, secondOccur - firstOccur + 1), ",", commaString) & Mid(strLine(i), secondOccur + 1)
            'Calculate the last found inverted comma position, considering multiple comma found
            invertedCommaPos = secondOccur + (Len(commaString) * occurCount) + 1
End Function

Private Function PlaceCommaBack()
    Dim i As Integer
    For i = LBound(aryData) To UBound(aryData)
        aryData(i) = Replace(aryData(i), commaString, ",")
End Function

Private Function PlaceInvertedCommaBack()
    Dim i As Integer

    For i = LBound(aryData) To UBound(aryData)
        aryData(i) = Replace(aryData(i), quoteString, """")

End Function


Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction While answering a recent question about filtering a custom class collection, I realized that this could be accomplished with very little code by using the ScriptControl (SC) library.  This article will introduce you to the SC library a…
You can of course define an array to hold data that is of a particular type like an array of Strings to hold customer names or an array of Doubles to hold customer sales, but what do you do if you want to coordinate that data? This article describes…
As developers, we are not limited to the functions provided by the VBA language. In addition, we can call the functions that are part of the Windows operating system. These functions are part of the Windows API (Application Programming Interface). U…
Show developers how to use a criteria form to limit the data that appears on an Access report. It is a common requirement that users can specify the criteria for a report at runtime. The easiest way to accomplish this is using a criteria form that a…
Suggested Courses
Course of the Month13 days, 19 hours left to enroll

801 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question