Problem with comma in data itself of csv file

Posted on 2004-04-27
Last Modified: 2007-12-19
I'm writing a program which read in a csv file and do process of data. I am using VB and Split function with comma with delimiter. The result is incorrect when the data in the column itself contain a comma.

(If I export the csv to txt file, I can see that there will be a double quote added to those field containing comma.)

May I know how do u guys solve this problem, able to ignore the comma in inside the data itself when I do csv processing?
Question by:biao81
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
LVL 86

Accepted Solution

Mike Tomlinson earned 50 total points
ID: 10935559
One possible solution is to check for the presence of double quotes in your input line.  If they are found then, find the next occurence of double quotes after that.  Now search for any commas that exist inbetween the double quotes.  If any are found, then replace them with some kind of unique marker that doesn't contain a comma.  You could use something like <<comma>>.  Now perform your split as usual using the comma as the delimiter.  After the split, you must check each string in the resulting array and change your <<comma>> markers back to an actual comma.  

These operations can be achieved using the Instr() and Replace() functions.


Author Comment

ID: 10935767
This sounds feasible, but what the data itself also contain double quote, what is the senario?
LVL 86

Expert Comment

by:Mike Tomlinson
ID: 10935848
Then you have to walk the entire input line character by character and count each time you encounter double quotes to keep track of whether you are inside a variable or outside a variable.  If inside, change any commas found to something else, otherwise leave them as is.  After making a complete pass through the input string making any necessary replacements, then you can use the split() function and change the commas back in the resulting array.

Could you post a sample input line so we can better help you solve the problem?

I'm going to bed now, however, so I won't be able to take a gander until tomorrow morning.

Instantly Create Instructional Tutorials

Contextual Guidance at the moment of need helps your employees adopt to new software or processes instantly. Boost knowledge retention and employee engagement step-by-step with one easy solution.

LVL 10

Expert Comment

ID: 10937173
hi there.. u can use ADO if u wan....

below is the code... simple and reliable...

 cn.Open "Provider=Microsoft.Jet" _
                    & ".OLEDB.4.0;Data Source=" & fpath _
                    & ";Extended Properties='text;HDR=YES;" _
                    & "FMT=Delimited'"
            rsTxt.Open "select * from [" & fName & "#csv]", cn, adOpenStatic, _
                  adLockReadOnly, adCmdText

where fpath is the File Path and Fname name of the CSV file to open...

rest u can understand from the code itself..


LVL 26

Expert Comment

ID: 10938203
Can you split with

arr = split(str, """,")

Author Comment

ID: 10945509
Thanks Idle_Mind, ur first idea give me great help, while a lot more complicated calculation need to be considere to achieve this.
This is because the data itself can actually contain also multiple comma or double quote, anyway all this has been solved too :)

Author Comment

ID: 10945534
Maybe I can post my solution here for those who would like to tackle this in future. By the way, this code only include FROM the part after the csv line is stored into an array strLine() and place back comma and quote part is after the line is being Split.

'In order to deal with the invereted comma in data itself, as it will confuse comma replacement
'Take note that " in the data itself will be shown as "" in text file
'Having done this, there is only one potential error found currently which is when the data itself contain
'only invereted comma, which is not very likely to be happen in the data we have :)
Private Function ReplaceInvertedComma()
    Dim i As Integer
    For i = 0 To UBound(strLine) - 1
        strLine(i) = Replace(strLine(i), """""", quoteString)

End Function

'In order to deal with comma in data itself, replace comma with <<comma>>
Private Function ReplaceComma()

    Dim i As Integer
    Dim invertedCommaPos As Integer
    Dim firstOccur, secondOccur As Integer
    Dim occurCount As Integer
    Dim startPos As Integer
    startPos = 1
    firstOccur = 0
    secondOccur = 0
    invertedCommaPos = 1
    'commaString = "<<comma>>"
    For i = 0 To UBound(strLine) - 1
        invertedCommaPos = 1
        While InStr(invertedCommaPos, strLine(i), """") <> 0
            occurCount = 0
            startPos = 1
            firstOccur = InStr(invertedCommaPos, strLine(i), """")
            secondOccur = InStr(firstOccur + 1, strLine(i), """")
            'To replace multiple comma found inside two inverted comma, keep track of the number of occurance
            While InStr(startPos, Mid(strLine(i), firstOccur, secondOccur - firstOccur + 1), ",") <> 0
                 startPos = InStr(startPos, Mid(strLine(i), firstOccur, secondOccur - firstOccur + 1), ",") + 1
                 occurCount = occurCount + 1
            'Replace all occurance of comma within the inverted comma
            strLine(i) = Mid(strLine(i), 1, firstOccur - 1) & Replace(Mid(strLine(i), firstOccur, secondOccur - firstOccur + 1), ",", commaString) & Mid(strLine(i), secondOccur + 1)
            'Calculate the last found inverted comma position, considering multiple comma found
            invertedCommaPos = secondOccur + (Len(commaString) * occurCount) + 1
End Function

Private Function PlaceCommaBack()
    Dim i As Integer
    For i = LBound(aryData) To UBound(aryData)
        aryData(i) = Replace(aryData(i), commaString, ",")
End Function

Private Function PlaceInvertedCommaBack()
    Dim i As Integer

    For i = LBound(aryData) To UBound(aryData)
        aryData(i) = Replace(aryData(i), quoteString, """")

End Function


Featured Post

Free Tool: Port Scanner

Check which ports are open to the outside world. Helps make sure that your firewall rules are working as intended.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

The debugging module of the VB 6 IDE can be accessed by way of the Debug menu item. That menu item can normally be found in the IDE's main menu line as shown in this picture.   There is also a companion Debug Toolbar that looks like the followin…
If you need to start windows update installation remotely or as a scheduled task you will find this very helpful.
As developers, we are not limited to the functions provided by the VBA language. In addition, we can call the functions that are part of the Windows operating system. These functions are part of the Windows API (Application Programming Interface). U…
Get people started with the process of using Access VBA to control Excel using automation, Microsoft Access can control other applications. An example is the ability to programmatically talk to Excel. Using automation, an Access application can laun…

718 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question