Solved

Problem with comma in data itself of csv file

Posted on 2004-04-27
7
2,838 Views
Last Modified: 2007-12-19
I'm writing a program which read in a csv file and do process of data. I am using VB and Split function with comma with delimiter. The result is incorrect when the data in the column itself contain a comma.

(If I export the csv to txt file, I can see that there will be a double quote added to those field containing comma.)

May I know how do u guys solve this problem, able to ignore the comma in inside the data itself when I do csv processing?
0
Comment
Question by:biao81
7 Comments
 
LVL 85

Accepted Solution

by:
Mike Tomlinson earned 50 total points
ID: 10935559
One possible solution is to check for the presence of double quotes in your input line.  If they are found then, find the next occurence of double quotes after that.  Now search for any commas that exist inbetween the double quotes.  If any are found, then replace them with some kind of unique marker that doesn't contain a comma.  You could use something like <<comma>>.  Now perform your split as usual using the comma as the delimiter.  After the split, you must check each string in the resulting array and change your <<comma>> markers back to an actual comma.  

These operations can be achieved using the Instr() and Replace() functions.

Idle_Mind
0
 

Author Comment

by:biao81
ID: 10935767
This sounds feasible, but what the data itself also contain double quote, what is the senario?
0
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 10935848
Then you have to walk the entire input line character by character and count each time you encounter double quotes to keep track of whether you are inside a variable or outside a variable.  If inside, change any commas found to something else, otherwise leave them as is.  After making a complete pass through the input string making any necessary replacements, then you can use the split() function and change the commas back in the resulting array.

Could you post a sample input line so we can better help you solve the problem?

I'm going to bed now, however, so I won't be able to take a gander until tomorrow morning.

Idle_Mind
0
Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

 
LVL 10

Expert Comment

by:anv
ID: 10937173
hi there.. u can use ADO if u wan....

below is the code... simple and reliable...

 cn.Open "Provider=Microsoft.Jet" _
                    & ".OLEDB.4.0;Data Source=" & fpath _
                    & ";Extended Properties='text;HDR=YES;" _
                    & "FMT=Delimited'"
                   
            rsTxt.Open "select * from [" & fName & "#csv]", cn, adOpenStatic, _
                  adLockReadOnly, adCmdText

where fpath is the File Path and Fname name of the CSV file to open...

rest u can understand from the code itself..

regards

anv
0
 
LVL 26

Expert Comment

by:EDDYKT
ID: 10938203
Can you split with

arr = split(str, """,")
0
 

Author Comment

by:biao81
ID: 10945509
Thanks Idle_Mind, ur first idea give me great help, while a lot more complicated calculation need to be considere to achieve this.
This is because the data itself can actually contain also multiple comma or double quote, anyway all this has been solved too :)
0
 

Author Comment

by:biao81
ID: 10945534
Maybe I can post my solution here for those who would like to tackle this in future. By the way, this code only include FROM the part after the csv line is stored into an array strLine() and place back comma and quote part is after the line is being Split.



'In order to deal with the invereted comma in data itself, as it will confuse comma replacement
'Take note that " in the data itself will be shown as "" in text file
'Having done this, there is only one potential error found currently which is when the data itself contain
'only invereted comma, which is not very likely to be happen in the data we have :)
Private Function ReplaceInvertedComma()
    Dim i As Integer
   
    For i = 0 To UBound(strLine) - 1
        strLine(i) = Replace(strLine(i), """""", quoteString)
    Next

End Function

'In order to deal with comma in data itself, replace comma with <<comma>>
Private Function ReplaceComma()

    Dim i As Integer
    Dim invertedCommaPos As Integer
    Dim firstOccur, secondOccur As Integer
   
    Dim occurCount As Integer
    Dim startPos As Integer
    startPos = 1
   
    firstOccur = 0
    secondOccur = 0
    invertedCommaPos = 1
   
    'commaString = "<<comma>>"
   
    For i = 0 To UBound(strLine) - 1
        invertedCommaPos = 1
        While InStr(invertedCommaPos, strLine(i), """") <> 0
            occurCount = 0
            startPos = 1
           
           
            firstOccur = InStr(invertedCommaPos, strLine(i), """")
           
            secondOccur = InStr(firstOccur + 1, strLine(i), """")
         
           
            'To replace multiple comma found inside two inverted comma, keep track of the number of occurance
            While InStr(startPos, Mid(strLine(i), firstOccur, secondOccur - firstOccur + 1), ",") <> 0
                 startPos = InStr(startPos, Mid(strLine(i), firstOccur, secondOccur - firstOccur + 1), ",") + 1
                 occurCount = occurCount + 1
            Wend
           
            'Replace all occurance of comma within the inverted comma
            strLine(i) = Mid(strLine(i), 1, firstOccur - 1) & Replace(Mid(strLine(i), firstOccur, secondOccur - firstOccur + 1), ",", commaString) & Mid(strLine(i), secondOccur + 1)
                         
                           
            'Calculate the last found inverted comma position, considering multiple comma found
            invertedCommaPos = secondOccur + (Len(commaString) * occurCount) + 1
        Wend
    Next
End Function

Private Function PlaceCommaBack()
    Dim i As Integer
   
    For i = LBound(aryData) To UBound(aryData)
        aryData(i) = Replace(aryData(i), commaString, ",")
    Next
   
End Function

Private Function PlaceInvertedCommaBack()
    Dim i As Integer

    For i = LBound(aryData) To UBound(aryData)
        aryData(i) = Replace(aryData(i), quoteString, """")
    Next

End Function

0

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

When designing a form there are several BorderStyles to choose from, all of which can be classified as either 'Fixed' or 'Sizable' and I'd guess that 'Fixed Single' or one of the other fixed types is the most popular choice. I assume it's the most p…
Enums (shorthand for ‘enumerations’) are not often used by programmers but they can be quite valuable when they are.  What are they? An Enum is just a type of variable like a string or an Integer, but in this case one that you create that contains…
Show developers how to use a criteria form to limit the data that appears on an Access report. It is a common requirement that users can specify the criteria for a report at runtime. The easiest way to accomplish this is using a criteria form that a…
This lesson covers basic error handling code in Microsoft Excel using VBA. This is the first lesson in a 3-part series that uses code to loop through an Excel spreadsheet in VBA and then fix errors, taking advantage of error handling code. This l…

809 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question