Solved

Problem with comma in data itself of csv file

Posted on 2004-04-27
7
2,833 Views
Last Modified: 2007-12-19
I'm writing a program which read in a csv file and do process of data. I am using VB and Split function with comma with delimiter. The result is incorrect when the data in the column itself contain a comma.

(If I export the csv to txt file, I can see that there will be a double quote added to those field containing comma.)

May I know how do u guys solve this problem, able to ignore the comma in inside the data itself when I do csv processing?
0
Comment
Question by:biao81
7 Comments
 
LVL 85

Accepted Solution

by:
Mike Tomlinson earned 50 total points
ID: 10935559
One possible solution is to check for the presence of double quotes in your input line.  If they are found then, find the next occurence of double quotes after that.  Now search for any commas that exist inbetween the double quotes.  If any are found, then replace them with some kind of unique marker that doesn't contain a comma.  You could use something like <<comma>>.  Now perform your split as usual using the comma as the delimiter.  After the split, you must check each string in the resulting array and change your <<comma>> markers back to an actual comma.  

These operations can be achieved using the Instr() and Replace() functions.

Idle_Mind
0
 

Author Comment

by:biao81
ID: 10935767
This sounds feasible, but what the data itself also contain double quote, what is the senario?
0
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 10935848
Then you have to walk the entire input line character by character and count each time you encounter double quotes to keep track of whether you are inside a variable or outside a variable.  If inside, change any commas found to something else, otherwise leave them as is.  After making a complete pass through the input string making any necessary replacements, then you can use the split() function and change the commas back in the resulting array.

Could you post a sample input line so we can better help you solve the problem?

I'm going to bed now, however, so I won't be able to take a gander until tomorrow morning.

Idle_Mind
0
DevOps Toolchain Recommendations

Read this Gartner Research Note and discover how your IT organization can automate and optimize DevOps processes using a toolchain architecture.

 
LVL 10

Expert Comment

by:anv
ID: 10937173
hi there.. u can use ADO if u wan....

below is the code... simple and reliable...

 cn.Open "Provider=Microsoft.Jet" _
                    & ".OLEDB.4.0;Data Source=" & fpath _
                    & ";Extended Properties='text;HDR=YES;" _
                    & "FMT=Delimited'"
                   
            rsTxt.Open "select * from [" & fName & "#csv]", cn, adOpenStatic, _
                  adLockReadOnly, adCmdText

where fpath is the File Path and Fname name of the CSV file to open...

rest u can understand from the code itself..

regards

anv
0
 
LVL 26

Expert Comment

by:EDDYKT
ID: 10938203
Can you split with

arr = split(str, """,")
0
 

Author Comment

by:biao81
ID: 10945509
Thanks Idle_Mind, ur first idea give me great help, while a lot more complicated calculation need to be considere to achieve this.
This is because the data itself can actually contain also multiple comma or double quote, anyway all this has been solved too :)
0
 

Author Comment

by:biao81
ID: 10945534
Maybe I can post my solution here for those who would like to tackle this in future. By the way, this code only include FROM the part after the csv line is stored into an array strLine() and place back comma and quote part is after the line is being Split.



'In order to deal with the invereted comma in data itself, as it will confuse comma replacement
'Take note that " in the data itself will be shown as "" in text file
'Having done this, there is only one potential error found currently which is when the data itself contain
'only invereted comma, which is not very likely to be happen in the data we have :)
Private Function ReplaceInvertedComma()
    Dim i As Integer
   
    For i = 0 To UBound(strLine) - 1
        strLine(i) = Replace(strLine(i), """""", quoteString)
    Next

End Function

'In order to deal with comma in data itself, replace comma with <<comma>>
Private Function ReplaceComma()

    Dim i As Integer
    Dim invertedCommaPos As Integer
    Dim firstOccur, secondOccur As Integer
   
    Dim occurCount As Integer
    Dim startPos As Integer
    startPos = 1
   
    firstOccur = 0
    secondOccur = 0
    invertedCommaPos = 1
   
    'commaString = "<<comma>>"
   
    For i = 0 To UBound(strLine) - 1
        invertedCommaPos = 1
        While InStr(invertedCommaPos, strLine(i), """") <> 0
            occurCount = 0
            startPos = 1
           
           
            firstOccur = InStr(invertedCommaPos, strLine(i), """")
           
            secondOccur = InStr(firstOccur + 1, strLine(i), """")
         
           
            'To replace multiple comma found inside two inverted comma, keep track of the number of occurance
            While InStr(startPos, Mid(strLine(i), firstOccur, secondOccur - firstOccur + 1), ",") <> 0
                 startPos = InStr(startPos, Mid(strLine(i), firstOccur, secondOccur - firstOccur + 1), ",") + 1
                 occurCount = occurCount + 1
            Wend
           
            'Replace all occurance of comma within the inverted comma
            strLine(i) = Mid(strLine(i), 1, firstOccur - 1) & Replace(Mid(strLine(i), firstOccur, secondOccur - firstOccur + 1), ",", commaString) & Mid(strLine(i), secondOccur + 1)
                         
                           
            'Calculate the last found inverted comma position, considering multiple comma found
            invertedCommaPos = secondOccur + (Len(commaString) * occurCount) + 1
        Wend
    Next
End Function

Private Function PlaceCommaBack()
    Dim i As Integer
   
    For i = LBound(aryData) To UBound(aryData)
        aryData(i) = Replace(aryData(i), commaString, ",")
    Next
   
End Function

Private Function PlaceInvertedCommaBack()
    Dim i As Integer

    For i = LBound(aryData) To UBound(aryData)
        aryData(i) = Replace(aryData(i), quoteString, """")
    Next

End Function

0

Featured Post

Optimizing Cloud Backup for Low Bandwidth

With cloud storage prices going down a growing number of SMBs start to use it for backup storage. Unfortunately, business data volume rarely fits the average Internet speed. This article provides an overview of main Internet speed challenges and reveals backup best practices.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

When trying to find the cause of a problem in VBA or VB6 it's often valuable to know what procedures were executed prior to the error. You can use the Call Stack for that but it is often inadequate because it may show procedures you aren't intereste…
If you have ever used Microsoft Word then you know that it has a good spell checker and it may have occurred to you that the ability to check spelling might be a nice piece of functionality to add to certain applications of yours. Well the code that…
As developers, we are not limited to the functions provided by the VBA language. In addition, we can call the functions that are part of the Windows operating system. These functions are part of the Windows API (Application Programming Interface). U…
Get people started with the process of using Access VBA to control Outlook using automation, Microsoft Access can control other applications. An example is the ability to programmatically talk to Microsoft Outlook. Using automation, an Access applic…

772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question