Solved

Problem with comma in data itself of csv file

Posted on 2004-04-27
7
2,824 Views
Last Modified: 2007-12-19
I'm writing a program which read in a csv file and do process of data. I am using VB and Split function with comma with delimiter. The result is incorrect when the data in the column itself contain a comma.

(If I export the csv to txt file, I can see that there will be a double quote added to those field containing comma.)

May I know how do u guys solve this problem, able to ignore the comma in inside the data itself when I do csv processing?
0
Comment
Question by:biao81
7 Comments
 
LVL 85

Accepted Solution

by:
Mike Tomlinson earned 50 total points
Comment Utility
One possible solution is to check for the presence of double quotes in your input line.  If they are found then, find the next occurence of double quotes after that.  Now search for any commas that exist inbetween the double quotes.  If any are found, then replace them with some kind of unique marker that doesn't contain a comma.  You could use something like <<comma>>.  Now perform your split as usual using the comma as the delimiter.  After the split, you must check each string in the resulting array and change your <<comma>> markers back to an actual comma.  

These operations can be achieved using the Instr() and Replace() functions.

Idle_Mind
0
 

Author Comment

by:biao81
Comment Utility
This sounds feasible, but what the data itself also contain double quote, what is the senario?
0
 
LVL 85

Expert Comment

by:Mike Tomlinson
Comment Utility
Then you have to walk the entire input line character by character and count each time you encounter double quotes to keep track of whether you are inside a variable or outside a variable.  If inside, change any commas found to something else, otherwise leave them as is.  After making a complete pass through the input string making any necessary replacements, then you can use the split() function and change the commas back in the resulting array.

Could you post a sample input line so we can better help you solve the problem?

I'm going to bed now, however, so I won't be able to take a gander until tomorrow morning.

Idle_Mind
0
Threat Intelligence Starter Resources

Integrating threat intelligence can be challenging, and not all companies are ready. These resources can help you build awareness and prepare for defense.

 
LVL 10

Expert Comment

by:anv
Comment Utility
hi there.. u can use ADO if u wan....

below is the code... simple and reliable...

 cn.Open "Provider=Microsoft.Jet" _
                    & ".OLEDB.4.0;Data Source=" & fpath _
                    & ";Extended Properties='text;HDR=YES;" _
                    & "FMT=Delimited'"
                   
            rsTxt.Open "select * from [" & fName & "#csv]", cn, adOpenStatic, _
                  adLockReadOnly, adCmdText

where fpath is the File Path and Fname name of the CSV file to open...

rest u can understand from the code itself..

regards

anv
0
 
LVL 26

Expert Comment

by:EDDYKT
Comment Utility
Can you split with

arr = split(str, """,")
0
 

Author Comment

by:biao81
Comment Utility
Thanks Idle_Mind, ur first idea give me great help, while a lot more complicated calculation need to be considere to achieve this.
This is because the data itself can actually contain also multiple comma or double quote, anyway all this has been solved too :)
0
 

Author Comment

by:biao81
Comment Utility
Maybe I can post my solution here for those who would like to tackle this in future. By the way, this code only include FROM the part after the csv line is stored into an array strLine() and place back comma and quote part is after the line is being Split.



'In order to deal with the invereted comma in data itself, as it will confuse comma replacement
'Take note that " in the data itself will be shown as "" in text file
'Having done this, there is only one potential error found currently which is when the data itself contain
'only invereted comma, which is not very likely to be happen in the data we have :)
Private Function ReplaceInvertedComma()
    Dim i As Integer
   
    For i = 0 To UBound(strLine) - 1
        strLine(i) = Replace(strLine(i), """""", quoteString)
    Next

End Function

'In order to deal with comma in data itself, replace comma with <<comma>>
Private Function ReplaceComma()

    Dim i As Integer
    Dim invertedCommaPos As Integer
    Dim firstOccur, secondOccur As Integer
   
    Dim occurCount As Integer
    Dim startPos As Integer
    startPos = 1
   
    firstOccur = 0
    secondOccur = 0
    invertedCommaPos = 1
   
    'commaString = "<<comma>>"
   
    For i = 0 To UBound(strLine) - 1
        invertedCommaPos = 1
        While InStr(invertedCommaPos, strLine(i), """") <> 0
            occurCount = 0
            startPos = 1
           
           
            firstOccur = InStr(invertedCommaPos, strLine(i), """")
           
            secondOccur = InStr(firstOccur + 1, strLine(i), """")
         
           
            'To replace multiple comma found inside two inverted comma, keep track of the number of occurance
            While InStr(startPos, Mid(strLine(i), firstOccur, secondOccur - firstOccur + 1), ",") <> 0
                 startPos = InStr(startPos, Mid(strLine(i), firstOccur, secondOccur - firstOccur + 1), ",") + 1
                 occurCount = occurCount + 1
            Wend
           
            'Replace all occurance of comma within the inverted comma
            strLine(i) = Mid(strLine(i), 1, firstOccur - 1) & Replace(Mid(strLine(i), firstOccur, secondOccur - firstOccur + 1), ",", commaString) & Mid(strLine(i), secondOccur + 1)
                         
                           
            'Calculate the last found inverted comma position, considering multiple comma found
            invertedCommaPos = secondOccur + (Len(commaString) * occurCount) + 1
        Wend
    Next
End Function

Private Function PlaceCommaBack()
    Dim i As Integer
   
    For i = LBound(aryData) To UBound(aryData)
        aryData(i) = Replace(aryData(i), commaString, ",")
    Next
   
End Function

Private Function PlaceInvertedCommaBack()
    Dim i As Integer

    For i = LBound(aryData) To UBound(aryData)
        aryData(i) = Replace(aryData(i), quoteString, """")
    Next

End Function

0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Background What I'm presenting in this article is the result of 2 conditions in my work area: We have a SQL Server production environment but no development or test environment; andWe have an MS Access front end using tables in SQL Server but we a…
You can of course define an array to hold data that is of a particular type like an array of Strings to hold customer names or an array of Doubles to hold customer sales, but what do you do if you want to coordinate that data? This article describes…
Get people started with the utilization of class modules. Class modules can be a powerful tool in Microsoft Access. They allow you to create self-contained objects that encapsulate functionality. They can easily hide the complexity of a process from…
Show developers how to use a criteria form to limit the data that appears on an Access report. It is a common requirement that users can specify the criteria for a report at runtime. The easiest way to accomplish this is using a criteria form that a…

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

7 Experts available now in Live!

Get 1:1 Help Now