Solved

Read CSV file to array

Posted on 2004-03-29
10
88,873 Views
Last Modified: 2011-08-18
I know how to read a file.
But how to split the content? Beside being separated by comma, the actual data also contains comma. What should I do?
I need to store the contents into a multi-d array too.
0
Comment
Question by:yan_er04
10 Comments
 
LVL 48

Assisted Solution

by:Mikal613
Mikal613 earned 50 total points
ID: 10705438
Dim file_name As String
Dim fnum As Integer
Dim whole_file As String
Dim lines As Variant
Dim num_rows As Long
Dim the_array() As Variant
Dim R As Long
Dim C As Long

    file_name = App.Path
    If Right$(file_name, 1) <> "\" Then file_name = _
        file_name & "\"
    file_name = file_name & "test.csv"

    ' Load the file.
    fnum = FreeFile
    Open file_name For Input As fnum
    whole_file = Input$(LOF(fnum), #fnum)
    Close fnum

    ' Break the file into lines.
    lines = Split(whole_file, vbCrLf)

    ' Dimension the array.
    num_rows = UBound(lines)
    ReDim the_array(num_rows)

    ' Copy the data into the array.
    For R = 0 To num_rows
        the_array(R) = Split(lines(R), ",")
    Next R

    ' Prove we have the data loaded.
    For R = 0 To UBound(the_array)
        For C = 0 To UBound(the_array(R))
            Debug.Print the_array(R)(C) & "/";
        Next C
        Debug.Print
    Next R
    Debug.Print "===END==="
0
 
LVL 7

Expert Comment

by:dante469
ID: 10705632
The file is comma delimited and the data contains commas....

Comma delimitted is not a good option for you in that you will have to write a great deal of logic to determin true delimeters from data :(...

If possible could you delimit with different character...  Say a tilde ~ ...

That is the easiest answer and would simple use code similar to that provided by Mikal613 :)...

Good luck,
Dante
0
 
LVL 3

Expert Comment

by:gillit
ID: 10707418
// semi-pseudo-code to change your delimiter
bool inString = false;
for(int x=0;x<strFile.Length;x++)
  if(strFile[x]=='\"')  //Character that indicates that the comma is not a delimiter, normally double quote
    inString = !inString;
  else if(!inString && strFile[x]==',')
    strFile[x]='~'; //Some character that is not used in your file.

// Add code to split by new delimiter, in my example ~
0
Master Your Team's Linux and Cloud Stack!

The average business loses $13.5M per year to ineffective training (per 1,000 employees). Keep ahead of the competition and combine in-person quality with online cost and flexibility by training with Linux Academy.

 
LVL 12

Expert Comment

by:farsight
ID: 10710508
Give us a sample line that illustrates this.
> The file is comma delimited and the data contains commas....
Also illustrate any other special characters in the data.  What if it contains a single-quote?  A double-quote?   etc.

A simple regular expression will parse it easily (if it's parseable).

Also note that much of Mikal613's code for this example is VB6-style, instead of VB.NET style.
You may prefer to translate it to use the newer methods.
0
 
LVL 12

Expert Comment

by:farsight
ID: 10710669
(I haven't tried this one, but it sounds promising.)
Parsing Comma-Separated Values or CSV files into a DataTable using Regular Expressions
http://www.hotblue.com/article0000.aspx?a=0006

--- --- ---

from:  http://www.3leaf.com/resources/articles/regex.aspx#Examples  [in C#]

Get Values of a CSV (updated by Arnold Bailey)
,(?=(?:[^"]*"[^"]*")*(?![^"]*"))
In conjunction with this code:

Regex r = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
string s = "\"a\",b,\"c, d, e\",,f";
string[] sAry = r.Split(s);
for(int i=0;i < sAry.Length;i++)
{
   Console.WriteLine(sAry[i]);
}

--- --- ---

from:  http://radio.weblogs.com/0117167/2003/02/18.html#a132  [in C#]

 CSV Regex
I haven't found a Regex that will extract just the values from a comma separated value file, but thanks to Joe Gulisano, I do have one that will find all the "real" commas:

,(?=([^"]*"[^"]*")*(?![^"]*"))
So getting the values is just:

Regex r = new Regex(",(?=([^\"]*"[^"]*")*(?![^"]*"))");string s = "\"a",b,"c, d, e",,f";int start = 0;foreach (Match m in r.Matches(s)){  Console.WriteLine(s.Substring(start, m.Index - start));  start = m.Index + 1;}Console.WriteLine(s.Substring(start, s.Length - start));
Outputs:

"a"b"c, d, e"
f
0
 

Author Comment

by:yan_er04
ID: 10711016
I can't change the format of the file.

A sample data will be
$100,"$2,350",a,15/04/1998

I am using VB.NET.

Is there no other way beside regular expression?

Well, I have tried regular expression before but... let's just say I am not an expert in it. =D

0
 

Author Comment

by:yan_er04
ID: 10711034
I notice all the examples are using 1-d array.
How to change that to 2-d?
I seems to have problems adding data to a 2-d array.
0
 
LVL 12

Accepted Solution

by:
farsight earned 150 total points
ID: 10713019
Here's some VB.NET based on one of those C# examples.
ParseCSV is a test driver.
ParseLine is the routine you'd actually want to use.

    Private Shared Sub ParseCSV()
        Dim oneLine As String = _
        """quoted"",nonquoted,""comma, """"inner-quoted-comma"""", chamelean"",,after-empty"

        Dim fieldValues As String() = ParseLine(oneLine)

        For i As Integer = 0 To fieldValues.Length - 1
            Console.WriteLine(i & "  [" & fieldValues(i) & "]")
        Next
    End Sub

    Private Shared Function ParseLine(ByVal oneLine As String) As String()
        ' Returns an array containing the values of the comma-separated fields.

        ' This pattern actually recognizes the correct commas.
        ' The Regex.Split() command later gets text between the commas.
        Dim pattern As String = ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"
        Dim r As System.Text.RegularExpressions.Regex = _
                New System.Text.RegularExpressions.Regex(pattern)

        Return r.Split(oneLine)
    End Function

Correct output:
0  ["quoted"]
1  [nonquoted]
2  ["comma, ""inner-quoted-comma"", chamelean"]
3  []
4  [after-empty]

---

Here's some simple multidimensional array code.
http://www.bhupeshs.com/vbMultiarray.htm

Here's another post about multidimensional arrays -- already answered.
http://www.experts-exchange.com/Programming/Programming_Languages/Visual_Basic/Q_20744300.html

Here's some good stuff, especially near the bottom.
http://www.dotnet247.com/247reference/msgs/23/116904.aspx

Here's a sample program for two-dimensional arrays.
Near the bottom, there's a link to a very good PowerPoint presentation, too.  Chapter8.ppt
http://www.okc.cc.ok.us/aphilipp/WebCT/VB/09_VB_CourseContent.htm

0
 

Expert Comment

by:alex_developer
ID: 11603448
Odbc supports reading CSV:

string ConnectionString = @"Driver={Microsoft Text Driver (*.txt; *.csv)};DBQ=c:\";
OdbcConnection connection = new OdbcConnection(ConnectionString);
connection.Open();
OdbcCommand command = new OdbcCommand("Select * FROM Subscribers.csv", connection);
OdbcDataReader reader = command.ExecuteReader();
while (reader.Read())
{
      // loop through every field
      for (int x = 0; x < reader.FieldCount; x++)
      {
                // add code for puting fields into array....
            Console.WriteLine(reader.GetDataTypeName(x) + " - " + reader.GetValue(x));
      }
}

                  // Close the connection
                  connection.Close();
0
 
LVL 1

Expert Comment

by:bswiftly
ID: 13870822
         This code takes a comma delimited xml flie and puts it into a table. The regex is slightly more efficient I think..

            Dim regEx As Regex
            Dim cMatch As Text.RegularExpressions.MatchCollection
            Dim m As Text.RegularExpressions.Match

            cMatch = regEx.Matches(line, "(?<=(?:^|,\s*)"")(?:[^""]|"""")*|(?<=(?:^|,))(?:\s*)(?=(?:,|$))")

            For Each m In cMatch
               i += 1
               dc = New DataColumn
               dc.ColumnName = IIf(m.Value.Length = 0, colPrefix & i + 1, m.Value)
               dt.Columns.Add(dc)
            Next
0

Featured Post

DevOps Toolchain Recommendations

Read this Gartner Research Note and discover how your IT organization can automate and optimize DevOps processes using a toolchain architecture.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

For those of you who don't follow the news, or just happen to live under rocks, Microsoft Research released a beta SDK (http://www.microsoft.com/en-us/download/details.aspx?id=27876) for the Xbox 360 Kinect. If you don't know what a Kinect is (http:…
A long time ago (May 2011), I have written an article showing you how to create a DLL using Visual Studio 2005 to be hosted in SQL Server 2005. That was valid at that time and it is still valid if you are still using these versions. You can still re…
This Micro Tutorial will give you a basic overview how to record your screen with Microsoft Expression Encoder. This program is still free and open for the public to download. This will be demonstrated using Microsoft Expression Encoder 4.
Although Jacob Bernoulli (1654-1705) has been credited as the creator of "Binomial Distribution Table", Gottfried Leibniz (1646-1716) did his dissertation on the subject in 1666; Leibniz you may recall is the co-inventor of "Calculus" and beat Isaac…

809 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question