Solved

Data missing when reading CSV using ADO.NET

Posted on 2008-10-10
10
1,343 Views
Last Modified: 2013-11-07
Hi Experts,

Below is my function that returns a data table containing the contents of a CSV file. As this CSV file isnt actually record based the field types are mixed. I have used IMEX=1 but data is still missing.

Can anyone tell me off the top of their head what I am doing wrong?

Thanks

Tom
Private Function GetDataADO(ByVal Path As String) As DataTable
	Dim Filename As String = IO.Path.GetFileName(Path)
	Dim FilePath As String = IO.Path.GetDirectoryName(Path)
	Dim ConnectionString As String = "Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties= ""Text;HDR=No;IMEX=1;"" ; Data Source=" & FilePath
	Dim Connection As New OleDb.OleDbConnection(ConnectionString)
	Connection.Open()
	Dim Command As New OleDb.OleDbCommand
	Command.CommandType = CommandType.Text
	Command.Connection = Connection
	Command.CommandText = "SELECT * FROM  [" & Filename & "]"
	Dim DataTable As New DataTable
	Dim DataAdapter As New OleDb.OleDbDataAdapter(Command)
	DataAdapter.Fill(DataTable)
	Connection.Close()
	Connection.Dispose()
	Return DataTable
End Function

Open in new window

0
Comment
Question by:TommyTwoPints
  • 5
  • 3
  • 2
10 Comments
 
LVL 1

Expert Comment

by:AWestEng
ID: 22694508
I'm not sure why your code dosen't work but here is another example how to read a CSV file, it maybe help you.
Public Function ReadCSVFile(ByVal strDataSource As String, _
           ByVal delimiter As String, _
           ByVal IncludeFieldName As Boolean, _
           Optional ByVal StartAtLine As Integer = 0, _
           Optional ByVal StopAtLine As Integer = 0) _
           As DataTable
 
    '*************************************************************************************
    ' strDataSource     : Filepath to the file thats going to be read
    '                   : Value: Sting
    ' StartAtLine       : The row that the reader is going to start at in the file
    '                   : Value:  1 or 2
    ' ShowData          : If a dataGrid exist can the file  be display in this 
    '                   : Value:  True or False
    ' delimiter         : The delimiter for the file reader
    '                   : Value: ";" or ","
    ' IncludeFieldName  : The first row are fieldnames include this or nor
    '                   : Value: True or False
    ' StopAtLine        : The row that the reader is going to end at if StopAtLine <> 0
    '*************************************************************************************
 
    Dim intColCntr As Integer 'Counter for number of columns in the text file. 
    Dim intRowCntr As Integer 'Counter row to be added. 
 
    'stopatline is 0 then read all rows 
    If StopAtLine = 0 Then
        StopAtLine = Int32.MaxValue
    End If
 
    Dim arrSQLCols() As String
    Dim myDataTable As New DataTable
    Dim myDataCol As New DataColumn
    Dim myDataRow As DataRow
    Dim strLine As String
    Dim sr As New StreamReader(strDataSource)
    Dim loopctr As Integer = 0
 
    Dim fi As New FileInfo(strDataSource)
    Dim filelength As Long = fi.Length
    Dim bytesread As Long = 0
 
    Try
        While sr.Peek <> -1
            strLine = sr.ReadLine
 
            'Search for qualifier character. 
            If strLine.Substring(0, 1) = """" Or strLine.Substring(0, 1) = "'" Then
                strLine = Strings.Replace(strLine, strLine.Substring(0, 1), "", 1, -1)
            End If
 
            Dim arrColName() As String = strLine.Split(delimiter)
            If loopctr = 0 Then
                'Check to include first row as field name. 
                If IncludeFieldName = True Then
                    'Add columns to the table. 
                    For intColCntr = 0 To arrColName.Length - 1
                        myDataCol = New DataColumn(arrColName(intColCntr))
                        myDataCol.AutoIncrement = True
                        myDataCol.DataType = Type.GetType("System.String")
                        myDataTable.Columns.Add(myDataCol)
                        arrSQLCols = arrColName
                    Next
                Else
                    For intColCntr = 0 To arrColName.Length - 1
                        myDataCol = New DataColumn("Field" & " " & intColCntr)
                        myDataCol.AutoIncrement = True
                        myDataCol.DataType = Type.GetType("System.String")
                        myDataTable.Columns.Add(myDataCol)
                    Next
                End If
            Else
                If loopctr >= StartAtLine AndAlso loopctr <= StopAtLine Then
                    'Add rows to the table. 
                    Dim arrRow() As String = strLine.Split(delimiter)
                    myDataRow = myDataTable.NewRow
                    For intRowCntr = 0 To arrColName.Length - 1
                        myDataRow(intRowCntr) = arrRow(intRowCntr)
                    Next
                    myDataTable.Rows.Add(myDataRow)
                End If
            End If
            loopctr += 1
        End While
        sr.Close()
 
 
        Return myDataTable
 
    Catch ex As Exception
        Throw
    Finally
        myDataTable = Nothing
        myDataCol = Nothing
        System.GC.Collect()
    End Try
End Function

Open in new window

0
 
LVL 1

Expert Comment

by:AWestEng
ID: 22694518
or maybe this
Public Function ReadCSVFileToDataTable(ByVal FilePath As String, ByVal ColumnNames() As String) As DataTable
    Dim myDataTable As New DataTable
    Dim r As New RegularExpressions.Regex(ControlChars.CrLf)
    Dim myDataCol As DataColumn
 
    If String.IsNullOrEmpty(FilePath) Then Return Nothing
 
    'Create the columns
    For i As Integer = 0 To ColumnNames.Length - 1
        myDataCol = New DataColumn(ColumnNames(i), GetType(String))
        myDataTable.Columns.Add(myDataCol)
    Next
 
    Dim lines() As String
 
    Try
        '// Create a new reader
        Dim sr As New StreamReader(FilePath, Encoding.Default, True)
 
        '// Read file
        Dim data As String = sr.ReadToEnd
 
        '// Close file
        sr.Close()
 
        '// Split up the data
        lines = r.Split(data)
 
    Catch ex As Exception
        Throw
    End Try
 
    Dim values() As String
    Dim row As DataRow
 
    For Each line As String In lines
 
        values = line.Split(";"c)
        row = myDataTable.NewRow
 
        Try
            If Not line = String.Empty Then
                For i As Integer = 0 To myDataTable.Columns.Count - 1
                    row(i) = values(i).ToString
                Next
                myDataTable.Rows.Add(row)
            End If
        Catch ex As Exception
            Throw
        End Try
    Next
 
    Return myDataTable
End Function

Open in new window

0
 
LVL 1

Expert Comment

by:AWestEng
ID: 22695675
Here you got a function for ADO to that I found
Private Sub cmdLoad_Click()
Dim Cn1 As ADODB.Connection
Dim Rs1 As ADODB.Recordset
Dim iSQLStr As String
Dim field_num As Integer
 
    Set Cn1 = New ADODB.Connection
    Cn1.ConnectionString = _
        "Driver={Microsoft Text Driver (*.txt; *.csv)};" & _
        "DefaultDir=" & txtDir.Text
    Cn1.Open
 
    lstResults.Visible = False
    DoEvents
 
    iSQLStr = "Select * FROM " & txtFile.Text & _
        " ORDER BY " & txtField.Text
    field_num = CInt(txtField.Text) - 1
    Set Rs1 = Cn1.Execute(iSQLStr)
    lstResults.Clear
    While Not Rs1.EOF
        If IsNull(Rs1.Fields(field_num).Value) Then
            lstResults.AddItem "<null>"
        Else
            lstResults.AddItem Rs1.Fields(field_num).Value
        End If
        Rs1.MoveNext
    Wend
 
    lstResults.Visible = True
End Sub

Open in new window

0
Master Your Team's Linux and Cloud Stack

Come see why top tech companies like Mailchimp and Media Temple use Linux Academy to build their employee training programs.

 
LVL 23

Expert Comment

by:Tiggerito
ID: 22696310
I have these notes related to this problem. Doing the registry change may fix things for you...

The problem is that the driver uses the first 8 rows to determine the type of a column.

Use the OLE/JET provided instead of the ODBC driver (even though both have the same problem)
 
Make sure the connection string has///
Extended Properties="Excel 8.0;IMEX=1;"

The IMEX=1 is the important bit for this bug fix. It makes it default to text over numbers

Then, alter the registry:

 HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\3.5\Engines\Excel
or on my machine it was
 HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel

set TypeGuessRows to 0 . this makes it check all rows for a type.

Note: setting these values in the connection string does not work so they have to be set in this way.

0
 
LVL 3

Author Comment

by:TommyTwoPints
ID: 22700451
AWestEng: The first 2 examples read text files from a stream; I already have a system like this but it uses REGEX. Performance is an issue on large files when done this way so I thought I would try using a DB connection. Your third example seems to use Microsoft® ActiveX® Data Objects. I think this is a com reference so i am reluctant to use it but I will give it a try.
Tiggerito: I dont want my application to change a users registry at runtime as this can only be done if the user account has rights to do so. Also when you try to use 'Excel 8.0' instead of 'Text' when reading a CSV an error occurs (External table is not in the expected format.).
 Thankyou both for your suggestions
0
 
LVL 3

Author Comment

by:TommyTwoPints
ID: 22700558
AWestEng: The third sample you provided is using an ODBC connection that I have already tested in ADO.NET; It also doesnt read all the data correctly.
Maybe I'll just stick with my Regex method and sacrafice performance for accuracy.
0
 
LVL 23

Expert Comment

by:Tiggerito
ID: 22700628
I found this article an excelent source on writing a CSV parser.

http://www.boyet.com/Articles/CsvParser.html

With a little work you can create a consumer that parses directly into a DataTable.

This way would probably be the fastest and most reliable solution.
0
 
LVL 1

Accepted Solution

by:
AWestEng earned 500 total points
ID: 22700956
Here you have a really fast CSV reader.
http://www.codeproject.com/KB/database/CsvReader.aspx
If you have problmes with the C# code just let me know .
C# to VB.net translater
http://www.developerfusion.com/tools/convert/csharp-to-vb/
 
0
 
LVL 3

Author Comment

by:TommyTwoPints
ID: 22739164
Thankyou both for your help. I have decided to go with AWestEng's solution.
0
 
LVL 1

Expert Comment

by:AWestEng
ID: 22739272
Thx for the points :)
0

Featured Post

3 Use Cases for Connected Systems

Our Dev teams are like yours. They’re continually cranking out code for new features/bugs fixes, testing, deploying, testing some more, responding to production monitoring events and more. It’s complex. So, we thought you’d like to see what’s working for us.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Parsing a CSV file is a task that we are confronted with regularly, and although there are a vast number of means to do this, as a newbie, the field can be confusing and the tools can seem complex. A simple solution to parsing a customized CSV fi…
Real-time is more about the business, not the technology. In day-to-day life, to make real-time decisions like buying or investing, business needs the latest information(e.g. Gold Rate/Stock Rate). Unlike traditional days, you need not wait for a fe…
This video shows how to quickly and easily add an email signature for all users on Exchange 2016. The resulting signature is applied on a server level by Exchange Online. The email signature template has been downloaded from: www.mail-signatures…
Email security requires an ever evolving service that stays up to date with counter-evolving threats. The Email Laundry perform Research and Development to ensure their email security service evolves faster than cyber criminals. We apply our Threat…

776 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question