Link to home
Start Free TrialLog in
Avatar of TommyTwoPints
TommyTwoPintsFlag for Thailand

asked on

Data missing when reading CSV using ADO.NET

Hi Experts,

Below is my function that returns a data table containing the contents of a CSV file. As this CSV file isnt actually record based the field types are mixed. I have used IMEX=1 but data is still missing.

Can anyone tell me off the top of their head what I am doing wrong?

Thanks

Tom
Private Function GetDataADO(ByVal Path As String) As DataTable
	Dim Filename As String = IO.Path.GetFileName(Path)
	Dim FilePath As String = IO.Path.GetDirectoryName(Path)
	Dim ConnectionString As String = "Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties= ""Text;HDR=No;IMEX=1;"" ; Data Source=" & FilePath
	Dim Connection As New OleDb.OleDbConnection(ConnectionString)
	Connection.Open()
	Dim Command As New OleDb.OleDbCommand
	Command.CommandType = CommandType.Text
	Command.Connection = Connection
	Command.CommandText = "SELECT * FROM  [" & Filename & "]"
	Dim DataTable As New DataTable
	Dim DataAdapter As New OleDb.OleDbDataAdapter(Command)
	DataAdapter.Fill(DataTable)
	Connection.Close()
	Connection.Dispose()
	Return DataTable
End Function

Open in new window

Avatar of AWestEng
AWestEng
Flag of Sweden image

I'm not sure why your code dosen't work but here is another example how to read a CSV file, it maybe help you.
Public Function ReadCSVFile(ByVal strDataSource As String, _
           ByVal delimiter As String, _
           ByVal IncludeFieldName As Boolean, _
           Optional ByVal StartAtLine As Integer = 0, _
           Optional ByVal StopAtLine As Integer = 0) _
           As DataTable
 
    '*************************************************************************************
    ' strDataSource     : Filepath to the file thats going to be read
    '                   : Value: Sting
    ' StartAtLine       : The row that the reader is going to start at in the file
    '                   : Value:  1 or 2
    ' ShowData          : If a dataGrid exist can the file  be display in this 
    '                   : Value:  True or False
    ' delimiter         : The delimiter for the file reader
    '                   : Value: ";" or ","
    ' IncludeFieldName  : The first row are fieldnames include this or nor
    '                   : Value: True or False
    ' StopAtLine        : The row that the reader is going to end at if StopAtLine <> 0
    '*************************************************************************************
 
    Dim intColCntr As Integer 'Counter for number of columns in the text file. 
    Dim intRowCntr As Integer 'Counter row to be added. 
 
    'stopatline is 0 then read all rows 
    If StopAtLine = 0 Then
        StopAtLine = Int32.MaxValue
    End If
 
    Dim arrSQLCols() As String
    Dim myDataTable As New DataTable
    Dim myDataCol As New DataColumn
    Dim myDataRow As DataRow
    Dim strLine As String
    Dim sr As New StreamReader(strDataSource)
    Dim loopctr As Integer = 0
 
    Dim fi As New FileInfo(strDataSource)
    Dim filelength As Long = fi.Length
    Dim bytesread As Long = 0
 
    Try
        While sr.Peek <> -1
            strLine = sr.ReadLine
 
            'Search for qualifier character. 
            If strLine.Substring(0, 1) = """" Or strLine.Substring(0, 1) = "'" Then
                strLine = Strings.Replace(strLine, strLine.Substring(0, 1), "", 1, -1)
            End If
 
            Dim arrColName() As String = strLine.Split(delimiter)
            If loopctr = 0 Then
                'Check to include first row as field name. 
                If IncludeFieldName = True Then
                    'Add columns to the table. 
                    For intColCntr = 0 To arrColName.Length - 1
                        myDataCol = New DataColumn(arrColName(intColCntr))
                        myDataCol.AutoIncrement = True
                        myDataCol.DataType = Type.GetType("System.String")
                        myDataTable.Columns.Add(myDataCol)
                        arrSQLCols = arrColName
                    Next
                Else
                    For intColCntr = 0 To arrColName.Length - 1
                        myDataCol = New DataColumn("Field" & " " & intColCntr)
                        myDataCol.AutoIncrement = True
                        myDataCol.DataType = Type.GetType("System.String")
                        myDataTable.Columns.Add(myDataCol)
                    Next
                End If
            Else
                If loopctr >= StartAtLine AndAlso loopctr <= StopAtLine Then
                    'Add rows to the table. 
                    Dim arrRow() As String = strLine.Split(delimiter)
                    myDataRow = myDataTable.NewRow
                    For intRowCntr = 0 To arrColName.Length - 1
                        myDataRow(intRowCntr) = arrRow(intRowCntr)
                    Next
                    myDataTable.Rows.Add(myDataRow)
                End If
            End If
            loopctr += 1
        End While
        sr.Close()
 
 
        Return myDataTable
 
    Catch ex As Exception
        Throw
    Finally
        myDataTable = Nothing
        myDataCol = Nothing
        System.GC.Collect()
    End Try
End Function

Open in new window

or maybe this
Public Function ReadCSVFileToDataTable(ByVal FilePath As String, ByVal ColumnNames() As String) As DataTable
    Dim myDataTable As New DataTable
    Dim r As New RegularExpressions.Regex(ControlChars.CrLf)
    Dim myDataCol As DataColumn
 
    If String.IsNullOrEmpty(FilePath) Then Return Nothing
 
    'Create the columns
    For i As Integer = 0 To ColumnNames.Length - 1
        myDataCol = New DataColumn(ColumnNames(i), GetType(String))
        myDataTable.Columns.Add(myDataCol)
    Next
 
    Dim lines() As String
 
    Try
        '// Create a new reader
        Dim sr As New StreamReader(FilePath, Encoding.Default, True)
 
        '// Read file
        Dim data As String = sr.ReadToEnd
 
        '// Close file
        sr.Close()
 
        '// Split up the data
        lines = r.Split(data)
 
    Catch ex As Exception
        Throw
    End Try
 
    Dim values() As String
    Dim row As DataRow
 
    For Each line As String In lines
 
        values = line.Split(";"c)
        row = myDataTable.NewRow
 
        Try
            If Not line = String.Empty Then
                For i As Integer = 0 To myDataTable.Columns.Count - 1
                    row(i) = values(i).ToString
                Next
                myDataTable.Rows.Add(row)
            End If
        Catch ex As Exception
            Throw
        End Try
    Next
 
    Return myDataTable
End Function

Open in new window

Here you got a function for ADO to that I found
Private Sub cmdLoad_Click()
Dim Cn1 As ADODB.Connection
Dim Rs1 As ADODB.Recordset
Dim iSQLStr As String
Dim field_num As Integer
 
    Set Cn1 = New ADODB.Connection
    Cn1.ConnectionString = _
        "Driver={Microsoft Text Driver (*.txt; *.csv)};" & _
        "DefaultDir=" & txtDir.Text
    Cn1.Open
 
    lstResults.Visible = False
    DoEvents
 
    iSQLStr = "Select * FROM " & txtFile.Text & _
        " ORDER BY " & txtField.Text
    field_num = CInt(txtField.Text) - 1
    Set Rs1 = Cn1.Execute(iSQLStr)
    lstResults.Clear
    While Not Rs1.EOF
        If IsNull(Rs1.Fields(field_num).Value) Then
            lstResults.AddItem "<null>"
        Else
            lstResults.AddItem Rs1.Fields(field_num).Value
        End If
        Rs1.MoveNext
    Wend
 
    lstResults.Visible = True
End Sub

Open in new window

I have these notes related to this problem. Doing the registry change may fix things for you...

The problem is that the driver uses the first 8 rows to determine the type of a column.

Use the OLE/JET provided instead of the ODBC driver (even though both have the same problem)
 
Make sure the connection string has///
Extended Properties="Excel 8.0;IMEX=1;"

The IMEX=1 is the important bit for this bug fix. It makes it default to text over numbers

Then, alter the registry:

 HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\3.5\Engines\Excel
or on my machine it was
 HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel

set TypeGuessRows to 0 . this makes it check all rows for a type.

Note: setting these values in the connection string does not work so they have to be set in this way.

Avatar of TommyTwoPints

ASKER

AWestEng: The first 2 examples read text files from a stream; I already have a system like this but it uses REGEX. Performance is an issue on large files when done this way so I thought I would try using a DB connection. Your third example seems to use Microsoft® ActiveX® Data Objects. I think this is a com reference so i am reluctant to use it but I will give it a try.
Tiggerito: I dont want my application to change a users registry at runtime as this can only be done if the user account has rights to do so. Also when you try to use 'Excel 8.0' instead of 'Text' when reading a CSV an error occurs (External table is not in the expected format.).
 Thankyou both for your suggestions
AWestEng: The third sample you provided is using an ODBC connection that I have already tested in ADO.NET; It also doesnt read all the data correctly.
Maybe I'll just stick with my Regex method and sacrafice performance for accuracy.
I found this article an excelent source on writing a CSV parser.

http://www.boyet.com/Articles/CsvParser.html

With a little work you can create a consumer that parses directly into a DataTable.

This way would probably be the fastest and most reliable solution.
ASKER CERTIFIED SOLUTION
Avatar of AWestEng
AWestEng
Flag of Sweden image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thankyou both for your help. I have decided to go with AWestEng's solution.
Thx for the points :)