TommyTwoPints
asked on
Data missing when reading CSV using ADO.NET
Hi Experts,
Below is my function that returns a data table containing the contents of a CSV file. As this CSV file isnt actually record based the field types are mixed. I have used IMEX=1 but data is still missing.
Can anyone tell me off the top of their head what I am doing wrong?
Thanks
Tom
Below is my function that returns a data table containing the contents of a CSV file. As this CSV file isnt actually record based the field types are mixed. I have used IMEX=1 but data is still missing.
Can anyone tell me off the top of their head what I am doing wrong?
Thanks
Tom
Private Function GetDataADO(ByVal Path As String) As DataTable
Dim Filename As String = IO.Path.GetFileName(Path)
Dim FilePath As String = IO.Path.GetDirectoryName(Path)
Dim ConnectionString As String = "Provider=Microsoft.Jet.OLEDB.4.0;Extended Properties= ""Text;HDR=No;IMEX=1;"" ; Data Source=" & FilePath
Dim Connection As New OleDb.OleDbConnection(ConnectionString)
Connection.Open()
Dim Command As New OleDb.OleDbCommand
Command.CommandType = CommandType.Text
Command.Connection = Connection
Command.CommandText = "SELECT * FROM [" & Filename & "]"
Dim DataTable As New DataTable
Dim DataAdapter As New OleDb.OleDbDataAdapter(Command)
DataAdapter.Fill(DataTable)
Connection.Close()
Connection.Dispose()
Return DataTable
End Function
or maybe this
Public Function ReadCSVFileToDataTable(ByVal FilePath As String, ByVal ColumnNames() As String) As DataTable
Dim myDataTable As New DataTable
Dim r As New RegularExpressions.Regex(ControlChars.CrLf)
Dim myDataCol As DataColumn
If String.IsNullOrEmpty(FilePath) Then Return Nothing
'Create the columns
For i As Integer = 0 To ColumnNames.Length - 1
myDataCol = New DataColumn(ColumnNames(i), GetType(String))
myDataTable.Columns.Add(myDataCol)
Next
Dim lines() As String
Try
'// Create a new reader
Dim sr As New StreamReader(FilePath, Encoding.Default, True)
'// Read file
Dim data As String = sr.ReadToEnd
'// Close file
sr.Close()
'// Split up the data
lines = r.Split(data)
Catch ex As Exception
Throw
End Try
Dim values() As String
Dim row As DataRow
For Each line As String In lines
values = line.Split(";"c)
row = myDataTable.NewRow
Try
If Not line = String.Empty Then
For i As Integer = 0 To myDataTable.Columns.Count - 1
row(i) = values(i).ToString
Next
myDataTable.Rows.Add(row)
End If
Catch ex As Exception
Throw
End Try
Next
Return myDataTable
End Function
Here you got a function for ADO to that I found
Private Sub cmdLoad_Click()
Dim Cn1 As ADODB.Connection
Dim Rs1 As ADODB.Recordset
Dim iSQLStr As String
Dim field_num As Integer
Set Cn1 = New ADODB.Connection
Cn1.ConnectionString = _
"Driver={Microsoft Text Driver (*.txt; *.csv)};" & _
"DefaultDir=" & txtDir.Text
Cn1.Open
lstResults.Visible = False
DoEvents
iSQLStr = "Select * FROM " & txtFile.Text & _
" ORDER BY " & txtField.Text
field_num = CInt(txtField.Text) - 1
Set Rs1 = Cn1.Execute(iSQLStr)
lstResults.Clear
While Not Rs1.EOF
If IsNull(Rs1.Fields(field_num).Value) Then
lstResults.AddItem "<null>"
Else
lstResults.AddItem Rs1.Fields(field_num).Value
End If
Rs1.MoveNext
Wend
lstResults.Visible = True
End Sub
I have these notes related to this problem. Doing the registry change may fix things for you...
The problem is that the driver uses the first 8 rows to determine the type of a column.
Use the OLE/JET provided instead of the ODBC driver (even though both have the same problem)
Make sure the connection string has///
Extended Properties="Excel 8.0;IMEX=1;"
The IMEX=1 is the important bit for this bug fix. It makes it default to text over numbers
Then, alter the registry:
HKEY_LOCAL_MACHINE\SOFTWAR E\Microsof t\Jet\3.5\ Engines\Ex cel
or on my machine it was
HKEY_LOCAL_MACHINE\SOFTWAR E\Microsof t\Jet\4.0\ Engines\Ex cel
set TypeGuessRows to 0 . this makes it check all rows for a type.
Note: setting these values in the connection string does not work so they have to be set in this way.
The problem is that the driver uses the first 8 rows to determine the type of a column.
Use the OLE/JET provided instead of the ODBC driver (even though both have the same problem)
Make sure the connection string has///
Extended Properties="Excel 8.0;IMEX=1;"
The IMEX=1 is the important bit for this bug fix. It makes it default to text over numbers
Then, alter the registry:
HKEY_LOCAL_MACHINE\SOFTWAR
or on my machine it was
HKEY_LOCAL_MACHINE\SOFTWAR
set TypeGuessRows to 0 . this makes it check all rows for a type.
Note: setting these values in the connection string does not work so they have to be set in this way.
ASKER
AWestEng: The first 2 examples read text files from a stream; I already have a system like this but it uses REGEX. Performance is an issue on large files when done this way so I thought I would try using a DB connection. Your third example seems to use Microsoft® ActiveX® Data Objects. I think this is a com reference so i am reluctant to use it but I will give it a try.
Tiggerito: I dont want my application to change a users registry at runtime as this can only be done if the user account has rights to do so. Also when you try to use 'Excel 8.0' instead of 'Text' when reading a CSV an error occurs (External table is not in the expected format.).
Thankyou both for your suggestions
Tiggerito: I dont want my application to change a users registry at runtime as this can only be done if the user account has rights to do so. Also when you try to use 'Excel 8.0' instead of 'Text' when reading a CSV an error occurs (External table is not in the expected format.).
Thankyou both for your suggestions
ASKER
AWestEng: The third sample you provided is using an ODBC connection that I have already tested in ADO.NET; It also doesnt read all the data correctly.
Maybe I'll just stick with my Regex method and sacrafice performance for accuracy.
Maybe I'll just stick with my Regex method and sacrafice performance for accuracy.
I found this article an excelent source on writing a CSV parser.
http://www.boyet.com/Articles/CsvParser.html
With a little work you can create a consumer that parses directly into a DataTable.
This way would probably be the fastest and most reliable solution.
http://www.boyet.com/Articles/CsvParser.html
With a little work you can create a consumer that parses directly into a DataTable.
This way would probably be the fastest and most reliable solution.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thankyou both for your help. I have decided to go with AWestEng's solution.
Thx for the points :)
Open in new window