parse text file

michaelhenderson
michaelhenderson used Ask the Experts™
on
I am trying to import a comma separated text file (without headers) into SQL 2005 in Visual Basic.NET.  

After getting various column mapping errors, I created a datagrid called dgTEST to visually test the text file.  The text file should be split up into 46 columns and the database has the same 46 columns.

To my surprise, the datagrid displayed the text file in 86 columns!  I’m not sure why but I presume I need to parse the data somehow.

Here’s my code:

    Public Sub ImportTextToSQL()
        Dim myConnection As New SqlConnection(ConnectionString)
        Dim i As Long = 0
        Dim sr As StreamReader = New StreamReader(NewCopyImportFile)
        Dim line As String = sr.ReadLine()

        Dim strArray As String() = line.Split(",")
        Dim dt As DataTable = New DataTable()
        Dim row As DataRow

        For Each s As String In strArray
            dt.Columns.Add(New DataColumn())
        Next

        Do
            row = dt.NewRow()
            row.ItemArray = line.Split(",")
            dt.Rows.Add(row)
            i = i + 1
            line = sr.ReadLine()

        Loop While Not line = String.Empty

        Try
            dgTEST.DataSource = dt

            'Dim bc As SqlBulkCopy = New SqlBulkCopy(myConnection, SqlBulkCopyOptions.TableLock, Nothing)

            'bc.DestinationTableName = "dbo.WindData"
            'bc.BatchSize = dt.Rows.Count

            'Try
            '    myConnection.Open()
            '    bc.WriteToServer(dt)
            '    myConnection.Close()
            '    bc.Close()

        Catch ex As Exception
            MsgBox(ex.Message & " - ImportTextToSQL()", MsgBoxStyle.Critical, "Error")
        End Try
    End Sub

Open in new window


Does anyone know how I can do this?

Thanks.
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®

Commented:
From what I understand, the problem is that after this lines of code:
        Dim strArray As String() = line.Split(",")
        Dim dt As DataTable = New DataTable()
        Dim row As DataRow

        For Each s As String In strArray
            dt.Columns.Add(New DataColumn())
        Next
Your datatable has 86 columns, which means that the  line.Split(",") part results in an 86 item array.

You must check that the data you import doesnot contain any extra commas. For example, if the data contains decimal numbers, maybe there are commas there that cause this error.

Assume a 2 column csv which contains coordinates. If the decimal seperator is comma, the line whould be:
23,384855,38,38485
So, instead of reading the 2 columns (23,384855 and 38,38485) you are doing an error by reading 4 (23, 384855 ,38 and 38485).

Can you please do a debug.write (line) afther the initial sr.readline and post here the output?




Dear Michael,
you are filling the gridview twice !
just rem the first line inside you try catch block:
dgTEST.DataSource = dt

Let me know...
Best Regards
Sorry,i misread your code, so ignore my previous comment.
You should set Autogeneratecolumn=false in your gridview
somethig like that:
dgTEST.AutoGenerateColumns =false
Why Diversity in Tech Matters

Kesha Williams, certified professional and software developer, explores the imbalance of diversity in the world of technology -- especially when it comes to hiring women. She showcases ways she's making a difference through the Colors of STEM program.

Author

Commented:
Hi thanks for the comments.

Yes I think the problem may lie with numbers having commas in them so an extra column is being created.  That's really what I want to know how to solve.

OK, update...

have used the debug.write to get the output and it looks like there is 86 commas.  I'm wondering if the text file schema I was given is wrong.  Here is a line of data (and there is definately a lot more than 46 columns!)

BRBK-2933,3354854,0310085,12518,100930,183357,30:04,386,105,231,180,22,21,22,45,DB,F:\oem\caq\data\2010-01_719.dat,,,,3353418,T41008-09,379,,,,,,,3353418,T41008-09,378,3353418,T41008-09,375,3353418,T41008-09,339,3353418,T41008-09,350,3353418,T41008-09,372,3353418,T41008-09,400,3353418,T41008-09,403,,,,,,,,,,,,,,,,3353419,T31006-02,121,3353419,T31006-02,116,3353419,T31006-02,148,3353419,T31006-02,154,3353419,T31006-02,156,300910,2041,0344,0,0,0

SBI Marco, I'm not sure how the datagrid is getting filled in twice?  dt is only being used once (I've commented out the lines below the dgTEST.DataSource = dt line as the bulk insert won't load while there are column differences.
 
Thanks

Author

Commented:
:-)  thanks SBI Marco
Commented:
Michael,

can't really tell if the schema you were given is wrong or if the data contains decimals. You should probably ask the text file provider for this information...

If the problem is with the decimals. then there is no way to understand which comma is a field separator and which comma is a decimal separator. Again, you should ask the text file provider for his help, either to change the decimal separator to dot ".", or to change the field separator to semicolon ";".

Author

Commented:
Hi

Thanks for the help.  I spoke to the text file provider and there are a lot of unused columns which still have entries in the file.  I have added all 86 columns to the database and named them as generic "column56, column 57, etc".

The bulk insert works now.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial