Avatar of pfcs_sql_admin

asked on 

How to input from text/csv and ignore formatting?

I apologize for the length of the description.

I have a comma-delimited text file (csv) with upwards of 250 columns in it.  In fact, it can have either 185 or 250!  Furthermore, within this comma separated file, some of the values have commas in it (and no quotes or anything to offset).  Nice huh?  The file comes from a vendor, so that part is out of my hands.

Anyway, the purpose of the utility is to import the file and break the lines into two other files.  The utility doesn't do anything to the file itself and there is no user interaction other than to select the file.  All it does is bring it in, cycle through the lines, and put some of the lines in File A, and some of the in File B.

What I have done is to import the file using an oledb connection to read it in directly, and then .Fill a dataset/datatable.  The default view of that table is used as the source for a hidden datagridview.  The datagridview is used so that I can strip the commas out.  For some reason, reading the csv file in with the oledb breaks it up correctly, allowing me to strip the commas out of the offending values.

However ... the REAL problem began when I realized what the .Fill was doing.  The file contains huge account numbers.  And the moment the account number hits the datatable from the .Fill method, it auto-converts to scientific notation!  I have since modified my code to uncovert them on the way out.  Which lead me to discover a new problem - that since the datatable is created using the values it starts with, later on there is data that doesn't fit the format - so that data is outright ignored and is not imported!

So - here is my question - how can I import from a text file and *ignore* data types? (or set them all to one type)  I merely want to import everything as string columns!  When I tried creating a table with string columns, all it did was append columns to the table (instead of 250, I had 250 strings (empty) and 250 import cols).

How can this be done?  There HAS to be a way!

Visual Basic.NETASP.NET

Avatar of undefined
Last Comment
Christopher Kile
Avatar of Christopher Kile
Christopher Kile
Flag of United States of America image

Create an XML scheme for your data, then initialize the data set with tables by reading the schema (without data) into the database.  Make sure that each element of your scheme that represents a field is of type xs:string.  Make sure that any field that may NOT appear in the output has a minOccurrence of zero.

I've attached example code that shows how to load the schema.  I'll be happy to help you build your schema, if necessary.
Private Sub ReadSchemaFromXmlTextReader()
    ' Create the DataSet to read the schema into.
    Dim thisDataSet As New DataSet()
    ' Set the file path and name. Modify this for your purposes.
    Dim filename As String = "mySchema.xml"
    ' Create a FileStream object with the file path and name.
    Dim myFileStream As New System.IO.FileStream _
       (filename, System.IO.FileMode.Open)
    ' Create a new XmlTextReader object with the FileStream.
    Dim myXmlTextReader As New System.Xml.XmlTextReader(myFileStream)
    ' Read the schema into the DataSet and close the reader.
End Sub

Open in new window

Avatar of pfcs_sql_admin


I'd love the opportunity to learn a bit about XML, as I don't know much about it.  :/

I see from your example (thanks!) how you are retrieving the schema, and applying it to the dataset, but I don't see how a table is being created from it.  Or is that automatic?

Also, how do I fill the table from the text data and use the schema?  Because when I tried to fill an already-existing table before, all it did was append all the new columns to the existing ones, instead of populating them.

Here is a sample app that uses a schema to make  with a delimiter of your choice.
This uses a access db and view in a listview screen.
Imports System.IO
Imports System.Data.OleDb
Public Class frmAddTable
    Inherits System.Windows.Forms.Form
    Dim strFlds As String
    Dim strPath As String = Application.StartupPath
    Dim strFile As String = "Test.txt"      '<---- Input file
    Dim sSQL As String
    Dim sPathtoMDB As String = strPath & "\Temp.mdb"     '<----Make a temp database
    ' Dim Con1 As New OleDb.OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0; Data Source=" & strPath & "\" & strFile & ";")
    Dim Con1 As New OleDb.OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0; Data Source=" & sPathtoMDB & ";")
#Region " Windows Form Designer generated code "
    Public Sub New()
        'This call is required by the Windows Form Designer.
        'Add any initialization after the InitializeComponent() call
    End Sub
    'Form overrides dispose to clean up the component list.
    Protected Overloads Overrides Sub Dispose(ByVal disposing As Boolean)
        If disposing Then
            If Not (components Is Nothing) Then
            End If
        End If
    End Sub
    'Required by the Windows Form Designer
    Private components As System.ComponentModel.IContainer
    'NOTE: The following procedure is required by the Windows Form Designer
    'It can be modified using the Windows Form Designer.  
    'Do not modify it using the code editor.
    Friend WithEvents Button1 As System.Windows.Forms.Button
    Friend WithEvents Button2 As System.Windows.Forms.Button
    Friend WithEvents ListView1 As System.Windows.Forms.ListView
    Friend WithEvents Button3 As System.Windows.Forms.Button
    <System.Diagnostics.DebuggerStepThrough()> Private Sub InitializeComponent()
        Me.Button1 = New System.Windows.Forms.Button
        Me.Button2 = New System.Windows.Forms.Button
        Me.ListView1 = New System.Windows.Forms.ListView
        Me.Button3 = New System.Windows.Forms.Button
        Me.Button1.Location = New System.Drawing.Point(20, 144)
        Me.Button1.Name = "Button1"
        Me.Button1.Size = New System.Drawing.Size(112, 23)
        Me.Button1.TabIndex = 0
        Me.Button1.Text = "1. CreateSchema"
        Me.Button2.Location = New System.Drawing.Point(144, 144)
        Me.Button2.Name = "Button2"
        Me.Button2.Size = New System.Drawing.Size(120, 23)
        Me.Button2.TabIndex = 1
        Me.Button2.Text = "2.  Add Table"
        Me.ListView1.Location = New System.Drawing.Point(20, 8)
        Me.ListView1.Name = "ListView1"
        Me.ListView1.Size = New System.Drawing.Size(452, 108)
        Me.ListView1.TabIndex = 2
        Me.Button3.Location = New System.Drawing.Point(288, 144)
        Me.Button3.Name = "Button3"
        Me.Button3.Size = New System.Drawing.Size(152, 23)
        Me.Button3.TabIndex = 3
        Me.Button3.Text = "3.  Read to ListView"
        Me.AutoScaleBaseSize = New System.Drawing.Size(5, 13)
        Me.ClientSize = New System.Drawing.Size(524, 185)
        Me.Name = "frmAddTable"
        Me.Text = "Form1"
    End Sub
#End Region
#Region "Click Events"
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        CreateSchema("Test", 4)
        ' CreateSchema("Test", 4, "", "CustName,CustNumber,CustPhone,Product")
        'The name of the table is assumed to be the same name as the file,
        'and all files are assumed to be in the same path (which is how all my data is set up).
        MessageBox.Show("Create Schema Done")
    End Sub
    Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
    End Sub
    Private Sub Button3_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button3.Click
    End Sub
#End Region
#Region "Make Data"
    Private Sub CreateSchema(ByVal strTable As String, ByVal intDesc As Integer, Optional ByVal strDlmt As String = "", Optional ByVal strHdrs As String = "")
        Dim sr As StreamReader, sw As StreamWriter
        Dim strPath As String = Application.StartupPath
        Dim strFile As String = strTable & ".txt"
        'Dim strFile As String = strTable & ".csv"
        Dim x As Integer
        Dim arrHdrs As String()
        'Read the headers from the text file, if none were specified...
        If strHdrs = "" Then
            sr = New StreamReader(strPath & "\" & strFile)
            If strDlmt = "" Then
                arrHdrs = Split(sr.ReadLine, vbTab)
                arrHdrs = Split(sr.ReadLine, strDlmt)
            End If
            sr = Nothing
            arrHdrs = Split(strHdrs, ",")
        End If
        ' create the Schema.ini file and write the appropriate lines to it...
        sw = New StreamWriter(strPath & "\" & "Schema.ini", False)
        sw.WriteLine("[" & strFile & "]")
        If strHdrs = "" Then
            sw.WriteLine("ColNameHeader=False")   'True
            sw.WriteLine("ColNameHeader=True")    'False
        End If
        If strDlmt = "" Then
            sw.WriteLine("Format=Delimited(" & strDlmt & ")")
        End If
        ' loop through the headers and create a Col line in the schema.ini file including the data type
        ' also, add the headers to a string variable which will be used to create the file in the DB (using SQL)
        For x = 0 To UBound(arrHdrs)
            If arrHdrs(x) = "" And x = UBound(arrHdrs) Then
                Exit For
            ElseIf arrHdrs(x) = "" Then
                arrHdrs(x) = "F" & x
            End If
            If x < intDesc Then
                'USE this structure if you know the field length you want
                'Select Case x
                '    Case 0 'Col1=DirPermissions Text Width 10
                '        sw.WriteLine("Col1=DirPermissions Text Width 10")
                '    Case 1 'Col2=Users          Text Width 20
                '        sw.WriteLine("Col2=Users          Text Width 20")
                '    Case 2 'Col3=FileSize       Text Width 12
                '        sw.WriteLine("Col3=FileSize       Text Width 12")
                '    Case 3 'Col4=DateTime       Text Width 12
                '        sw.WriteLine("Col4=DateTime       Text Width 12")
                '    Case 4 'Col5=FileName       Text Width 30
                '        sw.WriteLine("Col5=FileName       Text Width 30")
                '    Case 5 'Col6=DirLevel       Text Width 5 
                '        sw.WriteLine("Col6=DirLevel       Text Width 5")
                'End Select
                'strFlds += "[" & Trim(arrHdrs(x)) & "] char(20), "
                'sw.WriteLine("Col" & x + 1 & "=""" & Trim(arrHdrs(x)) & """ Text Width 20")
                'USE this structure if you do NOT know the field length adds length 10
                strFlds += "[" & Trim(arrHdrs(x)) & "] char(" & Trim(arrHdrs(x).Length) + 10 & "), "
                sw.WriteLine("Col" & x + 1 & "=""" & Trim(arrHdrs(x)) & """ Text Width " & Trim(arrHdrs(x)).Length + 10)
                strFlds += "[" & Trim(arrHdrs(x)) & "] Single, "
                sw.WriteLine("Col" & x + 1 & "=""" & Trim(arrHdrs(x)) & """ Decimal")
            End If
        strFlds = "(" & strFlds.Substring(0, strFlds.Length - 2) & ")"
        sw = Nothing
    End Sub
    Private Sub AddTable(ByVal strTable As String)
            ' delete the old table
            RunSQL("DROP TABLE [" & strTable & "]")
            ' create the new table (using the string variable)
            RunSQL("CREATE TABLE [" & strTable & "] " & strFlds)
            ' insert the contents of the text file into the table
            ' (the insert will fail if the Schema.ini file is not
            ' present in the same directory as the text file)
            RunSQL("INSERT INTO [" & strTable & "] SELECT * FROM [Text;Database=" & strPath & "].[" & strFile & "]")
            MessageBox.Show("Added Table")
        Catch exp As Exception
            MsgBox(exp.Message, MsgBoxStyle.OKOnly)
        End Try
    End Sub
    Private Sub RunSQL(ByVal strSQL As String)
        Dim Cmd1 = New OleDb.OleDbCommand(strSQL, Con1)
        Catch e As OleDb.OleDbException
            Dim errorMessages As String
            Dim i As Integer
            For i = 0 To e.Errors.Count - 1
                errorMessages += "Index #" & i.ToString() & ControlChars.Cr _
                               & "Message: " & e.Errors(i).Message & ControlChars.Cr _
                               & "NativeError: " & e.Errors(i).NativeError & ControlChars.Cr _
                               & "Source: " & e.Errors(i).Source & ControlChars.Cr _
                               & "SQLState: " & e.Errors(i).SQLState & ControlChars.Cr
            Next i
            Dim log As System.Diagnostics.EventLog = New System.Diagnostics.EventLog
            log.Source = "My Application"
            MessageBox.Show("An exception occurred. Please contact your system administrator.")
        Catch exp As Exception
        End Try
    End Sub
    Private Sub ReadTextTODataset()
        'this method uses the connection syntax and a Schema.ini file to control the format of the text file
        'Dim oCon As System.Data.OleDb.OleDbConnection = New System.Data.OleDb.OleDbConnection( _
        ' "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strPath & ";" & _
        ' "Extended Properties='Text;HDR=NO;FMT=FixedLength'")
        'this method uses a Schema.ini file to control the format of the text file
        Dim oCon As System.Data.OleDb.OleDbConnection = New System.Data.OleDb.OleDbConnection( _
             "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strPath & ";" & _
             "Extended Properties='Text;'")
            ADOAdapter = New OleDb.OleDbDataAdapter("select distinct * from " & strFile, oCon)
            ADOAdapter.AcceptChangesDuringFill = False
        Catch Exp As Exception
            MsgBox(Exp.Message, MsgBoxStyle.Critical, "General Error")
        End Try
    End Sub
    Private Sub PopulateListFromDS(ByVal DS As DataSet)
        Dim s As String
        Dim i As Integer
        With ListView1
            .Columns.Add("Name", 80, HorizontalAlignment.Right)
            .Columns.Add("Number", 50, HorizontalAlignment.Left)
            .Columns.Add("Phone", 80, HorizontalAlignment.Right)
            .Columns.Add("Product", 90, HorizontalAlignment.Left)
        End With
        For Each myDataRow In DS.Tables(0).Rows
            With ListView1
                iCounter += 1
            End With
        ListView1.View = View.Details
    End Sub
#End Region
    Private dt As DataTable
    'Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    '    dt = DS.Tables.Add("MyData")
    '    dt.Columns.Add("Date", "".GetType)
    '    dt.Columns.Add("Num1", CInt("0").GetType)
    '    dt.Columns.Add("Num2", CInt("0").GetType)
    '    dt.Columns.Add("Num3", CInt("0").GetType)
    '    dt.Columns.Add("Num4", CInt("0").GetType)
    '    dt.Columns.Add("Num5", CInt("0").GetType)
    '    dt.Columns.Add("Num6", CInt("0").GetType)
    '    dt.Columns.Add("Num7", CInt("0").GetType)
    '    dt.Columns.Add("Num8", CInt("0").GetType)
    '    dt.Columns.Add("Num9", CInt("0").GetType)
    '    Dim number As Integer
    '    Dim name As String
    '    Dim sr As New System.IO.StreamReader(strPath & "\someFile.txt")
    '    Dim inputLine As String = sr.ReadLine
    '    While Not (inputLine Is Nothing)
    '        number = Integer.Parse(inputLine.Substring(0, 6))
    '        name = inputLine.Substring(6)
    '        dt.Rows.Add(New Object() {number, name})
    '        inputLine = sr.ReadLine
    '    End While
    '    sr.Close()
    '    ' DataGrid1.DataSource = dt
    'End Sub
End Class
Imports System.Data
Imports System.Data.OleDb
Module ModADO
#Region " MY Global Vars "
    'Const adOpenStatic = 3
    'Const adLockOptimistic = 3
    'Const adCmdText = &H1
    '********* database Var's ************
    Public sSql As String
    Public sTableName As String
    Public iCounter As Short
    Public myDataRow As DataRow
    Public DS As New DataSet
    Public ADOAdapter As OleDb.OleDbDataAdapter  '(ADO Access)
    '**** Server Connection Vars *********
    Public sDBServer As String               '<--- Server Name
    Public sUSERid As String                 '<--- User Id
    Public sDBName As String                 '<--- Database Name
    Public sDBPass As String                 '<--- Password
    Public Con As OleDb.OleDbConnection      '<--- for Access
    '******* Standard Var's *************
    Public sAppPath As String = System.Windows.Forms.Application.StartupPath
    Public bLogin As Boolean
#End Region
#Region " MY Server Connections "
    '***Sample for a Access Database***
    'User ID=Admin;
    'Data Source=D:\MyDatabase.mdb;
    'Mode=Share Deny None;
    'Extended Properties="";
    'Locale Identifier=1033;
    'Persist Security Info=False;
    'Jet OLEDB:System database=D:\SYSTEM.MDW;
    'Jet OLEDB:Registry Path="";
    'Jet OLEDB:Database Password="";
    'Jet OLEDB:Engine Type=5;         'Type=4 is for Access 95
    'Jet OLEDB:Database Locking Mode=1;
    'Jet OLEDB:Global Partial Bulk Ops=2;
    'Jet OLEDB:Global Bulk Transactions=1;
    'Jet OLEDB:New Database Password="";
    'Jet OLEDB:Create System Database=False;
    'Jet OLEDB:Encrypt Database=False;
    'Jet OLEDB:Don't Copy Locale on Compact=False;
    'Jet OLEDB:Compact Without Replica Repair=False;
    'Jet OLEDB:SFP=False
    Public Sub OpenDBADOConnection()
            Con = New OleDb.OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & sDBName & ";password=" & sDBPass & ";")
        Catch Ex As Exception
            MsgBox("Exception: " & Ex.Message & "  " & Ex.ToString, MsgBoxStyle.Critical)
            Cursor.Current = Cursors.Default
        End Try
    End Sub
#End Region
End Module

Open in new window

Hmmm...instead of using an adapter to fill it, you could use a reader, build the table row by row and fill it field by field as you read through the input data.  It's a bit old-fashioned, but it allows you to specifically convert each field to its target value.  I'm a little pressed for time right now, but


links a complete description (with example) of the OleDbDataReader class.
Avatar of GeneM

Blurred text
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
Avatar of pfcs_sql_admin


I'm in the process of trying a few of the options presented so far.

I had NO idea of the TextFieldParser at all.  If not for the commas within some of the fields (random), that would probably be the perfect solution.

Thanks for the input so far!
>I had NO idea of the TextFieldParser at all.  If not for the commas within some of the fields (random), that
>would probably be the perfect solution.

*nods* it was a surprise to me that you found your fields were getting formatted correctly when you let the .Fill method work.  The other methods seem to depend on parsing using comma as a delimiter, which you have already noted is problematic due to the embedded commas in your field.  One of the reasons I suggested OleDBReader was so you could examine its output and determine whether the driver was correctly formatting the data; if so, then filling the table cell-by-cell and row-by-row is going to be your only bet.
Avatar of pfcs_sql_admin


Actually, my "guess" is that it is peeking ahead at the data to determine datatypes and numbers of columns.

 So I'm figuring that out of the several rows it looks at, it is taking the majority.  For example the first 2 rows actually have "bonus commas" in it and yet the number of columns is correct.  I can only assume it is looking at more than the first couple and is just ignoring the couple of commas that "don't fit"

Dunno hehe.
Maybe....and maybe it's sampling enough to see you have differing numbers of columns, as well, so it knows that columns above 185 must be Nullable.  
Avatar of pfcs_sql_admin


This is maddening.

I'm using the Reader and first created 250 String columns in my table.

On the read, I have tried (using your syntax above) MyReader.GetString(val) to presumably pull back the string that it read in (since it IS a text file so everything is a string).  I got nothing but blanks back.  Not a single value!

So I then changed it to MyReader(val).ToString and got back all my data .... formatted just as it always was - Scientific notation for the large numbers, AND blanks where the non-numeric strings in the columns didn't match the "assumed" datatype of Double.  Note that no where at all is there a datatype specified other than a string.

I then tried MyReader.Item(val) which by its very definition says that it returns the value in its native format.  Same as always, scientific notation and blanks where it failed to import a non-numeric string into a (presumed) double column.

Again ... it is all the "automatic" translations that are killing this thing.  Argh!

*nods* I was afraid of that...I know you're being supplied with this data file, but can you ask for the data to be quote-delimited?  Surrounding each field with doublequotes would force every field into string format.
do you have a sample test file that has these multi- commas in it?
I have some code i would like to test and see if it will fix your problem.
Avatar of pfcs_sql_admin


Interesting discovery, and it actually explains a couple things.

It turns out that the random fields that contain the extra comma ARE being set off with quotes.  Which totally explains now how those commas are being ignored on the read and allowing the number of columns to remain accurate.  Oddly enough (well clearly its on purpose) the quotes do not exist for the same column value when there aren't any offending quotes.

This should allow me now to use the other methods and read it in and handle the quote/comma issues as they arise and to hopefully handle the PITA issues with the data connection automatically converting values.

I'm going to work on this a bit to see that it will work then hand out prizes.

Avatar of GeneM

A correctly formatted .csv file has 2 special considerations:

1.  Embedded commas within a field.  This type of field must be enclosed in quotes.
2.  Embedded quotes within a field.  The embedded quote must be replaced in the .csv file with 2 quotes.

These 2 conventions are followed by Excel when saving a file as a .csv file.
The TextFieldParser handles both of these conditions.

View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
I note that GeneM's TextFieldParser could also be used for this solution, but I still recommend my own method of intializing the DataTable.
Avatar of pfcs_sql_admin


Wooo Finally!

I split the points (pro-rating some) between the two of you.

cpkilekofp got the most for sticking with me and providing multiple feedback as I worked through this, as well as the foundation for the final solution.

GeneM for the TextFieldParser that also provided part of the solution.  Not to mention that it is a cool thing regardless!

Thank you both!
yvw :) and thanks to GeneM for pointing out such a useful tool :)

The successor to Active Server Pages, ASP.NET websites utilize the .NET framework to produce dynamic, data and content-driven web applications and services. ASP.NET code can be written using any .NET supported language. As of 2009, ASP.NET can also apply the Model-View-Controller (MVC) pattern to web applications

Top Experts
Get a personalized solution from industry experts
Ask the experts
Read over 600 more reviews


IBM logoIntel logoMicrosoft logoUbisoft logoSAP logo
Qualcomm logoCitrix Systems logoWorkday logoErnst & Young logo
High performer badgeUsers love us badge
LinkedIn logoFacebook logoX logoInstagram logoTikTok logoYouTube logo