Link to home
Start Free TrialLog in
Avatar of gr8life
gr8life

asked on

Problem with appending data using stringbuilder

I have very painfully trying to modify an application to increase its performance.  This application has to process a lot of data, but currently is too slow to be of much value.  I believe it is because it writes too many times and I want to change this function to utilize a stringbuilder approach, but am not able to make to change.  Please help.

Thank you for your time and very valuable expertise,
Gr8life

Function I am having trouble with:

    Public Sub ConvertFiles(ByVal filein As String, ByVal fileout As String)
        Try
            If (IO.File.Exists(filein)) Then
                'Dim sin As New IO.StreamReader(filein) 'was commented out
                Dim sout As New IO.StreamWriter(fileout, False)
                sout.AutoFlush = False ' output file is NOT updated after every WriteLine() call
                Dim lineCounter As Integer
                Dim items() As String
                Dim sOutLine As String
                Dim dtInput As DataTable
                dtInput = GetDataFromCSVFile(filein)
                dtInput.Columns.Add(New DataColumn("Lookup1"))
                dtInput.Columns.Add(New DataColumn("Lookup2"))
                For Each dr As DataRow In dtInput.Rows
                    If Not IsDBNull(dr.Item(3)) Then
                        dr.Item("Lookup1") = IPLookup(dr.Item(3))
                    Else
                        dr.Item("Lookup1") = ""
                    End If
                    If Not IsDBNull(dr.Item(5)) Then
                        dr.Item("Lookup2") = IPLookup(dr.Item(5))
                    Else
                        dr.Item("Lookup2") = ""
                    End If
                    sOutLine = ""
                    For iCnt As Integer = 0 To 15
                        If IsDBNull(dr.Item(iCnt)) Then
                            sOutLine &= ","
                        Else
                            sOutLine &= dr.Item(iCnt) & ","
                        End If
                        If iCnt = 3 Then
                            sOutLine &= dr.Item("Lookup1") & ","
                        ElseIf iCnt = 5 Then
                            sOutLine &= dr.Item("Lookup2") & ","
                        End If
                    Next
                    sout.WriteLine(sOutLine)
                    lineCounter = lineCounter + 1
                    If lineCounter Mod 10000 = 0 Then
                        sout.Flush()
                    End If
                Next
                'sin.Close() 'was commented out
                sout.Flush()
                sout.Close()
            End If
        Catch ex As Exception
            MsgBox("Problem Occurred" & ex.Message)
        End Try
    End Sub

Avatar of Howard Cantrell
Howard Cantrell
Flag of United States of America image

Do you have a small example file of what you are wanting to read in and split?
your code above does not have a good flow, but I need to see what you have and where you want to go with what kind of data.
 Is the data a csv file and are you wanting to place it into a database?
Avatar of gr8life
gr8life

ASKER

I will post a sample file, when I get to my desk at work.  Would it help you understand this application better if I posted all the code instead of just this portion?
Thank you very much for reading this question,
Gr8life
Avatar of gr8life

ASKER

The code reads a tab delimited text file and outputs a CSV.  The application compares IP addresses against an Access database of country names to determine what country the IP is associated with. As far as a sample of the file the only data that is of concern are the two IP addresses in each row, which are the 4th and 6th columns.

Before:
A B C (1st IP Address) E (2nd IP Address) G H I J K L M N  

As the file is “processed” the country name is appended to the right of the IP addresses.

After:
A B C (1st IP Address) (1st Country Name) E (2nd IP Address) (2nd Country Name) G H I J K L M N  
Thanks again for your time,
Gr8life
1) How many files are you working with?

2) .NET version?

3) Where is the crucial section (what takes up the most time)?

Bob
If you need to upload an example, you can do it this way:

http://www.ee-stuff.com/Expert/Upload/viewFilesQuestion.php?qid=<qid>

Replace <qid> with the question ID:

http://www.ee-stuff.com/Expert/Upload/viewFilesQuestion.php?qid=21847301

Bob
My impression from your orignal post was that you were inputting one .csv file and outputting a .csv file virtually the same, but with the values in Cols 3 and 5 changed from references to a look-up table to the actual values from that look-up table.  On that basis you would greatly improve the efficiency of the operation by treating it as a DataBase/DataTable operation rather than one involving the manipulation of strings.  To illustrate what I mean, I put together a little app.

I created two .csv files.  Here's what they looked like to start with.

InputRecord.csv was

Col0,Col1,Col2,Col3,Col4
"Col0;Row0",1,"Col2;Row0","Col3;Row0",1
"Col0;Row1",2,"Col2;Row1","Col3;Row1",0
"Col0;Row2",3,"Col2;Row2","Col3;Row2",2
"Col0;Row3",4,"Col2;Row3","Col3;Row3",0
"Col0;Row4",5,"Col2;Row4","Col3;Row4",2
"Col0;Row5",6,"Col2;Row5","Col3;Row5",1
"Col0;Row6",7,"Col2;Row6","Col3;Row6",
"Col0;Row7",8,"Col2;Row7","Col3;Row7"
"Col0;Row8",9,"Col2;Row8","Col3;Row8",1

OutputRecord.csv was

Col0,Col1,Col2,Col3,Col4

Then, with a form containing just a button, this was the code

Imports System.Data.OleDb
Public Class Form1
    Inherits System.Windows.Forms.Form

    Dim sourceConStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Test\;Extended Properties=""text;HDR=Yes;FMT=Delimited"""
    Dim sourceSQL As String = "SELECT * FROM InputRecord.csv"
    Dim dt As New DataTable
    Dim con As OleDbConnection
    Dim da As OleDbDataAdapter

    Private Sub SaveTable()

        Dim cmd As New OleDbCommand
        With cmd
            .Connection = con
            .CommandText = "INSERT INTO OutputRecord.csv (Col0, Col1, Col2, Col3, Col4) VALUES (?,?,?,?,?)"
            .Parameters.Add("0", OleDbType.VarWChar)
            .Parameters.Add("1", OleDbType.Integer)
            .Parameters.Add("2", OleDbType.VarWChar)
            .Parameters.Add("3", OleDbType.VarWChar)
            .Parameters.Add("4", OleDbType.VarWChar)
            .CommandType = CommandType.Text
        End With
        con.Open()
        For Each dr As DataRow In dt.Rows
            cmd.Parameters(0).Value = dr(0)
            cmd.Parameters(1).Value = dr(1)
            cmd.Parameters(2).Value = dr(2)
            cmd.Parameters(3).Value = dr(3)
            cmd.Parameters(4).Value = dr(5)
            cmd.ExecuteNonQuery()
        Next
        con.Close()
    End Sub

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        GetTable()
    End Sub

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        AmendTable()
        SaveTable()
    End Sub

    Private Sub GetTable()
        con = New OleDbConnection(sourceConStr)
        da = New OleDbDataAdapter(sourceSQL, con)
        da.Fill(dt)
    End Sub

    Private Sub AmendTable()
        Dim dc As New DataColumn
        dc.ColumnName = "Col5"
        dc.DefaultValue = ""
        dc.DataType = GetType(String)
        dt.Columns.Add(dc)
        For Each dr As DataRow In dt.Rows
            If Not IsDBNull(dr("Col4")) Then
                Select Case dr("Col4")
                    Case 0
                        dr("Col5") = "First"
                    Case 1
                        dr("Col5") = "Second"
                    Case 2
                        dr("Col5") = "Third"
                End Select

            End If

        Next
    End Sub

End Class

Here's what OutputRecord.csv looked like after that.

Col0,Col1,Col2,Col3,Col4
"Col0;Row0","1","Col2;Row0","Col3;Row0","Second"
"Col0;Row1","2","Col2;Row1","Col3;Row1","First"
"Col0;Row2","3","Col2;Row2","Col3;Row2","Third"
"Col0;Row3","4","Col2;Row3","Col3;Row3","First"
"Col0;Row4","5","Col2;Row4","Col3;Row4","Third"
"Col0;Row5","6","Col2;Row5","Col3;Row5","Second"
"Col0;Row6","7","Col2;Row6","Col3;Row6",""
"Col0;Row7","8","Col2;Row7","Col3;Row7",""
"Col0;Row8","9","Col2;Row8","Col3;Row8","Second"

I did that before your latest posting, so I see that the assumption that both input and output were .csv was wrong.  And - I knew this anyway - the coding for the AmendTable sub would need to be considerably more complicated than my simple example.  But the crucial point is that by making use of the Jet OLEDB engine's ability to read and write delimited files as though they were tables in databases you should be able to make the (minimal) changes you want within datatables in VB.NET and so avoid the time (and resource) consuming building of strings.

Roger
Avatar of gr8life

ASKER

Here is the code involved:
Thank you,
Gr8life

Imports System.Data.Common
Imports System.Data.Oledb
Imports System.IO
Imports System.Text
Imports System.Data
Imports System.Data.Odbc

Private ds As New DataSet
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Try
            Dim inFile As String
            Dim outFile As String
            Dim openFileDialog1 As New OpenFileDialog
            openFileDialog1.InitialDirectory = "c:\resource"
            openFileDialog1.Filter = "txt files (*.txt|*.txt|All files(*.*)|*.*"
            openFileDialog1.FilterIndex = 2
            openFileDialog1.RestoreDirectory = True

            If openFileDialog1.ShowDialog() = DialogResult.OK Then
                inFile = openFileDialog1.FileName
                outFile = Mid(inFile, 1, inFile.LastIndexOf("."))
                outFile += "-out.CSV"
            End If

            If (inFile.Length > 0) Then
                If (outFile.Length > 0) Then
                    CountryDatatable() 'Here is where I am calling function
                    ConvertFiles(inFile, outFile)                  
                    MsgBox("File Complete!")
                Else
                    MsgBox("No output file specified!")
                End If
            Else
                MsgBox("No input file specified!")
            End If
        Catch ex As Exception
            MsgBox("Error in Button1_Click :" & ex.message)
        End Try
    End Sub

    Public Sub ConvertFiles(ByVal filein As String, ByVal fileout As String)
        Try
            If (IO.File.Exists(filein)) Then
                'Dim sin As New IO.StreamReader(filein) 'was commented out
                Dim sout As New IO.StreamWriter(fileout, False)
                sout.AutoFlush = False ' output file is NOT updated after every WriteLine() call
                Dim lineCounter As Integer
                Dim items() As String
                Dim sOutLine As String
                Dim dtInput As DataTable
                dtInput = GetDataFromCSVFile(filein)
                dtInput.Columns.Add(New DataColumn("Lookup1"))
                dtInput.Columns.Add(New DataColumn("Lookup2"))
                For Each dr As DataRow In dtInput.Rows
                    If Not IsDBNull(dr.Item(3)) Then
                        dr.Item("Lookup1") = IPLookup(dr.Item(3))
                    Else
                        dr.Item("Lookup1") = ""
                    End If
                    If Not IsDBNull(dr.Item(5)) Then
                        dr.Item("Lookup2") = IPLookup(dr.Item(5))
                    Else
                        dr.Item("Lookup2") = ""
                    End If
                    sOutLine = ""
                    For iCnt As Integer = 0 To 15
                        If IsDBNull(dr.Item(iCnt)) Then
                            sOutLine &= ","
                        Else
                            sOutLine &= dr.Item(iCnt) & ","
                        End If
                        If iCnt = 3 Then
                            sOutLine &= dr.Item("Lookup1") & ","
                        ElseIf iCnt = 5 Then
                            sOutLine &= dr.Item("Lookup2") & ","
                        End If
                    Next
                    sout.WriteLine(sOutLine)
                    lineCounter = lineCounter + 1
                    If lineCounter Mod 10000 = 0 Then
                        sout.Flush()
                    End If
                Next
                'sin.Close() 'was commented out
                sout.Flush()
                sout.Close()
            End If
        Catch ex As Exception
            MsgBox("Problem Occurred" & ex.Message)
        End Try
    End Sub

    Public Function Dot2LongIP(ByVal DottedIP As String) As Long
        Dim arrDec() As String
        Dim i As Integer
        Dim intResult As Long
        If DottedIP = "" Then
            Return 0
        Else
            arrDec = DottedIP.Split(".")
            For i = arrDec.Length - 1 To 0 Step -1
                intResult = intResult + ((Int(arrDec(i)) Mod 256) * Math.Pow(256, 3 - i))
            Next
            Return intResult
        End If
    End Function

    Public Sub CountryDatatable()
        Dim cmdSelect As New OleDbCommand
        Dim dtm As DataTableMapping
        Dim da As New OleDbDataAdapter
        Dim dt As New DataTable
        Dim con As New OleDbConnection

        With da
            .SelectCommand = New OleDbCommand
            With .SelectCommand
                .Connection = New OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source = C:\resource\master.mdb")
                .CommandText = "SELECT * from Country order by ipl1 , ipl2"
            End With

            .TableMappings.Add("Table", "Country")
            .AcceptChangesDuringFill = True
        End With

        Try
            da.Fill(ds)
        Catch ex As Exception
            MessageBox.Show("Unable to load Country data: " + ex.Message)
        End Try
    End Sub

    Public Function IPLookup(ByVal DottedIP As String) As String
        Dim lngIP As Long
        Dim strCountry As String
        Dim foundrows() As DataRow
        Dim sql As String

        lngIP = Dot2LongIP(DottedIP)

        sql = lngIP.ToString & ">=ipl1 and " & lngIP.ToString & " <= ipl2"
        foundrows = ds.Tables("Country").Select(sql)

        If foundrows.Length = 0 Then
            strCountry = "unknown"
        Else
            strCountry = foundrows(0).Item("countryname")
        End If
        Return strCountry
    End Function

    Private Function GetDataFromCSVFile(ByVal CSVFileName As String) As DataTable

        Dim strConnectText As String
        Dim myOdbcConnection As OdbcConnection = Nothing
        Dim myOdbcCommand As OdbcCommand = Nothing
        Dim daData As OdbcDataAdapter = Nothing
        Dim dsData As New DataSet
        Try

            '   Create Schema.ini file in the same directory where CSV data file exists, to enable to read data from first row.
            CreateSchemaIni(System.IO.Path.GetFileName(CSVFileName), System.IO.Path.GetDirectoryName(CSVFileName))
            '   ODBC Connection details to read data from csv files
            strConnectText = "Driver={Microsoft Text Driver (*.txt; *.csv)};DRIVERID=27;MAXSCANROWS=1;"
            strConnectText &= "DefaultDir=" & System.IO.Path.GetDirectoryName(CSVFileName) & ";"
            strConnectText &= "DBQ=" & System.IO.Path.GetDirectoryName(CSVFileName) & ";"
            strConnectText &= "UserCommitSync=Yes;FIL=text;UID=admin;MaxBufferSize=2048;Threads=3;SafeTransactions=0;COLNAMEHEADER=False;"

            '   Create new ODBC Connection to read data from CSV
            myOdbcConnection = New OdbcConnection(strConnectText)
            myOdbcCommand = New OdbcCommand("SELECT * FROM `" & System.IO.Path.GetFileName(CSVFileName) & "`", myOdbcConnection)
            daData = New OdbcDataAdapter(myOdbcCommand)
            '   Get data from CSV to dataset
            daData.Fill(dsData, "tab1")
            Return dsData.Tables(0)

        Catch ex As Exception
            Throw ex
        Finally
            If Not myOdbcCommand Is Nothing Then
                myOdbcCommand.Dispose()
                myOdbcCommand = Nothing
            End If
            If Not myOdbcConnection Is Nothing Then
                myOdbcConnection.Close()
                myOdbcConnection.Dispose()
                myOdbcConnection = Nothing
            End If
            If Not daData Is Nothing Then
                daData.Dispose()
                daData = Nothing
            End If
            If Not dsData Is Nothing Then
                dsData.Dispose()
                dsData = Nothing
            End If
        End Try

    End Function

    Private Sub CreateSchemaIni(ByVal CSVFileName As String, ByVal CSVPath As String)
        Dim oSw As StreamWriter = Nothing
        Dim strFileName As String
        Dim intFldCnt As Integer

        Try
            strFileName = CSVPath
            If Not strFileName.EndsWith("\") Then
                strFileName &= "\"
            End If
            strFileName &= "schema.ini"

            oSw = New StreamWriter(strFileName, False, Encoding.GetEncoding("Shift_JIS"))

            oSw.WriteLine("[" & CSVFileName & "]")
            oSw.WriteLine("ColNameHeader = False")
            oSw.WriteLine("Format = TabDelimited")

            intFldCnt = 16

            For intCnt As Integer = 1 To intFldCnt
                oSw.WriteLine("Col" & intCnt.ToString.Trim & " =F" & intCnt.ToString.Trim & " Text ")
            Next


        Catch ex As Exception

        Finally
            If Not oSw Is Nothing Then
                oSw.Close()
            End If
        End Try

    End Sub
End Class
Avatar of gr8life

ASKER

I really don't have the output to be CSV.  I am fine with the format staying vbtab.
Gr8life
Avatar of gr8life

ASKER

1) How many files are you working with?

The application is being used daily to process large sized files, (100 MB-2 GB), but only one file at a time is required.

2) .NET version?

I have access to the Microsoft Development Environment 2003, with the MS .NET 1.1 Framework.

3) Where is the crucial section (what takes up the most time)?

The latency appears to be during the convert files portion, because it has to convert an IP to long then find the country in the database.

Thank you,
Gr8life
ASKER CERTIFIED SOLUTION
Avatar of Bob Learned
Bob Learned
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of gr8life

ASKER

TheLearnedOne,
Other than guess how do I narrow down the time consuming processes?  Sorry if this question sounds newbie... I'm a newbie so I guess I can't help it.
Thank you for your time,
Gr8life
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of gr8life

ASKER

WOW! Now that is what I call expert support!  I am going away on a short business trip and will be back Thursday. I will not have internet access until then, so I am going to leave this question open until my return.  Thank you all so much for your time and even though I haven't had very much time to work with the code I really appreciate all your input.  I just wish I could give out more points.
Thank you for your expertise,
Gr8life
Avatar of gr8life

ASKER

Thank you for all the expert advice.  
Gr8life