Link to home
Create AccountLog in
Avatar of gr8life
gr8life

asked on

Possible Performance Improvements

I am currently using the code below.  The performance is very slow and I am looking for suggestions on how to improve it.  This application is currently being used to process between 100 meg and 2 gig of data.

Thank you very much for your time and expertise,
Gr8life

Imports System.Data.Common
Imports System.Data.Oledb
Imports System.IO
Imports System.Text

Public Class Form1

    Private ds As New DataSet

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Try
            Dim inFile As String
            Dim outFile As String
            Dim openFileDialog1 As New OpenFileDialog
            openFileDialog1.InitialDirectory = "c:\resource"
            openFileDialog1.Filter = "txt files (*.txt|*.txt|All files(*.*)|*.*"
            openFileDialog1.FilterIndex = 2
            openFileDialog1.RestoreDirectory = True

            If openFileDialog1.ShowDialog() = DialogResult.OK Then
                inFile = openFileDialog1.FileName
                outFile = Mid(inFile, 1, inFile.LastIndexOf("."))
                outFile += "-out.CSV"
            End If

            If (inFile.Length > 0) Then
                If (outFile.Length > 0) Then
                    CountryDatatable() 'Here is where I am calling function
                    ConvertFiles(inFile, outFile)
                    'Wave.Play("C:\resource\sounds\completed.wav")
                    MsgBox("File Complete!")
                Else
                    MsgBox("No output file specified!")
                End If
            Else
                MsgBox("No input file specified!")
            End If
        Catch ex As Exception
            MsgBox("Error in Button1_Click :" & ex.message)
        End Try
    End Sub

    Public Sub ConvertFiles(ByVal filein As String, ByVal fileout As String)
        Dim sb As New StringBuilder   'I will be trying other values later   (100) was removed after stringbuilder    
        Try
            If (IO.File.Exists(filein)) Then
                Dim sin As New IO.StreamReader(filein)
                Dim items() As String
                While True
                    Dim readline As String = sin.ReadLine
                    If (IsNothing(readline)) Then Exit While

                    items = readline.Split(vbTab)
                    items(3) = items(3) & "," & IPLookup(items(3))
                    items(5) = items(5) & "," & IPLookup(items(5))
                    sb.Append(items(3).ToString & "," & items(5).ToString & vbCrLf)

                End While
                sin.Close()

                Dim sout As New IO.StreamWriter(fileout, False)
                sout.WriteLine(String.Join(",", items)) 'Name 'items' not declared
                sout.Flush()
                sout.Close()
            End If
        Catch ex As Exception
            MsgBox("Problem Occurred" & ex.Message)
        End Try
    End Sub

    Public Function Dot2LongIP(ByVal DottedIP As String) As Long
        Dim arrDec() As String
        Dim i As Integer
        Dim intResult As Long
        If DottedIP = "" Then
            Return 0
        Else
            arrDec = DottedIP.Split(".")
            For i = arrDec.Length - 1 To 0 Step -1
                intResult = intResult + ((Int(arrDec(i)) Mod 256) * Math.Pow(256, 3 - i))
            Next
            Return intResult
        End If
    End Function

    Public Sub CountryDatatable()
        Dim cmdSelect As New OleDbCommand
        Dim dtm As DataTableMapping
        Dim da As New OleDbDataAdapter
        Dim dt As New DataTable
        Dim con As New OleDbConnection

        With da
            .SelectCommand = New OleDbCommand
            With .SelectCommand
                .Connection = New OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source = C:\resource\master.mdb")
                .CommandText = "SELECT * from Country"
            End With

            .TableMappings.Add("Table", "Country")
            .AcceptChangesDuringFill = True
        End With

        Try
            da.Fill(ds)
        Catch ex As Exception
            MessageBox.Show("Unable to load Country data: " + ex.Message)
        End Try
    End Sub

    Public Function IPLookup(ByVal DottedIP As String) As String
        Dim lngIP As Long
        Dim strCountry As String
        Dim foundrows() As DataRow
        Dim sql As String

        lngIP = Dot2LongIP(DottedIP)

        sql = lngIP.ToString & ">=ipl1 and " & lngIP.ToString & " <= ipl2"
        foundrows = ds.Tables("Country").Select(sql)

        If foundrows.Length = 0 Then
            strCountry = "unknown"
        Else
            strCountry = foundrows(0).Item("countryname")
        End If
        Return strCountry
    End Function
End Class


Avatar of appari
appari
Flag of India image


seems OK to me.

check these points.
1. in ConvertFiles sub you have "sb As New StringBuilder   " declared and appending some data to this sb. but its not used anywhere else in the sub. if it is not used delete it.

2. in CountryDatatable sub change the sql "SELECT * from Country" to "SELECT * from Country order by ipl1 , ipl2 "


3. in sout (output file ) you want only one record? its outside while loop so it writes the last record only.
Avatar of gr8life
gr8life

ASKER

The above code is being used to resolve IP addresses to country names.  

Application Process:
Reads a VbTab delimited text file
Load a table from a database into memory
Convert the dotted decimal address to its long equivalency
Compare the long address against a range of addresses
Write the match to a new CSV delimited file

Sorry, I thought this part was included when I included my original question.  What I’m looking for is suggestions for improvements.  Anything that you can see as a performance improvement will greatly benefit me.

Thanks for taking the time to read my post,
Gr8life

does your code generate intended output?
i think the code you posted outputs file with only one record.
Avatar of gr8life

ASKER

I made the changes you suggested and the application is still very slow. I'm new to programming VB.Net so please excuse my ignorance, but it seams to me that this process should be able to completed in a relatively short time.  Is my whole approach fundamentally wrong? Please help!!!!

Thanks for your time and valuable expertise,
Gr8life

Modified code:    
Public Sub ConvertFiles(ByVal filein As String, ByVal fileout As String)
        Try
            If (IO.File.Exists(filein)) Then
                Dim sin As New IO.StreamReader(filein)
                Dim sout As New IO.StreamWriter(fileout, False)
                sout.AutoFlush = False ' output file is NOT updated after every WriteLine() call
                Dim lineCounter As Integer
                Dim items() As String
                Dim readline As String = sin.ReadLine
                While Not IsNothing(readline)
                    items = readline.Split(vbTab)
                    items(3) = items(3) & "," & IPLookup(items(3))
                    items(5) = items(5) & "," & IPLookup(items(5))
                    sout.WriteLine(String.Join(",", items))

                    ' only write to the file every 10000 lines
                    lineCounter = lineCounter + 1
                    If lineCounter Mod 10000 = 0 Then
                        sout.Flush()
                    End If
                    readline = sin.ReadLine
                End While
                sin.Close()
                sout.Flush()
                sout.Close()
            End If
        Catch ex As Exception
            MsgBox("Problem Occurred" & ex.Message)
        End Try
    End Sub

logic seems ok to me. if data is large i think there is no other way, but to wait till the process completes.
you can try the part reading of input file. instead of using streamreader and reading line by line you can use odbc and read from csv file to get the whole data at once into a datatable and use that datatable to process the data. if you want to try this way i can post the code.
Avatar of gr8life

ASKER

When I made the above changes I now am receiving an error message:
"Problem OccurredIndex was outside the bounds of the array."
Once I close the error message box, the message box I have in my code pops up and displays "File Complete".
I checked the output file and all the data was processed correctly. Not what to do.... really frustrated with this.  I would greatly appreciate it if you would post your code.

Thank you very much for your time,
Gr8life
>>Problem OccurredIndex was outside the bounds of the array
where are you getting this error?
Avatar of gr8life

ASKER

I'm not sure. I started processing some data with the debugger turned on.
Thanks,
Gr8life

BEFORE CHANGING TAKE BACKUP OF YOUR EXISTING CODE, IF NOT USING SOURCESAFE

add the following two functions/subs to your code

Private Function GetDataFromCSVFile(ByVal CSVFileName As String) As DataTable

        Dim strConnectText As String
        Dim myOdbcConnection As OdbcConnection = Nothing
        Dim myOdbcCommand As OdbcCommand = Nothing
        Dim daData As OdbcDataAdapter = Nothing
        Dim dsData As New DataSet
        Try


                '   Create Schema.ini file in the same directory where CSV data file exists, to enable to read data from first row.
                CreateSchemaIni(System.IO.Path.GetFileName(CSVFileName), System.IO.Path.GetDirectoryName(CSVFileName), curFileType)
                '   ODBC Connection details to read data from csv files
                strConnectText = "Driver={Microsoft Text Driver (*.txt; *.csv)};DRIVERID=27;MAXSCANROWS=1;"
                strConnectText &= "DefaultDir=" & System.IO.Path.GetDirectoryName(CSVFileName) & ";"
                strConnectText &= "DBQ=" & System.IO.Path.GetDirectoryName(CSVFileName) & ";"
                strConnectText &= "UserCommitSync=Yes;FIL=text;UID=admin;MaxBufferSize=2048;Threads=3;SafeTransactions=0;COLNAMEHEADER=False;"

                '   Create new ODBC Connection to read data from CSV
                myOdbcConnection = New OdbcConnection(strConnectText)
                myOdbcCommand = New OdbcCommand("SELECT * FROM `" & System.IO.Path.GetFileName(CSVFileName) & "`", myOdbcConnection)
                daData = New OdbcDataAdapter(myOdbcCommand)
                '   Get data from CSV to dataset
                daData.Fill(dsData, "tab1")
                Return dsData.Tables(0)

        Catch ex As Exception
            Throw ex
        Finally
            If Not myOdbcCommand Is Nothing Then
                myOdbcCommand.Dispose()
                myOdbcCommand = Nothing
            End If
            If Not myOdbcConnection Is Nothing Then
                myOdbcConnection.Close()
                myOdbcConnection.Dispose()
                myOdbcConnection = Nothing
            End If
            If Not daData Is Nothing Then
                daData.Dispose()
                daData = Nothing
            End If
            If Not dsData Is Nothing Then
                dsData.Dispose()
                dsData = Nothing
            End If
        End Try

end function


    Private Sub CreateSchemaIni(ByVal CSVFileName As String, ByVal CSVPath As String, ByVal AddColumnsType As HeaderType)
        Dim oSw As StreamWriter = Nothing
        Dim strFileName As String
        Dim intFldCnt As Integer

        Try
            strFileName = CSVPath
            If Not strFileName.EndsWith("\") Then
                strFileName &= "\"
            End If
            strFileName &= "schema.ini"

            oSw = New StreamWriter(strFileName, False, Encoding.GetEncoding("Shift_JIS"))

            oSw.WriteLine("[" & CSVFileName & "]")
            oSw.WriteLine("ColNameHeader = False")
            oSw.WriteLine("Format = TabDelimited")
         
            intFldCnt = 5

            For intCnt As Integer = 1 To intFldCnt
                oSw.WriteLine("Col" & intCnt.ToString.Trim & " =F" & intCnt.ToString.Trim & " Text ")
            Next


        Catch ex As Exception

        Finally
            If Not oSw Is Nothing Then
                oSw.Close()
            End If
        End Try

    End Sub


And change your ConvertFiles as follows

    Public Sub ConvertFiles(ByVal filein As String, ByVal fileout As String)
        Try
            If (IO.File.Exists(filein)) Then
                Dim sin As New IO.StreamReader(filein)
                Dim sout As New IO.StreamWriter(fileout, False)
                sout.AutoFlush = False ' output file is NOT updated after every WriteLine() call
                Dim lineCounter As Integer
                Dim items() As String
                Dim sOutLine As String

                Dim dtInput As DataTable
                dtInput = GetDataFromCSVFile(filein)

                dtInput.Columns.Add(New DataColumn("Lookup1"))
                dtInput.Columns.Add(New DataColumn("Lookup2"))

                For Each dr As datarow In dtInput.rows
                    dr.Item("Lookup1") = iplookup(dr.Item(3))
                    dr.Item("Lookup2") = iplookup(dr.Item(5))

                    sOutLine = ""
                    For iCnt As Integer = 0 To 5
                        sOutLine &= dr.Item(iCnt) & ","
                        If iCnt = 3 Then
                            sOutLine &= dr.Item("Lookup1") & ","
                        End If
                    Next
                    sOutLine &= dr.Item("Lookup2") & ","

                    sout.WriteLine(sOutLine)

                    lineCounter = lineCounter + 1
                    If lineCounter Mod 10000 = 0 Then
                        sout.Flush()
                    End If

                Next

                sin.Close()
                sout.Flush()
                sout.Close()
            End If
        Catch ex As Exception
            MsgBox("Problem Occurred" & ex.Message)
        End Try
    End Sub
Avatar of gr8life

ASKER

I wasn't able to find any indication of what was causing the error by debugging this way.  Is there another way to find the problem?  Also I'm not sure if this is of value or not but I tried only working with a small data set 11 rows and didn't get the error message.  Then I tried a data set with 1000 rows and I got the error message. Not sure!
Thanks,
Gr8life
do you have blank lines in the input file?

change your code like this and try

             While Not IsNothing(readline)
                  if readline.trim.length<>0 then
                    items = readline.Split(vbTab)
                    items(3) = items(3) & "," & IPLookup(items(3))
                    items(5) = items(5) & "," & IPLookup(items(5))
                    sout.WriteLine(String.Join(",", items))

                    ' only write to the file every 10000 lines
                    lineCounter = lineCounter + 1
                    If lineCounter Mod 10000 = 0 Then
                        sout.Flush()
                    End If
                 end if
                    readline = sin.ReadLine
                End While

and in my previous code ( the one to use datatable logic)

change

 Private Sub CreateSchemaIni(ByVal CSVFileName As String, ByVal CSVPath As String, ByVal AddColumnsType As HeaderType)

to

 Private Sub CreateSchemaIni(ByVal CSVFileName As String, ByVal CSVPath As String)

and remove that extra parameter while calling. i just copied from my existing project and forgot to remove the parameter.
Avatar of gr8life

ASKER

I added the new code and in the private function GetdatafromCSVFile I have underlined errors for:
OdbcConnection
OdbcCommand
OdbcAdapter
all errors indicate the above as type not defined

Also getting curFileType is not declared

Then in the Private Sub CreateSchemaIni
there is an underlined error Headertype is not defined

Any suggestions?

Also I was attempting to understand your code and wasn't sure if this solution is going to work because my source input data is VbTab delimited not CSV.

Thank you very much for your time,
Gr8life
ok do the following

1. add the following imports at the begining of your code

Imports System.Data
Imports System.IO
Imports System.Text
Imports System.Data.Odbc

2. remove the param ByVal AddColumnsType As HeaderType from createschemaini

3. change this
 CreateSchemaIni(System.IO.Path.GetFileName(CSVFileName), System.IO.Path.GetDirectoryName(CSVFileName), curFileType)
to
 CreateSchemaIni(System.IO.Path.GetFileName(CSVFileName), System.IO.Path.GetDirectoryName(CSVFileName))

this code works for tabdelimited files
Avatar of gr8life

ASKER

I made the changes and got an error message:
Problem OccrurredError [HY000][Microsoft][ODBC Text Driver] The microsoft Jet database engine cannot open the file '(unknown). It is already opened exclusively by another user, or you need permission to view its data.

Any suggestions?

Thanks for the help,
Gr8life

comment out these lines

Dim sin As New IO.StreamReader(filein)

and

sin.close()
Avatar of gr8life

ASKER

I made the changes and got this error:
Problem OccuredCast from string "United States" to type 'Double' is not valid.
I checked the Access database table properties and the countryname column data type is text.  I'm not sureif this is what is causing the error or not.
Thanks,
Gr8life

can you run it in debug mode and find out on which line the error is coming?

i suspect its coming on this line
                    dr.Item("Lookup1") = iplookup(dr.Item(3))

if it is on the above line try changing

dtInput.Columns.Add(New DataColumn("Lookup1"))
dtInput.Columns.Add(New DataColumn("Lookup2"))

to

 dtInput.Columns.Add(New DataColumn("Lookup1", System.Type.GetType("System.String")))
 dtInput.Columns.Add(New DataColumn("Lookup2", System.Type.GetType("System.String")))
Avatar of gr8life

ASKER

The program '[384] ....' has exited with code 0 (0x0)

Is this what you needed?
thanks,
Gr8life
Avatar of gr8life

ASKER

same error ocurred.

Thanks,
Gr8life
Avatar of gr8life

ASKER

This time the debug was
the program '[1196] ...

what i am asking is on which line you are getting this error?
Avatar of gr8life

ASKER

I'm sorry I don't know how to determine that.  Can you briefly explain how to do it?
Thanks,
Gr8life
ok i checked here with sample data
the problem is in CreateSchemaIni sub

in CreateSchemaIni sub change
intFldCnt = 5
to

intFldCnt = 6 ' Here i dont know how many columns exists in input file change this to number of columns in the input file
Avatar of gr8life

ASKER

I made the recomended change and tried it on a small data set.  It worked great.  I then tried it on a data set with 1k lines and I got a memory stack error which I believe is caused by the fact I'm at home and I'm running .NET on a older laptop.  I am going to try the code out when I get to work PC where limitted resources is not a problem.
Thank you for all of your help!!! I really appreciate it.

Gr8life
Avatar of gr8life

ASKER

I tried the code on my work pc and it completed without errors however the new data set that is created it incomplete.  The rest of each row is not added to the output file. The output file should have a total of 16 columns and now it only has 8.  

Any suggestions?
Thanks,
Gr8life


in CreateSchemaIni sub change
intFldCnt = 5
to

intFldCnt = 16 ' Here i dont know how many columns exists in input file change this to number of columns in the input file
ASKER CERTIFIED SOLUTION
Avatar of appari
appari
Flag of India image

Link to home
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
See answer
Avatar of gr8life

ASKER

I made the changes the changes you posted here and got some error messages.  I remembered to comment out the two lines dealing with the sin… That worked. I then tried the code on a very small data set 11 lines and It worked great and the missing data was there. Thank you.  Then I tried it on a data set with 1,000 lines and it wrote past the last IP address in line 983, by five fields and then it popped an error.  Problem OccurredCast from type ‘DBNull’ to type ‘String’ is not valid. Then the message box File Complete.  I’m not sure what is causing this, however I did some research and it appears to be related to null values in the database.  Does that make sense to you?
Thank you for all the help you have provided me,
Gr8life

so is there any difference in performance compared to old one?
Avatar of gr8life

ASKER

It is much faster than the old one.  Very, very happy about that.  If I could just get the error fixed I can go to sleep.  Been up over two days+.
Gr8life
Avatar of gr8life

ASKER

Do you have any suggestions?
Avatar of gr8life

ASKER

If you do post them on:
https://www.experts-exchange.com/questions/21794685/Problem-OccurredCast-from-type-DBNull-to-type-String-is-not-valid.html
If not thank you, thank you, thank you, thank you, thank you!!!!

I truly appreciate the time and effort you put forth in helping me,
Gr8life
i think that error is coming here

For iCnt As Integer = 0 To 15
                        sOutLine &= dr.Item(iCnt) & ","
                        If iCnt = 3 Then
                            sOutLine &= dr.Item("Lookup1") & ","
                        else If iCnt = 5 Then
                            sOutLine &= dr.Item("Lookup2") & ","
                        End If
                    Next


try changing this to

                For iCnt As Integer = 0 To 15
                        If IsDBNull(dr.Item(iCnt) ) Then
                               sOutLine &=  ","
                        else
                               sOutLine &= dr.Item(iCnt) & ","
                        end if
                        If iCnt = 3 Then
                            sOutLine &= dr.Item("Lookup1") & ","
                        else If iCnt = 5 Then
                            sOutLine &= dr.Item("Lookup2") & ","
                        End If
                    Next

shall i post this in new question:)
Avatar of gr8life

ASKER

Yes, please continue this in the new question.  I wanted to give you more points for helping me so much.  I really appreciate the help and wanted to show it.
Thanks,
Gr8life