Diammond


Visual Basic code to search and replace within a large data file

Hello. A few months ago an expert helped me develop Visual Basic code to perform a search and replace in a data file. At the time the file was about 100 MB. The file has grown to over 200 MB and the program no longer works. Is there a way to modify the code so that it won't bomb on files over 200 MB? Any help would be greatly appreciated.

Thank you
Diammond
Imports System.Text.RegularExpressions
Module Module1
    Public strNewPhoneNumber As String = """2126420001"""
    Public intRepCount As Integer = 0
    Sub Main()
        If IO.File.Exists("c:\reports\replaced.log") Then
            IO.File.Delete("c:\reports\replaced.log")
        End If
        Dim srDataFile As New System.IO.StreamReader("c:\reports\data.txt")
        Dim strDataFile As String = srDataFile.ReadToEnd
        srDataFile.Close()
 
        Dim srPhoneNumbers As New System.IO.StreamReader("c:\realPhoneNumbers.txt")
        Dim strPhoneNumbers As String = srPhoneNumbers.ReadToEnd
        srPhoneNumbers.Close()
 
        Dim reNumbers As Regex = New Regex("(?<=#PhoneNumber\r\n"")\d+(?="")")
        Dim mcNumbers As MatchCollection = reNumbers.Matches(strPhoneNumbers)
        Dim strPhoneNumbersToExclude As String = ""
        For Each mNumber As Match In mcNumbers
            strPhoneNumbersToExclude = strPhoneNumbersToExclude & mNumber.Groups(0).Value & "|"
        Next
        strPhoneNumbersToExclude = Left(strPhoneNumbersToExclude, Len(strPhoneNumbersToExclude) - 1)
 
        Dim matchpattern As String = "(?<=#PhoneNumber\r\n)""(?:(?!" & strPhoneNumbersToExclude & ")\d+)"""
        Dim myEval As MatchEvaluator = New MatchEvaluator(AddressOf ReplaceMatch)
 
        Dim swNewDataFile As New System.IO.StreamWriter("c:\reports\data.txt")
        swNewDataFile.Write(Regex.Replace(strDataFile, matchpattern, myEval))
        swNewDataFile.Close()
        Console.WriteLine(intRepCount & " replacements, finished at " & Now())
    End Sub
    Public Function ReplaceMatch(ByVal m As Match) As String
        Dim swLogFile As New System.IO.StreamWriter("c:\reports\replaced.log", True)
        Dim strLogLine As String = "Replaced " & m.Groups(0).Value & " with " & strNewPhoneNumber & " " & Now()
        Console.WriteLine(strLogLine)
        swLogFile.WriteLine(strLogLine)
        swLogFile.Close()
        intRepCount = intRepCount + 1
        Return strNewPhoneNumber
    End Function
End Module


Arthur_Wood

Here is your problem:

Dim strDataFile As String = srDataFile.ReadToEnd

You are reading the ENTIRE file into memory (into a single String object, by the way) in one fell swoop.

You can break this down so that it reads and processes one line at a time. It may be slow, but it will not choke.

Simply wrap the processing in a loop that reads one line from the input file, processes that line, writes the processed line to the output file, and then reads the next line, until you have processed the entire original file.

If you need help, just ask.

AW
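
The line-by-line loop described above might look roughly like the sketch below. One caveat: the original pattern spans two lines (a `#PhoneNumber` line, then the quoted number on the next line), so a per-line loop has to remember whether the previous line was the tag. This is only an illustration, not the poster's code. `ReplaceNumbers`, `keep`, and `previousWasTag` are hypothetical names, and the phone numbers in `Main` are made-up sample data. It assumes the tag ends its own line and the quoted number fills the whole next line. It also treats the keep list as exact matches, whereas the original lookahead excluded any number that merely begins with a listed one. `HashSet(Of String)` assumes .NET 3.5 or later.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions

Module StreamingReplaceSketch
    ' Copies reader to writer one line at a time. A quoted, all-digit line that
    ' directly follows a "#PhoneNumber" line is replaced unless it is in keep.
    ' Returns the number of replacements made.
    Function ReplaceNumbers(ByVal reader As TextReader, ByVal writer As TextWriter, _
                            ByVal keep As HashSet(Of String), ByVal newNumber As String) As Integer
        Dim quotedDigits As New Regex("^""(\d+)""$")
        Dim count As Integer = 0
        Dim previousWasTag As Boolean = False
        Dim currentLine As String = reader.ReadLine()
        Do While currentLine IsNot Nothing
            If previousWasTag Then
                Dim m As Match = quotedDigits.Match(currentLine)
                If m.Success AndAlso Not keep.Contains(m.Groups(1).Value) Then
                    currentLine = newNumber
                    count += 1
                End If
            End If
            ' Remember the tag so the next iteration knows to inspect its line.
            previousWasTag = currentLine.EndsWith("#PhoneNumber", StringComparison.Ordinal)
            writer.WriteLine(currentLine)
            currentLine = reader.ReadLine()
        Loop
        Return count
    End Function

    Sub Main()
        ' In-memory demo with made-up numbers; real use would pass a StreamReader
        ' on the data file and a StreamWriter on a different output file.
        Dim keep As New HashSet(Of String)()
        keep.Add("2125550000")
        Dim sample As String = String.Join(Environment.NewLine, New String() { _
            "#PhoneNumber", """9995551234""", "#PhoneNumber", """2125550000"""})
        Dim output As New StringWriter()
        Dim replaced As Integer = ReplaceNumbers(New StringReader(sample), output, keep, """2126420001""")
        Console.WriteLine(replaced & " replacement(s)")
        Console.Write(output.ToString())
    End Sub
End Module
```

Only one line is held in memory at a time, so memory use should not grow with the size of the file. The output has to go to a different file than the one being read.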
Diammond

ASKER

Thanks for your response.  Here is my revised code.  However it is not working. Any suggestions?

Imports System.Text.RegularExpressions
Imports System.IO
Imports System.Collections

Module Module1
    Public strNewPhoneNumber As String = """2126420001"""
    Public intRepCount As Integer = 0
    Sub Main()
        If IO.File.Exists("c:\reports\replaced.log") Then
            IO.File.Delete("c:\reports\replaced.log")
        End If
        Dim objReader As New StreamReader("c:\reports\data.txt")
        Dim strDataFile As String = ""
        Dim arrText As New ArrayList()

        Do
            strDataFile = objReader.ReadLine()
            If Not strDataFile Is Nothing Then
                arrText.Add(strDataFile)
            End If
        Loop Until strDataFile Is Nothing
        objReader.Close()
 
        Dim srPhoneNumbers As New System.IO.StreamReader("c:\realPhoneNumbers.txt")
        Dim strPhoneNumbers As String = srPhoneNumbers.ReadToEnd
        ObjReader.Close()
 
        Dim reNumbers As Regex = New Regex("(?<=#PhoneNumber\r\n"")\d+(?="")")
        Dim mcNumbers As MatchCollection = reNumbers.Matches(strPhoneNumbers)
        Dim strPhoneNumbersToExclude As String = ""
        For Each mNumber As Match In mcNumbers
            strPhoneNumbersToExclude = strPhoneNumbersToExclude & mNumber.Groups(0).Value & "|"
        Next
        strPhoneNumbersToExclude = Left(strPhoneNumbersToExclude, Len(strPhoneNumbersToExclude) - 1)
 
        Dim matchpattern As String = "(?<=#PhoneNumber\r\n)""(?:(?!" & strPhoneNumbersToExclude & ")\d+)"""
        Dim myEval As MatchEvaluator = New MatchEvaluator(AddressOf ReplaceMatch)
 
        Dim swNewDataFile As New System.IO.StreamWriter("c:\reports\data.txt")
        swNewDataFile.Write(Regex.Replace(strDataFile, matchpattern, myEval))
        swNewDataFile.Close()
        Console.WriteLine(intRepCount & " replacements, finished at " & Now())
    End Sub
    Public Function ReplaceMatch(ByVal m As Match) As String
        Dim swLogFile As New System.IO.StreamWriter("c:\reports\replaced.log", True)
        Dim strLogLine As String = "Replaced " & m.Groups(0).Value & " with " & strNewPhoneNumber & " " & Now()
        Console.WriteLine(strLogLine)
        swLogFile.WriteLine(strLogLine)
        swLogFile.Close()
        intRepCount = intRepCount + 1
        Return strNewPhoneNumber
    End Function
End Module
Any more takers on this?

You say "However it is not working". What is happening? Can you single-step the code and see what happens after you fill the array?

You will have the same problem with this code:

        Dim strPhoneNumbers As String = srPhoneNumbers.ReadToEnd
        ObjReader.Close()

You cannot read the entire phone number list into a single string; you will need a second array for the phone numbers.

AW
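
A second collection for the phone numbers could be filled with the same kind of line-by-line read. The sketch below is an assumption-laden illustration, not code from the thread. `LoadNumbersToKeep` is a hypothetical name, the sample number is made up, and the list file is assumed to use the same layout as the data file (a `#PhoneNumber` line followed by the quoted number). If the list is short, reading it whole is unlikely to be a memory concern, so this matters mainly when the list itself can grow.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions

Module KeepListSketch
    ' Reads the list one line at a time and collects the digits of every
    ' quoted number that directly follows a "#PhoneNumber" line.
    Function LoadNumbersToKeep(ByVal reader As TextReader) As List(Of String)
        Dim quotedDigits As New Regex("^""(\d+)""")
        Dim numbers As New List(Of String)()
        Dim previousWasTag As Boolean = False
        Dim currentLine As String = reader.ReadLine()
        Do While currentLine IsNot Nothing
            If previousWasTag Then
                Dim m As Match = quotedDigits.Match(currentLine)
                If m.Success Then numbers.Add(m.Groups(1).Value)
            End If
            previousWasTag = currentLine.EndsWith("#PhoneNumber", StringComparison.Ordinal)
            currentLine = reader.ReadLine()
        Loop
        Return numbers
    End Function

    Sub Main()
        ' In-memory demo; real use would pass a StreamReader on the list file.
        Dim sample As String = "#PhoneNumber" & Environment.NewLine & """2125550000"""
        Dim numbers As List(Of String) = LoadNumbersToKeep(New StringReader(sample))
        Console.WriteLine(numbers.Count & " number(s) to keep")
    End Sub
End Module
```

The resulting list can then be joined with "|" to build the exclusion pattern, or used directly for lookups.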
The file is too big to step through.  It takes forever to fill the array.  I am going to run it on a smaller file to see if the new code works at all.

The Phone number list is only a few lines long.  Do I still need to create an additional array?

Thanks
Diammond
The new code does not work on either a small file or a big file. The data.txt file is getting wiped out to zero bytes.
Here is the modified code again. The debugger is reporting "value cannot be null. parameter name: input" at the following line:
swNewDataFile.Write(Regex.Replace(strDataFile, matchpattern, myEval))

If I create an array for the second file, the debugger reports the same error at the following line:
Dim mcNumbers As MatchCollection = reNumbers.Matches(strPhoneNumbers)

Not sure what to do at this point.  Can you help further?

Thanks
Diammond




Imports System.Text.RegularExpressions
Imports System.IO
Imports System.Collections

Module Module1
    Public strNewPhoneNumber As String = """2126420001"""
    Public intRepCount As Integer = 0
    Sub Main()
        If IO.File.Exists("c:\reports\replaced.log") Then
            IO.File.Delete("c:\reports\replaced.log")
        End If
        Dim srDataFile As New StreamReader("c:\reports\data.txt")
        Dim strDataFile As String = ""
        Dim arrText As New ArrayList()

        Do
            strDataFile = srDataFile.ReadLine()
            If Not strDataFile Is Nothing Then
                arrText.Add(strDataFile)
            End If
        Loop Until strDataFile Is Nothing
        srDataFile.Close()
 
        Dim srPhoneNumbers As New System.IO.StreamReader("c:\realPhoneNumbers.txt")
        Dim strPhoneNumbers As String = srPhoneNumbers.ReadToEnd
        srPhoneNumbers.Close()
 
        Dim reNumbers As Regex = New Regex("(?<=#PhoneNumber\r\n"")\d+(?="")")
        Dim mcNumbers As MatchCollection = reNumbers.Matches(strPhoneNumbers)
        Dim strPhoneNumbersToExclude As String = ""
        For Each mNumber As Match In mcNumbers
            strPhoneNumbersToExclude = strPhoneNumbersToExclude & mNumber.Groups(0).Value & "|"
        Next
        strPhoneNumbersToExclude = Left(strPhoneNumbersToExclude, Len(strPhoneNumbersToExclude) - 1)
 
        Dim matchpattern As String = "(?<=#PhoneNumber\r\n)""(?:(?!" & strPhoneNumbersToExclude & ")\d+)"""
        Dim myEval As MatchEvaluator = New MatchEvaluator(AddressOf ReplaceMatch)
 
        Dim swNewDataFile As New System.IO.StreamWriter("c:\reports\data.txt")
        swNewDataFile.Write(Regex.Replace(strDataFile, matchpattern, myEval))
        swNewDataFile.Close()
        Console.WriteLine(intRepCount & " replacements, finished at " & Now())
    End Sub
    Public Function ReplaceMatch(ByVal m As Match) As String
        Dim swLogFile As New System.IO.StreamWriter("c:\reports\replaced.log", True)
        Dim strLogLine As String = "Replaced " & m.Groups(0).Value & " with " & strNewPhoneNumber & " " & Now()
        Console.WriteLine(strLogLine)
        swLogFile.WriteLine(strLogLine)
        swLogFile.Close()
        intRepCount = intRepCount + 1
        Return strNewPhoneNumber
    End Function
End Module
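
Both reported symptoms are consistent with how this code is written. The read loop ends only when `ReadLine` returns Nothing, so `strDataFile` is Nothing by the time it is passed to `Regex.Replace` (the lines themselves are in `arrText`, which is never used afterwards). Opening a `StreamWriter` on c:\reports\data.txt truncates the file before the exception is thrown, which leaves it at zero bytes. One way to avoid the second problem is to stream into a temporary file and swap it into place only after the loop finishes. The sketch below shows that pattern under stated assumptions: `RewriteFile` and `ProcessLine` are hypothetical names, the `.tmp` suffix and the demo file are arbitrary, and `ProcessLine` is a stand-in for the real replacement logic.

```vbnet
Imports System
Imports System.IO

Module SafeRewriteSketch
    ' Hypothetical per-line transform; the real replacement logic would go here.
    Function ProcessLine(ByVal currentLine As String) As String
        Return currentLine.Replace("old", "new")
    End Function

    ' Streams sourcePath into a temporary file, then swaps it into place.
    Sub RewriteFile(ByVal sourcePath As String)
        Dim tempPath As String = sourcePath & ".tmp"
        Using reader As New StreamReader(sourcePath)
            Using writer As New StreamWriter(tempPath)
                Dim currentLine As String = reader.ReadLine()
                Do While currentLine IsNot Nothing
                    writer.WriteLine(ProcessLine(currentLine))
                    currentLine = reader.ReadLine()
                Loop
            End Using
        End Using
        ' Reached only if nothing above threw, so the original is never
        ' replaced by a partial or empty file.
        File.Delete(sourcePath)
        File.Move(tempPath, sourcePath)
    End Sub

    Sub Main()
        ' Demo on a throwaway file in the temp directory.
        Dim demoPath As String = Path.Combine(Path.GetTempPath(), "rewrite_demo.txt")
        File.WriteAllText(demoPath, "old line" & Environment.NewLine)
        RewriteFile(demoPath)
        Console.Write(File.ReadAllText(demoPath))
    End Sub
End Module
```

Note that `ReadLine` followed by `WriteLine` normalizes line endings to the platform default and ends the last line with a newline, which may differ slightly from the original file.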
Bob Learned
Youuuuu rannngggg?