Diammond
Visual Basic code to search and replace within a large data file
Hello. A few months ago an expert helped me develop Visual Basic code to perform a search and replace in a data file. At the time the file was about 100 MB. The file has since grown to over 200 MB and the program no longer works. Is there a way to modify the code so that it won't bomb on files over 200 MB? Any help would be greatly appreciated.
Thank you
Diammond
Imports System.Text.RegularExpressions
Module Module1
    Public strNewPhoneNumber As String = """2126420001"""
    Public intRepCount As Integer = 0
    Sub Main()
        If IO.File.Exists("c:\reports\replaced.log") Then
            IO.File.Delete("c:\reports\replaced.log")
        End If
        Dim srDataFile As New System.IO.StreamReader("c:\reports\data.txt")
        Dim strDataFile As String = srDataFile.ReadToEnd
        srDataFile.Close()
        Dim srPhoneNumbers As New System.IO.StreamReader("c:\realPhoneNumbers.txt")
        Dim strPhoneNumbers As String = srPhoneNumbers.ReadToEnd
        srPhoneNumbers.Close()
        Dim reNumbers As Regex = New Regex("(?<=#PhoneNumber\r\n"")\d+(?="")")
        Dim mcNumbers As MatchCollection = reNumbers.Matches(strPhoneNumbers)
        Dim strPhoneNumbersToExclude As String = ""
        For Each mNumber As Match In mcNumbers
            strPhoneNumbersToExclude = strPhoneNumbersToExclude & mNumber.Groups(0).Value & "|"
        Next
        strPhoneNumbersToExclude = Left(strPhoneNumbersToExclude, Len(strPhoneNumbersToExclude) - 1)
        Dim matchpattern As String = "(?<=#PhoneNumber\r\n)""(?:(?!" & strPhoneNumbersToExclude & ")\d+)"""
        Dim myEval As MatchEvaluator = New MatchEvaluator(AddressOf ReplaceMatch)
        Dim swNewDataFile As New System.IO.StreamWriter("c:\reports\data.txt")
        swNewDataFile.Write(Regex.Replace(strDataFile, matchpattern, myEval))
        swNewDataFile.Close()
        Console.WriteLine(intRepCount & " replacements, finished at " & Now())
    End Sub
    Public Function ReplaceMatch(ByVal m As Match) As String
        Dim swLogFile As New System.IO.StreamWriter("c:\reports\replaced.log", True)
        Dim strLogLine As String = "Replaced " & m.Groups(0).Value & " with " & strNewPhoneNumber & " " & Now()
        Console.WriteLine(strLogLine)
        swLogFile.WriteLine(strLogLine)
        swLogFile.Close()
        intRepCount = intRepCount + 1
        Return strNewPhoneNumber
    End Function
End Module
ASKER
Thanks for your response. Here is my revised code. However, it is not working. Any suggestions?
Imports System.Text.RegularExpressions
Imports System.IO
Imports System.Collections
Module Module1
    Public strNewPhoneNumber As String = """2126420001"""
    Public intRepCount As Integer = 0
    Sub Main()
        If IO.File.Exists("c:\reports\replaced.log") Then
            IO.File.Delete("c:\reports\replaced.log")
        End If
        Dim objReader As New StreamReader("c:\reports\data.txt")
        Dim strDataFile As String = ""
        Dim arrText As New ArrayList()
        Do
            strDataFile = objReader.ReadLine()
            If Not strDataFile Is Nothing Then
                arrText.Add(strDataFile)
            End If
        Loop Until strDataFile Is Nothing
        objReader.Close()
        Dim srPhoneNumbers As New System.IO.StreamReader("c:\realPhoneNumbers.txt")
        Dim strPhoneNumbers As String = srPhoneNumbers.ReadToEnd
        ObjReader.Close()
        Dim reNumbers As Regex = New Regex("(?<=#PhoneNumber\r\n"")\d+(?="")")
        Dim mcNumbers As MatchCollection = reNumbers.Matches(strPhoneNumbers)
        Dim strPhoneNumbersToExclude As String = ""
        For Each mNumber As Match In mcNumbers
            strPhoneNumbersToExclude = strPhoneNumbersToExclude & mNumber.Groups(0).Value & "|"
        Next
        strPhoneNumbersToExclude = Left(strPhoneNumbersToExclude, Len(strPhoneNumbersToExclude) - 1)
        Dim matchpattern As String = "(?<=#PhoneNumber\r\n)""(?:(?!" & strPhoneNumbersToExclude & ")\d+)"""
        Dim myEval As MatchEvaluator = New MatchEvaluator(AddressOf ReplaceMatch)
        Dim swNewDataFile As New System.IO.StreamWriter("c:\reports\data.txt")
        swNewDataFile.Write(Regex.Replace(strDataFile, matchpattern, myEval))
        swNewDataFile.Close()
        Console.WriteLine(intRepCount & " replacements, finished at " & Now())
    End Sub
    Public Function ReplaceMatch(ByVal m As Match) As String
        Dim swLogFile As New System.IO.StreamWriter("c:\reports\replaced.log", True)
        Dim strLogLine As String = "Replaced " & m.Groups(0).Value & " with " & strNewPhoneNumber & " " & Now()
        Console.WriteLine(strLogLine)
        swLogFile.WriteLine(strLogLine)
        swLogFile.Close()
        intRepCount = intRepCount + 1
        Return strNewPhoneNumber
    End Function
End Module
ASKER
Any more takers on this?
You say "However it is not working". What is happening? Can you single-step the code and see what happens after you fill the array?
You will have the same problem with this code:
Dim strPhoneNumbers As String = srPhoneNumbers.ReadToEnd
ObjReader.Close()
You cannot read the entire phone number list into a single string. You will need a second array for the phone numbers.
AW
ASKER
The file is too big to step through. It takes forever to fill the array. I am going to run it on a smaller file to see if the new code works at all.
The phone number list is only a few lines long. Do I still need to create an additional array?
Thanks
Diammond
ASKER
The new code does not work on either a small file or a big file. The data.txt file is getting wiped out to zero bytes.
ASKER
Here is the modified code again. The debugger is reporting "value cannot be null. parameter name: input" at the following line:
swNewDataFile.Write(Regex.Replace(strDataFile, matchpattern, myEval))
If I create an array for the second file, the debugger reports the same error at the following line:
Dim mcNumbers As MatchCollection = reNumbers.Matches(strPhoneNumbers)
Not sure what to do at this point. Can you help further?
Thanks
Diammond
Imports System.Text.RegularExpressions
Module Module1
    Public strNewPhoneNumber As String = """2126420001"""
    Public intRepCount As Integer = 0
    Sub Main()
        If IO.File.Exists("c:\reports\replaced.log") Then
            IO.File.Delete("c:\reports\replaced.log")
        End If
        Dim srDataFile As New StreamReader("c:\reports\data.txt")
        Dim strDataFile As String = ""
        Dim arrText As New ArrayList()
        Do
            strDataFile = srDataFile.ReadLine()
            If Not strDataFile Is Nothing Then
                arrText.Add(strDataFile)
            End If
        Loop Until strDataFile Is Nothing
        srDataFile.Close()
        Dim srPhoneNumbers As New System.IO.StreamReader("c:\realPhoneNumbers.txt")
        Dim strPhoneNumbers As String = srPhoneNumbers.ReadToEnd
        srPhoneNumbers.Close()
        Dim reNumbers As Regex = New Regex("(?<=#PhoneNumber\r\n"")\d+(?="")")
        Dim mcNumbers As MatchCollection = reNumbers.Matches(strPhoneNumbers)
        Dim strPhoneNumbersToExclude As String = ""
        For Each mNumber As Match In mcNumbers
            strPhoneNumbersToExclude = strPhoneNumbersToExclude & mNumber.Groups(0).Value & "|"
        Next
        strPhoneNumbersToExclude = Left(strPhoneNumbersToExclude, Len(strPhoneNumbersToExclude) - 1)
        Dim matchpattern As String = "(?<=#PhoneNumber\r\n)""(?:(?!" & strPhoneNumbersToExclude & ")\d+)"""
        Dim myEval As MatchEvaluator = New MatchEvaluator(AddressOf ReplaceMatch)
        Dim swNewDataFile As New System.IO.StreamWriter("c:\reports\data.txt")
        swNewDataFile.Write(Regex.Replace(strDataFile, matchpattern, myEval))
        swNewDataFile.Close()
        Console.WriteLine(intRepCount & " replacements, finished at " & Now())
    End Sub
    Public Function ReplaceMatch(ByVal m As Match) As String
        Dim swLogFile As New System.IO.StreamWriter("c:\reports\replaced.log", True)
        Dim strLogLine As String = "Replaced " & m.Groups(0).Value & " with " & strNewPhoneNumber & " " & Now()
        Console.WriteLine(strLogLine)
        swLogFile.WriteLine(strLogLine)
        swLogFile.Close()
        intRepCount = intRepCount + 1
        Return strNewPhoneNumber
    End Function
End Module
Youuuuu rannngggg?
ASKER CERTIFIED SOLUTION
Dim strDataFile As String = srDataFile.ReadToEnd
You are reading the ENTIRE file into memory (into a single String object, by the way) in one fell swoop.
You can break this down so that it reads and processes one line at a time. It may be slow, but it will not choke.
Simply wrap the processing in a loop that reads one line from the input file, processes that line, writes the processed line to the output file, and reads the next line, until you have processed the entire original file.
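A rough sketch of such a loop follows. It is not a tested drop-in replacement, and it rests on a few assumptions that are not confirmed anywhere in this thread: each "#PhoneNumber" marker ends one line and its quoted number starts the very next line (which is what the \r\n in the original pattern implies), the ".tmp" file name is made up for illustration, and the compiler is VB 2005 or later (for Using and List(Of T)). The output goes to a temporary file first, so data.txt is never opened for writing while it is still being read.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions

Module StreamingReplace
    Const NewPhoneNumber As String = """2126420001"""

    Sub Main()
        Dim dataPath As String = "c:\reports\data.txt"
        Dim tempPath As String = dataPath & ".tmp"   ' hypothetical temp file name
        Dim logPath As String = "c:\reports\replaced.log"

        ' The list of real numbers is only a few lines, so reading it whole is fine.
        Dim listText As String = File.ReadAllText("c:\realPhoneNumbers.txt")
        Dim keep As New List(Of String)
        For Each real As Match In Regex.Matches(listText, "(?<=#PhoneNumber\r\n"")\d+(?="")")
            keep.Add(real.Value)
        Next

        ' Same test as the original pattern, but anchored to the start of a single line.
        ' With an empty list there is nothing to exclude, so the lookahead is left out.
        Dim pattern As String = "^""\d+"""
        If keep.Count > 0 Then
            pattern = "^""(?!" & String.Join("|", keep.ToArray()) & ")\d+"""
        End If
        Dim reNumber As New Regex(pattern)

        Dim count As Integer = 0
        Dim previous As String = ""
        Using reader As New StreamReader(dataPath)
            Using writer As New StreamWriter(tempPath, False)
                Using log As New StreamWriter(logPath, False)
                    Dim line As String = reader.ReadLine()
                    Do While line IsNot Nothing
                        Dim outLine As String = line
                        ' Only a line that directly follows a #PhoneNumber marker is a candidate.
                        If previous.EndsWith("#PhoneNumber", StringComparison.Ordinal) Then
                            Dim hit As Match = reNumber.Match(line)
                            If hit.Success Then
                                ' Swap the quoted number, keep whatever follows it on the line.
                                outLine = NewPhoneNumber & line.Substring(hit.Length)
                                log.WriteLine("Replaced " & hit.Value & " with " & NewPhoneNumber & " " & DateTime.Now)
                                count += 1
                            End If
                        End If
                        writer.WriteLine(outLine)
                        previous = line
                        line = reader.ReadLine()
                    Loop
                End Using
            End Using
        End Using

        ' Swap the finished copy in only after the whole file has been processed.
        File.Delete(dataPath)
        File.Move(tempPath, dataPath)
        Console.WriteLine(count & " replacements, finished at " & DateTime.Now)
    End Sub
End Module
```

Only two lines of the data file are held in memory at any time, so the file size should no longer matter. Two differences from the original behaviour are worth checking on a copy of the data first: ReadLine also accepts a bare \r or \n as a line ending, and WriteLine always ends the last line with a line break even if the input did not.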
If you need help, just ask.
AW