Solved

# Need Help With Regular Expressions & VB.NET

Posted on 2010-09-14
Greetings!  I am working on a vb.net application to extract and format text from a text file.
The basic goal: remove certain text and create 4 separate text files from the remaining file.

I've not gotten very far and am probably in over my head, but I'll post the little bit of code I do have.

I am willing to learn, but I will need help.
'remove 1st column and its trailing white space

Sub StepOne()
Dim strCurrent As String = ""
Dim strRoot As String = ""
Try
strCurrent = Directory.GetCurrentDirectory()
'strRoot = Directory.GetDirectoryRoot(strCurrent)
'strRoot = Directory.GetDirectoryRoot("\")
Catch E As Exception
'Console.WriteLine("Error determining root directory")
MessageBox.Show(E.Message)
End Try
Dim strfileName As String
strfileName = strCurrent & "\" & "hotlist.txt"

Replace(strfileName, "\n1 ", "\n")

End Sub

'GENERIC REGEX SEARCH & REPLACE.  Use Regular Expressions to search/replace within file
Function Replace(ByRef file As String, ByRef searchFor As String, ByRef replaceWith As String) As Boolean
'function used during Validation
'use Regular Expressions to search for and replace user defined variables
Try
'use regular expressions to search and replace text
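Dim contents As String = System.IO.File.ReadAllText(file) 'read the existing text first (assumed; the post does not show where contents is loaded)
contents = Regex.Replace(contents, searchFor, replaceWith, RegexOptions.IgnoreCase Or RegexOptions.Compiled)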
Dim writer As New StreamWriter(file)            'get a StreamWriter for writing the new text to the file
writer.Write(contents)                          'write the contents
writer.Close()
writer.Dispose()                                'close up and dispose
Return True                                     'return successful
Catch generatedExceptionName As Exception
MessageBox.Show(generatedExceptionName.Message)
Return False
End Try
End Function

Question by:asc2010

Author Comment

ID: 33675998
Ok, I've got the first part done...remove the first two characters of each line.  It seems a bit crude though:

Replace(strfileName, "^1 ", "")
Replace(strfileName, "\r|\n1 ", Chr(10))
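The two calls above could probably be collapsed into one by letting `^` match at the start of every line. A minimal sketch, assuming the first column is a single run of non-space characters followed by spaces or tabs; the module name, function name, and sample rows are made up for illustration:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module StripFirstColumnSketch
    ' Removes the first whitespace-delimited column from every line.
    Function StripFirstColumn(ByVal text As String) As String
        ' RegexOptions.Multiline makes ^ match after each line break,
        ' not only at the very start of the string.
        Return Regex.Replace(text, "^\S+[ \t]+", "", RegexOptions.Multiline)
    End Function

    Sub Main()
        ' Hypothetical sample rows, not taken from the real hotlist.txt.
        Dim sample As String = "1 ABC123 OH" & vbLf & "1 XYZ789 MI"
        Console.WriteLine(StripFirstColumn(sample))
    End Sub
End Module
```

Using `[ \t]+` rather than `\s+` keeps the pattern from swallowing a line break when a line holds nothing but the first column.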


Author Comment

ID: 33677071
No takers here?  Well, I am continuing to work on this, here is the code I have so far:
Option Explicit On

Imports System
Imports System.IO
Imports System.IO.File
Imports System.Text.RegularExpressions
Imports System.Text.RegularExpressions.Match
Imports System.Text.RegularExpressions.MatchCollection
'
'requirements:
'invisible form (will be called via batch file)
'Remove the 1st column and its trailing white space
'Remove all rows that are not Ohio
'Remove columns 3, 4, and 9
'Create text files based on a letter-filter: V=Stolen.txt, W=Wanted.txt, P=LP.txt, M=MP.txt
'Create all text files as comma separated

Public Class DOJ_Parser_Hilliard

Public strFileName As String = "hotlist.txt"
Public strCurrent As String = Directory.GetCurrentDirectory()
Public strGlobalFilePath As String = strCurrent & "\" & strFileName

Private Sub DOJ_Parser_Hilliard_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load

If VerifyFileExists() = True Then
StepOne()
StepTwo()
Else
Me.Close()
End If

End Sub

Function VerifyFileExists()
'this function runs on form load
Try
strCurrent = Directory.GetCurrentDirectory()
Catch E As Exception
MessageBox.Show(E.Message)
End Try
If Not Exists(strGlobalFilePath) Then
Return False
Else
Return True
End If
End Function

Sub StepOne()
'remove 1st column and its trailing white space
'regex replace
Replace(strGlobalFilePath, "^1 ", "")
Replace(strGlobalFilePath, "\r|\n1 ", Chr(10))
End Sub

Sub StepTwo()
'remove all rows that are not OH
'regex match
Dim strOhioCollection As String = ""
Dim strRegexMatch As String = Nothing 'prepare a string for regex matching
strRegexMatch = "^.*OH.*$"
If Regex.IsMatch(strGlobalFilePath, strRegexMatch) Then 'if a match is found in the text file, return it to the requestor method
Dim theMatchedVariable = Nothing
Dim r As New Regex(strRegexMatch, RegexOptions.IgnoreCase Or RegexOptions.Compiled)
theMatchedVariable = r.Match(strTextFileContents).Result("^.*OH.*$")
'open a streamwriter
Dim sw As StreamWriter = File.AppendText(strGlobalFilePath)
sw.Write(vbCrLf & theMatchedVariable) 'write to file
sw.Flush() 'update file
sw.Close()
sw.Dispose()

Else 'if file does not exist, display error notice
MessageBox.Show("WARNING! ERROR!", "ERROR")
End If

End Sub

Function StepThree(ByVal strMatchThis As String)
're-write hotlist.txt with Ohio content
'streamwriter
Dim theMessage As String = Nothing
Dim theMessageTitle As String = Nothing

Dim strRegexMatch As String = Nothing                               'prepare a string for regex matching

Select Case strMatchThis
Case "deployment"                                               'if passed variable is deployment, do the following variable assignment
strRegexMatch = "\$LOCAL::DEPLOYMENTS\$=(?<matchvariable>.+)"
Case "enterprise"                                               'if passed variable is enterprise, do the following variable assignment
strRegexMatch = "\$LOCAL::ENTERPRISE\$=\$LOCAL::DEPLOYMENTS\$(?<matchvariable>.+)"
Case "system"                                                   'if passed variable is system, do the following variable assignment
strRegexMatch = "\$LOCAL::SYSTEM\$=(?<matchvariable>.+)"
End Select

If Regex.IsMatch(strTextFileContents, strRegexMatch) Then           'if a match is found in the text file, return it to the requestor method
Dim theMatchedVariable = Nothing
Dim r As New Regex(strRegexMatch, RegexOptions.IgnoreCase Or RegexOptions.Compiled)
theMatchedVariable = r.Match(strTextFileContents).Result("\${matchvariable}")
Return theMatchedVariable
Else                                                                'if file does not exist, display error notice
MessageBox.Show(theMessage, theMessageTitle)
Return False
End If
End Function

'GENERIC REGEX SEARCH & REPLACE.  Use Regular Expressions to search/replace within file
Function Replace(ByRef file As String, ByRef searchFor As String, ByRef replaceWith As String) As Boolean
'function used during Validation
'use Regular Expressions to search for and replace user defined variables
Try
'use regular expressions to search and replace text
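Dim contents As String = System.IO.File.ReadAllText(file) 'read the existing text first (assumed; the post does not show where contents is loaded)
contents = Regex.Replace(contents, searchFor, replaceWith, RegexOptions.IgnoreCase Or RegexOptions.Compiled)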
Dim writer As New StreamWriter(file)            'get a StreamWriter for writing the new text to the file
writer.Write(contents)                          'write the contents
writer.Close()
writer.Dispose()                                'close up and dispose
Return True                                     'return successful
Catch generatedExceptionName As Exception
MessageBox.Show(generatedExceptionName.Message)
Return False
End Try
End Function


LVL 53

Expert Comment

ID: 33679715
>> The basic goal: remove certain text and create 4 separate text files from the remaining file.

You state a goal, but you never ask a question. What problem do you still have? What are you starting from, and what is the aim?

LVL 75

Expert Comment

ID: 33679730
Ah, a taker :)

I think the question is "How to use RegEx in C# to manipulate text and write 4 files with the result"

LVL 53

Expert Comment

ID: 33679751
>> How to use RegEx in C# to manipulate text and write 4 files with the result

I understand that, but nothing is mentioned about what the starting text looks like, what information decides how the text is split into multiple files, ...

Author Comment

ID: 33682664
First, I apologize for offering monetary compensation.  My offer was not intended to offend anyone, only expedite my request for help.

@ Dhaest: My question should have been stated as mplungjan put it - "How to use RegEx in C# or VB.NET to manipulate text and write 4 files with the result"

@ mplungjan: Thank you for clarifying my request.

@ aikimark: I put this request into the C# zone because I can figure out how to work with C# if someone here can help.  I have experience with VB.NET so I started with that language.

Attached is an excerpt of the text file I am working with.  There are 11 columns in the file; however, some are blank and I need to account for them as well.  I only need the rows that contain the text "OH" in the 3rd column.  Out of those, I only need columns 2, 3, 6, 7, 8, 9, and 11.

Once I get the correct rows and columns, I need to comma separate them and write them to separate text files based on column 11.
If column 11 starts with "V", create a file called "Stolen.txt"
If column 11 starts with "W", create a file called "Wanted.txt"
If column 11 starts with "P", create a file called "LP.txt"
If column 11 starts with "M", create a file called "MP.txt"
example.txt
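The routing rule on column 11 can be isolated in one small function. This is only a sketch of the rule as stated above; the module and function names are hypothetical, and returning `Nothing` for any other letter is an assumption, since the post does not say what should happen to such rows:

```vbnet
Imports System

Module RoutingSketch
    ' Maps the first letter of column 11 to an output file name.
    ' Returns Nothing when the value is empty or starts with another letter
    ' (assumed behaviour; the requirements do not cover that case).
    Function OutputFileFor(ByVal column11 As String) As String
        If String.IsNullOrEmpty(column11) Then Return Nothing
        Select Case Char.ToUpperInvariant(column11(0))
            Case "V"c
                Return "Stolen.txt"
            Case "W"c
                Return "Wanted.txt"
            Case "P"c
                Return "LP.txt"
            Case "M"c
                Return "MP.txt"
            Case Else
                Return Nothing
        End Select
    End Function

    Sub Main()
        ' "V123" is a made-up column value.
        Console.WriteLine(OutputFileFor("V123"))
    End Sub
End Module
```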

LVL 53

Expert Comment

ID: 33682733
I'll take a look at it tomorrow ...

LVL 53

Expert Comment

ID: 33682815
Another question: why are you so eager to use regular expressions? There are easier ways to handle this.
I'm thinking out loud: read the file into a list of objects (with 11 properties) and use LINQ to get the correct results to store in the files.

Is the file a fixed-length file? If so, is the example you provided not representative, or is there a fault in its 4th line?
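To make the LINQ idea concrete, here is a rough sketch that filters rows to Ohio and groups them by the first letter of the last column. The rows are invented and shortened to four columns for readability (the real file has 11), and plain string arrays stand in for the list of objects:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module LinqSketch
    Sub Main()
        ' Hypothetical rows already split into columns
        ' (index 2 = state, last index = type code).
        Dim rows As New List(Of String())
        rows.Add(New String() {"1", "ABC123", "OH", "V01"})
        rows.Add(New String() {"1", "XYZ789", "MI", "W02"})
        rows.Add(New String() {"1", "JKL456", "OH", "W03"})

        ' Keep the Ohio rows, then group them by the first letter of the last column.
        Dim groups As IEnumerable(Of IGrouping(Of String, String())) =
            rows.Where(Function(r) r(2) = "OH").
                 GroupBy(Function(r) r(r.Length - 1).Substring(0, 1))

        For Each g As IGrouping(Of String, String()) In groups
            Console.WriteLine(g.Key & ": " & g.Count())
        Next
    End Sub
End Module
```

Each group could then be written to its own file, which is the part the regex approach makes awkward.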

Author Comment

ID: 33682878
@ Dhaest: I'm going the regex route because I could not think of anything else...I am open to anything simpler.  This program will be placed on a shared computer and called from a batch file.  The user should never even know it exists, so as long as it gets the job done, I am open to suggestions!

LVL 53

Expert Comment

ID: 33682933
Is the file format known? Does it have fixed-length fields, a separator, ...?

Author Comment

ID: 33683051
I'm not sure what you mean.  The attached file is exactly what I'm working with...it sucks I know.  The end result - the 4 text files - will be comma separated for columns.

LVL 85

Expert Comment

ID: 33683062
It looks fixed width but the 4th line doesn't line up properly...

We'd need to know EXACTLY how the file is formatted to be able to help you!
VariableFixedWidth.jpg

LVL 53

Expert Comment

ID: 33683072
I'll create the queries tomorrow. In the meantime, here is an example of how to load the file into a list of objects:

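Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
parseFile()
End Sub

Private Sub parseFile()
Dim TextLine As String
Dim fileLines As List(Of FileLine) = New List(Of FileLine)()
Dim myLine As FileLine

'assumed reconstruction: open the file and loop over its lines ("hotlist.txt" is the asker's file name)
Using objReader As New System.IO.StreamReader("hotlist.txt")
Do While objReader.Peek() <> -1
TextLine = objReader.ReadLine()
myLine = ParseLine(TextLine)
If Not myLine Is Nothing Then
fileLines.Add(myLine) 'keep only the Ohio lines
End If
Loop
End Using

End Sub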

Private Function ParseLine(ByVal textline As String)
Dim textSplit() As String = textline.Split(" ")
Dim line As FileLine = New FileLine(textSplit)

If line.column3.Contains("OH") Then
Return line
End If
Return Nothing
End Function

Public Class FileLine
Public column1 As String
Public column2 As String
Public column3 As String
Public column4 As String
Public column5 As String
Public column6 As String
Public column7 As String
Public column8 As String
Public column9 As String
Public column10 As String
Public column11 As String

Public Sub New(ByVal params() As String)
column1 = params(0)
column2 = params(1)
column3 = params(2)
column4 = params(3)
column5 = params(4)
column6 = params(5)
column7 = params(6)
column8 = params(7)
column9 = params(8)
column10 = params(9)
column11 = params(10)
End Sub

End Class


Author Comment

ID: 33683093
Idle_Mind,
The file is very ugly and not fixed-width.  My approach was going to be finding the white space and try to format it based on that...sorry

LVL 85

Expert Comment

ID: 33683131
We really can't help unless you can explain in plain English how the "records" are structured.

Reading lines and writing lines in a file is simple...but if we can't discern where the columns start/stop then we are of no use to you.

Yes...you'd need to explain in EXCRUCIATING detail how the file works...   =\

Author Comment

ID: 33683251
Idle_Mind,

Each white space starts/ends a new column.  The data originates from a website.  Someone extracts this data and it gets placed into the text file exactly as I have sent it.  Each record originally has 11 columns.  Some of those columns are empty and some are not.
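If an empty column shows up as two consecutive spaces (an assumption, since the thread does not show the raw file), then how the line is split matters. A small sketch with a made-up line, comparing a plain split with one that drops empty entries:

```vbnet
Imports System

Module SplitSketch
    Sub Main()
        ' Hypothetical line: the two consecutive spaces stand for one empty column.
        Dim line As String = "1 ABC123  OH"

        ' Keeping empty entries preserves column positions.
        Dim keepEmpty() As String = line.Split(" "c)

        ' Dropping empty entries shifts every later column to the left.
        Dim dropEmpty() As String = line.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)

        Console.WriteLine(keepEmpty.Length) ' prints 4
        Console.WriteLine(dropEmpty.Length) ' prints 3
    End Sub
End Module
```

With the plain split, "OH" stays at index 3; with empty entries removed it moves to index 2, so any fixed column index would point at the wrong field.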

LVL 85

Expert Comment

ID: 33683291
Ah...gotcha!

...approximately how many lines in the actual file?

*We need to decide if the whole thing will be read into memory at once or if it should be processed only one line at a time.
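The two options look roughly like this. It is a sketch only: the function names are hypothetical and a temporary file stands in for the real data so the example runs on its own:

```vbnet
Imports System
Imports System.IO

Module ReadingSketch
    ' Loads every line into memory at once; simple, fine for small files.
    Function CountAllAtOnce(ByVal filePath As String) As Integer
        Dim lines() As String = File.ReadAllLines(filePath)
        Return lines.Length
    End Function

    ' Holds only one line in memory at a time; better for very large files.
    Function CountLineByLine(ByVal filePath As String) As Integer
        Dim count As Integer = 0
        Using reader As New StreamReader(filePath)
            While Not reader.EndOfStream
                reader.ReadLine()
                count += 1
            End While
        End Using
        Return count
    End Function

    Sub Main()
        ' Writes a small temporary file so the sketch does not need the real data.
        Dim tempPath As String = Path.GetTempFileName()
        File.WriteAllLines(tempPath, New String() {"a", "b", "c"})
        Console.WriteLine(CountAllAtOnce(tempPath))
        Console.WriteLine(CountLineByLine(tempPath))
        File.Delete(tempPath)
    End Sub
End Module
```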

Author Comment

ID: 33683365
Idle_Mind,

The number of lines in the file vary by day.  The example file I provided contained only the first few lines out of 1,249.  Unfortunately, this will be an unknown amount as it changes on a daily basis.

LVL 85

Expert Comment

ID: 33683438
Ok...

1,000 lines is no problem.

10,000 lines would probably be fine too on most systems.

100,000 lines we might want to consider a different approach.

1,000,000 lines definitely needs something else.

Is there a reasonable upper bound for the max # of lines that might be in the daily file?

Author Comment

ID: 33683466
Idle_Mind,

lol I see your point there.  I do not believe there will be more than 5,000 lines on any given day.  If there are, then this agency has got way bigger issues than processing power!

LVL 85

Expert Comment

ID: 33683914
Here's an inelegant solution that simply appends the lines to the existing data in the output files.

It is based on your rules:

I only need the rows that contain the text "OH" in the 3rd column.
Out of those, I only need columns 2, 3, 6, 7, 8, 9, and 11.

Once I get the correct rows and columns, I need to comma separate them and write them to separate text files based on column 11.
If column 11 starts with "V", create a file called "Stolen.txt"
If column 11 starts with "W", create a file called "Wanted.txt"
If column 11 starts with "P", create a file called "LP.txt"
If column 11 starts with "M", create a file called "MP.txt"

*I didn't do any bounds checking!  If lines might exist with LESS than 11 columns, or if there are completely blank lines, then you'll need to do some extra checking:
Public Class Form1

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
Dim DataFile As String = "Example.txt"
Dim V_File As String = "Stolen.txt"
Dim W_File As String = "Wanted.txt"
Dim LP_File As String = "LP.txt"
Dim MP_File As String = "MP.txt"

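Dim Path As String = System.IO.Directory.GetCurrentDirectory() ' assumed: folder holding the data file; the original declaration is not shown
Using sr As New System.IO.StreamReader(System.IO.Path.Combine(Path, DataFile))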
While Not sr.EndOfStream
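Dim values As New List(Of String)
values.AddRange(sr.ReadLine().Split(" "c)) ' assumed: read one line and split it on spaces; this step is not shown in the post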
If values(2).ToUpper = "OH" Then ' process lines with "OH" in the 3rd column
' remove columns 10,5,4,1
values.RemoveAt(9)
values.RemoveAt(5)
values.RemoveAt(4)
values.RemoveAt(0)
Dim output As String = String.Join(",", values.ToArray) & Environment.NewLine

' output to file based on first character of value in last column
Select Case values(values.Count - 1).Substring(0, 1).ToUpper
Case "V"
My.Computer.FileSystem.WriteAllText(System.IO.Path.Combine(Path, V_File), output, True)

Case "W"
My.Computer.FileSystem.WriteAllText(System.IO.Path.Combine(Path, W_File), output, True)

Case "P"
My.Computer.FileSystem.WriteAllText(System.IO.Path.Combine(Path, LP_File), output, True)

Case "M"
My.Computer.FileSystem.WriteAllText(System.IO.Path.Combine(Path, MP_File), output, True)

End Select
End If
End While
End Using

End Sub

End Class
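The extra checking mentioned before the listing could be a small guard that runs before any column is indexed. A sketch under the same splitting assumption (single spaces between columns); the module name, function name, and test lines are made up:

```vbnet
Imports System

Module BoundsCheckSketch
    ' True only when the line has at least 11 columns, "OH" in column 3,
    ' and a non-empty column 11 to route on.
    Function IsUsableRow(ByVal line As String) As Boolean
        If String.IsNullOrEmpty(line) OrElse line.Trim().Length = 0 Then Return False
        Dim values() As String = line.Split(" "c)
        If values.Length < 11 Then Return False
        If values(10).Length = 0 Then Return False
        Return values(2).ToUpper() = "OH"
    End Function

    Sub Main()
        Console.WriteLine(IsUsableRow(""))                          ' blank line
        Console.WriteLine(IsUsableRow("1 ABC123 OH"))               ' too few columns
        Console.WriteLine(IsUsableRow("1 A OH 4 5 6 7 8 9 10 V1"))  ' 11 columns, Ohio
    End Sub
End Module
```

Calling such a guard at the top of the loop would skip short or blank lines instead of throwing an index-out-of-range exception.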


Author Comment

ID: 33684348
Idle_Mind,

WOW!  This works like a charm!  I can't believe how quickly you came up with this!!  I understand the WriteAllText() has a true/false parameter to append or overwrite.  But is there a way to overwrite any existing data in the files just prior to writing the new data to it?  I can throw in a quick function to erase any existing data, but am curious to see your opinion.
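For reference, the "quick function to erase any existing data" might look something like the sketch below. The module name, function name, and scratch folder are hypothetical; the four file names come from the thread:

```vbnet
Imports System
Imports System.IO

Module ClearOutputSketch
    ' Truncates (or creates) each output file so later appends start from empty.
    Sub ClearOutputFiles(ByVal folder As String)
        For Each fileName As String In New String() {"Stolen.txt", "Wanted.txt", "LP.txt", "MP.txt"}
            File.WriteAllText(Path.Combine(folder, fileName), String.Empty)
        Next
    End Sub

    Sub Main()
        ' Uses a scratch folder under the temp directory so the sketch does not touch real data.
        Dim folder As String = Path.Combine(Path.GetTempPath(), "hotlist_sketch")
        Directory.CreateDirectory(folder)
        ClearOutputFiles(folder)
        Console.WriteLine(File.ReadAllText(Path.Combine(folder, "Stolen.txt")).Length)
    End Sub
End Module
```

This keeps the append-based loop unchanged, at the cost of reopening a file for every written line.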

LVL 85

Expert Comment

ID: 33684506
Quick fix...
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
Dim DataFile As String = "Example.txt"
Dim V_File As String = "Stolen.txt"
Dim W_File As String = "Wanted.txt"
Dim LP_File As String = "LP.txt"
Dim MP_File As String = "MP.txt"

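Dim Path As String = System.IO.Directory.GetCurrentDirectory() ' assumed: folder holding the data file; the original declaration is not shown
Using sw_V As New System.IO.StreamWriter(System.IO.Path.Combine(Path, V_File), False)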
Using sw_W As New System.IO.StreamWriter(System.IO.Path.Combine(Path, W_File), False)
Using sw_LP As New System.IO.StreamWriter(System.IO.Path.Combine(Path, LP_File), False)
Using sw_MP As New System.IO.StreamWriter(System.IO.Path.Combine(Path, MP_File), False)

Using sr As New System.IO.StreamReader(System.IO.Path.Combine(Path, DataFile))
While Not sr.EndOfStream
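Dim values As New List(Of String)
values.AddRange(sr.ReadLine().Split(" "c)) ' assumed: read one line and split it on spaces; this step is not shown in the post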
If values(2).ToUpper = "OH" Then ' process lines with "OH" in the 3rd column
' remove columns 10,5,4,1
values.RemoveAt(9)
values.RemoveAt(5)
values.RemoveAt(4)
values.RemoveAt(0)
Dim output As String = String.Join(",", values.ToArray)

' output to file based on first character of value in last column
Select Case values(values.Count - 1).Substring(0, 1).ToUpper
Case "V"
sw_V.WriteLine(output)

Case "W"
sw_W.WriteLine(output)

Case "P"
sw_LP.WriteLine(output)

Case "M"
sw_MP.WriteLine(output)

End Select
End If
End While
End Using

End Using
End Using
End Using
End Using
End Sub


LVL 45

Expert Comment

ID: 33684575
It might be possible to bring the data down from the web page.  What is the URL?

Author Comment

ID: 33684636
@ aikimark:  I do not have access to the website, nor will I be granted access to the website.  I am only allowed access to the data provided in the text file because all personal information has been removed from it.

LVL 85

Accepted Solution

Mike Tomlinson earned 500 total points
ID: 33685341
Did you miss the fix back here to overwrite the file each time?
http://www.experts-exchange.com/Programming/Languages/.NET/Q_26472802.html#33684506

Author Closing Comment

ID: 33685386
Idle_Mind,

I did get that, thank you.  I have been testing it on a couple of older computers and cannot find fault with your code.  You sir, are a genius!!  I will accept your answer as the solution.

Thank you so much!!
