Solved

Need Help With Regular Expressions & VB.NET

Posted on 2010-09-14
Last Modified: 2012-08-13
Greetings!  I am working on a VB.NET application to extract and format text from a text file.
The basic goal: remove certain text and create 4 separate text files from the remaining file.

I've not gotten very far and am probably in over my head, but I'll post the little bit of code I do have.

I am willing to learn, but I will need help.
'remove 1st column and its trailing white space

    Sub StepOne()
        Dim strCurrent As String = ""
        Dim strRoot As String = ""
        Try
            strCurrent = Directory.GetCurrentDirectory()
            'strRoot = Directory.GetDirectoryRoot(strCurrent)
            'strRoot = Directory.GetDirectoryRoot("\")
        Catch E As Exception
            'Console.WriteLine("Error determining root directory")
            MessageBox.Show(E.Message)
        End Try
        'build the full path to hotlist.txt
        Dim strfileName As String
        strfileName = strCurrent & "\" & "hotlist.txt"

        Replace(strfileName, "\n1 ", "\n")

    End Sub


    'GENERIC REGEX SEARCH & REPLACE.  Use Regular Expressions to search/replace within file
    Function Replace(ByRef file As String, ByRef searchFor As String, ByRef replaceWith As String) As Boolean
        'function used during Validation
        'use Regular Expressions to search for and replace user defined variables
        Try
            Dim reader As New StreamReader(file)            'get a StreamReader for reading the file
            Dim contents As String = reader.ReadToEnd()     'read the entire file at once
            reader.Close()
            reader.Dispose()                                'close up and dispose
            'use regular expressions to search and replace text
            contents = Regex.Replace(contents, searchFor, replaceWith, RegexOptions.IgnoreCase Or RegexOptions.Compiled)
            Dim writer As New StreamWriter(file)            'get a StreamWriter for writing the new text to the file
            writer.Write(contents)                          'write the contents
            writer.Close()
            writer.Dispose()                                'close up and dispose
            Return True                                     'return successful
        Catch generatedExceptionName As Exception
            MessageBox.Show(generatedExceptionName.Message)
            Return False
        End Try
    End Function

Question by:asc2010
30 Comments
 

Author Comment

by:asc2010
ID: 33675998
Ok, I've got the first part done...remove the first two characters of each line.  It seems a bit crude though:

Replace(strfileName, "^1 ", "")
Replace(strfileName, "\r|\n1 ", Chr(10))
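
For anyone reading along: the two calls above can probably be collapsed into one pass with RegexOptions.Multiline, which makes ^ match at the start of every line rather than only at the start of the file. This is a minimal sketch, assuming the first column never contains spaces. The module name, StripFirstColumn and the sample rows are made up for illustration and are not part of the code posted above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FirstColumnSketch
    ' Hypothetical helper: removes the first whitespace-delimited token on
    ' each line, together with the spaces or tabs that follow it.
    Function StripFirstColumn(ByVal contents As String) As String
        ' With RegexOptions.Multiline, ^ also matches right after each line
        ' break, so one call covers the first line and every later line.
        ' [ \t]+ is used instead of \s+ so the match never crosses a line break.
        Return Regex.Replace(contents, "^\S+[ \t]+", "", RegexOptions.Multiline)
    End Function

    Sub Main()
        ' Made-up sample rows, not taken from the real hotlist.txt
        Dim sample As String = "1 ABC123 OH" & Environment.NewLine & "1 XYZ789 MI"
        Console.WriteLine(StripFirstColumn(sample))
    End Sub
End Module
```

Because the pattern leaves the line breaks untouched, the file keeps whatever line endings it already had.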

 

Author Comment

by:asc2010
ID: 33677071
No takers here?  Well, I am continuing to work on this, here is the code I have so far:
Option Explicit On

Imports System
Imports System.IO
Imports System.IO.File
Imports System.Text.RegularExpressions
Imports System.Text.RegularExpressions.Match
Imports System.Text.RegularExpressions.MatchCollection
'
'requirements:
'invisible form (will be called via batch file)
'Remove the 1st column and its trailing white space
'Remove all rows that are not Ohio
'Remove columns 3, 4, and 9
'Create text files based on a letter-filter: V=Stolen.txt, W=Wanted.txt, P=LP.txt, M=MP.txt
'Create all text files as comma separated

Public Class DOJ_Parser_Hilliard

    Public strFileName As String = "hotlist.txt"
    Public strCurrent As String = Directory.GetCurrentDirectory()
    Public strGlobalFilePath As String = strCurrent & "\" & strFileName

    Private Sub DOJ_Parser_Hilliard_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load

        Label1.Text = "Loading . . ."

        If VerifyFileExists() = True Then
            StepOne()
            StepTwo()
        Else
            Me.Close()
        End If


    End Sub

    Function VerifyFileExists()
        'this function runs on form load
        Try
            strCurrent = Directory.GetCurrentDirectory()
        Catch E As Exception
            MessageBox.Show(E.Message)
        End Try
        If Not Exists(strGlobalFilePath) Then
            MessageBox.Show("File Not Found.", "WARNING!")
            Return False
        Else
            Return True
        End If
    End Function

    Sub StepOne()
        'remove 1st column and its trailing white space
        'regex replace
        Replace(strGlobalFilePath, "^1 ", "")
        Replace(strGlobalFilePath, "\r|\n1 ", Chr(10))
    End Sub
    
    Sub StepTwo()
        'remove all rows that are not OH
        'regex match
        Dim strOhioCollection As String = ""
        Dim strRegexMatch As String = Nothing 'prepare a string for regex matching
        Dim srTextFile As New StreamReader(strGlobalFilePath)   'initialize streamreader
        Dim strTextFileContents As String = srTextFile.ReadToEnd()          'read in entire text file
        srTextFile.Close()                                                  'close stream reader
        strRegexMatch = "^.*OH.*$"

        If Regex.IsMatch(strGlobalFilePath, strRegexMatch) Then           'if a match is found in the text file, return it to the requestor method
            Dim theMatchedVariable = Nothing
            Dim r As New Regex(strRegexMatch, RegexOptions.IgnoreCase Or RegexOptions.Compiled)
            theMatchedVariable = r.Match(strTextFileContents).Result("^.*OH.*$")
            'open a streamwriter
            Dim sw As StreamWriter = File.AppendText(strGlobalFilePath)
            sw.Write(vbCrLf & theMatchedVariable) 'write to file
            sw.Flush() 'update file 
            sw.Close()
            sw.Dispose()

        Else 'if no match is found, display error notice
            MessageBox.Show("WARNING! ERROR!", "ERROR")
        End If

    End Sub


    Function StepThree(ByVal strMatchThis As String)
        're-write hotlist.txt with Ohio content
        'streamreader
        'streamwriter
        Dim theMessage As String = Nothing
        Dim theMessageTitle As String = Nothing

        Dim srTextFile As New StreamReader(strGlobalFilePath)   'initialize streamreader
        Dim strTextFileContents As String = srTextFile.ReadToEnd()          'read in entire text file
        srTextFile.Close()                                                  'close stream reader

        Dim strRegexMatch As String = Nothing                               'prepare a string for regex matching

        Select Case strMatchThis
            Case "deployment"                                               'if passed variable is deployment, do the following variable assignment
                strRegexMatch = "\$LOCAL::DEPLOYMENTS\$=(?<matchvariable>.+)"
                theMessage = "WARNING!   PlateScanDeploymentInstallationPath Not Found."
                theMessageTitle = "File Not Found"
            Case "enterprise"                                               'if passed variable is enterprise, do the following variable assignment
                strRegexMatch = "\$LOCAL::ENTERPRISE\$=\$LOCAL::DEPLOYMENTS\$(?<matchvariable>.+)"
                theMessage = "WARNING!   PlateScanEnterpriseInstallationPath Not Found."
                theMessageTitle = "File Not Found"
            Case "system"                                                   'if passed variable is system, do the following variable assignment
                strRegexMatch = "\$LOCAL::SYSTEM\$=(?<matchvariable>.+)"
                theMessage = "WARNING!   PlateScanSystemInstallationPath Not Found."
                theMessageTitle = "File Not Found"
        End Select

        If Regex.IsMatch(strTextFileContents, strRegexMatch) Then           'if a match is found in the text file, return it to the requestor method
            Dim theMatchedVariable = Nothing
            Dim r As New Regex(strRegexMatch, RegexOptions.IgnoreCase Or RegexOptions.Compiled)
            theMatchedVariable = r.Match(strTextFileContents).Result("${matchvariable}")
            Return theMatchedVariable
        Else                                                                'if file does not exist, display error notice
            MessageBox.Show(theMessage, theMessageTitle)
            Return False
        End If
    End Function

    'GENERIC REGEX SEARCH & REPLACE.  Use Regular Expressions to search/replace within file
    Function Replace(ByRef file As String, ByRef searchFor As String, ByRef replaceWith As String) As Boolean
        'function used during Validation
        'use Regular Expressions to search for and replace user defined variables
        Try
            Dim reader As New StreamReader(file)            'get a StreamReader for reading the file
            Dim contents As String = reader.ReadToEnd()     'read the entire file at once
            reader.Close()
            reader.Dispose()                                'close up and dispose
            'use regular expressions to search and replace text
            contents = Regex.Replace(contents, searchFor, replaceWith, RegexOptions.IgnoreCase Or RegexOptions.Compiled)
            Dim writer As New StreamWriter(file)            'get a StreamWriter for writing the new text to the file
            writer.Write(contents)                          'write the contents
            writer.Close()
            writer.Dispose()                                'close up and dispose
            Return True                                     'return successful
        Catch generatedExceptionName As Exception
            MessageBox.Show(generatedExceptionName.Message)
            Return False
        End Try
    End Function

 
LVL 53

Expert Comment

by:Dhaest
ID: 33679715
>> The basic goal: remove certain text and create 4 separate text files from the remaining file.

You state a goal, but you never ask a question. What problem do you still have? What are you starting from, and what is the aim?
 
LVL 75

Expert Comment

by:Michel Plungjan
ID: 33679730
Ah, a taker :)

I think the question is "How to use RegEx in C# to manipulate text and write 4 files with the result"
 
LVL 53

Expert Comment

by:Dhaest
ID: 33679751
>> How to use RegEx in C# to manipulate text and write 4 files with the result

I understand that, but nothing is mentioned about what the starting text looks like, what information determines how the text is split into multiple files, ...
 

Author Comment

by:asc2010
ID: 33682664
First, I apologize for offering monetary compensation.  My offer was not intended to offend anyone, only expedite my request for help.

@ Dhaest: My question should have been stated as mplungjan put it - "How to use RegEx in C# or VB.NET to manipulate text and write 4 files with the result"

@ mplungjan: Thank you for clarifying my request.

@ aikimark: I put this request into the C# zone because I can figure out how to work with C# if someone here can help.  I have experience with VB.NET so I started with that language.

Attached is an excerpt of the text file I am working with.  There are 11 columns in the file; however, some are blank and I need to account for them as well.  I only need the rows that contain the text "OH" in the 3rd column.  Out of those, I only need columns 2, 3, 6, 7, 8, 9, and 11.

Once I get the correct rows and columns, I need to comma separate them and write them to separate text files based on column 11.
If column 11 starts with "V", create a file called "Stolen.txt"
If column 11 starts with "W", create a file called "Wanted.txt"
If column 11 starts with "P", create a file called "LP.txt"
If column 11 starts with "M", create a file called "MP.txt"
example.txt
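
The rules above can be sketched as two small functions: one that picks the output file from the first letter of column 11, and one that keeps the wanted columns and joins them with commas. This is only a sketch under stated assumptions: the columns are assumed to be already split into an array of at least 11 values, and the names TargetFileFor and ToCsv, as well as the sample row, are hypothetical.

```vbnet
Imports System

Module RoutingSketch
    ' Maps the first letter of column 11 to the output file named in the
    ' rules above; returns Nothing when no rule applies.
    Function TargetFileFor(ByVal column11 As String) As String
        If String.IsNullOrEmpty(column11) Then Return Nothing
        Select Case Char.ToUpperInvariant(column11(0))
            Case "V"c : Return "Stolen.txt"
            Case "W"c : Return "Wanted.txt"
            Case "P"c : Return "LP.txt"
            Case "M"c : Return "MP.txt"
            Case Else : Return Nothing
        End Select
    End Function

    ' Keeps columns 2, 3, 6, 7, 8, 9 and 11 (zero-based 1, 2, 5, 6, 7, 8, 10)
    ' and joins them with commas. Assumes at least 11 values are passed in.
    Function ToCsv(ByVal columns() As String) As String
        Dim keep() As Integer = {1, 2, 5, 6, 7, 8, 10}
        Dim picked(keep.Length - 1) As String
        For i As Integer = 0 To keep.Length - 1
            picked(i) = columns(keep(i))
        Next
        Return String.Join(",", picked)
    End Function

    Sub Main()
        ' Made-up row with 11 space-separated columns
        Dim row() As String = "1 ABC123 OH x y 2010 FORD BLUE 4D z V-STOLEN".Split(" "c)
        Console.WriteLine(TargetFileFor(row(10)) & ": " & ToCsv(row))
    End Sub
End Module
```

Listing the columns to keep, rather than the columns to remove, keeps the index arithmetic in one place and makes the 1-based to 0-based shift easy to check.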
 
LVL 53

Expert Comment

by:Dhaest
ID: 33682733
I'll take a look at it tomorrow ...
 
LVL 53

Expert Comment

by:Dhaest
ID: 33682815
Another question: why are you so eager to use regular expressions? There are easier ways to handle this.
I'm thinking out loud: read the file into a list of objects (with 11 properties) and use LINQ to get the correct results to store in the files.

Is the file fixed-width? The example you provided does not look like it, or is there a fault in the 4th line?
 

Author Comment

by:asc2010
ID: 33682878
@ Dhaest: I'm going the regex route because I could not think of anything else...I am open to anything simpler.  This program will be placed on a shared computer and called from a batch file.  The user should never even know it exists, so as long as it gets the job done, I am open to suggestions!
 
LVL 53

Expert Comment

by:Dhaest
ID: 33682933
Is the file format known? Does it have fixed lengths, a separator, ...?
 

Author Comment

by:asc2010
ID: 33683051
I'm not sure what you mean.  The attached file is exactly what I'm working with...it sucks I know.  The end result - the 4 text files - will be comma separated for columns.
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 33683062
It looks fixed width but the 4th line doesn't line up properly...

We'd need to know EXACTLY how the file is formatted to be able to help you!
VariableFixedWidth.jpg
 
LVL 53

Expert Comment

by:Dhaest
ID: 33683072
I'll create the queries tomorrow. In the meantime, here is an example of how to load the file into a list of objects:


    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        parseFile()
    End Sub

    Private Sub parseFile()
        Dim fileLines As List(Of FileLine) = New List(Of FileLine)()

        'Using closes the reader even if an exception is thrown
        Using objReader As New System.IO.StreamReader("c:\ee.txt")
            Do While objReader.Peek() <> -1
                'parse each line only once and keep the result
                Dim myLine As FileLine = ParseLine(objReader.ReadLine())
                If Not myLine Is Nothing Then
                    fileLines.Add(myLine)
                End If
            Loop
        End Using
    End Sub

    'Returns a FileLine for Ohio rows and Nothing for every other row
    Private Function ParseLine(ByVal textline As String) As FileLine
        Dim textSplit() As String = textline.Split(" "c)
        If textSplit.Length < 11 Then
            Return Nothing      'too few columns for the FileLine constructor
        End If

        Dim line As FileLine = New FileLine(textSplit)
        If line.column3.Contains("OH") Then
            Return line
        End If
        Return Nothing
    End Function


Public Class FileLine
    Public column1 As String
    Public column2 As String
    Public column3 As String
    Public column4 As String
    Public column5 As String
    Public column6 As String
    Public column7 As String
    Public column8 As String
    Public column9 As String
    Public column10 As String
    Public column11 As String

    'params must hold at least 11 values, one per column
    Public Sub New(ByVal params() As String)
        column1 = params(0)
        column2 = params(1)
        column3 = params(2)
        column4 = params(3)
        column5 = params(4)
        column6 = params(5)
        column7 = params(6)
        column8 = params(7)
        column9 = params(8)
        column10 = params(9)
        column11 = params(10)
    End Sub
End Class
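
The LINQ queries mentioned at the top of this comment were never posted, so the following is only a guess at their shape, not Dhaest's actual code. It is a minimal sketch that works on plain string arrays instead of the FileLine class so that it stands on its own. RowsFor and the sample rows are hypothetical, and the column positions follow the asker's rules (OH in column 3, category letter in column 11).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module LinqSketch
    ' Returns the comma-separated Ohio rows whose column 11 starts with the
    ' given letter. Each element of "rows" is one line already split into columns.
    Function RowsFor(ByVal rows As IEnumerable(Of String()), ByVal letter As String) As List(Of String)
        Dim query As IEnumerable(Of String) = _
            From r In rows _
            Where r.Length >= 11 AndAlso r(2).ToUpper() = "OH" AndAlso r(10).ToUpper().StartsWith(letter) _
            Select String.Join(",", New String() {r(1), r(2), r(5), r(6), r(7), r(8), r(10)})
        Return query.ToList()
    End Function

    Sub Main()
        ' Made-up sample rows, 11 space-separated columns each
        Dim rows As New List(Of String())
        rows.Add("1 ABC123 OH x y 2010 FORD BLUE 4D z V-STOLEN".Split(" "c))
        rows.Add("1 DEF456 MI x y 2011 JEEP RED 2D z W-WANTED".Split(" "c))

        For Each csv As String In RowsFor(rows, "V")
            Console.WriteLine(csv)
        Next
    End Sub
End Module
```

Calling RowsFor once per letter ("V", "W", "P", "M") would give the contents of the four output files.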


 

Author Comment

by:asc2010
ID: 33683093
Idle_Mind,
The file is very ugly and not fixed-width.  My approach was going to be finding the white space and trying to format it based on that...sorry
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 33683131
We really can't help unless you can explain in plain English how the "records" are structured.

Reading lines and writing lines in a file is simple...but if we can't discern where the columns start/stop then we are of no use to you.

Yes...you'd need to explain in EXCRUCIATING detail how the file works...   =\
 

Author Comment

by:asc2010
ID: 33683251
Idle_Mind,

Each white space starts/ends a new column.  The data originates from a website.  Someone extracts this data and it gets placed into the text file exactly as I have sent it.  Each record originally has 11 columns.  Some of those columns are empty and some are not.
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 33683291
Ah...gotcha!

...approximately how many lines in the actual file?

*We need to decide if the whole thing will be read into memory at once or if it should be processed only one line at a time.
 

Author Comment

by:asc2010
ID: 33683365
Idle_Mind,

The number of lines in the file vary by day.  The example file I provided contained only the first few lines out of 1,249.  Unfortunately, this will be an unknown amount as it changes on a daily basis.
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 33683438
Ok...

1,000 lines is no problem.

10,000 lines would probably be fine too on most systems.

100,000 lines we might want to consider a different approach.

1,000,000 lines definitely needs something else.

Is there a reasonable upper bound for the max # of lines that might be in the daily file?
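
To make the two options concrete, here is a minimal sketch of both reading strategies. It only counts the rows with "OH" in the 3rd column. The function names and the temporary sample file are made up for illustration, and splitting on single spaces is an assumption about the file format.

```vbnet
Imports System
Imports System.IO

Module ReadStrategySketch
    ' Loads every line into an array first: simple, and fine for a few
    ' thousand lines.
    Function CountOhioInMemory(ByVal filePath As String) As Integer
        Dim count As Integer = 0
        For Each line As String In File.ReadAllLines(filePath)
            If IsOhio(line) Then count += 1
        Next
        Return count
    End Function

    ' Holds only one line at a time, so memory use stays flat as the file grows.
    Function CountOhioStreaming(ByVal filePath As String) As Integer
        Dim count As Integer = 0
        Using reader As New StreamReader(filePath)
            Dim line As String = reader.ReadLine()
            While line IsNot Nothing
                If IsOhio(line) Then count += 1
                line = reader.ReadLine()
            End While
        End Using
        Return count
    End Function

    ' True when the 3rd space-separated column is "OH" (case-insensitive).
    Private Function IsOhio(ByVal line As String) As Boolean
        Dim parts() As String = line.Split(" "c)
        Return parts.Length > 2 AndAlso parts(2).ToUpper() = "OH"
    End Function

    Sub Main()
        ' Write a small made-up sample file, run both strategies, clean up
        Dim tempPath As String = Path.GetTempFileName()
        File.WriteAllLines(tempPath, New String() {"1 AAA OH", "1 BBB MI", "1 CCC oh"})
        Console.WriteLine(CountOhioInMemory(tempPath))
        Console.WriteLine(CountOhioStreaming(tempPath))
        File.Delete(tempPath)
    End Sub
End Module
```

Both functions return the same count; the only difference is how much of the file sits in memory at once.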
 

Author Comment

by:asc2010
ID: 33683466
Idle_Mind,

lol I see your point there.  I do not believe there will be more than 5,000 lines on any given day.  If there are, then this agency has got way bigger issues than processing power!
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 33683914
Here's an inelegant solution that simply appends the lines to the existing data in the output files.

It is based on your rules:

    I only need the rows that contain the text "OH" in the 3rd column.
    Out of those, I only need columns 2, 3, 6, 7, 8, 9, and 11.

    Once I get the correct rows and columns, I need to comma separate them and write them to separate text files based on column 11.
    If column 11 starts with "V", create a file called "Stolen.txt"
    If column 11 starts with "W", create a file called "Wanted.txt"
    If column 11 starts with "P", create a file called "LP.txt"
    If column 11 starts with "M", create a file called "MP.txt"

*I didn't do any bounds checking!  If lines might exist with LESS than 11 columns, or if there are completely blank lines, then you'll need to do some extra checking:
Public Class Form1

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim Path As String = "C:\Users\Mike\Documents\Downloads"
        Dim DataFile As String = "Example.txt"
        Dim V_File As String = "Stolen.txt"
        Dim W_File As String = "Wanted.txt"
        Dim LP_File As String = "LP.txt"
        Dim MP_File As String = "MP.txt"

        Using sr As New System.IO.StreamReader(System.IO.Path.Combine(Path, DataFile))
            While Not sr.EndOfStream
                Dim values As New List(Of String)
                values.AddRange(sr.ReadLine.Split(" "))
                If values(2).ToUpper = "OH" Then ' process lines with "OH" in the 3rd column
                    ' remove columns 10,5,4,1 (zero-based indexes 9,4,3,0),
                    ' highest first so the remaining indexes do not shift
                    values.RemoveAt(9)
                    values.RemoveAt(4)
                    values.RemoveAt(3)
                    values.RemoveAt(0)
                    Dim output As String = String.Join(",", values.ToArray) & Environment.NewLine

                    ' output to file based on first character of value in last column
                    Select Case values(values.Count - 1).Substring(0, 1).ToUpper
                        Case "V"
                            My.Computer.FileSystem.WriteAllText(System.IO.Path.Combine(Path, V_File), output, True)

                        Case "W"
                            My.Computer.FileSystem.WriteAllText(System.IO.Path.Combine(Path, W_File), output, True)

                        Case "P"
                            My.Computer.FileSystem.WriteAllText(System.IO.Path.Combine(Path, LP_File), output, True)

                        Case "M"
                            My.Computer.FileSystem.WriteAllText(System.IO.Path.Combine(Path, MP_File), output, True)

                    End Select
                End If
            End While
        End Using

    End Sub

End Class
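
The extra checking mentioned before the code could look something like the sketch below. It is not part of the posted solution. It assumes 11 space-separated columns with the category code in column 11, and the name TrySplitRecord and the sample lines are made up for illustration.

```vbnet
Imports System

Module BoundsCheckSketch
    Const ExpectedColumns As Integer = 11

    ' Returns True and fills "columns" only when the line is not blank, has
    ' at least 11 columns, and has a non-empty value in column 11.
    Function TrySplitRecord(ByVal line As String, ByRef columns() As String) As Boolean
        columns = Nothing
        If String.IsNullOrEmpty(line) OrElse line.Trim().Length = 0 Then Return False

        Dim parts() As String = line.Split(" "c)
        If parts.Length < ExpectedColumns Then Return False
        ' An empty column 11 would make Substring(0, 1) throw later on
        If parts(ExpectedColumns - 1).Length = 0 Then Return False

        columns = parts
        Return True
    End Function

    Sub Main()
        ' Made-up sample lines: one complete, one blank, one too short
        Dim samples() As String = { _
            "1 ABC123 OH x y 2010 FORD BLUE 4D z V-STOLEN", _
            "", _
            "1 ABC123 OH"}

        For Each line As String In samples
            Dim cols() As String = Nothing
            Console.WriteLine(TrySplitRecord(line, cols))
        Next
    End Sub
End Module
```

Inside the read loop, a line for which TrySplitRecord returns False would simply be skipped before values(2) is touched.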

 

Author Comment

by:asc2010
ID: 33684348
Idle_Mind,

WOW!  This works like a charm!  I can't believe how quickly you came up with this!!  I understand that WriteAllText() has a true/false parameter to append or overwrite.  But is there a way to overwrite any existing data in the files just prior to writing the new data to them?  I can throw in a quick function to erase any existing data, but am curious to see your opinion.
 
LVL 85

Expert Comment

by:Mike Tomlinson
ID: 33684506
Quick fix...
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim Path As String = "C:\Users\Mike\Documents\Downloads"
        Dim DataFile As String = "Example.txt"
        Dim V_File As String = "Stolen.txt"
        Dim W_File As String = "Wanted.txt"
        Dim LP_File As String = "LP.txt"
        Dim MP_File As String = "MP.txt"

        ' opening each writer with append = False overwrites any existing file
        Using sw_V As New System.IO.StreamWriter(System.IO.Path.Combine(Path, V_File), False)
            Using sw_W As New System.IO.StreamWriter(System.IO.Path.Combine(Path, W_File), False)
                Using sw_LP As New System.IO.StreamWriter(System.IO.Path.Combine(Path, LP_File), False)
                    Using sw_MP As New System.IO.StreamWriter(System.IO.Path.Combine(Path, MP_File), False)

                        Using sr As New System.IO.StreamReader(System.IO.Path.Combine(Path, DataFile))
                            While Not sr.EndOfStream
                                Dim values As New List(Of String)
                                values.AddRange(sr.ReadLine.Split(" "))
                                If values(2).ToUpper = "OH" Then ' process lines with "OH" in the 3rd column
                                    ' remove columns 10,5,4,1 (zero-based indexes 9,4,3,0),
                                    ' highest first so the remaining indexes do not shift
                                    values.RemoveAt(9)
                                    values.RemoveAt(4)
                                    values.RemoveAt(3)
                                    values.RemoveAt(0)
                                    Dim output As String = String.Join(",", values.ToArray)

                                    ' output to file based on first character of value in last column
                                    Select Case values(values.Count - 1).Substring(0, 1).ToUpper
                                        Case "V"
                                            sw_V.WriteLine(output)

                                        Case "W"
                                            sw_W.WriteLine(output)

                                        Case "P"
                                            sw_LP.WriteLine(output)

                                        Case "M"
                                            sw_MP.WriteLine(output)

                                    End Select
                                End If
                            End While
                        End Using

                    End Using
                End Using
            End Using
        End Using
    End Sub
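
The overwrite behaviour in the quick fix comes from the second argument of the StreamWriter constructor: False truncates an existing file when it is opened, True appends to it. A small sketch that shows the difference follows. The function name and the temporary file name are made up for illustration.

```vbnet
Imports System
Imports System.IO

Module OverwriteSketch
    ' Writes to a scratch file three times and returns what is left in it.
    Function WriteThenRead() As String()
        Dim target As String = Path.Combine(Path.GetTempPath(), "overwrite-sketch.txt")

        ' Simulate output left over from a previous run
        File.WriteAllText(target, "stale data" & Environment.NewLine)

        ' append = False: the existing content is discarded on open
        Using writer As New StreamWriter(target, False)
            writer.WriteLine("fresh line 1")
        End Using

        ' append = True: the new line is added after the existing content
        Using writer As New StreamWriter(target, True)
            writer.WriteLine("fresh line 2")
        End Using

        Dim result() As String = File.ReadAllLines(target)
        File.Delete(target)
        Return result
    End Function

    Sub Main()
        For Each line As String In WriteThenRead()
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

One side effect of opening all four writers up front, as the quick fix does, is that all four files are created (and emptied) on every run, even when no row matches a given letter.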

 
LVL 45

Expert Comment

by:aikimark
ID: 33684575
It might be possible to bring the data down from the web page.  What is the URL?
 

Author Comment

by:asc2010
ID: 33684636
@ aikimark:  I do not have access to the website, nor will I be granted access to the website.  I am only allowed access to the data provided in the text file because all personal information has been removed from it.
 
LVL 85

Accepted Solution

by:
Mike Tomlinson earned 500 total points
ID: 33685341
Did you miss the fix back here to overwrite the file each time?
http://www.experts-exchange.com/Programming/Languages/.NET/Q_26472802.html#33684506
 

Author Closing Comment

by:asc2010
ID: 33685386
Idle_Mind,

I did get that, thank you.  I have been testing it on a couple of older computers and cannot find fault with your code.  You sir, are a genius!!  I will accept your answer as the solution.

Thank you so much!!