
Solved

Need Help With Regular Expressions & VB.NET

Posted on 2010-09-14
Medium Priority
573 Views
Last Modified: 2012-08-13
Greetings! I am working on a VB.NET application to extract and format text from a text file.
The basic goal: remove certain text and create 4 separate text files from what remains.

I haven't gotten very far and am probably in over my head, but I'll post the little bit of code I do have.

I am willing to learn, but I will need help.
Imports System.IO
Imports System.Text.RegularExpressions

    'remove 1st column and its trailing white space
    Sub StepOne()
        Dim strCurrent As String = ""
        Try
            strCurrent = Directory.GetCurrentDirectory()
        Catch E As Exception
            MessageBox.Show(E.Message)
        End Try
        'read hotlist.txt from the current directory
        Dim strfileName As String = Path.Combine(strCurrent, "hotlist.txt")

        '(?m) turns on multiline mode so ^ matches at the start of every line,
        'not only at the start of the file; this strips the leading "1 " from each line
        Replace(strfileName, "(?m)^1 ", "")

    End Sub


    'GENERIC REGEX SEARCH & REPLACE.  Use Regular Expressions to search/replace within file
    Function Replace(ByVal file As String, ByVal searchFor As String, ByVal replaceWith As String) As Boolean
        'use Regular Expressions to search for and replace user defined variables
        Try
            Dim contents As String
            Using reader As New StreamReader(file)          'Using closes and disposes the reader
                contents = reader.ReadToEnd()               'read the entire file at once
            End Using
            'use regular expressions to search and replace text
            contents = Regex.Replace(contents, searchFor, replaceWith, RegexOptions.IgnoreCase)
            Using writer As New StreamWriter(file, False)   'False = overwrite the file with the new text
                writer.Write(contents)
            End Using
            Return True                                     'return successful
        Catch ex As Exception
            MessageBox.Show(ex.Message)
            Return False
        End Try
    End Function

Question by:asc2010
30 Comments
 

Author Comment

by:asc2010
ID: 33675998
OK, I've got the first part done: removing the first two characters of each line. It seems a bit crude, though:

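'(?m) turns on multiline mode, so ^ matches at the start of every line;
'the earlier "\r|\n1 " pattern also turned each CRLF into two line feeds
Replace(strfileName, "(?m)^1 ", "")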
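The multiline idea can be checked in memory before touching the file. This is a minimal sketch, assuming every line starts with the literal text "1 " as in the example file; the module and function names are hypothetical. RegexOptions.Multiline makes ^ match at the start of each line, so a single pattern handles every line and leaves the line endings untouched:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LeadingColumnSketch
    ' Hypothetical helper: removes a leading "1 " from the start of every line.
    ' With RegexOptions.Multiline, ^ matches right after each line break as well as
    ' at the start of the string; CR/LF characters are never part of the match.
    Function StripLeadingOne(ByVal text As String) As String
        Return Regex.Replace(text, "^1 ", "", RegexOptions.Multiline)
    End Function
End Module
```

A line such as "21 Y" is left alone, because the pattern only fires when "1 " sits at the very start of a line.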

 

Author Comment

by:asc2010
ID: 33677071
No takers here? Well, I am continuing to work on this. Here is the code I have so far:
Option Explicit On

Imports System
Imports System.IO
Imports System.IO.File
Imports System.Text.RegularExpressions
Imports System.Text.RegularExpressions.Match
Imports System.Text.RegularExpressions.MatchCollection
'
'requirements:
'invisible form (will be called via batch file)
'Remove the 1st column and its trailing white space
'Remove all rows that are not Ohio
'Remove columns 3, 4, and 9
'Create text files based on a letter-filter: V=Stolen.txt, W=Wanted.txt, P=LP.txt, M=MP.txt
'Create all text files as comma separated

Public Class DOJ_Parser_Hilliard

    Public strFileName As String = "hotlist.txt"
    Public strCurrent As String = Directory.GetCurrentDirectory()
    Public strGlobalFilePath As String = strCurrent & "\" & strFileName

    Private Sub DOJ_Parser_Hilliard_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load

        Label1.Text = "Loading . . ."

        If VerifyFileExists() = True Then
            StepOne()
            StepTwo()
        Else
            Me.Close()
        End If


    End Sub

    Function VerifyFileExists() As Boolean
        'this function runs on form load
        'strGlobalFilePath is already built from the current directory when the form is created
        If Not Exists(strGlobalFilePath) Then
            MessageBox.Show("File Not Found.", "WARNING!")
            Return False
        Else
            Return True
        End If
    End Function

    Sub StepOne()
        'remove 1st column and its trailing white space
        'regex replace: (?m) makes ^ match at the start of every line, so one call strips each leading "1 "
        Replace(strGlobalFilePath, "(?m)^1 ", "")
    End Sub
    
    Sub StepTwo()
        'remove all rows that are not OH
        Dim strTextFileContents As String
        Using srTextFile As New StreamReader(strGlobalFilePath)     'Using closes the reader
            strTextFileContents = srTextFile.ReadToEnd()              'read in entire text file
        End Using

        'regex match against the file contents (not the file path); (?m) makes ^ and $
        'match at every line boundary, so each line containing "OH" is a separate match
        'note: this keeps any line with "OH" anywhere in it, not only in the state column
        Dim ohioRows As MatchCollection = Regex.Matches(strTextFileContents, "(?m)^.*OH.*$")

        If ohioRows.Count > 0 Then
            'overwrite the file so that only the Ohio rows remain
            Using sw As New StreamWriter(strGlobalFilePath, False)
                For Each m As Match In ohioRows
                    '$ stops before \n, so drop the trailing \r of CRLF lines
                    sw.WriteLine(m.Value.TrimEnd(ControlChars.Cr))
                Next
            End Using
        Else 'if no Ohio rows are found, display error notice
            MessageBox.Show("No Ohio rows were found.", "ERROR")
        End If

    End Sub


    Function StepThree(ByVal strMatchThis As String)
        're-write hotlist.txt with Ohio content
        'StreamReader
        'StreamWriter
        Dim theMessage As String = Nothing
        Dim theMessageTitle As String = Nothing

        Dim srTextFile As New StreamReader(strGlobalFilePath)   'initialize streamreader
        Dim strTextFileContents As String = srTextFile.ReadToEnd()          'read in entire text file
        srTextFile.Close()                                                  'close stream reader

        Dim strRegexMatch As String = Nothing                               'prepare a string for regex matching

        Select Case strMatchThis
            Case "deployment"                                               'if passed variable is deployment, do the following variable assignment
                strRegexMatch = "\$LOCAL::DEPLOYMENTS\$=(?<matchvariable>.+)"
                theMessage = "WARNING!   PlateScanDeploymentInstallationPath Not Found."
                theMessageTitle = "File Not Found"
            Case "enterprise"                                               'if passed variable is enterprise, do the following variable assignment
                strRegexMatch = "\$LOCAL::ENTERPRISE\$=\$LOCAL::DEPLOYMENTS\$(?<matchvariable>.+)"
                theMessage = "WARNING!   PlateScanEnterpriseInstallationPath Not Found."
                theMessageTitle = "File Not Found"
            Case "system"                                                   'if passed variable is system, do the following variable assignment
                strRegexMatch = "\$LOCAL::SYSTEM\$=(?<matchvariable>.+)"
                theMessage = "WARNING!   PlateScanSystemInstallationPath Not Found."
                theMessageTitle = "File Not Found"
        End Select

        If Regex.IsMatch(strTextFileContents, strRegexMatch) Then           'if a match is found in the text file, return it to the requestor method
            Dim theMatchedVariable = Nothing
            Dim r As New Regex(strRegexMatch, RegexOptions.IgnoreCase Or RegexOptions.Compiled)
            theMatchedVariable = r.Match(strTextFileContents).Result("${matchvariable}")
            Return theMatchedVariable
        Else                                                                'if no match is found, display error notice
            MessageBox.Show(theMessage, theMessageTitle)
            Return False
        End If
    End Function

    'GENERIC REGEX SEARCH & REPLACE.  Use Regular Expressions to search/replace within file
    Function Replace(ByVal file As String, ByVal searchFor As String, ByVal replaceWith As String) As Boolean
        'use Regular Expressions to search for and replace user defined variables
        Try
            Dim contents As String
            Using reader As New StreamReader(file)          'Using closes and disposes the reader
                contents = reader.ReadToEnd()               'read the entire file at once
            End Using
            'use regular expressions to search and replace text
            contents = Regex.Replace(contents, searchFor, replaceWith, RegexOptions.IgnoreCase)
            Using writer As New StreamWriter(file, False)   'False = overwrite the file with the new text
                writer.Write(contents)
            End Using
            Return True                                     'return successful
        Catch ex As Exception
            MessageBox.Show(ex.Message)
            Return False
        End Try
    End Function

 
LVL 53

Expert Comment

by:Dhaest
ID: 33679715
>> The basic goal: remove certain text and create 4 separate text files from the remaining file.

You state a goal, but you never ask a question. What problem do you still have? What are you starting from, and what is the aim?
 
LVL 75

Expert Comment

by:Michel Plungjan
ID: 33679730
Ah, a taker :)

I think the question is "How to use RegEx in C# to manipulate text and write 4 files with the result"
 
LVL 53

Expert Comment

by:Dhaest
ID: 33679751
>> How to use RegEx in C# to manipulate text and write 4 files with the result

I understand that, but nothing has been said about what the starting text looks like, or about what information determines how the text is split into multiple files, ...
 

Author Comment

by:asc2010
ID: 33682664
First, I apologize for offering monetary compensation.  My offer was not intended to offend anyone, only expedite my request for help.

@ Dhaest: My question should have been stated as mplungjan put it - "How to use RegEx in C# or VB.NET to manipulate text and write 4 files with the result"

@ mplungjan: Thank you for clarifying my request.

@ aikimark: I put this request into the C# zone because I can figure out how to work with C# if someone here can help.  I have experience with VB.NET so I started with that language.

Attached is an excerpt of the text file I am working with. There are 11 columns in the file; however, some are blank, and I need to account for them as well. I only need the rows that contain the text "OH" in the 3rd column. Out of those, I only need columns 2, 3, 6, 7, 8, 9, and 11.

Once I get the correct rows and columns, I need to comma separate them and write them to separate text files based on column 11.
If column 11 starts with "V", create a file called "Stolen.txt"
If column 11 starts with "W", create a file called "Wanted.txt"
If column 11 starts with "P", create a file called "LP.txt"
If column 11 starts with "M", create a file called "MP.txt"
example.txt
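The rules above can be expressed as a small per-line function. This is a sketch under stated assumptions, not the thread's solution: exactly one space separates columns (so an empty column appears as an empty string after splitting), column 3 must equal "OH" exactly, and TryRouteRecord is a hypothetical name. It returns False when a line should be skipped; otherwise it reports the output file name and the comma-separated line:

```vbnet
Imports System
Imports System.Collections.Generic

Module RoutingRulesSketch
    ' 1-based column numbers to keep, taken from the rules above.
    Private ReadOnly KeepColumns As Integer() = {2, 3, 6, 7, 8, 9, 11}

    ' First letter of column 11 -> output file name.
    Private ReadOnly FileForLetter As New Dictionary(Of Char, String) From {
        {"V"c, "Stolen.txt"}, {"W"c, "Wanted.txt"}, {"P"c, "LP.txt"}, {"M"c, "MP.txt"}}

    ' Hypothetical helper: applies the rules to one line of the source file.
    ' Returns False when the line should be skipped; otherwise sets fileName and csv.
    Function TryRouteRecord(ByVal line As String, ByRef fileName As String, ByRef csv As String) As Boolean
        Dim cols As String() = line.Split(" "c)           ' assumes exactly one space between columns
        If cols.Length < 11 Then Return False              ' blank or short line
        If cols(2) <> "OH" Then Return False               ' column 3 must be OH
        If cols(10).Length = 0 Then Return False           ' column 11 has no type letter
        If Not FileForLetter.TryGetValue(Char.ToUpperInvariant(cols(10)(0)), fileName) Then Return False

        Dim kept As New List(Of String)
        For Each column As Integer In KeepColumns
            kept.Add(cols(column - 1))                     ' 1-based column -> 0-based index
        Next
        csv = String.Join(",", kept.ToArray())
        Return True
    End Function
End Module
```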
 
LVL 53

Expert Comment

by:Dhaest
ID: 33682733
I'll take a look at it tomorrow ...
 
LVL 53

Expert Comment

by:Dhaest
ID: 33682815
Another question: why are you so eager to use regular expressions? There are easier ways to handle this.
I'm thinking out loud: read the file into a list of objects (with 11 properties) and use LINQ to get the correct results to store in the files.

Is the file a fixed-length file? (It doesn't look like it in the example you provided, or is there an error in the 4th line?)
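The list-plus-LINQ idea could look roughly like this in VB.NET. It is only a sketch: it assumes single-space separators and 11 columns per record, uses hypothetical names (GroupByOutputFile, FileFor), and returns the comma-separated rows grouped by output file instead of writing them:

```vbnet
Option Infer On
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module LinqGroupingSketch
    ' Hypothetical helper: output file name for the type letter at the start of column 11.
    Private Function FileFor(ByVal typeColumn As String) As String
        If typeColumn.Length = 0 Then Return Nothing
        Select Case Char.ToUpperInvariant(typeColumn(0))
            Case "V"c : Return "Stolen.txt"
            Case "W"c : Return "Wanted.txt"
            Case "P"c : Return "LP.txt"
            Case "M"c : Return "MP.txt"
            Case Else : Return Nothing
        End Select
    End Function

    ' Splits each line into columns, keeps the OH rows, builds the CSV text from
    ' columns 2, 3, 6, 7, 8, 9 and 11, and groups the results by output file name.
    Function GroupByOutputFile(ByVal lines As IEnumerable(Of String)) As Dictionary(Of String, List(Of String))
        Dim rows = From line In lines
                   Let cols = line.Split(" "c)
                   Where cols.Length >= 11 AndAlso cols(2) = "OH"
                   Let target = FileFor(cols(10))
                   Where target IsNot Nothing
                   Select target, Csv = String.Join(",", cols(1), cols(2), cols(5), cols(6), cols(7), cols(8), cols(10))

        Return rows.GroupBy(Function(r) r.target) _
                   .ToDictionary(Function(g) g.Key, Function(g) g.Select(Function(r) r.Csv).ToList())
    End Function
End Module
```

Each dictionary entry could then be written out with one call per file, which keeps the filtering logic separate from the file I/O.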
 

Author Comment

by:asc2010
ID: 33682878
@ Dhaest: I'm going the regex route because I could not think of anything else... I am open to anything simpler. This program will be placed on a shared computer and called from a batch file. The user should never even know it exists, so as long as it gets the job done, I am open to suggestions!
 
LVL 53

Expert Comment

by:Dhaest
ID: 33682933
Is the file format known? Does it have fixed lengths, a separator, ...?
 

Author Comment

by:asc2010
ID: 33683051
I'm not sure what you mean. The attached file is exactly what I'm working with... it sucks, I know. The end result, the 4 text files, will be comma-separated by column.
 
LVL 86

Expert Comment

by:Mike Tomlinson
ID: 33683062
It looks fixed width but the 4th line doesn't line up properly...

We'd need to know EXACTLY how the file is formatted to be able to help you!
VariableFixedWidth.jpg
 
LVL 53

Expert Comment

by:Dhaest
ID: 33683072
I'll create the queries tomorrow. In the meantime, here is an example of how to load the file into a list of objects:


    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        parseFile()
    End Sub

    Private Sub parseFile()
        Dim fileLines As New List(Of FileLine)()

        'Using closes the reader when the loop is done
        Using objReader As New System.IO.StreamReader("c:\ee.txt")
            Do While objReader.Peek() <> -1
                Dim myLine As FileLine = ParseLine(objReader.ReadLine())
                If myLine IsNot Nothing Then
                    fileLines.Add(myLine)   'reuse the parsed line instead of parsing it a second time
                End If
            Loop
        End Using

    End Sub

    'returns the parsed line when column 3 contains "OH", otherwise Nothing
    Private Function ParseLine(ByVal textline As String) As FileLine
        Dim textSplit() As String = textline.Split(" "c)
        If textSplit.Length < 11 Then Return Nothing   'skip short or blank lines
        Dim line As New FileLine(textSplit)

        If line.column3.Contains("OH") Then
            Return line
        End If
        Return Nothing
    End Function


Public Class FileLine
    Public column1 As String
    Public column2 As String
    Public column3 As String
    Public column4 As String
    Public column5 As String
    Public column6 As String
    Public column7 As String
    Public column8 As String
    Public column9 As String
    Public column10 As String
    Public column11 As String



    Public Sub New(ByVal params() As String)
        column1 = params(0)
        column2 = params(1)
        column3 = params(2)
        column4 = params(3)
        column5 = params(4)
        column6 = params(5)
        column7 = params(6)
        column8 = params(7)
        column9 = params(8)
        column10 = params(9)
        column11 = params(10)
    End Sub

End Class

 

Author Comment

by:asc2010
ID: 33683093
Idle_Mind,
The file is very ugly and not fixed-width. My approach was going to be finding the white space and trying to format it based on that... sorry.
 
LVL 86

Expert Comment

by:Mike Tomlinson
ID: 33683131
We really can't help unless you can explain in plain English how the "records" are structured.

Reading lines and writing lines in a file is simple...but if we can't discern where the columns start/stop then we are of no use to you.

Yes...you'd need to explain in EXCRUCIATING detail how the file works...   =\
 

Author Comment

by:asc2010
ID: 33683251
Idle_Mind,

Each white space starts/ends a new column. The data originates from a website. Someone extracts this data, and it gets placed into the text file exactly as I have sent it. Each record originally has 11 columns. Some of those columns are empty and some are not.
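If exactly one space separates each column (an assumption drawn from this description), splitting on a single space keeps blank columns in place, whereas StringSplitOptions.RemoveEmptyEntries collapses them and shifts every later column to the left. A small comparison sketch with hypothetical function names:

```vbnet
Imports System

Module SplitSketch
    ' Splitting on one space keeps an empty string for each blank column,
    ' so column positions stay stable (assumes single-space separators).
    Function SplitColumns(ByVal line As String) As String()
        Return line.Split(" "c)
    End Function

    ' RemoveEmptyEntries drops the empty strings instead, which moves later
    ' columns into the wrong positions; shown only for contrast.
    Function SplitCollapsingBlanks(ByVal line As String) As String()
        Return line.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
    End Function
End Module
```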
 
LVL 86

Expert Comment

by:Mike Tomlinson
ID: 33683291
Ah...gotcha!

...approximately how many lines in the actual file?

*We need to decide if the whole thing will be read into memory at once or if it should be processed only one line at a time.
 

Author Comment

by:asc2010
ID: 33683365
Idle_Mind,

The number of lines in the file varies by day. The example file I provided contained only the first few lines out of 1,249. Unfortunately, this will be an unknown amount, as it changes on a daily basis.
 
LVL 86

Expert Comment

by:Mike Tomlinson
ID: 33683438
Ok...

1,000 lines is no problem.

10,000 lines would probably be fine too on most systems.

100,000 lines we might want to consider a different approach.

1,000,000 lines definitely needs something else.

Is there a reasonable upper bound for the max # of lines that might be in the daily file?
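For the line-at-a-time option, here is a sketch using File.ReadLines (available from .NET Framework 4 onward), which enumerates lines lazily instead of loading the whole file. CountOhioRows is a hypothetical name, and it assumes single-space columns with the state in column 3:

```vbnet
Imports System
Imports System.IO

Module StreamingSketch
    ' Hypothetical helper: counts rows whose 3rd column is "OH".
    ' File.ReadLines yields one line at a time, so memory use stays flat
    ' regardless of how large the daily file gets.
    Function CountOhioRows(ByVal filePath As String) As Integer
        Dim count As Integer = 0
        For Each line As String In File.ReadLines(filePath)
            Dim cols As String() = line.Split(" "c)      ' assumes single-space column separators
            If cols.Length >= 3 AndAlso cols(2) = "OH" Then count += 1
        Next
        Return count
    End Function
End Module
```

At a few thousand lines, reading everything at once with File.ReadAllLines would also be fine; the lazy version only matters at the larger sizes discussed above.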
 

Author Comment

by:asc2010
ID: 33683466
Idle_Mind,

lol I see your point there. I do not believe there will be more than 5,000 lines on any given day. If there are, then this agency has way bigger issues than processing power!
 
LVL 86

Expert Comment

by:Mike Tomlinson
ID: 33683914
Here's an inelegant solution that simply appends the lines to the existing data in the output files.

It is based on your rules:

    I only need the rows that contain the text "OH" in the 3rd column.
    Out of those, I only need columns 2, 3, 6, 7, 8, 9, and 11.

    Once I get the correct rows and columns, I need to comma separate them and write them to separate text files based on column 11.
    If column 11 starts with "V", create a file called "Stolen.txt"
    If column 11 starts with "W", create a file called "Wanted.txt"
    If column 11 starts with "P", create a file called "LP.txt"
    If column 11 starts with "M", create a file called "MP.txt"

*I didn't do any bounds checking! If lines might exist with FEWER than 11 columns, or if there are completely blank lines, then you'll need to do some extra checking:
Public Class Form1

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim Path As String = "C:\Users\Mike\Documents\Downloads"
        Dim DataFile As String = "Example.txt"
        Dim V_File As String = "Stolen.txt"
        Dim W_File As String = "Wanted.txt"
        Dim LP_File As String = "LP.txt"
        Dim MP_File As String = "MP.txt"

        Using sr As New System.IO.StreamReader(System.IO.Path.Combine(Path, DataFile))
            While Not sr.EndOfStream
                Dim values As New List(Of String)
                values.AddRange(sr.ReadLine.Split(" "))
                If values(2).ToUpper = "OH" Then ' process lines with "OH" in the 3rd column
                    ' remove columns 10, 5, 4 and 1 (zero-based indices 9, 4, 3, 0),
                    ' highest index first so earlier removals don't shift the later ones
                    values.RemoveAt(9)
                    values.RemoveAt(4)
                    values.RemoveAt(3)
                    values.RemoveAt(0)
                    Dim output As String = String.Join(",", values.ToArray) & Environment.NewLine

                    ' output to file based on first character of value in last column
                    Select Case values(values.Count - 1).Substring(0, 1).ToUpper
                        Case "V"
                            My.Computer.FileSystem.WriteAllText(System.IO.Path.Combine(Path, V_File), output, True)

                        Case "W"
                            My.Computer.FileSystem.WriteAllText(System.IO.Path.Combine(Path, W_File), output, True)

                        Case "P"
                            My.Computer.FileSystem.WriteAllText(System.IO.Path.Combine(Path, LP_File), output, True)

                        Case "M"
                            My.Computer.FileSystem.WriteAllText(System.IO.Path.Combine(Path, MP_File), output, True)

                    End Select
                End If
            End While
        End Using

    End Sub

End Class
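One way to add the bounds checking mentioned above is a guard function that could replace the `If values(2).ToUpper = "OH" Then` test in the loop. IsProcessable is a hypothetical name; this sketch rejects lines with fewer than 11 columns (which covers blank lines) and lines whose column 11 is empty, since Substring(0, 1) throws on an empty string:

```vbnet
Imports System
Imports System.Collections.Generic

Module BoundsCheckSketch
    ' Hypothetical guard: True only when a split line can safely go through
    ' the RemoveAt/Substring steps: at least 11 columns, "OH" in column 3,
    ' and a non-empty column 11.
    Function IsProcessable(ByVal values As List(Of String)) As Boolean
        If values.Count < 11 Then Return False          ' short or blank line
        If values(2).ToUpper() <> "OH" Then Return False
        Return values(10).Length > 0                     ' Substring(0, 1) needs at least one character
    End Function
End Module
```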

 

Author Comment

by:asc2010
ID: 33684348
Idle_Mind,

WOW! This works like a charm! I can't believe how quickly you came up with this!! I understand WriteAllText() has a True/False parameter to append or overwrite. But is there a way to overwrite any existing data in the files just before writing the new data to them? I can throw in a quick function to erase any existing data, but am curious to hear your opinion.
 
LVL 86

Expert Comment

by:Mike Tomlinson
ID: 33684506
Quick fix...
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim Path As String = "C:\Users\Mike\Documents\Downloads"
        Dim DataFile As String = "Example.txt"
        Dim V_File As String = "Stolen.txt"
        Dim W_File As String = "Wanted.txt"
        Dim LP_File As String = "LP.txt"
        Dim MP_File As String = "MP.txt"

        Using sw_V As New System.IO.StreamWriter(System.IO.Path.Combine(Path, V_File), False)
            Using sw_W As New System.IO.StreamWriter(System.IO.Path.Combine(Path, W_File), False)
                Using sw_LP As New System.IO.StreamWriter(System.IO.Path.Combine(Path, LP_File), False)
                    Using sw_MP As New System.IO.StreamWriter(System.IO.Path.Combine(Path, MP_File), False)

                        Using sr As New System.IO.StreamReader(System.IO.Path.Combine(Path, DataFile))
                            While Not sr.EndOfStream
                                Dim values As New List(Of String)
                                values.AddRange(sr.ReadLine.Split(" "))
                                If values(2).ToUpper = "OH" Then ' process lines with "OH" in the 3rd column
                                    ' remove columns 10, 5, 4 and 1 (zero-based indices 9, 4, 3, 0),
                                    ' highest index first so earlier removals don't shift the later ones
                                    values.RemoveAt(9)
                                    values.RemoveAt(4)
                                    values.RemoveAt(3)
                                    values.RemoveAt(0)
                                    Dim output As String = String.Join(",", values.ToArray)

                                    ' output to file based on first character of value in last column
                                    Select Case values(values.Count - 1).Substring(0, 1).ToUpper
                                        Case "V"
                                            sw_V.WriteLine(output)

                                        Case "W"
                                            sw_W.WriteLine(output)

                                        Case "P"
                                            sw_LP.WriteLine(output)

                                        Case "M"
                                            sw_MP.WriteLine(output)

                                    End Select
                                End If
                            End While
                        End Using

                    End Using
                End Using
            End Using
        End Using
    End Sub
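The four nested Using blocks can also be replaced by a dictionary of writers keyed on the type letter. This is a sketch with a hypothetical name (WriteRouted); it takes already-formatted CSV lines paired with their letter, overwrites each output file (the False argument), and disposes every writer in a Finally block:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module WriterTableSketch
    ' Hypothetical helper: writes each (type letter, CSV line) pair to the matching file.
    ' Every output file is recreated (False = overwrite, not append), even if it gets no rows.
    Sub WriteRouted(ByVal folder As String, ByVal rows As IEnumerable(Of KeyValuePair(Of Char, String)))
        Dim names As New Dictionary(Of Char, String) From {
            {"V"c, "Stolen.txt"}, {"W"c, "Wanted.txt"}, {"P"c, "LP.txt"}, {"M"c, "MP.txt"}}
        Dim writers As New Dictionary(Of Char, StreamWriter)
        Try
            For Each entry As KeyValuePair(Of Char, String) In names
                writers(entry.Key) = New StreamWriter(Path.Combine(folder, entry.Value), False)
            Next
            For Each row As KeyValuePair(Of Char, String) In rows
                Dim sw As StreamWriter = Nothing
                ' rows whose letter is not V, W, P or M are skipped, as in the Select Case version
                If writers.TryGetValue(Char.ToUpperInvariant(row.Key), sw) Then sw.WriteLine(row.Value)
            Next
        Finally
            ' dispose (and therefore flush) every writer that was opened, even after an error
            For Each w As StreamWriter In writers.Values
                w.Dispose()
            Next
        End Try
    End Sub
End Module
```

Adding a fifth output file then means adding one dictionary entry rather than another nesting level.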

 
LVL 46

Expert Comment

by:aikimark
ID: 33684575
It might be possible to bring the data down from the web page.  What is the URL?
 

Author Comment

by:asc2010
ID: 33684636
@ aikimark:  I do not have access to the website, nor will I be granted access to the website.  I am only allowed access to the data provided in the text file because all personal information has been removed from it.
 
LVL 86

Accepted Solution

by: Mike Tomlinson (earned 2000 total points)
ID: 33685341
Did you miss the fix back here to overwrite the file each time?
http://www.experts-exchange.com/Programming/Languages/.NET/Q_26472802.html#33684506
 

Author Closing Comment

by:asc2010
ID: 33685386
Idle_Mind,

I did get that, thank you.  I have been testing it on a couple of older computers and cannot find fault with your code.  You sir, are a genius!!  I will accept your answer as the solution.

Thank you so much!!

Question has a verified solution.
