Solved

VB.NET Regular Expression

Posted on 2011-03-14
Last Modified: 2012-05-11
I have a text file that contains the following line of text. Every field has a starting tag. I am looking for a way to get the values of the date, year, and agency fields. One problem I have encountered is that some of the values also contain HTML tags. I have not had much experience with regular expressions. Any help is appreciated.


"<PRESOL> <DATE>0622 <YEAR>99 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS) <LOCATION>Spokane Customer Services Center (10PM3) <ZIP>99201-1075 <CLASSCOD>Z <OFFADD>General Services Administration, Public Buildings Service (PBS), Spokane Customer Services Center (10PM3), 920 West Riverside Avenue, Room 120, U. S. Courthouse, Spokane, WA  99201-1075 <SUBJECT>EXTERIOR PAINTING, FB/USPO, SPOKANE, WASHINGTON <SOLNBR>10PM3XX990138 <RESPDATE>081199 <CONTACT>Cheryl O'Donnell, Contract Specialist, Phone (509) 353-2457, Fax (509) 353-2359, Email cheryl.odonnell@gsa.gov - Eva Hutchison, Procurement Technician, Phone (509) 353-2457, Fax (509) 353-2359, Email eva.hutchison@gsa.gov <DESC>Contractor shall furnish all labor, materials and equipment to paint all previously painted workwork and exterior metal on the FB/USPO, 904 West Riverside Avenue, Spokane, Washington.  Building is five [5] stories.  Repair/replace missing, loose, cracked or defective caulking and glazing compound from glass, frames and trim of exterior windows.  All old paint contains lead.  Sic Code 1721.  All responsible sources may submit a quotation which, if timely received, may be considered by the Government.  This procurement is set aside for small business concerns.  Price range $100,000 - $250,000.  Please fax requests for solicitations to 509-353-2359. <LINK> <URL>http://www.fbo.gov/spg/GSA/PBS/10PM3/10PM3XX990138/listing.html <DESC>Link to FedBizOpps document. <EMAIL> <ADDRESS>cheryl.odonnell@gsa.gov <DESC>Cheryl O'Donnell </PRESOL>"
Question by:jimseiwert
17 Comments
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134215
Please provide an example of HTML tags being included in the values.
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134222
Without the html tags, it can be done with a pattern like this:
(?:<date>)([^<]*)(?:<year>)([^<]*)(?:<agency>)([^<]*)
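For anyone wanting to try that pattern, here is a minimal sketch of how it might be used from VB.NET. The module name and the shortened sample string are only for illustration. Note that the pattern is written in lower case while the tags in the file are upper case, so it relies on RegexOptions.IgnoreCase.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SequentialFieldsExample
    Sub Main()
        ' Shortened sample based on the line in the question.
        Dim source As String = "<PRESOL> <DATE>0622 <YEAR>99 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS)"

        ' IgnoreCase is needed because the pattern uses lower-case tag names.
        Dim m As Match = Regex.Match(source, "(?:<date>)([^<]*)(?:<year>)([^<]*)(?:<agency>)([^<]*)", RegexOptions.IgnoreCase)

        If m.Success Then
            ' Each captured value keeps the space before the next tag, so trim it.
            Console.WriteLine("Date:   " & m.Groups(1).Value.Trim())
            Console.WriteLine("Year:   " & m.Groups(2).Value.Trim())
            Console.WriteLine("Agency: " & m.Groups(3).Value.Trim())
        Else
            Console.WriteLine("No match")
        End If
    End Sub
End Module
```

Because [^<]* stops at the first "<", this version only works while the three values contain no HTML tags, and it needs all three fields to be present and in this order.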
 
LVL 2

Author Comment

by:jimseiwert
ID: 35134223
It can include any of them, from <a href, mailto:, <strong>, etc., as the descriptions come from an HTML editor. It is unknown which tags will be included where. All I know is the start tags for my fields.
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134232
Are the fields always in that order?
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134234
If yes, something like this might work nicely:
(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)
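As a rough sketch of why the lazy (.*?) version copes with markup inside a value: it runs up to the next expected field tag rather than stopping at the first "<". The sample string with a <strong> tag is made up for illustration, and RegexOptions.Singleline is an extra assumption in case a value spans more than one line.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LazyFieldsExample
    Sub Main()
        ' Hypothetical sample: the agency value is wrapped in an HTML tag.
        Dim source As String = "<DATE>0622 <YEAR>99 <AGENCY><strong>General Services Administration</strong> <OFFICE>Public Buildings Service (PBS)"

        ' Singleline is an added assumption so that "." also matches line breaks inside a value.
        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline
        Dim m As Match = Regex.Match(source, "(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)", options)

        If m.Success Then
            Console.WriteLine("Date:   " & m.Groups(1).Value.Trim())
            Console.WriteLine("Year:   " & m.Groups(2).Value.Trim())
            ' The agency value still contains the <strong> markup at this point.
            Console.WriteLine("Agency: " & m.Groups(3).Value.Trim())
        Else
            Console.WriteLine("No match")
        End If
    End Sub
End Module
```

The trade-off is that every tag named in the pattern, including the closing <office>, has to be present and in this order, otherwise the whole match fails.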
 
LVL 2

Author Comment

by:jimseiwert
ID: 35134236
Yes, but they may not always be there. They are only there if they have data. If it helps, the HTML tags are always in the DESC fields.
 
LVL 2

Author Comment

by:jimseiwert
ID: 35134242
To use that expression, would I do something like the below?
Dim MatchObj = Regex.Split(str, "(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>) ").ToList

 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134257
Something like this:
(see http://www.myregextester.com/?r=4c3ad42c)

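    ' Requires: Imports System.Text.RegularExpressions
    ' IgnoreCase lets the lower-case tag names in the pattern match the upper-case tags in the file.
    Dim re As Regex = New Regex("(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)", RegexOptions.IgnoreCase)
    Dim mc As MatchCollection = re.Matches(sourcestring)
    Dim mIdx As Integer = 0
    For Each m As Match In mc
      ' Group 0 is the whole match; groups 1 to 3 are the date, year and agency values.
      For groupIdx As Integer = 0 To m.Groups.Count - 1
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GroupNameFromNumber(groupIdx), m.Groups(groupIdx).Value)
      Next
      mIdx = mIdx + 1
    Next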
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134289
If a field is sometimes missing, it would be best to do an individual search for each one. Also, if there are never HTML tags in the values of the fields you want, that simplifies things. E.g., using separate patterns (with case-insensitive matching, since the tags in the file are upper case):
(?<=<date>)[^<]*
(?<=<year>)[^<]*
(?<=<agency>)[^<]*
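A minimal sketch of how the separate patterns could be wrapped up in VB.NET follows. GetField is a hypothetical helper name, not something from the thread, and the sample string deliberately leaves out the YEAR field to show what happens when a field is absent.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SeparateFieldsExample
    ' Hypothetical helper: returns the text after <tagName> up to the next "<",
    ' or Nothing when the tag is not present in the source.
    Function GetField(ByVal source As String, ByVal tagName As String) As String
        Dim pattern As String = "(?<=<" & Regex.Escape(tagName) & ">)[^<]*"
        Dim m As Match = Regex.Match(source, pattern, RegexOptions.IgnoreCase)
        If m.Success Then
            Return m.Value.Trim()
        End If
        Return Nothing
    End Function

    Sub Main()
        ' Shortened sample based on the question; the YEAR field is left out on purpose.
        Dim source As String = "<PRESOL> <DATE>0622 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS)"

        For Each tagName As String In New String() {"date", "year", "agency"}
            Dim value As String = GetField(source, tagName)
            If value Is Nothing Then
                Console.WriteLine(tagName & ": (not present)")
            Else
                Console.WriteLine(tagName & ": " & value)
            End If
        Next
    End Sub
End Module
```

Since each field is searched for on its own, a missing tag only affects that one lookup, and the order of the tags in the line no longer matters.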
 
LVL 2

Author Comment

by:jimseiwert
ID: 35134299
Right now I have the code below, and I am passing in the string below, but it does not find any matches.

"<STATUS>PRESOL <DATE>0622 <YEAR>99 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS) <LOCATION>Spokane Customer Services Center (10PM3) <ZIP>99201-1075 <CLASSCOD>Z <OFFADD>General Services Administration, Public Buildings Service (PBS), Spokane Customer Services Center (10PM3), 920 West Riverside Avenue, Room 120, U. S. Courthouse, Spokane, WA  99201-1075 <SUBJECT>EXTERIOR PAINTING, FB/USPO, SPOKANE, WASHINGTON <SOLNBR>10PM3XX990138 <RESPDATE>081199 <CONTACT>Cheryl O'Donnell, Contract Specialist, Phone (509) 353-2457, Fax (509) 353-2359, Email cheryl.odonnell@gsa.gov - Eva Hutchison, Procurement Technician, Phone (509) 353-2457, Fax (509) 353-2359, Email eva.hutchison@gsa.gov <DESC>Contractor shall furnish all labor, materials and equipment to paint all previously painted workwork and exterior metal on the FB/USPO, 904 West Riverside Avenue, Spokane, Washington.  Building is five [5] stories.  Repair/replace missing, loose, cracked or defective caulking and glazing compound from glass, frames and trim of exterior windows.  All old paint contains lead.  Sic Code 1721.  All responsible sources may submit a quotation which, if timely received, may be considered by the Government.  This procurement is set aside for small business concerns.  Price range $100,000 - $250,000.  Please fax requests for solicitations to 509-353-2359. <LINK> <URL>http://www.fbo.gov/spg/GSA/PBS/10PM3/10PM3XX990138/listing.html <DESC>Link to FedBizOpps document. <EMAIL> <ADDRESS>cheryl.odonnell@gsa.gov <DESC>Cheryl O'Donnell "
    Sub fixstr(ByVal str As String)
        Try
            'Dim RegexObj As Regex = New Regex("(?:<date>)([^<]*)(?:<year>)([^<]*)(?:<agency>)([^<]*)")
            'Dim MatchObj = Regex.Split(str, "(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)").ToList
            'Dim dte As String = Trim(MatchObj(1))

            Dim re As Regex = New Regex("(?:<status>)(.*?)(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)", RegexOptions.IgnoreCase)
            Dim mc As MatchCollection = re.Matches(str)
            Dim mIdx As Integer = 0
            For Each m As Match In mc
                For groupIdx As Integer = 0 To m.Groups.Count - 1
                    Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
                Next
                mIdx = mIdx + 1
            Next

        Catch ex As Exception
            Console.WriteLine(ex.Message)
        End Try
    End Sub

 
LVL 2

Author Comment

by:jimseiwert
ID: 35134347
One example of HTML tags is in this contact field, which has the following:

<CONTACT>Mona Morin, 401-275-4248  <a href="mailto:mona.morin@us.army.mil">USPFO for Rhode Island</a>
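For a value like this one, [^<]* would stop at the <a href tag. A possible workaround, sketched below, is to read up to the next field tag instead and then strip the markup. This rests on an assumption that is not confirmed anywhere in the thread: field tags are written entirely in upper-case letters and HTML tags are not. The trailing <DESC> text in the sample is made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HtmlInValueExample
    Sub Main()
        ' Contact value from the post above, followed by a made-up DESC field.
        Dim source As String = "<CONTACT>Mona Morin, 401-275-4248  <a href=""mailto:mona.morin@us.army.mil"">USPFO for Rhode Island</a> <DESC>Sample description"

        ' Deliberately case-sensitive: an all-upper-case tag such as <DESC> ends the value,
        ' while lower-case HTML tags such as <a href=...> do not (assumption, see above).
        Dim pattern As String = "(?<=<CONTACT>).*?(?=\s*(?:</?[A-Z]+>|$))"
        Dim m As Match = Regex.Match(source, pattern, RegexOptions.Singleline)

        If m.Success Then
            Dim raw As String = m.Value.Trim()
            ' Crude tag stripping for display purposes; not a full HTML parser.
            Dim plain As String = Regex.Replace(raw, "<[^>]+>", "")
            Console.WriteLine("Raw:   " & raw)
            Console.WriteLine("Plain: " & plain)
        Else
            Console.WriteLine("No CONTACT field found")
        End If
    End Sub
End Module
```

If the HTML editor ever emits upper-case tags without attributes (for example <STRONG>), this approach would cut the value short, so a fixed list of known field names in the lookahead would be the safer choice.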
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134393
Have a go with the patterns in http://5134289 and let me know how you go
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134396
Sorry, that was meant to be a link to my post with number 35134289
 
LVL 2

Author Comment

by:jimseiwert
ID: 35134455
Can you post a direct link, as I can't seem to find question ID 35134289?
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134464
It's just my latest pattern suggested above! I'll try again with the link: http://#35134289
 
LVL 35

Accepted Solution

by:Terry Woods (earned 500 total points)
ID: 35134467
I.e., this post:

If a field is sometimes missing, it would be best to do an individual search for each one. Also, if there are never HTML tags in the values of the fields you want, that simplifies things. E.g., using separate patterns (with case-insensitive matching, since the tags in the file are upper case):
(?<=<date>)[^<]*
(?<=<year>)[^<]*
(?<=<agency>)[^<]*
 
LVL 2

Author Comment

by:jimseiwert
ID: 35134502
That worked. Thank you! I was hoping to do it all at once, but one at a time will work also.
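On doing it all at once: one possible approach, sketched here and not taken from the accepted answer, is a single pattern with an alternation of the tag names, collecting whatever is found into a dictionary. It assumes, as the accepted patterns do, that the wanted values contain no HTML tags. The module and variable names are only illustrative.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module SinglePassExample
    Sub Main()
        ' Shortened sample based on the line in the question.
        Dim source As String = "<PRESOL> <DATE>0622 <YEAR>99 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS)"

        ' Case-insensitive keys so that "DATE" and "date" refer to the same entry.
        Dim fields As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)

        ' Group 1 is the tag name, group 2 the text up to the next "<".
        For Each m As Match In Regex.Matches(source, "<(date|year|agency)>([^<]*)", RegexOptions.IgnoreCase)
            ' Keep only the first occurrence of each tag.
            If Not fields.ContainsKey(m.Groups(1).Value) Then
                fields.Add(m.Groups(1).Value, m.Groups(2).Value.Trim())
            End If
        Next

        For Each tagName As String In New String() {"date", "year", "agency"}
            Dim value As String = Nothing
            If fields.TryGetValue(tagName, value) Then
                Console.WriteLine(tagName & " = " & value)
            Else
                Console.WriteLine(tagName & " is missing")
            End If
        Next
    End Sub
End Module
```

A field that is absent simply never makes it into the dictionary, so this still tolerates missing tags while scanning the line only once.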