Solved

VB.NET Regular Expression

Posted on 2011-03-14
Last Modified: 2012-05-11
I have a text file that contains the following line of text. Every field has a starting tag. I am looking for a way to get the values of the date, year, and agency fields. One problem I have encountered is that some of the values will contain HTML tags as well. I have not had much experience with regular expressions. Any help is appreciated.


"<PRESOL> <DATE>0622 <YEAR>99 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS) <LOCATION>Spokane Customer Services Center (10PM3) <ZIP>99201-1075 <CLASSCOD>Z <OFFADD>General Services Administration, Public Buildings Service (PBS), Spokane Customer Services Center (10PM3), 920 West Riverside Avenue, Room 120, U. S. Courthouse, Spokane, WA  99201-1075 <SUBJECT>EXTERIOR PAINTING, FB/USPO, SPOKANE, WASHINGTON <SOLNBR>10PM3XX990138 <RESPDATE>081199 <CONTACT>Cheryl O'Donnell, Contract Specialist, Phone (509) 353-2457, Fax (509) 353-2359, Email cheryl.odonnell@gsa.gov - Eva Hutchison, Procurement Technician, Phone (509) 353-2457, Fax (509) 353-2359, Email eva.hutchison@gsa.gov <DESC>Contractor shall furnish all labor, materials and equipment to paint all previously painted workwork and exterior metal on the FB/USPO, 904 West Riverside Avenue, Spokane, Washington.  Building is five [5] stories.  Repair/replace missing, loose, cracked or defective caulking and glazing compound from glass, frames and trim of exterior windows.  All old paint contains lead.  Sic Code 1721.  All responsible sources may submit a quotation which, if timely received, may be considered by the Government.  This procurement is set aside for small business concerns.  Price range $100,000 - $250,000.  Please fax requests for solicitations to 509-353-2359. <LINK> <URL>http://www.fbo.gov/spg/GSA/PBS/10PM3/10PM3XX990138/listing.html <DESC>Link to FedBizOpps document. <EMAIL> <ADDRESS>cheryl.odonnell@gsa.gov <DESC>Cheryl O'Donnell </PRESOL>"
Question by:jimseiwert
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134215
Please provide an example of HTML tags being included in the values.
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134222
Without the html tags, it can be done with a pattern like this:
(?:<date>)([^<]*)(?:<year>)([^<]*)(?:<agency>)([^<]*)
 
LVL 2

Author Comment

by:jimseiwert
ID: 35134223
The values can include any of them, such as <a href>, mailto: links, <strong>, etc., because the descriptions come from an HTML editor. It is unknown which tags will be included where. All I know is the start tags for my fields.
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134232
Are the fields always in that order?
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134234
If yes, something like this might work nicely:
(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)
 
LVL 2

Author Comment

by:jimseiwert
ID: 35134236
Yes, but they may not always be there. They are only there if they have data. If it helps, the HTML tags are always in the DESC fields.
 
LVL 2

Author Comment

by:jimseiwert
ID: 35134242
To use that expression, would I do something like the below?
Dim MatchObj = Regex.Split(str, "(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>) ").ToList

 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134257
Something like this:
(see http://www.myregextester.com/?r=4c3ad42c)

    ' Requires: Imports System.Text.RegularExpressions
    Dim re As Regex = New Regex("(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)", RegexOptions.IgnoreCase)
    Dim mc As MatchCollection = re.Matches(sourcestring)
    Dim mIdx As Integer = 0
    For Each m As Match In mc
      ' Group 0 is the whole match; groups 1 to 3 hold date, year and agency.
      For groupIdx As Integer = 0 To m.Groups.Count - 1
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames()(groupIdx), m.Groups(groupIdx).Value)
      Next
      mIdx += 1
    Next
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134289
If a field is sometimes missing, it would be best to do an individual search for each one. Also, if there are never HTML tags in the values for the fields you want, that simplifies things. E.g., using separate patterns:
(?<=<date>)[^<]*
(?<=<year>)[^<]*
(?<=<agency>)[^<]*
 
LVL 2

Author Comment

by:jimseiwert
ID: 35134299
Right now I have the code below, and I am passing in the string below, but it does not find any matches:

"<STATUS>PRESOL <DATE>0622 <YEAR>99 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS) <LOCATION>Spokane Customer Services Center (10PM3) <ZIP>99201-1075 <CLASSCOD>Z <OFFADD>General Services Administration, Public Buildings Service (PBS), Spokane Customer Services Center (10PM3), 920 West Riverside Avenue, Room 120, U. S. Courthouse, Spokane, WA  99201-1075 <SUBJECT>EXTERIOR PAINTING, FB/USPO, SPOKANE, WASHINGTON <SOLNBR>10PM3XX990138 <RESPDATE>081199 <CONTACT>Cheryl O'Donnell, Contract Specialist, Phone (509) 353-2457, Fax (509) 353-2359, Email cheryl.odonnell@gsa.gov - Eva Hutchison, Procurement Technician, Phone (509) 353-2457, Fax (509) 353-2359, Email eva.hutchison@gsa.gov <DESC>Contractor shall furnish all labor, materials and equipment to paint all previously painted workwork and exterior metal on the FB/USPO, 904 West Riverside Avenue, Spokane, Washington.  Building is five [5] stories.  Repair/replace missing, loose, cracked or defective caulking and glazing compound from glass, frames and trim of exterior windows.  All old paint contains lead.  Sic Code 1721.  All responsible sources may submit a quotation which, if timely received, may be considered by the Government.  This procurement is set aside for small business concerns.  Price range $100,000 - $250,000.  Please fax requests for solicitations to 509-353-2359. <LINK> <URL>http://www.fbo.gov/spg/GSA/PBS/10PM3/10PM3XX990138/listing.html <DESC>Link to FedBizOpps document. <EMAIL> <ADDRESS>cheryl.odonnell@gsa.gov <DESC>Cheryl O'Donnell "
Sub fixstr(ByVal str As String)
    Try
        'Dim RegexObj As Regex = New Regex("(?:<date>)([^<]*)(?:<year>)([^<]*)(?:<agency>)([^<]*)")
        'Dim MatchObj = Regex.Split(str, "(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)").ToList
        'Dim dte As String = Trim(MatchObj(1))

        Dim re As Regex = New Regex("(?:<status>)(.*?)(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)", RegexOptions.IgnoreCase)
        Dim mc As MatchCollection = re.Matches(str)
        Dim mIdx As Integer = 0
        For Each m As Match In mc
            For groupIdx As Integer = 0 To m.Groups.Count - 1
                Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
            Next
            mIdx = mIdx + 1
        Next

    Catch ex As Exception
        Console.WriteLine(ex.Message)
    End Try
End Sub

 
LVL 2

Author Comment

by:jimseiwert
ID: 35134347
One example of HTML tags is in this CONTACT field, which has the following:

<CONTACT>Mona Morin, 401-275-4248  <a href="mailto:mona.morin@us.army.mil">USPFO for Rhode Island</a>
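
For a value like this one, `[^<]*` would stop at the first `<a`. A rough sketch of one way around that, assuming the full set of field tags is known in advance: match lazily up to the next known field tag instead of the next `<`. The tag list below is copied from the sample line in the question, and `FieldTags` and `GetFieldWithHtml` are hypothetical names used only for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HtmlValueSketch
    ' Assumed list of field tags, taken from the sample line in the question.
    Private ReadOnly FieldTags As String() = {"DATE", "YEAR", "AGENCY", "OFFICE", "LOCATION", "ZIP", "CLASSCOD", "OFFADD", "SUBJECT", "SOLNBR", "RESPDATE", "CONTACT", "DESC", "LINK", "URL", "EMAIL", "ADDRESS"}

    ' Hypothetical helper: returns the value after <tagName>, HTML included,
    ' or Nothing when the tag is absent.
    Function GetFieldWithHtml(source As String, tagName As String) As String
        Dim anyFieldTag As String = "<(?:" & String.Join("|", FieldTags) & ")>"
        ' The lazy .*? stops at the next known field tag, or at the end of the input.
        Dim pattern As String = "(?<=<" & Regex.Escape(tagName) & ">).*?(?=" & anyFieldTag & "|$)"
        Dim m As Match = Regex.Match(source, pattern, RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If m.Success Then Return m.Value.Trim()
        Return Nothing
    End Function

    Sub Main()
        Dim line As String = "<CONTACT>Mona Morin, 401-275-4248  <a href=""mailto:mona.morin@us.army.mil"">USPFO for Rhode Island</a> <DESC>Some description"
        ' Prints the contact value with the <a ...>...</a> markup kept intact.
        Console.WriteLine(GetFieldWithHtml(line, "contact"))
    End Sub
End Module
```

One caveat: because the match is case-insensitive, an HTML tag spelled the same as a field tag (for example `<address>` or `<link>`) would end the value early. Dropping `RegexOptions.IgnoreCase` and passing the tag name in upper case may help if the field tags are always upper case.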
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134393
Have a go with the patterns in http://5134289 and let me know how you go
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134396
Sorry, that was meant to be a link to my post with number 35134289
 
LVL 2

Author Comment

by:jimseiwert
ID: 35134455
Can you post a direct link? I can't seem to find question ID 35134289.
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134464
It's just my latest pattern suggested above! I'll try again with the link: http://#35134289
 
LVL 35

Accepted Solution

by:Terry Woods (earned 2000 total points)
ID: 35134467
I.e., this post:

If a field is sometimes missing, it would be best to do an individual search for each one. Also, if there are never HTML tags in the values for the fields you want, that simplifies things. E.g., using separate patterns:
(?<=<date>)[^<]*
(?<=<year>)[^<]*
(?<=<agency>)[^<]*
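
As a minimal sketch of how these separate patterns might be applied in VB.NET, assuming the values contain no HTML tags: the lookbehind `(?<=<tag>)` requires the tag to precede the match without including it, and `[^<]*` takes everything up to the next `<`. `GetField` is a hypothetical helper name, not something from the thread.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module FieldLookupSketch
    ' Hypothetical helper: returns the trimmed text after <tagName>,
    ' or Nothing when the tag does not occur in the source.
    Function GetField(source As String, tagName As String) As String
        ' Builds e.g. (?<=<date>)[^<]* for tagName = "date".
        Dim pattern As String = "(?<=<" & Regex.Escape(tagName) & ">)[^<]*"
        Dim m As Match = Regex.Match(source, pattern, RegexOptions.IgnoreCase)
        If m.Success Then Return m.Value.Trim()
        Return Nothing
    End Function

    Sub Main()
        Dim line As String = "<PRESOL> <DATE>0622 <YEAR>99 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS)"
        Console.WriteLine("Date:   " & GetField(line, "date"))
        Console.WriteLine("Year:   " & GetField(line, "year"))
        Console.WriteLine("Agency: " & GetField(line, "agency"))
        ' A field that is absent yields Nothing rather than an exception.
        Console.WriteLine("Zip found: {0}", GetField(line, "zip") IsNot Nothing)
    End Sub
End Module
```

Searching for each field on its own means a missing field only affects its own lookup, which is why this approach copes with records where some tags are absent.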
 
LVL 2

Author Comment

by:jimseiwert
ID: 35134502
That worked. Thank you! I was hoping to do it all at once, but one at a time will work also.
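
For anyone who does want a single pass, one possible sketch is to capture the tag name and its value together and collect the results in a dictionary, so that missing fields are simply absent. This again assumes the values contain no `<` characters. `ParseFields` is a hypothetical name, and the tag alternation `date|year|agency` is only the three fields asked about in the question.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module SinglePassSketch
    ' Hypothetical helper: scans the source once and returns the fields it finds.
    Function ParseFields(source As String) As Dictionary(Of String, String)
        Dim fields As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
        ' Group 1 captures the tag name, group 2 the text up to the next "<".
        Dim re As New Regex("<(date|year|agency)>([^<]*)", RegexOptions.IgnoreCase)
        For Each m As Match In re.Matches(source)
            ' Keep only the first occurrence of each tag.
            If Not fields.ContainsKey(m.Groups(1).Value) Then
                fields.Add(m.Groups(1).Value, m.Groups(2).Value.Trim())
            End If
        Next
        Return fields
    End Function

    Sub Main()
        ' Sample with the YEAR field left out to show the missing-field case.
        Dim line As String = "<PRESOL> <DATE>0622 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS)"
        Dim fields As Dictionary(Of String, String) = ParseFields(line)
        For Each kv As KeyValuePair(Of String, String) In fields
            Console.WriteLine("{0} = {1}", kv.Key, kv.Value)
        Next
        Console.WriteLine("Has year: {0}", fields.ContainsKey("year"))
    End Sub
End Module
```

Unlike the earlier combined pattern, this does not depend on the fields appearing in a fixed order or on every field being present.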