Solved

VB.NET Regular Expression

Posted on 2011-03-14
Last Modified: 2012-05-11
I have a text file that contains the following line of text. Every field has a starting tag. I am looking for a way to get the values of the DATE, YEAR and AGENCY fields. One problem I have encountered is that some of the values will contain HTML tags as well. I have not had much experience with regular expressions. Any help is appreciated.


"<PRESOL> <DATE>0622 <YEAR>99 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS) <LOCATION>Spokane Customer Services Center (10PM3) <ZIP>99201-1075 <CLASSCOD>Z <OFFADD>General Services Administration, Public Buildings Service (PBS), Spokane Customer Services Center (10PM3), 920 West Riverside Avenue, Room 120, U. S. Courthouse, Spokane, WA  99201-1075 <SUBJECT>EXTERIOR PAINTING, FB/USPO, SPOKANE, WASHINGTON <SOLNBR>10PM3XX990138 <RESPDATE>081199 <CONTACT>Cheryl O'Donnell, Contract Specialist, Phone (509) 353-2457, Fax (509) 353-2359, Email cheryl.odonnell@gsa.gov - Eva Hutchison, Procurement Technician, Phone (509) 353-2457, Fax (509) 353-2359, Email eva.hutchison@gsa.gov <DESC>Contractor shall furnish all labor, materials and equipment to paint all previously painted workwork and exterior metal on the FB/USPO, 904 West Riverside Avenue, Spokane, Washington.  Building is five [5] stories.  Repair/replace missing, loose, cracked or defective caulking and glazing compound from glass, frames and trim of exterior windows.  All old paint contains lead.  Sic Code 1721.  All responsible sources may submit a quotation which, if timely received, may be considered by the Government.  This procurement is set aside for small business concerns.  Price range $100,000 - $250,000.  Please fax requests for solicitations to 509-353-2359. <LINK> <URL>http://www.fbo.gov/spg/GSA/PBS/10PM3/10PM3XX990138/listing.html <DESC>Link to FedBizOpps document. <EMAIL> <ADDRESS>cheryl.odonnell@gsa.gov <DESC>Cheryl O'Donnell </PRESOL>"
Question by:jimseiwert

Expert Comment

by:Terry Woods
ID: 35134215
Please provide an example of HTML tags being included in the values.

Expert Comment

by:Terry Woods
ID: 35134222
Without HTML tags in the values, it can be done with a pattern like this:
(?:<date>)([^<]*)(?:<year>)([^<]*)(?:<agency>)([^<]*)
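The pattern above only matches when all three fields are present and appear in this order. A minimal VB.NET sketch of applying it with Regex.Match, assuming one record per string; ExtractDateYearAgency is a hypothetical helper name. Note the RegexOptions.IgnoreCase: the pattern is written in lower case, but the file uses <DATE>, <YEAR> and <AGENCY>.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CombinedPatternDemo
    ' Returns {date, year, agency} when all three fields are present in that order,
    ' otherwise Nothing.
    Public Function ExtractDateYearAgency(record As String) As String()
        ' IgnoreCase is needed because the tags in the file are upper case.
        Dim m As Match = Regex.Match(record,
            "(?:<date>)([^<]*)(?:<year>)([^<]*)(?:<agency>)([^<]*)",
            RegexOptions.IgnoreCase)
        If Not m.Success Then Return Nothing
        ' Groups(0) is the whole match; Groups(1) to Groups(3) are the captured values.
        ' Trim removes the space that separates each value from the next tag.
        Return {m.Groups(1).Value.Trim(), m.Groups(2).Value.Trim(), m.Groups(3).Value.Trim()}
    End Function

    Sub Main()
        Dim record As String = "<PRESOL> <DATE>0622 <YEAR>99 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS)"
        Dim values As String() = ExtractDateYearAgency(record)
        Console.WriteLine(String.Join(" | ", values))   ' 0622 | 99 | General Services Administration
    End Sub
End Module
```

If YEAR is missing from a record, the whole match fails, which is why the per-field approach discussed later in the thread is more robust.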

Author Comment

by:jimseiwert
ID: 35134223
The values can include any of them, from <a href and mailto: to <strong>, etc., because the descriptions come from an HTML editor. It is unknown which tags will appear where; all I know are the start tags for my fields.

Expert Comment

by:Terry Woods
ID: 35134232
Are the fields always in that order?

Expert Comment

by:Terry Woods
ID: 35134234
If yes, something like this might work nicely:
(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)
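To see why the lazy (.*?) version helps when a value contains HTML, here is a small VB.NET comparison of the two patterns on a hypothetical record whose AGENCY value contains a <strong> tag (the record and the AgencyWith helper are made up for illustration).

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LazyPatternDemo
    ' Negated class: each value stops at the first "<", so embedded HTML cuts it short.
    Public Const NegatedPattern As String = "(?:<date>)([^<]*)(?:<year>)([^<]*)(?:<agency>)([^<]*)"
    ' Lazy: each value runs until the next expected field tag, so embedded HTML is kept.
    Public Const LazyPattern As String = "(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)"

    ' Returns the AGENCY value (group 3) captured by the given pattern, or Nothing if no match.
    Public Function AgencyWith(pattern As String, record As String) As String
        Dim m As Match = Regex.Match(record, pattern, RegexOptions.IgnoreCase)
        If Not m.Success Then Return Nothing
        Return m.Groups(3).Value
    End Function

    Sub Main()
        ' Hypothetical record whose AGENCY value contains markup.
        Dim record As String = "<DATE>0622 <YEAR>99 <AGENCY><strong>GSA</strong> <OFFICE>PBS"
        Console.WriteLine("[" & AgencyWith(NegatedPattern, record) & "]")  ' []
        Console.WriteLine("[" & AgencyWith(LazyPattern, record) & "]")     ' [<strong>GSA</strong> ]
    End Sub
End Module
```

The lazy version still assumes every field is present and in this order, and it needs the following tag (<OFFICE> here) to exist.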

Author Comment

by:jimseiwert
ID: 35134236
Yes, but they may not always be there; a field is only present if it has data. If it helps, the HTML tags are always in the DESC fields.

Author Comment

by:jimseiwert
ID: 35134242
To use that expression, would I do something like the code below?
Dim MatchObj = Regex.Split(str, "(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>) ").ToList
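Regex.Split is probably not the right call here: it treats the pattern as a delimiter, and (in .NET 2.0 and later) any text captured by groups in the pattern is inserted into the result array between the pieces. A sketch comparing Split with Match on a shortened record; SplitParts and MatchValue are hypothetical helper names.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SplitVersusMatch
    Public Const FieldPattern As String = "(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)"

    ' Split: text before the match, then each captured group, then text after the match.
    Public Function SplitParts(record As String, options As RegexOptions) As String()
        Return Regex.Split(record, FieldPattern, options)
    End Function

    ' Match: read a captured group directly; returns Nothing if the pattern does not match.
    Public Function MatchValue(record As String, groupNumber As Integer) As String
        Dim m As Match = Regex.Match(record, FieldPattern, RegexOptions.IgnoreCase)
        If Not m.Success Then Return Nothing
        Return m.Groups(groupNumber).Value
    End Function

    Sub Main()
        Dim record As String = "<PRESOL> <DATE>0622 <YEAR>99 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS)"

        ' Case-sensitive: <date> never matches <DATE>, so nothing is split and the result has
        ' a single element; indexing element 1 would throw ArgumentOutOfRangeException.
        Console.WriteLine(SplitParts(record, RegexOptions.None).Length)   ' 1

        ' Case-insensitive: "<PRESOL> ", "0622 ", "99 ", "General Services Administration ",
        ' "Public Buildings Service (PBS)"
        For Each part As String In SplitParts(record, RegexOptions.IgnoreCase)
            Console.WriteLine("[" & part & "]")
        Next

        ' Match is more direct: group 1 is the date.
        Console.WriteLine(MatchValue(record, 1).Trim())   ' 0622
    End Sub
End Module
```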

Expert Comment

by:Terry Woods
ID: 35134257
Something like this:
(see http://www.myregextester.com/?r=4c3ad42c)

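    ' Requires Imports System.Text.RegularExpressions at the top of the file;
    ' sourcestring is the record text to search.
    Dim re As Regex = New Regex("(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)", RegexOptions.IgnoreCase)
    Dim mc As MatchCollection = re.Matches(sourcestring)
    Dim mIdx As Integer = 0
    For Each m As Match In mc
      ' Group 0 is the whole match; groups 1 to 3 hold the date, year and agency values.
      For groupIdx As Integer = 0 To m.Groups.Count - 1
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
      Next
      mIdx = mIdx + 1
    Next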
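If you stay with the single ordered pattern, named groups can make this easier to read, because each value is fetched by name rather than by group number. A sketch under the same assumption (all fields present, in this order); the group names and the ReadField helper are illustrative.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NamedGroupsDemo
    ' Same ordered pattern with named groups. Literal tags such as <date> need no (?:...)
    ' wrapper; only the (?<name>...) parts capture.
    Private ReadOnly FieldRegex As New Regex(
        "<date>(?<date>.*?)<year>(?<year>.*?)<agency>(?<agency>.*?)<office>",
        RegexOptions.IgnoreCase)

    ' Returns the trimmed value of the named group, or Nothing if the record does not match.
    Public Function ReadField(record As String, name As String) As String
        Dim m As Match = FieldRegex.Match(record)
        If Not m.Success Then Return Nothing
        Return m.Groups(name).Value.Trim()
    End Function

    Sub Main()
        Dim record As String = "<PRESOL> <DATE>0622 <YEAR>99 <AGENCY>General Services Administration <OFFICE>PBS"
        For Each name As String In New String() {"date", "year", "agency"}
            Console.WriteLine("{0} = {1}", name, ReadField(record, name))
        Next
    End Sub
End Module
```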

Expert Comment

by:Terry Woods
ID: 35134289
If a field is sometimes missing, it is best to do an individual search for each one. It also simplifies things if the values of the fields you want never contain HTML tags. For example, using separate patterns (matched with RegexOptions.IgnoreCase, because the tags in the file are upper case):
(?<=<date>)[^<]*
(?<=<year>)[^<]*
(?<=<agency>)[^<]*
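From VB.NET, the per-field lookbehind patterns might be applied as in the sketch below. ExtractField is a hypothetical helper; it adds RegexOptions.IgnoreCase because the file's tags are upper case, and it returns Nothing for a field that is not in the record.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookbehindFieldDemo
    ' Returns the text after <tag> up to the next "<", or Nothing when the tag is absent.
    Public Function ExtractField(record As String, tag As String) As String
        ' Regex.Escape guards against tag names that contain regex metacharacters.
        Dim pattern As String = "(?<=<" & Regex.Escape(tag) & ">)[^<]*"
        Dim m As Match = Regex.Match(record, pattern, RegexOptions.IgnoreCase)
        If Not m.Success Then Return Nothing
        Return m.Value.Trim()
    End Function

    Sub Main()
        ' Hypothetical record with no YEAR field.
        Dim record As String = "<PRESOL> <DATE>0622 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS)"
        For Each tag As String In New String() {"date", "year", "agency"}
            Dim value As String = ExtractField(record, tag)
            Console.WriteLine("{0}: {1}", tag, If(value, "(missing)"))
        Next
    End Sub
End Module
```

Because the lookbehind requires "<" directly before the tag name, <date> does not accidentally match inside <RESPDATE>.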

Author Comment

by:jimseiwert
ID: 35134299
Right now I have the code below, and when I pass in the string below it does not find any matches:

"<STATUS>PRESOL <DATE>0622 <YEAR>99 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS) <LOCATION>Spokane Customer Services Center (10PM3) <ZIP>99201-1075 <CLASSCOD>Z <OFFADD>General Services Administration, Public Buildings Service (PBS), Spokane Customer Services Center (10PM3), 920 West Riverside Avenue, Room 120, U. S. Courthouse, Spokane, WA  99201-1075 <SUBJECT>EXTERIOR PAINTING, FB/USPO, SPOKANE, WASHINGTON <SOLNBR>10PM3XX990138 <RESPDATE>081199 <CONTACT>Cheryl O'Donnell, Contract Specialist, Phone (509) 353-2457, Fax (509) 353-2359, Email cheryl.odonnell@gsa.gov - Eva Hutchison, Procurement Technician, Phone (509) 353-2457, Fax (509) 353-2359, Email eva.hutchison@gsa.gov <DESC>Contractor shall furnish all labor, materials and equipment to paint all previously painted workwork and exterior metal on the FB/USPO, 904 West Riverside Avenue, Spokane, Washington.  Building is five [5] stories.  Repair/replace missing, loose, cracked or defective caulking and glazing compound from glass, frames and trim of exterior windows.  All old paint contains lead.  Sic Code 1721.  All responsible sources may submit a quotation which, if timely received, may be considered by the Government.  This procurement is set aside for small business concerns.  Price range $100,000 - $250,000.  Please fax requests for solicitations to 509-353-2359. <LINK> <URL>http://www.fbo.gov/spg/GSA/PBS/10PM3/10PM3XX990138/listing.html <DESC>Link to FedBizOpps document. <EMAIL> <ADDRESS>cheryl.odonnell@gsa.gov <DESC>Cheryl O'Donnell "
    Sub fixstr(ByVal str As String)
        Try
            'Dim RegexObj As Regex = New Regex("(?:<date>)([^<]*)(?:<year>)([^<]*)(?:<agency>)([^<]*)")
            'Dim MatchObj = Regex.Split(str, "(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)").ToList
            'Dim dte As String = Trim(MatchObj(1))

            Dim re As Regex = New Regex("(?:<status>)(.*?)(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)", RegexOptions.IgnoreCase)
            Dim mc As MatchCollection = re.Matches(str)
            Dim mIdx As Integer = 0
            For Each m As Match In mc
                For groupIdx As Integer = 0 To m.Groups.Count - 1
                    Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
                Next
                mIdx = mIdx + 1
            Next

        Catch ex As Exception
            Console.WriteLine(ex.Message)
        End Try
    End Sub
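As far as I can tell, this pattern should find a match on the single-line string as pasted, so one thing worth checking (an assumption, since the pasted string may differ from what is actually read from the file) is whether the real input has line breaks between fields. In .NET, "." does not match "\n" unless RegexOptions.Singleline is set, whereas [^<]* does. A small demonstration on a hypothetical multi-line record:

```vbnet
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module SinglelineDemo
    ' The pattern from the code above.
    Public Const FieldPattern As String = "(?:<status>)(.*?)(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)"

    Public Function CountMatches(input As String, options As RegexOptions) As Integer
        Return Regex.Matches(input, FieldPattern, options).Count
    End Function

    Sub Main()
        ' Hypothetical record with one field per line (vbLf between fields).
        Dim multiLine As String = String.Join(vbLf, New String() {
            "<STATUS>PRESOL", "<DATE>0622", "<YEAR>99",
            "<AGENCY>General Services Administration", "<OFFICE>Public Buildings Service (PBS)"})

        ' "." cannot cross the line feed, so no match is found: prints 0.
        Console.WriteLine(CountMatches(multiLine, RegexOptions.IgnoreCase))

        ' Singleline lets "." match line feeds as well: prints 1.
        Console.WriteLine(CountMatches(multiLine, RegexOptions.IgnoreCase Or RegexOptions.Singleline))
    End Sub
End Module
```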

Author Comment

by:jimseiwert
ID: 35134347
One example of HTML tags is this CONTACT field, which contains the following:

<CONTACT>Mona Morin, 401-275-4248  <a href="mailto:mona.morin@us.army.mil">USPFO for Rhode Island</a>
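For a field like CONTACT, where the value itself contains HTML, one option is to capture everything up to the next known field tag rather than up to the next "<", and strip the markup afterwards if plain text is wanted. A rough sketch, assuming the list of field tags seen in the samples in this thread is complete; RawField and StripTags are hypothetical helpers.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HtmlValueDemo
    ' Assumption: these are all the field tags that can occur (taken from the samples in this thread).
    Public ReadOnly FieldTags As String() = {
        "PRESOL", "/PRESOL", "STATUS", "DATE", "YEAR", "AGENCY", "OFFICE", "LOCATION", "ZIP",
        "CLASSCOD", "OFFADD", "SUBJECT", "SOLNBR", "RESPDATE", "CONTACT", "DESC", "LINK",
        "URL", "EMAIL", "ADDRESS"}

    ' Returns the raw value after <tag>, running up to the next known field tag (or the end),
    ' so HTML such as <a href=...> inside the value is kept. Nothing if the tag is absent.
    Public Function RawField(record As String, tag As String) As String
        Dim nextField As String = "<(?:" & String.Join("|", FieldTags) & ")>"
        Dim pattern As String = "(?<=<" & Regex.Escape(tag) & ">).*?(?=" & nextField & "|$)"
        Dim m As Match = Regex.Match(record, pattern, RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If Not m.Success Then Return Nothing
        Return m.Value.Trim()
    End Function

    ' Crude clean-up: deletes anything that looks like a tag. Not an HTML parser.
    Public Function StripTags(html As String) As String
        Return Regex.Replace(html, "<[^>]+>", "").Trim()
    End Function

    Sub Main()
        ' The CONTACT example above, followed by a hypothetical DESC field.
        Dim record As String = "<CONTACT>Mona Morin, 401-275-4248  <a href=""mailto:mona.morin@us.army.mil"">USPFO for Rhode Island</a> <DESC>Hypothetical description"
        Dim contact As String = RawField(record, "CONTACT")
        Console.WriteLine(contact)
        Console.WriteLine(StripTags(contact))   ' Mona Morin, 401-275-4248  USPFO for Rhode Island
    End Sub
End Module
```

A field name that is also an HTML element name (ADDRESS) could still end a value early if that element appears inside it.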

Expert Comment

by:Terry Woods
ID: 35134393
Have a go with the patterns in http://5134289 and let me know how you go

Expert Comment

by:Terry Woods
ID: 35134396
Sorry, that was meant to be a link to my post with number 35134289

Author Comment

by:jimseiwert
ID: 35134455
Can you post a direct link? I can't seem to find question ID 35134289.

Expert Comment

by:Terry Woods
ID: 35134464
It's just my latest pattern suggested above! I'll try again with the link: http://#35134289

Accepted Solution

by:Terry Woods
ID: 35134467
i.e. this post:

If a field is sometimes missing, it is best to do an individual search for each one. It also simplifies things if the values of the fields you want never contain HTML tags. For example, using separate patterns (matched with RegexOptions.IgnoreCase, because the tags in the file are upper case):
(?<=<date>)[^<]*
(?<=<year>)[^<]*
(?<=<agency>)[^<]*

Author Comment

by:jimseiwert
ID: 35134502
That worked. Thank you! I was hoping to do it all at once, but one at a time will work too.
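Doing it all at once is possible while still tolerating missing fields: match every wanted tag in one pass with an alternation and collect the results in a dictionary. A sketch, assuming (as stated above) that the wanted fields never contain HTML; ReadFields is a hypothetical name.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module SinglePassDemo
    ' One pass over the record: each match yields the tag name and the text up to the next "<".
    ' "<" is required right before the name, so <RESPDATE> is not mistaken for <DATE>.
    Private ReadOnly WantedFields As New Regex("<(?<name>DATE|YEAR|AGENCY)>(?<value>[^<]*)", RegexOptions.IgnoreCase)

    Public Function ReadFields(record As String) As Dictionary(Of String, String)
        Dim result As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
        For Each m As Match In WantedFields.Matches(record)
            ' Missing fields simply produce no match, so they are absent from the dictionary.
            result(m.Groups("name").Value) = m.Groups("value").Value.Trim()
        Next
        Return result
    End Function

    Sub Main()
        ' Hypothetical record with no YEAR field.
        Dim record As String = "<PRESOL> <DATE>0622 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS)"
        For Each pair As KeyValuePair(Of String, String) In ReadFields(record)
            Console.WriteLine("{0} = {1}", pair.Key, pair.Value)
        Next
    End Sub
End Module
```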
Question has a verified solution.
