Solved

VB.net REGULAR Expression

Posted on 2011-03-14
Last Modified: 2012-05-11
I have a text file that contains the following line of text. Every field has a starting tag. I am looking for a way to get the values of the date, year and agency fields. One problem I have encountered is that some of the values will contain HTML tags as well. I have not had much experience with regular expressions. Any help is appreciated.


"<PRESOL> <DATE>0622 <YEAR>99 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS) <LOCATION>Spokane Customer Services Center (10PM3) <ZIP>99201-1075 <CLASSCOD>Z <OFFADD>General Services Administration, Public Buildings Service (PBS), Spokane Customer Services Center (10PM3), 920 West Riverside Avenue, Room 120, U. S. Courthouse, Spokane, WA  99201-1075 <SUBJECT>EXTERIOR PAINTING, FB/USPO, SPOKANE, WASHINGTON <SOLNBR>10PM3XX990138 <RESPDATE>081199 <CONTACT>Cheryl O'Donnell, Contract Specialist, Phone (509) 353-2457, Fax (509) 353-2359, Email cheryl.odonnell@gsa.gov - Eva Hutchison, Procurement Technician, Phone (509) 353-2457, Fax (509) 353-2359, Email eva.hutchison@gsa.gov <DESC>Contractor shall furnish all labor, materials and equipment to paint all previously painted workwork and exterior metal on the FB/USPO, 904 West Riverside Avenue, Spokane, Washington.  Building is five [5] stories.  Repair/replace missing, loose, cracked or defective caulking and glazing compound from glass, frames and trim of exterior windows.  All old paint contains lead.  Sic Code 1721.  All responsible sources may submit a quotation which, if timely received, may be considered by the Government.  This procurement is set aside for small business concerns.  Price range $100,000 - $250,000.  Please fax requests for solicitations to 509-353-2359. <LINK> <URL>http://www.fbo.gov/spg/GSA/PBS/10PM3/10PM3XX990138/listing.html <DESC>Link to FedBizOpps document. <EMAIL> <ADDRESS>cheryl.odonnell@gsa.gov <DESC>Cheryl O'Donnell </PRESOL>"
0
Question by:jimseiwert
17 Comments
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134215
Please provide an example of HTML tags being included in the values.
0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134222
Without the HTML tags, it can be done with a pattern like this (matched case-insensitively, since the tags in your data are upper case):
(?:<date>)([^<]*)(?:<year>)([^<]*)(?:<agency>)([^<]*)
0
 
LVL 2

Author Comment

by:jimseiwert
ID: 35134223
It can include any of them, e.g. <a href>, mailto: links, <strong>, etc., as the descriptions come from an HTML editor. It is unknown which tags will be included where. All I know is the start tags for my fields.
0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134232
Are the fields always in that order?
0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134234
If yes, something like this might work nicely:
(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)
0
 
LVL 2

Author Comment

by:jimseiwert
ID: 35134236
Yes, but they may not always be there. They are only there if they have data. If it helps, the HTML tags are always in the DESC fields.
0
 
LVL 2

Author Comment

by:jimseiwert
ID: 35134242
To use that expression, would I do something like the below?
Dim MatchObj = Regex.Split(str, "(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>) ").ToList

0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134257
Something like this:
(see http://www.myregextester.com/?r=4c3ad42c)

    ' Requires Imports System.Text.RegularExpressions at the top of the file.
    Dim re As Regex = New Regex("(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)", RegexOptions.IgnoreCase)
    Dim mc As MatchCollection = re.Matches(sourcestring)
    Dim mIdx As Integer = 0
    For Each m As Match In mc
      ' Group 0 is the whole match; groups 1 to 3 are the date, year and agency values.
      For groupIdx As Integer = 0 To m.Groups.Count - 1
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
      Next
      mIdx = mIdx + 1
    Next
0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134289
If a field is sometimes missing, it would be best to do an individual search for each one. Also, if there are never HTML tags in the values for the fields you want, it simplifies things. E.g. using separate patterns (with RegexOptions.IgnoreCase, as before):
(?<=<date>)[^<]*
(?<=<year>)[^<]*
(?<=<agency>)[^<]*
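
A minimal sketch of how the separate patterns might be wrapped in VB.NET. The module name FieldLookup and the helper GetField are hypothetical names chosen for illustration, and the sample string is a shortened copy of the line from the question. It assumes the wanted values never contain a "<" character.

```vb
Imports System
Imports System.Text.RegularExpressions

Module FieldLookup
    ' Hypothetical helper: returns the text that follows <tagName> up to the
    ' next "<", trimmed, or Nothing when the tag does not occur in the source.
    Function GetField(source As String, tagName As String) As String
        Dim pattern As String = "(?<=<" & Regex.Escape(tagName) & ">)[^<]*"
        ' IgnoreCase lets the lower-case tag names match the upper-case tags in the data.
        Dim m As Match = Regex.Match(source, pattern, RegexOptions.IgnoreCase)
        If Not m.Success Then Return Nothing
        Return m.Value.Trim()
    End Function

    Sub Main()
        Dim line As String = "<PRESOL> <DATE>0622 <YEAR>99 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS)"
        Console.WriteLine(GetField(line, "date"))
        Console.WriteLine(GetField(line, "year"))
        Console.WriteLine(GetField(line, "agency"))
        ' <ZIP> is absent from this shortened sample, so the helper returns Nothing.
        Console.WriteLine(GetField(line, "zip") Is Nothing)
    End Sub
End Module
```

Because each field is searched for on its own, a missing tag only affects that one lookup. The Trim call removes the space that precedes the next tag in the data.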
0
 
LVL 2

Author Comment

by:jimseiwert
ID: 35134299
Right now I have the code below, and when I pass in the string below it does not find any matches.

"<STATUS>PRESOL <DATE>0622 <YEAR>99 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS) <LOCATION>Spokane Customer Services Center (10PM3) <ZIP>99201-1075 <CLASSCOD>Z <OFFADD>General Services Administration, Public Buildings Service (PBS), Spokane Customer Services Center (10PM3), 920 West Riverside Avenue, Room 120, U. S. Courthouse, Spokane, WA  99201-1075 <SUBJECT>EXTERIOR PAINTING, FB/USPO, SPOKANE, WASHINGTON <SOLNBR>10PM3XX990138 <RESPDATE>081199 <CONTACT>Cheryl O'Donnell, Contract Specialist, Phone (509) 353-2457, Fax (509) 353-2359, Email cheryl.odonnell@gsa.gov - Eva Hutchison, Procurement Technician, Phone (509) 353-2457, Fax (509) 353-2359, Email eva.hutchison@gsa.gov <DESC>Contractor shall furnish all labor, materials and equipment to paint all previously painted workwork and exterior metal on the FB/USPO, 904 West Riverside Avenue, Spokane, Washington.  Building is five [5] stories.  Repair/replace missing, loose, cracked or defective caulking and glazing compound from glass, frames and trim of exterior windows.  All old paint contains lead.  Sic Code 1721.  All responsible sources may submit a quotation which, if timely received, may be considered by the Government.  This procurement is set aside for small business concerns.  Price range $100,000 - $250,000.  Please fax requests for solicitations to 509-353-2359. <LINK> <URL>http://www.fbo.gov/spg/GSA/PBS/10PM3/10PM3XX990138/listing.html <DESC>Link to FedBizOpps document. <EMAIL> <ADDRESS>cheryl.odonnell@gsa.gov <DESC>Cheryl O'Donnell "
Sub fixstr(ByVal str As String)
    Try
        'Dim RegexObj As Regex = New Regex("(?:<date>)([^<]*)(?:<year>)([^<]*)(?:<agency>)([^<]*)")
        'Dim MatchObj = Regex.Split(str, "(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)").ToList
        'Dim dte As String = Trim(MatchObj(1))

        Dim re As Regex = New Regex("(?:<status>)(.*?)(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)", RegexOptions.IgnoreCase)
        Dim mc As MatchCollection = re.Matches(str)
        Dim mIdx As Integer = 0
        For Each m As Match In mc
            For groupIdx As Integer = 0 To m.Groups.Count - 1
                Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
            Next
            mIdx = mIdx + 1
        Next

    Catch ex As Exception
        Console.WriteLine(ex.Message)
    End Try
End Sub

0
 
LVL 2

Author Comment

by:jimseiwert
ID: 35134347
One example of HTML tags: this contact field contains the following:

<CONTACT>Mona Morin, 401-275-4248  <a href="mailto:mona.morin@us.army.mil">USPFO for Rhode Island</a>
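
For a value like this one, a pattern based on [^<]* would stop at the first HTML tag. One possible workaround, sketched below, is to read up to the next known field tag instead. This is only a sketch under stated assumptions: the list of field tags is copied from the sample record in the question, field tags are assumed to be upper case while HTML tags from the editor are lower case, and the names FieldWithHtml, GetRawField and StripTags are hypothetical.

```vb
Imports System
Imports System.Text.RegularExpressions

Module FieldWithHtml
    ' Assumption: the complete set of field tags, taken from the sample record.
    Private ReadOnly FieldTags As String = _
        "PRESOL|DATE|YEAR|AGENCY|OFFICE|LOCATION|ZIP|CLASSCOD|OFFADD|SUBJECT|" & _
        "SOLNBR|RESPDATE|CONTACT|DESC|LINK|URL|EMAIL|ADDRESS"

    ' Hypothetical helper: returns everything after <tagName> up to the next
    ' known field tag (or the end of the input), including any HTML markup.
    Function GetRawField(source As String, tagName As String) As String
        Dim pattern As String = "<" & Regex.Escape(tagName) & ">(.*?)(?=</?(?:" & FieldTags & ")>|$)"
        ' Matching is case-sensitive on purpose, so lower-case HTML tags are not
        ' mistaken for field tags. Singleline lets "." cross line breaks.
        Dim m As Match = Regex.Match(source, pattern, RegexOptions.Singleline)
        If Not m.Success Then Return Nothing
        Return m.Groups(1).Value.Trim()
    End Function

    ' Hypothetical helper: removes anything that looks like a tag from a value.
    Function StripTags(value As String) As String
        Return Regex.Replace(value, "<[^>]+>", "").Trim()
    End Function

    Sub Main()
        Dim line As String = "<CONTACT>Mona Morin, 401-275-4248  <a href=""mailto:mona.morin@us.army.mil"">USPFO for Rhode Island</a> <DESC>Sample text"
        Dim raw As String = GetRawField(line, "CONTACT")
        Console.WriteLine(raw)            ' value with the HTML link still in it
        Console.WriteLine(StripTags(raw)) ' value with the markup removed
    End Sub
End Module
```

The case-sensitive match is a design choice, not a guarantee: an upper-case HTML tag such as <ADDRESS> or <LINK> inside a value would still be treated as the start of the next field.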
0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134393
Have a go with the patterns in http://5134289 and let me know how you go
0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134396
Sorry, that was meant to be a link to my post with number 35134289
0
 
LVL 2

Author Comment

by:jimseiwert
ID: 35134455
Can you post a direct link, as I can't seem to find question ID 35134289?
0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35134464
It's just my latest pattern suggested above! I'll try again with the link: http://#35134289
0
 
LVL 35

Accepted Solution

by:
Terry Woods earned 500 total points
ID: 35134467
I.e. this post:

If a field is sometimes missing, it would be best to do an individual search for each one. Also, if there are never HTML tags in the values for the fields you want, it simplifies things. E.g. using separate patterns (with RegexOptions.IgnoreCase, as before):
(?<=<date>)[^<]*
(?<=<year>)[^<]*
(?<=<agency>)[^<]*
0
 
LVL 2

Author Comment

by:jimseiwert
ID: 35134502
That worked. Thank you! I was hoping to do it all at once, but one at a time will work also.
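
For readers who also want a single pass, one possible approach is to match any of the wanted tags with one pattern and collect the results by tag name. This is a sketch, not something proposed in the thread: the names AllAtOnce and GetFields are hypothetical, and it assumes, as the accepted patterns do, that the wanted values contain no "<" character.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module AllAtOnce
    ' Hypothetical helper: collects the DATE, YEAR and AGENCY values in one pass.
    ' Tags that do not occur in the source are simply left out of the result.
    Function GetFields(source As String) As Dictionary(Of String, String)
        Dim result As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
        Dim re As New Regex("<(date|year|agency)>([^<]*)", RegexOptions.IgnoreCase)
        For Each m As Match In re.Matches(source)
            Dim name As String = m.Groups(1).Value
            ' Keep only the first occurrence of each tag.
            If Not result.ContainsKey(name) Then
                result(name) = m.Groups(2).Value.Trim()
            End If
        Next
        Return result
    End Function

    Sub Main()
        ' Shortened sample with the YEAR field deliberately left out.
        Dim line As String = "<PRESOL> <DATE>0622 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS)"
        Dim fields As Dictionary(Of String, String) = GetFields(line)
        For Each name As String In New String() {"DATE", "YEAR", "AGENCY"}
            Dim value As String = Nothing
            If fields.TryGetValue(name, value) Then
                Console.WriteLine("{0} = {1}", name, value)
            Else
                Console.WriteLine("{0} is missing", name)
            End If
        Next
    End Sub
End Module
```

Unlike the chained pattern tried earlier in the thread, this does not depend on every field being present or on the fields appearing in a fixed order.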
0
