  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 548

VB.NET Regular Expression

I have a text file that contains the following line of text. Every field has a starting tag. I am looking for a way to get the values of the date, year and agency. One problem I have encountered is that some of the values will contain HTML tags as well. I have not had much experience with regular expressions. Any help is appreciated.


"<PRESOL> <DATE>0622 <YEAR>99 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS) <LOCATION>Spokane Customer Services Center (10PM3) <ZIP>99201-1075 <CLASSCOD>Z <OFFADD>General Services Administration, Public Buildings Service (PBS), Spokane Customer Services Center (10PM3), 920 West Riverside Avenue, Room 120, U. S. Courthouse, Spokane, WA  99201-1075 <SUBJECT>EXTERIOR PAINTING, FB/USPO, SPOKANE, WASHINGTON <SOLNBR>10PM3XX990138 <RESPDATE>081199 <CONTACT>Cheryl O'Donnell, Contract Specialist, Phone (509) 353-2457, Fax (509) 353-2359, Email cheryl.odonnell@gsa.gov - Eva Hutchison, Procurement Technician, Phone (509) 353-2457, Fax (509) 353-2359, Email eva.hutchison@gsa.gov <DESC>Contractor shall furnish all labor, materials and equipment to paint all previously painted workwork and exterior metal on the FB/USPO, 904 West Riverside Avenue, Spokane, Washington.  Building is five [5] stories.  Repair/replace missing, loose, cracked or defective caulking and glazing compound from glass, frames and trim of exterior windows.  All old paint contains lead.  Sic Code 1721.  All responsible sources may submit a quotation which, if timely received, may be considered by the Government.  This procurement is set aside for small business concerns.  Price range $100,000 - $250,000.  Please fax requests for solicitations to 509-353-2359. <LINK> <URL>http://www.fbo.gov/spg/GSA/PBS/10PM3/10PM3XX990138/listing.html <DESC>Link to FedBizOpps document. <EMAIL> <ADDRESS>cheryl.odonnell@gsa.gov <DESC>Cheryl O'Donnell </PRESOL>"
Asked by: jimseiwert
1 Solution
 
Terry Woods (IT Guru) commented:
Please provide an example of HTML tags being included in the values.
 
Terry Woods (IT Guru) commented:
Without the HTML tags, it can be done with a pattern like this (matched case-insensitively, since the tags in the data are uppercase):
(?:<date>)([^<]*)(?:<year>)([^<]*)(?:<agency>)([^<]*)
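A minimal sketch of how the pattern above might be applied in VB.NET. The module name, the `ExtractHeader` helper and the shortened sample string are illustrative assumptions, not part of the original post. It only matches when all three fields are present, in this order, and contain no "<" character.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CombinedPatternSketch
    ' Returns the trimmed date, year and agency values, or Nothing when the
    ' three fields are not all present, in this order, without embedded tags.
    ' (Hypothetical helper, for illustration only.)
    Function ExtractHeader(ByVal source As String) As String()
        Dim pattern As String = "(?:<date>)([^<]*)(?:<year>)([^<]*)(?:<agency>)([^<]*)"
        Dim m As Match = Regex.Match(source, pattern, RegexOptions.IgnoreCase)
        If Not m.Success Then Return Nothing
        Return New String() {m.Groups(1).Value.Trim(), m.Groups(2).Value.Trim(), m.Groups(3).Value.Trim()}
    End Function

    Sub Main()
        ' Shortened version of the sample line from the question.
        Dim sample As String = "<PRESOL> <DATE>0622 <YEAR>99 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS)"
        Dim fields As String() = ExtractHeader(sample)
        If fields IsNot Nothing Then
            Console.WriteLine("Date={0}, Year={1}, Agency={2}", fields(0), fields(1), fields(2))
        End If
    End Sub
End Module
```

Each captured value keeps the space that precedes the next tag, which is why the sketch trims the groups.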
 
jimseiwert (Author) commented:
It can include any of them, such as <a href, mailto:, <strong>, etc., since the descriptions come from an HTML editor. It is unknown which tags will be included where. All I know is the start tags for my fields.
 
Terry Woods (IT Guru) commented:
Are the fields always in that order?
 
Terry Woods (IT Guru) commented:
If yes, something like this might work nicely:
(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)
 
jimseiwert (Author) commented:
Yes, but they may not always be there. They are only present if they have data. If it helps, the HTML tags are always in the DESC fields.
 
jimseiwert (Author) commented:
To use that expression, would I do something like the below?
Dim MatchObj = Regex.Split(str, "(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>) ").ToList
 
Terry Woods (IT Guru) commented:
Something like this:
(see http://www.myregextester.com/?r=4c3ad42c)

    ' Requires: Imports System.Text.RegularExpressions
    Dim re As Regex = New Regex("(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)", RegexOptions.IgnoreCase)
    Dim mc As MatchCollection = re.Matches(sourcestring)
    Dim mIdx As Integer = 0
    For Each m As Match In mc
      ' Group 0 is the whole match; groups 1 to 3 are the date, year and agency values.
      For groupIdx As Integer = 0 To m.Groups.Count - 1
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GroupNameFromNumber(groupIdx), m.Groups(groupIdx).Value)
      Next
      mIdx += 1
    Next
 
Terry Woods (IT Guru) commented:
If a field is sometimes missing, then it would be best to do an individual search for each one. Also, if there are never HTML tags in the values for the fields you want, it simplifies things. E.g., using separate patterns:
(?<=<date>)[^<]*
(?<=<year>)[^<]*
(?<=<agency>)[^<]*
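The separate lookbehind patterns above could be wrapped in one small helper, roughly as follows. This is a sketch: the `ExtractField` name, the module name and the shortened sample string are assumptions made for illustration, and it relies on the wanted values containing no "<" character.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SeparatePatternsSketch
    ' Returns the trimmed value that follows <tagName>, or Nothing when the
    ' tag is absent. Assumes the value itself contains no "<" character.
    ' (Hypothetical helper, for illustration only.)
    Function ExtractField(ByVal source As String, ByVal tagName As String) As String
        Dim pattern As String = "(?<=<" & Regex.Escape(tagName) & ">)[^<]*"
        Dim m As Match = Regex.Match(source, pattern, RegexOptions.IgnoreCase)
        If m.Success Then Return m.Value.Trim()
        Return Nothing
    End Function

    Sub Main()
        ' Shortened version of the sample line; it has no ZIP field.
        Dim sample As String = "<STATUS>PRESOL <DATE>0622 <YEAR>99 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS)"
        For Each tagName As String In New String() {"DATE", "YEAR", "AGENCY", "ZIP"}
            Dim value As String = ExtractField(sample, tagName)
            Console.WriteLine("{0} = {1}", tagName, If(value, "(not present)"))
        Next
    End Sub
End Module
```

Because each field is searched on its own, a missing field simply yields Nothing instead of making the whole match fail.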
 
jimseiwert (Author) commented:
Right now I have the code below. I am passing in the following string and it does not find any matches:

"<STATUS>PRESOL <DATE>0622 <YEAR>99 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS) <LOCATION>Spokane Customer Services Center (10PM3) <ZIP>99201-1075 <CLASSCOD>Z <OFFADD>General Services Administration, Public Buildings Service (PBS), Spokane Customer Services Center (10PM3), 920 West Riverside Avenue, Room 120, U. S. Courthouse, Spokane, WA  99201-1075 <SUBJECT>EXTERIOR PAINTING, FB/USPO, SPOKANE, WASHINGTON <SOLNBR>10PM3XX990138 <RESPDATE>081199 <CONTACT>Cheryl O'Donnell, Contract Specialist, Phone (509) 353-2457, Fax (509) 353-2359, Email cheryl.odonnell@gsa.gov - Eva Hutchison, Procurement Technician, Phone (509) 353-2457, Fax (509) 353-2359, Email eva.hutchison@gsa.gov <DESC>Contractor shall furnish all labor, materials and equipment to paint all previously painted workwork and exterior metal on the FB/USPO, 904 West Riverside Avenue, Spokane, Washington.  Building is five [5] stories.  Repair/replace missing, loose, cracked or defective caulking and glazing compound from glass, frames and trim of exterior windows.  All old paint contains lead.  Sic Code 1721.  All responsible sources may submit a quotation which, if timely received, may be considered by the Government.  This procurement is set aside for small business concerns.  Price range $100,000 - $250,000.  Please fax requests for solicitations to 509-353-2359. <LINK> <URL>http://www.fbo.gov/spg/GSA/PBS/10PM3/10PM3XX990138/listing.html <DESC>Link to FedBizOpps document. <EMAIL> <ADDRESS>cheryl.odonnell@gsa.gov <DESC>Cheryl O'Donnell "
    Sub fixstr(ByVal str As String)
        Try
            'Dim RegexObj As Regex = New Regex("(?:<date>)([^<]*)(?:<year>)([^<]*)(?:<agency>)([^<]*)")
            'Dim MatchObj = Regex.Split(str, "(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)").ToList
            'Dim dte As String = Trim(MatchObj(1))

            Dim re As Regex = New Regex("(?:<status>)(.*?)(?:<date>)(.*?)(?:<year>)(.*?)(?:<agency>)(.*?)(?:<office>)", RegexOptions.IgnoreCase)
            Dim mc As MatchCollection = re.Matches(str)
            Dim mIdx As Integer = 0
            For Each m As Match In mc
                For groupIdx As Integer = 0 To m.Groups.Count - 1
                    Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames(groupIdx), m.Groups(groupIdx).Value)
                Next
                mIdx = mIdx + 1
            Next

        Catch ex As Exception
            Console.WriteLine(ex.Message)
        End Try
    End Sub

 
jimseiwert (Author) commented:
One example of HTML tags is this CONTACT field, which contains the following:

<CONTACT>Mona Morin, 401-275-4248  <a href="mailto:mona.morin@us.army.mil">USPFO for Rhode Island</a>
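For a value like this one, a `[^<]*` pattern would stop at the `<a` tag. One possible workaround, sketched here and not taken from the thread, is to read up to the next known field tag instead. The list of field names is copied from the sample record, and the sketch assumes field tags are always uppercase while embedded HTML tags are lowercase; the helper name and the trailing `<DESC>` text in the sample are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HtmlInValuesSketch
    ' Field names seen in the sample record; a real feed may use others (assumption).
    Private Const FieldNames As String = "PRESOL|STATUS|DATE|YEAR|AGENCY|OFFICE|LOCATION|ZIP|CLASSCOD|OFFADD|SUBJECT|SOLNBR|RESPDATE|CONTACT|DESC|LINK|URL|EMAIL|ADDRESS"

    ' Returns everything between <tagName> and the next known field tag (or the
    ' end of the text), so HTML inside the value is kept. Matching is case-sensitive
    ' on the assumption that field tags are uppercase and HTML tags are lowercase.
    Function ExtractRawField(ByVal source As String, ByVal tagName As String) As String
        Dim pattern As String = "(?<=<" & Regex.Escape(tagName) & ">).*?(?=</?(?:" & FieldNames & ")>|$)"
        Dim m As Match = Regex.Match(source, pattern, RegexOptions.Singleline)
        If m.Success Then Return m.Value.Trim()
        Return Nothing
    End Function

    Sub Main()
        ' CONTACT value from the thread; the trailing <DESC> text is made up.
        Dim sample As String = "<CONTACT>Mona Morin, 401-275-4248  <a href=""mailto:mona.morin@us.army.mil"">USPFO for Rhode Island</a> <DESC>Example description"
        Dim contact As String = ExtractRawField(sample, "CONTACT")
        Console.WriteLine(contact)
        ' Optionally drop the HTML markup and keep only the text.
        Console.WriteLine(Regex.Replace(contact, "<[^>]+>", ""))
    End Sub
End Module
```

If the feed ever emits lowercase field tags or uppercase HTML, the case-sensitivity assumption breaks and the field list would need a different way of telling the two apart.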
 
Terry Woods (IT Guru) commented:
Have a go with the patterns in http://5134289 and let me know how you go.
 
Terry Woods (IT Guru) commented:
Sorry, that was meant to be a link to my post with number 35134289.
 
jimseiwert (Author) commented:
Can you post a direct link? I can't seem to find question ID 35134289.
 
Terry Woods (IT Guru) commented:
It's just my latest pattern suggested above! I'll try again with the link: http://#35134289
 
Terry Woods (IT Guru) commented:
I.e., this post:

If a field is sometimes missing, then it would be best to do an individual search for each one. Also, if there are never HTML tags in the values for the fields you want, it simplifies things. E.g., using separate patterns:
(?<=<date>)[^<]*
(?<=<year>)[^<]*
(?<=<agency>)[^<]*
 
jimseiwert (Author) commented:
That worked. Thank you! I was hoping to do it all at once, but one at a time will work also.
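For readers who also want a single pass, one option is a pattern that alternates over the wanted tag names and collects whichever ones appear. This is only a sketch under the same assumption as the accepted patterns (no "<" inside the wanted values); the `ExtractFields` helper, the module name and the sample string, which is shortened and has its YEAR field removed, are illustrative and not from the thread.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module SinglePassSketch
    ' Collects whichever of the DATE, YEAR and AGENCY fields are present.
    ' Assumes those values contain no "<" character.
    ' (Hypothetical helper, for illustration only.)
    Function ExtractFields(ByVal source As String) As Dictionary(Of String, String)
        Dim result As New Dictionary(Of String, String)(StringComparer.OrdinalIgnoreCase)
        For Each m As Match In Regex.Matches(source, "<(date|year|agency)>([^<]*)", RegexOptions.IgnoreCase)
            ' Keep the first occurrence of each field.
            If Not result.ContainsKey(m.Groups(1).Value) Then
                result.Add(m.Groups(1).Value, m.Groups(2).Value.Trim())
            End If
        Next
        Return result
    End Function

    Sub Main()
        ' Shortened sample with the YEAR field removed to show a missing field.
        Dim sample As String = "<STATUS>PRESOL <DATE>0622 <AGENCY>General Services Administration <OFFICE>Public Buildings Service (PBS)"
        For Each pair As KeyValuePair(Of String, String) In ExtractFields(sample)
            Console.WriteLine("{0} = {1}", pair.Key, pair.Value)
        Next
    End Sub
End Module
```

A missing field is simply absent from the dictionary, so callers can test for it with `ContainsKey` or `TryGetValue`.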
