Solved

obtaining multiple substrings from text using regular expressions

Posted on 2008-10-03
Last Modified: 2012-05-05
After the success of my last regular expression question (and the ease with which it was answered), I am taking it to the next level. The HTML below is part of a larger source page, and I want to retrieve two values from a given line without the surrounding junk.

From the code below, I want the regex to produce:

#7 in IV.44

(see the code for the rest of the question)
...
 <tr>
  <td class="column">Battle:</td>
  <td>#7 in <a href='comp.asp?bID=234'>IV.44</a></td>
 </tr>
...

# I can get the first part using:  #[0-9]\sin
# and the second part using:       [IV]+\.[0-9]+

# Is there a way I can concatenate these together
# in one expression, omitting the
# superfluous link in between?

Question by:madivad2
9 Comments
 
LVL 27

Accepted Solution

by:
ddrudik earned 500 total points
ID: 22633902
You can use groups:
(#\d\sin).*?([IV]+\.\d+)

Note that [IV]+ matches one or more I or V characters in any order, if that's what you wanted.
For Each m As Match In mc   ' mc holds the matches of the pattern above
    ' Group 1 ends at "in", so add the separating space explicitly.
    Console.WriteLine(m.Groups(1).Value & " " & m.Groups(2).Value)
Next
 
LVL 2

Author Comment

by:madivad2
ID: 22634045
I could, but I am trying to avoid using groups at the moment. I simply want to apply the regex on the fly, without additional code.

My primary reason for this is if the format of the page changes I want to be able to change the regular expression in a config file without having to recompile my app to accommodate new groups numbers that may be required.

I was hoping something like your other solution would be possible where the text is omitted from the result: (?<=user'>)[^<]*

Somehow omit the link/URL information in between.
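For reference, the lookbehind quoted above works because the unwanted text sits at the edge of the match: the lookbehind checks for the prefix without consuming it. A minimal sketch, assuming a made-up snippet of markup (the actual page from the earlier question is not shown in this thread):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookbehindDemo
    Sub Main()
        ' Hypothetical markup, used only for illustration.
        Dim html As String = "<a class='user'>madivad2</a>"

        ' (?<=user'>) requires the prefix to be present but keeps it out of the match.
        Dim m As Match = Regex.Match(html, "(?<=user'>)[^<]*")
        If m.Success Then
            Console.WriteLine(m.Value)
        End If
    End Sub
End Module
```

This prints madivad2. The same trick cannot remove text from the middle of a match, because a single match is always one contiguous span of the input.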
 
LVL 27

Expert Comment

by:ddrudik
ID: 22634079
You can't skip over text in the middle of a match without the use of groups. Another option is to do a replace, but you would be replacing the match with m.Groups(1).Value & m.Groups(2).Value, so it's about the same as the above code.
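One way to keep the application code free of group numbers, sketched here under the assumption that the config file can hold a second string alongside the pattern: store a replacement template such as $1 $2 and let .NET's Match.Result expand it. The variable names, and the idea of reading both values from config, are illustrative and not taken from this thread.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TemplateDemo
    Sub Main()
        Dim source As String = "<td>#7 in <a href='comp.asp?bID=234'>IV.44</a></td>"

        ' Assumption: in the real application both strings would be read from the config file.
        Dim pattern As String = "(#\d\sin).*?([IV]+\.\d+)"
        Dim template As String = "$1 $2"

        Dim m As Match = Regex.Match(source, pattern)
        If m.Success Then
            ' Result expands $1 and $2 for this match; the link markup between them is dropped.
            Console.WriteLine(m.Result(template))
        End If
    End Sub
End Module
```

For the sample line this prints #7 in IV.44. If the page format changes, only the two config strings need editing; the compiled code never mentions a group number.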
 
LVL 2

Author Comment

by:madivad2
ID: 22634262
I just thought I'd clarify, I don't have a problem with groups in the expression, just that I don't want to rely on them in code since the expression can be set outside the application. Can you do an on-the-fly substitution of the undesirable tag?

Also, to clarify an earlier point, the [IV]+ is what I was after; I just didn't explain it well. It is meant to cover all Roman numerals from I to VIII.
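As an aside, [IV]+ also accepts strings that are not valid numerals, such as IIII or VVV. If that ever matters, a stricter alternation can be used instead. The sketch below is an assumption about what "I to VIII" should allow, and the sample strings are made up for illustration:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RomanDemo
    Sub Main()
        ' The pattern from the thread, anchored at a word boundary.
        Dim loose As String = "\b[IV]+\.\d+"
        ' Alternatives in order: V, VI, VII, VIII; then IV; then I, II, III.
        Dim strict As String = "\b(?:VI{0,3}|IV|I{1,3})\.\d+"

        For Each sample As String In New String() {"IV.44", "VIII.2", "IIII.5", "VVV.1"}
            Console.WriteLine("{0}: loose={1}, strict={2}", sample,
                              Regex.IsMatch(sample, loose), Regex.IsMatch(sample, strict))
        Next
    End Sub
End Module
```

The loose pattern accepts all four samples, while the strict one accepts only IV.44 and VIII.2.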
 
LVL 27

Expert Comment

by:ddrudik
ID: 22634742
I might not understand what you are asking for. Do you want to do a replacement on the string?
Raw Match Pattern:
(#\d\sin).*?([IV]+\.\d+)

Capture Groups:
    [0] => #7 in <a href='comp.asp?bID=234'>IV.44
    [1] => #7 in
    [2] => IV.44
 
LVL 2

Author Comment

by:madivad2
ID: 22635398
I know the code below is invalid, but this is along the lines of what I am after. I want the middle group to be removed in much the same way you got rid of the beginning of the line when we were searching for userIDs (and left off the userID part at the front)

I am on the verge of concluding that it's not possible, but I am trying to canvass all options.
Raw Match Pattern:
(#\d\sin)(?<=.*?)([IV]+\.\d+)  ' I know this is invalid

Capture Groups:
    [0] => #7 in IV.44
    [1] => #7 in
    [2] => IV.44
 
LVL 27

Expert Comment

by:ddrudik
ID: 22636415
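Unfortunately that is invalid; regular expressions do not work in that fashion. A single match is always one contiguous span, so the pieces have to be picked out with groups:
Imports System
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        Dim sourcestring As String = "<td>#7 in <a href='comp.asp?bID=234'>IV.44</a></td>"
        ' Group 1 keeps the trailing space; (?:.*?) skips the link without capturing it.
        Dim re As Regex = New Regex("(#\d\sin )(?:.*?)([IV]+\.\d+)")
        Dim mc As MatchCollection = re.Matches(sourcestring)
        For Each m As Match In mc
            ' Concatenate the two captured pieces: "#7 in " & "IV.44"
            Console.WriteLine(m.Groups(1).Value & m.Groups(2).Value)
        Next
    End Sub
End Module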
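If the worry is that group numbers will shift when the pattern is edited, named groups are one possible way around it. This is a sketch only: the group names rank and battle are hypothetical, and it assumes the application agrees on those two names with whoever maintains the config file.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NamedGroupDemo
    Sub Main()
        Dim source As String = "<td>#7 in <a href='comp.asp?bID=234'>IV.44</a></td>"

        ' Hypothetical group names; the pattern itself could live in the config file.
        Dim pattern As String = "(?<rank>#\d\sin).*?(?<battle>[IV]+\.\d+)"

        Dim m As Match = Regex.Match(source, pattern)
        If m.Success Then
            ' Groups are looked up by name, so adding or reordering other groups does not break this line.
            Console.WriteLine(m.Groups("rank").Value & " " & m.Groups("battle").Value)
        End If
    End Sub
End Module
```

For the sample line this prints #7 in IV.44. The code depends only on the two names, not on where the groups sit in the expression.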

 
LVL 2

Author Closing Comment

by:madivad2
ID: 31502751
Thanks ddrudik! Although not quite the answer I was looking for (and here I was thinking you were a God), you have come thru once again. Thanks for your efforts! I have taken on board much of what you have given me and I have even succumbed to the groups in my application. Regards
 
LVL 27

Expert Comment

by:ddrudik
ID: 22641751
Thanks for the question and the points.