
Obtaining multiple substrings from text using regular expressions

Last Modified: 2012-05-05
After the success (and the ease with which it was answered) of my last regular expression question, I am taking it to the next level. The HTML code below is part of a larger source page, and I want to retrieve two values from a given line without the surrounding junk.

From the code below, I want the regex to produce:

#7 in IV.44

(refer to the code for the rest of the question)
...
 <tr>
  <td class="column">Battle:</td>
  <td>#7 in <a href='comp.asp?bID=234'>IV.44</a></td>
 </tr>
...
 
#I can get the first part using: #[0-9]\sin
#and the second part using:      [IV]+\.[0-9]+
 
#is there a way I can concatenate these together
#in a single expression while omitting the
#superfluous link in between?
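As a minimal sketch of one way around the group problem (an assumption, not something proposed in the thread): if the expression is allowed to produce two separate matches rather than one, the code only has to join every match value and never refers to group numbers. The alternation pattern `#\d+\sin\s|[IV]+\.\d+` and the helper `JoinMatches` are hypothetical. On the full page each alternative may also match text on other lines, so this only works if the two pieces are unique in the input.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ConcatMatches
    ' Hypothetical helper: appends the value of every match, in order.
    ' No group numbers appear here; the pattern alone decides what is kept.
    Function JoinMatches(source As String, pattern As String) As String
        Dim result As String = String.Empty
        For Each m As Match In Regex.Matches(source, pattern)
            result &= m.Value
        Next
        Return result
    End Function

    Sub Main()
        Dim sourcestring As String = "<td>#7 in <a href='comp.asp?bID=234'>IV.44</a></td>"
        ' Hypothetical config value: each alternative matches one of the two pieces.
        Dim pattern As String = "#\d+\sin\s|[IV]+\.\d+"
        Console.WriteLine(JoinMatches(sourcestring, pattern))
    End Sub
End Module
```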


CERTIFIED EXPERT
Commented:

Author

Commented:
I could, but I am trying to avoid using groups at the moment; I simply want to apply the regex on the fly without additional code.

My primary reason is that if the format of the page changes, I want to be able to change the regular expression in a config file without recompiling my app to accommodate any new group numbers that may be required.

I was hoping something like your other solution would be possible, where the leading text is left out of the result: (?<=user'>)[^<]*

Somehow omit the link/URL information in between.
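The earlier lookbehind trick works because lookarounds are zero-width: they test the text at the edges of a match without including it. They cannot remove text from the middle, because a match is always one contiguous run of characters. A minimal sketch on the same markup (the pattern and the `ExtractSeries` name are illustrative assumptions) that trims both edges of the second value:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookaroundDemo
    ' (?<=>) and (?=</a>) are zero-width: the ">" before and the "</a>" after
    ' must be present, but neither becomes part of Match.Value.
    Function ExtractSeries(source As String) As String
        Dim m As Match = Regex.Match(source, "(?<=>)[IV]+\.\d+(?=</a>)")
        Return If(m.Success, m.Value, String.Empty)
    End Function

    Sub Main()
        Console.WriteLine(ExtractSeries("<td>#7 in <a href='comp.asp?bID=234'>IV.44</a></td>"))
    End Sub
End Module
```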
CERTIFIED EXPERT

Commented:
You can't skip over text in the middle of a match without the use of groups. Another option is to do a replace, but you would be replacing the match with m.Groups(1).Value & m.Groups(2).Value, so it's about the same as the above code.
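A sketch of the replace idea that keeps group numbers out of the compiled code, assuming both the pattern and a substitution template such as `$1$2` are read from the config file (the config values and the `Extract` helper are hypothetical). `Match.Result` expands the template against a single match only:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ResultTemplateDemo
    ' Both strings would come from the config file (hypothetical), so a
    ' format change only needs new config values, not a recompile.
    Function Extract(source As String, pattern As String, template As String) As String
        Dim m As Match = Regex.Match(source, pattern)
        ' Match.Result expands $1, $2, ${name}, etc. against this match only.
        Return If(m.Success, m.Result(template), String.Empty)
    End Function

    Sub Main()
        Dim sourcestring As String = "<td>#7 in <a href='comp.asp?bID=234'>IV.44</a></td>"
        Dim pattern As String = "(#\d\sin )(?:.*?)([IV]+\.\d+)"   ' hypothetical config value
        Dim template As String = "$1$2"                           ' hypothetical config value
        Console.WriteLine(Extract(sourcestring, pattern, template))
    End Sub
End Module
```

Calling `Regex.Replace(sourcestring, pattern, "$1$2")` instead would rewrite only the matched span and leave the surrounding `<td>` and `</a></td>` markup in the output, which is why this sketch expands the template on the match itself.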

Author

Commented:
Just to clarify: I don't have a problem with groups in the expression; I just don't want to rely on them in code, since the expression can be set outside the application. Can you do an on-the-fly substitution that drops the unwanted tag?

Also, to clarify an earlier point, [IV]+ is what I was after; I just didn't explain it well. It is meant to match all Roman numerals from I to VIII.
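Note that `[IV]+` also accepts strings such as `IIII` or `VV`. If that matters, a stricter alternation that accepts only I through VIII could be used instead. This is a sketch; the `Roman1To8` constant and the `IsRoman1To8` helper are hypothetical names:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RomanDemo
    ' IV must be tried before V?I{1,3}; V?I{1,3} covers I-III and VI-VIII,
    ' and the final V covers V on its own.
    Const Roman1To8 As String = "(?:IV|V?I{1,3}|V)"

    ' Anchored check of a whole string against the I-VIII alternation.
    Function IsRoman1To8(s As String) As Boolean
        Return Regex.IsMatch(s, "^" & Roman1To8 & "$")
    End Function

    Sub Main()
        For Each s As String In {"I", "IV", "VIII", "IIII", "VV", "IX"}
            Console.WriteLine(s & ": " & IsRoman1To8(s).ToString())
        Next
    End Sub
End Module
```

In the full expression the constant would take the place of `[IV]+`, for example `([IV]+\.\d+)` becomes `((?:IV|V?I{1,3}|V)\.\d+)`.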
CERTIFIED EXPERT

Commented:
I might not understand what you are asking for. Do you want to do a replacement on the string?
Raw Match Pattern:
(#\d\sin).*?([IV]+\.\d+)
 
Capture Groups:
    [0] => #7 in <a href='comp.asp?bID=234'>IV.44
    [1] => #7 in
    [2] => IV.44


Author

Commented:
I know the code below is invalid, but this is along the lines of what I am after. I want the middle group to be removed in much the same way you got rid of the beginning of the line when we were searching for user IDs (where the text in front of the user ID was left out of the result).

I am on the verge of concluding it's not possible, but I am trying to canvass all options.
Raw Match Pattern:
(#\d\sin)(?<=.*?)([IV]+\.\d+)  ' I know this is invalid
 
Capture Groups:
    [0] => #7 in IV.44
    [1] => #7 in
    [2] => IV.44


CERTIFIED EXPERT

Commented:
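Unfortunately that is invalid; regular expressions do not work in that fashion. A match is always one contiguous run of text, so the link has to be matched and the two parts rejoined from the groups: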
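Imports System
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        Dim sourcestring As String = "<td>#7 in <a href='comp.asp?bID=234'>IV.44</a></td>"
        ' Group 1 : "#7 in " (including the trailing space)
        ' (?:.*?) : lazily skips the <a href=...> markup without capturing it
        ' Group 2 : the Roman-numeral series, e.g. "IV.44"
        Dim re As New Regex("(#\d\sin )(?:.*?)([IV]+\.\d+)")
        Dim mc As MatchCollection = re.Matches(sourcestring)
        For Each m As Match In mc
            ' m.Value still contains the link; only the groups leave it out.
            Console.WriteLine(m.Groups(1).Value & m.Groups(2).Value)
        Next
    End Sub
End Module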
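To avoid hard-coding `Groups(1)` and `Groups(2)` when the pattern lives in a config file, one option is to concatenate every numbered group after group 0, whatever their count. A minimal sketch (the `JoinGroups` helper is hypothetical), assuming the pattern uses only unnamed groups: .NET numbers named groups after the unnamed ones, so mixing the two would change the order.

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module ConcatAllGroups
    ' Hypothetical helper: joins groups 1..n in order, so the code never
    ' hard-codes group numbers; the pattern decides how many there are.
    ' A group that did not take part in the match contributes "".
    Function JoinGroups(m As Match) As String
        Dim sb As New StringBuilder()
        For i As Integer = 1 To m.Groups.Count - 1   ' group 0 is the whole match
            sb.Append(m.Groups(i).Value)
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim sourcestring As String = "<td>#7 in <a href='comp.asp?bID=234'>IV.44</a></td>"
        Dim re As New Regex("(#\d\sin )(?:.*?)([IV]+\.\d+)")   ' could be loaded from config
        For Each m As Match In re.Matches(sourcestring)
            Console.WriteLine(JoinGroups(m))
        Next
    End Sub
End Module
```

With this helper, changing the config pattern to capture three pieces instead of two needs no code change; the `(?:...)` parts are simply never included.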


Author

Commented:
Thanks ddrudik! Although not quite the answer I was looking for (and here I was thinking you were a God), you have come thru once again. Thanks for your efforts! I have taken on board much of what you have given me and I have even succumbed to the groups in my application. Regards
CERTIFIED EXPERT

Commented:
Thanks for the question and the points.
