Christian de Bellefeuille
asked on
RegEx with optional part
I have a few HTML patterns that I would like to capture with a single regular expression, but I'm having trouble finding the right expression because some parts are optional.
So far I have this expression:
<label>([\w\s]+)<\/label>\s*<span>\s*<a href=(.*?)>\s*\+?(\d*)<\/a>\s*<\/span>\s*<\/li>\s*?<li class=\"dialInAdditionalInfo\">\s?<label>(.*?)<\/label>
From what I understand of regular expressions, wrapping a part in (...)? should make it optional, but it's not working. The expression would look like the one below. This part shouldn't be captured either (I only need its <label> content), so I added ?: at the beginning of the group:
<label>([\w\s]+)<\/label>\s*<span>\s*<a href=(.*?)>\s*\+?(\d*)<\/a>\s*<\/span>\s*<\/li>\s*?(?:<li class=\"dialInAdditionalInfo\">\s?<label>(.*?)<\/label>)?
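For reference, a minimal Python sketch of the behaviour described above (Python's `re` is an assumption here; the question only uses an online tester, but these constructs work the same way). `(?:...)?` makes a part optional without giving it a group number, and an inner `(...)` that did not take part in the match comes back as `None`. The `<i>` tags are simplified placeholders, not from the question.

```python
import re

# (?:...)? = optional and non-capturing: only the inner (\w+) is numbered (group 2).
optional_extra = re.compile(r'<label>(\w+)</label>(?:<i>(\w+)</i>)?')

# Same thing with a capturing outer group: the inner group shifts to group 3.
capturing_extra = re.compile(r'<label>(\w+)</label>(<i>(\w+)</i>)?')

def groups(pattern, text):
    """Return every capture group of the first match (None for groups that did not participate)."""
    return pattern.search(text).groups()

print(groups(optional_extra, '<label>Toll</label><i>Mobile</i>'))   # ('Toll', 'Mobile')
print(groups(optional_extra, '<label>Munich</label>'))              # ('Munich', None)
print(groups(capturing_extra, '<label>Toll</label><i>Mobile</i>'))  # ('Toll', '<i>Mobile</i>', 'Mobile')
```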
I'm doing my tests with this website, with the i, m and s flags enabled (case-insensitive, multiline, dot matches all characters including newline).
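A small Python sketch of why the s flag matters here (Python is an assumed stand-in for the online tester): in Pattern 1 the second label's text spans several lines, and without DOTALL, `.*?` cannot cross a newline. `re.MULTILINE` only changes `^` and `$`, which these expressions don't use.

```python
import re

# Python's names for the i, m and s flags.
FLAGS = re.IGNORECASE | re.MULTILINE | re.DOTALL

text = '<label>\n( Mobile )\n</label>'  # label text spread over lines, as in Pattern 1
pattern = r'<label>(.*?)</label>'

without_s = re.search(pattern, text)          # '.' does not match '\n', so no match
with_s = re.search(pattern, text, FLAGS)      # DOTALL lets '.*?' cross the newlines

print(without_s)               # None
print(repr(with_s.group(1)))   # '\n( Mobile )\n'
```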
In short, this is what I need:
- I need to capture the first label, between <label> and </label>.
- I need to capture the "href" URL.
- I need to capture the text between <a href...> and </a>.
- And finally, if the <li class="dialInAdditionalInfo"> is there, I must capture what's between its <label> and </label>.
Pattern 1 (dialInAdditionalInfo provided)
<ul class="gdin-additional-numbers-list">
<li class="toll-free"><label>Toll Free</label>
<span>
<a href="1111111111111" >1111111111111</a>
</span></li>
<li class="dialInAdditionalInfo"><label>
( Mobile )
</label> <span></span></li>
</ul>
Pattern 2 (dialInAdditionalInfo provided but no value given)
<ul class="gdin-additional-numbers-list">
<li class="toll-free">
<label>Toll Free</label>
<span>
<a href="1111111111111" >1111111111111</a>
</span>
</li>
<li class="dialInAdditionalInfo">
<label></label>
<span></span>
</li>
</ul>
Pattern 3 (no dialInAdditionalInfo provided)
<ul class="gdin-additional-numbers-list">
<li><label>Munich </label>
<span>
<a href='+1111111111111' > 1111111111111</a>
</span></li>
</ul>
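To reproduce the problem outside the online tester, here is a Python sketch that runs the optional-group expression from the question against the three samples above. The samples are shortened (the `<ul>` wrappers are dropped) and the `\/` and `\"` escapes are removed because Python doesn't need them. On all three, the optional label comes back as `None`, including Pattern 1, where `( Mobile )` should have been captured.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE | re.DOTALL

# The optional-group expression from the question (escapes for / and " dropped).
ASKER = re.compile(
    r'<label>([\w\s]+)</label>\s*<span>\s*<a href=(.*?)>\s*\+?(\d*)</a>\s*</span>\s*</li>'
    r'\s*?(?:<li class="dialInAdditionalInfo">\s?<label>(.*?)</label>)?',
    FLAGS,
)

# The three samples, shortened to single strings.
SAMPLES = {
    'pattern1': ('<li class="toll-free"><label>Toll Free</label>\n<span>\n'
                 '<a href="1111111111111" >1111111111111</a>\n</span></li>\n'
                 '<li class="dialInAdditionalInfo"><label>\n( Mobile )\n</label> <span></span></li>'),
    'pattern2': ('<li class="toll-free">\n<label>Toll Free</label>\n<span>\n'
                 '<a href="1111111111111" >1111111111111</a>\n</span>\n</li>\n'
                 '<li class="dialInAdditionalInfo">\n<label></label>\n<span></span>\n</li>'),
    'pattern3': ("<li><label>Munich </label>\n<span>\n"
                 "<a href='+1111111111111' > 1111111111111</a>\n</span></li>"),
}

for name, html in SAMPLES.items():
    m = ASKER.search(html)
    # Groups 1-3 are found every time; group 4 (the optional label) is None for all three.
    print(name, m.group(1), m.group(3), m.group(4))
```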
Can anyone help me figure out what my expression should be?
Thanks
ASKER CERTIFIED SOLUTION
You have "dialInAdditionalInfo" and the rest of that part in a non-capturing group.
ASKER
Yes, your solution, with ?: at the beginning of the group, works too:
<label>([\w\s]+)<\/label>\s*<span>\s*<a href=(.*?)>\s*\+?(\d*)<\/a>\s*<\/span>\s*<\/li>(?:\s*?<li class=\"dialInAdditionalInfo\">\s*?<label>[\s]*([\s\S]*?)\s*?<\/label>)?
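Running that expression in Python against the three samples from the question shows the difference (a sketch: the samples are shortened and the `\/` and `\"` escapes removed). Pattern 1 gives `( Mobile )`, Pattern 2 gives an empty string, and Pattern 3 gives `None` because the optional group did not participate.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE | re.DOTALL

# The working expression (escapes for / and " dropped).
# The leading \s*? now sits inside the optional group.
FIXED = re.compile(
    r'<label>([\w\s]+)</label>\s*<span>\s*<a href=(.*?)>\s*\+?(\d*)</a>\s*</span>\s*</li>'
    r'(?:\s*?<li class="dialInAdditionalInfo">\s*?<label>[\s]*([\s\S]*?)\s*?</label>)?',
    FLAGS,
)

# The three samples, shortened to single strings.
SAMPLES = {
    'pattern1': ('<li class="toll-free"><label>Toll Free</label>\n<span>\n'
                 '<a href="1111111111111" >1111111111111</a>\n</span></li>\n'
                 '<li class="dialInAdditionalInfo"><label>\n( Mobile )\n</label> <span></span></li>'),
    'pattern2': ('<li class="toll-free">\n<label>Toll Free</label>\n<span>\n'
                 '<a href="1111111111111" >1111111111111</a>\n</span>\n</li>\n'
                 '<li class="dialInAdditionalInfo">\n<label></label>\n<span></span>\n</li>'),
    'pattern3': ("<li><label>Munich </label>\n<span>\n"
                 "<a href='+1111111111111' > 1111111111111</a>\n</span></li>"),
}

for name, html in SAMPLES.items():
    # Group 4: '( Mobile )' for pattern1, '' for pattern2, None for pattern3.
    print(name, repr(FIXED.search(html).group(4)))
```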
That's why I'm confused. Putting \s*? (zero or more spaces or newlines, as few as possible) inside the non-capturing optional group or outside it shouldn't change anything, since the whitespace is there whether or not the dialInAdditionalInfo part is present in the text. It shouldn't need to be in the optional group.
I don't want to take up more of your time. I'll look for an article about non-capturing groups to find out what I'm missing. I'm just trying to understand, because I've spent literally a week trying to figure out what's going on. The idea of putting this inside the non-capturing group never crossed my mind, because I didn't think it had to be that way.
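A likely explanation, based on how backtracking engines such as Python's `re` (and the usual online testers) work, not something stated in the thread: with a lazy `\s*?` outside the group, the engine first tries zero whitespace. The optional group then fails on the newline and falls back to matching nothing, and because that already completes the whole pattern, the engine accepts the match and never goes back to let `\s*?` consume the newline. Inside the group, `\s*?` is part of the attempt to match the group itself, so it expands until `<li` matches. A greedy `\s*` outside the group would also work, since it eats the newline before the group is tried. A minimal Python comparison (the sample text is a trimmed, hypothetical fragment):

```python
import re

TEXT = '</li>\n<li class="dialInAdditionalInfo"><label>( Mobile )</label>'
GROUP = r'<li class="dialInAdditionalInfo"><label>(.*?)</label>'

VARIANTS = {
    # Lazy \s*? outside: zero whitespace is tried first, the optional group then
    # matches nothing, and the pattern is already complete, so no backtracking.
    'lazy outside': r'</li>\s*?(?:' + GROUP + r')?',
    # Lazy \s*? inside: it belongs to the group's own attempt, so it expands.
    'lazy inside': r'</li>(?:\s*?' + GROUP + r')?',
    # Greedy \s* outside: consumes the newline before the group is tried.
    'greedy outside': r'</li>\s*(?:' + GROUP + r')?',
}

def captured(pattern):
    """Return the optional label, or None if the optional group matched nothing."""
    return re.search(pattern, TEXT, re.DOTALL).group(1)

for name, pattern in VARIANTS.items():
    print(name, '->', repr(captured(pattern)))
# lazy outside -> None
# lazy inside -> '( Mobile )'
# greedy outside -> '( Mobile )'
```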
ASKER
My version:
<label>([\w\s]+)<\/label>\
Your version:
<label>([\w\s]+)<\/label>\
Thanks!