Christian de Bellefeuille asked:

RegEx with optional part

I have a few HTML patterns that I would like to capture with a single regular expression, but I'm having trouble finding the right expression because some parts are optional.

So far I have this expression:
<label>([\w\s]+)<\/label>\s*<span>\s*<a href=(.*?)>\s*\+?(\d*)<\/a>\s*<\/span>\s*<\/li>\s*?<li class=\"dialInAdditionalInfo\">\s?<label>(.*?)<\/label>

  • I need to capture the first label between <label> and </label>
  • I need to capture the "href" url
  • I need to capture the text between this <a href...> and </a>
  • And finally, if the <li class="dialInAdditionalInfo"> is there, I must capture what's between its <label> and </label>

From what I understand of RegEx, wrapping a part in (...)? should make it optional, but it's not working. My attempt is the following expression. The optional part as a whole shouldn't be captured either; I only need the <label> content inside it, so I added ?: at the beginning of the group:
<label>([\w\s]+)<\/label>\s*<span>\s*<a href=(.*?)>\s*\+?(\d*)<\/a>\s*<\/span>\s*<\/li>\s*?(?:<li class=\"dialInAdditionalInfo\">\s?<label>(.*?)<\/label>)?

I'm running my tests on this website, with the I, M, and S flags checked (case insensitive, multiline, dot matches all characters including newline).

Pattern 1 (dialInAdditionalInfo provided)

<ul class="gdin-additional-numbers-list">
   <li class="toll-free"><label>Toll Free</label> 
   <span>
	   
	  <a href="1111111111111" >1111111111111</a>
	  
	</span></li>
	
	<li class="dialInAdditionalInfo"><label>
	 ( Mobile )
   </label> <span></span></li>
   
</ul>



Pattern 2 (dialInAdditionalInfo provided but no value given)

<ul class="gdin-additional-numbers-list">
   
   <li class="toll-free">
		 <label>Toll Free</label> 
		 <span>
			<a href="1111111111111" >1111111111111</a>
		</span>
   </li>
   
		<li class="dialInAdditionalInfo">
			<label></label> 
			<span></span>
	</li>
</ul>



Pattern 3 (no dialInAdditionalInfo provided)

<ul class="gdin-additional-numbers-list">
 <li><label>Munich                   </label> 
	<span>
	
	   <a href='+1111111111111' >  1111111111111</a>
	   
	</span></li>
	
</ul>



Can anyone help me figure out what my expression should be?

Thanks
ASKER CERTIFIED SOLUTION
Rgonzo1971

Christian de Bellefeuille (ASKER)

Excellent! It works! But I don't understand the difference between your version and mine. The only difference I see is the bold part. Why does it matter whether that part is inside the group or not?

My version:
<label>([\w\s]+)<\/label>\s*<span>\s*<a href=(.*?)>\s*\+?(\d*)<\/a>\s*<\/span>\s*<\/li>\s*?(?:<li class=\"dialInAdditionalInfo\">\s?<label>(.*?)<\/label>)?

Your version:
<label>([\w\s]+)<\/label>\s*<span>\s*<a href=(.*?)>\s*\+?(\d*)<\/a>\s*<\/span>\s*<\/li>(\s*?<li class=\"dialInAdditionalInfo\">\s*?<label>[\s]*([\s\S]*?)\s*?<\/label>)?

Thanks!
Rgonzo1971

You have "dialInAdditionalInfo" and so on in a non-capturing group.
Yes, but your solution also works with ?: at the beginning of this group:

<label>([\w\s]+)<\/label>\s*<span>\s*<a href=(.*?)>\s*\+?(\d*)<\/a>\s*<\/span>\s*<\/li>(?:\s*?<li class=\"dialInAdditionalInfo\">\s*?<label>[\s]*([\s\S]*?)\s*?<\/label>)?
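For reference, the non-capturing variant above can be checked against the three sample patterns from the question with a short Python sketch. Python's `re` module is an assumption here (the thread does not say which engine the test site uses), the I-M-S flags are mapped to `re.I | re.M | re.S`, the sample whitespace is simplified, and the `extract` helper with its trimming of quotes and spaces is hypothetical, added only for readability.

```python
import re

# Non-capturing variant from the thread, copied as-is (split over lines only).
PATTERN = re.compile(
    r'<label>([\w\s]+)<\/label>\s*<span>\s*<a href=(.*?)>\s*\+?(\d*)<\/a>'
    r'\s*<\/span>\s*<\/li>'
    r'(?:\s*?<li class=\"dialInAdditionalInfo\">\s*?<label>[\s]*([\s\S]*?)\s*?<\/label>)?',
    re.I | re.M | re.S,
)

# Pattern 1: dialInAdditionalInfo provided.
SAMPLE_1 = """<ul class="gdin-additional-numbers-list">
   <li class="toll-free"><label>Toll Free</label>
   <span>
      <a href="1111111111111" >1111111111111</a>
   </span></li>

   <li class="dialInAdditionalInfo"><label>
    ( Mobile )
   </label> <span></span></li>
</ul>"""

# Pattern 2: dialInAdditionalInfo provided but empty.
SAMPLE_2 = """<ul class="gdin-additional-numbers-list">
   <li class="toll-free">
      <label>Toll Free</label>
      <span>
         <a href="1111111111111" >1111111111111</a>
      </span>
   </li>

   <li class="dialInAdditionalInfo">
      <label></label>
      <span></span>
   </li>
</ul>"""

# Pattern 3: no dialInAdditionalInfo at all.
SAMPLE_3 = """<ul class="gdin-additional-numbers-list">
 <li><label>Munich                   </label>
   <span>
      <a href='+1111111111111' >  1111111111111</a>
   </span></li>
</ul>"""


def extract(html):
    """Return the four captured fields, or None if nothing matches."""
    m = PATTERN.search(html)
    if m is None:
        return None
    label, href, number, extra = m.groups()
    return {
        "label": label.strip(),
        # The href group includes the quotes and a trailing space; trim them.
        "href": href.strip().strip("'\""),
        "number": number,
        # None when the optional <li class="dialInAdditionalInfo"> is absent.
        "extra": extra,
    }


for sample in (SAMPLE_1, SAMPLE_2, SAMPLE_3):
    print(extract(sample))
```

In this sketch the optional label comes back as `( Mobile )` for the first sample, an empty string for the second, and `None` for the third, which is how the "present but empty" and "absent" cases can be told apart.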

That's why I'm confused. Putting \s*? (zero or more spaces or newlines, as few as possible) inside the non-capturing optional group or outside of it shouldn't change anything, since the whitespace is there no matter whether dialInAdditionalInfo is present in the text or not. It shouldn't need to be in the optional group.

I don't want to take up more of your time. I'll try to find an article about non-capturing groups to see what I'm missing. I'm just trying to understand, because I've spent literally a week trying to figure out what's going on. The idea of placing this within the non-capturing group never came to my mind, because it didn't seem like it had to be that way.
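One likely explanation is the lazy quantifier rather than the non-capturing group itself. When `\s*?` sits outside the group it first matches zero characters, the optional group then fails on the leading whitespace and is skipped, and because nothing follows the group the overall match has already succeeded, so the engine never goes back to let `\s*?` grow. The sketch below illustrates this with Python's `re` (an assumed stand-in for the engine used in the thread) on a shortened, hypothetical input; the `class="x"` markup and the variable names are illustrative only.

```python
import re

# Shortened, hypothetical input: whitespace separates </li> from the optional <li>.
text = '</li>\n\t<li class="x"><label>Mobile</label>'

# Lazy \s*? OUTSIDE the optional group (shape of the asker's version).
outside = re.compile(r'</li>\s*?(?:<li class="x"><label>(.*?)</label>)?', re.S)

# Lazy \s*? INSIDE the optional group (shape of the accepted version).
inside = re.compile(r'</li>(?:\s*?<li class="x"><label>(.*?)</label>)?', re.S)

# Greedy \s* outside the group: another way to get past the whitespace.
greedy = re.compile(r'</li>\s*(?:<li class="x"><label>(.*?)</label>)?', re.S)

m_outside = outside.search(text)
m_inside = inside.search(text)
m_greedy = greedy.search(text)

# The match stops right after </li>, and the group never participates.
print(repr(m_outside.group(0)), m_outside.group(1))

# Inside the group, \s*? is forced to grow so the group as a whole can match.
print(m_inside.group(1))

# Greedy \s* consumes the whitespace before the group is attempted.
print(m_greedy.group(1))
```

Under these assumptions, moving `\s*?` into the group works because the whitespace becomes part of what the group has to match. A greedy `\s*` outside the group appears to work as well on this input, though for the second sample the `\s?` before `<label>` in the original expression would also need to be `\s*`.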