• Status: Solved

.NET regular expressions

I am trying to use a regular expression to pull the value inside of <STATUS>. I tried the code below, but it is not returning a result.

<STATUS>Lorem</STATUS>
re = New Regex("(?<=<status>)[^</status>]*", RegexOptions.IgnoreCase)

Asked by: jimseiwert
 
käµfm³d 👽Commented:
That's because the construct [ ... ] is a character class: it looks at each character within the brackets individually, not at the phrase or string as a whole. What you said above was, "any character that is not <, /, s, t, a, u, or >". What I believe you are after would be more along the lines of:
re = New Regex("(?<=<status>)(?:[^<]|<(?!/status))*", RegexOptions.IgnoreCase)
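
To make the difference concrete, here is a minimal sketch that runs both patterns side by side. The sample value "complete" is an assumption chosen for illustration (it is not from the original question); it contains a "t", which is one of the characters the bracket expression excludes. The module and variable names are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CharacterClassDemo
    Sub Main()
        ' Hypothetical sample input; the value contains a "t".
        Dim input As String = "<STATUS>complete</STATUS>"

        ' Original attempt: [^</status>] is a set of single characters to avoid,
        ' so matching stops at the first <, /, s, t, a, u or >.
        Dim original As New Regex("(?<=<status>)[^</status>]*", RegexOptions.IgnoreCase)

        ' Corrected pattern: stop only at a "<" that begins "</status".
        Dim corrected As New Regex("(?<=<status>)(?:[^<]|<(?!/status))*", RegexOptions.IgnoreCase)

        Console.WriteLine(original.Match(input).Value)   ' comple
        Console.WriteLine(corrected.Match(input).Value)  ' complete
    End Sub
End Module
```

Note that the original pattern can still "succeed" with a truncated or even empty value, because * allows zero repetitions.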

 
jimseiwert (Author) Commented:
That was it. Can you explain what each piece means in this (?<=<status>)(?:[^<]|<(?!/status))*  so I can learn for future use?
 
käµfm³d 👽Commented:
>>  That's because the construct [ ... ] looks at each character

That's of course excluding the ^ at the beginning of the bracket expression, since that negates the set and makes it a "NOT" expression  = )
 
käµfm³d 👽Commented:
>>  Can you explain what each piece means in this (?<=<status>)(?:[^<]|<(?!/status))*  so I can learn for future use?

Sure  :  )

(?<=<status>)

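Positive lookbehind. Look backwards from the current position and require that the text immediately before it is "<status>". The lookbehind is zero-width, so the tag itself is not included in the match.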



(?: ... )*

Non-capturing group modified with *. Match zero or more repetitions of whatever the group describes, without storing the text in a numbered capture group.



[^<]|<(?!/status)

Find either any character that is not an opening chevron (<) OR an opening chevron that is not followed by the string "/status". The construct (?! ... ) is a negative lookahead, meaning the match succeeds only if the string described by the lookahead is NOT found at that position.


I have an article describing lookaround (lookbehind and lookahead) which you may find useful:  http://www.experts-exchange.com/A_4318.html
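
As a rough illustration of the pieces described above, the following sketch prints where the match starts (showing that the lookbehind consumes nothing) and then uses the negative lookahead on its own. The sample strings and the module name are assumptions made up for this example.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookaroundDemo
    Sub Main()
        Dim input As String = "<STATUS>Lorem</STATUS>"
        Dim m As Match = Regex.Match(input, "(?<=<status>)(?:[^<]|<(?!/status))*", RegexOptions.IgnoreCase)

        ' The lookbehind is zero-width: the match starts right after the
        ' 8-character opening tag and does not include it.
        Console.WriteLine(m.Index)  ' 8
        Console.WriteLine(m.Value)  ' Lorem

        ' Negative lookahead on its own: count every "<" that does NOT begin "</status".
        ' Hypothetical input with an embedded tag.
        Dim nested As String = "<STATUS>a<b>c</b></STATUS>"
        Dim count As Integer = Regex.Matches(nested, "<(?!/status)", RegexOptions.IgnoreCase).Count
        Console.WriteLine(count)  ' 3 (the "<" of <STATUS>, <b> and </b>)
    End Sub
End Module
```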
 
Todd Gerbert (IT Consultant) Commented:
Kaufmed,

Couldn't that be simplified slightly, since a literal < isn't valid inside the text in HTML (or XML, if I'm correct; it would need to be encoded as &lt; in both cases)?

New Regex("<status>([^<]*)</status>", RegexOptions.IgnoreCase)
 
Todd Gerbert (IT Consultant) Commented:
No, never mind - that wouldn't make sense. ;)
 
käµfm³d 👽Commented:
@tgerbert

>>  Couldn't that be simplified slightly? since < isn't valid in HTML

I agree, and I originally was going to post something like that, but on the off-chance the author had a tag in between (since I don't know what the source data looks like), I took the long-winded approach  = )
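
A small sketch of that trade-off, assuming a made-up input with an embedded <b> tag (the real source data is unknown, and the names below are hypothetical): the simpler pattern finds nothing when a tag sits inside the value, while the longer one still returns the whole inner text.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EmbeddedTagDemo
    Sub Main()
        ' Simpler pattern from the suggestion above; the value is in capture group 1.
        Dim simple As New Regex("<status>([^<]*)</status>", RegexOptions.IgnoreCase)
        ' Longer lookaround pattern; the value is the whole match.
        Dim general As New Regex("(?<=<status>)(?:[^<]|<(?!/status))*", RegexOptions.IgnoreCase)

        ' With plain text between the tags, both approaches give the same value.
        Dim plain As String = "<STATUS>Lorem</STATUS>"
        Console.WriteLine(simple.Match(plain).Groups(1).Value)  ' Lorem
        Console.WriteLine(general.Match(plain).Value)           ' Lorem

        ' Hypothetical input with an embedded tag.
        Dim nested As String = "<STATUS>Lorem <b>ipsum</b></STATUS>"
        Console.WriteLine(simple.Match(nested).Success)  ' False
        Console.WriteLine(general.Match(nested).Value)   ' Lorem <b>ipsum</b>
    End Sub
End Module
```

If the data is known never to contain embedded tags, the simpler pattern is easier to read; otherwise the lookaround version is the safer choice.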
 
Todd Gerbert (IT Consultant) Commented:
Yeah, embedded tags crossed my mind just a split second after I clicked the Post button.