.NET regular expressions

I am trying to use regular expressions to pull the value inside of <STATUS>. I was trying to use the code below, but it is not returning a result.

<STATUS>Lorem</STATUS>
re = New Regex("(?<=<status>)[^</status>]*", RegexOptions.IgnoreCase)

Asked by: jimseiwert
 
käµfm³d 👽 Commented:
That's because the construct [ ... ] looks at each character within the brackets, not at the phrase or string as a whole. What your pattern says is, "any character that is not <, /, s, t, a, u, or >". What I believe you are after would be more along the lines of:
re = New Regex("(?<=<status>)(?:[^<]|<(?!/status))*", RegexOptions.IgnoreCase)
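To see the two patterns side by side, here is a minimal sketch. The module name, the FirstMatch helper, and the sample value "active" are made up for illustration, since the real source data isn't shown. As far as I can tell, the posted sample value "Lorem" happens to contain none of the excluded characters, so the original pattern would still return it; a value that starts with s, t, a, or u is what makes the match come back empty.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CharClassDemo
    ' Hypothetical helper: text of the first match of pattern in sample, ignoring case.
    Function FirstMatch(pattern As String, sample As String) As String
        Return Regex.Match(sample, pattern, RegexOptions.IgnoreCase).Value
    End Function

    Sub Main()
        ' Assumed sample data; the real source data is not shown in the question.
        Dim sample As String = "<status>active</status>"

        ' Original pattern: "a" is one of the excluded characters, so the
        ' character class matches zero characters and the match is empty.
        Console.WriteLine("[" & FirstMatch("(?<=<status>)[^</status>]*", sample) & "]")

        ' Revised pattern: consumes everything up to the closing tag.
        Console.WriteLine("[" & FirstMatch("(?<=<status>)(?:[^<]|<(?!/status))*", sample) & "]")
    End Sub
End Module
```

The first line printed should be [] and the second [active].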

 
jimseiwert (Author) Commented:
That was it. Can you explain what each piece means in this (?<=<status>)(?:[^<]|<(?!/status))*  so I can learn for future use?
 
käµfm³d 👽 Commented:
>>  That's because the construct [ ... ] looks at each character

That is, of course, excluding the ^ at the beginning of the bracket expression, since that is what turns it into a "NOT" (negated) expression  = )
 
käµfm³d 👽 Commented:
>>  Can you explain what each piece means in this (?<=<status>)(?:[^<]|<(?!/status))*  so I can learn for future use?

Sure  :  )

(?<=<status>)

Positive lookbehind. Look backwards from the current location and try to find the string "<status>". The lookbehind only checks that the string is there; it does not become part of the match.
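A small sketch of what the lookbehind changes, assuming a made-up sample string and hypothetical function names (WithTag, ValueOnly). Both patterns find the same spot; the only difference is whether "<status>" ends up in the matched text.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookbehindDemo
    ' Hypothetical helper: the tag is consumed, so it is part of the match.
    Function WithTag(sample As String) As Match
        Return Regex.Match(sample, "<status>\w+", RegexOptions.IgnoreCase)
    End Function

    ' Hypothetical helper: the tag is only checked for, not consumed.
    Function ValueOnly(sample As String) As Match
        Return Regex.Match(sample, "(?<=<status>)\w+", RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        ' Assumed sample data for illustration.
        Dim sample As String = "<status>active</status>"

        Console.WriteLine(WithTag(sample).Value)    ' match includes the opening tag
        Console.WriteLine(ValueOnly(sample).Value)  ' match is just the value
        Console.WriteLine(ValueOnly(sample).Index)  ' match starts right after the tag
    End Sub
End Module
```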



(?: ... )*

Non-capturing group modified with *. Find zero or more repetitions of whatever the group describes, without storing the group's text as a separate capture.
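To illustrate "non-capturing" and "zero-or-more", here is a rough sketch with a made-up pattern and input ("ab" repeated); GroupCount is a hypothetical helper. Groups(0) is always the whole match, so a count of 1 means nothing extra was captured.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupDemo
    ' Hypothetical helper: number of groups on the first match (group 0 included).
    Function GroupCount(pattern As String, sample As String) As Integer
        Return Regex.Match(sample, pattern).Groups.Count
    End Function

    Sub Main()
        ' The group repeats as a unit: "ab" twice, then stops at "x".
        Console.WriteLine(Regex.Match("ababx", "(?:ab)*").Value)

        ' Non-capturing: only group 0 exists. Capturing: group 0 plus group 1.
        Console.WriteLine(GroupCount("(?:ab)*", "ababx"))
        Console.WriteLine(GroupCount("(ab)*", "ababx"))

        ' Zero repetitions still counts as a successful (empty) match.
        Console.WriteLine(Regex.Match("x", "(?:ab)*").Success)
    End Sub
End Module
```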



[^<]|<(?!/status)

Find either any character not an opening chevron OR find an opening chevron that is not followed by the string "/status". The construct (?! ... ) is a negative lookahead, meaning the match succeeds if the string described by the lookahead is NOT found.
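The negative lookahead on its own can be tried out like this. It is only a sketch: CountOpeners is a hypothetical helper and the inputs are made up. It counts the "<" characters that are allowed through, meaning those that do not begin "</status".

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookaheadDemo
    ' Hypothetical helper: counts "<" characters that are NOT followed by "/status".
    Function CountOpeners(sample As String) As Integer
        Return Regex.Matches(sample, "<(?!/status)", RegexOptions.IgnoreCase).Count
    End Function

    Sub Main()
        ' The "<" of "<b>" passes the lookahead; the "<" of "</status>" is rejected.
        Console.WriteLine(CountOpeners("<b></status>"))

        ' The only "<" here starts the closing tag, so nothing is counted.
        Console.WriteLine(CountOpeners("</status>"))
    End Sub
End Module
```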


I have an article describing lookaround (lookbehind and lookahead) which you may find useful:  http://www.experts-exchange.com/A_4318.html
 
Todd Gerbert (IT Consultant) Commented:
Kaufmed,

Couldn't that be simplified slightly, since a literal < isn't valid inside HTML text (or XML, if I'm correct; it would need to be encoded as &lt; in both cases)?

New Regex("<status>([^<]*)</status>", RegexOptions.IgnoreCase)
 
Todd Gerbert (IT Consultant) Commented:
No, never mind - that wouldn't make sense. ;)
 
käµfm³d 👽 Commented:
@tgerbert

>>  Couldn't that be simplified slightly? since < isn't valid in HTML

I agree, and I was originally going to post something like that, but on the off-chance the author had a tag in between (since I don't know what the source data looks like), I took the long-winded approach  = )
 
Todd Gerbert (IT Consultant) Commented:
Yeah, embedded tags crossed my mind just a split second after I clicked the Post button.
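For anyone comparing the two approaches, the embedded-tag case can be sketched as below. The sample strings and the function names Simple and Tolerant are assumptions made for illustration; the patterns are the two posted in this thread.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EmbeddedTagDemo
    ' Shorter pattern: capture group 1 holds the value; fails if any "<" appears inside.
    Function Simple(sample As String) As String
        Dim m As Match = Regex.Match(sample, "<status>([^<]*)</status>", RegexOptions.IgnoreCase)
        Return If(m.Success, m.Groups(1).Value, "(no match)")
    End Function

    ' Longer pattern: allows "<" inside the value unless it starts "</status".
    Function Tolerant(sample As String) As String
        Return Regex.Match(sample, "(?<=<status>)(?:[^<]|<(?!/status))*", RegexOptions.IgnoreCase).Value
    End Function

    Sub Main()
        ' Assumed sample data: one plain value, one with an embedded tag.
        Dim plain As String = "<status>active</status>"
        Dim nested As String = "<status>on <b>hold</b></status>"

        Console.WriteLine(Simple(plain))     ' both patterns agree on plain text
        Console.WriteLine(Tolerant(plain))
        Console.WriteLine(Simple(nested))    ' the embedded <b> tag stops the shorter pattern
        Console.WriteLine(Tolerant(nested))  ' the longer pattern keeps the embedded tag
    End Sub
End Module
```

If the data never contains tags inside <status>, the shorter pattern is easier to read; the longer one only pays off when embedded tags are possible.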
Question has a verified solution.
