Solved

RegEx issue

Posted on 2004-08-13
Last Modified: 2010-04-15
OK, so I have created a class that inherits from the class Regex (code shown below), and so far the expression has worked well in all cases except one.

Here's what it's supposed to do: look through any input tag (e.g. <input type="text"...>) and find the attribute I specify, followed by its value, which I would like stored in the submatch.  In cases where I have a quoted value (type="text"), everything works fine. Where it flunks is when the attribute value has no quotes, single or double (e.g. type=TEXT).

Can anyone see what I'm doing wrong here?

protected class AttributeExpression : Regex
{
    private static readonly RegexOptions PreDefOptions = RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Compiled;

    public AttributeExpression(string attributeName) : base(" " + attributeName + "=\"([^\"]+)\"| " + attributeName + "='([^']+)'| " + attributeName + "=([^\\s]+)[\\s]", PreDefOptions){}
}
Question by:yleviel
 
LVL 2

Expert Comment

by:davidastle
ID: 11797986
Your problem is with the last section,
attributeName + "=([^\\s]+)[\\s]",

What the first group in this snippet, ([^\\s]+), does is search for one or more non-whitespace characters.  Therefore, it will only stop when it gets to a whitespace character.  After that, it tries to match [\\s], which is also looking for a non-whitespace character.  So you get to a whitespace character, try to match it with a non-whitespace pattern, and your match fails.
 
LVL 19

Accepted Solution

by:
drichards earned 120 total points
ID: 11798924
No, the final [\s] looks FOR whitespace.

When I test your last expression it works - almost.  If the input looks like this:

    <input text=TEXT>

then the '>' is included in the capture.  I changed to this:

<att name>=([^>\s]+)[>\s]
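The difference between the two unquoted-value expressions can be checked with a small sketch like the one below. It assumes an attribute named type and a made-up input string with some text after the tag; the class UnquotedDemo and its Capture helper are hypothetical names, not part of the original class.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnquotedDemo
{
    // Hypothetical helper: returns the first capture group of the
    // first match, or null when the pattern does not match at all.
    public static string Capture(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Made-up test input: an unquoted value directly followed by '>'.
        string input = "<input type=TEXT> <br>";

        // Original form: [^\s]+ runs up to the next whitespace, so '>' is captured.
        Console.WriteLine(Capture(@" type=([^\s]+)[\s]", input));   // TEXT>

        // Modified form: '>' is excluded from the value and accepted as a terminator.
        Console.WriteLine(Capture(@" type=([^>\s]+)[>\s]", input)); // TEXT
    }
}
```

If nothing at all follows the closing '>', the original form finds no whitespace to end on and does not match, which may be why the unquoted case looked like a complete failure.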

Seems to work.  What was your test text?  Mine was dirt simple:

"<html><head><title>My Doc</title></head><body><form><input text=myText></input><input text='Some Text'></input></form></body></html>"

Your class with my small mod picked out both text= attributes and correctly captured "myText" and "Some Text".  I also found it a bit easier to name the groups:
----------------------------------------
protected class AttributeExpression : Regex
{
    private static readonly RegexOptions PreDefOptions = RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Compiled;

    public AttributeExpression(string attributeName) : base(" " + attributeName + "=\"(?<val>[^\"]+)\"| " + attributeName + "='(?<val>[^']+)'| " + attributeName + "=(?<val>[^>\\s]+)[>\\s]", PreDefOptions){}
}
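As a rough illustration of why the named group helps, the sketch below runs the same pattern over the test text quoted above and reads the value through Groups["val"], whichever alternative matched. The class is declared public here only so the example stands alone (the original is a nested protected class), and NamedGroupDemo with its Values helper is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same pattern as above, made public so the sketch compiles on its own.
public class AttributeExpression : Regex
{
    private static readonly RegexOptions PreDefOptions =
        RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Compiled;

    public AttributeExpression(string attributeName) : base(
        " " + attributeName + "=\"(?<val>[^\"]+)\"" +
        "| " + attributeName + "='(?<val>[^']+)'" +
        "| " + attributeName + "=(?<val>[^>\\s]+)[>\\s]",
        PreDefOptions) { }
}

public static class NamedGroupDemo
{
    // Hypothetical helper: collects the "val" group of every match.
    public static List<string> Values(string attributeName, string html)
    {
        var values = new List<string>();
        foreach (Match m in new AttributeExpression(attributeName).Matches(html))
        {
            // .NET allows the same group name in several alternatives,
            // so one lookup covers the quoted and unquoted cases.
            values.Add(m.Groups["val"].Value);
        }
        return values;
    }

    public static void Main()
    {
        string html = "<html><head><title>My Doc</title></head><body><form>" +
                      "<input text=myText></input><input text='Some Text'></input>" +
                      "</form></body></html>";
        foreach (string v in Values("text", html))
        {
            Console.WriteLine(v); // prints myText, then Some Text
        }
    }
}
```

With numbered groups the caller would have to check groups 1, 2 and 3 in turn to find the one that participated in the match.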
 
LVL 2

Expert Comment

by:davidastle
ID: 11799092
Oops, sorry
 
LVL 2

Author Comment

by:yleviel
ID: 11811467
drichards,

The regex you made works almost flawlessly. The only case where I've had problems is an empty value, value="".  When this happens, the submatch returns "\"\"" (that is, the two quote characters themselves).  Can you think of anything to remedy this?

Thanks!
 
LVL 2

Author Comment

by:yleviel
ID: 11811640
OK, I changed the expression to allow zero or more characters inside the double quotes, and this has worked for all my test cases.  If you see nothing wrong with the new expression, I'll award you the points.

public AttributeExpression(string attributeName) : base(" " + attributeName + "=\"(?<val>[^\"]*)\"| " + attributeName + "='(?<val>[^']+)'| " + attributeName + "=(?<val>[^>\\s]+)[>\\s]", PreDefOptions){}
 
LVL 19

Expert Comment

by:drichards
ID: 11812457
Just that you'll probably want the same change in the single-quote expression (zero or more instead of one or more). Also make sure there are no other terminal cases in the no-quote expression (is there anything other than whitespace and '>' that would end the match?).
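A minimal sketch of the expression with that change applied to both quoted alternatives might look like the following. The names AttributePattern, Build and FirstValue are hypothetical, and the Regex.Escape call on the attribute name is an extra precaution that was not part of the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AttributePattern
{
    // Builds the three-alternative pattern with zero-or-more in both
    // quoted forms, so value="" and value='' yield an empty "val".
    public static Regex Build(string attributeName)
    {
        // Assumption, not from the thread: escape the name so that
        // regex metacharacters in it are treated literally.
        string name = Regex.Escape(attributeName);
        string pattern =
            " " + name + "=\"(?<val>[^\"]*)\"" +
            "| " + name + "='(?<val>[^']*)'" +
            "| " + name + "=(?<val>[^>\\s]+)[>\\s]";
        return new Regex(pattern,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    // Returns the value of the first occurrence, or null if none is found.
    public static string FirstValue(string attributeName, string html)
    {
        Match m = Build(attributeName).Match(html);
        return m.Success ? m.Groups["val"].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine("[" + FirstValue("value", "<input type=text value=\"\">") + "]"); // []
        Console.WriteLine("[" + FirstValue("value", "<input type=text value=''>") + "]");   // []
        Console.WriteLine("[" + FirstValue("type", "<input type=text value=''>") + "]");    // [text]
    }
}
```

With one-or-more in a quoted alternative, an empty quoted value cannot match there, so the engine falls through to the unquoted alternative and captures the two quote characters as the value. One terminal case the sketch leaves open is a self-closing tag such as type=text/>, where the '/' would end up inside the capture.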