Solved

Regular expression

Posted on 2003-02-22
Last Modified: 2010-04-17
Hi!

How do you write a regular expression to get the value of the name attribute from an HTML string like this:

<input type="text" name="namedata">

I want the output to be: namedata

PHP or Perl thanx!
Question by:lorenz
2 Comments
 

Accepted Solution

by:
BernhardBrueck earned 200 total points
ID: 7998551
# $_ holds the HTML string; print the captured value only if the pattern matched
print "$1\n" if /<input type="text" name="([^"]*)"/;

Hope that helps,
  Bernhard Brueck
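
For anyone who wants to try the same pattern outside Perl, here is a minimal sketch in Python, whose `re` module accepts this pattern unchanged. The variable names are made up for illustration.

```python
import re

# Same pattern as in the answer; group 1 is the text between the double quotes.
pattern = re.compile(r'<input type="text" name="([^"]*)"')

html = '<input type="text" name="namedata">'
match = pattern.search(html)

# search() returns None when nothing matches, so check before reading the group.
name_value = match.group(1) if match else None
print(name_value)
```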
 

Expert Comment

by:webauk
ID: 12491996
I know this question was answered a looong time ago, but in case anyone else has a similar problem, here's a slightly more comprehensive solution.

Bernhard's solution above works fine BUT only if the text you're matching is EXACTLY
<input type="text" name="namedata">

It would not work for
<INPUT type="text" name="namedata">

Or
<input type = "text" name = "namedata">

Or
<  input type='text' name='namedata'  >

Or
<input type=text name=namedata>

Or
<input name="namedata" type="text" >

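All of these can turn up in real-world HTML form lines (strictly speaking, a browser does not treat < followed by a space as the start of a tag, but hand-written or generated markup is not always strict). Oh, you can also split a line of HTML over more than one line of a file / web page as well :-)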
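
To see the problem concretely, here is a small sketch (in Python, using the same pattern syntax) that runs the exact-match pattern against the variants listed above. The list and variable names are only for illustration.

```python
import re

# The strict pattern from the accepted answer.
strict = re.compile(r'<input type="text" name="([^"]*)"')

variants = [
    '<input type="text" name="namedata">',
    '<INPUT type="text" name="namedata">',
    '<input type = "text" name = "namedata">',
    "<  input type='text' name='namedata'  >",
    '<input type=text name=namedata>',
    '<input name="namedata" type="text" >',
]

# True where the strict pattern finds a match, False otherwise.
matched = [bool(strict.search(v)) for v in variants]

for variant, ok in zip(variants, matched):
    print("match   " if ok else "no match", variant)
```

Only the first line matches; every other variant slips past the strict pattern.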

To tackle these complications one at a time:-

Upper / lower case is easy - just use the /i modifier.

Spaces between terms are handled by using \s, which represents whitespace. However, you do not know how many whitespace characters to match, or indeed whether there are any at all. Using \s* tells the reg-ex engine to match zero or more whitespace characters. Because \s also matches newlines, this covers tags split over several lines too. The only downside is that it can make the regEx a little "messy" to look at later ;-)
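
As a rough sketch of those two points (Python here; `re.IGNORECASE` plays the role of Perl's /i), the pattern below still assumes double quotes and type-before-name, and only adds case-insensitivity and flexible whitespace. The helper name `name_of` is hypothetical.

```python
import re

# re.IGNORECASE stands in for Perl's /i; \s* and \s+ absorb spaces and newlines.
pattern = re.compile(
    r'<\s*input\s+type\s*=\s*"text"\s+name\s*=\s*"([^"]*)"\s*>',
    re.IGNORECASE,
)

def name_of(tag):
    """Return the captured name value, or None if the tag does not match."""
    m = pattern.search(tag)
    return m.group(1) if m else None

samples = [
    '<input type="text" name="namedata">',
    '<INPUT type="text" name="namedata">',
    '<input type = "text" name = "namedata">',
    '<input\n   type="text"\n   name="namedata">',
]

values = [name_of(s) for s in samples]
print(values)
```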

Values can be enclosed in single or double quotes or have no quotes at all. I prefer to use "?'? which means match zero or one double quote followed by zero or one single quote. This is not perfect because it would match "' (double quote followed by a single quote) as well, but in the work I do this would not occur. An alternative is to define a character class of ["'] which matches either a single quote or a double quote. If there is no quote mark to delimit the content then the data should terminate at the first whitespace. So this gives us a choice of two patterns to match.

One of them is ["'].*?["'] which breaks down into delimiter, a bunch of characters, delimiter. The "bunch of characters" match uses a ? to make it non-greedy; otherwise it will try to match ALL the characters up to the very last quote.

The other possibility is where there is no quote delimiter. This means we want to match from the first non-whitespace character to the first whitespace character (you can also use word-boundary matches by the way, but I'll let you look that up yourself). This gives .*?\s which needs whitespace after the value, so on its own it will not match an unquoted value sitting directly before the closing > ; [^\s>]+ is one way around that.

Use round brackets to show a list of alternatives to match in regular expressions separating the alternatives using a pipe character. So the match is:

(["'].*?["']|.*?\s)
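
Here is a small sketch of how that alternation behaves (Python, same syntax). Anchoring it after `name=` is an assumption made for the demo, and the second pattern, `variant`, is a hypothetical alternative rather than anything from the original answer: it captures the value without the quotes and stops an unquoted value at whitespace or `>`.

```python
import re

# The alternation from the text, placed after "name=" for demonstration.
as_described = re.compile(r'''name\s*=\s*(["'].*?["']|.*?\s)''', re.IGNORECASE)

# Hypothetical variant: group 1 = quoted value, group 2 = unquoted value.
variant = re.compile(r'''name\s*=\s*(?:["'](.*?)["']|([^\s>]+))''', re.IGNORECASE)

def first_group(pattern, text):
    """Return group 1 of the first match, or None if there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else None

def value(text):
    """Return the attribute value using the variant pattern, or None."""
    m = variant.search(text)
    if m is None:
        return None
    return m.group(1) if m.group(1) is not None else m.group(2)

quoted = '<input type="text" name="namedata">'
spaced = '<input type=text name=namedata >'
tight = '<input type=text name=namedata>'

described_results = [first_group(as_described, t) for t in (quoted, spaced, tight)]
variant_results = [value(t) for t in (quoted, spaced, tight)]

print(described_results)
print(variant_results)
```

Note that the pattern as described keeps the quotes (or the trailing space) inside the captured text, and finds nothing when an unquoted value is followed directly by `>`.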

The last difficulty is how to handle the fact that it can be either "name=" then "type=" or "type=" then "name=". There are different ways to handle this; the method you use will depend on what you want to do with the data after you've matched it.

One possibility is to match alternatives. For example (name|type)= However, this also matches:
<input type="namedata" type="text" > which may or may not be suitable for your particular needs.

Another possibility is to match anything= Again, this might or might not be suitable.

If you absolutely have to ensure that you have name= type= (or the other way around!) then you are forced into using more alternatives - this works, but will make for veeery big reg exes! For example:
(name=["'].*?["'] type=["'].*?["']|type=["'].*?["'] name=["'].*?["'])
Note that, in order to simplify this, I have left out the checks for variable amounts of whitespace.
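
A quick sketch of that both-orders pattern in action (Python, same syntax; the helper name `matched_text` is made up for the demo). It accepts either ordering but rejects a tag with two type= attributes, which the looser (name|type)= approach would let through.

```python
import re

# The simplified both-orders pattern from the text (single spaces only).
both_orders = re.compile(
    r'''(name=["'].*?["'] type=["'].*?["']|type=["'].*?["'] name=["'].*?["'])'''
)

def matched_text(tag):
    """Return the part of the tag matched by the pattern, or None."""
    m = both_orders.search(tag)
    return m.group(1) if m else None

type_first = matched_text('<input type="text" name="namedata">')
name_first = matched_text('<input name="namedata" type="text" >')
type_twice = matched_text('<input type="namedata" type="text" >')

print(type_first)
print(name_first)
print(type_twice)
```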

Taking all of the above into account you could end up with the following Reg Ex:

/<\s*input\s*(name\s*=\s*["']?.*?["']?\s*type\s*=\s*["']?.*?["']?|type\s*=\s*["']?.*?["']?\s*name\s*=\s*["']?.*?["']?)\s*>/i

Wow!

That's all very well, and will match a valid <INPUT type= name= > statement, BUT if you want to extract the values of name and type then you will need to add more round brackets so that the matches are copied to regular expression memory. This serves to further complicate the Reg Ex, like so...

/<\s*input\s*(name\s*=\s*["']?(.*?)["']?\s*type\s*=\s*["']?(.*?)["']?|type\s*=\s*["']?(.*?)["']?\s*name\s*=\s*["']?(.*?)["']?)\s*>/i;

When I run this in Perl I get the values returned in either $2 & $3 (name, then type) when the line was <INPUT name= type= >, or in $4 & $5 (type, then name) when it was <INPUT type= name= >. $1 holds the whole attribute part that matched.
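
The group numbering can be checked with a short sketch (Python, same syntax, with \s* allowed after every = sign). The function name `extract` and the dictionary layout are assumptions for the demo; groups that did not take part in the match come back as None, which is how the code tells the two orderings apart.

```python
import re

# Capturing version of the pattern: groups 2-3 are filled for name-then-type,
# groups 4-5 for type-then-name. Group 1 is the whole attribute part.
pattern = re.compile(
    r'''<\s*input\s*('''
    r'''name\s*=\s*["']?(.*?)["']?\s*type\s*=\s*["']?(.*?)["']?'''
    r'''|type\s*=\s*["']?(.*?)["']?\s*name\s*=\s*["']?(.*?)["']?'''
    r''')\s*>''',
    re.IGNORECASE,
)

def extract(tag):
    """Return {'name': ..., 'type': ...} for a matching tag, else None."""
    m = pattern.search(tag)
    if m is None:
        return None
    if m.group(2) is not None:  # name= came first
        return {"name": m.group(2), "type": m.group(3)}
    return {"name": m.group(5), "type": m.group(4)}  # type= came first

results = [
    extract('<INPUT type="text" name="namedata">'),
    extract('<input name="namedata" type="text" >'),
    extract("<  input type='text' name='namedata'  >"),
    extract('<input type=text name=namedata>'),
]
print(results)
```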

In conclusion: if you need it, reg exes can be made very complicated / comprehensive to catch all the valid variations of syntax of HTML lines. If you end up with a reg ex this complicated, document it well (you can add comments directly into reg exes using the /x modifier - go look it up).
You might also choose to break the parsing into a number of steps. This could be slower to execute, but quicker to code and easier to maintain.
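
As one possible sketch of both suggestions (Python; `re.VERBOSE` is its counterpart to Perl's /x), the code below first pulls out each input tag and then picks the attributes apart with a commented pattern. The names `TAG`, `ATTR` and `input_attributes` are hypothetical, and the sketch assumes no attribute value contains a > character.

```python
import re

# Step 1: grab everything between "<input" and the closing ">".
TAG = re.compile(r'<\s*input\b([^>]*)>', re.IGNORECASE)

# Step 2: one attribute at a time, with comments inside the pattern.
ATTR = re.compile(r'''
    ([a-z][\w-]*)      # attribute name
    \s*=\s*            # equals sign with optional whitespace around it
    (?:
        "([^"]*)"      # double-quoted value
      | '([^']*)'      # single-quoted value
      | ([^\s"'>]+)    # unquoted value, up to whitespace or >
    )
''', re.IGNORECASE | re.VERBOSE)

def input_attributes(html):
    """Return one dict of attributes per <input ...> tag found in html."""
    results = []
    for tag in TAG.finditer(html):
        attrs = {}
        for m in ATTR.finditer(tag.group(1)):
            # Exactly one of the three value groups is set for each match.
            value = next(g for g in m.groups()[1:] if g is not None)
            attrs[m.group(1).lower()] = value
        results.append(attrs)
    return results

found = input_attributes('<INPUT name=namedata\n  type="text" id=\'x\'>')
print(found)
```

With the attributes in a dictionary, the order of name= and type= stops mattering, so the big either-order alternation is not needed.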

Oh, and there's always more than one way to do it.

webauk