aaron900

Regular Expression - Matching HTML INPUT value - with no double quotes around it

So I guess this is why everybody advises against matching HTML with regexes ;-) Unfortunately, with my app, I don't have the option of reading it from the DOM... I have an IE add-on that reads the HTML and passes it on to me to parse server-side, and I'm seeing some things I've never seen before. For some reason, IE at times returns values and names without double quotes around them.

<INPUT onchange=setdirty(0); value=555-555-1212 type=text name=phn_Agent_Phone_CF15>
<INPUT onchange=setdirty(0); value=test@test.com type=text name=email_Agent_Email_CF15>
<INPUT onchange=setdirty(0); value=1/1/2010 type=text name=dt_HFTrip_from_date_CF15>

I don't need one gigantic regex that can capture all values - I need one in which I can define the name and get the value. But the trick is, sometimes the name has quotes around it and sometimes it doesn't (I made the quotes optional for the name, see below). The value is what's really throwing me off: I can't find a way to tell the capture that, if there are no double quotes around the value, it should accept the value anyway but stop capturing at the first space it encounters.

One regular expression that can capture the value of an HTML INPUT tag based on the name I define, with double quotes being totally optional, would be awesome, but isn't necessary... I have a regular expression that works great when there are quotes around the value, but I can't get one to work when there are no double quotes around the value.

Here's what I'm trying for the email address, *when it has double quotes* (it works great):
<input [^>]*(?<=name="?email_Agent_Email_CF15"?[^<]*)(?<=value="([^"]*)"[^<]*)>

But no matter what I try to accommodate the ones with no quotes, just stopping at the first space, I always end up getting extra markup that shouldn't be returned.

Your help is GREATLY appreciated - thanks!


Regular Expressions, C#

Terry Woods

Something like this maybe? Note that the value will have a different group number when there are no quotes.

<input [^>]*(?<=name="?email_Agent_Email_CF15"?[^<]*)(?<=value=(?:"([^"]*)"|(\S*))[^<]*)>
aaron900

ASKER

Hi, Terry -

Thanks for the quick response! I really like the idea - multiple groups are totally fine.

However, when I load it up in RegexBuddy (.NET flavor), either way (with or without the double quotes), it tells me that group 1 did not participate in the match, and group 2 is blank??
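A possible explanation for that result, offered as an assumption and not checked in RegexBuddy: .NET evaluates a lookbehind from right to left, so the trailing `[^<]*` gives back characters until the alternation sits directly after `value=`, and at that point `(\S*)` can succeed with an empty match. The C# sketch below is not the pattern from the post. It swaps the lookbehinds for lookaheads placed right after `<input` and confines them to the tag with `[^>]*`, while keeping the two-group idea. The class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoGroupExample
{
    // Both lookaheads start right after "<input" and scan forward,
    // never past the closing '>' of the tag because of [^>]*.
    private static readonly Regex Pattern = new Regex(
        @"<input\b(?=[^>]*\bname=""?email_Agent_Email_CF15""?[\s>])" +
        @"(?=[^>]*\bvalue=(?:""([^""]*)""|([^\s>]*)))",
        RegexOptions.IgnoreCase);

    public static string GetValue(string html)
    {
        Match m = Pattern.Match(html);
        if (!m.Success) return null;
        // Group 1 participates for a quoted value, group 2 for an unquoted one.
        return m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value;
    }

    public static void Main()
    {
        Console.WriteLine(GetValue(
            "<INPUT onchange=setdirty(0); value=test@test.com type=text name=email_Agent_Email_CF15>"));
        Console.WriteLine(GetValue(
            "<input type=\"text\" name=\"email_Agent_Email_CF15\" value=\"test@test.com\">"));
    }
}
```

Checking `Groups[1].Success` before falling back to group 2 keeps the caller from having to know which form of the attribute was present.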
kaufmed

How about this? The value should be in capture group 1:
(?i)<input (?=.*?name="?email_Agent_Email_CF15)(?=.*?value="?([^">\s]*))
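For reference, here is a minimal C# sketch of how the pattern above might be called with the input name passed in as a parameter. The class name, method name, and use of `Regex.Escape` are assumptions for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InputValueFinder
{
    // Builds the pattern from the post, substituting the requested name.
    public static string GetValue(string html, string inputName)
    {
        string pattern =
            @"(?i)<input (?=.*?name=""?" + Regex.Escape(inputName) +
            @")(?=.*?value=""?([^"">\s]*))";
        Match m = Regex.Match(html, pattern);
        // The value is in capture group 1, quoted or not.
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string html =
            "<INPUT onchange=setdirty(0); value=555-555-1212 type=text name=phn_Agent_Phone_CF15>\n" +
            "<INPUT onchange=setdirty(0); value=test@test.com type=text name=email_Agent_Email_CF15>\n" +
            "<INPUT onchange=setdirty(0); value=1/1/2010 type=text name=dt_HFTrip_from_date_CF15>";
        Console.WriteLine(GetValue(html, "email_Agent_Email_CF15"));
    }
}
```

Some caveats with this pattern as written: the capture stops at the first whitespace even when the value is quoted, so `value="John Smith"` yields `John`; the name is matched as a prefix, so a longer name that starts with the same text also matches; and `.*?` is not confined to a single tag when several tags share a line.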


ASKER CERTIFIED SOLUTION
kaufmed

SOLUTION
Terry Woods

aaron900

ASKER

kaufmed - as usual, your understanding of these blows me away. I appreciate your help! For the record, I had something so close to what you came up with - I wish I had saved it - not sure where I went wrong originally. But much appreciated.

TerryAtOpus - thanks for your input. That definitely did work for me. However, I ended up using kaufmed's because it captured it into the same group. I appreciate your quick and accurate suggestion, though!

BTW, are either of you hireable for contract work??? haha! I've thoroughly enjoyed cutting my teeth on these, but when I spend hours on something that someone else could do in minutes, it makes me realize I am probably still a bit too green ;-)
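On the point about capturing into the same group: the .NET flavor allows the same group name to appear in both branches of an alternation, which is one way to keep a single group while still accepting spaces inside a quoted value. The sketch below is an illustration under that assumption, not either expert's solution. The class name, method name, and the group name `value` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupExample
{
    public static string GetValue(string html, string inputName)
    {
        // The group name "value" is reused in both branches; only the
        // branch that matches contributes a capture.
        string pattern =
            @"<input\b(?=[^>]*\bname=""?" + Regex.Escape(inputName) + @"""?[\s>])" +
            @"(?=[^>]*\bvalue=(?:""(?<value>[^""]*)""|(?<value>[^\s>]*)))";
        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["value"].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(GetValue(
            "<INPUT onchange=setdirty(0); value=1/1/2010 type=text name=dt_HFTrip_from_date_CF15>",
            "dt_HFTrip_from_date_CF15"));
        Console.WriteLine(GetValue(
            "<input type=\"text\" name=\"full_name\" value=\"John Smith\">",
            "full_name"));
    }
}
```

Here an unquoted value still ends at the first space or `>`, while a quoted value runs to its closing quote.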