JaeWebb

Using Regular Expression to Extract XML Nodes w/Specified Name, Irrespective of Node's Elements

I am parsing a string before it is turned into XML. I need to find and delete all occurrences of XML nodes with a specific name, no matter what attributes appear inside the tag. For example, the regular expression should match all of these strings:

<Column />
<Column Type="1.0" />
<Column Type="1.0" Name="AZ-0" />

I have a little book that gives me the impression I should be using the symbol ".+", but I'm unsure how to do it. I need an answer quickly! Thanks.
Adobe Flash, Regular Expressions, JavaScript

abel

<Column[^>]+/>
ASKER CERTIFIED SOLUTION
abel

JaeWebb

ASKER

Thank you - and thanks for the thorough explanation too.
abel

The problems occur when elements are nested inside themselves. If that does not happen, then the following will work for you. The ".+?" means a non-greedy match of everything, looking for the shortest match possible. This is more likely to work correctly than ".+" (without the question mark), because that match would be greedy, trying to grab as much of the document as possible, in which case sibling elements may yield wrong results.

<Column[^>]+/>|<Column[^>]+>.+?</Column>
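
To see how the greedy and non-greedy forms differ in practice, here is a small JavaScript sketch. The sample string and the variable names are made up for illustration, and it assumes the Column elements are never nested inside each other. Inside a JavaScript regex literal the "/" has to be escaped as "\/".

```javascript
// Hypothetical sample input; the Row/Cell/Column markup is invented for illustration.
const input =
  '<Row><Column Type="1.0" /><Column Type="2.0">a</Column>' +
  '<Cell>x</Cell><Column Name="AZ-0">b</Column></Row>';

// Non-greedy: ".+?" stops at the first </Column> it can reach.
const lazy = /<Column[^>]+\/>|<Column[^>]+>.+?<\/Column>/g;

// Greedy: ".+" runs on to the last </Column> in the string.
const greedy = /<Column[^>]+\/>|<Column[^>]+>.+<\/Column>/g;

// Delete every match by replacing it with an empty string.
const lazyResult = input.replace(lazy, '');
const greedyResult = input.replace(greedy, '');

console.log(lazyResult);   // <Row><Cell>x</Cell></Row>
console.log(greedyResult); // <Row></Row>
```

The greedy version swallows the unrelated Cell sibling that sits between two Column elements. Also note that in JavaScript "." does not match line breaks by default, so a Column element whose content spans several lines would not be matched by the second alternative.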

abel

> Thank you - and thanks for the thorough explanation too.

Oh, you're welcome. LOL, I was still typing the full answer (which came in two parts). Well, you get those extra bits for free :-)
ddrudik

Note that the / in:
<Column[^>]+/>
can be left off of the pattern, since / is already matched by the [^>] character class.
abel

@ddrudik: that was on purpose. Because you either:
  1. want to match an empty tag
  2. want to match an opening and a closing tag
and you don't want to accidentally match only an opening tag without a closing tag.
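
A quick JavaScript sketch of what goes wrong without the slash (the input string and variable names are hypothetical, just for illustration):

```javascript
// Hypothetical input: one Column element that has content and a closing tag.
const xml = '<Row><Column Type="1.0">text</Column></Row>';

// With the slash: only a self-closing tag can match, so nothing is found here.
const withSlash = /<Column[^>]+\/>/g;

// Without the slash: the bare opening tag matches as well.
const withoutSlash = /<Column[^>]+>/g;

const kept = xml.replace(withSlash, '');
const broken = xml.replace(withoutSlash, '');

console.log(kept);   // <Row><Column Type="1.0">text</Column></Row>
console.log(broken); // <Row>text</Column></Row>
```

Without the slash, only the opening tag is deleted and a stray closing tag is left behind, so the result is no longer well-formed.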

But, as I said in my comments, there are quite a few prerequisites for an XML file to be parsable with such a simple regular expression.
ddrudik

abel, I understand now, thanks.