Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

finding and manipulating multiple occurences in same line

Posted on 2003-12-01
6
Medium Priority
?
238 Views
Last Modified: 2010-03-04
Hello:

I have an XML file and do not want to use XML::DOM.
The data I need is in one line like this:
...<DAT>123A</DAT>..otherstuff ...<DAT>123B</DAT>......<DAT>123C</DAT>....

I neet to pull the content of <DAT>..</DAT> possibly to an arry.
So the arry would have 123A, 123B, 123C.
How do I do that using regex without doing splits and so on.  Thanks
Yours Truely.

0
Comment
Question by:basilo
  • 5
6 Comments
 
LVL 28

Accepted Solution

by:
FishMonger earned 500 total points
ID: 9851556
There are several ways to pull out the info; here's one method.

$str = '...<DAT>123A</DAT>..otherstuff ...<DAT>123B</DAT>......<DAT>123C</DAT>...';

(@dat) = $str =~ /<DAT>([^<]+)<\/DAT>/g;
print "$_\n" foreach @dat;
0
 
LVL 28

Expert Comment

by:FishMonger
ID: 9851646
Since you're going to be reading in an xml file, you'd use something closer to this:

open XML, "xml filename" or die "couldn't open xml file $!";

while (<XML>) {
   while (/<DAT>([^<]+)<\/DAT>/g) {
      push @dat, $1;
   }
}
print "$_\n" foreach @dat;
0
 
LVL 28

Expert Comment

by:FishMonger
ID: 9851730
If the info you want is broken up into seperate lines like this:

...<DAT>123A</DAT>..otherstuff ...<DAT>123B
</DAT>......<DAT>123C</DAT>...'
...<DAT>234A
</DAT>..otherstuff ...<DAT>234B</DAT>......<DAT>234C</DAT>...'
...<DAT>345A</DAT>..otherstuff ...<DAT>345B</DAT>......<DAT>
345C</DAT>...'

You can do something like this:

open XML, "xml filename" or die "couldn't open xml file $!";
{
   local $/;
   $dat = <XML>;
}
@dat = $dat =~ /<DAT>([^<]+)<\/DAT>/gs;
foreach (@dat) {s/\n//g}
print Dumper @dat;
0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 

Author Comment

by:basilo
ID: 9852015
This is good.    I'm trying to understand the logic behined ([^<]+) and how it is pushed onto @dat.  Thank you.
0
 
LVL 28

Expert Comment

by:FishMonger
ID: 9852132
[^<]
Is a negated character class that says to match any character that is not a <
The + tells it to repeat the match as mush as possible.
The (  ) surrounding it, captures the match into $1 var.
0
 
LVL 28

Expert Comment

by:FishMonger
ID: 9852173
I forgot to add;

Since this is a direct assignment, $1 is assigned to the first element of the @dat array.
The g at the end of the regex tells it find all matches and since it's in list context, each match is assigned to the next element of the array.
0

Featured Post

Free Tool: IP Lookup

Get more info about an IP address or domain name, such as organization, abuse contacts and geolocation.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

On Microsoft Windows, if  when you click or type the name of a .pl file, you get an error "is not recognized as an internal or external command, operable program or batch file", then this means you do not have the .pl file extension associated with …
Email validation in proper way is  very important validation required in any web pages. This code is self explainable except that Regular Expression which I used for pattern matching. I originally published as a thread on my website : http://www…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Six Sigma Control Plans

885 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question