Solved

Finding and manipulating multiple occurrences in the same line

Posted on 2003-12-01
Last Modified: 2010-03-04
Hello:

I have an XML file and do not want to use XML::DOM.
The data I need is in one line like this:
...<DAT>123A</DAT>..otherstuff ...<DAT>123B</DAT>......<DAT>123C</DAT>....

I need to pull the content of each <DAT>..</DAT> element, possibly into an array.
So the array would hold 123A, 123B, 123C.
How do I do that using a regex, without doing splits and so on? Thanks.
Yours truly.

Question by:basilo

Accepted Solution

by: FishMonger
ID: 9851556
There are several ways to pull out the info; here's one method.

use strict;
use warnings;

my $str = '...<DAT>123A</DAT>..otherstuff ...<DAT>123B</DAT>......<DAT>123C</DAT>...';

# In list context, m//g returns the $1 capture from every match
my @dat = $str =~ /<DAT>([^<]+)<\/DAT>/g;
print "$_\n" foreach @dat;
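If you need the same extraction for other element names, the pattern can be wrapped in a small helper. This is a minimal sketch: extract_tag is a hypothetical name, not something from the thread, and it assumes the element content never contains a literal <. The \Q...\E escapes keep any regex metacharacters in the tag name from being interpreted, and because return evaluates its expression in the caller's context, calling the helper in list context still yields one capture per match.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text of every <$tag>...</$tag> pair in $text.
# Assumes the content between the tags contains no '<' characters.
sub extract_tag {
    my ($text, $tag) = @_;
    # \Q...\E quotes regex metacharacters in $tag; the match runs in the
    # caller's context, so list context returns one $1 per match.
    return $text =~ m{<\Q$tag\E>([^<]+)</\Q$tag\E>}g;
}

my $str = '...<DAT>123A</DAT>..otherstuff ...<DAT>123B</DAT>......<DAT>123C</DAT>...';
my @dat = extract_tag($str, 'DAT');
print "$_\n" foreach @dat;
```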

Expert Comment

by:FishMonger
ID: 9851646
Since you're going to be reading in an XML file, you'd use something closer to this:

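use strict;
use warnings;

open my $xml, '<', 'xml filename' or die "couldn't open xml file: $!";

my @dat;
while (my $line = <$xml>) {
   # In scalar context m//g finds one match per call and resumes after it
   while ($line =~ /<DAT>([^<]+)<\/DAT>/g) {
      push @dat, $1;
   }
}
close $xml;
print "$_\n" foreach @dat;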
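The inner while loop works because a scalar-context m//g remembers where the last match ended, in pos() of the target string, and starts the next attempt from there. Here is a minimal sketch that records that offset for each match on the sample line; the @found array is just for illustration.

```perl
use strict;
use warnings;

my $line = '...<DAT>123A</DAT>..otherstuff ...<DAT>123B</DAT>......<DAT>123C</DAT>...';
my @found;

# Each successful scalar-context m//g sets pos($line) to the offset just
# past the match; the next iteration resumes the search from that offset.
while ($line =~ /<DAT>([^<]+)<\/DAT>/g) {
    push @found, [ $1, pos($line) ];
}

printf "%s ends at offset %d\n", @$_ for @found;

# The final, failing attempt resets pos($line) to undef.
print defined pos($line) ? "pos still set\n" : "pos reset\n";
```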

Expert Comment

by:FishMonger
ID: 9851730
If the info you want is broken up across separate lines, like this:

...<DAT>123A</DAT>..otherstuff ...<DAT>123B
</DAT>......<DAT>123C</DAT>...
...<DAT>234A
</DAT>..otherstuff ...<DAT>234B</DAT>......<DAT>234C</DAT>...
...<DAT>345A</DAT>..otherstuff ...<DAT>345B</DAT>......<DAT>
345C</DAT>...

You can do something like this:

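use strict;
use warnings;
use Data::Dumper;

open my $xml, '<', 'xml filename' or die "couldn't open xml file: $!";
my $dat = do { local $/; <$xml> };   # undef $/ so one read returns the whole file
close $xml;

# [^<] also matches "\n", so a value split across lines is still captured
# (the /s flag is not needed because the pattern contains no '.')
my @dat = $dat =~ /<DAT>([^<]+)<\/DAT>/g;
s/\n//g foreach @dat;   # strip the embedded newlines from each value
print Dumper \@dat;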
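To try the slurp approach without creating a file, Perl can open a read handle on a string (an in-memory file). The sketch below assumes the sample data from above stands in for the XML file's contents, and it trims leading and trailing whitespace rather than only newlines, on the assumption that surrounding whitespace (including a stray \r from Windows line endings) is never meaningful in a <DAT> value.

```perl
use strict;
use warnings;

# Sample input standing in for the XML file; some values span two lines.
my $text = <<'END';
...<DAT>123A</DAT>..otherstuff ...<DAT>123B
</DAT>......<DAT>123C</DAT>...
...<DAT>234A
</DAT>..otherstuff ...<DAT>234B</DAT>......<DAT>234C</DAT>...
END

# Open a read handle on the string itself, so no file is needed.
open my $fh, '<', \$text or die "couldn't open in-memory file: $!";
my $all = do { local $/; <$fh> };   # undef $/ => read everything at once
close $fh;

# Capture every value, then trim leading/trailing whitespace from a copy.
my @dat = map { my $v = $_; $v =~ s/^\s+|\s+$//g; $v }
          $all =~ /<DAT>([^<]+)<\/DAT>/g;
print "$_\n" for @dat;
```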

Author Comment

by:basilo
ID: 9852015
This is good. I'm trying to understand the logic behind ([^<]+) and how the matches get pushed onto @dat. Thank you.

Expert Comment

by:FishMonger
ID: 9852132
[^<]
is a negated character class: it matches any single character that is not a <.
The + repeats that class one or more times, matching as much as possible, so it stops right before the < of the closing </DAT>.
The ( ) surrounding it capture the matched text into the $1 variable.
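The reason for [^<]+ rather than .+ is easiest to see side by side. This minimal sketch runs three patterns over the sample line: a greedy .+ runs on to the last </DAT> and returns a single oversized capture, while [^<]+ cannot cross a <, and a non-greedy .+? stops at the nearest </DAT>. Note one further difference: . does not match a newline unless /s is given, whereas [^<] does, which is why the multi-line version above works without /s.

```perl
use strict;
use warnings;

my $str = '...<DAT>123A</DAT>..otherstuff ...<DAT>123B</DAT>......<DAT>123C</DAT>...';

# Greedy .+ backtracks only as far as the LAST </DAT>, so one match spans all three.
my @greedy  = $str =~ /<DAT>(.+)<\/DAT>/g;

# [^<]+ cannot run past a '<', so each match ends at its own closing tag.
my @negated = $str =~ /<DAT>([^<]+)<\/DAT>/g;

# Non-greedy .+? takes as little as possible, also stopping at the nearest </DAT>.
my @lazy    = $str =~ /<DAT>(.+?)<\/DAT>/g;

print scalar(@greedy),  " greedy:  $greedy[0]\n";
print scalar(@negated), " negated: @negated\n";
print scalar(@lazy),    " lazy:    @lazy\n";
```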

Expert Comment

by:FishMonger
ID: 9852173
I forgot to add:

Since this is a direct list assignment, the first $1 capture is assigned to the first element of the @dat array.
The g at the end of the regex tells it to find all matches, and since the match is in list context, each captured value is assigned to the next element of the array.
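The effect of context and of the g flag can be checked directly. This is a minimal sketch on the sample line: without g, a list-context match returns only the first match's captures; with g it returns the captures of every match; in scalar context (no g) it just returns true or false. The "() =" line is a common Perl idiom for counting matches, and the last line shows that with two capture groups the groups from each match are flattened into one list. The (\d+)([A-Z]) split is only an illustration of multiple groups.

```perl
use strict;
use warnings;

my $str = '...<DAT>123A</DAT>..otherstuff ...<DAT>123B</DAT>......<DAT>123C</DAT>...';

# List context WITHOUT /g: only the first match's capture is returned.
my @first = $str =~ /<DAT>([^<]+)<\/DAT>/;

# List context WITH /g: the capture from every match, in order.
my @all = $str =~ /<DAT>([^<]+)<\/DAT>/g;

# Scalar context without /g: the match returns true (1) or false ("").
my $matched = $str =~ /<DAT>([^<]+)<\/DAT>/;

# Assigning to an empty list in scalar context yields the number of captures.
my $count = () = $str =~ /<DAT>([^<]+)<\/DAT>/g;

# Two groups per match: each match contributes both captures to the list.
my @pairs = $str =~ /<DAT>(\d+)([A-Z])<\/DAT>/g;

print "first: @first\nall: @all\ncount: $count\npairs: @pairs\n";
```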