asked on

Inserting text strings in parsed output using XML::Twig

I wrote this script (see below) to extract chosen data points from one file to create a new (smaller) xml file and the file itself extract the data fine, but I would like output to add additional tags. I would like the blocks <HEADER> & </HEADER> in-between the m174/m175 data and <PRODUCT> & </PRODUCT> in-between the a00.. and b00.. data

So I would like the output to be:

<HEADER>
<m174>From Calif</m174>
<m175>YYYYMMDD</m174>
</HEADER>

<PRODUCT>
<a001>Lamps</a001>
<b001>Green</b001>
<b002>Money Saver</b002>
</PRODUCT>

<PRODUCT>
<a001>Flashlight</a001>
<a002>Small</a002>
<b001>Yellow</b001>
<b002>Bargain</b002>
</PRODUCT>

Right now the output is:

<m174>From Calif</m174>
<m175>YYYYMMDD</m174>

<a001>Lamps</a001>
<b001>Green</b001>
<b002>Money Saver</b002>

<a001>Flashlight</a001>
<a002>Small</a002>
<b001>Yellow</b001>
<b002>Bargain</b002>

All the tags are suppose to be there, but there's no guarantee ... the person who will be running this script isn't a programmer, but the current structure is one that they can understand.

#!/usr/bin/perl

use strict;
use XML::Twig;
use diagnostics;
use Encode qw(encode decode);
use Time::HiRes qw(gettimeofday);
use File::Copy;

my $t0 = gettimeofday;
our @files = glob("*.xml");
foreach my $file(@files) {

                  open FILE, '<:encoding(UTF-8)', $file or warn "Can't open $file: $!";
                  open PARSED, '>:encoding(UTF-8)', ($file . "_PARSED.txt") or warn "Cannot open file for write: $!";


my $t= XML::Twig->new(
twig_roots => {

       'header/m174' => \&print_m174,
           'header/m175' => \&print_m175,

'product/a001' => \&print_a001,
       'product/a002' => \&print_a002,
           'product/b001' => \&print_b012,
           'product/b002' => \&print_b061,

}
);

eval {$t->parsefile( $file);};

print PARSED"\n\n";

close FILE;

}
      my $t1 = gettimeofday;
      my $elapsed = $t1 - $t0;

## SUB ROUTINES

            sub print_m174
                  { my( $t, $elt)= @_;
                        print PARSED "<m174>" . $elt->text . "<\/m174>\n";
      $t->purge; # frees the memory
}

            sub print_m175
                  { my( $t, $elt)= @_;
                        print PARSED "<m175>" . $elt->text . "<\/m175>\n";
      $t->purge; # frees the memory
}

            sub print_a001
                  { my( $t, $elt)= @_;
                        print PARSED "<a001>" . $elt->text . "<\/a001>\n";
      $t->purge; # frees the memory
}

            sub print_a002
                  { my( $t, $elt)= @_;
                        print PARSED "<a001>" . $elt->text . "<\/a001>\n";
      $t->purge; # frees the memory
}

            sub print_b001
                  { my( $t, $elt)= @_;
                        print PARSED "<a002>" . $elt->text . "<\/a002>\n";
      $t->purge; # frees the memory
}

            sub print_b002
                  { my( $t, $elt)= @_;
                        print PARSED "<a001>" . $elt->text . "<\/a001>\n";
      $t->purge; # frees the memory
}

## END OF SUB ROUTINES

close PARSED;

ozo

What was in the input file you used to get the output you have right now?

hadrons

ASKER

The input file basically looks like the desired output file (see below) ... I simplified the example a bit, but basically these files are huge and there's a lot of data this person doesn't need to see so I want to recreate the same type of file but without the extra data.

This input has extra tags, but I don't need to extract it so I don't call it;

<HEADER>
<m174>From Calif</m174>
<m175>YYYYMMDD</m174>
</HEADER>

<PRODUCT>
<a001>Lamps</a001>
<b001>Green</b001>
<b002>Money Saver</b002>
<d001>HUGE AMOUNT OF DATA</d001>
<j001>JUNK SHE DOESN'T CARE ABOUT</j001>
</PRODUCT>

<PRODUCT>
<a001>Flashlight</a001>
<a002>Small</a002>
<b001>Yellow</b001>
<b002>Bargain</b002>
<h001>SOMEONE ELSE'S CONCERN</h001>
</PRODUCT>

I can give an exact - but small - example if you wish

ASKER CERTIFIED SOLUTION

ozo

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

hadrons

ASKER

I've requested that this question be closed as follows:

Accepted answer: 0 points for hadrons's comment #a39511394

for the following reason:

The script provided as the answer not only did as I wanted, but it was more flexible and ultimately a great improvement on my own; thanks

hadrons

ASKER

The script not only did what was requested, but it was simpler and more flexible than my original script. Thanks for the great work.