Link to home
Start Free TrialLog in
Avatar of hadrons
hadrons

asked on

Inserting text strings in parsed output using XML::Twig

I wrote this script (see below) to extract chosen data points from one file to create a new (smaller) xml file and the file itself extract the data fine, but I would like output to add additional tags. I would like the blocks <HEADER> & </HEADER> in-between the m174/m175 data and <PRODUCT> & </PRODUCT> in-between the a00.. and b00.. data

So I would like the output to be:

<HEADER>
<m174>From Calif</m174>
<m175>YYYYMMDD</m174>
</HEADER>

<PRODUCT>
<a001>Lamps</a001>
<b001>Green</b001>
<b002>Money Saver</b002>
</PRODUCT>

<PRODUCT>
<a001>Flashlight</a001>
<a002>Small</a002>
<b001>Yellow</b001>
<b002>Bargain</b002>
</PRODUCT>

Right now the output is:

<m174>From Calif</m174>
<m175>YYYYMMDD</m174>

<a001>Lamps</a001>
<b001>Green</b001>
<b002>Money Saver</b002>

<a001>Flashlight</a001>
<a002>Small</a002>
<b001>Yellow</b001>
<b002>Bargain</b002>
 
All the tags are suppose to be there, but there's no guarantee ... the person who will be running this script isn't a programmer, but the current structure is one that they can understand.

#!/usr/bin/perl

use strict;
use XML::Twig;
use diagnostics;
use Encode qw(encode decode);
use Time::HiRes qw(gettimeofday);
use File::Copy;


my $t0 = gettimeofday;
our @files = glob("*.xml");      
foreach my $file(@files) {

                      open FILE, '<:encoding(UTF-8)', $file or warn "Can't open $file: $!";  
                      open PARSED, '>:encoding(UTF-8)', ($file . "_PARSED.txt") or warn "Cannot open file for write: $!";  

                      
         my $t= XML::Twig->new(
                  twig_roots   => {
                                   
                                'header/m174' => \&print_m174,
                             'header/m175' => \&print_m175,
                              
                                'product/a001' => \&print_a001,
                               'product/a002' => \&print_a002,
                             'product/b001' => \&print_b012,
                             'product/b002' => \&print_b061,
 

                                 
                 }
                            );
                           
        eval {$t->parsefile( $file);};

        print PARSED"\n\n";
       
        close FILE;



}            
            my $t1 = gettimeofday;
            my $elapsed = $t1 - $t0;


##  SUB ROUTINES  
         
                sub print_m174
                      { my( $t, $elt)= @_;
                            print PARSED "<m174>" . $elt->text . "<\/m174>\n";
                $t->purge;           # frees the memory
          }

                sub print_m175
                      { my( $t, $elt)= @_;
                            print PARSED "<m175>" . $elt->text . "<\/m175>\n";
                $t->purge;           # frees the memory
          }

                sub print_a001
                      { my( $t, $elt)= @_;
                            print PARSED "<a001>" . $elt->text . "<\/a001>\n";
                $t->purge;           # frees the memory
          }
         
                sub print_a002
                      { my( $t, $elt)= @_;
                            print PARSED "<a001>" . $elt->text . "<\/a001>\n";
                $t->purge;           # frees the memory
          }
         
                sub print_b001
                      { my( $t, $elt)= @_;
                            print PARSED "<a002>" . $elt->text . "<\/a002>\n";
                $t->purge;           # frees the memory
          }
         
                sub print_b002
                      { my( $t, $elt)= @_;
                            print PARSED "<a001>" . $elt->text . "<\/a001>\n";
                $t->purge;           # frees the memory
          }
         

##  END OF SUB ROUTINES  

            close PARSED;
Avatar of ozo
ozo
Flag of United States of America image

What was in the input file you used to get the output you have right now?
Avatar of hadrons
hadrons

ASKER

The input file basically looks like the desired output file (see below) ... I simplified the example a bit, but basically these files are huge and there's a lot of data this person doesn't need to see so I want to recreate the same type of file but without the extra data.

This input has extra tags, but I don't need to extract it so I don't call it;

<HEADER>
<m174>From Calif</m174>
<m175>YYYYMMDD</m174>
</HEADER>

<PRODUCT>
<a001>Lamps</a001>
<b001>Green</b001>
<b002>Money Saver</b002>
<d001>HUGE AMOUNT OF DATA</d001>
<j001>JUNK SHE DOESN'T CARE ABOUT</j001>
</PRODUCT>

<PRODUCT>
<a001>Flashlight</a001>
<a002>Small</a002>
<b001>Yellow</b001>
<b002>Bargain</b002>
<h001>SOMEONE ELSE'S CONCERN</h001>
</PRODUCT>

I can give an exact - but small - example if you wish
ASKER CERTIFIED SOLUTION
Avatar of ozo
ozo
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of hadrons

ASKER

I've requested that this question be closed as follows:

Accepted answer: 0 points for hadrons's comment #a39511394

for the following reason:

The script provided as the answer not only did as I wanted, but it was more flexible and ultimately a great improvement on my own; thanks
Avatar of hadrons

ASKER

The script not only did what was requested, but it was simpler and more flexible than my original script. Thanks for the great work.