hadrons
asked on
Inserting text strings in parsed output using XML::Twig
I wrote this script (see below) to extract chosen data points from one file to create a new (smaller) xml file and the file itself extract the data fine, but I would like output to add additional tags. I would like the blocks <HEADER> & </HEADER> in-between the m174/m175 data and <PRODUCT> & </PRODUCT> in-between the a00.. and b00.. data
So I would like the output to be:
<HEADER>
<m174>From Calif</m174>
<m175>YYYYMMDD</m174>
</HEADER>
<PRODUCT>
<a001>Lamps</a001>
<b001>Green</b001>
<b002>Money Saver</b002>
</PRODUCT>
<PRODUCT>
<a001>Flashlight</a001>
<a002>Small</a002>
<b001>Yellow</b001>
<b002>Bargain</b002>
</PRODUCT>
Right now the output is:
<m174>From Calif</m174>
<m175>YYYYMMDD</m174>
<a001>Lamps</a001>
<b001>Green</b001>
<b002>Money Saver</b002>
<a001>Flashlight</a001>
<a002>Small</a002>
<b001>Yellow</b001>
<b002>Bargain</b002>
All the tags are suppose to be there, but there's no guarantee ... the person who will be running this script isn't a programmer, but the current structure is one that they can understand.
#!/usr/bin/perl
use strict;
use XML::Twig;
use diagnostics;
use Encode qw(encode decode);
use Time::HiRes qw(gettimeofday);
use File::Copy;
my $t0 = gettimeofday;
our @files = glob("*.xml");
foreach my $file(@files) {
open FILE, '<:encoding(UTF-8)', $file or warn "Can't open $file: $!";
open PARSED, '>:encoding(UTF-8)', ($file . "_PARSED.txt") or warn "Cannot open file for write: $!";
my $t= XML::Twig->new(
twig_roots => {
'header/m174' => \&print_m174,
'header/m175' => \&print_m175,
'product/a001' => \&print_a001,
'product/a002' => \&print_a002,
'product/b001' => \&print_b012,
'product/b002' => \&print_b061,
}
);
eval {$t->parsefile( $file);};
print PARSED"\n\n";
close FILE;
}
my $t1 = gettimeofday;
my $elapsed = $t1 - $t0;
## SUB ROUTINES
sub print_m174
{ my( $t, $elt)= @_;
print PARSED "<m174>" . $elt->text . "<\/m174>\n";
$t->purge; # frees the memory
}
sub print_m175
{ my( $t, $elt)= @_;
print PARSED "<m175>" . $elt->text . "<\/m175>\n";
$t->purge; # frees the memory
}
sub print_a001
{ my( $t, $elt)= @_;
print PARSED "<a001>" . $elt->text . "<\/a001>\n";
$t->purge; # frees the memory
}
sub print_a002
{ my( $t, $elt)= @_;
print PARSED "<a001>" . $elt->text . "<\/a001>\n";
$t->purge; # frees the memory
}
sub print_b001
{ my( $t, $elt)= @_;
print PARSED "<a002>" . $elt->text . "<\/a002>\n";
$t->purge; # frees the memory
}
sub print_b002
{ my( $t, $elt)= @_;
print PARSED "<a001>" . $elt->text . "<\/a001>\n";
$t->purge; # frees the memory
}
## END OF SUB ROUTINES
close PARSED;
So I would like the output to be:
<HEADER>
<m174>From Calif</m174>
<m175>YYYYMMDD</m174>
</HEADER>
<PRODUCT>
<a001>Lamps</a001>
<b001>Green</b001>
<b002>Money Saver</b002>
</PRODUCT>
<PRODUCT>
<a001>Flashlight</a001>
<a002>Small</a002>
<b001>Yellow</b001>
<b002>Bargain</b002>
</PRODUCT>
Right now the output is:
<m174>From Calif</m174>
<m175>YYYYMMDD</m174>
<a001>Lamps</a001>
<b001>Green</b001>
<b002>Money Saver</b002>
<a001>Flashlight</a001>
<a002>Small</a002>
<b001>Yellow</b001>
<b002>Bargain</b002>
All the tags are suppose to be there, but there's no guarantee ... the person who will be running this script isn't a programmer, but the current structure is one that they can understand.
#!/usr/bin/perl
use strict;
use XML::Twig;
use diagnostics;
use Encode qw(encode decode);
use Time::HiRes qw(gettimeofday);
use File::Copy;
my $t0 = gettimeofday;
our @files = glob("*.xml");
foreach my $file(@files) {
open FILE, '<:encoding(UTF-8)', $file or warn "Can't open $file: $!";
open PARSED, '>:encoding(UTF-8)', ($file . "_PARSED.txt") or warn "Cannot open file for write: $!";
my $t= XML::Twig->new(
twig_roots => {
'header/m174' => \&print_m174,
'header/m175' => \&print_m175,
'product/a001' => \&print_a001,
'product/a002' => \&print_a002,
'product/b001' => \&print_b012,
'product/b002' => \&print_b061,
}
);
eval {$t->parsefile( $file);};
print PARSED"\n\n";
close FILE;
}
my $t1 = gettimeofday;
my $elapsed = $t1 - $t0;
## SUB ROUTINES
sub print_m174
{ my( $t, $elt)= @_;
print PARSED "<m174>" . $elt->text . "<\/m174>\n";
$t->purge; # frees the memory
}
sub print_m175
{ my( $t, $elt)= @_;
print PARSED "<m175>" . $elt->text . "<\/m175>\n";
$t->purge; # frees the memory
}
sub print_a001
{ my( $t, $elt)= @_;
print PARSED "<a001>" . $elt->text . "<\/a001>\n";
$t->purge; # frees the memory
}
sub print_a002
{ my( $t, $elt)= @_;
print PARSED "<a001>" . $elt->text . "<\/a001>\n";
$t->purge; # frees the memory
}
sub print_b001
{ my( $t, $elt)= @_;
print PARSED "<a002>" . $elt->text . "<\/a002>\n";
$t->purge; # frees the memory
}
sub print_b002
{ my( $t, $elt)= @_;
print PARSED "<a001>" . $elt->text . "<\/a001>\n";
$t->purge; # frees the memory
}
## END OF SUB ROUTINES
close PARSED;
What was in the input file you used to get the output you have right now?
ASKER
The input file basically looks like the desired output file (see below) ... I simplified the example a bit, but basically these files are huge and there's a lot of data this person doesn't need to see so I want to recreate the same type of file but without the extra data.
This input has extra tags, but I don't need to extract it so I don't call it;
<HEADER>
<m174>From Calif</m174>
<m175>YYYYMMDD</m174>
</HEADER>
<PRODUCT>
<a001>Lamps</a001>
<b001>Green</b001>
<b002>Money Saver</b002>
<d001>HUGE AMOUNT OF DATA</d001>
<j001>JUNK SHE DOESN'T CARE ABOUT</j001>
</PRODUCT>
<PRODUCT>
<a001>Flashlight</a001>
<a002>Small</a002>
<b001>Yellow</b001>
<b002>Bargain</b002>
<h001>SOMEONE ELSE'S CONCERN</h001>
</PRODUCT>
I can give an exact - but small - example if you wish
This input has extra tags, but I don't need to extract it so I don't call it;
<HEADER>
<m174>From Calif</m174>
<m175>YYYYMMDD</m174>
</HEADER>
<PRODUCT>
<a001>Lamps</a001>
<b001>Green</b001>
<b002>Money Saver</b002>
<d001>HUGE AMOUNT OF DATA</d001>
<j001>JUNK SHE DOESN'T CARE ABOUT</j001>
</PRODUCT>
<PRODUCT>
<a001>Flashlight</a001>
<a002>Small</a002>
<b001>Yellow</b001>
<b002>Bargain</b002>
<h001>SOMEONE ELSE'S CONCERN</h001>
</PRODUCT>
I can give an exact - but small - example if you wish
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
I've requested that this question be closed as follows:
Accepted answer: 0 points for hadrons's comment #a39511394
for the following reason:
The script provided as the answer not only did as I wanted, but it was more flexible and ultimately a great improvement on my own; thanks
Accepted answer: 0 points for hadrons's comment #a39511394
for the following reason:
The script provided as the answer not only did as I wanted, but it was more flexible and ultimately a great improvement on my own; thanks
ASKER
The script not only did what was requested, but it was simpler and more flexible than my original script. Thanks for the great work.