  • Status: Solved

Looking to extract an entire record set from an XML file if the value of one tag is >= a specific value

I have an XML file that is basically formatted as such:

<product>
....
<a001>product name</a001>
<b002>product date</b002>
<c003>product price</c003>
....
</supplydetail></product>

The record sets start with <product> on its own line. The </product> is supposed to be on its own line too, but sometimes it isn't, and the schema is flexible enough to allow that. This is just to describe what the file looks like; basically, a record set starts at <product> and ends at </product>.

What I need is a Perl script that will yank the entire record set from <product> to </product>, with everything in between, if the value of <b002>product date</b002> is greater than or equal to a specific date. The date info would be in YYYYMMDD form.

So if the record set has a date greater than or equal to 20150122, then it would be extracted from the source file. I do have similar scripts, but nothing with a function to evaluate the data value like that. Thanks
hadrons
Asked:
1 Solution
 
wilcoxon Commented:
What is the full structure of the XML?  You do not give enough information about the XML.  You should always use an XML parsing module when dealing with XML data.  Here's a code piece that will get you most of the way there:
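use strict;
use warnings;
use XML::Simple;
my $filename = shift or die "usage: $0 file.xml\n";
my $cutoff = 20150122; # YYYYMMDD cutoff date from the question
my %opt = (); # may not be needed or may need some of the options set
my $ref = XMLin($filename, %opt) or die "could not parse $filename: $!";
# the path to b002 is a guess; adjust it to the real XML structure
if ($ref->{supplydetail}{b002} >= $cutoff) {
    print XMLout($ref, %opt);
}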

 
ozo Commented:
perl -ne 'BEGIN{$/="</product>"}print $1 if m#(<product>.*<b002>(.*)</b002>.*)#s && $2 >= 20150122' file.xml
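
For anyone who prefers a script over the command line, here is a minimal sketch of the same regex-based idea. It is an illustration under assumptions, not ozo's code: the sub name extract_products and the "cutoff, then file" argument order are made up for this example, and it assumes one <b002> per record, no attributes on the tags, and a file small enough to read into memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every <product>...</product> record in $xml whose <b002> date
# (YYYYMMDD) is on or after $cutoff. Regex-based, like the one-liner,
# so it assumes one <b002> per record and no attributes on the tags.
sub extract_products {
    my ($xml, $cutoff) = @_;
    my @hits;
    # Splitting just after each closing tag mimics setting $/ = "</product>".
    for my $record (split m{(?<=</product>)}, $xml) {
        next unless $record =~ m{(<product>.*?<b002>\s*(\d{8})\s*</b002>.*)}s;
        push @hits, $1 if $2 >= $cutoff;
    }
    return @hits;
}

# Hypothetical command-line use: perl extract.pl 20150122 file.xml
if (@ARGV) {
    my ($cutoff, $file) = @ARGV;
    open my $in, '<:encoding(UTF-8)', $file or die "Can't open $file: $!";
    local $/;    # slurp mode: read the whole file in one go
    binmode STDOUT, ':encoding(UTF-8)';
    print "$_\n" for extract_products(scalar <$in>, $cutoff);
}
```

Unlike the one-liner, this version only accepts a <b002> value that is exactly eight digits, so a malformed date is skipped rather than compared numerically.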
 
hadrons (Author) Commented:
The command line ozo supplied worked great, but I would like to follow up with a script. I'll attach a sample file to this comment to give you an idea of the XML structure. It's just one record, but a file I normally work with can have thousands, with additional tags in them. I do use an XML parser, but it's usually XML::Twig to extract specific parts of the record set, whereas in this case I would need the entire record set.

Underneath is an example of what I would normally use to extract specific tags from all record sets, but I'm not sure how to rewrite it to pluck out an entire record set when a tag has a specific value.

use XML::Twig;

my @files = glob("*.xml");
foreach my $file (@files) {
    system ("echo currently processing file: $file");
    open FILE, '<:encoding(UTF-8)', $file or warn "Can't open $file: $!";
    open PARSED, '>>:encoding(UTF-8)', ($file . "_isbneans.txt") or warn "Cannot open file for write: $!";

    while (<FILE>) {
        $_ =~ s/^\s+//;
        if (/ONIXmessage/) {
            my $t = XML::Twig->new(
                twig_roots => {
                    'product/a001' => \&print_1,
                    'product/b004' => \&print_1,
                    'product/b005' => \&print_1,
                    'product/productidentifier/b244' => \&print_1,
                }
            );

            eval { $t->parsefile($file); };
            print PARSED;
        }
    }
}

##  SUB ROUTINES

# Print the text of one matched element, then free the parsed part of the twig.
sub print_1 {
    my ($t, $elt) = @_;
    eval { print PARSED "\n" . $elt->text . "\n"; };
    warn $@ if $@;
    $t->purge;
}
sample.xml
 
wilcoxon Commented:
I've never been a fan of the XML::Twig approach - I mostly use XML::Simple or XML::SAX these days.

One word of warning on ozo's command line, it uses regexes so has the same risks and concerns as any other approach that does not use an actual XML parser.  On the other hand, ozo is king of the one-liner.

I think this should work for a modified script (or at least be close):
use strict;
use warnings;
use XML::Twig;

my $cutoff = 20150122;    # YYYYMMDD; keep records dated on or after this

my @files = glob("*.xml");
foreach my $file (@files) {
    print "currently processing file: $file\n";
    open FILE, '<:encoding(UTF-8)', $file or warn "Can't open $file: $!";
    open PARSED, '>>:encoding(UTF-8)', ($file . "_isbneans.txt") or warn "Cannot open file for write: $!";

    while (<FILE>) {
        s/^\s+//;
        if (/ONIXmessage/) {
            # Only the product handler is registered here: per-tag handlers
            # that call purge would discard children before do_product runs.
            my $t = XML::Twig->new(
                twig_roots => { 'product' => \&do_product },
            );

            eval { $t->parsefile($file); };
            warn $@ if $@;
            print PARSED;
        }
    }
}

##  SUB ROUTINES
sub do_product {
    my ($t, $elt) = @_;
    my $date = $elt->first_child_text("b002");
    # sprint returns the whole element with its tags; text would drop the markup
    if ($date =~ /^\d{8}$/ && $date >= $cutoff) {
        print PARSED "\n" . $elt->sprint . "\n";
    }
    $t->purge;    # free the finished record whether or not it was printed
}
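
To make the warning about regexes concrete, here is a small hedged illustration. It wraps the pattern from the one-liner in a sub (the name date_via_regex is made up for this example) and feeds it a record whose <b002> tag carries an attribute. The datestyle attribute is hypothetical and may never occur in real data; the point is only that valid XML can stop matching.

```perl
use strict;
use warnings;

# Same pattern as the one-liner; returns the captured date or undef.
sub date_via_regex {
    my ($record) = @_;
    return $record =~ m#(<product>.*<b002>(.*)</b002>.*)#s ? $2 : undef;
}

my $plain = "<product><b002>20150122</b002></product>";
# Hypothetical attribute: still well-formed XML, but the literal "<b002>" is gone.
my $attr = qq{<product><b002 datestyle="00">20150122</b002></product>};

print defined date_via_regex($plain) ? "plain: matched\n" : "plain: missed\n";
print defined date_via_regex($attr)  ? "attr: matched\n"  : "attr: missed\n";
```

An XML parser sees both records as the same element, which is why a parser-based script is the safer choice for files whose layout you do not control.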

 
hadrons (Author) Commented:
Sorry for the delay in grading this solution. I wanted to give the script a try and split the difference between the solutions. I wasn't able to get it to go, but I think it does provide a good template to move forward, so I'll tinker with it when I have more free time. Thanks again.