Solved

Looking to entire extract record set in XML file if value of one tag is > or = to a specific value

Posted on 2014-11-21
5
197 Views
Last Modified: 2014-12-02
I have a xml file that basically formatted as such:

<product>
....
<a001>product name</a001>
<b002>product date</b002>
<c003>product price</c003>
....
</supplydetail></product>

The record sets start with <product> on its own line and while the </product> is suppose to be on its own line sometimes it isn't and the scheme is flexible that it doesn't have it, but this is just to describe what the file looks like; basically the record set starts at <product> and ends at </product>.

What I needs to a Perl script that will yank the entire record set from <product> to </product> with everything inbetween if the value of <b002>product date</b002> is greater or equal to a specific date. The date info would be in YYYYMMDD form.

So if the record set has a date greater or equal to 20150122 then it would be extracted from the source file. I do have similar scripts, but nothing like a function to evaluate the data value like that. Thanks
0
Comment
Question by:hadrons
  • 2
  • 2
5 Comments
 
LVL 26

Expert Comment

by:wilcoxon
ID: 40458007
What is the full structure of the XML?  You do not give enough information about the XML.  You should always use an XML parsing module when dealing with XML data.  Here's a code piece that will get you most of the way there:
use XML::Simple;
my %opt = (); # may not be needed or may need some of the options set
my $ref = XMLin($filename, %opt) or die "could not parse $filename: $!";
if ($ref->{supplydetail}{b002} > YYYYMMDD) {
    print XMLout($ref, %opt);
}

Open in new window

0
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 40458022
perl -ne 'BEGIN{$/="</product>"}print $1 if m#(<product>.*<b002>(.*)</b002>.*)#s && $2 >= 20150122' file.xml
0
 

Author Comment

by:hadrons
ID: 40458303
The command line ozo supplied worked great, but I would like to follow on using a script. I'll attach a sample file to this comment to give you an idea of the XML structure - its just one record, but a file I normally work with can have thousands with additional tags in them. I do use a XML parser, but its usually Twig XML to extract specific parts of the record set whereas in this case I would need to entire record set.

Underneath is an example of what I would normally use to extract specific tags from all record sets, but I'm not sure how to rewrite to pluck out entire record set with a tag with a specific value.

my @files = glob("*.xml");      
foreach my $file(@files) {
          system ("echo currently processing file: $file");
                      open FILE, '<:encoding(UTF-8)', $file or warn "Can't open $file: $!";  
                      open PARSED, '>>:encoding(UTF-8)', ($file . "_isbneans.txt") or warn "Cannot open file for write: $!";  

   while (<FILE>) {
        $_=~ s/^\s+//;
        if (/ONIXmessage/) {
        my $t= XML::Twig->new(
                 twig_roots   => {
                                 'product/a001' => \&print_1,
                                 'product/b004' => \&print_1,
                                 'product/b005' => \&print_1,
                                         'product/productidentifier/b244' => \&print_1,
                 }
                            );

        eval {$t->parsefile( $file);};
        print PARSED;
}
}

##  SUB ROUTINES  
               
                sub print_1
                { my( $t, $elt)= @_;
                  eval{  print PARSED "\n" . $elt->text . "\n"; };
                  warn $@ if $@;
                  $t->purge;                                                                                                                                                                              
                }

}
sample.xml
0
 
LVL 26

Expert Comment

by:wilcoxon
ID: 40462884
I've never been a fan of XML::Twig approach - I mostly use XML::Simple or XML::SAX these days.

One word of warning on ozo's command line, it uses regexes so has the same risks and concerns as any other approach that does not use an actual XML parser.  On the other hand, ozo is king of the one-liner.

I think this should work for a modified script (or at least be close):
my @files = glob("*.xml");      
foreach my $file(@files) {
    system ("echo currently processing file: $file"); 
    open FILE, '<:encoding(UTF-8)', $file or warn "Can't open $file: $!";  
    open PARSED, '>>:encoding(UTF-8)', ($file . "_isbneans.txt") or warn "Cannot open file for write: $!";  

    while (<FILE>) {
        s/^\s+//;
        if (/ONIXmessage/) {
            my $t= XML::Twig->new(
                twig_roots   => {
                    'product' => \&do_product,
                    'product/a001' => \&print_1,
                    'product/b004' => \&print_1,
                    'product/b005' => \&print_1,
                    'product/productidentifier/b244' => \&print_1,
                }
            );

            eval {$t->parsefile( $file);};
            print PARSED;
        }
    }
}

##  SUB ROUTINES  
sub do_product {
    my ($t, $elt) = @_;
    return unless ($elt->first_child_text("b002") <= YYYYMMDD);
    eval { print PARSED "\n" . $elt->text . "\n" };
    warn $@ if $@;
}

sub print_1 {
    my( $t, $elt)= @_;
    eval{  print PARSED "\n" . $elt->text . "\n"; };
    warn $@ if $@;
    $t->purge;                                                                                                                                                                               
} 

Open in new window

0
 

Author Closing Comment

by:hadrons
ID: 40477631
Sorry for the delay in grading this solution, but I wanted to give the script a try and split the solution difference, but I wasn't able to get it to go, but I think it does provide a good template to move forward, so I'll tinker with it when I have more free time; thanks again.
0

Featured Post

Active Directory Webinar

We all know we need to protect and secure our privileges, but where to start? Join Experts Exchange and ManageEngine on Tuesday, April 11, 2017 10:00 AM PDT to learn how to track and secure privileged users in Active Directory.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
EvenOdd challenge 10 128
Re-position the objects 7 117
Bartender Integration Builder 3 45
Perl script to process a .csv file 18 45
Checking the Alert Log in AWS RDS Oracle can be a pain through their user interface.  I made a script to download the Alert Log, look for errors, and email me the trace files.  In this article I'll describe what I did and share my script.
Whether you’re a college noob or a soon-to-be pro, these tips are sure to help you in your journey to becoming a programming ninja and stand out from the crowd.
The viewer will be introduced to the member functions push_back and pop_back of the vector class. The video will teach the difference between the two as well as how to use each one along with its functionality.
This video will show you how to get GIT to work in Eclipse.   It will walk you through how to install the EGit plugin in eclipse and how to checkout an existing repository.

820 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question