Solved

Looking to extract an entire record set from an XML file if the value of one tag is >= a specific value

Posted on 2014-11-21
Last Modified: 2014-12-02
I have an XML file that is basically formatted like this:

<product>
....
<a001>product name</a001>
<b002>product date</b002>
<c003>product price</c003>
....
</supplydetail></product>

Each record set starts with <product> on its own line. The closing </product> is supposed to be on its own line too, but sometimes it isn't, and the schema is flexible enough to allow that. This is just to describe what the file looks like; basically, a record set starts at <product> and ends at </product>.

What I need is a Perl script that will pull the entire record set, from <product> to </product> with everything in between, if the value of <b002>product date</b002> is greater than or equal to a specific date. The date is in YYYYMMDD form.

So if a record set has a date greater than or equal to 20150122, it would be extracted from the source file. I do have similar scripts, but nothing with a function that evaluates a data value like that. Thanks.
Question by:hadrons
5 Comments
 

Expert Comment

by:wilcoxon
ID: 40458007
What is the full structure of the XML?  You have not given enough information about it.  You should always use an XML parsing module when dealing with XML data.  Here's a snippet that will get you most of the way there:
use strict;
use warnings;
use XML::Simple;

my $filename = shift or die "usage: $0 file.xml\n";
my $cutoff   = 20150122; # YYYYMMDD threshold from the question
my %opt = (); # may not be needed or may need some of the options set
my $ref = XMLin($filename, %opt); # XMLin dies by itself if the file cannot be parsed
# the path to b002 is a guess; it depends on the real structure of the XML
if ($ref->{supplydetail}{b002} >= $cutoff) {
    print XMLout($ref, %opt);
}


Accepted Solution

by:
ozo earned 500 total points
ID: 40458022
perl -ne 'BEGIN{$/="</product>"}print $1 if m#(<product>.*<b002>(.*)</b002>.*)#s && $2 >= 20150122' file.xml
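
For readers who want the one-liner above as a script, here is a minimal sketch of the same idea: set the input record separator $/ to "</product>" so that each read returns one record, then test the date with a regex. The sub name extract_records and the command-line handling are illustrative assumptions, not part of the original answer. The regex is also slightly stricter than the one-liner's: it takes the first <b002> in a record and expects exactly eight digits. Like the one-liner, it does not really parse the XML.

```perl
use strict;
use warnings;

# Return every <product>...</product> record read from $fh whose first
# <b002> value (YYYYMMDD) is numerically >= $cutoff.
# extract_records is a hypothetical name used only for this sketch.
sub extract_records {
    my ($fh, $cutoff) = @_;
    my @hits;
    local $/ = '</product>';    # each read now ends at a closing product tag
    while (my $chunk = <$fh>) {
        # capture from <product> to the end of the chunk, and the date in $2
        next unless $chunk =~ m#(<product>.*?<b002>(\d{8})</b002>.*)#s;
        push @hits, $1 if $2 >= $cutoff;
    }
    return @hits;
}

# Runs only when called as: perl script.pl file.xml 20150122
if (!caller && @ARGV == 2) {
    my ($file, $cutoff) = @ARGV;
    open my $fh, '<:encoding(UTF-8)', $file or die "Can't open $file: $!";
    binmode STDOUT, ':encoding(UTF-8)';
    print "$_\n" for extract_records($fh, $cutoff);
    close $fh;
}
```

Anything after the last </product> (for example the closing root tag) is read as a final chunk that does not match, so it is skipped silently.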

Author Comment

by:hadrons
ID: 40458303
The command line ozo supplied worked great, but I would like to follow up with a script. I'll attach a sample file to this comment to give you an idea of the XML structure. It's just one record, but a file I normally work with can have thousands, with additional tags in them. I do use an XML parser, usually XML::Twig, but only to extract specific parts of each record set, whereas in this case I need the entire record set.

Below is an example of what I would normally use to extract specific tags from all record sets, but I'm not sure how to rewrite it to pluck out an entire record set based on a tag with a specific value.

my @files = glob("*.xml");
foreach my $file (@files) {
    system("echo currently processing file: $file");
    open FILE, '<:encoding(UTF-8)', $file or warn "Can't open $file: $!";
    open PARSED, '>>:encoding(UTF-8)', ($file . "_isbneans.txt") or warn "Cannot open file for write: $!";

    while (<FILE>) {
        $_ =~ s/^\s+//;
        if (/ONIXmessage/) {
            my $t = XML::Twig->new(
                twig_roots => {
                    'product/a001' => \&print_1,
                    'product/b004' => \&print_1,
                    'product/b005' => \&print_1,
                    'product/productidentifier/b244' => \&print_1,
                }
            );

            eval { $t->parsefile($file); };
            print PARSED;
        }
    }

    ##  SUB ROUTINES

    sub print_1 {
        my ($t, $elt) = @_;
        eval { print PARSED "\n" . $elt->text . "\n"; };
        warn $@ if $@;
        $t->purge;
    }

}
sample.xml

Expert Comment

by:wilcoxon
ID: 40462884
I've never been a fan of the XML::Twig approach - I mostly use XML::Simple or XML::SAX these days.

One word of warning on ozo's command line: it uses regexes, so it carries the same risks and concerns as any other approach that does not use an actual XML parser.  On the other hand, ozo is king of the one-liner.

I think this should work for a modified script (or at least be close):
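use XML::Twig;

my $cutoff = 20150122;   # YYYYMMDD threshold; adjust as needed
my @files = glob("*.xml");
foreach my $file (@files) {
    system("echo currently processing file: $file");
    open FILE, '<:encoding(UTF-8)', $file or warn "Can't open $file: $!";
    open PARSED, '>>:encoding(UTF-8)', ($file . "_isbneans.txt") or warn "Cannot open file for write: $!";

    while (<FILE>) {
        s/^\s+//;
        if (/ONIXmessage/) {
            my $t = XML::Twig->new(
                twig_roots => {
                    'product'      => \&do_product,
                    'product/a001' => \&print_1,
                    'product/b004' => \&print_1,
                    'product/b005' => \&print_1,
                    'product/productidentifier/b244' => \&print_1,
                }
            );

            eval { $t->parsefile($file); };
            print PARSED;
        }
    }
}

##  SUB ROUTINES
# Called once per complete <product>; prints the whole record, tags included,
# when its b002 date is on or after $cutoff.
sub do_product {
    my ($t, $elt) = @_;
    # first_child_text assumes b002 is a direct child of product
    if ($elt->first_child_text("b002") >= $cutoff) {
        eval { print PARSED "\n" . $elt->sprint . "\n" };
        warn $@ if $@;
    }
    $t->purge;   # release the finished record
}

sub print_1 {
    my ($t, $elt) = @_;
    eval { print PARSED "\n" . $elt->text . "\n"; };
    warn $@ if $@;
    # no purge here: purging inside a child handler would discard the children
    # of the enclosing <product> parsed so far, before do_product can test b002
}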
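
For comparison, here is a stripped-down sketch of the same idea with a single handler, for the case where only the whole record is wanted. It assumes XML::Twig is installed, that each record is a <product> element, and that <b002> sits somewhere inside it (directly or nested, hence first_descendant). The sub name products_on_or_after is hypothetical, and it parses a string rather than a file so it can be tried in isolation; for a file, parsefile would take the place of parse.

```perl
use strict;
use warnings;
use XML::Twig;

# Collect the full markup of every <product> whose <b002> date (YYYYMMDD)
# is >= $cutoff. products_on_or_after is a name made up for this sketch.
sub products_on_or_after {
    my ($xml, $cutoff) = @_;
    my @records;
    my $twig = XML::Twig->new(
        twig_handlers => {
            product => sub {
                my ($t, $product) = @_;
                # look anywhere below <product>, since b002 may be nested
                my $date_elt = $product->first_descendant('b002');
                my $date     = $date_elt ? $date_elt->text : '';
                if ($date =~ /^\d{8}$/ && $date >= $cutoff) {
                    push @records, $product->sprint;   # whole record, tags included
                }
                $t->purge;   # free memory once the record has been handled
            },
        },
    );
    $twig->parse($xml);
    return @records;
}
```

Because the handler fires only when a <product> closes, the date test always sees the complete record, and purging at that point keeps memory use flat on files with thousands of records.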

 

Author Closing Comment

by:hadrons
ID: 40477631
Sorry for the delay in grading this solution. I wanted to give the script a try and split the solution between both answers, but I wasn't able to get it to work. I think it does provide a good template to move forward, so I'll tinker with it when I have more free time. Thanks again.