Solved

Looking to extract an entire record set from an XML file if the value of one tag is >= a specific value

Posted on 2014-11-21
Last Modified: 2014-12-02
I have an XML file that is basically formatted like this:

<product>
....
<a001>product name</a001>
<b002>product date</b002>
<c003>product price</c003>
....
</supplydetail></product>

Each record set starts with <product> on its own line. The closing </product> is supposed to be on its own line too, but sometimes it isn't, and the schema is flexible enough to allow that. This is just to describe what the file looks like; basically, a record set starts at <product> and ends at </product>.

What I need is a Perl script that will pull out the entire record set, from <product> to </product> with everything in between, if the value of <b002>product date</b002> is greater than or equal to a specific date. The date would be in YYYYMMDD form.

So if the record set has a date greater than or equal to 20150122, it would be extracted from the source file. I do have similar scripts, but nothing with a function that evaluates a date value like that. Thanks
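
Because the dates are zero-padded YYYYMMDD strings, a plain numeric comparison already orders them chronologically, so no date module is needed. A minimal sketch of that check follows; the helper name on_or_after is hypothetical and not part of any script in this thread:

```perl
use strict;
use warnings;

# Hypothetical helper: true when $date (YYYYMMDD) falls on or after $cutoff.
sub on_or_after {
    my ($date, $cutoff) = @_;
    # Treat missing or malformed dates as "do not extract".
    return 0 unless defined $date && $date =~ /^\d{8}$/;
    return $date >= $cutoff;
}

print on_or_after('20150122', 20150122) ? "keep\n" : "skip\n";   # prints "keep"
print on_or_after('20141231', 20150122) ? "keep\n" : "skip\n";   # prints "skip"
```

Rejecting anything that is not exactly eight digits avoids accidentally keeping a record whose date tag is empty or in another format.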
Question by:hadrons
5 Comments
 
LVL 26

Expert Comment

by:wilcoxon
ID: 40458007
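What is the full structure of the XML?  You do not give enough information about it.  You should always use an XML parsing module when dealing with XML data.  Here's a snippet that will get you most of the way there:
use strict;
use warnings;
use XML::Simple;
my $filename = 'file.xml';   # placeholder input file
my $cutoff   = 20150122;     # YYYYMMDD threshold from the question
my %opt = ();                # may not be needed, or may need some options set (e.g. ForceArray, KeyAttr)
# XMLin dies on a parse error, so trap it with eval and report $@ rather than $!
my $ref = eval { XMLin($filename, %opt) } or die "could not parse $filename: $@";
# the path to b002 is a guess; adjust it to the real structure
if ($ref->{supplydetail}{b002} >= $cutoff) {
    print XMLout($ref, %opt);
}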

 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 40458022
perl -ne 'BEGIN{$/="</product>"}print $1 if m#(<product>.*<b002>(.*)</b002>.*)#s && $2 >= 20150122' file.xml
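
For anyone who would rather have this as a script, here is a rough sketch of the same regex idea. It is not ozo's code: the name extract_products and the command-line handling are assumptions made for illustration, and like the one-liner it assumes <product> carries no attributes and that each record has a single <b002>. It also reads the whole file into memory, whereas the one-liner streams record by record.

```perl
use strict;
use warnings;

# Return every <product>...</product> block in $xml whose <b002> date
# (YYYYMMDD) is on or after $cutoff. extract_products is a hypothetical name.
sub extract_products {
    my ($xml, $cutoff) = @_;
    my @keep;
    # Non-greedy match, so each iteration captures exactly one record,
    # whether or not </product> sits on its own line.
    while ($xml =~ m{(<product>.*?</product>)}gs) {
        my $record = $1;
        my ($date) = $record =~ m{<b002>\s*(\d{8})\s*</b002>};
        push @keep, $record if defined $date && $date >= $cutoff;
    }
    return @keep;
}

# Usage: perl extract.pl file.xml 20150122 > extracted.xml
if (@ARGV) {
    my ($file, $cutoff) = @ARGV;
    $cutoff //= 20150122;               # default taken from the question
    open my $in, '<:encoding(UTF-8)', $file or die "Can't open $file: $!";
    my $xml = do { local $/; <$in> };   # slurp the whole file
    close $in;
    binmode STDOUT, ':encoding(UTF-8)';
    print "$_\n" for extract_products($xml, $cutoff);
}
```

Keeping the matching in a sub makes it easy to try on a small string before pointing it at a file with thousands of records.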
 

Author Comment

by:hadrons
ID: 40458303
The command line ozo supplied worked great, but I would like to follow up with a script. I'll attach a sample file to this comment to give you an idea of the XML structure. It's just one record, but a file I normally work with can have thousands, with additional tags in them. I do use an XML parser, but it's usually XML::Twig, to extract specific parts of the record set, whereas in this case I need the entire record set.

Below is an example of what I would normally use to extract specific tags from all record sets, but I'm not sure how to rewrite it to pluck out an entire record set based on a tag with a specific value.

use strict;
use warnings;
use XML::Twig;

my @files = glob("*.xml");
foreach my $file (@files) {
    system("echo currently processing file: $file");
    open FILE, '<:encoding(UTF-8)', $file or warn "Can't open $file: $!";
    open PARSED, '>>:encoding(UTF-8)', ($file . "_isbneans.txt") or warn "Cannot open file for write: $!";

    while (<FILE>) {
        $_ =~ s/^\s+//;
        if (/ONIXmessage/) {
            my $t = XML::Twig->new(
                twig_roots => {
                    'product/a001' => \&print_1,
                    'product/b004' => \&print_1,
                    'product/b005' => \&print_1,
                    'product/productidentifier/b244' => \&print_1,
                }
            );

            eval { $t->parsefile($file); };
            print PARSED;
        }
    }
}

##  SUB ROUTINES

# print the text of one matched element, then free what has been parsed so far
sub print_1 {
    my ($t, $elt) = @_;
    eval { print PARSED "\n" . $elt->text . "\n"; };
    warn $@ if $@;
    $t->purge;
}
sample.xml
 
LVL 26

Expert Comment

by:wilcoxon
ID: 40462884
I've never been a fan of the XML::Twig approach - I mostly use XML::Simple or XML::SAX these days.

One word of warning on ozo's command line: it uses regexes, so it has the same risks and concerns as any other approach that does not use an actual XML parser.  On the other hand, ozo is king of the one-liner.

I think this should work for a modified script (or at least be close):
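use strict;
use warnings;
use XML::Twig;

my $cutoff = 20150122;   # YYYYMMDD threshold; adjust as needed

my @files = glob("*.xml");
foreach my $file (@files) {
    system("echo currently processing file: $file");
    open FILE, '<:encoding(UTF-8)', $file or warn "Can't open $file: $!";
    open PARSED, '>>:encoding(UTF-8)', ($file . "_isbneans.txt") or warn "Cannot open file for write: $!";

    while (<FILE>) {
        s/^\s+//;
        if (/ONIXmessage/) {
            my $t = XML::Twig->new(
                twig_roots => {
                    'product' => \&do_product,
                    'product/a001' => \&print_1,
                    'product/b004' => \&print_1,
                    'product/b005' => \&print_1,
                    'product/productidentifier/b244' => \&print_1,
                }
            );

            eval { $t->parsefile($file); };
            print PARSED;
        }
    }
}

##  SUB ROUTINES
# called once per complete <product>; writes out the whole record if it is recent enough
sub do_product {
    my ($t, $elt) = @_;
    my $date = $elt->first_child_text("b002");
    # keep only records dated on or after the cutoff
    if ($date =~ /^\d{8}$/ && $date >= $cutoff) {
        # sprint returns the whole element, tags included (text would drop the markup)
        eval { print PARSED "\n" . $elt->sprint . "\n" };
        warn $@ if $@;
    }
    $t->purge;   # release the finished record
}

sub print_1 {
    my ($t, $elt) = @_;
    eval { print PARSED "\n" . $elt->text . "\n"; };
    warn $@ if $@;
    # no purge here: purging mid-record would discard b002 before do_product sees it
}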

 

Author Closing Comment

by:hadrons
ID: 40477631
Sorry for the delay in grading this solution. I wanted to give the script a try and split the difference between the solutions, but I wasn't able to get it to work. I do think it provides a good template to move forward, so I'll tinker with it when I have more free time. Thanks again.