Solved

String counter script

Posted on 2013-11-19
Last Modified: 2013-11-21
I need a script that will glob a series of XML files and count the number of occurrences of the XML declaration string (<\?xml version) and the ending string (<\/ENDmessage>). If there is more than one of either string, or if one is missing, that's the tell-tale sign of a parsing error (encoding issues will cause this).

I wrote this:

my @files = glob("*clean.xml");
foreach my $file (@files) {

    open FILE, '<:encoding(UTF-8)', $file or warn "Can't open $file: $!";
    open PARSED1, '>:encoding(UTF-8)', ($file . "loading_problems.txt") or warn "Cannot open file for write: $!";

    my $open_declaration_count = 0;
    my $closing_declaration_count = 0;

    while (<FILE>) {

        while (/<\?xml version/ig) {
            $open_declaration_count++;
        }
        while (/<\/ENDmessage>/ig) {
            $closing_declaration_count++;
        }

        if ($open_declaration_count > 1) {
            print "Possible parsing problem with: " . $file . "\n";
        }

        if ($closing_declaration_count > 1) {
            print "Possible parsing problem with: " . $file . "\n";
        }

        if ($open_declaration_count < 1) {
            print "Possible parsing problem with: " . $file . "\n";
        }

        if ($closing_declaration_count < 1) {
            print "Possible parsing problem with: " . $file . "\n";
        }

    }

}

print PARSED1;
close FILE;
close PARSED1;

However, I get no output to the file, and the output to the screen just repeats the same message over and over.
Question by:hadrons
 
LVL 85

Accepted Solution

by:
ozo earned 2000 total points
ID: 39659795
You probably want to
print PARSED1 "Possible parsing problem with: " . $file . "\n";
after you are done parsing <FILE>
instead of printing to the screen for every line of <FILE>
 

Author Comment

by:hadrons
ID: 39660733
I did have to dump the last two condition blocks because of the problems this one was giving me:

    if ($closing_declaration_count < 1) {
    print "Possible parsing problem with: " . $file . "\n";
    }
 
LVL 85

Expert Comment

by:ozo
ID: 39660744
What problem was it giving you?
 

Author Comment

by:hadrons
ID: 39660788
It kept printing in a continuous loop; I suspect it's because the string </ENDmessage> should only appear once in a file, so the message is printed for every line that doesn't match that string.

I know I can slurp up the file by adding $/="";, but I wanted to avoid using it because of the large size of the files. Basically, what I'm looking for is this: if a file doesn't have </ENDmessage>, then execute the print "Possible parsing problem with: " . $file . "\n"; command.
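
A side note on $/="";: in Perl that value selects paragraph mode (records end at blank lines), while slurp mode needs $/ to be undefined. The sketch below is only an illustration of the three modes. The sample text and the count_records helper are made up for the example and are not part of the original script. Reading with the default separator keeps only one line in memory at a time, which suits large files.

```perl
use strict;
use warnings;

# Hypothetical sample data, held in memory so the sketch is self-contained.
my $data = "line one\nline two\n\nline three\n";

# Return the number of records read from $data under a given separator.
# count_records is an illustrative helper, not something from the thread.
sub count_records {
    my ($separator) = @_;
    local $/ = $separator;    # localised so the change does not leak out
    open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
    my $count = 0;
    while (my $record = <$fh>) {
        $count++;
    }
    close $fh;
    return $count;
}

my $by_line      = count_records("\n");     # default: one record per line
my $by_paragraph = count_records("");       # paragraph mode: split at blank lines
my $slurped      = count_records(undef);    # slurp mode: whole input as one record

print "lines=$by_line paragraphs=$by_paragraph slurp=$slurped\n";
```

Using local keeps the separator change confined to the helper, so the rest of a script still reads line by line.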
 
LVL 85

Expert Comment

by:ozo
ID: 39660807
Your original program was doing that print inside the  while (<FILE>) { loop,
so it would have printed that message once for every line of <FILE> until $closing_declaration_count was incremented.
If I'm understanding what you intended, you don't want to check $closing_declaration_count until after you are done reading through to the end of <FILE>;
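
A minimal sketch of that suggestion, under some assumptions: the counts are accumulated while reading line by line, and the checks run once per file after the read loop has finished. The count_markers helper, the single report file name and the combined message are illustrative choices, not the original poster's code.

```perl
use strict;
use warnings;

# Count both marker strings in one file, reading line by line so that
# large files are never held in memory all at once.
# count_markers is a hypothetical helper introduced for this sketch.
sub count_markers {
    my ($file) = @_;
    open my $in, '<:encoding(UTF-8)', $file
        or die "Can't open $file: $!";
    my ($open_count, $close_count) = (0, 0);
    while (my $line = <$in>) {
        # A list-context /g match returns every hit; "() =" counts them.
        $open_count  += () = $line =~ /<\?xml version/ig;
        $close_count += () = $line =~ /<\/ENDmessage>/ig;
    }
    close $in;
    return ($open_count, $close_count);
}

# One report for all files; the name is an assumption made for this sketch.
open my $report, '>:encoding(UTF-8)', 'loading_problems.txt'
    or die "Cannot open report for write: $!";

for my $file (glob '*clean.xml') {
    my ($open_count, $close_count) = count_markers($file);
    # Checked once per file, only after the whole file has been read.
    if ($open_count != 1 || $close_count != 1) {
        print {$report} "Possible parsing problem with: $file "
            . "(declarations: $open_count, end tags: $close_count)\n";
    }
}
close $report;
```

Because the test is "not exactly one", a single check covers both the duplicate case and the missing case, and each problem file is reported once instead of once per line.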
 

Author Comment

by:hadrons
ID: 39660828
Yes, you have what I have in mind
 
LVL 85

Expert Comment

by:ozo
ID: 39660853
So did you understand my suggestion?
You accepted the answer, but then you seemed to be reporting an additional problem.
 

Author Comment

by:hadrons
ID: 39667339
You addressed my primary concern in the accepted answer; however, I mentioned the follow-up issue, which was unrelated to the primary question, in case someone else came to use this code.

However, I did make some adjustments to the while (<FILE>) loop based on what you suggested, and the results came out as I wanted:

    while (<FILE>) {

        while (/<\?xml version/ig) {
            $open_declaration_count++;

            if ($open_declaration_count > 1) {
                print PARSED1 "Possible parsing problem at the top of file with: " . $file . "\n";
                $open_declaration_count = 0;
            }
        }

        while (/<\/ENDmessage>/ig) {
            $closing_declaration_count++;

            if ($closing_declaration_count > 1) {
                print PARSED1 "Possible parsing problem at the bottom of file with: " . $file . "\n";
                $closing_declaration_count = 0;
            }
        }

    }

I'm not sure I followed your advice correctly, but it has produced what I wanted.