Solved

String counter script

Posted on 2013-11-19
Last Modified: 2013-11-21
I need a script that will glob a series of XML files and count the number of occurrences of the XML declaration string (<\?xml version) and the ending string (<\/ENDmessage>). If there is more than one of either string, or if one is missing, that's the tell-tale sign of a parsing error (encoding issues will cause this).

I wrote this:

my @files = glob("*clean.xml");
foreach my $file (@files) {

    open FILE, '<:encoding(UTF-8)', $file or warn "Can't open $file: $!";
    open PARSED1, '>:encoding(UTF-8)', ($file . "loading_problems.txt") or warn "Cannot open file for write: $!";

    my $open_declaration_count = 0;
    my $closing_declaration_count = 0;

    while (<FILE>) {

        while (/<\?xml version/ig) {
            $open_declaration_count++;
        }
        while (/<\/ENDmessage>/ig) {
            $closing_declaration_count++;
        }

        if ($open_declaration_count > 1) {
            print "Possible parsing problem with: " . $file . "\n";
        }

        if ($closing_declaration_count > 1) {
            print "Possible parsing problem with: " . $file . "\n";
        }

        if ($open_declaration_count < 1) {
            print "Possible parsing problem with: " . $file . "\n";
        }

        if ($closing_declaration_count < 1) {
            print "Possible parsing problem with: " . $file . "\n";
        }

    }

}

print PARSED1;
close FILE;
close PARSED1;

However, I get no output to a file, and the output to the screen just repeats the same message over and over.
Question by:hadrons
8 Comments
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 39659795
You probably want to
print PARSED1 "Possible parsing problem with: " . $file . "\n";
after you are done parsing <FILE>
instead of printing to the screen for every line of <FILE>

Author Comment

by:hadrons
ID: 39660733
I did have to dump the last two condition blocks because of the problems this one was giving me:

    if ($closing_declaration_count < 1) {
    print "Possible parsing problem with: " . $file . "\n";
    }
LVL 84

Expert Comment

by:ozo
ID: 39660744
What problem was it giving you?

Author Comment

by:hadrons
ID: 39660788
It kept printing in a continuous loop; I suspect it's because the string </ENDmessage> should only appear once in a file, so every line that doesn't match that string triggers the message.

I know I can slurp the file by undefining $/ (for example with local $/;), but I wanted to avoid that because of the large size of the files. Basically, what I'm looking for is this: if a file doesn't have </ENDmessage>, then execute the print "Possible parsing problem with: " . $file . "\n"; command.
LVL 84

Expert Comment

by:ozo
ID: 39660807
Your original program was doing that print inside the  while (<FILE>) { loop,
so it would have printed that message once for every line of <FILE> until $closing_declaration_count was incremented.
If I'm understanding what you intended, you don't want to check $closing_declaration_count until after you are done reading through to the end of <FILE>;
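
The suggestion above could be sketched roughly as follows. This is only an illustration, not the code the asker ended up using: the helper name count_markers, the lexical filehandles, and the single report file loading_problems.txt are assumptions made for the example. It reads line by line (no slurping) and checks the totals once per file, after the read loop has finished.

```perl
use strict;
use warnings;

# Count both markers line by line; nothing is tested until the handle is exhausted.
# count_markers is a hypothetical helper name, not part of the original script.
sub count_markers {
    my ($fh) = @_;
    my ($open, $close) = (0, 0);
    while (my $line = <$fh>) {
        $open++  while $line =~ /<\?xml version/ig;
        $close++ while $line =~ /<\/ENDmessage>/ig;
    }
    return ($open, $close);
}

# Assumed report name; the original script wrote one report per input file.
open my $report, '>:encoding(UTF-8)', 'loading_problems.txt'
    or die "Cannot open report for write: $!";

for my $file (glob '*clean.xml') {
    open my $in, '<:encoding(UTF-8)', $file
        or do { warn "Can't open $file: $!"; next };
    my ($open, $close) = count_markers($in);
    close $in;

    # Exactly one of each marker is expected; anything else is flagged once per file.
    print {$report} "Possible parsing problem with: $file\n"
        if $open != 1 || $close != 1;
}
close $report;
```

Because the test runs after the loop, a file with no </ENDmessage> at all is reported exactly once instead of once per line.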

Author Comment

by:hadrons
ID: 39660828
Yes, you have what I have in mind
LVL 84

Expert Comment

by:ozo
ID: 39660853
So did you understand my suggestion?
You accepted the answer, but then you seemed to be reporting an additional problem.

Author Comment

by:hadrons
ID: 39667339
You addressed my primary concern in the accepted answer; however, I mentioned the follow-up issue, which was unrelated to the primary question, in case someone else came to use this code.

However, I did make some adjustments to the while loop based on what you suggested, and the results came out as I wanted:

    while (<FILE>) {

        while (/<\?xml version/ig) {
            $open_declaration_count++;

            if ($open_declaration_count > 1) {
                print PARSED1 "Possible parsing problem at the top of file with: " . $file . "\n";
                $open_declaration_count = 0;
            }
        }

        while (/<\/ENDmessage>/ig) {
            $closing_declaration_count++;

            if ($closing_declaration_count > 1) {
                print PARSED1 "Possible parsing problem at the bottom of file with: " . $file . "\n";
                $closing_declaration_count = 0;
            }
        }

    }

I'm not sure I followed your advice correctly, but it has produced what I wanted.