Solved

String counter script

Posted on 2013-11-19
Last Modified: 2013-11-21
I need a script that will glob a series of XML files and count the number of occurrences of the XML declaration string (<\?xml version) and the ending string (<\/ENDmessage>). If there is more than one of either string, or if one is missing, that's the tell-tale sign of a parsing error (encoding issues will cause this).

I wrote this:

my @files = glob("*clean.xml");      
foreach my $file(@files) {

                      open FILE, '<:encoding(UTF-8)', $file or warn "Can't open $file: $!";  
                      open PARSED1, '>:encoding(UTF-8)', ($file . "loading_problems.txt") or warn "Cannot open file for write: $!";  

my $open_declaration_count = 0;
my $closing_declaration_count = 0;

    while (<FILE>) {

    while (/<\?xml version/ig) {
        $open_declaration_count++;
    }
    while (/<\/ENDmessage>/ig) {
        $closing_declaration_count++;
    }

    if ($open_declaration_count > 1) {
    print "Possible parsing problem with: " . $file . "\n";
    }

    if ($closing_declaration_count > 1) {
    print "Possible parsing problem with: " . $file . "\n";
    }
   
    if ($open_declaration_count < 1) {
    print "Possible parsing problem with: " . $file . "\n";
    }

    if ($closing_declaration_count < 1) {
    print "Possible parsing problem with: " . $file . "\n";
    }
   
}

}

print PARSED1;
close FILE;
close PARSED1;

However, I get no output to a file, and the output to the screen just repeats the same message over and over.
Question by:hadrons

Accepted Solution

by:
ozo earned 500 total points
ID: 39659795
You probably want to
print PARSED1 "Possible parsing problem with: " . $file . "\n";
after you are done parsing <FILE>
instead of printing to the screen for every line of <FILE>
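
A minimal sketch of that suggestion, under a few assumptions: the helper names count_markers, has_parsing_problem and report_problems are made up for illustration, lexical filehandles replace the barewords FILE and PARSED1, and a single report file is written instead of one per input file. The counts are checked only once the whole file has been read:

```perl
use strict;
use warnings;

# Count both markers in one pass over an already-open filehandle.
sub count_markers {
    my ($fh) = @_;
    my ($open_count, $close_count) = (0, 0);
    while (my $line = <$fh>) {
        $open_count++  while $line =~ /<\?xml version/ig;
        $close_count++ while $line =~ /<\/ENDmessage>/ig;
    }
    return ($open_count, $close_count);
}

# Hypothetical rule: a file is fine only with exactly one of each marker.
sub has_parsing_problem {
    my ($open_count, $close_count) = @_;
    return $open_count != 1 || $close_count != 1;
}

# Read every file line by line and write one report line per suspect file.
sub report_problems {
    my ($report_path, @files) = @_;
    open my $report, '>:encoding(UTF-8)', $report_path
        or die "Cannot open $report_path for write: $!";
    foreach my $file (@files) {
        open my $in, '<:encoding(UTF-8)', $file or do {
            warn "Can't open $file: $!";
            next;
        };
        my ($open_count, $close_count) = count_markers($in);
        close $in;
        # The check happens here, after the read loop has finished.
        print {$report} "Possible parsing problem with: $file\n"
            if has_parsing_problem($open_count, $close_count);
    }
    close $report;
}
```

It could be called as report_problems('loading_problems.txt', glob('*clean.xml')); the report file name is only an example.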
 

Author Comment

by:hadrons
ID: 39660733
I did have to dump the last two condition blocks because of the problems this one was giving me:

    if ($closing_declaration_count < 1) {
    print "Possible parsing problem with: " . $file . "\n";
    }

Expert Comment

by:ozo
ID: 39660744
What problem was it giving you?

Author Comment

by:hadrons
ID: 39660788
It kept printing in a continuous loop; I suspect it's because the string </ENDmessage> should only appear once in a file, and every line that doesn't match that string triggers the message.

I know I can slurp up the file by undefining $/ (note that $/=""; selects paragraph mode rather than slurping), but I wanted to avoid using it because of the large size of the files.  Basically, what I'm looking for is this: if a file doesn't have </ENDmessage>, then execute the print "Possible parsing problem with: " . $file . "\n"; command.
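
As a side note on $/, here is a small sketch of how its value changes what one read returns: the default newline gives one line per read, an empty string gives paragraph mode, and undef slurps everything at once. The count_records helper and the sample text are made up for illustration, and an in-memory filehandle stands in for a real file:

```perl
use strict;
use warnings;

# Return how many records a read loop sees for a given value of $/.
sub count_records {
    my ($text, $separator) = @_;
    local $/ = $separator;    # "\n" = lines, "" = paragraphs, undef = slurp
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my $records = 0;
    while (defined(my $record = <$fh>)) {
        $records++;
    }
    close $fh;
    return $records;
}

# Made-up sample: two lines, a blank line, then one more line.
my $sample = "line one\nline two\n\nline three\n";

print "line mode:      ", count_records($sample, "\n"),  "\n";
print "paragraph mode: ", count_records($sample, ""),    "\n";
print "slurp mode:     ", count_records($sample, undef), "\n";
```

Reading line by line, as the original script does, keeps only one line in memory at a time, which is why it suits large files.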

Expert Comment

by:ozo
ID: 39660807
Your original program was doing that print inside the  while (<FILE>) { loop,
so it would have printed that message once for every line of <FILE> until $closing_declaration_count was incremented.
If I'm understanding what you intended, you don't want to check $closing_declaration_count until after you are done reading through to the end of <FILE>.
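
To illustrate the difference, here is a small sketch that counts how many warnings each placement of the check would produce. The sample lines and the two helper names are made up for illustration, and an array stands in for the lines read from <FILE>:

```perl
use strict;
use warnings;

# Made-up sample: the closing tag only shows up on the last of four lines.
my @lines = (
    "<?xml version=\"1.0\"?>\n",
    "<a>\n",
    "</a>\n",
    "</ENDmessage>\n",
);

# Check inside the loop: fires on every line read before the tag is seen.
sub warnings_inside_loop {
    my @input = @_;
    my ($count, $messages) = (0, 0);
    for my $line (@input) {
        $count++ while $line =~ /<\/ENDmessage>/ig;
        $messages++ if $count < 1;
    }
    return $messages;
}

# Check after the loop: fires at most once per file.
sub warnings_after_loop {
    my @input = @_;
    my $count = 0;
    for my $line (@input) {
        $count++ while $line =~ /<\/ENDmessage>/ig;
    }
    return $count < 1 ? 1 : 0;
}

print "inside loop: ", warnings_inside_loop(@lines), " message(s)\n";
print "after loop:  ", warnings_after_loop(@lines),  " message(s)\n";
```

With the check inside the loop, a well-formed file still produces a warning for each line that precedes the closing tag; with the check after the loop, it produces none.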
 

Author Comment

by:hadrons
ID: 39660828
Yes, you have what I have in mind

Expert Comment

by:ozo
ID: 39660853
So did you understand my suggestion?
You accepted the answer, but then you seemed to be reporting an additional problem.
 

Author Comment

by:hadrons
ID: 39667339
You addressed my primary concern in the accepted answer; however, I mentioned a follow-up issue, unrelated to the primary question, in case someone else came to use this code.

However, I did make some adjustments to the while loop based on what you suggested, and the results came out as I wanted:

    while (<FILE>) {

    while (/<\?xml version/ig) {
        $open_declaration_count++;

    if ($open_declaration_count > 1) {
    print PARSED1 "Possible parsing problem at the top of file with: " . $file . "\n";
    $open_declaration_count = 0;
    }
    }


    while (/<\/ENDmessage>/ig) {
        $closing_declaration_count++;

    if ($closing_declaration_count > 1) {
    print PARSED1 "Possible parsing problem at the bottom of file with: " . $file . "\n";
    $closing_declaration_count = 0;
    }
    }



}

I'm not sure I followed your advice correctly, but it has produced what I wanted.