Solved

String counter script

Posted on 2013-11-19
8
213 Views
Last Modified: 2013-11-21
I need a script that will glob up a series of XML file and count the number of occurrences of the XML declaration string (<\?xml version)  and the ending string (<\/ENDmessage>) ... if their are more than one of either string string or if one is missing that's the tell-tale sign of a parsing error (encoding issues will cause this)

I wrote this:

my @files = glob("*clean.xml");      
foreach my $file(@files) {

                      open FILE, '<:encoding(UTF-8)', $file or warn "Can't open $file: $!";  
                      open PARSED1, '>:encoding(UTF-8)', ($file . "loading_problems.txt") or warn "Cannot open file for write: $!";  

my $open_declaration_count = 0;
my $closing_declaration_count = 0;

    while (<FILE>) {

    while (/<\?xml version/ig) {
        $open_declaration_count++;
    }
    while (/<\/ENDmessage>/ig) {
        $closing_declaration_count++;
    }

    if ($open_declaration_count > 1) {
    print "Possible parsing problem with: " . $file . "\n";
    }

    if ($closing_declaration_count > 1) {
    print "Possible parsing problem with: " . $file . "\n";
    }
   
    if ($open_declaration_count < 1) {
    print "Possible parsing problem with: " . $file . "\n";
    }

    if ($closing_declaration_count < 1) {
    print "Possible parsing problem with: " . $file . "\n";
    }
   
}

}

print PARSED1;
close FILE;
close PARSED1;

However, I get no output to a file and the output to the screen just repeats loops over the same message
0
Comment
Question by:hadrons
  • 4
  • 4
8 Comments
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 39659795
You probably want to
print PARSED1 "Possible parsing problem with: " . $file . "\n";
after you are done parsing <FILE>
instead of printing to the screen for every line of <FILE>
0
 

Author Comment

by:hadrons
ID: 39660733
I did have to dump the last two condition blocks because of the problems this one was giving me:

    if ($closing_declaration_count < 1) {
    print "Possible parsing problem with: " . $file . "\n";
    }
0
 
LVL 84

Expert Comment

by:ozo
ID: 39660744
What problem was it giving you?
0
Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

 

Author Comment

by:hadrons
ID: 39660788
It kept printing at a continues loop; I suspect its because the string </ENDmessage> should only appear once in a file and every line that doesn't match that string is returns.

I know I can slurp up the file by adding $/="";, but I wanted to avoid use it because of the large size of the files.  Basically what I'm looking for is if a file doesn't have </ENDmessage> then execute the print "Possible parsing problem with: " . $file . "\n"; command.
0
 
LVL 84

Expert Comment

by:ozo
ID: 39660807
Your original program was doing that print inside the  while (<FILE>) { loop,
so it would have printed that message once for every line of <FILE> until $closing_declaration_count was incremented.
If I'm understanding what you intended, you don't want to check $closing_declaration_count until after you are done reading through to the end of <FILE>;
0
 

Author Comment

by:hadrons
ID: 39660828
Yes, you have what I have in mind
0
 
LVL 84

Expert Comment

by:ozo
ID: 39660853
So did you understand my suggestion?
You accepted the answer, but then you seemed to be reporting an additional problem.
0
 

Author Comment

by:hadrons
ID: 39667339
You addressed my primary concern in the accepted answer, however, I did mention the follow-up of an issue - that was unrelated to the primary question - in case someone else came to use this code.

However, I did make some adjustments to the <WHILE> loop based on what you suggested and the results came out as I wanted:

    while (<FILE>) {

    while (/<\?xml version/ig) {
        $open_declaration_count++;

    if ($open_declaration_count > 1) {
    print PARSED1 "Possible parsing problem at the top of file with: " . $file . "\n";
    $open_declaration_count = 0;
    }
    }


    while (/<\/ENDmessage>/ig) {
        $closing_declaration_count++;

    if ($closing_declaration_count > 1) {
    print PARSED1 "Possible parsing problem at the bottom of file with: " . $file . "\n";
    $closing_declaration_count = 0;
    }
    }



}

I not sure I followed your advice correctly, but it has produced what I wanted.
0

Featured Post

Free Tool: IP Lookup

Get more info about an IP address or domain name, such as organization, abuse contacts and geolocation.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
SSRS - Suppress row when blank from 2nd dataset 2 64
regex to extract ip:john 17 74
edit mobile phone number in office 365 15 77
Regex to filter only IP's? 4 58
I've just discovered very important differences between Windows an Unix formats in Perl,at least 5.xx.. MOST IMPORTANT: Use Unix file format while saving Your script. otherwise it will have ^M s or smth likely weird in the EOL, Then DO NOT use m…
A year or so back I was asked to have a play with MongoDB; within half an hour I had downloaded (http://www.mongodb.org/downloads),  installed and started the daemon, and had a console window open. After an hour or two of playing at the command …
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…

828 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question