Solved

String counter script

Posted on 2013-11-19
Last Modified: 2013-11-21
I need a script that will glob a series of XML files and count the occurrences of the XML declaration string (<\?xml version) and the ending string (<\/ENDmessage>). If there is more than one of either string, or if one is missing, that's the tell-tale sign of a parsing error (encoding issues will cause this).

I wrote this:

my @files = glob("*clean.xml");
foreach my $file (@files) {

    open FILE, '<:encoding(UTF-8)', $file or warn "Can't open $file: $!";
    open PARSED1, '>:encoding(UTF-8)', ($file . "loading_problems.txt") or warn "Cannot open file for write: $!";

    my $open_declaration_count = 0;
    my $closing_declaration_count = 0;

    while (<FILE>) {

        while (/<\?xml version/ig) {
            $open_declaration_count++;
        }
        while (/<\/ENDmessage>/ig) {
            $closing_declaration_count++;
        }

        if ($open_declaration_count > 1) {
            print "Possible parsing problem with: " . $file . "\n";
        }

        if ($closing_declaration_count > 1) {
            print "Possible parsing problem with: " . $file . "\n";
        }

        if ($open_declaration_count < 1) {
            print "Possible parsing problem with: " . $file . "\n";
        }

        if ($closing_declaration_count < 1) {
            print "Possible parsing problem with: " . $file . "\n";
        }

    }

}

print PARSED1;
close FILE;
close PARSED1;

However, I get no output to a file, and the output to the screen just repeats the same message over and over.
Question by:hadrons
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 39659795
You probably want to
print PARSED1 "Possible parsing problem with: " . $file . "\n";
after you are done parsing <FILE>
instead of printing to the screen for every line of <FILE>
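
A minimal sketch of that suggestion, under some assumptions: the subroutine name count_markers and the in-memory sample data are made up for illustration and are not part of the original script. The counting happens inside the read loop, and the check runs once after the loop has finished.

```perl
use strict;
use warnings;

# Count both marker strings in one pass over an already-open filehandle.
# Reading line by line means only one line is held in memory at a time.
sub count_markers {
    my ($fh) = @_;
    my ($open_count, $close_count) = (0, 0);
    while (my $line = <$fh>) {
        # A scalar /g match advances through the line, one match per iteration.
        $open_count++  while $line =~ /<\?xml version/ig;
        $close_count++ while $line =~ /<\/ENDmessage>/ig;
    }
    return ($open_count, $close_count);
}

# Demonstration on an in-memory "file" (made-up sample data with a
# duplicated XML declaration).
my $sample = "<?xml version='1.0'?>\n<msg/>\n</ENDmessage>\n<?xml version='1.0'?>\n";
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my ($open_count, $close_count) = count_markers($fh);
close $fh;

# The check runs once, after the whole input has been read.
if ($open_count != 1 || $close_count != 1) {
    print "Possible parsing problem with: sample\n";
}
```

In the real script the print would go to PARSED1 rather than the screen, as suggested above.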
 

Author Comment

by:hadrons
ID: 39660733
I did have to dump the last two condition blocks because of the problems this one was giving me:

    if ($closing_declaration_count < 1) {
    print "Possible parsing problem with: " . $file . "\n";
    }
 
LVL 84

Expert Comment

by:ozo
ID: 39660744
What problem was it giving you?
 

Author Comment

by:hadrons
ID: 39660788
It kept printing in a continuous loop; I suspect it's because the string </ENDmessage> should only appear once in a file, and the message is printed for every line that doesn't match that string.

I know I can slurp up the file by adding $/="";, but I wanted to avoid using it because of the large size of the files.  Basically, what I'm looking for is this: if a file doesn't have </ENDmessage>, then execute the print "Possible parsing problem with: " . $file . "\n"; command.
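
For the "missing </ENDmessage>" case specifically, slurping is not needed. The sketch below reads line by line and stops at the first match; the name missing_end_tag and the sample data are assumptions for illustration only. As a side note, $/ = "" switches Perl to paragraph mode (records separated by blank lines); it is undef $/, usually written local $/;, that reads the whole file at once.

```perl
use strict;
use warnings;

# Return true if no line read from the filehandle contains the closing tag.
# The default line-by-line read keeps memory use small for large files,
# and the loop stops as soon as the tag is seen.
sub missing_end_tag {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        return 0 if $line =~ /<\/ENDmessage>/i;
    }
    return 1;
}

# Made-up sample data standing in for a file that was cut off early.
my $truncated = "<?xml version='1.0'?>\n<msg>cut off here\n";
open my $fh, '<', \$truncated or die "Cannot open in-memory sample: $!";
print "Possible parsing problem with: truncated sample\n" if missing_end_tag($fh);
close $fh;
```

Because the test is made only after the read loop ends (or returns early on a match), the message is printed at most once per file.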
 
LVL 84

Expert Comment

by:ozo
ID: 39660807
Your original program was doing that print inside the  while (<FILE>) { loop,
so it would have printed that message once for every line of <FILE> until $closing_declaration_count was incremented.
If I'm understanding what you intended, you don't want to check $closing_declaration_count until after you are done reading through to the end of <FILE>;
 

Author Comment

by:hadrons
ID: 39660828
Yes, you have what I have in mind
 
LVL 84

Expert Comment

by:ozo
ID: 39660853
So did you understand my suggestion?
You accepted the answer, but then you seemed to be reporting an additional problem.
 

Author Comment

by:hadrons
ID: 39667339
You addressed my primary concern in the accepted answer. However, I did mention a follow-up issue, unrelated to the primary question, in case someone else came to use this code.

I also made some adjustments to the while (<FILE>) loop based on what you suggested, and the results came out as I wanted:

    while (<FILE>) {

        while (/<\?xml version/ig) {
            $open_declaration_count++;

            if ($open_declaration_count > 1) {
                print PARSED1 "Possible parsing problem at the top of file with: " . $file . "\n";
                $open_declaration_count = 0;
            }
        }

        while (/<\/ENDmessage>/ig) {
            $closing_declaration_count++;

            if ($closing_declaration_count > 1) {
                print PARSED1 "Possible parsing problem at the bottom of file with: " . $file . "\n";
                $closing_declaration_count = 0;
            }
        }

    }

I'm not sure I followed your advice correctly, but it has produced what I wanted.
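
One caveat for anyone reusing the revised loop: it reports duplicates as they are read, but a file in which a string never appears leaves its counter at 0 and produces no message. A hedged sketch that covers both cases by checking after the read loop follows; the name check_file and the single report file loading_problems.txt are assumptions for illustration (the original script opened one report per input file).

```perl
use strict;
use warnings;

# Return a list of problem messages for one file (empty list if none).
sub check_file {
    my ($path) = @_;
    open my $in, '<:encoding(UTF-8)', $path
        or return ("Cannot open $path ($!)");

    my ($open_count, $close_count) = (0, 0);
    while (my $line = <$in>) {
        $open_count++  while $line =~ /<\?xml version/ig;
        $close_count++ while $line =~ /<\/ENDmessage>/ig;
    }
    close $in;

    # All checks happen once, after the whole file has been read.
    my $prefix = "Possible parsing problem with: $path";
    my @problems;
    push @problems, "$prefix (missing XML declaration)"       if $open_count  == 0;
    push @problems, "$prefix ($open_count XML declarations)"  if $open_count  >  1;
    push @problems, "$prefix (missing </ENDmessage>)"         if $close_count == 0;
    push @problems, "$prefix ($close_count </ENDmessage> tags)" if $close_count > 1;
    return @problems;
}

# Collect messages for every matching file, then write one report.
# The report is only created when there is something to say.
my @report = map { check_file($_) } glob('*clean.xml');
if (@report) {
    open my $out, '>:encoding(UTF-8)', 'loading_problems.txt'
        or die "Cannot open report for write: $!";
    print {$out} "$_\n" for @report;
    close $out;
}
```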