Solved

PERL Expert Needed To Parse A Log file

Posted on 2012-04-02
8
390 Views
Last Modified: 2013-11-13
Hi,
   I need to parse a directory filled with log files.  These log files will have the block of text below or similar text in the log file with other output that is not needed.  This repeated block of text will be unique only due to a base64 encoded string, but each of the log files will have multiple of the same block of text in each file.  I need code that will parse my logs files, and ONLY EXTRACT the unique blocks of text once even though pattern matching may have happened  6 times in the same file.  Ideally, I'd like to see a print output of ONE block of text, and the number of times it's been matched for each unique block of text (there might be more than one per log file).    Sorry this is a bit nebulous, but I can't be more specific due to policy restrictions.  

Thanks in advance and here is the text.  This is just an example of course of my log file:

Unnecessary Text  blahblahblah blah

This is a block of test.  This block of text will be repeated over and over again in a log file that will have similar matches.  The only unique value in this block of code will be a base64 encoded string:  ie YTM0NZomIzI2OTsmIzM0NTueYQ==.  This will be unique  

Unnecessary Text blahblahblahblah


Output might look like this:

Output for log file.1:
This is a block of test.  This block of text will be repeated over and over again in a log file that will have similar matches.  The only unique value in this block of code will be a base64 encoded string:  ie YTM0NZomIzI2OTsmIzM0NTueYQ==.  This will be unique

Matched this block: 5 times.

Thanks in advance.
0
Comment
Question by:unix_admin777
  • 3
  • 3
8 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 37794770
perl -ne 'BEGIN{$/="This is a block of test.  This block of text will be repeated over and over again in a log file that will have similar matches.  The only unique value in this block of code will be a base64 encoded string:  ie YTM0NZomIzI2OTsmIzM0NTueYQ==.  This will be unique"}END{print "Matched this block: ",$.-!chomp()," times\n"}'
0
 

Author Comment

by:unix_admin777
ID: 37794892
Can you please describe your solution in detail?  I've never seen the BEGIN and END block.  Also, I'm making this part of a larger PERL script so if you can break show an example of a full code block as well, that would be great.  Also, each separate log file will have a a number of these blocks of code with different base64 strings so I don't think your solution will work without a regex.  Thanks for the help though.
0
 
LVL 84

Expert Comment

by:ozo
ID: 37795303
What does the larger Perl script do?
What do the blocks with different base64 strings look like?
Can you give an example of the log files and what you want to do with them?
0
VMware Disaster Recovery and Data Protection

In this expert guide, you’ll learn about the components of a Modern Data Center. You will use cases for the value-added capabilities of Veeam®, including combining backup and replication for VMware disaster recovery and using replication for data center migration.

 

Author Comment

by:unix_admin777
ID: 37797026
I've attached a sample for your reference also.  Please note that the first 3 blocks of text that I want to match are the same, and the next two are different (they have different base64 strings).

Thanks.
log.txt
0
 

Author Comment

by:unix_admin777
ID: 37797188
Here is the expected output:

Log file 1 has the following:

First block match:

This is a block of test.  This block of text will be repeated over and over again in a log file that will have similar matches.  The only unique value in this block of code will be a base64 encoded string:  ie YTM0NZomIzI2OTsmIzM0NTueYQ==.  

This match was found: 3 times.

The Base64 string found in this match is: YTM0NZomIzI2OTsmIzM0NTueYQ==

Second block match:

This is a block of test.  This block of text will be repeated over and over again in a log file that will have similar matches.  The only unique value in this block of code will be a base64 encoded string:  ie YTMWZWEF@JIXzTWEEFSDXQWEff=.  

This match was found: 2 times

The Base64 string found in this match is: YTMWZWEF@JIXzTWEEFSDXQWEff=
0
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 37804791
while( <> ){
    $count{$_}++ if /The only unique value in this block of code will be a base64 encoded string:  ie [\w+\/@]+=+.  This will be unique/;
};
for( keys %count ){
    print "$_\nThis match was found $count{$_} times\n\nThe Base64 string found in this match is: ",/([\w+\/@]+=+)/,"\n\n";
}
0
 
LVL 53

Expert Comment

by:Dhaest
ID: 38249575
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.
0

Featured Post

DevOps Toolchain Recommendations

Read this Gartner Research Note and discover how your IT organization can automate and optimize DevOps processes using a toolchain architecture.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
White board coding practice 3 88
perl: Cleaning meta tags using RegEX 12 79
Controlled Assessment GCSE - desperate help needed 4 85
Help Required 2 32
Checking the Alert Log in AWS RDS Oracle can be a pain through their user interface.  I made a script to download the Alert Log, look for errors, and email me the trace files.  In this article I'll describe what I did and share my script.
Computer science students often experience many of the same frustrations when going through their engineering courses. This article presents seven tips I found useful when completing a bachelors and masters degree in computing which I believe may he…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
In this fifth video of the Xpdf series, we discuss and demonstrate the PDFdetach utility, which is able to list and, more importantly, extract attachments that are embedded in PDF files. It does this via a command line interface, making it suitable …

772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question