Solved

PERL Expert Needed To Parse A Log file

Posted on 2012-04-02
Last Modified: 2013-11-13
Hi,
   I need to parse a directory filled with log files. Each log file contains the block of text below (or similar text), mixed in with other output that is not needed. The blocks differ only in a base64-encoded string, and each log file will contain multiple copies of the same block. I need code that will parse my log files and extract each unique block ONLY ONCE, even if the pattern matched 6 times in the same file. Ideally, I'd like the output to print each unique block once, along with the number of times it was matched (there might be more than one unique block per log file). Sorry this is a bit nebulous, but I can't be more specific due to policy restrictions.

Thanks in advance. Here is the text; this is just an example of my log file, of course:

Unnecessary Text  blahblahblah blah

This is a block of test.  This block of text will be repeated over and over again in a log file that will have similar matches.  The only unique value in this block of code will be a base64 encoded string:  ie YTM0NZomIzI2OTsmIzM0NTueYQ==.  This will be unique  

Unnecessary Text blahblahblahblah


Output might look like this:

Output for log file.1:
This is a block of test.  This block of text will be repeated over and over again in a log file that will have similar matches.  The only unique value in this block of code will be a base64 encoded string:  ie YTM0NZomIzI2OTsmIzM0NTueYQ==.  This will be unique

Matched this block: 5 times.

Thanks in advance.
Question by:unix_admin777
8 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 37794770
perl -ne 'BEGIN{$/="This is a block of test.  This block of text will be repeated over and over again in a log file that will have similar matches.  The only unique value in this block of code will be a base64 encoded string:  ie YTM0NZomIzI2OTsmIzM0NTueYQ==.  This will be unique"}END{print "Matched this block: ",$.-!chomp()," times\n"}'
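For readers who have not met this idiom: `-n` wraps the code in an implicit `while (<>) { ... }` loop, `BEGIN` runs once before that loop and `END` runs once after it. Setting `$/` (the input record separator) to the block text makes every "line" end at one occurrence of the block, so counting records counts matches. Below is a minimal longer-form sketch of the same idea, not the poster's own code; the sub name `count_block` and the convention of passing the block text as the first argument are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count how often $block occurs
# in the data read from $fh by using the block as the record separator.
sub count_block {
    my ($fh, $block) = @_;
    local $/ = $block;            # each record now ends at the block text
    my $count = 0;
    while (my $record = <$fh>) {
        # chomp strips a trailing $/ and returns the number of characters
        # removed, so it is true only if the record ended with the block.
        $count++ if chomp $record;
    }
    return $count;
}

# Assumed usage: first argument is the block text, the rest are log files.
my ($block, @files) = @ARGV;
for my $file (@files) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $n = count_block($fh, $block);
    close $fh;
    print "$file: matched this block $n times\n";
}
```

Like the one-liner, this only counts one fixed block, so it does not yet handle several different base64 strings in the same file.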
 

Author Comment

by:unix_admin777
ID: 37794892
Can you please describe your solution in detail?  I've never seen BEGIN and END blocks before.  Also, I'm making this part of a larger Perl script, so if you can show an example of a full code block as well, that would be great.  Also, each log file will have a number of these blocks with different base64 strings, so I don't think your solution will work without a regex.  Thanks for the help though.
 
LVL 84

Expert Comment

by:ozo
ID: 37795303
What does the larger Perl script do?
What do the blocks with different base64 strings look like?
Can you give an example of the log files and what you want to do with them?

Author Comment

by:unix_admin777
ID: 37797026
I've attached a sample for your reference also.  Please note that the first 3 blocks of text that I want to match are the same, and the next two are different (they have different base64 strings).

Thanks.
log.txt
 

Author Comment

by:unix_admin777
ID: 37797188
Here is the expected output:

Log file 1 has the following:

First block match:

This is a block of test.  This block of text will be repeated over and over again in a log file that will have similar matches.  The only unique value in this block of code will be a base64 encoded string:  ie YTM0NZomIzI2OTsmIzM0NTueYQ==.  

This match was found: 3 times.

The Base64 string found in this match is: YTM0NZomIzI2OTsmIzM0NTueYQ==

Second block match:

This is a block of test.  This block of text will be repeated over and over again in a log file that will have similar matches.  The only unique value in this block of code will be a base64 encoded string:  ie YTMWZWEF@JIXzTWEEFSDXQWEff=.  

This match was found: 2 times

The Base64 string found in this match is: YTMWZWEF@JIXzTWEEFSDXQWEff=
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 37804791
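# Count every line that contains a matching block, keyed by the full line.
while( <> ){
    $count{$_}++ if /The only unique value in this block of code will be a base64 encoded string:  ie [\w+\/@]+=+\.  This will be unique/;
}
# Print each distinct block once, with its count and its base64 string.
for( keys %count ){
    print "$_\nThis match was found $count{$_} times\n\nThe Base64 string found in this match is: ",/([\w+\/@]+=+)/,"\n\n";
}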
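The original question also asked for results per log file, and a plain `keys %count` returns blocks in no particular order. A possible variation, sketched under the same assumption as the accepted answer (each block sits on a single line), keeps a separate tally for each file and prints blocks in the order they were first seen. The sub name `tally_blocks`, the shortened pattern, and the script name in the usage note are illustrative assumptions, not part of the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: the block is one line and ends its base64 string with '=' and
# a period. '@' is allowed because the thread's second sample contains one.
my $BLOCK_RE = qr{base64 encoded string:\s+ie\s+([\w+/\@]+=+)\.};

# Hypothetical helper: returns (\%count, \@order) for matching lines in $fh.
sub tally_blocks {
    my ($fh) = @_;
    my (%count, @order);
    while (my $line = <$fh>) {
        next unless $line =~ $BLOCK_RE;
        chomp $line;
        # Remember first-seen order; the count is bumped either way.
        push @order, $line unless $count{$line}++;
    }
    return (\%count, \@order);
}

# One report per log file named on the command line.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my ($count, $order) = tally_blocks($fh);
    close $fh;
    print "Output for $file:\n";
    for my $block (@$order) {
        my ($b64) = $block =~ $BLOCK_RE;
        print "$block\n\nThis match was found: $count->{$block} times.\n\n",
              "The Base64 string found in this match is: $b64\n\n";
    }
}
```

It could be run over a whole directory with something like `perl count_blocks.pl /path/to/logs/*`, where the script name and path are placeholders. Keying the hash on the captured base64 string instead of the full line would make the tally tolerant of stray whitespace differences between otherwise identical blocks.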
 
LVL 53

Expert Comment

by:Dhaest
ID: 38249575
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.