unix_admin777
asked on
PERL Expert Needed To Parse A Log file
Hi,
I need to parse a directory filled with log files. These log files contain the block of text below (or similar text) along with other output that is not needed. Each repeated block of text is unique only due to a base64 encoded string, and each log file will contain multiple copies of the same block. I need code that will parse my log files and extract each unique block of text only ONCE, even if the pattern matched 6 times in the same file. Ideally, I'd like a printed output of ONE copy of the block, plus the number of times it was matched, for each unique block of text (there might be more than one per log file). Sorry this is a bit nebulous, but I can't be more specific due to policy restrictions.
Thanks in advance and here is the text. This is just an example of course of my log file:
Unnecessary Text blahblahblah blah
This is a block of test. This block of text will be repeated over and over again in a log file that will have similar matches. The only unique value in this block of code will be a base64 encoded string: ie YTM0NZomIzI2OTsmIzM0NTueYQ ==. This will be unique
Unnecessary Text blahblahblahblah
Output might look like this:
Output for log file.1:
This is a block of test. This block of text will be repeated over and over again in a log file that will have similar matches. The only unique value in this block of code will be a base64 encoded string: ie YTM0NZomIzI2OTsmIzM0NTueYQ ==. This will be unique
Matched this block: 5 times.
Thanks in advance.
perl -ne 'BEGIN{$/="This is a block of test. This block of text will be repeated over and over again in a log file that will have similar matches. The only unique value in this block of code will be a base64 encoded string: ie YTM0NZomIzI2OTsmIzM0NTueYQ ==. This will be unique"}END{print "Matched this block: ",$.-!chomp()," times\n"}'
ASKER
Can you please describe your solution in detail? I've never seen the BEGIN and END blocks. Also, I'm making this part of a larger Perl script, so if you could show an example of a full code block as well, that would be great. Also, each separate log file will have a number of these blocks of code with different base64 strings, so I don't think your solution will work without a regex. Thanks for the help though.
What does the larger Perl script do?
What do the blocks with different base64 strings look like?
Can you give an example of the log files and what you want to do with them?
ASKER
I've also attached a sample for your reference. Please note that the first 3 blocks of text that I want to match are the same, and the next two are different (they have different base64 strings).
Thanks.
log.txt
ASKER
Here is the expected output:
Log file 1 has the following:
First block match:
This is a block of test. This block of text will be repeated over and over again in a log file that will have similar matches. The only unique value in this block of code will be a base64 encoded string: ie YTM0NZomIzI2OTsmIzM0NTueYQ ==.
This match was found: 3 times.
The Base64 string found in this match is: YTM0NZomIzI2OTsmIzM0NTueYQ ==
Second block match:
This is a block of test. This block of text will be repeated over and over again in a log file that will have similar matches. The only unique value in this block of code will be a base64 encoded string: ie YTMWZWEF@JIXzTWEEFSDXQWEff =.
This match was found: 2 times.
The Base64 string found in this match is: YTMWZWEF@JIXzTWEEFSDXQWEff =
ASKER CERTIFIED SOLUTION
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.