wesly_chen asked:

Parse data file with 5 lines in a group separated by "--"

Here is my original question
https://www.experts-exchange.com/questions/27020984/Parse-the-grep-A-4-output-and-strip-out-all-5-lines-if-contain-keyword.html

My data file (in the code area) consists of groups of 5 lines separated by "--".
I would like to parse this data file:
1. If any 5-line group contains "OperationTimeout" or "invalid signature", strip out that whole group.
2. If the LAST group contains fewer than 5 lines, strip it out.

The answer I got doesn't fully meet my second criterion, because my sample file didn't include a group of fewer than 5 lines in the middle. Some of my error message groups contain fewer than 5 lines, but I only want to take out the LAST group.
Also, no "--" at the bottom of the file.

So the final result for this sample code is:
--------------
2011-05-03 21:05:48,019 Thread-6320:[/opt/opinmind/clogs/bidder-cweb/poster/error/577682209-1304456666114.csv] -- Failure occured in the component: auctionCweb-poster
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
--
2011-05-12 02:30:34,027 Thread-12313:[/opt/opinmind/clogs/bidder-cweb/poster/input/auction_contextweb_netezza_ny-1251465803-1305167366115.csv] -- Failure occured in the component: auctionCweb-poster
org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 10000 ms
        at org.apache.createSocket(ReflectionSocketFactory.java:154)

Sample data file (code area):
2011-05-03 21:05:48,019 [ERROR] Thread-6320:[/opt/opinmind/clogs/bidder-cweb/poster/error/577682209-1304456666114.csv] -- Failure occured in the component: auctionCweb-poster
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
--
2011-05-04 17:58:01,756 [ERROR] http-8080-Processor7 -- Error handling bid request [adUnit=728x90,foldCount=0,cookieId=79270535,ipAddress=198.203.177.177,language=en,hashedVisitorGuid=9geiaUQQgnkv0SiMCsn_Pg,impressionGuid=prIrAIjJ9PRf,url=http://www.azcentral.com/ent/celeb/articles/2011/05/04/20110504rob-lowe-inadvertently-flew-911-terrorist-dry-run-flight.html,referUrl=http://www.az.com/ent/celeb/articles/2011/05/04/20110504rob-lowe-inadv...,tagId=81462,userTzOffsetMinutes=null,userAgent=WINDOWS-FIREFOX,hashedVisitorGuid=9geiaUQQgnkv0SiMCsn_Pg,userVisitCount=13,webPageKeyWords=phoenix|news|flight|more|rob lowe|tv|celebrity|home|cars|dining|email|show|th...]
net.spy.memcached.OperationTimeoutException: Timeout waiting for value
        at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:853)
        at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:868)
        at com.opinmind.common.cache.SpyMemcached.get(SpyMemcached.java:111)
--
2011-05-04 18:58:51,889 [ERROR] http-8080-Processor143 -- Error handling notify request
java.lang.RuntimeException: java.lang.RuntimeException: Invalid signature
        at com.opinmind.bidder.cweb.common.MacroEncryptionUtil.decrypt(MacroEncryptionUtil.java:68)
        at com.opinmind.bidder.cweb.dataobjects.CwebBidResponseNotifyInfoFactory.getNotifyInfo(CwebBidResponseNotifyInfoFactory.java:40)
        at com.opinmind.bidder.cweb.action.CwebBidResponseNotifyRequestActionImpl.handleNotifyRequest(CwebBidResponseNotifyRequestActionImpl.java:23)
--
2011-05-12 02:30:34,027 [ERROR] Thread-12313:[/opt/opinmind/clogs/bidder-cweb/poster/input/auction_contextweb_netezza_ny-1251465803-1305167366115.csv] -- Failure occured in the component: auctionCweb-poster
org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 10000 ms
        at org.apache.createSocket(ReflectionSocketFactory.java:154)
--

2011-05-04 18:58:51,901 [ERROR] http-8080-Processor168 -- Error handling notify request
java.lang.RuntimeException: java.lang.RuntimeException: Invalid signature
        at com.opinmind.bidder.cweb.common.MacroEncryptionUtil.decrypt(MacroEncryptionUtil.java:68)
        at com.opinmind.bidder.cweb.dataobjects.CwebBidResponseNotifyInfoFactory.getNotifyInfo(CwebBidResponseNotifyInfoFactory.java:40)
        at com.opinmind.bidder.cweb.action.CwebBidResponseNotifyRequestActionImpl.handleNotifyRequest(CwebBidResponseNotifyRequestActionImpl.java:23)
--
2011-05-04 21:42:03,263 [ERROR] http-8080-Processor169 -- Error handling bid request [adUnit=160x600,foldCount=0,cookieId=108632106,ipAddress=76.95.84.45,language=en,hashedVisitorGuid=UTqCkstu8K-BAKODxyNQ3w,impressionGuid=IKebWgxOZHdm,url=http://www.theybf.com/2011/05/04/alicia-keys-performs-in-torontobeyonces-run-the-world-girls-video-preview,referUrl=http://www.ybf.com/2011/05/04/alicia-keys-performs-in-torontobeyonces-run-...,tagId=95135,userTzOffsetMinutes=null,userAgent=WINDOWS-IE,hashedVisitorGuid=UTqCkstu8K-BAKODxyNQ3w,userVisitCount=75,webPageKeyWords=e|video|baby|beyonce|love|world|toronto|alicia keys|more|basketball|concert|e...]


wilcoxon:

What should be done with a group larger than 5 lines (lines 23-28)?  Or is this just a copy-paste error in your sample file?

Why are lines 19-21 not included in your output?  It is less than 5 lines but is not the last group and does not contain either "OperationTimeout" or "invalid signature".

When you say "strip out" do you mean remove from the input file or just don't print it on output?

How large is the input file?  Can I assume it will fit into memory (and read the whole file in)?
wesly_chen (ASKER):

> What should be done with a group larger than 5 lines (lines 23-28)?
Good catch.

I should rephrase my question.
It should be grouped by (I use "egrep -A 4 \[ERROR  application.log").
So if the original application.log contains
----------
2011-05-04 17:58:01,756 message1
 liine2-1
 line3-1
2011-05-04 17:59:01,756 message 2
 line2-2
2011-05-04 17:59:21,756 message 3
 line2-3
 line3-3
----------
then the output of "egrep -A 5 \[ERROR  application.log" will remain the same,
without "--" to separate the
So the separator should be the line containing
> Why are lines 19-21 not included in your output?
They should be in my expected output, as in my post. The script I got previously stripped lines 19-21.

> When you say "strip out" do you mean remove from the input file or just don't print it on output?
Either way; I prefer to remove it from the input file.

> How large is the input file?
Usually fewer than 30 lines. In extreme cases, such as a lost connection to the database, there will be around 2000 lines.
Hi Wesly,

Sorry for delay in responding to this, and for the problem with my first solution.  I guess that also highlights the need for test data which tests all the main scenarios - but I'm not complaining.

Does this do what you require:
    export EXCLUDE_STRING=$1
    perl -0ne '$d="\n--\n";@F=split $d;for(@F[0..$#F-1]){print "$_$d" if $_ !~ /$ENV{EXCLUDE_STRING}/};END{print "@F[$#F]\n" if @F[$#F] =~ /(\n.*){3}/ and @F[$#F] !~ /$ENV{EXCLUDE_STRING}/}' data.in

Notes:
- If the last group is stripped out, this solution will print "--" at the bottom of the file.
- This solution still matches on "--", not on "".
If either of the above are a problem, let me know.
- You should be able to change your grep from:
  egrep -A 5 "\[ERROR" application.log
to:
  grep -A5 "\[ERROR" application.log
if you like.  egrep is not required in this case (I'm not sure if grep would be faster).
If you want further changes, it might be better to work directly from application.log, rather than from grepped output.  This might depend on the size of application.log and performance requirements though, as grep is probably faster.
Sorry for taking so long to write a solution.  This should handle all of your criteria (remove from original file, not relying on --, etc)...

Here's the script:
#!/usr/local/bin/perl

use strict;
use warnings;
use Tie::File;

# setup regex here - make invalid signature case-insensitive
my $rx = qr(OperationTimeout|(?i:invalid\s+signature));

my $fil = shift or die "Usage: $0 input_file\n";
tie my @file, 'Tie::File', $fil or die "could not tie $fil: $!";
my $start = 0;
# stop at end of file so a log with no [ERROR] line cannot loop forever
$start++ while ($start < @file and $file[$start] !~ m{\[ERROR\]});
my $end = scalar @file;

while ($start < $end) {
    print "checking line $start\n";
    # find block and figure out if we should remove it
    my $skip = ($file[$start] =~ m{$rx}) ? 1 : 0;
    my $len = 1;
    while ($start+$len < $end) {
        last if ($file[$start+$len] =~ m{\[ERROR\]});
        $skip++ if ($file[$start+$len] =~ m{$rx});
        $len++;
    }
    $skip++ if ($start+$len >= $end and $len < 5);
    # remove it if we should
    if ($skip) {
        print "removing block at $start for $len\n";
        splice @file, $start, $len;
        $end = scalar @file;
    } else {
        # block kept: advance past it (after a removal, $start already
        # points at the next block, so we only advance here)
        $start += $len;
    }
}



Given this input (same as yours but with the -- lines removed per your latest comment):
 
2011-05-03 21:05:48,019 [ERROR] Thread-6320:[/opt/opinmind/clogs/bidder-cweb/poster/error/577682209-1304456666114.csv] -- Failure occured in the component: auctionCweb-poster
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
2011-05-04 17:58:01,756 [ERROR] http-8080-Processor7 -- Error handling bid request [adUnit=728x90,foldCount=0,cookieId=79270535,ipAddress=198.203.177.177,language=en,hashedVisitorGuid=9geiaUQQgnkv0SiMCsn_Pg,impressionGuid=prIrAIjJ9PRf,url=http://www.azcentral.com/ent/celeb/articles/2011/05/04/20110504rob-lowe-inadvertently-flew-911-terrorist-dry-run-flight.html,referUrl=http://www.az.com/ent/celeb/articles/2011/05/04/20110504rob-lowe-inadv...,tagId=81462,userTzOffsetMinutes=null,userAgent=WINDOWS-FIREFOX,hashedVisitorGuid=9geiaUQQgnkv0SiMCsn_Pg,userVisitCount=13,webPageKeyWords=phoenix|news|flight|more|rob lowe|tv|celebrity|home|cars|dining|email|show|th...]
net.spy.memcached.OperationTimeoutException: Timeout waiting for value
        at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:853)
        at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:868)
        at com.opinmind.common.cache.SpyMemcached.get(SpyMemcached.java:111)
2011-05-04 18:58:51,889 [ERROR] http-8080-Processor143 -- Error handling notify request
java.lang.RuntimeException: java.lang.RuntimeException: Invalid signature
        at com.opinmind.bidder.cweb.common.MacroEncryptionUtil.decrypt(MacroEncryptionUtil.java:68)
        at com.opinmind.bidder.cweb.dataobjects.CwebBidResponseNotifyInfoFactory.getNotifyInfo(CwebBidResponseNotifyInfoFactory.java:40)
        at com.opinmind.bidder.cweb.action.CwebBidResponseNotifyRequestActionImpl.handleNotifyRequest(CwebBidResponseNotifyRequestActionImpl.java:23)
2011-05-12 02:30:34,027 [ERROR] Thread-12313:[/opt/opinmind/clogs/bidder-cweb/poster/input/auction_contextweb_netezza_ny-1251465803-1305167366115.csv] -- Failure occured in the component: auctionCweb-poster
org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 10000 ms
        at org.apache.createSocket(ReflectionSocketFactory.java:154)

2011-05-04 18:58:51,901 [ERROR] http-8080-Processor168 -- Error handling notify request
java.lang.RuntimeException: java.lang.RuntimeException: Invalid signature
        at com.opinmind.bidder.cweb.common.MacroEncryptionUtil.decrypt(MacroEncryptionUtil.java:68)
        at com.opinmind.bidder.cweb.dataobjects.CwebBidResponseNotifyInfoFactory.getNotifyInfo(CwebBidResponseNotifyInfoFactory.java:40)
        at com.opinmind.bidder.cweb.action.CwebBidResponseNotifyRequestActionImpl.handleNotifyRequest(CwebBidResponseNotifyRequestActionImpl.java:23)
2011-05-04 21:42:03,263 [ERROR] http-8080-Processor169 -- Error handling bid request [adUnit=160x600,foldCount=0,cookieId=108632106,ipAddress=76.95.84.45,language=en,hashedVisitorGuid=UTqCkstu8K-BAKODxyNQ3w,impressionGuid=IKebWgxOZHdm,url=http://www.theybf.com/2011/05/04/alicia-keys-performs-in-torontobeyonces-run-the-world-girls-video-preview,referUrl=http://www.ybf.com/2011/05/04/alicia-keys-performs-in-torontobeyonces-run-...,tagId=95135,userTzOffsetMinutes=null,userAgent=WINDOWS-IE,hashedVisitorGuid=UTqCkstu8K-BAKODxyNQ3w,userVisitCount=75,webPageKeyWords=e|video|baby|beyonce|love|world|toronto|alicia keys|more|basketball|concert|e...]



Produces this output:
 
2011-05-03 21:05:48,019 [ERROR] Thread-6320:[/opt/opinmind/clogs/bidder-cweb/poster/error/577682209-1304456666114.csv] -- Failure occured in the component: auctionCweb-poster
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
2011-05-12 02:30:34,027 [ERROR] Thread-12313:[/opt/opinmind/clogs/bidder-cweb/poster/input/auction_contextweb_netezza_ny-1251465803-1305167366115.csv] -- Failure occured in the component: auctionCweb-poster
org.apache.commons.httpclient.ConnectTimeoutException: The host did not accept the connection within timeout of 10000 ms
        at org.apache.createSocket(ReflectionSocketFactory.java:154)


Hi again Wesly,

See code below with minor adjustments to my solution, the only necessary one being to change "{3}" (which I used for testing) to "{5}":
    export EXCLUDE_STRING=$1
    perl -0ne '$d="\n--\n";@g=split $d;for(@g[0..$#g-1]){print "$_$d" if $_ !~ /$ENV{EXCLUDE_STRING}/};END{print "@g[$#g]\n" if @g[$#g] =~ /(\n.*){5}/ and @g[$#g] !~ /$ENV{EXCLUDE_STRING}/}' data.in

The above still relies on "--" being the separator between groups.

Question:
You said:
'...Then the output file of "egrep -A 5 \[ERROR  application.log" will remain the same
without "--" to separate the '
That egrep line looks as if it will put "--" lines between each match.  How are the "--" lines going to be removed, before the data hits the Perl script?
@tel2
The output of
egrep -A 4 "\[ERROR" application.log
has no "--" between those errors because the error messages are shorter than 5 lines and follow one another directly in the log.
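
A small sketch of why the separator goes missing, assuming GNU grep and a made-up log (the `demo_log` function and its contents are hypothetical, not taken from the real application.log). grep prints "--" only between groups of output lines that are not adjacent in the input, so two short errors that sit back to back come out as one run:

```shell
#!/bin/bash
# Hypothetical sample: two short errors back to back, then unrelated lines,
# then a third error.
demo_log() {
    printf '%s\n' \
        '[ERROR] message 1' \
        ' line2-1' \
        '[ERROR] message 2' \
        ' line2-2' \
        'INFO unrelated a' \
        'INFO unrelated b' \
        'INFO unrelated c' \
        '[ERROR] message 3' \
        ' line2-3'
}

# With one line of trailing context, messages 1 and 2 are contiguous in the
# output, so no "--" appears between them; the only "--" is printed before
# message 3, where input lines were skipped.
demo_log | grep -A 1 '\[ERROR'
```

This is why a parser that splits only on "--" can end up treating several errors as a single group.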

> - If the last group is stripped out, this solution will print "--" at the bottom of the file.
OK

> - This solution still matches on "--", not on "".
OK

I use egrep because I originally tried a regexp in the grep pattern:
egrep -A 4 "\[ERROR\].*PATTERN2" application.log

Your second post looks good.
Could you explain it a little? Thanks.
I'm learning Perl.
@wilcoxon
I may have misled you with my second post.
https://www.experts-exchange.com/questions/27033326/Parse-data-file-with-5-lines-in-a-group-separated-by.html?anchorAnswerId=35760523#a35760523

The input file did include "--" as separator since it is the output of
grep -A 5 "\[ERROR" java_application.log

However, some of original ERROR messages are less than 5 lines and show up in log file sequentially so after "grep -A 5..." it becomes

2011-05-04 17:58:01,756 message1
 liine2-1
 line3-1
2011-05-04 17:59:01,756 message 2
 line2-2
2011-05-04 17:59:21,756 message 3
 line2-3
 line3-3
--
 2011-05-04 17:58:01,756 message1
  line2
  line3
  line4
  line5
--

Besides, I have a shell script to ssh@remotehost and run "grep -A 4...".
So I prefer for one-line perl script or bash shell script so I don't need to re-write my script
(Nagios plug-in actually).


> it might be better to work directly from application.log, rather than from grepped output.
Each application log file is around 11MB.
I have more than 100 application log files on remote machines that need to be parsed within a certain period.
Hi Wesly,

I see your comment to wilcoxon:
> However, some of original ERROR messages are less than 5 lines and show up in log file sequentially so after "grep -A 5..." it becomes
...etc...

Q1. Have you tested my one-liner against data like that?
Q2. Are you satisfied that my one-liner meets all your requirements?

Q3. Can you provide some real (or close to real) data, which has the above difference, which we can use as input for our scripts?

Q4. Do you want or need the "--" separator in the output of the Perl script?
Q5. Do you want or need "--" to be inserted between errors which didn't have it between them in the input?

> Could you explain a little bit. Thanks.
Maybe when we've dealt with all the other issues, including those above, otherwise I might end up changing the script and explaining it again.
A1. Yes
A2. So far so good. I will do more thorough test.
A3. Attached in the code below.
A4. Yes. The separator makes the output easier to read.
A5. Yes, if it is possible.
2011-04-22 01:10:24,573 [ERROR] Thread-4614:[/opt/opinmind/clogs/bidder-cweb/poster/error/auction_contextweb_netezza_ny-1966572191-1303431539902.csv] -- Failure occured in the component: auctionCweb-poster
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
--
2011-04-22 01:10:24,855 [ERROR] Thread-4616:[/opt/opinmind/clogs/bidder-cweb/poster/error/auction_contextweb_netezza_ny-1971631734-1303434062413.csv] -- Failure occured in the component: auctionCweb-poster
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
--
2011-04-22 01:10:25,700 [ERROR] Thread-4621:[/opt/opinmind/clogs/bidder-cweb/poster/error/auction_contextweb_netezza_ny-2081880554-1303433182724.csv] -- Failure occured in the component: auctionCweb-poster
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
2011-04-22 01:10:25,701 [ERROR] Thread-4622:[/opt/opinmind/clogs/bidder-cweb/poster/error/auction_contextweb_netezza_ny-2090010045-1303430760335.csv] -- Failure occured in the component: auctionCweb-poster
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
2011-04-22 01:10:25,701 [ERROR] Thread-4622:[/opt/opinmind/clogs/bidder-cweb/poster/error/auction_contextweb_netezza_ny-2090010045-1303430760335.csv] -- Failure occured in the component: auctionCweb-poster
--
2011-04-22 01:30:02,575 [ERROR] http-8080-Processor17 -- Error handling bid request [adUnit=728x90,foldCount=3,cookieId=78022809,ipAddress=114.76.164.168,language=,hashedVisitorGuid=vwSlqxygXzwVsPVWxrjCGw,impressionGuid=7LtzEi3PZaJz,url=http://www.apnicommunity.com/ram-milayi-jodi/389167-ram-milayee-jodi-21st-april-2011-video-update-watch-online-*hq*.html,referUrl=http://www.apnicommunity.com/ram-milayi-jodi/389167-ram-milayee-jodi-21st-apr...,tagId=26450,userTzOffsetMinutes=null,userAgent=WINDOWS-CHROME,hashedVisitorGuid=vwSlqxygXzwVsPVWxrjCGw,userVisitCount=1,webPageKeyWords=2011|ram|online|watch online|video|tv|television|dvd|who|citizen|community|in...]
net.spy.memcached.OperationTimeoutException: Timeout waiting for value
    at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:853)
    at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:868)
    at com.opinmind.common.cache.SpyMemcached.get(SpyMemcached.java:111)
    at com.opinmind.ssc.cache.UserDataCacheImpl.isOptOut(UserDataCacheImpl.java:187)
--
2011-04-23 18:26:49,493 [ERROR] http-8080-Processor11 -- Error handling notify request
java.lang.RuntimeException: java.lang.RuntimeException: Invalid signature
    at com.opinmind.bidder.cweb.common.MacroEncryptionUtil.decrypt(MacroEncryptionUtil.java:68)
    at com.opinmind.bidder.cweb.dataobjects.CwebBidResponseNotifyInfoFactory.getNotifyInfo(CwebBidResponseNotifyInfoFactory.java:40)
    at com.opinmind.bidder.cweb.action.CwebBidResponseNotifyRequestActionImpl.handleNotifyRequest(CwebBidResponseNotifyRequestActionImpl.java:23)
    at com.opinmind.bidder.cweb.web.BidResponseNotifyRequestHandler.handleRequest(BidResponseNotifyRequestHandler.java:48)
--
2011-04-24 04:37:32,918 [ERROR] http-8080-Processor31 -- Error handling bid request [adUnit=728x90,foldCount=1,cookieId=109505386,ipAddress=68.200.154.93,language=en,hashedVisitorGuid=rzHcH8vRDsafaqHCeYhR2w,impressionGuid=tZtGk9X6BUuM,url=http://leitesculinaria.com/73877/giveaways-farm-together-now.html,referUrl=http://leitesculinaria.com/73877/giveaways-farm-together-now.html,tagId=59040,userTzOffsetMinutes=null,userAgent=WINDOWS-IE,hashedVisitorGuid=rzHcH8vRDsafaqHCeYhR2w,userVisitCount=8,webPageKeyWords=food|e|2011|chocolate|post|recipes|cookies|country|home|love|lunch|recipe|tab...]
net.spy.memcached.OperationTimeoutException: Timeout waiting for value
    at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:853)
    at net.spy.memcached.MemcachedClient.get(MemcachedClient.java:868)
    at com.opinmind.common.cache.SpyMemcached.get(SpyMemcached.java:111)
--
2011-04-22 01:10:24,573 [ERROR] Thread-4614:[/opt/opinmind/clogs/bidder-cweb/poster/error/auction_contextweb_netezza_ny-1966572191-1303431539902.csv] -- Failure occured in the component: auctionCweb-poster
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)


Hi Wesly,

I don't think my one-liner will work on that data, because it will see 3 errors as a single group, and will check for the presense of $EXCLUDE_STRING in the whole group, instead of in each error.  Understand?
On that basis, if you want me to rewrite it, then pls say so, but don't hold your breath, coz I don't know if I'll be making time for it.  It might be different if this exception was specified up front.

> ...So I prefer for one-line perl script or bash shell script so I don't need to re-write my script
I don't think this should be an issue.  Your choices include (but are not limited to) any one of these:
- Calling grep from wilcoxon's script, instead of from your shell script.
- Running wilcoxon's script as a here document in your shell script.
- Calling wilcoxon's script (a separate file), from your shell script.
If you still think it's an issue, please explain exactly why.

Thanks.
> will check for the presense of $EXCLUDE_STRING in the whole group, instead of in each error.
Good catch.
So it needs to either separate each group with or
add separator "--" for each group before parse the $EXCLUDE_STRING.

My script, a Nagios plug-in, does the following:
- Takes the arguments "-H hostname -f /path-to-log-file -c critical_threshold -w warn_threshold -s search_string -e exclude_string
  -d time_stamp_format_type -t $LAST_CHECK_TIME"
- Parses the input arguments
- Checks the log file on the remote host via ssh (or scp) with an ssh key every 9 minutes
- Gets the previous check time stamp (a Nagios built-in argument, in Unix time format such as "date +%s")
- Checks whether the log file time stamp is newer than the previous check time stamp
- Greps the ERROR messages logged after the previous check time stamp with the pattern "[ERROR", plus the following 4 lines if the ERROR message is longer than 5 lines (Java web application)
- Counts how many lines matched [ERROR
- If the count is greater than the critical threshold, prints the "critical" message with the output (this output is
  what I mentioned here)
- If the count is greater than the warning threshold, prints the "warn" message with the output
- Echoes ok if the log file is older than the previous check time stamp or the count is zero.

It would be a big task to rewrite it in Perl. As a plug-in, it is better kept as a single file. Calling an external non-system script is error-prone during Nagios upgrades and migrations.
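
The counting and threshold steps in the list above could be sketched as follows. This is not the asker's plug-in: the function name `check_error_count` and its arguments are hypothetical, and the return codes follow the usual Nagios plug-in convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```shell
#!/bin/bash
# check_error_count OUTPUT WARN CRIT   (hypothetical helper)
# Counts the lines containing "[ERROR" in OUTPUT and maps the count to a
# Nagios-style status line and return code.
check_error_count() {
    local output=$1 warn=$2 crit=$3 count
    # grep -c prints 0 but returns non-zero when nothing matches,
    # so its exit status is deliberately ignored
    count=$(printf '%s\n' "$output" | grep -c '\[ERROR') || true
    if [ "$count" -gt "$crit" ]; then
        echo "CRITICAL: $count errors"
        printf '%s\n' "$output"
        return 2
    elif [ "$count" -gt "$warn" ]; then
        echo "WARNING: $count errors"
        printf '%s\n' "$output"
        return 1
    fi
    echo "OK: $count errors"
    return 0
}
```

In a plug-in, the filtered grep output would be passed as the first argument and the function's return code used as the script's exit status.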
So does this sound like a good option, Wesly?:
> - Running wilcoxon's script as a here document in your shell script.
> as a here document in your shell script
What does this mean?
The Perl script will be contained within your shell script.  Here's a generalised definition:
  http://en.wikipedia.org/wiki/Here_document
If you're OK with the concept, I expect wilcoxon will be happy to show you how to do it with his script.
OK, I'm very interested in how to embed a Perl script (not just a one-liner) in the shell script.
@wilcoxon
Your script seems ok.
Would it be possible to add "--" to separate each block?

With the input file at
https://www.experts-exchange.com/questions/27033326/Parse-data-file-with-5-lines-in-a-group-separated-by.html?cid=1575&anchorAnswerId=35774546#a35774546

grep -v "^--" sample.log > sample2.log
<parse>.pl sample2.log

And the result of sample2.log is
2011-04-22 01:10:24,573 [ERROR] Thread-4614:[/opt/opinmind/clogs/bidder-cweb/poster/error/auction_contextweb_netezza_ny-1966572191-1303431539902.csv] -- Failure occured in the component: auctionCweb-poster
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
--
2011-04-22 01:10:24,855 [ERROR] Thread-4616:[/opt/opinmind/clogs/bidder-cweb/poster/error/auction_contextweb_netezza_ny-1971631734-1303434062413.csv] -- Failure occured in the component: auctionCweb-poster
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
--
2011-04-22 01:10:25,700 [ERROR] Thread-4621:[/opt/opinmind/clogs/bidder-cweb/poster/error/auction_contextweb_netezza_ny-2081880554-1303433182724.csv] -- Failure occured in the component: auctionCweb-poster
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
--
2011-04-22 01:10:25,701 [ERROR] Thread-4622:[/opt/opinmind/clogs/bidder-cweb/poster/error/auction_contextweb_netezza_ny-2090010045-1303430760335.csv] -- Failure occured in the component: auctionCweb-poster
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
--
2011-04-22 01:10:25,701 [ERROR] Thread-4622:[/opt/opinmind/clogs/bidder-cweb/poster/error/auction_contextweb_netezza_ny-2090010045-1303430760335.csv] -- Failure occured in the component: auctionCweb-poster


Also, how do I embed your Perl code in a bash script?
My code will work fine with or without -- in the file.  It uses the lines to separate the blocks, and the only place the line count (the only thing that will change with or without --) matters is the last block, which won't have -- anyway.

To embed perl in bash, you need to do:

#!/bin/bash

perl <<'END_PERL'
# place my script here but remove the #! line
END_PERL

The one gotcha is that it no longer works to call it as "parse.sh input_file" (assuming it used to be called as "parse.pl input_file").  I'm not sure how to read the filename from the command line using the embedded perl within bash.  If you can hard-code the filename, just remove the "my $fil = shift" line and replace $fil on the "tie" line with the quoted filename.
Here's a modification of my script to add -- if it isn't already present...
#!/usr/local/bin/perl

use strict;
use warnings;
use Tie::File;

# setup regex here - make invalid signature case-insensitive
my $rx = qr(OperationTimeout|(?i:invalid\s+signature));

my $fil = shift or die "Usage: $0 input_file\n";
tie my @file, 'Tie::File', $fil or die "could not tie $fil: $!";
my $start = 0;
# stop at end of file so a log with no [ERROR] line cannot loop forever
$start++ while ($start < @file and $file[$start] !~ m{\[ERROR\]});
my $end = scalar @file;

while ($start < $end) {
    print "checking line $start\n";
    # find block and figure out if we should remove it
    my $skip = ($file[$start] =~ m{$rx}) ? 1 : 0;
    my $len = 1;
    while ($start+$len < $end) {
        last if ($file[$start+$len] =~ m{\[ERROR\]});
        $skip++ if ($file[$start+$len] =~ m{$rx});
        $len++;
    }
    $skip++ if ($start+$len >= $end and $len < 5);
    # remove it if we should
    if ($skip) {
        print "removing block at $start for $len\n";
        splice @file, $start, $len;
        $end = scalar @file;
    } else {
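        # add -- on previous line if not present
        # (skipped for the first block, where $file[-1] would be the LAST line)
        unless ($start == 0 or $file[$start-1] =~ m{^--\s*$}) {
            splice @file, $start, 0, '--';
            $len++;
            # the file just grew by one line, so keep $end in sync
            $end = scalar @file;
        }
        # the block was kept, so advance past it
        $start += $len;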
    }
}


Great. It works perfectly.

- Regarding embedding the Perl code in the shell script,

I should replace
> my $fil = shift or die "Usage: $0 input_file\n";
> tie my @file, 'Tie::File', $fil or die "could not tie $fil: $!";
with

export SAMPLE_FILE='/tmp/sample.log'
perl << 'END_PERL'
...
open(file, $ENV{SAMPLE_FILE} )

- Could you describe your Perl code in a bit more detail?
  I'm learning Perl. Thanks.
Not quite...  Here's the bash version of my latest perl script...
#!/bin/bash

perl <<'END_PERL'
use strict;
use warnings;
use Tie::File;

# setup regex here - make invalid signature case-insensitive
my $rx = qr(OperationTimeout|(?i:invalid\s+signature));

tie my @file, "Tie::File", "input_filename_here" or die "could not tie input file: $!";
my $start = 0;
# stop at end of file so a log with no [ERROR] line cannot loop forever
$start++ while ($start < @file and $file[$start] !~ m{\[ERROR\]});
my $end = scalar @file;

while ($start < $end) {
    print "checking line $start\n";
    # find block and figure out if we should remove it
    my $skip = ($file[$start] =~ m{$rx}) ? 1 : 0;
    my $len = 1;
    while ($start+$len < $end) {
        last if ($file[$start+$len] =~ m{\[ERROR\]});
        $skip++ if ($file[$start+$len] =~ m{$rx});
        $len++;
    }
    $skip++ if ($start+$len >= $end and $len < 5);
    # remove it if we should
    if ($skip) {
        print "removing block at $start for $len\n";
        splice @file, $start, $len;
        $end = scalar @file;
    } else {
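        # add -- on previous line if not present
        # (skipped for the first block, where $file[-1] would be the last line)
        unless ($start == 0 or $file[$start-1] =~ m{^--\s*$}) {
            splice @file, $start, 0, "--";
            $len++;
            # the file grew by one line, so keep $end in sync
            $end = scalar @file;
        }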
        # do not advance if we just removed a bunch of lines
        $start += $len;
    }
}
END_PERL


I replaced the code that didn't work with "input_filename_here" - just replace that with the actual filename you want to use.  Unfortunately, this does preclude passing the filename in on the command line (unless you know bash better than I do - I'm not sure how to pass arguments to the "script" inside the bash).


Here's the latest perl script with additional comments that should help explain what it's doing...
 
#!/usr/local/bin/perl

use strict;
use warnings;
use Tie::File;

# setup regex here - make invalid signature case-insensitive
my $rx = qr(OperationTimeout|(?i:invalid\s+signature));

# get the filename from the command line
my $fil = shift or die "Usage: $0 input_file\n";
# tie the input file using the Tie::File module
# this is the easiest way to manipulate a file in-place
tie my @file, 'Tie::File', $fil or die "could not tie $fil: $!";
# set starting line number to 0
my $start = 0;
# "skip" lines until we find a line containing [ERROR]
# (the $start < @file test stops at end of file if no such line exists)
$start++ while ($start < @file and $file[$start] !~ m{\[ERROR\]});
# set end to the number of lines in the file
my $end = scalar @file;

while ($start < $end) {
    # debug statement just to make it easy to see which line the block it is
    # currently checking starts on
    print "checking line $start\n";
    ## find block and figure out if we should remove it
    # set $skip if the $start line matches our regex
    my $skip = ($file[$start] =~ m{$rx}) ? 1 : 0;
    # set number of lines in current block
    my $len = 1;
    while ($start+$len < $end) {
        # break out of loop if we found the next line with [ERROR]
        last if ($file[$start+$len] =~ m{\[ERROR\]});
        # set $skip if the current line matches our regex
        $skip++ if ($file[$start+$len] =~ m{$rx});
        # increment the length of the current block
        $len++;
    }
    # set $skip if this is the last block and if it is < 5 lines
    $skip++ if ($start+$len >= $end and $len < 5);
    ## remove it if we should
    if ($skip) {
        # debug statement to make it easy to see the details of the block
        # being removed
        print "removing block at $start for $len\n";
        # do the actual removal
        # with no replacement list given, splice simply deletes $len lines
        # starting at $start
        splice @file, $start, $len;
        # update our end-of-file line count to match the modified file
        $end = scalar @file;
    } else {
        ## add -- on previous line if not present
        # if the previous line isn't -- then...
        unless ($file[$start-1] =~ m{^--\s*$}) {
            # add a line containing --
            # technically, it replaces a "block" of 0 lines at the current
            # location with --
            splice @file, $start, 0, '--';
            # increment the length of the block to account for the added line
            $len++;
        }
        # advance our position by the length of the current block
        # only happens if we did not remove the current block
        $start += $len;
    }
}


Sorry for the delay with this, guys.  It's been night time on this side of the planet (NZ).

> The one gotcha is that it no longer works to call it as "parse.sh input_file"...
Here's a solution:
    perl - sample2.log <<'END_PERL'   # Or use "$1" instead of "sample2.log"

Wesly, pls note that 'END_PERL' can be almost any text, as long as it doesn't appear in your code on a line by itself.  If you want to indent your code, including the closing END_PERL, then prefix both with the same number of spaces, e.g.:
    perl - sample2.log <<'    END_PERL'
    END_PERL

Also, I expect (but this is a guess), that you can now get rid of the shebang line at the beginning of wilcoxon's code:
    #!/usr/local/bin/perl
If you want to specify a path for Perl, I guess you could do it like this:
    /usr/local/bin/perl - sample2.log <<'END_PERL'
> perl - sample2.log <<'END_PERL'   # Or use "$1" instead of "sample2.log"
----
export SAMPLE_FILE='/tmp/sample.log'
perl - $SAMPLE_FILE << 'END_PERL'
----

This doesn't work.

Besides, I need to pass $EXCLUDE_STRING into
my $rx = qr(OperationTimeout|(?i:invalid\s+signature));
as something like
my $rx = qr($ENV{EXCLUDE_STRING});

How can I do that?
Good idea.  I didn't think about grabbing bash vars from %ENV.  That should work for passing any data into the perl.
So the "final" code (final given all ideas so far) should be...
#!/bin/bash

export EXCLUDE_STRING='OperationTimeout|(?i:invalid\s+signature)'

perl - /input_file <<'END_PERL'
use strict;
use warnings;
use Tie::File;

# build the regex from the EXCLUDE_STRING environment variable
my $rx = qr($ENV{EXCLUDE_STRING});

my $fil = shift or die "Usage: perl - filename\n";
tie my @file, "Tie::File", $fil or die "could not tie input file $fil: $!";
my $start = 0;
$start++ while ($file[$start] !~ m{\[ERROR\]});
my $end = scalar @file;

while ($start < $end) {
    print "checking line $start\n";
    # find block and figure out if we should remove it
    my $skip = ($file[$start] =~ m{$rx}) ? 1 : 0;
    my $len = 1;
    while ($start+$len < $end) {
        last if ($file[$start+$len] =~ m{\[ERROR\]});
        $skip++ if ($file[$start+$len] =~ m{$rx});
        $len++;
    }
    $skip++ if ($start+$len >= $end and $len < 5);
    # remove it if we should
    if ($skip) {
        print "removing block at $start for $len\n";
        splice @file, $start, $len;
        $end = scalar @file;
    } else {
        # add -- on previous line if not present
        unless ($file[$start-1] =~ m{^--\s*$}) {
            splice @file, $start, 0, "--";
            $len++;
        }
        # advance past this block (after a removal we stay at $start,
        # because the next block has moved up to that position)
        $start += $len;
    }
}
END_PERL


Sorry about the comment.

Here is the message I got:
-----------
#!/bin/bash -x
+ /usr/bin/perl - /tmp/sample.log
Use of uninitialized value in pattern match (m//) at - line 13, <$fh> line 2.
Use of uninitialized value in pattern match (m//) at - line 13, <$fh> line 2.
Use of uninitialized value in pattern match (m//) at - line 13, <$fh> line 2.
Use of uninitialized value in pattern match (m//) at - line 13, <$fh> line 2.
... (tons of lines and Ctrl-c to break)
---------

I use CentOS 5.x
bash, version 3.2.25(1)-release (x86_64-redhat-linux-gnu)
perl-5.8.8-32.el5_5.2

export SAMPLE_FILE='/tmp/sample.log'
/usr/bin/perl - $LOG_FILE_LOCAL <<'END_PERL'

use strict;
use warnings;
use Tie::File;

my $rx = qr(OperationTimeout|(?i:invalid\s+signature));
#my $rx = qr($ENV{EXCLUDE_STRING});

# tie my @file, "Tie::File", "$ENV{LOG_FILE_LOCAL}" or die "could not tie input file: $!";
my $fil = shift or die "Usage: $0 input_file\n";
tie my @file, 'Tie::File', $fil or die "could not tie $fil: $!";
my $start = 0;
$start++ while ($file[$start] !~ m{\[ERROR\]});
...


Sorry, what I actually pass is:
/usr/bin/perl - $SAMPLE_FILE <<'END_PERL'
Wesly, did you try my script.sh?

Have you tried wilcoxon's latest script?  You should.

If you can confirm that arguments like this:
    'OperationTimeout|invalid signature'
can be treated case sensitively, and will always contain the same number of spaces, then wilcoxon's script can be simplified.
Great. Both scripts in
https://www.experts-exchange.com/questions/27033326/Parse-data-file-with-5-lines-in-a-group-separated-by.html?cid=1575&anchorAnswerId=35781811#a35781811
and
https://www.experts-exchange.com/questions/27033326/Parse-data-file-with-5-lines-in-a-group-separated-by.html?cid=1575&anchorAnswerId=35781906#a35781906
(replaced line 3 with export EXCLUDE_STRING='Ope...)
work.

However, for some reason, when I copy and paste it into my own script, I keep getting this error message
------
Use of uninitialized value in pattern match (m//) at - line 11, <$fh> line 2.
------
which points to this line of code
$start++ while ($file[$start] !~ m{\[ERROR\]});

I still cannot figure out why it complains about "uninitialized value in pattern match (m//)".
OK, I finally found the root cause of "uninitialized value in pattern match (m//)".
It is because the file I pass to perl is empty; in the real world that happens when there has been no error message in the past 9 minutes.

This brings me to the question: how can I add a check in the perl code for when
sample.log is empty
or no
string?
Sigh - too quick to submit...

Replace the line:

perl - /input_file <<'END_PERL'

with:

export LOG_FILE '/input_file' # or whatever filename you want
perl - $LOG_FILE <<'END_PERL'
Yes, it works.
Excellent! Thanks for all your help!
> export LOG_FILE '/input_file' # or whatever filename you want
Make that:
    export LOG_FILE='/input_file' # or whatever filename you want


Hi Wesly,
What's your response to my last comment in my previous post, about arguments?
@tel2
> If you can confirm that arguments like this:
>    'OperationTimeout|invalid signature'
I will pass a different $EXCLUDE_STRING for each application log check.
Within Nagios it is better to keep the string simple; it is case sensitive and contains an exact number of spaces.