M DXYZ

bash system event monitoring

Hi, I need assistance creating a bash script that greps system events and, if it finds any status other than OK (between the brackets), sends out an email. Please find the output below.

18: Fan1 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
19: Fan2 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
20: Fan3 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
21: Fan4 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
22: Fan5 (Fan): 4100.00 RPM (300.00/NA): [OK]
23: Fan6 (Fan): 4100.00 RPM (300.00/NA): [OK]
24: Fan7/CPU1 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
25: Fan8/CPU2 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
26: Intrusion (Platform Chassis Intrusion): [General Chassis Intrusion]
27: Power Supply (Power Supply): [OK]
28: CPU0 Internal E (Module/Board): [OK]
29: CPU1 Internal E (Module/Board): [OK]
30: CPU Overheat (Module/Board): [OK]
31: Thermal Trip0 (Module/Board): [OK]
32: Thermal Trip1 (Module/Board): [OK]
M DXYZ

ASKER

Please keep in mind that there are some values I need to ignore, for example lines 18-21; those do not work.

Thanks
Tintin

grep -Ev '\[OK\]|Lower Non-Recoverable Threshold' system-log | mail -s "System events" some@user
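
One caveat with filtering on the text `Lower Non-Recoverable Threshold`: it would also hide Fan5 or Fan6 if they failed later. Below is a minimal sketch of an alternative that ignores specific sensor numbers instead, assuming the `N: name (type): ... [status]` layout shown in the question. The function name `filter_by_id` and the `ignore` list are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: drop ignored sensor IDs and healthy readings,
# print everything else. Reads sensor lines on stdin.
filter_by_id() {
  awk -F': ' -v ignore="18 19 20 21" '
    BEGIN {
      # Build a lookup table of sensor IDs to ignore (assumed list).
      n = split(ignore, ids, " ")
      for (i = 1; i <= n; i++) skip[ids[i]] = 1
    }
    ($1 in skip)   { next }   # sensor is on the ignore list
    /\[OK\]/       { next }   # status is a literal [OK]
    { print }                 # anything else needs attention
  '
}

# Demonstration on sample lines from the question.
printf '%s\n' \
  '18: Fan1 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]' \
  '22: Fan5 (Fan): 4100.00 RPM (300.00/NA): [OK]' \
  '26: Intrusion (Platform Chassis Intrusion): [General Chassis Intrusion]' \
  | filter_by_id
```

Fed these three sample lines, the sketch prints only the Intrusion line (26). Sensors 24 and 25 could be added to `ignore` in the same way.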

M DXYZ

ASKER

Hi, here is the output of my script:

9: CPU1 Vcore (Voltage): 1.28 V (0.69/1.63): [OK]
10: CPU2 Vcore (Voltage): 1.30 V (0.69/1.63): [OK]
11: 3.3V (Voltage): 3.36 V (2.93/3.66): [OK]
12: 5V (Voltage): 4.94 V (4.44/5.54): [OK]
13: 12V (Voltage): 11.81 V (10.56/13.44): [OK]
14: -12V (Voltage): -12.30 V (-10.60/-13.40): [OK]
15: 1.5V (Voltage): 1.52 V (1.31/1.68): [OK]
16: 5VSB (Voltage): 4.97 V (4.44/5.54): [OK]
26: Intrusion (Platform Chassis Intrusion): [General Chassis Intrusion]
27: Power Supply (Power Supply): [OK]
28: CPU0 Internal E (Module/Board): [OK]
29: CPU1 Internal E (Module/Board): [OK]
30: CPU Overheat (Module/Board): [OK]
31: Thermal Trip0 (Module/Board): [OK]
32: Thermal Trip1 (Module/Board): [OK]
17: VBAT (Voltage): 3.25 V (2.93/3.66): [OK]
22: Fan5 (Fan): 4100.00 RPM (300.00/NA): [OK]
23: Fan6 (Fan): 4000.00 RPM (300.00/NA): [OK]

Now I would like the script to grep for what is inside the brackets; if anything other than OK appears, it should generate a message and send it to my email. It would also be great to be able to add exceptions, such as the General Chassis Intrusion.

#!/bin/bash
ADMINS=admin@myemail.com
OUT=/root/sensors.out
OUT1=/root/sensors1.out
 
ipmi-sensors > $OUT
 
cat $OUT|grep -Ev '[NA]' > $OUT1
cat $OUT|grep 17: >> $OUT1
cat $OUT|grep 22: >> $OUT1
cat $OUT|grep 23: >> $OUT1
mail -s "System Events" $ADMINS < $OUT1

Tintin

#!/bin/bash
ADMINS=admin@myemail.com
 
ipmi-sensors | grep -Ev '\[OK\]|General Chassis Intrusion' | mail -s "System Events" "$ADMINS"


M DXYZ

ASKER

Hi, what I am trying to do is search for values other than OK within the brackets.

Regards,

Michael

Tintin

That's exactly what my script does.

It will email any line that *doesn't* contain '[OK]' or 'General Chassis Intrusion'

To add other exclusions, just add a

|string

to the grep.
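
As the list of exclusions grows, the single quoted pattern gets hard to read. Here is a minimal sketch of one way to keep it maintainable, assuming bash and the sensor output format shown earlier in this thread. The `EXCLUDE` array and the `filter_events` function are hypothetical names introduced for illustration, and treating `[NA]` as ignorable is an assumption. The brackets are escaped because an unescaped `[OK]` is a bracket expression that matches a single `O` or `K` anywhere on the line.

```shell
#!/bin/bash
# Patterns for lines that should NOT be reported (illustrative list).
EXCLUDE=(
  '\[OK\]'                      # healthy sensors, brackets matched literally
  '\[NA\]'                      # sensors that report no reading
  'General Chassis Intrusion'   # known, accepted condition
)

# Hypothetical helper: read sensor lines on stdin, print the ones to report.
filter_events() {
  # With IFS set to '|', "${EXCLUDE[*]}" joins the patterns into one
  # extended regex: pattern1|pattern2|pattern3
  local IFS='|'
  grep -Ev "${EXCLUDE[*]}"
}

# Demonstration on three sample lines from the output above.
printf '%s\n' \
  '22: Fan5 (Fan): 4100.00 RPM (300.00/NA): [OK]' \
  '24: Fan7/CPU1 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]' \
  '26: Intrusion (Platform Chassis Intrusion): [General Chassis Intrusion]' \
  | filter_events
```

With these sample lines only the Fan7/CPU1 line is printed. Adding an exclusion then means adding one array element rather than editing the regex by hand.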

M DXYZ

ASKER

OK, now, since I will be querying the system and the output is the following:


4: CPU Temp 1 (Temperature): 34.00 C (NA/78.00): [OK]
5: CPU Temp 2 (Temperature): 31.00 C (NA/78.00): [OK]
6: CPU Temp 3 (Temperature): NA (NA/78.00): [NA]
7: CPU Temp 4 (Temperature): NA (NA/78.00): [NA]
8: Sys Temp (Temperature): 36.00 C (NA/78.00): [OK]
9: CPU1 Vcore (Voltage): 1.28 V (0.69/1.63): [OK]
10: CPU2 Vcore (Voltage): 1.30 V (0.69/1.63): [OK]
11: 3.3V (Voltage): 3.36 V (2.93/3.66): [OK]
12: 5V (Voltage): 4.94 V (4.44/5.54): [OK]
13: 12V (Voltage): 11.71 V (10.56/13.44): [OK]
14: -12V (Voltage): -12.10 V (-10.60/-13.40): [OK]
15: 1.5V (Voltage): 1.50 V (1.31/1.68): [OK]
16: 5VSB (Voltage): 4.97 V (4.44/5.54): [OK]
17: VBAT (Voltage): 3.25 V (2.93/3.66): [OK]
18: Fan1 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
19: Fan2 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
20: Fan3 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
21: Fan4 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
22: Fan5 (Fan): 4100.00 RPM (300.00/NA): [OK]
23: Fan6 (Fan): 4000.00 RPM (300.00/NA): [OK]
24: Fan7/CPU1 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
25: Fan8/CPU2 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
26: Intrusion (Platform Chassis Intrusion): [General Chassis Intrusion]
27: Power Supply (Power Supply): [OK]
28: CPU0 Internal E (Module/Board): [OK]
29: CPU1 Internal E (Module/Board): [OK]
30: CPU Overheat (Module/Board): [OK]
31: Thermal Trip0 (Module/Board): [OK]
32: Thermal Trip1 (Module/Board): [OK]



How would you address the problem in case of errors?

I would really appreciate your input.

Regards,


Michael

Tintin

>How would you address the problem in case of errors.

It depends on what the error is. You still need a human to read the error and determine what action is needed, e.g. a fan needing replacement.

Is that what you meant?

M DXYZ

ASKER

I do not believe you understood what I was asking, so let me explain. The purpose of the script is to alert the system administrator if there is a failure in the system. Since I was getting raw output, I had to add exceptions so that the system sends out an email only when an error or any message other than OK is reported.

Thank you for your assistance

#!/bin/bash
ADMINS=myemail@gmail.com
OUT=/root/sensors.out
OUT1=/root/sensors1.out
OUT2=/root/sensors2.out
 
if [ -f $OUT ];then
  rm -f $OUT
fi
 
if [ -f $OUT1 ];then
  rm -f $OUT1
fi
 
if [ -f $OUT2 ];then
  rm -f $OUT2
fi
 
ipmi-sensors > $OUT
 
#cat $OUT|grep -Ev '[OK]|Lower Non-Recoverable Threshold' | mail -s "System events" $ADMINS
cat $OUT|grep -Ev '[NA]' > $OUT1
cat $OUT|grep 17: >> $OUT1
cat $OUT|grep 22: >> $OUT1
cat $OUT|grep 23: >> $OUT1
cat $OUT1| grep -Ev "OK|General Chassis Intrusion" > $OUT2
 
mail -s "System Events" $ADMINS < $OUT2
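
Note that a script of this shape sends a message on every run, even when nothing is wrong and the report is empty. Here is a minimal sketch of sending mail only when something survives the filter, assuming `ipmi-sensors` and `mail` are installed. The function name `alert_if_needed` is hypothetical, and the filter is an assumption that treats `[OK]`, `[NA]` and the chassis intrusion message as ignorable; known-dead fans would still need their own exclusion.

```shell
#!/bin/bash
ADMINS=admin@myemail.com   # placeholder address, as in the scripts above

# Hypothetical helper: read sensor lines on stdin and mail them only if
# at least one line survives the filter.
alert_if_needed() {
  local problems
  # grep exits with status 1 when no line is selected; that is the
  # healthy case here, not an error, hence "|| true".
  problems=$(grep -Ev '\[(OK|NA)\]|General Chassis Intrusion' || true)
  if [ -n "$problems" ]; then
    printf '%s\n' "$problems" | mail -s "System Events" "$ADMINS"
  fi
}

# Only query the hardware when ipmi-sensors is actually installed.
if command -v ipmi-sensors >/dev/null 2>&1; then
  ipmi-sensors | alert_if_needed
fi
```

Because the filtered lines are held in a variable, no temporary files are needed and there is nothing to clean up between runs.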

Tintin

I don't quite understand.  Based on your last script, you want to

Exclude lines with

[NA]
[OK]

But include lines with

17:
22:
23:

Is that correct?

M DXYZ

ASKER

The script I have works for me; I had to narrow down the output so that the system reports anything else. Can you help me shorten the following code, please?

if [ -f $OUT ];then
  rm -f $OUT
fi
 
if [ -f $OUT1 ];then
  rm -f $OUT1
fi
 
if [ -f $OUT2 ];then
  rm -f $OUT2
fi
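
One possible way to shorten this, offered as a sketch rather than as the accepted answer: `rm -f` does not complain about files that do not exist, so the three `[ -f ... ]` tests are not needed. To keep the example runnable anywhere, it uses a temporary directory created with `mktemp -d` instead of `/root`; the `WORKDIR` variable is a hypothetical name introduced here.

```shell
#!/bin/bash
# Hypothetical: a fresh temporary directory stands in for /root so the
# sketch can be run without root privileges.
WORKDIR=$(mktemp -d)
OUT=$WORKDIR/sensors.out
OUT1=$WORKDIR/sensors1.out
OUT2=$WORKDIR/sensors2.out

# Simulate leftovers from a previous run; OUT2 is deliberately absent.
touch "$OUT" "$OUT1"

# rm -f ignores files that do not exist, so one line replaces all three tests.
rm -f "$OUT" "$OUT1" "$OUT2"
```

Since `>` truncates an existing file, files that are only ever written with `>` would not need to be removed at all; only the ones appended to with `>>` would.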

ASKER CERTIFIED SOLUTION
Tintin
