M DXYZ
asked on
bash system event monitoring
Hi, I need help creating a bash script that greps the system event output and sends an email if it finds any status other than OK (between the brackets). Please find the output below.
18: Fan1 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
19: Fan2 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
20: Fan3 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
21: Fan4 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
22: Fan5 (Fan): 4100.00 RPM (300.00/NA): [OK]
23: Fan6 (Fan): 4100.00 RPM (300.00/NA): [OK]
24: Fan7/CPU1 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
25: Fan8/CPU2 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
26: Intrusion (Platform Chassis Intrusion): [General Chassis Intrusion]
27: Power Supply (Power Supply): [OK]
28: CPU0 Internal E (Module/Board): [OK]
29: CPU1 Internal E (Module/Board): [OK]
30: CPU Overheat (Module/Board): [OK]
31: Thermal Trip0 (Module/Board): [OK]
32: Thermal Trip1 (Module/Board): [OK]
grep -Ev '\[OK\]|Lower Non-Recoverable Threshold' system-log | mail -s "System events" some@user
ASKER
Hi, here is the output of my script:
9: CPU1 Vcore (Voltage): 1.28 V (0.69/1.63): [OK]
10: CPU2 Vcore (Voltage): 1.30 V (0.69/1.63): [OK]
11: 3.3V (Voltage): 3.36 V (2.93/3.66): [OK]
12: 5V (Voltage): 4.94 V (4.44/5.54): [OK]
13: 12V (Voltage): 11.81 V (10.56/13.44): [OK]
14: -12V (Voltage): -12.30 V (-10.60/-13.40): [OK]
15: 1.5V (Voltage): 1.52 V (1.31/1.68): [OK]
16: 5VSB (Voltage): 4.97 V (4.44/5.54): [OK]
26: Intrusion (Platform Chassis Intrusion): [General Chassis Intrusion]
27: Power Supply (Power Supply): [OK]
28: CPU0 Internal E (Module/Board): [OK]
29: CPU1 Internal E (Module/Board): [OK]
30: CPU Overheat (Module/Board): [OK]
31: Thermal Trip0 (Module/Board): [OK]
32: Thermal Trip1 (Module/Board): [OK]
17: VBAT (Voltage): 3.25 V (2.93/3.66): [OK]
22: Fan5 (Fan): 4100.00 RPM (300.00/NA): [OK]
23: Fan6 (Fan): 4000.00 RPM (300.00/NA): [OK]
Now I would like the script to grep for what is inside the brackets; if anything other than OK appears, it should generate a message and send it to my email. It would also be great to be able to add exceptions, such as the General Chassis Intrusion.
#!/bin/bash
ADMINS=admin@myemail.com
OUT=/root/sensors.out
OUT1=/root/sensors1.out
ipmi-sensors > $OUT
cat $OUT|grep -Ev '[NA]' > $OUT1
cat $OUT|grep 17: >> $OUT1
cat $OUT|grep 22: >> $OUT1
cat $OUT|grep 23: >> $OUT1
mail -s "System Events" $ADMINS < $OUT1
#!/bin/bash
ADMINS=admin@myemail.com
ipmi-sensors | grep -Ev '\[OK\]|General Chassis Intrusion' | mail -s "System Events" "$ADMINS"
ASKER
Hi, what I am trying to do is search for values other than OK within the brackets.
Regards,
Michael
That's exactly what my script does.
It will email any line that *doesn't* contain '[OK]' or 'General Chassis Intrusion'
To add other exclusions, just add a
|string
to the grep.
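As a rough sketch of the "add a |string" idea, the exclusions can be kept in one variable and extended as needed. The names `EXCLUDE` and `non_ok_events` are made up for illustration. The brackets are escaped here because an unescaped `[OK]` in a regular expression means "the letter O or the letter K" rather than the literal text "[OK]".

```shell
# Patterns for lines to ignore, joined with | (alternation in extended regex).
# The brackets are escaped so that \[OK\] matches the literal text "[OK]".
EXCLUDE='\[OK\]|General Chassis Intrusion'

# To ignore sensors that report no reading as well, append another alternative.
EXCLUDE="$EXCLUDE|\[NA\]"

# non_ok_events (hypothetical name): print every stdin line that matches
# none of the patterns above (-E extended regex, -v invert the match).
non_ok_events() {
    grep -Ev "$EXCLUDE"
}
```

It could then be used as `ipmi-sensors | non_ok_events | mail -s "System Events" "$ADMINS"`.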
ASKER
OK. Now, I will be querying the system, and the output is the following:
4: CPU Temp 1 (Temperature): 34.00 C (NA/78.00): [OK]
5: CPU Temp 2 (Temperature): 31.00 C (NA/78.00): [OK]
6: CPU Temp 3 (Temperature): NA (NA/78.00): [NA]
7: CPU Temp 4 (Temperature): NA (NA/78.00): [NA]
8: Sys Temp (Temperature): 36.00 C (NA/78.00): [OK]
9: CPU1 Vcore (Voltage): 1.28 V (0.69/1.63): [OK]
10: CPU2 Vcore (Voltage): 1.30 V (0.69/1.63): [OK]
11: 3.3V (Voltage): 3.36 V (2.93/3.66): [OK]
12: 5V (Voltage): 4.94 V (4.44/5.54): [OK]
13: 12V (Voltage): 11.71 V (10.56/13.44): [OK]
14: -12V (Voltage): -12.10 V (-10.60/-13.40): [OK]
15: 1.5V (Voltage): 1.50 V (1.31/1.68): [OK]
16: 5VSB (Voltage): 4.97 V (4.44/5.54): [OK]
17: VBAT (Voltage): 3.25 V (2.93/3.66): [OK]
18: Fan1 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
19: Fan2 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
20: Fan3 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
21: Fan4 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
22: Fan5 (Fan): 4100.00 RPM (300.00/NA): [OK]
23: Fan6 (Fan): 4000.00 RPM (300.00/NA): [OK]
24: Fan7/CPU1 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
25: Fan8/CPU2 (Fan): 0.00 RPM (300.00/NA): [At or Below (<=) Lower Non-Recoverable Threshold]
26: Intrusion (Platform Chassis Intrusion): [General Chassis Intrusion]
27: Power Supply (Power Supply): [OK]
28: CPU0 Internal E (Module/Board): [OK]
29: CPU1 Internal E (Module/Board): [OK]
30: CPU Overheat (Module/Board): [OK]
31: Thermal Trip0 (Module/Board): [OK]
32: Thermal Trip1 (Module/Board): [OK]
How would you address the problem in case of errors?
I would really appreciate your input.
Regards,
Michael
>How would you address the problem in case of errors.
It depends on what the error is. You still need a human to read the error and determine what action is needed, e.g. a fan needing replacement.
Is that what you meant?
ASKER
I do not believe you understood what I was asking, so let me explain. The purpose of the script is to alert the system administrator if there is a failure within the system. Since I was getting raw output, I had to look for exceptions so that the system sends out an email whenever an error or any message other than OK is reported.
Thank you for your assistance
#!/bin/bash
ADMINS=myemail@gmail.com
OUT=/root/sensors.out
OUT1=/root/sensors1.out
OUT2=/root/sensors2.out
if [ -f $OUT ];then
rm -f $OUT
fi
if [ -f $OUT1 ];then
rm -f $OUT1
fi
if [ -f $OUT2 ];then
rm -f $OUT2
fi
ipmi-sensors > $OUT
#cat $OUT|grep -Ev '[OK]|Lower Non-Recoverable Threshold' | mail -s "System events" $ADMINS
cat $OUT|grep -Ev '[NA]' > $OUT1
cat $OUT|grep 17: >> $OUT1
cat $OUT|grep 22: >> $OUT1
cat $OUT|grep 23: >> $OUT1
cat $OUT1| grep -Ev "OK|General Chassis Intrusion" > $OUT2
mail -s "System Events" $ADMINS < $OUT2
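Note that the script above calls mail on every run, even when nothing is left after filtering. A minimal sketch of alerting only when at least one line remains is shown here. The function name `alert_if_events` is hypothetical, and the sketch assumes the filtered lines arrive on stdin.

```shell
# alert_if_events (hypothetical name): read filtered sensor lines on stdin
# and hand them to the given command only if at least one line is present.
alert_if_events() {
    events=$(cat)
    if [ -n "$events" ]; then
        # Run the command passed as arguments with the events on its stdin.
        printf '%s\n' "$events" | "$@"
    fi
}
```

With the variables from the script, the last two lines could become something like `grep -Ev "OK|General Chassis Intrusion" "$OUT1" | alert_if_events mail -s "System Events" "$ADMINS"`.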
I don't quite understand. Based on your last script, you want to
Exclude lines with
[NA]
[OK]
But include lines with
17:
22:
23:
Is that correct?
ASKER
The script I have works for me; I had to narrow down the output so that the system reports anything other than OK. Can you help me shorten the following code, please?
if [ -f $OUT ];then
rm -f $OUT
fi
if [ -f $OUT1 ];then
rm -f $OUT1
fi
if [ -f $OUT2 ];then
rm -f $OUT2
fi
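One possible way to shorten this, offered only as a sketch and not as the accepted answer: `rm -f` does not complain about files that do not exist, so the `if [ -f ... ]` tests are not needed and a single call can take all three names. The wrapper name `remove_stale` is made up for illustration.

```shell
# remove_stale (hypothetical name): delete the given files, silently
# skipping any that do not exist (that is what the -f option is for).
remove_stale() {
    rm -f -- "$@"
}
```

In the script this would be called as `remove_stale "$OUT" "$OUT1" "$OUT2"`, or written directly as `rm -f "$OUT" "$OUT1" "$OUT2"`.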
ASKER
THANX