Chris Andrews
troubleshooting sh script
I don't know much about shell scripts, but I would like to run one that I found as a means of troubleshooting some high server loads.
When I run the one below, I get:
/etc/load-process-mon.sh: line 19: [: too many arguments
How can I fix this?
Thank you for any assistance,
Chris
#!/bin/bash
# Define Variables
DT=$(date +"%A %b %e %r")
HOSTNAME= 'hostname'
# Create dir to store data
mkdir -p /opt/loadcheck/
# Retrieve the load average of the past 1 minute
LAVG="uptime | awk {'print $10}' | cut -d. -f1"
LCURRENT="uptime | awk {'print $10,$11,$12}'"
# Define Threshold. This value will be compared with the current load average. Set the value as per your wish.
LIMIT=-1
# Compare the current load average with Threshold and email the server administrator if threshold is greater.
if [ $LAVG -gt $LIMIT ]
then
#Save the current running processes in a file
/bin/ps -auxf >> /opt/ps_output
echo "Current Time :: $DT." >> /tmp/loadmon.txt
echo "Current Load Average :: $LCURRENT." >> /tmp/loadmon.txt
echo "current processes list attached with the email 1 instance." >> /tmp/loadmon.txt
echo "Also check loadps.txt :: loadtop.txt :: netstat_all.txt :: netstat_port80.txt inside /opt/loadcheck/ on the server." >> /tmp/loadmon.txt
# Send email to support
/usr/bin/mutt -s "Server Load ALERT!!! High 1 minute load average on '$HOSTNAME'" -a /opt/ps_output support@somedomain.com > /opt/ps_output
echo "Current Time :: $DT" >> /tmp/loadmon.txt
echo "Current Load Average :: $LCURRENT" >> /tmp/loadmon.txt
echo "current processes list attached with the email 1 instance" >> /tmp/loadmon.txt
echo "Also check loadps.txt :: loadtop.txt :: netstat_all.txt :: netstat_port80.txt inside /opt/loadcheck/ on the server" >> /tmp/loadmon.txt
# Send email to support
/usr/bin/mutt -s " Server Load ALERT ::: High 1 minute load average on '$HOSTNAME' " -a /opt/ps_output chris@andrews.com > /opt/loadcheck/loadps.txt
echo "#########################################################################################################################" >> /opt/loadcheck/loadps.txt
/bin/top -c -n1 >> /opt/loadcheck/loadtop.txt
echo "#########################################################################################################################" >> /opt/loadcheck/loadtop.txt
/bin/netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >> /opt/loadcheck/netstat_all.txt
echo "#########################################################################################################################" >> /opt/loadcheck/netstat_all.txt
/bin/netstat -alntp | grep :80 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >> /opt/loadcheck/netstat_port80.txt
echo "#########################################################################################################################" >> /opt/loadcheck/netstat_port80.txt
/bin/ps -auxf >> /opt/loadcheck/loadps.txt
echo "#########################################################################################################################" >> /opt/loadcheck/loadps.txt
/bin/top -c -n1 >> /opt/loadcheck/loadtop.txt
echo "#########################################################################################################################" >> /opt/loadcheck/loadtop.txt
/bin/netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >> /opt/loadcheck/netstat_all.txt
echo "#########################################################################################################################" >> /opt/loadcheck/netstat_all.txt
/bin/netstat -alntp | grep :80 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >> /opt/loadcheck/netstat_port80.txt
echo "#########################################################################################################################" >> /opt/loadcheck/netstat_port80.txt
/bin/ps -auxf >> /opt/loadcheck/loadps.txt
echo "#########################################################################################################################" >> /opt/loadcheck/loadps.txt
/bin/top -c -n1 >> /opt/loadcheck/loadtop.txt
echo "#########################################################################################################################" >> /opt/loadcheck/loadtop.txt
/bin/netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >> /opt/loadcheck/netstat_all.txt
echo "#########################################################################################################################" >> /opt/loadcheck/netstat_all.txt
/bin/netstat -alntp | grep :80 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >> /opt/loadcheck/netstat_port80.txt
echo "#########################################################################################################################" >> /opt/loadcheck/netstat_port80.txt
/bin/ps -auxf >> /opt/loadcheck/loadps.txt
echo "#########################################################################################################################" >> /opt/loadcheck/loadps.txt
/bin/top -c -n1 >> /opt/loadcheck/loadtop.txt
echo "#########################################################################################################################" >> /opt/loadcheck/loadtop.txt
/bin/netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >> /opt/loadcheck/netstat_all.txt
echo "#########################################################################################################################" >> /opt/loadcheck/netstat_all.txt
/bin/netstat -alntp | grep :80 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >> /opt/loadcheck/netstat_port80.txt
echo "#########################################################################################################################" >> /opt/loadcheck/netstat_port80.txt
/bin/ps -auxf >> /opt/loadcheck/loadps.txt
echo "#########################################################################################################################" >> /opt/loadcheck/loadps.txt
/bin/top -c -n1 >> /opt/loadcheck/loadtop.txt
echo "#########################################################################################################################" >> /opt/loadcheck/loadtop.txt
/bin/netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >> /opt/loadcheck/netstat_all.txt
echo "#########################################################################################################################" >> /opt/loadcheck/netstat_all.txt
/bin/netstat -alntp | grep :80 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >> /opt/loadcheck/netstat_port80.txt
echo "#########################################################################################################################" >> /opt/loadcheck/netstat_port80.txt
fi
# Remove residue logs
/bin/rm -f /tmp/loadmon.txt
/bin/rm -f /opt/ps_output
There is also a typo in the awk statement in the line starting with "LCURRENT= ..."
Corrected:
LCURRENT=`uptime | awk '{print $10,$11,$12}'`
or
LCURRENT=$(uptime | awk '{print $10,$11,$12}')
i.e. the single quote goes before the { , not after it!
Sorry, didn't see it at first sight (copy-and-paste ...)
The same typo in the line starting with "LAVG=..."!
Seems I'm reading (and typing) too fast today.
One other thing is that the length of "uptime" output varies. On two of my machines I have
17:47:47 up 2:08, 1 user, load average: 0.18, 0.17, 0.15
and
17:47:53 up 45 days, 5:05, 2 users, load average: 0.01, 0.12, 0.11
so field 10 is the 15-minute load on one, and the 1-minute load on the other. Change your two lines to:
LAVG=$(uptime | awk '{print $(NF-2)}' | cut -d. -f1)
LCURRENT=$(uptime | awk '{print $(NF-2),$(NF-1),$NF}')
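If you want to check this without waiting for your own uptime output to change shape, you can feed the two sample lines above through the same pipeline. This is only a sketch: the function name one_minute_load is made up for the test and is not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: reads one line of "uptime" output on stdin and prints
# the whole-number part of the 1-minute load average.
one_minute_load() {
    # The three load figures are always the last three fields, so count back
    # from the end (NF) instead of relying on a fixed field number.
    awk '{print $(NF-2)}' | cut -d. -f1
}

# The two sample lines quoted in this thread.
short='17:47:47 up 2:08, 1 user, load average: 0.18, 0.17, 0.15'
long='17:47:53 up 45 days, 5:05, 2 users, load average: 0.01, 0.12, 0.11'

# Fixed field number: lands on a different load figure in each line.
printf '%s\n' "$short" | awk '{print $10}'    # prints 0.15  (15-minute load)
printf '%s\n' "$long"  | awk '{print $10}'    # prints 0.01, (1-minute load)

# Counting from the end: the 1-minute load in both cases.
printf '%s\n' "$short" | one_minute_load      # prints 0
printf '%s\n' "$long"  | one_minute_load      # prints 0
```

The trailing comma that awk leaves on the 1-minute figure does no harm here, because cut keeps only what comes before the first dot.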
ASKER
Ok, with your changes wmp, I now get:
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.7/FAQ
doesn't say what line or anything, so I don't know how to figure that out...
Simon, did it work for you as is?
The error seems to come from "ps" or "top"
Does "ps -auxf" work from the commandline?
Does "top -c -n1" work from the commandline?
Normally it should! I have procps-3.2.6 here (you have 3.2.7) and the above commands work.
wmp
And isn't it /usr/bin/top instead of /bin/top?
ASKER
Yes, both those work from the command line
ASKER
oh, wait a minute, just noticed ps -auxf said:
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.7/FAQ
before it gave me the list.
So you should rewrite the ps and top statements.
It seems that the minus signs are not what they appear to be.
ASKER
"And isn't it /usr/bin/top instead of /bin/top?"
Yes, just checked and changed that.
ASKER
Changed to ps auxf in the script, but still getting the same:
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.7/FAQ
I tried the netstat -ntu commands and they are not issuing a warning by command line.
Does
ps auxf
work from commandline?
ASKER
Yes, without any warnings.
Did you change all occurrences of "ps -auxf" to "ps auxf"?
If you did, I fear I'll soon run out of ideas ...
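A quick way to be sure none were missed is to let grep list the lines that still contain the old form. A small sketch: the function name find_dashed_ps is only for illustration, and the script path is the one from the question.

```shell
#!/bin/bash
# Hypothetical helper: print every line (with its line number) of the given
# file that still calls ps with the dashed option string.
find_dashed_ps() {
    grep -n 'ps -auxf' "$1"
}

# Only look at the real script if it exists on this machine.
script=/etc/load-process-mon.sh
if [ -f "$script" ]; then
    find_dashed_ps "$script" || echo "no 'ps -auxf' left in $script"
fi
```

Each line it prints starts with the line number, so you know exactly where to edit. No output from grep means every occurrence has been changed.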
ASKER
Ah, oh, now I feel foolish - no, I missed some. I'm sorry.
Trying now, hang on...
ASKER
Ah, ok, that fixed the warning.
Now, the instructions with this script say to run it by cron...
bin/sh /etc/load-process-mon.sh
If I do that right now with the command line, I enter the above line, press enter, and my ssh client bumps the cursor down to the next line and nothing else happens. It never brings me back to the:
-bash-3.2#
prompt.
Do I need something more at the end of the script to finish this out? Or is the script having a fatal failure somewhere still?
If this isn't something like a simple 'add this to the end' answer, I'll start this as a whole new question. In the meantime I will award points here, thank you,
Chris
This could come from line 30 (the mutt command).
It might be waiting for you to enter a body text.
Try
date | /usr/bin/mutt -s "Server Load ALERT!!! High 1 minute load average on '$HOSTNAME'" -a /opt/ps_output support@somedomain.com > /opt/ps_output
This will write the date/time string to the mail body, so mutt no longer has to wait for input.
You could also reverse the last ">" to "<". This will put the content of /opt/ps_output into the body instead of emptying this file:
/usr/bin/mutt -s "Server Load ALERT!!! High 1 minute load average on '$HOSTNAME'" -a /opt/ps_output support@somedomain.com < /opt/ps_output
wmp
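The general idea (always give the mailer its body on standard input so it never waits on the terminal) could be wrapped up roughly like this. Treat it as a sketch under assumptions: the MAILER variable and the send_alert function are names invented here, and the -a attachment from the original command is left out to keep it short.

```shell
#!/bin/bash
# MAILER is an assumption added for this sketch: it defaults to mutt, but can
# be pointed at any command that accepts "-s SUBJECT RECIPIENT" and reads the
# message body from standard input.
MAILER=${MAILER:-/usr/bin/mutt}

# Hypothetical helper: send_alert SUBJECT BODY_FILE RECIPIENT
send_alert() {
    # Redirecting stdin from the body file means the mailer never sits
    # waiting for text to be typed, which is what made the script hang.
    "$MAILER" -s "$1" "$3" < "$2"
}
```

In the script it would be called along the lines of: send_alert "Server Load ALERT" /tmp/loadmon.txt support@somedomain.com, after the echo lines have filled /tmp/loadmon.txt.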
ASKER
Thank you! Working now.
Oh, now I have a whole bunch of details on a high load and I don't know what they mean :) THAT is for another question!
Thanks WMP, appreciate you sticking with it here,
Chris
HOSTNAME= 'hostname' #Wrong!
LAVG="uptime | awk {'print $10}' | cut -d. -f1" #Wrong!
LCURRENT="uptime | awk {'print $10,$11,$12}'" #Wrong!
It must look like this (backticks, "accent grave"):
HOSTNAME=`hostname`
LAVG=`uptime | awk {'print $10}' | cut -d. -f1`
LCURRENT=`uptime | awk {'print $10,$11,$12}'`
or like this (POSIX notation):
LAVG=$(uptime | awk {'print $10}' | cut -d. -f1)
HOSTNAME=$(hostname)
LCURRENT=$(uptime | awk {'print $10,$11,$12}')
I'd recommend the last version! It's the most readable, and the risk of hitting the wrong keys by mistake is rather low!
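To see why the double-quoted assignments end in "[: too many arguments", it helps to count the words the shell hands over in each case. A rough sketch: count_words is a made-up helper, and a fixed sample line (one of the uptime outputs quoted in this thread) stands in for uptime so the result is repeatable.

```shell
#!/bin/bash
# Hypothetical helper: report how many separate words it was given.
count_words() { echo "$#"; }

# Double quotes only store the command text; nothing is executed.
# (\$10 keeps the text literal for this demo.)
as_text="uptime | awk '{print \$10}' | cut -d. -f1"

# $( ) runs the pipeline and stores its output instead.
sample='17:47:47 up 2:08, 1 user, load average: 0.18, 0.17, 0.15'
as_output=$(printf '%s\n' "$sample" | awk '{print $(NF-2)}' | cut -d. -f1)

# Unquoted on purpose, to mirror the unquoted $LAVG inside [ ... ].
count_words $as_text      # prints 9: far too many for "[ ... -gt ... ]"
count_words $as_output    # prints 1: a single number that -gt can compare
```

With the text version, the test receives nine words where it expects one number, which is exactly what the error message is complaining about.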