Chris Andrews asked:

troubleshooting sh script

I don't know much about shell scripts, but I would like to run one that I found as a means of troubleshooting some high server loads.

When I run the one below, I get:

/etc/load-process-mon.sh: line 19: [: too many arguments

How can I fix this?

Thank you for any assistance,

Chris
#!/bin/bash

# Define Variables
DT=$(date +"%A %b %e %r")
HOSTNAME= 'hostname'

# Create dir to store data
mkdir -p /opt/loadcheck/

# Retrieve the load average of the past 1 minute
LAVG="uptime | awk {'print $10}' | cut -d. -f1"
LCURRENT="uptime | awk {'print $10,$11,$12}'"

# Define Threshold. This value will be compared with the current load average. Set the value as per your wish.
LIMIT=-1

# Compare the current load average with Threshold and email the server administrator if threshold is greater.

if [ $LAVG -gt $LIMIT ]
then

#Save the current running processes in a file
/bin/ps -auxf >> /opt/ps_output

echo "Current Time :: $DT." >> /tmp/loadmon.txt
echo "Current Load Average :: $LCURRENT." >> /tmp/loadmon.txt
echo "current processes list attached with the email 1 instance." >> /tmp/loadmon.txt
echo "Also check loadps.txt :: loadtop.txt :: netstat_all.txt :: netstat_port80.txt inside /opt/loadcheck/ on the server." >> /tmp/loadmon.txt
# Send email to support
/usr/bin/mutt -s "Server Load ALERT!!! High 1 minute load average on '$HOSTNAME'" -a /opt/ps_output support@somedomain.com > /opt/ps_output

echo "Current Time :: $DT" >> /tmp/loadmon.txt
echo "Current Load Average :: $LCURRENT" >> /tmp/loadmon.txt
echo "current processes list attached with the email 1 instance" >> /tmp/loadmon.txt
echo "Also check loadps.txt :: loadtop.txt :: netstat_all.txt :: netstat_port80.txt inside /opt/loadcheck/ on the server" >> /tmp/loadmon.txt
# Send email to support
/usr/bin/mutt -s " Server Load ALERT ::: High 1 minute load average on '$HOSTNAME' " -a /opt/ps_output chris@andrews.com > /opt/loadcheck/loadps.txt
echo "#########################################################################################################################" >> /opt/loadcheck/loadps.txt
/bin/top -c -n1 >> /opt/loadcheck/loadtop.txt
echo "#########################################################################################################################" >> /opt/loadcheck/loadtop.txt
/bin/netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >> /opt/loadcheck/netstat_all.txt
echo "#########################################################################################################################" >> /opt/loadcheck/netstat_all.txt
/bin/netstat -alntp | grep :80 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >> /opt/loadcheck/netstat_port80.txt
echo "#########################################################################################################################" >> /opt/loadcheck/netstat_port80.txt

/bin/ps -auxf >> /opt/loadcheck/loadps.txt
echo "#########################################################################################################################" >> /opt/loadcheck/loadps.txt
/bin/top -c -n1 >> /opt/loadcheck/loadtop.txt
echo "#########################################################################################################################" >> /opt/loadcheck/loadtop.txt
/bin/netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >> /opt/loadcheck/netstat_all.txt
echo "#########################################################################################################################" >> /opt/loadcheck/netstat_all.txt
/bin/netstat -alntp | grep :80 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >> /opt/loadcheck/netstat_port80.txt
echo "#########################################################################################################################" >> /opt/loadcheck/netstat_port80.txt

/bin/ps -auxf >> /opt/loadcheck/loadps.txt
echo "#########################################################################################################################" >> /opt/loadcheck/loadps.txt
/bin/top -c -n1 >> /opt/loadcheck/loadtop.txt
echo "#########################################################################################################################" >> /opt/loadcheck/loadtop.txt
/bin/netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >> /opt/loadcheck/netstat_all.txt
echo "#########################################################################################################################" >> /opt/loadcheck/netstat_all.txt
/bin/netstat -alntp | grep :80 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >> /opt/loadcheck/netstat_port80.txt
echo "#########################################################################################################################" >> /opt/loadcheck/netstat_port80.txt

/bin/ps -auxf >> /opt/loadcheck/loadps.txt
echo "#########################################################################################################################" >> /opt/loadcheck/loadps.txt
/bin/top -c -n1 >> /opt/loadcheck/loadtop.txt
echo "#########################################################################################################################" >> /opt/loadcheck/loadtop.txt
/bin/netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >> /opt/loadcheck/netstat_all.txt
echo "#########################################################################################################################" >> /opt/loadcheck/netstat_all.txt
/bin/netstat -alntp | grep :80 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >> /opt/loadcheck/netstat_port80.txt
echo "#########################################################################################################################" >> /opt/loadcheck/netstat_port80.txt

/bin/ps -auxf >> /opt/loadcheck/loadps.txt
echo "#########################################################################################################################" >> /opt/loadcheck/loadps.txt
/bin/top -c -n1 >> /opt/loadcheck/loadtop.txt
echo "#########################################################################################################################" >> /opt/loadcheck/loadtop.txt
/bin/netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >> /opt/loadcheck/netstat_all.txt
echo "#########################################################################################################################" >> /opt/loadcheck/netstat_all.txt
/bin/netstat -alntp | grep :80 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >> /opt/loadcheck/netstat_port80.txt
echo ” ######################################################################################################################### ” >> /opt/loadcheck/netstat_port80.txt

fi

# Remove residue logs
/bin/rm -f /tmp/loadmon.txt
/bin/rm -f /opt/ps_output


woolmilkporc:

You're using the wrong quote characters here:

HOSTNAME= 'hostname'  #Wrong!
LAVG="uptime | awk {'print $10}' | cut -d. -f1" #Wrong!
LCURRENT="uptime | awk {'print $10,$11,$12}'" #Wrong!

It must look like this (backticks, "accent grave"):

HOSTNAME=`hostname`
LAVG=`uptime | awk {'print $10}' | cut -d. -f1`
LCURRENT=`uptime | awk {'print $10,$11,$12}'`

or like this (POSIX notation):

LAVG=$(uptime | awk {'print $10}' | cut -d. -f1)
HOSTNAME=$(hostname)
LCURRENT=$(uptime | awk {'print $10,$11,$12}')


I'd recommend the last version. It is the most readable, and there is little risk of hitting the wrong key by mistake.
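To see why the original assignments lead to "[: too many arguments", here is a minimal sketch. It uses "echo hello world" as a stand-in command (my example, not part of the script): double quotes store the command text itself, while backticks or $(...) run the command and store its output.

```bash
#!/bin/bash
# Stand-in command for illustration; the real script uses uptime and hostname.
literal="echo hello world"   # plain string: nothing is executed
legacy=`echo hello world`    # backticks: runs the command, stores its output
modern=$(echo hello world)   # $(...): same result, easier to read and to nest

echo "literal = $literal"
echo "legacy  = $legacy"
echo "modern  = $modern"

# Unquoted, the multi-word string expands to several arguments, so the
# test below sees "echo hello world -gt 1" and fails with a usage error.
status=0
[ $literal -gt 1 ] 2>/dev/null || status=$?
echo "test exit status: $status"
```

An exit status above 1 from [ means the test itself was malformed, which is what happened on line 19 of the script.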
There is also a typo in the awk statement in the line starting with "LCURRENT= ..."

Corrected:

LCURRENT=`uptime | awk '{print $10,$11,$12}'`
or
LCURRENT=$(uptime | awk '{print $10,$11,$12}')

i.e. the single quote goes before the { , not after it!

Sorry, I didn't see it at first sight (copy and paste ...)
The same typo is in the line starting with "LAVG=...", so the awk part there should read awk '{print $10}'.

Seems I'm reading (and typing) too fast today.
One other thing is that the length of "uptime" output varies.  On two of my machines I have

 17:47:47 up  2:08,  1 user,  load average: 0.18, 0.17, 0.15

and

 17:47:53 up 45 days,  5:05,  2 users,  load average: 0.01, 0.12, 0.11

so field 10 is the 15-minute load on one, and the 1-minute load on the other.  Change your two lines to:

LAVG=$(uptime | awk '{print $(NF-2)}' | cut -d. -f1)
LCURRENT=$(uptime | awk '{print $(NF-2),$(NF-1),$NF}')
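As a sketch of the field-counting problem, the snippet below feeds both sample lines from above through the old $10 selector and through the NF-relative one. The helper names field10 and one_minute_load are made up for illustration, and so is the third sample line with a load of 12.40.

```bash
#!/bin/bash
short=' 17:47:47 up  2:08,  1 user,  load average: 0.18, 0.17, 0.15'
long=' 17:47:53 up 45 days,  5:05,  2 users,  load average: 0.01, 0.12, 0.11'
# Hypothetical busy machine, not taken from the thread.
busy=' 09:15:02 up 3 days,  1:10,  4 users,  load average: 12.40, 8.02, 5.00'

# Old approach: a fixed field number, which shifts with the uptime format.
field10() { printf '%s\n' "$1" | awk '{print $10}'; }

# New approach: count from the end of the line, then keep the integer part.
one_minute_load() { printf '%s\n' "$1" | awk '{print $(NF-2)}' | cut -d. -f1; }

field10 "$short"          # 0.15   (the 15-minute load)
field10 "$long"           # 0.01,  (the 1-minute load)
one_minute_load "$short"  # 0
one_minute_load "$long"   # 0
one_minute_load "$busy"   # 12
```

Counting from NF works because the three load figures are always the last three fields, however many fields the "up ..." part takes.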
Chris Andrews (ASKER):

OK, with your changes, wmp, I now get:

Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.7/FAQ

It doesn't say which line or anything, so I don't know how to track it down...

Simon, did it work for you as is?
The error seems to come from "ps" or "top"

Does "ps -auxf" work from the command line?
Does "top -c -n1" work from the command line?

Normally it should! I have procps-3.2.6 here (you have 3.2.7) and the above commands work.

wmp


And isn't it /usr/bin/top instead of /bin/top?
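One way to avoid guessing between /bin/top and /usr/bin/top is to let the shell resolve the path. A minimal sketch, assuming the tool is somewhere on PATH; find_tool is a hypothetical helper name, not something from the script:

```bash
#!/bin/bash
# Print how the shell resolves a tool (a full path for external programs),
# or return 1 if it cannot be found on PATH.
find_tool() {
    command -v "$1" 2>/dev/null || return 1
}

TOP=$(find_tool top) || echo "top not found on PATH" >&2
echo "top resolves to: ${TOP:-<not found>}"
```

Keep in mind that cron usually runs jobs with a much shorter PATH than a login shell, so a script that relies on this lookup may need to set PATH explicitly at the top.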
Yes, both those work from the command line
Oh, wait a minute, I just noticed that ps -auxf printed:

Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.7/FAQ

before it gave me the list.
So you should rewrite the ps and top statements.
It seems that the minus signs are not what they appear to be.
ASKER CERTIFIED SOLUTION (woolmilkporc): this solution is only available to members of Experts Exchange.
"And isn't it /usr/bin/top instead of /bin/top?"

Yes, just checked and changed that.
Changed to ps auxf in the script, but still getting the same:

Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.7/FAQ

I tried the netstat -ntu commands, and they do not issue a warning on the command line.
Does

ps auxf

work from commandline?
Yes, without any warnings.
Did you change all occurrences of "ps -auxf" to "ps auxf"?

If you did, I fear I'll soon run out of ideas ...
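The script repeats the same ps line several times, so it is easy to miss one. procps prints this warning when BSD-style options such as aux are written with a leading dash. Here is a sketch of how to list and rewrite every occurrence, run against a throwaway sample file rather than the real script (the sample contents and temp files are made up for illustration):

```bash
#!/bin/bash
sample=$(mktemp)
fixed=$(mktemp)
cat > "$sample" <<'EOF'
/bin/ps -auxf >> /opt/ps_output
/bin/top -c -n1 >> /opt/loadcheck/loadtop.txt
/bin/ps -auxf >> /opt/loadcheck/loadps.txt
/bin/ps -auxf >> /opt/loadcheck/loadps.txt
EOF

# List every line still using the dashed form, with line numbers.
grep -n 'ps -auxf' "$sample"

# Write a corrected copy (no dash); the original file stays untouched.
sed 's/ps -auxf/ps auxf/g' "$sample" > "$fixed"

# grep -c exits non-zero when it counts 0 matches, hence the "|| true".
before=$(grep -c 'ps -auxf' "$sample")
after=$(grep -c 'ps -auxf' "$fixed" || true)
echo "dashed occurrences: before=$before after=$after"

rm -f "$sample" "$fixed"
```

Against the real script, the same grep -n line would show which occurrences are still left to change.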
Ah, oh, now I feel foolish - no, I missed some. I'm sorry.

Trying now, hang on...
Ah, ok, that fixed the warning.

Now, the instructions with this script say to run it by cron...

/bin/sh /etc/load-process-mon.sh

If I run that line from the command line right now and press enter, my ssh client moves the cursor down to the next line and nothing else happens. It never brings me back to the:

-bash-3.2#

prompt.

Do I need  something more at the end of the script to finish this out? Or is the script having a fatal failure somewhere still?

If this isn't something like a simple 'add this to the end' answer, I'll start this as a whole new question. In the meantime I will award points here, thank you,

Chris
This could come from line 30 (the mutt command).

It might be waiting for you to enter a body text.

Try

date | /usr/bin/mutt -s "Server Load ALERT!!! High 1 minute load average on '$HOSTNAME'" -a /opt/ps_output support@somedomain.com > /opt/ps_output

This will write the date/time string to the mail body, so mutt no longer has to wait.

You could also reverse the last ">" to "<". This will put the content of /opt/ps_output into the body instead of emptying this file:

/usr/bin/mutt -s "Server Load ALERT!!! High 1 minute load average on '$HOSTNAME'" -a /opt/ps_output support@somedomain.com < /opt/ps_output

wmp
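The hang and its fix can be sketched without sending real mail. In the snippet below, send_alert and the MAILER variable are hypothetical names of mine. MAILER defaults to /usr/bin/mutt as in the thread and is pointed at a stand-in function for a dry run. The only point is that the body arrives on standard input, so the mailer never waits for the terminal.

```bash
#!/bin/bash
# send_alert SUBJECT ATTACHMENT RECIPIENT
# Pipes a short body into the mailer so it does not prompt for one.
# Assumption: some mutt versions expect "--" between the attachment
# list and the recipient; check mutt(1) on your system.
send_alert() {
    local subject="$1"
    local attachment="$2"
    local recipient="$3"
    {
        echo "Current Time :: $(date)"
        echo "Process list attached: $attachment"
    } | "${MAILER:-/usr/bin/mutt}" -s "$subject" -a "$attachment" "$recipient"
}

# Stand-in mailer for a dry run: prints its arguments and the body it received.
show_mail() {
    echo "args: $*"
    echo "body:"
    cat
}

MAILER=show_mail
send_alert "Server Load ALERT" /opt/ps_output support@somedomain.com
```

With MAILER unset, the same function would call /usr/bin/mutt. Either way the command returns to the prompt, because standard input is a pipe rather than the terminal.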
Thank you! Working now.

Oh, now I have a whole bunch of details on a high load and I don't know what they mean :)  THAT is for another question!

Thanks WMP, appreciate you sticking with it here,

Chris