wordswithfriends asked:
How can I diagnose my log file to work out why my server load was 9.6?
My Apache web server died, so I ran the following command:
killall -9 sys-snap.sh
and then restarted httpd. My log file is here:
http://dl.dropbox.com/u/12337149/30.log
I can't make much sense of it, and I still don't understand why the server died.
ASKER
It doesn't sound normal, although I have only a passing knowledge of how the daemon is supposed to work. Suppose a user opens a web page that runs a SELECT query taking a few seconds: would the server start a MySQL daemon and then quickly shut it down? If so, 876 hours would be very unusual.
On the other hand, if a single mysqld process sits in the background and handles all queries, then maybe it's possible. Running
ps -aux | grep mysql
I get
[root@wor system-snapshot]# ps -aux | grep mysql
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.7/FAQ
root 22151 0.0 0.1 11932 1416 ? S Jan31 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --user=mysql
mysql 22229 3.1 2.3 164164 18560 ? Sl Jan31 922:49 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --socket=/var/lib/mysql/mysql.sock
It looks like it is still going. What is normal behavior?
That's normal: the MySQL daemon can run for that long (or much longer) on a server with a live database, because a single long-lived mysqld process serves all the queries rather than one being started per request.
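A hedged aside on reading the ps figures quoted above: per the procps ps man page, the TIME column in BSD-style output such as ps aux is cumulative CPU time, normally printed as MMM:SS (minutes:seconds), not wall-clock hours. Read that way, 922:49 is roughly 15.4 CPU hours, which fits a %CPU of about 3 over roughly three weeks of uptime. The sketch below assumes procps ps; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a BSD-style ps TIME value (MMM:SS, i.e.
# cumulative CPU minutes:seconds) into hours of CPU time.
bsdtime_to_hours() {
    echo "$1" | awk -F: '{ printf "%.2f\n", ($1 * 60 + $2) / 3600 }'
}

# Hypothetical helper: print wall-clock time since start (etime) next to
# cumulative CPU time (time, as [DD-]HH:MM:SS) for processes named mysqld.
# -C selects processes by command name.
show_mysqld_times() {
    ps -o pid,etime,time,args -C mysqld || echo "no mysqld process found"
}
```

For example, bsdtime_to_hours 922:49 prints 15.38, and running show_mysqld_times on the server shows how long mysqld has actually been up alongside the CPU time it has used.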
There doesn't appear to be anything wrong in the log you posted. You had free memory, and the %CPU of all processes together stays below 10-15% of CPU capacity. What the log lacks is disk space information, so you might want to check the free space on /tmp and /var.
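A minimal sketch of that free-space check, assuming a df that supports -P (POSIX output, one line per filesystem, no wrapping). The function names and the THRESHOLD variable (default 90%) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the Use% of the filesystem holding a path,
# as a bare number. With -P, line 2 field 5 is the capacity column.
usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: report each given path, warning when its
# filesystem is at or above THRESHOLD percent (default 90).
check_paths() {
    threshold=${THRESHOLD:-90}
    for p in "$@"; do
        pct=$(usage_pct "$p")
        if [ "$pct" -ge "$threshold" ]; then
            echo "WARNING: $p is ${pct}% full"
        else
            echo "OK: $p is ${pct}% full"
        fi
    done
}
```

For example, check_paths /tmp /var prints one line per path.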
ASKER
Heaps of disk space:
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vzfs 30G 1.8G 29G 6% /
How long has the system been up? (uptime)
There are not enough hours between Jan 31 and today to account for the 960 hours shown.
Look at /var/log/httpd/error_log for a clue as to why it crashed. Also check whether there is a core dump in the web directory:
find /path/to/web -name 'core'
If there is one, you could analyze the core dump to try to determine the cause of Apache's crash.
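Both checks can be scripted along these lines. This is a sketch under assumptions: the grep patterns are examples of messages Apache 2.x writes around crashes and overload (a child dying with "exit signal", Apache 2.2's "server reached MaxClients setting", and the "caught SIGTERM" notice that appears later in this thread), not an exhaustive list. Core files may also be named core.PID rather than plain core, depending on the kernel's core_uses_pid setting, and Apache's CoreDumpDirectory directive controls where the server tries to write them, so the web directory is not necessarily the right place to look. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: show error_log lines that often accompany a crash,
# a restart or an overloaded server. The patterns are examples only.
scan_error_log() {
    log=${1:-/var/log/httpd/error_log}
    grep -E 'exit signal|MaxClients|caught SIG' "$log"
}

# Hypothetical helper: list core files named "core" or "core.<pid>"
# under a directory (default /var/www); permission errors are hidden.
find_cores() {
    find "${1:-/var/www}" -type f \( -name core -o -name 'core.[0-9]*' \) 2>/dev/null
}
```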
You could set up a cron job to collect data (iostat/vmstat/memstat, etc.). That way, should Apache crash again, you will have a trend of data leading up to it.
Depending on what you run (e.g. PHP), you may not have allocated enough resources.
The apache log should have a clue as to why it crashed.
ASKER
uptime
23:07:55 up 21 days, 5:21, 2 users, load average: 0.82, 0.35, 0.23
SIGTERM was me restarting Apache. Doesn't seem like anything interesting before then.
[Sun Feb 20 06:06:46 2011] [client 127.0.0.1] File does not exist: /var/www/html/whm-server-status
[Sun Feb 20 06:07:55 2011] [client 127.0.0.1] File does not exist: /var/www/html/whm-server-status
[Sun Feb 20 06:09:04 2011] [client 127.0.0.1] File does not exist: /var/www/html/whm-server-status
[Sun Feb 20 06:10:14 2011] [client 127.0.0.1] File does not exist: /var/www/html/whm-server-status
[Sun Feb 20 06:11:41 2011] [client 127.0.0.1] File does not exist: /var/www/html/whm-server-status
[Sun Feb 20 06:12:50 2011] [client 127.0.0.1] File does not exist: /var/www/html/whm-server-status
[Sun Feb 20 06:21:55 2011] [notice] caught SIGTERM, shutting down
find /var/www -name 'core' doesn't return any results.
Could you give more details about the cron job?
You can set up a cron job for a script that runs vmstat 5 5, iostat 5 5,
top -b -n 1
etc.
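A minimal sketch of such a collection script, assuming vmstat and top from procps and iostat from the sysstat package. LOGDIR, INTERVAL, COUNT and the function name are hypothetical; INTERVAL and COUNT default to the 5 5 used above. Note that top needs -b (batch mode) to produce usable output when cron runs it without a terminal.

```shell
#!/bin/sh
# Hypothetical snapshot collector for cron. Each run appends one
# timestamped snapshot to a per-day file under $LOGDIR, so a trend is
# available if Apache dies again.
LOGDIR=${LOGDIR:-/var/log/stats}

collect_snapshot() {
    interval=${INTERVAL:-5}   # seconds between vmstat/iostat samples
    count=${COUNT:-5}         # number of samples per run
    mkdir -p "$LOGDIR"
    out="$LOGDIR/stats-$(date +%Y%m%d).log"
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        vmstat "$interval" "$count" 2>&1 || echo "vmstat failed or is not installed"
        iostat "$interval" "$count" 2>&1 || echo "iostat failed or is not installed"
        # -b: batch mode, required when there is no terminal (as under cron)
        top -b -n 1 2>&1 | head -n 40
    } >> "$out"
}

# In the cron script itself, end with a call:   collect_snapshot
# Example crontab entry (hypothetical path), every 10 minutes:
# */10 * * * * /usr/local/bin/collect-stats.sh
```

Once Apache misbehaves again, the snapshots just before the failure show whether memory, swap, I/O wait or CPU were climbing.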
If you have another Linux/Unix box, you can set up Cacti on it (or on this one). By enabling SNMP on the web/MySQL/mail/Courier, etc. system, you can have Cacti collect the data and graph CPU, memory and disk usage; there are also application templates for Apache.
Should it get stuck in the same way, use strace -f -p <pid_of_apache_parent> to see what it is doing.
You may have allocated too few child processes, or your system may have been experiencing a DoS attack.
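One common rule of thumb (a heuristic, not an official formula) for sizing the MaxClients setting of the prefork MPM is to divide the memory you can spare for Apache by the average resident size of an httpd child, so that a traffic spike doesn't push the server into swap. A sketch, assuming procps ps and that the processes are named httpd; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: average of the numbers on stdin (one per line),
# e.g. RSS values in KB, printed as an integer.
avg_kb() {
    awk '{ sum += $1; n++ } END { if (n > 0) printf "%d\n", sum / n }'
}

# Hypothetical helper: rule-of-thumb MaxClients estimate.
#   $1 = memory (KB) you can spare for Apache
#   $2 = average RSS (KB) of one httpd child
estimate_max_clients() {
    echo $(( $1 / $2 ))
}

# On the server (assumes the processes are named httpd):
#   avg=$(ps -C httpd -o rss= | avg_kb)
#   estimate_max_clients 400000 "$avg"
```

Because RSS also counts memory shared between children, this estimate tends to be conservative.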
Check the access_log for the number of requests per second it was receiving, i.e. was there a spike in requests per second?
Note that with the default common/combined LogFormat the timestamp is written as [dd/Mon/yyyy:HH:MM:SS zone] rather than as Unix epoch time (seconds elapsed since 1/1/1970), unless your LogFormat has been customized.
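A sketch of that count, assuming the default common or combined LogFormat, in which the fourth whitespace-separated field holds the timestamp, e.g. [20/Feb/2011:06:06:46. The function name and the default log path are assumptions; with a custom LogFormat the field number changes.

```shell
#!/bin/sh
# Hypothetical helper: count requests per second in an access_log and
# print the busiest seconds first (default: top 10).
requests_per_second() {
    log=${1:-/var/log/httpd/access_log}
    top=${2:-10}
    # Field 4 is "[dd/Mon/yyyy:HH:MM:SS"; strip the leading bracket.
    awk '{ sub(/^\[/, "", $4); print $4 }' "$log" | sort | uniq -c | sort -rn | head -n "$top"
}
```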
Another option is to tabulate how many requests came from each source address.
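A matching sketch for per-source counts, assuming the first field of each access_log line is the client address (%h in the common/combined LogFormat, which logs a hostname instead if HostnameLookups is on). The function name and default log path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count requests per client address and print the
# most active sources first (default: top 10).
requests_per_source() {
    log=${1:-/var/log/httpd/access_log}
    top=${2:-10}
    awk '{ print $1 }' "$log" | sort | uniq -c | sort -rn | head -n "$top"
}
```

A single address with a count far above the rest during the outage window would point towards abuse rather than normal load.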
ASKER
This was too complicated for me to follow
To determine the underlying cause, you need to collect data so that when the issue recurs, you can look at it and see whether something there explains the situation.
ASKER
Understood. But the following instructions
You can setup a cron job for a script that runs vmstat 5 5, iostat 5 5
while they may be correct, are not particularly easy to follow.
ASKER CERTIFIED SOLUTION
mysql 22229 3.0 2.3 164164 18496 ? Sl Jan31 876:49 \_ /usr/libexec/mysqld