LAMP problem identifier tool?

Posted on 2016-07-21
Last Modified: 2016-08-08
Hello! We have the following LAMP stack:

Linux: Debian 7.10
Apache: Apache/2.2.22
MySQL: 5.5.49
PHP: PHP Version 5.4.45-0+deb7u2

The hardware is fairly new (HP ProLiant 320e Gen8 v2): not brand new, but not old. We have 4 GB of RAM.
From time to time the load average climbs to around 100 (on average every 2-4 weeks, with no recognizable time pattern).
The server is then completely unresponsive. In most cases not even a login is possible anymore.
When a login is possible, I log in and stop Apache. When I start it again, everything works fine.
The whole restart process takes about 20-30 minutes.
The server is running a Joomla-driven website.
There are no traces at all in the log files (/var/log/messages, the Apache/PHP log file, the MySQL log file).
Right now we are running Linux scripts that write the output of ps aux to a separate log file, hoping to learn more.
But in general, these kinds of errors are torturing me, because you cannot identify
the problematic PHP script within a reasonable time frame. I would love to have a tool that identifies
the PHP script that is driving the server insane.
Any ideas?
Question by:Uwe Degenhardt
LVL 35

Assisted Solution

gr8gonzo earned 250 total points
ID: 41724027
You could potentially run xdebug and generate a cachegrind file, but that would be a pretty desperate move.

My thoughts:
1. Ensure the Apache access logs include the variable that records how long it took to serve the request. After enabling it, wait a day and then look for any requests that are taking a long time to execute or aren't returning a 200 result.

2. Ensure Apache's server status module is enabled and available so you can check on the requests from time to time and see if there's any that are hanging.

3. Watch the disk space (df -h is the command) and look for any disks that are close to being full. Email the output to yourself every hour or log it to a file with a date/time.

4. Enable the MySQL slow query log and set the threshold to 3 seconds (which is a LONG time for most queries). Pay attention to any queries that show up in that log.

5. Make sure PHP is outputting its own error log and check it for any recent entries. Even things that might seem okay might be contributing to the problem.

6. Use "find [DIR] -mtime -1" on the command line each night and email yourself the results so you can see which files are changing each day in a particular folder (and its subfolders). For example:
"find /var/log -mtime -1" would show you all log files modified in the last 24 hours. Run it on all major folders that might contain changes (/tmp, /var/log, your doc root, etc). You'll especially want to run this after you experience the problem.

7. Use this to get a rough count of how many Apache child processes are running at any given time (on Debian the processes are named apache2 rather than httpd, and the brackets keep grep from counting itself):
ps aux | grep -c "[a]pache2"
Email the output of that to yourself every hour or append it to a log file with the date/time.

8. Log or email the output of this command to yourself every hour:
netstat -anvep
(That shows all the network sockets, what programs are using them, etc)

9. If none of the above yields any good information, create a new Gig here and pay an expert to help out.
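
To make point 1 a bit more concrete, here is a minimal Python sketch of how the access log could be scanned for slow or non-200 requests. It assumes the log uses the common log format with Apache's %D field (time taken to serve the request, in microseconds) appended as the last column; the "timed" format nickname, the sample lines and the 3-second threshold are all made up for illustration, so adjust the pattern to whatever LogFormat you actually use.

```python
import re

# Assumed log format (the nickname "timed" is hypothetical): common log
# format plus %D, the time taken to serve the request in microseconds:
#   LogFormat "%h %l %u %t \"%r\" %>s %b %D" timed
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ (?P<micros>\d+)$'
)


def suspicious_requests(lines, slow_seconds=3.0):
    """Yield (seconds, status, request) for slow or non-200 requests."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed format
        seconds = int(match.group("micros")) / 1_000_000
        status = int(match.group("status"))
        if seconds >= slow_seconds or status != 200:
            yield seconds, status, match.group("request")


# Invented sample lines, only to show the expected shape of the input.
SAMPLE = [
    '203.0.113.5 - - [21/Jul/2016:10:00:01 +0200] "GET /index.php HTTP/1.1" 200 5120 150000',
    '203.0.113.9 - - [21/Jul/2016:10:00:02 +0200] "GET /index.php?option=com_search HTTP/1.1" 200 812 42000000',
    '203.0.113.7 - - [21/Jul/2016:10:00:03 +0200] "POST /administrator/index.php HTTP/1.1" 500 - 90000',
]

for seconds, status, request in suspicious_requests(SAMPLE):
    print(f"{seconds:8.2f}s  {status}  {request}")
```

In practice you would pass an open log file instead of SAMPLE. Since the request line includes the script name and query string, this is probably the cheapest way to narrow down which PHP script is involved.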
LVL 19

Expert Comment

ID: 41724218
I generally use atop (and pretty much all the above), atop logs the performance data every 5 mins so I can trace back when problems start and identify rogue processes.

Author Comment

by:Uwe Degenhardt
ID: 41724803
Good idea. But it should store data every minute, otherwise the interval is too long. Thank you, Jonathan, also for your comments. I will try 2-3 of them. The other ideas either don't fit my needs or are things I already use.

LVL 19

Assisted Solution

jools earned 125 total points
ID: 41724829
atop can be configured to log data as often as you need, but I'd only log at that frequency while you are looking into the issue, otherwise you add to the overhead.
LVL 35

Assisted Solution

gr8gonzo earned 250 total points
ID: 41724925
Frequency is up to you. You said it happens "from time to time", which usually means that it's not happening multiple times a day.

You usually don't want to log so much data that it becomes too difficult to sort through and recognize patterns or problems. For example, let's say you log disk space every minute. Chances are it's not going to change drastically every minute, or even every hour. So if you have 1,440 logs to review for a 24 hour period instead of just 24, then you have a higher risk of missing a critical change among all of the data.

There are a lot of logging areas to review, so increasing the frequency across all of them will multiply the amount of data you have to review.

So usually the first pass at troubleshooting is to try and identify the general problem area, and THEN make the logging more frequent in that area if necessary.

One more thing: I would also recommend logging the output of "du -sh /path/to/your/database" to identify any major changes in table sizes that might indicate unusual activity.
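
If you would rather have the size as a plain number that is easy to compare between runs, something along these lines could work. It is only a sketch: it sums apparent file sizes, so the figure will differ somewhat from what du reports (du counts disk blocks), and the function names are made up for this example.

```python
import os
import time


def directory_size_bytes(path):
    """Sum the apparent sizes of all files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished between listing and stat
    return total


def size_log_line(path, now=None):
    """Return one 'timestamp bytes path' line, ready to append to a log."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return f"{stamp} {directory_size_bytes(path)} {path}"
```

Appending the result of size_log_line("/path/to/your/database") to a file once an hour gives you one line per sample, so a sudden jump is easy to spot.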

Author Comment

by:Uwe Degenhardt
ID: 41725589
OK. I think I should explain it a bit further here.
The incidents come suddenly, within seconds. The load average jumps to high values,
around 100. The machine is totally unstable/unusable afterwards. I/O traffic is high. It happens every
2-3 weeks but without any pattern (cron or the like).
But the logging can't record any unusual activity anymore. Memory also runs short, and swap fills up. But again, you cannot do anything on the machine anymore. Even stopping Apache takes 20-30 minutes. So there is no way to use Apache's mod_status at that point. mod_status has been enabled and sitting there for months. My question was more whether there is a general tool to monitor this kind of incident. Right now we have a batch file that records ps aux (CPU load and I/O consumption) only when we exceed a certain threshold. This keeps the log files small and handy.
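
For readers who want to try the same threshold idea, here is a rough Python sketch of what such a monitor might look like. It is not the script we run: the function names and the parameters are invented for this example. It reads the 1-minute load average from /proc/loadavg and appends a timestamped ps aux dump to a log file only when the load is at or above a threshold.

```python
import subprocess
import time


def read_load_average(path="/proc/loadavg"):
    """Return the 1-minute load average from a Linux /proc/loadavg file."""
    with open(path) as handle:
        return float(handle.read().split()[0])


def snapshot_if_overloaded(threshold, log_path,
                           loadavg_path="/proc/loadavg",
                           command=("ps", "aux")):
    """Append a timestamped process dump to log_path when load >= threshold.

    Returns True if a snapshot was written, False otherwise.
    """
    load = read_load_average(loadavg_path)
    if load < threshold:
        return False
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    output = subprocess.run(
        list(command), capture_output=True, text=True, check=False
    ).stdout
    with open(log_path, "a") as log:
        log.write(f"=== {stamp} load={load} ===\n{output}\n")
    return True
```

Called from cron once a minute, this stays silent on normal days. Because the machine stops responding within seconds, the threshold probably has to sit well below 100 (just above the normal peak load) so that the snapshot is written while the system can still fork a process.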
LVL 40

Assisted Solution

noci earned 125 total points
ID: 41725960
In addition: pay attention to RAM usage. Your machine may very well be starved of RAM and start swapping itself out of business.
Also check the "wait" time (iowait). That is the time a CPU sits idle waiting for a disk to respond to I/O.
Logging too much may very well add insult to injury here.
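
As a lightweight illustration of both checks, the following Python sketch computes the iowait share between two samples of /proc/stat and parses /proc/meminfo so that swap usage can be derived. The function names are made up for this example, and it only reads the /proc files that Linux already maintains, so it adds very little I/O of its own.

```python
def iowait_percent(stat_before, stat_after):
    """Percentage of CPU time spent in iowait between two /proc/stat samples."""
    def cpu_fields(text):
        # First line: cpu user nice system idle iowait irq softirq steal ...
        # Only the first eight counters are used; guest time is already
        # included in user time and would otherwise be counted twice.
        first = text.splitlines()[0].split()
        return [int(value) for value in first[1:9]]

    before, after = cpu_fields(stat_before), cpu_fields(stat_after)
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


def meminfo_values(text):
    """Parse /proc/meminfo text into {name: number}; most values are in kB."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if rest.strip():
            values[name] = int(rest.split()[0])
    return values
```

Take two readings of /proc/stat a few seconds apart and pass both to iowait_percent. For swap, meminfo_values(text)["SwapTotal"] minus ["SwapFree"] gives the amount in use.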

Another tool that may help is collectd. Do pay attention to its setup, though: be sure that you DON'T log on the system that you are investigating, but send the data over a network connection to a nearby server (otherwise you WILL thrash your I/O system). Also be sure to configure the RRD cache on that nearby server. CDP can be used to view the graphs from the nearby server.

Another piece of advice: IMHO (and in my experience) Apache just loves to eat RAM and may run into trouble more often, whereas nginx or lighttpd can handle a lot more on top of that load. YMMV here, though.
LVL 35

Accepted Solution

gr8gonzo earned 250 total points
ID: 41726036
So based on those symptoms, it sounds more like there might be some compromised service on your system. Either you're getting hit with DDOS attacks (which would be unusual - Joomla sites aren't usually high-value targets), or more likely, your server is serving up something you don't expect.

I saw a similar issue on a dedicated server once, when I was brought in to consult. It turned out that a hacker had set up a hidden FTP server process (it was a Windows box) and was using the server to host illegal/ripped movies for public download. Whenever people downloaded the contents, it drained the server's resources pretty quickly.

If the problem happens suddenly, it might be a similar situation but possibly a P2P scenario instead. P2P sharing (especially after a new torrent or similar gets shared) can result in hundreds of connections in seconds that can overload the server.

Netstat is going to be your best friend here. It sounds like the machine is usable enough to restart the Apache service, so it should be usable enough to run the netstat command I provided. That might point to the problem scenario.
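
Raw netstat output gets long quickly, so a small summarizer can help. This Python sketch (the function name and sample data are invented) counts TCP connections per local port and remote address. It assumes the usual Linux column layout of "netstat -an", where the local address, foreign address and state are the fourth to sixth columns; the extra columns added by -e and -p come after those and are ignored.

```python
from collections import Counter


def connections_per_remote(netstat_text):
    """Count TCP connections per (local port, remote address) pair."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # headers, UDP lines and UNIX-socket lines
        local, remote, state = fields[3], fields[4], fields[5]
        if state == "LISTEN":
            continue  # listening sockets are not connections
        local_port = local.rsplit(":", 1)[1]
        remote_host = remote.rsplit(":", 1)[0]
        counts[(local_port, remote_host)] += 1
    return counts


# Invented sample in the layout described above.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 192.0.2.10:80           203.0.113.5:51234       ESTABLISHED
tcp        0      0 192.0.2.10:80           203.0.113.5:51240       ESTABLISHED
tcp        0      0 192.0.2.10:22           198.51.100.7:40022      ESTABLISHED
tcp        0      0 192.0.2.10:80           203.0.113.9:33000       TIME_WAIT
udp        0      0 0.0.0.0:68              0.0.0.0:*
"""

for (port, host), count in connections_per_remote(SAMPLE).most_common(5):
    print(f"{count:5d}  port {port}  from {host}")
```

A local port you don't recognize, or a single remote address holding hundreds of connections, would be the first things to look at.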

Now, that said, the fact that the problem goes away after restarting Apache points to a few other possibilities:

1. You might not have reasonable limits on Apache's child processes. Some people will just boost the max child processes up to some big number in the config and let it be, but that's a really bad idea. Look at your average amount of CONCURRENT traffic and base your max workers off of double that number. It's better to have people queue up in the server for a second or two than to have the whole server go down because Apache is doing too much.

2. There might be a bad/recursive web call. I've seen some poorly-written AJAX calls that end up calling themselves again and they don't stop. This can also happen with poorly-written mod_rewrite statements that send browsers into a loop (some browsers have mechanisms to stop the looping, but not all of them do). All it takes is one or two people to hit a situation like that and end up hammering your server accidentally. That said, between the concurrent client connection limits and anti-loop mechanisms in MOST browsers, it's more likely that a server-side script would end up hammering the server - something that doesn't have to render the page but can just send out endless requests.

3. It pays to have a firewall in place that can filter out some unwanted traffic. If you are able to, I'd highly recommend putting something like IPCop in front of the server. It can help filter out some basic DDoS attacks and other unwanted traffic, and block access to ports that you don't want to be publicly accessible (e.g. I use IP whitelists to allow traffic to SSH ports and so on, while my normal web ports 80 and 443 are publicly accessible). If you can't use a separate device, then at the very least learn and implement iptables on the same server. It's not as efficient as a separate device, but it's the same underlying mechanism that Linux-based firewall distros use (IPCop and the like are essentially wrappers around the iptables firewall; pfSense is the exception, since it is FreeBSD-based and built on pf). Just be careful you don't lock yourself out, so experiment with your rules locally before you implement them on a remote server (or ensure you have a way to access the terminal through a remote KVM, something that doesn't rely on network access to the box itself).

4. There's a smaller chance that you could just have a bad build of Apache (or one of its modules) that occasionally sends the system into a panic. Often a kernel panic or something of similar severity will render the system unusable, so you can't even do anything without a full reboot, so it's probably not that, but I'm just throwing it out there in case you installed Apache or any of its modules through any channels that aren't 100% official (there are some repositories out there for packages that some people will use in order to gain bleeding-edge releases of software instead of waiting for the official packages to be released, but that can lead to problems). You'd probably see stuff in dmesg or messages or syslog log files if this were the problem, though.
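
Going back to point 1, the sizing rule can be written down as a few lines of arithmetic. The doubling comes from the rule of thumb above; the optional RAM cap is an extra assumption added here (a 4 GB box cannot hold an unlimited number of children), and the function name and all numbers are purely illustrative.

```python
def suggested_max_workers(avg_concurrent, ram_budget_mb=None, per_child_mb=None):
    """Double the average concurrency, optionally capped by a RAM budget.

    ram_budget_mb: memory you are willing to give Apache (assumption).
    per_child_mb:  typical resident size of one Apache child (assumption).
    """
    workers = max(1, 2 * avg_concurrent)
    if ram_budget_mb and per_child_mb:
        # Never allow more children than fit into the RAM budget.
        workers = min(workers, max(1, int(ram_budget_mb // per_child_mb)))
    return workers


# Hypothetical figures: 20 concurrent requests on average,
# 2048 MB reserved for Apache, about 64 MB per child.
print(suggested_max_workers(20))
print(suggested_max_workers(20, ram_budget_mb=2048, per_child_mb=64))
```

On Apache 2.2 with the prefork MPM, the resulting number would go into the MaxClients directive. Measure the per-child memory on your own system (for example from the ps aux output you are already logging) rather than trusting the placeholder value.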

Finally, while I do agree with noci's comments that Apache tends to be RAM-hungry (especially if it has a lot of compiled-in modules, which will bump up the memory usage of each child), the symptoms don't really sound like that's the problem. I am a fan of Nginx in some situations - particularly for systems that need to serve up LOTS of static content. However, I think Apache's history and availability make it easier to support. You'll find more people familiar with Apache, more blogs and articles online about specific Apache configurations, etc... Plus, there's still a lot of content out there that (unfortunately) depends on mod_rewrite (which is Apache-specific) to function. I can't recall if Joomla depends on it or not, but if so, then you'll face some trouble getting it to work on a non-Apache server.

Author Closing Comment

by:Uwe Degenhardt
ID: 41747535
There is no real solution, but thank you everybody for your help.
