Solved

Website post-mortem

Posted on 2009-07-03
4
323 Views
Last Modified: 2013-11-08
I have had a very bad experience that I would like to learn from:

I built a CentOS 5 server running apache, mysql, and drupal for an election application for use at a convention that had about 300 voters.  I built it on a moderately strong PC and attached it to a gigabit switch.  Users connected to the server via a couple of Cisco access points that I put on both sides of a large room.

The users were able to attach to the wireless networks without issue.  The machine acted as a DHCP server without issue also.  Once they tried to log into the drupal system, the server stopped responding to any and all network requests including my attempts to SSH into the machine.

Basically I am sure that the system was far from able to handle the requests from all those users at once.  I am looking for theories on where I went wrong.

Did I grossly underestimate the computer hardware?
Did I blunder by putting all of the services on a single machine?
Was a vanilla install of Apache, Mysql and Drupal in need of major tuning?
How can I stress test such a system to know what it is capable of?

I appreciate your input.  It was a very bad day.
0
Comment
Question by:chronolith
  • 2
4 Comments
 
LVL 12

Accepted Solution

by:
kevin_u earned 250 total points
ID: 24775540
Based on what you said, it didn't seem grossly under-sized.

Tuning could have been an issue, but I have successfully dealt with 1500 posts per minute on an untuned box without loosing control of it, and the users hardly noticed. (i had forgotten to tune it.. quick deploy).

Did you regain control of the machine without a reboot?   Its possible it was a wi-fi overload. 11mb @ 300 users 2 ap's?  Depends on the size(s) of the pages I guess.

You might want to consider that you had malicious attack.  Someone might have decided to vote with a script.  What hints do the logs provide?

Simple stress testing would simply involve scripting the site access and posts using something as simple as curl, or as complicated as a commercial web app testing suite.

Again, I'd be looking at the logs for hints.

0
 
LVL 30

Assisted Solution

by:Kerem ERSOY
Kerem ERSOY earned 250 total points
ID: 24779859
Hi,

I also think that this is not an issue with undersized hardware. IT is not nornal for any system to stop responding. Especially not with 30 users. I myself is the sysdamin for a local Linux-Users-Society and we have more than 1500 members and 2 VCPU's Xen system can handle it quite easily. We have more that 5 lists and lots of tracs Wiki's and several voting areas.

There should be a hardware problem such as RAM or Hard-disk malfunction. Other than that your account of what has happened is not consistent with a system having problems under load.
0
 

Author Comment

by:chronolith
ID: 24787201
I was not able to regain any kind of control and I had to power the thing down.

As for the "server", it was not a server grade machine at all.  Just a mid-level PC.  I was tasked with building this thing without spending money.

I tend to not think it was any kind of malicious attack, this particular community is just not capable of these sorts of things.  In fact they were not even exposed to the system until minutes before the voting was to take place.

The Cisco AP's were both set for G and B.  Right now that is my favorite theory.

My review of the logs did not show anything too terrible apart from the fact that I did not set the apache Max Clients option higher than the vanilla 150 setting.  In that case I would expect to see maybe half of the users getting refused but not all of them.

Should I perhaps think about getting two machines and clustering them?
0
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 24789730
I don't think you need to cluster machines. To do this you need to make sure that you have reached very high transactions per second. It seems to me that there was a software or hardware failure in your setup.

To verify this I'll suggest you to setup your computer again. I'll suggest you to get sysstat package through yum and use sar -q and iosatat commands to verify that this is a load problem.
0

Featured Post

NFR key for Veeam Backup for Microsoft Office 365

Veeam is happy to provide a free NFR license (for 1 year, up to 10 users). This license allows for the non‑production use of Veeam Backup for Microsoft Office 365 in your home lab without any feature limitations.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In my business, I use the LTS (Long Term Support) versions of Linux. My workstations do real work, and so I rarely have the patience to deal with silly problems caused by an upgraded kernel that had experimental software on it to begin with from a r…
This article will explain how to establish a SSH connection to Ubuntu through the firewall and using a different port other then 22. I have set up a Ubuntu virtual machine in Virtualbox and I am running a Windows 7 workstation. From the Ubuntu vi…
This video shows how to use Hyena, from SystemTools Software, to bulk import 100 user accounts from an external text file. View in 1080p for best video quality.
In an interesting question (https://www.experts-exchange.com/questions/29008360/) here at Experts Exchange, a member asked how to split a single image into multiple images. The primary usage for this is to place many photographs on a flatbed scanner…

856 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question