Solved

Website post-mortem

Posted on 2009-07-03
Last Modified: 2013-11-08
I have had a very bad experience that I would like to learn from:

I built a CentOS 5 server running Apache, MySQL, and Drupal for an election application for use at a convention that had about 300 voters.  I built it on a moderately strong PC and attached it to a gigabit switch.  Users connected to the server via a couple of Cisco access points that I placed on opposite sides of a large room.

The users were able to attach to the wireless networks without issue.  The machine also acted as a DHCP server without issue.  Once they tried to log into the Drupal system, the server stopped responding to any and all network requests, including my attempts to SSH into the machine.

Basically I am sure that the system was far from able to handle the requests from all those users at once.  I am looking for theories on where I went wrong.

Did I grossly underestimate the computer hardware?
Did I blunder by putting all of the services on a single machine?
Was a vanilla install of Apache, MySQL, and Drupal in need of major tuning?
How can I stress test such a system to know what it is capable of?

I appreciate your input.  It was a very bad day.
Question by:chronolith
LVL 12

Accepted Solution

by:
kevin_u earned 250 total points
Based on what you said, it didn't seem grossly under-sized.

Tuning could have been an issue, but I have successfully dealt with 1500 posts per minute on an untuned box without losing control of it, and the users hardly noticed. (I had forgotten to tune it; it was a quick deploy.)

Did you regain control of the machine without a reboot?  It's possible it was a Wi-Fi overload: 11 Mbit/s shared by 300 users across 2 APs?  It depends on the size(s) of the pages, I guess.
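The Wi-Fi question above can be put into rough numbers. The following is only a back-of-the-envelope sketch: the 50% efficiency factor and the 100 KB page size are assumed values for illustration, not measurements from this setup, and it assumes the two access points sit on non-overlapping channels and split the clients evenly.

```python
def per_client_kbit_s(nominal_mbit_s, efficiency, access_points, clients):
    """Rough per-client share of the shared wireless capacity, in kbit/s."""
    usable_mbit_s = nominal_mbit_s * efficiency * access_points
    return usable_mbit_s * 1000 / clients


def seconds_per_page(page_kbytes, client_kbit_s):
    """Time to transfer one page at that share (8 bits per byte)."""
    return page_kbytes * 8 / client_kbit_s


# Assumed inputs: 11 Mbit/s nominal rate, half of it usable in practice,
# 2 access points, 300 clients all loading a hypothetical 100 KB page at once.
share = per_client_kbit_s(11, 0.5, 2, 300)
print(round(share, 1), "kbit/s per client")
print(round(seconds_per_page(100, share), 1), "seconds per page")
```

With these assumed inputs each client gets roughly 37 kbit/s, so a 100 KB page would take around 22 seconds. That would make the site feel very slow, but on its own it would not explain a server that also refuses SSH over the wired network.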

You might want to consider that you had a malicious attack.  Someone might have decided to vote with a script.  What hints do the logs provide?

Simple stress testing just means scripting the site accesses and posts, with something as simple as curl or as elaborate as a commercial web-app testing suite.
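As an illustration of the scripted approach, here is a minimal sketch that uses only the Python standard library instead of curl. The function names `fetch_url` and `run_load_test` are made up for this example, and it only issues plain GET requests; a realistic test of the voting site would also need to script the Drupal login and form posts.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_url(url, timeout=10.0):
    """Fetch one URL; return (ok, seconds). Any error counts as a failure."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
            ok = 200 <= response.status < 400
    except Exception:
        ok = False
    return ok, time.monotonic() - start


def run_load_test(fetch, total_requests, concurrency):
    """Call fetch() total_requests times using `concurrency` worker threads."""
    if total_requests < 1:
        raise ValueError("total_requests must be at least 1")
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(fetch) for _ in range(total_requests)]
        results = [future.result() for future in futures]
    times = sorted(seconds for _, seconds in results)
    return {
        "requests": total_requests,
        "failures": sum(1 for ok, _ in results if not ok),
        "median_s": times[len(times) // 2],
        "max_s": times[-1],
    }
```

A run against a test copy of the site might look like `run_load_test(lambda: fetch_url("http://192.168.1.10/user/login"), 300, 150)`, where the address is a placeholder. Raising `concurrency` step by step while watching the failure count and response times gives a rough idea of where the server starts to struggle.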

Again, I'd be looking at the logs for hints.

LVL 30

Assisted Solution

by:Kerem ERSOY
Kerem ERSOY earned 250 total points
Hi,

I also think that this is not an issue with undersized hardware. It is not normal for any system to stop responding entirely, especially not with 300 users. I am the sysadmin for a local Linux users society with more than 1500 members, and a Xen system with 2 vCPUs handles it quite easily. We run more than 5 mailing lists, lots of Trac wikis, and several voting areas.

There is probably a hardware problem, such as a RAM or hard-disk malfunction. Other than that, your account of what happened is not consistent with a system having problems under load.

Author Comment

by:chronolith
I was not able to regain any kind of control and I had to power the thing down.

As for the "server", it was not a server grade machine at all.  Just a mid-level PC.  I was tasked with building this thing without spending money.

I tend not to think it was any kind of malicious attack; this particular community is just not capable of these sorts of things.  In fact, they were not even exposed to the system until minutes before the voting was to take place.

The Cisco APs were both set for 802.11g and 802.11b.  Right now a Wi-Fi overload is my favorite theory.

My review of the logs did not show anything too terrible, apart from the fact that I did not set the Apache MaxClients directive higher than the vanilla setting of 150.  In that case I would expect to see maybe half of the users getting refused, but not all of them.
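One thing worth checking here: according to the Apache documentation, connection attempts beyond MaxClients are normally queued (up to the ListenBacklog setting) rather than refused, so the limit by itself would slow users down instead of rejecting half of them. A common rule of thumb is to size MaxClients so that all child processes fit in RAM. The sketch below applies that rule; the RAM size, the reserved amount, and the per-process figure are assumed numbers for illustration, not values from this machine.

```python
def max_clients_for_ram(total_ram_mb, reserved_mb, per_process_mb):
    """Largest number of Apache child processes that fits in RAM without swapping."""
    available_mb = total_ram_mb - reserved_mb
    if available_mb <= 0 or per_process_mb <= 0:
        return 0
    return int(available_mb // per_process_mb)


# Assumed inputs: a 2 GB desktop PC, 512 MB kept back for MySQL and the OS,
# and a hypothetical 30 MB per Apache child running PHP and Drupal.
print(max_clients_for_ram(2048, 512, 30))
```

If the result is well below 150, a burst of logins could push the machine into heavy swapping, which would be consistent with SSH becoming unresponsive too. The real per-process size has to be measured on the actual box (for example from `ps` or `top` output) before drawing any conclusion.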

Should I perhaps think about getting two machines and clustering them?
LVL 30

Expert Comment

by:Kerem ERSOY
I don't think you need to cluster machines. Clustering only makes sense once you have confirmed that you are reaching a very high number of transactions per second. It seems to me that there was a software or hardware failure in your setup.

To verify this, I suggest you set up your computer again. Install the sysstat package through yum and use the sar -q and iostat commands to check whether this is a load problem.
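The load averages that `sar -q` reports come from the kernel and are also exposed in `/proc/loadavg`, so they can be logged from a small script while a test is running. The sketch below is one hedged way to do that; the `LoadSample` and `is_overloaded` names and the "twice the CPU count" threshold are assumptions for illustration, not sysstat conventions.

```python
from collections import namedtuple

LoadSample = namedtuple("LoadSample", "load1 load5 load15 running total")


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg, e.g. '0.20 0.18 0.12 1/80 11206'."""
    fields = text.split()
    running, total = fields[3].split("/")
    return LoadSample(float(fields[0]), float(fields[1]), float(fields[2]),
                      int(running), int(total))


def is_overloaded(sample, cpu_count, factor=2.0):
    """Flag a sample whose 1-minute load exceeds factor times the CPU count."""
    return sample.load1 > factor * cpu_count


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the current load averages (Linux only)."""
    with open(path) as handle:
        return parse_loadavg(handle.read())
```

Calling `read_loadavg()` every few seconds during a stress test and writing the samples to a file gives a record that survives even if the machine later has to be powered off. A 1-minute load far above the number of CPUs points to a load problem, while a hang with low load points more toward the hardware or software failure suggested above.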