
Server environment performance degradation during load tests

We have a server environment consisting of a hardware load balancer, 3 web servers and a single database server.  Against this environment we have run 3 load tests, scaling up to 600 concurrent users over a 20-minute period.  In each test the page-load response time begins to degrade at exactly the same point: when the number of concurrent users hits 100.

Test one was done with two web servers balanced
Test two was done with three web servers balanced
Test three was done with three web servers balanced, but with 2 additional CPUs allocated

All three tests have exactly the same result.  I have had the hosting provider for the infrastructure review the network, and none of the hardware elements in the route have restrictions or limits on users; the network is handling the load with ease.  The individual web servers are also handling the load evenly, even at peak, without maxing out CPU or memory. It seems highly irregular that the drop-off point remains identical despite the increase in resources, and it has led me to question whether there is a configuration or set-up default value within the system which is reaching its limit (100).

This is only a theory, but it is posing a major challenge to what should be a robust server set-up, and I would appreciate some expert opinion on what could be causing the issue.  In preparation I have already ensured I have New Relic APM data available for all three sessions, as well as general hardware monitoring data.
Asked by James Mclean
5 Comments
 
Zaheer Iqbal (Technical Assurance & Implementation) commented:
Under the Application Pool settings in IIS there is a queue setting; I can't remember the exact name off the top of my head as I'm using the EE mobile app to answer.
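The setting Zaheer is likely thinking of is the application pool's queueLength attribute, the HTTP.sys kernel request queue (default 1000; requests beyond it receive 503s). It lives in applicationHost.config; the element and attribute names below are standard IIS schema, but the pool name and value are illustrative only:

```xml
<!-- applicationHost.config: per-pool HTTP.sys request queue length.
     queueLength defaults to 1000; raise it only after confirming the
     queue is actually the bottleneck. Pool name is a placeholder. -->
<system.applicationHost>
  <applicationPools>
    <add name="DefaultAppPool" queueLength="5000" />
  </applicationPools>
</system.applicationHost>
```

Note that a default of 1000 would not by itself explain a knee at 100 concurrent users, so this is worth checking but is not a certain culprit.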
 
David Favor (Linux/LXD/WordPress/Hosting Savant) commented:
Linux provides the ability to nice (CPU) and ionice (disk) a process to change its CPU/disk queuing priority.

Check your OS docs to see if there's a similar way to affect CPU/disk queuing priority.
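As a sketch of this suggestion (Linux-specific; the load-test command itself is a placeholder), a wrapper that runs a command at the lowest CPU and disk priority might look like:

```shell
#!/bin/sh
# Run a command at the lowest CPU priority (nice 19) and, where
# available, the "idle" disk-scheduling class (ionice -c 3), so a
# load generator cannot starve the processes it is measuring.
run_niced() {
  if command -v ionice >/dev/null 2>&1; then
    nice -n 19 ionice -c 3 "$@"
  else
    nice -n 19 "$@"   # ionice is Linux-specific (util-linux)
  fi
}

# Placeholder invocation; substitute your real load-test command.
run_niced echo "load test goes here"
```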

A good rule of thumb is to avoid load testing on a production machine, unless your entire technology stack is tuned very well.

On Ubuntu + LXD + WordPress sites, I run this type of load test every few minutes on all my production servers, to ensure all client sites are running at full speed.

lxd: net11-jasites # time nice -19 h2speed --compact --count=1000 https://net11.ubuntu.zesty.php72.davidfavor.com/hello.php
h2load -ph2c -t16 -c16 -m16 -n16000 https://net11.ubuntu.zesty.php72.davidfavor.com/hello.php
finished in 1.59s, 10084.08 req/s, 1.18MB/s
requests: 16000 total, 16000 started, 16000 done, 16000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 16000 2xx, 0 3xx, 0 4xx, 0 5xx
Requests per second: 10,084.08
Requests per minute: 605,044.8
Requests per hour  : 36,302,688

real	0m1.673s
user	0m0.596s
sys	0m0.656s



The only way this works is if the entire tech stack is caching content correctly.

So ensuring that a single request correctly caches, so subsequent tests return cached data, is also vital.

Here caching includes the database, PHP OPcache, and mod_rewrite (or the IIS equivalent).

I normally run load tests on a zero length .txt file first, because if this runs slow all other load testing is abandoned.

Next is a test of a simple PHP file, like hello.php or similar.

Next is a simple database test - open + SELECT LIMIT 1 + close.

Then if all the three previous tests run at anticipated speed, I run a load test against the actual WordPress sites, as all client sites I host are WordPress.

By taking a stepped approach to load testing, you can ensure (or make a good guess) your load test will run without taking out your production site(s).
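The stepped approach above can be sketched as a small driver that aborts at the first stage exceeding a latency budget; the stage names, budget, and measurements below are illustrative placeholders (in practice each number would come from an h2load run like the one shown earlier):

```shell
#!/bin/sh
# Stepped load-test driver: check stages in order (static file, plain
# PHP, database round-trip, full site) and stop at the first one whose
# measured mean latency exceeds the budget, so later, heavier tests
# never run against an already-misbehaving stack.
BUDGET_MS=200

check_stage() {
  stage=$1; measured_ms=$2   # measured_ms would come from h2load output
  if [ "$measured_ms" -gt "$BUDGET_MS" ]; then
    echo "ABORT at '$stage': ${measured_ms}ms > ${BUDGET_MS}ms budget"
    return 1
  fi
  echo "PASS '$stage': ${measured_ms}ms"
}

# Illustrative measurements; replace with real per-stage numbers.
check_stage "empty .txt"    5  &&
check_stage "hello.php"    20  &&
check_stage "db SELECT 1"  40  &&
check_stage "full site"   150
```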
 
Sean (System Engineer) commented:
There isn't exactly a single setting that caps users at 100, but there are a number of things you can do to improve performance:
https://docs.microsoft.com/en-us/biztalk/technical-guides/optimizing-iis-performance

Also make sure you aren't maxing out disk I/O. You can throw all the CPU and memory you want at it, but if disk I/O is capped then you'll always be limited by that.
 
Dan McFadden (Systems Engineer) commented:
A few additional questions:

1.  What is the site built in?  ASP.NET, PHP, etc...
1a.  What version is in use?
1b.  32bit or 64bit?
2.  What version of Windows Server?
3.  What database product?  MS SQL, MySQL, Postgres, Oracle?
4.  What load balancer is in use?
5.  Does the web site/ web app store session data?
6.  How is the http traffic load balanced?  Round robin, load based, latency based, sticky session?
7.  What is the load on the CPU and RAM on the LB?
8.  What is the load on the CPU, RAM, and disk on the IIS servers?
9.  What is the load on the CPU, RAM, and disk on the database server?

As for your testing methodology... have you tried stressing one of the web servers directly to see if the user issue arises?  By taking the LB out of the equation, you may be able to isolate the issue and focus on a smaller area.
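One way to hit a single backend without touching DNS is curl's --resolve flag, which pins the site's hostname to one server's IP so TLS and Host headers still match. The hostname and IP below are hypothetical placeholders; the helper just builds the command so you can feed it to whichever load tool you use:

```shell
#!/bin/sh
# Build a curl invocation that bypasses the load balancer by pinning
# the site's hostname to one specific backend IP (--resolve).
# Hostname and IP are hypothetical placeholders.
direct_curl_cmd() {
  host=$1; ip=$2
  printf 'curl --resolve %s:443:%s -s -o /dev/null -w "%%{time_total}\\n" https://%s/\n' \
    "$host" "$ip" "$host"
}

direct_curl_cmd www.example.com 10.0.0.11
```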

In order to get a better feel for a baseline of your setup, testing the individual components of the system is helpful.  If you know the capacity of a single web server, you can determine whether the system, as a whole, is meeting or exceeding the expected output.

Dan
 
James Mclean (Author) commented:
My thanks to everyone who has submitted a solution to date; they are all helpful in our quest to understand this behaviour.
