Link to home
Start Free TrialLog in
Avatar of Rob Crozier
Rob Crozier

asked on

Slow SSL under load (but not when ab testing?) - Apache

We're currently experiencing an issue with our SSL configuration whereby after a short while (after an Apache restart) the ssl handshake seems to take a silly amount of time, ~20-30 seconds!

We have the same certificate on another server with similar setup (albeit older versions) and it's running fine.

Main differences are:

Newer version of apache (2.4.18 on new, problematic server | 2.2.16 on old, performant one)
New server running Ubuntu 16.04 as oppose to Debian 6 on old one
New server allowing TLSv1, TLSv1.1, TLSv1.2 but old server only accepting TLSv1
Different cyphers:

NEW: AES256+EECDH AES256+EDH AES256-SHA AES128-SHA ECDHE-RSA-AES128-SHA ECDHE-RSA-AES256-SHA ECDHE-RSA-AES256-GCM-SHA384 ECDHE-RSA-AES256-SHA384 AES256-GCM-SHA384 AES256-SHA256 ECDHE-RSA-AES128-GCM-SHA256 ECDHE-RSA-AES128-SHA256 AES128-GCM-SHA256 AES128-SHA256

OLD: DES-CBC3-SHA AES256+EDH AES256-SHA AES128-SHA

Key things tried:

Check that we are using /dev/urandom
Check keepalive_timeout (currently 100)
SSLHonorCipherOrder on
SSLHonorCipherOrder on
StartServers 5
ServerLimit 400
MaxClients 400
MinSpareServers 10
MaxSpareServers 50
MaxRequestWorkers 400
MaxConnectionsPerChild 0
using mpm_perfork (got 196G RAM!)
We've been through all of the StackOverflow posts we can find on the problem but nothing seems to resolve it.

The problem only seems to occur under real world load and not when using apache ab to try and simulate load.

Has anyone experiences something similar before? Any obvious pointers?

Thanks in advance for any suggestions!
Avatar of David Favor
David Favor
Flag of United States of America image

To be sure, you'd have to publish your actual URL, so people can test using tools.

Apache-2.4.17 introduced HTTP2 support + completely replumbed a huge amount of the entire SSL code base.

Apache-2.4.18 was the first "." (dot) release after HTTP2 support was rolled in. As such... Apache-2.4.18, to me, is barely ready for prime time deployment.

It works + for any level of stability you must completely disable HTTP2.

As of Apache-2.4.26 (as I recall) is where using mod_http2 + mod_prefork together is deprecated/blocked, because http2 multi-threads so prefork always caused random glitches.

I'd strongly suggest you move to Apache-2.4.27 or Apache-2.4.28 + mod_event + FPM (if you're running PHP), before you start debugging your problem, as your cert likely has nothing to do with what you're seeing. Likely it's a combo of Apache version + OpenSSL version + HTTP2 (maybe).

If the problem persists, enable deep Apache SSL debugging, via...

LogLevel info ssl:trace8

Open in new window


This LogLevel will dump your entire SSL conversations.

All this being said, there's one other area for you to explore.

If your... SSL related process (entire content flow) contains many DNS references, the problem you're seeing may be related to systemd-resolved, which is one of the worst excuses of code I've every seen.

You make no mention of your runtime environment + if you're using Apache-2.4.18, you're likely hobbled by a systemd based OS.

The problem with systemd-resolved is... well... it's just broken... This code incorrectly caches DNS requests + mangles returned data + many times glitches + just hangs.

You might start your entire debugging process by installing dnsmasq-base (or your OS equivalent) + stopping/disabling/deinstalling systemd-resolved.

Once I started nuking systemd-resolved as the first step of installing a new machine or LXD container, many random network related problems resolved.
ASKER CERTIFIED SOLUTION
Avatar of Rob Crozier
Rob Crozier

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Be sure you carefully check your NGINX logs.

Many people dump NGNIX after watching the logs for a few days.
Avatar of Rob Crozier
Rob Crozier

ASKER

.