One of my most common tasks with clients is breaking the bad news that their code can’t stand their success. In other words, one of their ideas paid off and now their technology can’t keep up with their traffic, usually because the coders they hired delivered exactly what they were asked to deliver.
The coders delivered code to accomplish some task, for the cheapest price, with no requirements about the code's effect on performance and no consideration of how many concurrent visitors the code must handle.
A recipe for disaster.
WordPress provides a good example of scalable code, because WordPress leverages...
|Many WordPress caching plugins use mod_rewrite, and most use it so poorly that you'd be better off using no caching plugin at all. Test using your PHP-FPM access logs to make sure cached requests stop at the Apache level, with no leakage down to PHP.
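A rough way to run that check is to list a few URLs that should be served from cache, request them, and then count how many of those requests show up in the PHP-FPM access log. The pool's `access.log` directive enables that log. The sketch below assumes your pool's `access.format` writes the HTTP method and request URI inside double quotes; adjust the regex if yours differs. The sample log lines and paths are made up for illustration.

```python
import re
from collections import Counter

# Assumption: each PHP-FPM access log line contains the HTTP method and the
# request URI inside double quotes, e.g. ... "GET /about/" 200
# Adjust this regex to match your pool's access.format setting.
REQUEST_RE = re.compile(r'"([A-Z]+) ([^" ]+)')


def php_hits(log_lines, cached_paths):
    """Count requests for supposedly cached paths that still reached PHP-FPM."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) in cached_paths:
            hits[match.group(2)] += 1
    return hits


if __name__ == "__main__":
    # Illustrative lines only; read your real log with open(...) instead.
    sample = [
        '127.0.0.1 -  12/Mar/2024:10:00:01 +0000 "GET /index.php" 200',
        '127.0.0.1 -  12/Mar/2024:10:00:02 +0000 "GET /about/" 200',
    ]
    # Any non-zero count means requests for that URL leaked down to PHP.
    for path, count in php_hits(sample, {"/about/", "/contact/"}).items():
        print(f"LEAK: {path} reached PHP {count} time(s)")
```

If the cache is working, a cached URL should never appear in this count after the first request warms the cache.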
Writing scalable code is very simple.
You accomplish this by cutting disk I/O, especially disk writes, down to the bare minimum.
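One common way to cut disk writes is to collect small updates in memory and write them out in batches. The sketch below assumes a hypothetical hit counter whose numbers can tolerate being lost if the process dies between flushes. The class name, the file layout, and the `flush_every` value are illustrative, not from any library.

```python
import os
import tempfile


class BufferedCounter:
    """Count hits in memory and append them to a flat file in batches.

    Hypothetical helper: one disk write per `flush_every` events instead of
    one per event. Counts still in memory are lost if the process dies.
    """

    def __init__(self, path, flush_every=1000):
        self.path = path
        self.flush_every = flush_every
        self.pending = {}   # key -> count not yet written
        self.events = 0     # total hits recorded
        self.writes = 0     # number of times the file was actually written

    def hit(self, key):
        self.pending[key] = self.pending.get(key, 0) + 1
        self.events += 1
        if self.events % self.flush_every == 0:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        with open(self.path, "a", encoding="utf-8") as fh:
            for key, count in self.pending.items():
                fh.write(f"{key}\t{count}\n")
        self.pending.clear()
        self.writes += 1


if __name__ == "__main__":
    counter = BufferedCounter(os.path.join(tempfile.mkdtemp(), "hits.tsv"))
    for i in range(10_000):
        counter.hit(f"/page-{i % 5}")
    counter.flush()  # write whatever is left before exiting
    print(counter.events, "hits,", counter.writes, "file writes")
```

With the default `flush_every=1000`, the 10,000 hits in the demo cost 10 file writes instead of 10,000.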
You can cheat and use WordPress as a framework and write your code as a plugin.
Writing scalable code outside of a framework like WordPress requires considerable thought and testing at every stage of development.
|MariaDB is a drop-in MySQL replacement. Think of MariaDB as MySQL, only faster and better.
Recently, I took on a new hosting client, and after analyzing his MariaDB logs for one day, I told him he'd have to rewrite much of his code or I'd have to move him to more expensive hosting.
The problem was simple: He was using MariaDB to first do a SELECT on a data table, then write to the table if there was no data collision.
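The usual alternative to SELECT-then-write, which is my suggestion and not the client's code, is to put a UNIQUE index on the column that defines a collision. The database then rejects duplicates in a single statement. This also closes the race where two concurrent visitors both pass the SELECT before either one writes. In MariaDB that would be `INSERT IGNORE` or `INSERT ... ON DUPLICATE KEY UPDATE`. The sketch below uses SQLite's `INSERT OR IGNORE` so it runs standalone, and the `visits`/`token` names are hypothetical.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    # The UNIQUE constraint creates an index, so the database itself checks
    # for a collision with an index lookup instead of a full table read.
    db.execute(
        "CREATE TABLE visits (token TEXT NOT NULL UNIQUE, created INTEGER NOT NULL)"
    )
    return db


def record_once(db, token, created):
    """Insert the row unless `token` already exists; return True if written."""
    cur = db.execute(
        "INSERT OR IGNORE INTO visits (token, created) VALUES (?, ?)",
        (token, created),
    )
    db.commit()
    return cur.rowcount == 1


if __name__ == "__main__":
    db = make_db()
    print(record_once(db, "abc", 1))  # first write succeeds
    print(record_once(db, "abc", 2))  # duplicate is silently skipped
```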
Seems simple, right?
The problem was that his table had no indexes, contained nearly 1,000,000 rows, and had no pruning mechanism, so the system got slower every day.
There were many tables like this, so disk I/O was taking the machine down.
Logging data is for flat text files that live in /var/log, not MariaDB data tables.
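Most languages make this easy. In Python, for example, the standard library's `WatchedFileHandler` appends to a flat file and reopens it when logrotate moves the old file aside. The sketch below is minimal: the logger name and file path are placeholders, and the demo writes to a temp directory so it runs without root.

```python
import logging
import logging.handlers
import os
import tempfile


def make_file_logger(path, name="myapp"):
    """Return a logger that appends plain text lines to `path`.

    WatchedFileHandler reopens the file if logrotate moves it away, which
    suits files living under /var/log. `name` and `path` are placeholders.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.handlers.WatchedFileHandler(path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # In production this might be /var/log/myapp/visits.log (hypothetical);
    # a temp directory keeps the demo runnable without root.
    log = make_file_logger(os.path.join(tempfile.mkdtemp(), "visits.log"))
    log.info("visit token=%s status=%d", "abc123", 200)
```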
This same client was using the SQL LIKE operator to test for equality, which takes roughly 10x more resources than an equality test. Compare these two queries:
SELECT * FROM test WHERE value LIKE 'abc'
SELECT * FROM test WHERE value = 'abc'
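Both return the same result set, with very different resource requirements.
The first statement has to pattern-match. It can end up scanning an entire index or, if the value column has no index, the entire data table, so potentially 1,000,000+ reads. MariaDB's optimizer can sometimes use an index for a LIKE pattern with no leading wildcard, but then you're leaning on the optimizer instead of saying what you mean.
With an index on the value column, the second statement does a direct index lookup (a B-tree descent, typically 2-3 reads) plus the I/O to return the matched rows.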
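You can see the difference in a query plan. This sketch uses SQLite because it runs standalone. SQLite's LIKE is case-insensitive by default, so it can't use an ordinary index on `value` and falls back to a SCAN, while `=` gets a SEARCH on the index. MariaDB's optimizer differs in the details, so run `EXPLAIN` on your own queries there. The table mirrors the `test`/`value` names above, and `payload` is a hypothetical extra column.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, value TEXT, payload TEXT)")
db.execute("CREATE INDEX idx_test_value ON test (value)")


def plan(sql):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for `sql`."""
    rows = db.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


like_plan = plan("SELECT * FROM test WHERE value LIKE 'abc'")
equal_plan = plan("SELECT * FROM test WHERE value = 'abc'")

print("LIKE:", like_plan)   # SCAN: every row is read and pattern-matched
print("=   :", equal_plan)  # SEARCH ... USING INDEX idx_test_value
```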
Let's think about a brute-force attack blocked by one of the many WordPress security plugins, which all espouse handling security at the WordPress level rather than at the kernel and OS level.
Each attack request traverses the kernel, the TCP stack, Apache, MariaDB, PHP, WordPress core, the theme, and all plugins. Only then does the security plugin run and return some message to the attacker, which is likely a bot. An attacker can use this entire process to kill any machine.
The easiest way to kill a machine is a 404 attack: repeatedly request a random page you know doesn't exist. This attack alone will often take out a WordPress site, because 404 pages are normally never cached.
If the WordPress site is running a security plugin, taking the site down is even faster and keeping it down is easier, because security plugins tend to be more resource-hungry than plain 404 processing.
The fix is what all high-traffic sites do: use fail2ban to track /var/log/auth.log, and any time someone has 3 password failures in 1 hour, block that IP for 1 hour or 1 day, or something similar.
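fail2ban expresses that policy with its `maxretry`, `findtime` and `bantime` jail settings and enforces it with firewall rules. The Python below is not fail2ban. It sketches only the counting logic behind such a policy: 3 failures inside a sliding 1-hour window trigger a 1-hour ban. The class and method names are hypothetical.

```python
import time
from collections import defaultdict, deque


class FailureTracker:
    """Ban an IP after `max_retry` failures within `find_time` seconds.

    Illustrates the policy only; fail2ban enforces the same idea with
    firewall rules so banned traffic never reaches the web stack at all.
    """

    def __init__(self, max_retry=3, find_time=3600, ban_time=3600):
        self.max_retry = max_retry
        self.find_time = find_time
        self.ban_time = ban_time
        self.failures = defaultdict(deque)  # ip -> recent failure timestamps
        self.banned_until = {}              # ip -> time the ban expires

    def is_banned(self, ip, now=None):
        now = time.time() if now is None else now
        return self.banned_until.get(ip, 0) > now

    def record_failure(self, ip, now=None):
        """Record one failed login; return True if it triggers a ban."""
        now = time.time() if now is None else now
        window = self.failures[ip]
        window.append(now)
        # Forget failures that are older than the find_time window.
        while window and window[0] <= now - self.find_time:
            window.popleft()
        if len(window) >= self.max_retry:
            self.banned_until[ip] = now + self.ban_time
            window.clear()
            return True
        return False


if __name__ == "__main__":
    tracker = FailureTracker()  # 3 failures in 1 hour -> 1 hour ban
    for t in (0, 60, 120):
        banned = tracker.record_failure("198.51.100.23", now=t)
    print("banned after third failure:", banned)
    print("still banned 30 min later:", tracker.is_banned("198.51.100.23", now=1920))
```

The point of doing this in the firewall rather than in PHP is that, once banned, the attacker's packets are dropped before Apache, PHP, or MariaDB ever see them.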
This means the first 3 attacks run at normal resource usage. Once fail2ban issues an iptables block for the attacking IP, the machine blocks all future attacks from that IP with near-zero resource usage.
So if you're working on security enforcement code, your best bet is to tie into fail2ban. Otherwise, your security code's own resource usage may become the primary cause of machine failure while attack traffic hits the machine.
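Tying in usually means your code writes one predictable log line per failure and lets fail2ban do the blocking. The sketch below assumes a hypothetical `myapp` line format. fail2ban filters use a `<HOST>` placeholder in `failregex`; here a plain IPv4 group stands in for it so the match can be tested locally. In a real log, the line would also carry the timestamp fail2ban reads, for example when written through syslog. Putting the IP ahead of the user name, and squashing whitespace in the user name, keeps a crafted login name from spoofing another address.

```python
import re

# Hypothetical line format. Putting the IP before any attacker-controlled
# text keeps a crafted user name from smuggling in a second address.
LINE_FORMAT = "myapp auth failure from {ip} user={user}"

# What a fail2ban failregex for this line could look like, except that
# fail2ban would use its own <HOST> placeholder where this plain IPv4
# group sits, so the pattern can be tested here without fail2ban.
FAILREGEX = re.compile(r"^myapp auth failure from (\d{1,3}(?:\.\d{1,3}){3}) user=")


def failure_line(user, ip):
    """Build the one log line the security code writes per failed login."""
    safe_user = re.sub(r"\s+", "_", user)[:64]  # no spaces, bounded length
    return LINE_FORMAT.format(ip=ip, user=safe_user)


if __name__ == "__main__":
    line = failure_line("admin", "203.0.113.7")
    print(line)
    print("matched host:", FAILREGEX.match(line).group(1))
```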
As for virus scanning, this is a joke too: security plugins continually hammer your disk looking for hacks, and they never protect you from getting hacked.
Better to work with a hosting provider with deep Linux and WordPress expertise. Better to never be hacked than to be hacked and have to recover.
Most redirection plugins are horrible.
They tend to read all redirects sequentially out of the database on every request. So if you've set up 100 redirect links on your site, or worse, layers of Pretty Links redirects adding up to 1000s of redirect links, consider what happens next.
Every arriving page request must read every single redirect out of the database and test each one for a potential match. Since most page requests never trigger a redirect, each page request pays for 100s to 1000s of database reads that accomplish nothing.
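A better approach is to load the redirect list once, with one query at startup or from an object cache, and answer each request with a single hash lookup. The per-request cost then no longer grows with the number of redirects. The sketch below handles exact-match redirects only. The `RedirectTable` name and the row layout are hypothetical, and pattern-based redirects would need their own, much shorter, list.

```python
class RedirectTable:
    """Load redirects once, then answer each request with one dict lookup.

    rows: iterable of (source_path, target_url, status) tuples, e.g. read
    with a single query at startup rather than on every request.
    """

    def __init__(self, rows):
        self.exact = {self._normalize(src): (dst, status) for src, dst, status in rows}

    @staticmethod
    def _normalize(path):
        # Treat "/old-page" and "/old-page/" as the same key.
        return path.rstrip("/") or "/"

    def lookup(self, path):
        """Return (target_url, status) for a redirected path, else None."""
        return self.exact.get(self._normalize(path))


if __name__ == "__main__":
    rows = [("/old-page/", "https://example.com/new-page/", 301)]
    rows += [(f"/promo-{i}", f"https://example.com/offer-{i}/", 302) for i in range(5000)]
    table = RedirectTable(rows)
    print(table.lookup("/old-page"))         # matched with one dict lookup
    print(table.lookup("/blog/some-post/"))  # most requests: None, still one lookup
```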
For launching sites or those with continuous high traffic, redirection plugins with many redirects are a common reason sites go down.
I wrote my first lines of code in 1982… whew… Pascal + FORTRAN… shudder…
Even after decades of coding, I make mistakes all the time. Point is, no matter how many decades you code, you will make mistakes.
The key, for me, is to do incremental testing.
Mistakes caught and fixed early are usually never noticed, especially by clients paying for coding services.
When I write code I think in terms of millions. For example, a recent membership site project I coded started with the premise of, “If we succeed wildly and have 1,000,000+ members, how should code be written to still run blazing fast with this level of usage?”
The way to determine this is simpler than you might imagine.
Run a 1,000,000-request load test, like this:
h2load -ph2c -t16 -c16 -m16 -n1000000 https://site-to-test.com/
Depending on your jurisdiction (country), issuing this type of test against someone else's site may be a crime.
Also, even if you issue this type of test from one machine you control against a site on another machine you control, many ISPs between these two machines will block your traffic and may report you to authorities.
Also, the ISP where you initiate this test will likely suspend your account, and you'll have a good bit of explaining to do before they reinstate it.
|Rule: Only initiate this test on the machine where your site runs, so test traffic is contained on your machine and never leaks anywhere else.
For membership sites, test the speed for logged-in members by disabling your caching plugin and the paywall/protection on the page you wish to test, because logged-in users will likely be served without caching.
My personal guideline: only deliver WordPress sites to clients that run at 1,000,000+ requests/minute throughput for anonymous (non-logged-in) users. This ensures the sites I deliver will scale easily.
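For reference, 1,000,000 requests/minute works out to about 16,667 requests/second. A small helper can read that figure out of h2load's summary line. It assumes the `finished in ..., N req/s, ...` wording h2load prints, which may vary between nghttp2 versions, and the sample numbers are illustrative only.

```python
import re

TARGET_PER_MINUTE = 1_000_000
TARGET_PER_SECOND = TARGET_PER_MINUTE / 60  # about 16,667 req/s

# h2load's summary line looks like
#   finished in 7.08s, 141164.80 req/s, 555.33MB/s
# (illustrative numbers; wording may differ between nghttp2 versions).
FINISHED_RE = re.compile(r"finished in \S+, ([\d.]+) req/s")


def requests_per_minute(h2load_output):
    """Extract req/s from h2load output and convert it to req/minute."""
    match = FINISHED_RE.search(h2load_output)
    if match is None:
        raise ValueError("no 'finished in ..., N req/s' line found")
    return float(match.group(1)) * 60


def meets_target(h2load_output):
    return requests_per_minute(h2load_output) >= TARGET_PER_MINUTE


if __name__ == "__main__":
    sample = "finished in 7.08s, 141164.80 req/s, 555.33MB/s"  # illustrative
    print(f"{requests_per_minute(sample):,.0f} req/min, target met: {meets_target(sample)}")
```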
After delivering one of these sites last week, one of the agencies working on the project asked me how they could train their in-house staff to deliver sites running at similar speeds.
Here's my best shot at describing how a person can learn to deliver very fast WordPress sites, API sites, or any PHP-based code:
Get used to using your logs. As you build your PHP system, track the effect of every change on the throughput of your test suite.
If you test continuously, you catch and fix mistakes early in your development cycle, so every code base you deliver will run lightning quick.
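One low-tech way to do that is to run the same load test after each change, record its req/s next to a short label, and compare every run with the one before it. The sketch below uses hypothetical labels and an arbitrary 5% tolerance for flagging a drop.

```python
def throughput_report(runs, tolerance=0.05):
    """Compare each load-test run with the previous one.

    runs: list of (label, requests_per_second) in the order the changes
    were made. A drop bigger than `tolerance` (5% here, an arbitrary
    choice) is flagged so the change behind it gets a closer look.
    """
    report = []
    for (_, prev_rps), (label, rps) in zip(runs, runs[1:]):
        change = (rps - prev_rps) / prev_rps
        flag = "REGRESSION" if change < -tolerance else "ok"
        report.append(f"{label}: {rps:,.0f} req/s ({change:+.1%}) {flag}")
    return report


if __name__ == "__main__":
    # Hypothetical labels and numbers, one entry per change tested.
    runs = [
        ("baseline", 20_000),
        ("cache menu queries", 21_000),
        ("add slider plugin", 15_000),
    ]
    for line in throughput_report(runs):
        print(line)
```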