
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 4760

How to scale a website: Google App Engine, Amazon Web Services, or other?

First of all I'm quite new to all this and may have some misconceptions.

I have a website (Linux, PHP, Apache, MySQL) that runs on a small managed server (I can somewhat set up a server but don't feel comfortable running it all by myself). It's a small browser-based game where people manage their own soccer team and play against each other. At specific times during the day, cron runs a PHP script that plays all the games (thousands), evaluates transfers and such. It can take quite a while (up to 40 minutes) and is quite taxing for the server, so we display a simple message to users and don't allow them to do anything else. After this is over, up to 200 users are logged in at the same moment to check their results, and the server is noticeably slower for an hour before it subsides to normal levels.

At this point I've decided to start over, completely rewrite the code and try to market it a lot more. In the future it should be able to handle 10x the workload and more.

However, I have no idea what the architecture behind all this should look like, and I probably don't have the money to have it done for me. If it's going to be 10x more taxing, it'd take 400 minutes to simulate all the games and the server would be crushed by 2,000 concurrent users. I obviously need a stronger server, but also more than one. At that point it'd probably be good to separate the database, the simulations, and the webpage serving.

Since I'm not a pro and want to spend my time actually developing the game, I'd like to have as much as possible done by the service I choose. Right now I'm looking at cloud-based services such as Amazon Web Services and Google Cloud Platform, which allow me to cheaply test whatever setup I decide to use.

My questions are:

What is the best service to use, and exactly how do I make all the servers/instances work in unison? It'd probably need to be as automated as possible.

When there are 20,000 games to be simulated and I want them done quickly, how do I actually tell two or more servers they need to start doing something: access a database on a separate server, look at games that need to be simulated, each take their subset of games to simulate, and finally upload the results back to the database?

If I needed more than one server for the database, how do I connect them together to act as one?

Services such as GAE and AWS offer load balancing; it seems straightforward and easy to set up, and can be a solution for many concurrent users accessing the website. Is there a similar solution that can distribute the games I need to simulate to other servers?

Is there some guide that would actually give me step by step instructions with examples?

Thank you for your help!
8 Solutions
Ray Paseur commented:
The real answer to your question is a bachelor's degree in computer engineering.  At scale things work differently from many of the tidy and neat and "efficient" things we're taught in the introduction-to-programming courses.  Your basic data base introductory classes will teach you about normalizing your data base.  This does not work at scale, where your objective is to denormalize so you can throw hardware at the problem.

If you don't have cash, but do have access to a college-level computer science or engineering department you might look for a graduate assistant who would want to be a partner in your enterprise.  You could pick up a lot of expertise, but of course you would have to sacrifice some part of the ownership.

You can get help and advice from the support departments at AWS and similar services.

I think before you start refactoring the code you might want to ask yourself why the current code is slow.  In my experience with computer performance evaluation slow applications are always caused by poor response from the I/O subsystems.  In web applications this almost always means the data base.  Here is a short list of things you might consider.

Do you have any queries that say SELECT * ?  If so, change the queries to SELECT only the exact columns your script needs.

Do you have any queries without a LIMIT clause?  If so, ask yourself if the query absolutely requires a complete table scan, and if not, add the appropriate LIMIT clause.  UPDATE queries seem to be common offenders.

Do you have any complex queries ("complex" would be defined as touching more than one table)?  If so, use EXPLAIN SELECT to learn how the SQL engine is handling them.

Do you have any compound queries (multiple SELECT statements in a single query string)?  Use EXPLAIN SELECT on these, too.

Do you have appropriate indexes, covering every column used in WHERE, GROUP, ORDER, JOIN, HAVING, etc?  

Do you have images or other BLOB-type data in the data base?  Consider moving large data objects into the server file system and just storing the URL in the data base.
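As a sketch of the kinds of changes this checklist leads to (the table, column, and index names here are hypothetical, not from the original site):

```sql
-- Instead of pulling every column and row:
--   SELECT * FROM games;
-- select only what the script needs, bounded:
SELECT home_team_id, away_team_id, scheduled_at
FROM games
WHERE played = 0
LIMIT 100;

-- Check how the engine executes a multi-table query:
EXPLAIN SELECT g.id, t.name
FROM games g
JOIN teams t ON t.id = g.home_team_id
WHERE g.played = 0;

-- Cover the columns used in WHERE/JOIN with indexes:
CREATE INDEX idx_games_played    ON games (played);
CREATE INDEX idx_games_home_team ON games (home_team_id);
```

EXPLAIN will show whether a query is doing a full table scan (`type: ALL`) or using one of the indexes.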

After you've tried these things you will have a handle on what the performance issues have been in the current installation and you'll have some experience that will inform your data base design as you go forward.  You may find that there are really big improvements in performance coming from really small changes in the code.
Steve Bink commented:
Ray's comments are definitely on target.  Optimizing your database is going to be one of the most important aspects of your application.  However, it is not the only place to look for speed improvements.

First, you mentioned a cron job performed to "play" all the games.  While this script executes, your site and application are unavailable.  This tells me that you are thinking linearly.  Find ways to multithread your application.  For example, instead of one cron job that runs through 20000 games sequentially, you can set up a few smaller threads, each working through a single game.  These threads should pull their next task from a queue system, which is populated daily with the necessary game information.  Make sure you include flags on your master game table to indicate if a game has been calculated or not.  This will have the nice side effect of allowing you to display real-time results as games are completed, or estimate how long a player must wait until their game will be ready.
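A minimal sketch of that queue-with-flags idea in PHP (in-memory here purely for illustration; in practice the queue would be a database table or a job queue, and each worker a separate process):

```php
<?php
// Hypothetical in-memory "queue table": one row per game,
// with a flag showing whether it has been calculated yet.
$games = [
    ['id' => 1, 'calculated' => false],
    ['id' => 2, 'calculated' => false],
    ['id' => 3, 'calculated' => false],
];

// Claim the next unprocessed game; returns its index or null.
// With a real database this claim must be atomic (e.g. an UPDATE
// guarded by a WHERE on the flag) so two workers never take the
// same game.
function claimNext(array &$games): ?int {
    foreach ($games as $i => $game) {
        if (!$game['calculated']) {
            $games[$i]['calculated'] = true;   // flag it immediately
            return $i;
        }
    }
    return null;   // queue drained
}

// Each "worker" loops until the queue is empty.
$results = [];
while (($i = claimNext($games)) !== null) {
    $results[$games[$i]['id']] = 'played';     // simulate the game here
}

echo count($results), " games played\n";       // prints: 3 games played
```

Because the flag is set the moment a game is claimed, a status page can count the flagged rows to show players how far the round has progressed.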

Another consideration is web traffic.  If you have 20000 games, it is a logical inference that you could have 40000 participants (or more...don't forget the spectators) hitting your site during the peak hours.  Your server architecture should be able to handle that kind of load, or at the very least, fail gracefully.  Some of this can be handled by properly configuring your web service (e.g., Apache, IIS, etc.) to be more efficient.  Separating your web and database servers will be helpful, allowing them to live in environments where they do not compete for resources.

Even with properly separated services, you may find your web server just can't handle the peak load.  In that case, start looking at load-balancing configurations.  This is an environment with two or more web servers, all running the same application, with a device sitting between them and the internet.  As a request comes in, the balancer decides to which back-end server it should be routed.  For these systems to work efficiently, it is absolutely critical to have proper separation of your code and data.
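For example, with Apache (which the site already runs), mod_proxy_balancer can play the role of that device; a minimal sketch, assuming two hypothetical back-end hosts and the relevant proxy modules loaded:

```apache
# Requires mod_proxy, mod_proxy_balancer, mod_lbmethod_byrequests.
# Requests arriving here are distributed across two identical
# back-end web servers running the same application code.
<Proxy "balancer://gamecluster">
    BalancerMember "http://web1.internal:80"
    BalancerMember "http://web2.internal:80"
    ProxySet lbmethod=byrequests
</Proxy>

ProxyPass        "/" "balancer://gamecluster/"
ProxyPassReverse "/" "balancer://gamecluster/"
```

Note that this only works cleanly if session state and data live outside the individual web servers (shared database, shared session store), which is exactly the code/data separation described above.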

Yet another part is how your application obtains data and renders pages.  Chances are, your data retrieval code is inline with your page rendering or other application logic code.  Separate your data retrieval capabilities into an API-based architecture.  This can make your web server threads more efficient, thereby spending less time on each individual request.  Less execution per thread = more requests being served.

Perhaps the most important advice of all is contained in Ray's first paragraph.  Building to scale is a whole specialization within web development.  Even with the proper training and background, optimizing a specific application will almost always be a continual trial-and-error process.  There are no simple guides to follow - be prepared for the long haul, and lots of education.
A large portion of my day job is determining how to design various parts of applications to scale better. I completely agree with Ray that this is not a simple question or answer.

Scaling depends heavily on the situation. For example, cars drive along a freeway and to scale up the number of cars the freeway can handle, you have to add lanes, and those extra lanes have specific construction guidelines and impacts on their surroundings.

Adding freeway lanes won't solve anything related to a problem of enabling more kids to be in an amusement park at the same time (it might even make things worse, since more freeway lanes could allow more people to arrive at the amusement park at the same time, and if the park can't scale to take them all, it's bad news).

So each application has its own way of scaling, and that growth can impact other things, as well. There will always be SOME bottleneck.

If you consider yourself small (in terms of volume), it may be that the way the code is written is having too much of an impact and a rewrite could solve your problems for the foreseeable future.

Usually when I'm coaching new developers on how to program with scaling in mind, my first instruction is to start thinking about what jobs can be done at the same time.

Let's consider an app with the following steps:

Step A: Get all records of data from the database (let's say 10,000)
Step B: Perform simulations
Step C: Save the results

A lot of developers will build an application so that A is done, then B, and finally C, and each one has to finish before getting to the next. That works great at first, but it doesn't scale well, because you have one script that is trying to process everything, so every additional record of data will slow down steps A, B, and C.

This is like having a factory that could hold 100 workers, but instead you have one guy that is running around doing everything. So record #2 just sits and waits there until the guy finishes record #1 (think of each worker as being a running instance of your application / script), and then record #3 has to wait until record #2 is finished, and so on.

Servers are made to handle quite a bit, and you might be surprised at how much they can handle at one time when they are put to the test. They are indeed a factory that can hold quite a bit more than one person.

By changing step A so that instead of grabbing ALL the data in the database to be processed, it simply grabs up to 100 records, and then processes those, you can run your application multiple times. Each instance of the application / script can grab the next 100 records to be processed (you have to flag them as they are grabbed so that other runs of the application will not grab them and then try to do the same work - something called a "race condition").
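The batching described above can be sketched in plain PHP (the record IDs are hypothetical; `array_chunk` splits the full set into fixed-size batches, each of which would go to one run of the script):

```php
<?php
// 1,000 hypothetical record IDs waiting to be processed.
$ids = range(1, 1000);

// Split into batches of 100; each batch is one unit of work
// that a single instance of the script would claim and process.
$batches = array_chunk($ids, 100);

echo count($batches), " batches of ", count($batches[0]), "\n";
// prints: 10 batches of 100
```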

It might take one worker an hour to process 1000 records. By splitting up the work evenly among 6 workers, you might be able to process the same amount of data in 10 minutes. For the record, this is considered going from a sequential or "serial" flow to parallel.

There are still potential bottlenecks at various points, but this tends to make the most use of initial resources on the server. Even if you ran out of resources on that server, you would likely find a cheaper hardware-based solution by purchasing a used server and sending it to a colocated data center like ColoPronto ($49/mo for 1U, or $79/mo for 2U - you can fit an amazing amount of server power into that space).

It's not an exact science and some apps can't scale this way, but it tends to apply to a lot  of applications that were written quickly.

I'd also heavily recommend profiling your application to get a better sense of code-level bottlenecks or inefficiencies.

Greetings Carbonecz. You say "it's a small browser-based game" and it is running in a PHP server environment. I have had to tangle with some game programming, and in development you set up your needed factors: player GUI, player controls (inputs), backend input analysis, all other GUI environment interactions (location on the field map and other players affected by input), database access to select and update all affected factors. So it works great, then when tested with 25 players it is TOO SLOW,
so you go back and chop out, eliminate, delete code, and optimize-minimize (tables, db access) UNTIL the speed works for 25. Then try 100 users, and again do the chop-outs so the speed works for that, and on and on.
My view is that your "evaluates transfers and such. It can take quite a while (up to 40 minutes)"
is absolutely unacceptable! I am sure you have some idea of the tremendous amount of processor cycles (computational work) that 40 minutes represents, even on a multi-board server with quad-core Xeon processors.
I would think that this "evaluates transfers" cron job is not a well-thought-out gaming operation strategy. If you expect your players to wait 40 minutes, even once a day, you need to think again about this "evaluates transfers" cron job. Even if you get more hardware or cloud access, there MUST be a way to incorporate the "evaluates transfers" into the database operations for the actual game play, so that a cron job would NOT be needed, OR the cron only takes 30 seconds to 2 minutes.
There are hundreds of developer forums specifically for game developers where you can get info about reducing GUI-to-backend transfers, successful game database setups, fast "game participation evaluations", and other stuff that you have never even thought of.
Game programming is a specialty knowledge thing, and even at the less-than-really-complex game level, there are many already-developed and updated game engine methods you can look at to see what others have done over the years.
Carbonecz (Author) commented:
The main bottleneck definitely is my DB. A cron job runs the PHP script as a separate process, so this by itself doesn't affect the server all that much; it's when it starts using the DB heavily that the server is unresponsive (even with some caching). At this point it's better not to allow any user to do anything, so a simple message is displayed. However, it's not the only reason I want to rewrite everything. I didn't write the initial code, which is over 10 years old (I bought the website 1.5 years ago to learn from it) and which is a pain to use; the DB is also badly designed and very bloated (a couple of tables have 1.5GB of data in 20 million rows each), and I would like to change the game itself, so that's why I want to start over.

I know I probably won't be able to do everything by myself forever, but I want to do things right, right from the beginning. That means understanding how stuff works so I'm not blindsided by unforeseen problems which will halt everything for weeks and/or require me to completely rewrite the code.

Anyway, my thought about the distribution of a workload, let's say the simulation of those 20,000 games, was that I'd have a master that pulls whatever information is needed from a DB (game IDs) and tells each slave what games (IDs) it's supposed to work on; the slaves then pull all the necessary data about the games from the DB, calculate results, and push the results back to the DB. So the issue is how to send and receive messages between master and slave. I was thinking about sending a message from the master using PHP's fsockopen(), which then runs a PHP script on the slave. I just pulled that out of thin air, so it's probably not how it's supposed to be done.

Is there information somewhere about scaling a PHP application so I don't make an expensive mistake right at the beginning just because I don't know any better? I can Google, so I've found some information (like this one, mentioning some good practices with examples), but if you know about some more comprehensive resource I'd be glad.

Edit: just to illustrate what I'm aiming for:
Twenty million rows isn't tiny, but it's not overwhelming, either. You might be able to benefit greatly from optimizing the database portion. Again, code profiling should point out how long it's taking to process different sections of code, including database queries, so follow the instructions in that first article I linked to, and then read this article on how to optimize the database performance.
Also, I wouldn't mess with sockets. There's really no reason to use sockets unless you're using some kind of custom protocol or you want to build some kind of real-time communication, but that's getting complicated quickly and likely without much gain. You can have the slaves talk directly to the database with the MySQL extension. You just need to figure out how to allocate and distribute the jobs.

The simplest way is to simply add a column to the main table with the data, and call it something like "job_id". Have the slave generate a GUID, then lock the table (from updates so you're not competing with other slaves), update 100 or so records (WHERE job_id IS NULL) with the GUID, and unlock the table. Then re-select all records where job_id = the GUID you just generated. This gives the slave a batch of data to work on, and then the slave can do its thing from there.
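A sketch of that claim-a-batch sequence in SQL (the table and column names are hypothetical, and the GUID value would come from the slave, e.g. via PHP's uniqid()):

```sql
-- 1. Lock the table for writes so no other slave can stamp
--    the same rows while we are claiming them.
LOCK TABLES games WRITE;

-- 2. Stamp up to 100 unclaimed rows with this slave's GUID.
UPDATE games
SET job_id = 'slave-5a1f3c9e'
WHERE job_id IS NULL
LIMIT 100;

-- 3. Release the lock as soon as the claim is made.
UNLOCK TABLES;

-- 4. Re-select the claimed batch and start simulating.
SELECT id, home_team_id, away_team_id
FROM games
WHERE job_id = 'slave-5a1f3c9e';
```

The lock is held only for the duration of the UPDATE, so slaves spend almost no time blocking each other; the slow part (the simulation) happens after the lock is released.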

FWIW, a medium-grade scaling architecture in your situation typically includes multiple database servers that are set up with bi-directional replication (or a MySQL cluster), and different slaves pull jobs from the different DB servers.

I still don't think you need multiple servers and the full master/slave architecture yet, though. You can probably radically optimize the app and db.
OK you say, "code which is over 10 years old"
I would think that there is no way around it, you need to start development OVER again, and using better PHP code, better MySQL table architecture , and probably use the NEW web techs, like AJAX, CSS3, HTML5, PHP classes, mysqli, and the new Javascript implementations.
You simply CANNOT keep the cron job, as you have it, as the game's evaluation operation.
If you add the db table update (for the player-use evaluation) to the game play code, it may add a few microseconds to each of several thousand player interactions, instead of doing ALL game evaluations at one time in a cron job taking half an hour.
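A sketch of what folding the evaluation into game play might look like (hypothetical tables; MySQL/InnoDB transactions assumed):

```sql
-- Instead of a nightly batch, fold the evaluation into each
-- game's own write, inside one transaction.
START TRANSACTION;

INSERT INTO game_results (game_id, home_goals, away_goals)
VALUES (1234, 2, 1);

-- Update the running standings for both teams right away.
UPDATE standings SET played = played + 1, points = points + 3
WHERE team_id = 55;   -- winner
UPDATE standings SET played = played + 1
WHERE team_id = 88;   -- loser

COMMIT;
```

Each individual write gets slightly slower, but the standings are always current and the 40-minute batch window disappears.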
Ray Paseur commented:
... about scaling a php application so I don't make a expensive mistake right at the beginning just because I don't know any better?
I recommend that you contact Eli White at php[architect] and retain the firm to design your data base for you.  Eli was one of the original developers of Digg, and has written at least one book about PHP.  I know him well and he knows exactly what you need to do.  You could probably spend a few thousand dollars now and save yourself from the expensive mistake, or (with the understanding that time is money) you could take the time to learn all the stuff that Eli already knows, then do the design yourself.  I know which I would choose, and that's why I recommend php[architect] to you.

Best of luck with the project, ~Ray
Question has a verified solution.
