

How to Build a Scalable Website

Serhiy Kozlov
Founder & CEO of Romexsoft, a tech entrepreneur having vast experience in product development and software outsourcing business.
This guide will walk you through the essential considerations and tech stack for building scalable websites. Know how to grow your business the smart way!
In 2004, Facebook had about one million users. As of the end of Q2, 2016, that number had increased to 1.7 billion. And when users access the site, they expect to be directed to their newsfeed immediately and for the read and write functions to perform quickly. And they do. That’s scalability in action.
Yahoo and Google, among others, are sites that have been scaled so well that millions of online users can access them at once without the load becoming a problem.
Most enterprises that need to have scalable websites will never be the size of Facebook or Google (and there’s no obvious need for that). However, the potential for a continuing increase of users and load must be planned in advance so that, as it occurs, the correct scaling architecture is in place.

Principles of a Scalable Website – Critical Tips for Development


Before any design architecture is created, the following planning considerations are critical.


Availability

For a company to maintain its reputation, website uptime is vital. Consider a large online retailer, for example. If the site is unavailable for even a short period of time, millions in revenue can be lost. The same goes for SaaS, online publishers and pretty much any enterprise-sized web application.

Constant availability requires a designer to think about redundancy for the key components, so that the system can recover quickly from failures and interruptions.


Performance

A scaled website with poor performance leaves users dissatisfied and can hurt SEO rankings as well. A rapid response along with fast retrieval (low latency) is a must.

Reliability of Retrieval

When a user requests data, the same data should show up, unless it has been updated of course. Users need to trust that when information is stored in the system, it will be there if they access it again.


Manageability

The system has to be easy to operate, maintain, and update. Problems should be easy to diagnose.


Cost

Cost is not just hardware and software. There is also the cost of development, of operating the system, and of any training that may be required. Total cost is what it takes to own and operate the system.

It’s important to note that all of these principles must be considered, but there may be trade-offs. If, for example, you decide to resolve capacity issues by adding more servers, costs will increase and manageability may become more difficult.

Architectural Basics

While investing in full scaling from day one is probably not smart, whatever architecture is developed must leave room for high scalability. This will save resources and time later on. There are core factors that will make scaling easier as it becomes necessary.

  1. Services
There are two services that are supplied to users of a website: writing and reading. For example, a large site that hosts photos will serve users who upload photos (write) and users who search for and retrieve a photo or series of photos (read). Most businesses want the read (retrieval) to be faster than the write. In designing such a site, decisions need to be made.

  • Scalable architecture for storage should be planned for, because ideally, there should be no limit.

  • There must be quick retrieval of images, and the more images queried at a single time, the more scalability there must be.

  • If a user wants to upload a photo, it must be there permanently – storage scalability again.
A small site hosted on a single server cannot meet these requirements for long. Scalable web architecture must be built in so that as more users upload and retrieve data of any kind, there is the capacity to store it and to allow fast retrieval.
Dividing the Functions: Service-Oriented Architecture (SOA) is the solution. Each service (write and read) can have its own functional context. And anything beyond that context will occur through the API of another service.
De-coupling the write and read functions will allow the following:

  • Problems can be isolated and addressed with greater ease

  • Each function can be scaled independently – again easier to do, and bugs with one function will not impact the other.

  • The two functions will not be competing with one another as use grows. When they are connected, a large number of writes, for example, will pull resources from read, and retrieval will be slowed.

  • Decoupling also avoids a typical issue in which a web server (e.g. Apache) has a maximum number of simultaneous connections that writes and reads must share.
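
As a rough illustration of the split described above, here is a minimal Python sketch in which uploads and retrievals live in two separate services that only share a storage backend. The class names (`PhotoStore`, `WriteService`, `ReadService`) and the in-memory dictionary are assumptions made for this example; in a real deployment each service would run as its own process behind its own API and could be scaled on its own.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PhotoStore:
    """Hypothetical stand-in for shared storage (a database or object store)."""
    photos: Dict[str, bytes] = field(default_factory=dict)


class WriteService:
    """Owns uploads only; it never serves reads."""

    def __init__(self, store: PhotoStore) -> None:
        self._store = store

    def upload(self, photo_id: str, data: bytes) -> None:
        self._store.photos[photo_id] = data


class ReadService:
    """Owns retrieval only; it never accepts uploads."""

    def __init__(self, store: PhotoStore) -> None:
        self._store = store

    def fetch(self, photo_id: str) -> Optional[bytes]:
        return self._store.photos.get(photo_id)


# Each service is created (and could be scaled) independently.
store = PhotoStore()
writer = WriteService(store)
reader = ReadService(store)
writer.upload("cat.jpg", b"fake-image-bytes")
```

Because neither service calls into the other, a bug or a traffic spike on the write path does not have to touch the read path.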
Adding Shards: As use continues to grow, it is also possible to add shards to prevent bottlenecks. Users are thus distributed across shards based upon some predetermined identifying factor. While this reduces the number of users impacted by disruptions (each shard functions separately), there is then the additional need for a search service for each shard so that metadata is collated.
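
To make the idea of a "predetermined identifying factor" concrete, the following Python sketch assigns users to shards by hashing their user ID. The shard count, the function names, and the in-memory dictionaries standing in for shard databases are all assumptions for illustration.

```python
import hashlib

NUM_SHARDS = 4  # assumed number of shards for this sketch

# Each dict stands in for a separate shard (its own database or server).
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard number in a stable, evenly spread way."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def save_profile(user_id: str, profile: dict) -> None:
    shards[shard_for(user_id)][user_id] = profile


def load_profile(user_id: str):
    return shards[shard_for(user_id)].get(user_id)


save_profile("alice", {"name": "Alice"})
```

A failure of one shard only affects the users hashed to it. Note that a plain modulo scheme like this moves most users when the shard count changes, which is one reason the shard key deserves planning up front.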
There is no universally right solution, because each situation is unique. It will be necessary to determine future needs (e.g., concurrency levels, heavier reads or writes, or both, sorts, ranges, etc.) as well as to have a plan in place for when a failure occurs.
Scalable web application architecture must be custom architecture based upon individual circumstances.
  2. Redundancy
Redundancy is not optional. Any web architecture must have it so that the loss of anything stored on one server is not "fatal." Particularly from a services standpoint, if a core functionality piece fails, there is another copy simultaneously running. Here are the key elements of redundancy:

  • Failover: When a service degrades or fails, failover to a second copy can occur automatically.

  • Shared-nothing architecture: When each node can function independently, new nodes can be added easily without any central coordination, so scaling is much easier.

  • No single failure point: Failure of one node does not mean failure of the others.
Going back to the example of the photo hosting website: any photos that are stored would have redundant copies on hardware somewhere else, and the services for accessing the photos would be redundant as well. The same obviously applies to a CDN.
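
The failover behaviour described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: `StorageNode` and `RedundantStore` are hypothetical names, nodes are in-memory objects, and health is a simple flag rather than a real health check.

```python
class NodeDown(Exception):
    """Raised when a node cannot serve requests."""


class StorageNode:
    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.data = {}

    def put(self, key, value):
        if not self.healthy:
            raise NodeDown(self.name)
        self.data[key] = value

    def get(self, key):
        if not self.healthy:
            raise NodeDown(self.name)
        return self.data[key]


class RedundantStore:
    """Writes every item to all reachable nodes; reads fail over in order."""

    def __init__(self, nodes):
        self.nodes = nodes

    def put(self, key, value):
        written = 0
        for node in self.nodes:
            try:
                node.put(key, value)
                written += 1
            except NodeDown:
                continue
        if written == 0:
            raise NodeDown("all nodes unavailable")

    def get(self, key):
        for node in self.nodes:
            try:
                return node.get(key)
            except (NodeDown, KeyError):
                continue  # fail over to the next copy
        raise KeyError(key)


photo_store = RedundantStore([StorageNode("primary"), StorageNode("replica")])
photo_store.put("photo-1", b"image-bytes")
photo_store.nodes[0].healthy = False  # simulate losing the primary
```

With the primary marked unhealthy, `photo_store.get("photo-1")` is answered by the replica, so no single node is a point of failure for reads.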
  3. Partitions
Adding capacity must be planned for in advance in building scalable websites. And there are two choices here – vertical or horizontal scaling.
Vertical Scaling: In this instance, you will add more resources to a single server. So, if you have a large dataset, this could mean adding more or larger hard drives. It could also mean moving the compute operation to a larger server with more memory or a faster CPU. The point is this: you are taking a single resource and increasing its handling capability.
Horizontal Scaling: Here, you will be adding more nodes. So, you might add another server to store some parts of a large dataset, or in the case of a computing resource, you would split the load over additional nodes.
Note: If horizontal is your choice, then you will need to include it in your initial design with a distributed system architecture. It will really be a chore to modify the system for this kind of scaling after the fact.
The most common form of horizontal scaling is breaking up services into partitions (shards). These can be distributed by functionality or by some criterion (e.g., American customers vs. European customers). The benefit, of course, is that partitions provide stores of added capacity.
There are some challenges with partitioning, because you are distributing data or functions among multiple servers. Not the least of these is data locality: if data is not local when needed, servers will have to "go fetch" it.
A second challenge is inconsistency. If different services are writing to and reading from a shared source, a request for a piece of data can arrive at the same time that data is being updated by someone else.
All in all, however, partitioning data allows manageable chunks and much easier scaling. Studying up on how to mitigate the risks involved will give you some potential solutions.

Scalable and Fast Data Access – Strategies and Methods

Simple web applications involve the Internet, an app server, and a database server. Developing scalable web applications allows them to grow, and there are two challenges of access: to the app server and to the database.

In building scalable web applications, the app server usually reflects shared-nothing architecture. This allows it to be horizontally scalable. The hard work is thus moved down the stack to both the database server and to any supporting services.

While there are certainly challenges, there are some common methods to make a scalable database and other services that provide scalability of storage and quick access to data.

You have a huge amount of data, and you want to allow users to access small pieces of it. So, if user A is looking for some piece of data based upon a keyword, for example, that request goes through an API node to your huge database. Now, reading from disk, when there is a large amount of data, is really slow. Memory access, on the other hand, is orders of magnitude faster, for both sequential and random reads. Even with built-in IDs, locating a tiny piece of data can be a tough task. There are many solutions, but the key ones will be caches, indexes, proxies, and load balancing.


Caches

The principle is simple: data that has been requested recently is more likely to be requested again.

Caches are therefore used in almost all layers of an architecture, and they allow faster retrieval than going back to the original source in the database, particularly as that database continues to be scaled. There are a couple of places to insert a cache.

  1. On the request node

  • You can insert a cache on the request node. If the data is stored there, the user retrieval is almost immediate. If it’s not there, then the node will query the disk.

  • As you scale and add more nodes, each node can hold its own cache.
The only problem with this architecture is that if your load balancer randomly sends requests to different nodes, there is a much greater potential for misses.
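
A cache on the request node can be sketched as a small least-recently-used (LRU) cache sitting in front of the slow store, which follows directly from the "recently requested data is likely to be requested again" principle. In this Python sketch the class names, the tiny capacity, and the dictionary standing in for the database are assumptions for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps the most recently used items; evicts the oldest when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop least recently used


class RequestNode:
    """A request node with its own local cache in front of the database."""

    def __init__(self, database, cache_size=2):
        self.database = database
        self.cache = LRUCache(cache_size)
        self.disk_reads = 0  # counts how often we fall through to the store

    def fetch(self, key):
        cached = self.cache.get(key)
        if cached is not None:  # note: None values are treated as misses
            return cached
        self.disk_reads += 1
        value = self.database[key]
        self.cache.put(key, value)
        return value
```

Repeated requests for the same key on the same node are served from memory. If a load balancer spreads those requests across many nodes, each node has to warm its own cache, which is the miss problem noted above.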
  2. Global Cache
With a global cache, all nodes access the same cache. While this requires adding a server, the chance of misses is far lower. The downside is that the single cache can get overloaded as the number of users increases. Again, a decision must really be made based upon individual circumstances. There are two architectural designs for a global cache. In one, the cache itself queries the database if the requested data is not held; in the other, each node moves on from the cache to the database.
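
The two global cache designs differ only in who handles a miss. The Python sketch below shows both side by side; the class and function names are invented for this example, and the `Database` class simply counts reads so the effect is visible.

```python
class Database:
    """Stand-in for the slow backing store; counts how often it is read."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.rows[key]


class ReadThroughCache:
    """Design 1: on a miss, the cache itself queries the database."""

    def __init__(self, database):
        self._db = database
        self._items = {}

    def get(self, key):
        if key not in self._items:
            self._items[key] = self._db.read(key)
        return self._items[key]


class PlainCache:
    """Design 2: the cache only stores; the request node handles misses."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return self._items.get(key)

    def put(self, key, value):
        self._items[key] = value


def node_fetch(key, cache, database):
    """What a request node does under design 2."""
    value = cache.get(key)
    if value is None:
        value = database.read(key)
        cache.put(key, value)
    return value
```

In the first design the nodes stay simple and the cache controls how misses are filled. In the second, each node carries the fallback logic, but the cache server itself never talks to the database.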
  3. Distributed Cache
This architecture provides for the distribution of pieces of data throughout all of the nodes. The nodes then check with one another before fetching from the database. This can be a good structure for scaling, because as new nodes are added they, too, will be caching data. The more data that is cached closer to the user, the faster it is retrieved.
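
One common way to decide which node holds which piece of cached data is consistent hashing, sketched below in Python. This technique is named here as an assumption (the text above does not prescribe one), and the class names, the number of ring points per node, and the dictionary used as a database are all illustrative.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent hash ring: each node owns several points on the ring."""

    def __init__(self, replicas=50):
        self._replicas = replicas
        self._points = []  # sorted ring positions
        self._owners = {}  # ring position -> node name

    def add_node(self, name):
        for i in range(self._replicas):
            point = _hash(f"{name}#{i}")
            bisect.insort(self._points, point)
            self._owners[point] = name

    def node_for(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First ring point clockwise from the key's position.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owners[self._points[idx]]


class DistributedCache:
    """Each key is cached on exactly one node, chosen by the ring."""

    def __init__(self, node_names, database):
        self.ring = HashRing()
        self.caches = {}
        self.database = database
        self.db_reads = 0
        for name in node_names:
            self.add_node(name)

    def add_node(self, name):
        self.ring.add_node(name)
        self.caches[name] = {}

    def get(self, key):
        owner = self.caches[self.ring.node_for(key)]
        if key not in owner:
            self.db_reads += 1
            owner[key] = self.database[key]
        return owner[key]
```

When a node is added, only the keys that now map to the new node change owner; everything else stays cached where it was, which is what makes this layout friendly to scaling out.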
There are a number of open source caches, but most have limitations. Language-specific options, such as those written for JavaScript, tend to be a better fit.


Proxies

Proxies can be a great help in scaling, by coordinating multiple server requests when they are the same or quite similar. A proxy can collapse all identical requests into one and forward only that single request to the database, so the data is read from disk only once. While latency for an individual requester may increase a bit, this is offset by better behavior during high-load incidents.

Proxies and caches can also be used together, so long as the cache is placed before the proxy, because it is working from memory and can take the load from the proxy as user volume grows.
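
Request collapsing can be sketched as a proxy that gathers pending requests, reads each distinct key from the backend once, and fans the result out to everyone who asked. The Python below is a simplified, single-threaded model; `CollapsingProxy` and its `submit`/`flush` methods are names invented for this example, and a real proxy would do this over concurrent network connections.

```python
from collections import defaultdict


class CollapsingProxy:
    """Collects requests and issues one backend read per distinct key."""

    def __init__(self, backend_read):
        self._backend_read = backend_read
        self._pending = defaultdict(list)  # key -> requesters waiting on it

    def submit(self, requester, key):
        """Register a request; it waits until the next flush."""
        self._pending[key].append(requester)

    def flush(self):
        """Read each distinct key once and answer every waiting requester."""
        responses = {}
        for key, requesters in self._pending.items():
            value = self._backend_read(key)  # single read for this key
            for requester in requesters:
                responses[requester] = value
        self._pending.clear()
        return responses
```

The short wait before a flush is the small latency cost mentioned above; in exchange, a burst of identical requests turns into a single disk read.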


Indexes

Adding indexes to the original website architecture will give the benefit of faster reads as data and servers increase. When there are data sets many terabytes in size, but a requester wants just a tiny piece, finding that tiny piece is tough, especially when that data is spread out among physical devices. Indexes will solve some of this problem.

Indexes basically set up a data table based upon where that data is housed. And as more data and devices are added, that data table can be enlarged too. As a request comes in, the index directs the query to the right data table, where it can then be broken down even further for a specific piece of that data. This is far faster than searching the whole of the data. Writes may be slowed, because the index has to be updated along with the data, but in the end, the data is retrieved and returned to the requester faster. Indexes are a great tool for scaling.

This is an evolving architecture, as ways are sought to compress indexes, which can become quite cumbersome as the data becomes larger.
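
At its simplest, an index is a lookup table from a key to the place where the record lives. The Python sketch below builds such a table over records spread across two made-up "devices" and compares an indexed lookup with a full scan; the device names and sample records are purely illustrative.

```python
# Records spread across two hypothetical storage devices.
devices = {
    "disk-a": [("apple", "red fruit"), ("banana", "yellow fruit")],
    "disk-b": [("cherry", "small red fruit"), ("date", "sweet brown fruit")],
}


def build_index(devices):
    """Map each key to (device, position) so reads can jump straight there."""
    index = {}
    for device, records in devices.items():
        for position, (key, _value) in enumerate(records):
            index[key] = (device, position)
    return index


def lookup(index, devices, key):
    """Indexed read: one table lookup, then one direct access."""
    device, position = index[key]
    return devices[device][position][1]


def scan(devices, key):
    """Baseline without an index: check every record on every device."""
    for records in devices.values():
        for record_key, value in records:
            if record_key == key:
                return value
    raise KeyError(key)


index = build_index(devices)
```

Every new record means one more entry to write into `index`, which is the extra write cost, and the table itself grows with the data, which is why index size becomes a concern at scale.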

Load balancers

Load balancers are critical pieces of architecture for scalable website development. The concept is to distribute the load as the numbers of simultaneous connections increase and to route connections to request nodes. Thus a site can increase services just by adding nodes, and the load balancer will respond according to the criteria that have been set up.

Nginx is a pretty good choice for Node.js process load balancing. In addition to being pretty easy to configure, developers can assign different weights, and it will be found to be very useful for horizontal scaling.
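
To show what assigning weights means in practice, here is a small Python sketch of weighted round-robin selection. It is an illustration of the general idea rather than a description of how any particular load balancer is implemented, and the node names and weights are made up.

```python
class WeightedBalancer:
    """Smooth weighted round-robin: heavier nodes are picked more often,
    but picks are interleaved rather than sent in long runs."""

    def __init__(self, weights):
        self._weights = dict(weights)  # node name -> weight
        self._current = {node: 0 for node in self._weights}
        self._total = sum(self._weights.values())

    def next_node(self):
        # Every node earns credit equal to its weight...
        for node, weight in self._weights.items():
            self._current[node] += weight
        # ...the node with the most credit is chosen and pays back the total.
        chosen = max(self._current, key=self._current.get)
        self._current[chosen] -= self._total
        return chosen


balancer = WeightedBalancer({"node-a": 5, "node-b": 1, "node-c": 1})
picks = [balancer.next_node() for _ in range(7)]
```

Over any seven consecutive picks, `node-a` receives five requests and the other two nodes one each, so a larger server can be given a proportionally larger share of traffic.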

Load balancers are usually placed up front so that requests are routed in the best manner. And if a distribution system is complex, there can be more than one load balancer put into place.

One challenge with load balancers is that the same requester may be routed differently during ensuing visits. This is generally a problem seen by e-commerce sites that want the shopping cart to remain through all visits. Thus, there is the ability to build in stickiness so that the same requester is always routed the same, but then node failures can be a problem. Browser caches and cookies can offset this somewhat, but it is something to think about when building a scalable site.
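
Stickiness and its failure case can both be seen in a short Python sketch. The function below routes a session to the same node every time by hashing the session ID, and moves on to the next healthy node if the preferred one is down. The function name, node names, and session ID are assumptions for illustration.

```python
import hashlib


def pick_node(session_id, nodes, healthy):
    """Return the same node for a given session while it stays healthy;
    otherwise fall forward to the next healthy node in the list."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    for offset in range(len(nodes)):
        candidate = nodes[(start + offset) % len(nodes)]
        if candidate in healthy:
            return candidate
    raise RuntimeError("no healthy nodes available")


nodes = ["node-1", "node-2", "node-3"]
preferred = pick_node("cart-42", nodes, healthy=set(nodes))
```

The fallback keeps the visitor served, but anything held only in the failed node's memory, such as a shopping cart, is gone unless it was also stored somewhere shared.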


Queues

When building new sites, as for a startup, write management is pretty easy. Systems are quite simple, and writes are fast. As a site grows, however, writes can take longer and longer due to factors already discussed. To plan for this, scalable web application developers will need the architecture in place to build in asynchrony, and queues are a solid solution. This allows a client to make a request, receive acknowledgment of that request, and then move on to other work, periodically checking back. Under synchronous systems, the client simply waits, doing no other work. This is an issue during heavy loads.

The other benefit of queues is that they can be built to retry a request if it has failed for any reason. This simply makes for better quality of service. Queue.js is a pretty simple and yet efficient queue implementation for JavaScript, especially for larger queues.
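
The acknowledge-then-check-back pattern, together with retries, can be sketched in Python as follows. This is a single-process model for illustration only: `TaskQueue`, its method names, and the status strings are invented here, and a production system would use a real message broker with separate worker processes.

```python
import uuid
from collections import deque


class TaskQueue:
    """Accepts work immediately, processes it later, retries on failure."""

    def __init__(self, max_attempts=3):
        self._pending = deque()  # items are (task_id, payload, attempt)
        self._status = {}
        self._max_attempts = max_attempts

    def submit(self, payload):
        """Enqueue the work and return an ID right away (the acknowledgment)."""
        task_id = str(uuid.uuid4())
        self._pending.append((task_id, payload, 1))
        self._status[task_id] = "queued"
        return task_id

    def status(self, task_id):
        """What the client calls when it checks back."""
        return self._status[task_id]

    def process_one(self, handler):
        """Worker side: run one task; requeue it if the handler fails."""
        if not self._pending:
            return False
        task_id, payload, attempt = self._pending.popleft()
        try:
            handler(payload)
            self._status[task_id] = "done"
        except Exception:
            if attempt < self._max_attempts:
                self._pending.append((task_id, payload, attempt + 1))
                self._status[task_id] = "retrying"
            else:
                self._status[task_id] = "failed"
        return True
```

The client holds only the task ID and is free to do other work, while a slow or briefly failing write is absorbed by the queue instead of blocking the request.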
Originally my article was published on Romexsoft blog.