§14.5. Replication

Replication is a scalability strategy that spreads the workload of large numbers of incoming requests over identical servers.

Replication is similar to the strategy supermarkets use at the checkout. If one cashier can scan and charge 20 customers per hour, then ten cashiers should be able to process 200 customers per hour.

If all the data in a system is read-only, then replication is theoretically trivial. Identical instances of the system running on separate servers can handle any incoming request.

Of course, the reality is not that simple. The distribution of incoming requests to servers can be difficult, and most systems are not read-only (users will change the data).

Load balancing

Given an application replicated over several servers, load balancing is the problem of allocating incoming requests across the servers.

Load balancing is analogous to the problem of deciding which queue to join at a supermarket. Some cashiers are faster than others. Some customers have larger trolleys than others. In some supermarkets, a cashier may even leave while you are still waiting in the queue. A price check may hold up your queue for a long time. When I’m at the supermarket, I like to be in the optimal queue. Still, I probably spend more time deciding which queue to join than I spend waiting in the queue.

In a web application, load balancing presents similar challenges:

  • Not all requests take the same time to process

  • Servers may have different configurations better suited to different requests (e.g., a server with faster disks would be a better choice for requests that involve lots of disk access)

  • Some servers will have already cached the response to similar requests

  • Some servers may malfunction (and they may malfunction in ways that are difficult to detect)

Strategies for load balancing range from simple to complex (a sketch of the simpler strategies follows the list):

  • Round-robin: sending the requests to servers in order (e.g., with three servers, sending the first request to server 1, the second request to server 2, the third request to server 3, and then for the fourth request going back to server 1 again)

  • Random: randomly sending requests to servers (on average, this should result in an even load across all the servers, but there is no guarantee)

  • Hash-based: allocating servers based on a hash of the incoming request, user or user IP (this ensures that related requests go to the same server, improving the likelihood of caching being successful)

  • Adaptive: sending requests to servers based on how much capacity the server reports that it has available or how fast it handled previous requests
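
To make the differences concrete, here is a minimal sketch of the first three strategies as selection functions over a list of server addresses. The server names are hypothetical, and a real deployment would normally delegate this work to a dedicated load balancer rather than application code.

```python
import hashlib
import itertools
import random

SERVERS = ["app1.example.com", "app2.example.com", "app3.example.com"]  # hypothetical hosts

# Round-robin: cycle through the servers in a fixed order.
_cycle = itertools.cycle(SERVERS)

def round_robin() -> str:
    return next(_cycle)

# Random: even load on average, but no guarantee over any short run of requests.
def random_choice() -> str:
    return random.choice(SERVERS)

# Hash-based: requests for the same user always reach the same server,
# improving the likelihood that the server has cached related data.
def hash_based(user_id: str) -> str:
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]
```

An adaptive strategy would additionally track each server’s reported capacity or recent response times and weight the choice accordingly.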

Specialized components typically handle load balancing in a web application (e.g., Nginx Load Balancing, AWS Elastic Load Balancing, Google Cloud Load Balancing and Azure Load Balancer). However, you can embed simple load balancing strategies in the client-side logic or DNS (for example, by clients randomly choosing from a list of servers).

Reflection: Benefits and weaknesses

What are likely advantages and disadvantages of each of the load balancing strategies (round-robin, random, hash-based, adaptive)?

Replication with writes

The main challenge of allowing writes across replicas is the difficulty of ensuring consistency and serializability.

When a system must support updates, there are contradictory design demands:

  • After a data update, every subsequent request should use the new value. Every replica needs to be kept up-to-date.

  • If one server has crashed or runs slow, it should not cause every update to fail or run slowly. Updates should not depend on every replica being online.

  • If something fails halfway through an update, then the system should be in a predictable and consistent state.

In other words, there is a need to strike a balance between requiring the replicas to act together for consistency and allowing them to work autonomously for higher performance.

Unfortunately, the CAP theorem [1] states that it is impossible to resolve this contradiction. We are forced to use faulty networks and servers, and this means that we must either accept some data inconsistency or some performance degradation (or a bit of both).

Modern database management systems have replication functionality built-in. The way to resolve the design demands is to carefully consider your application’s data requirements and select a database management system that supports those requirements.

In other words, what does your application really need? Consider the same questions that I introduced in Chapter 13:

  • Does it matter if your users see stale data?

  • Does it matter if your users see different versions of the same data?

  • Does it matter if you lose (some) user data?

  • Does it matter if your users experience delays?

  • Can you recover in other ways (e.g., “offline”) if something goes wrong?

  • What kind of operations will users be performing?

With these questions in mind, databases can be chosen based on their performance characteristics and replication strategies.

Strategies for replication in distributed systems with writes include the following:

Primary/secondary

The classic approach to replication is to designate one database server as a read/write primary, and all other servers as read-only secondary replicas. The primary must copy any change to each of the replicas.

This strategy is well-suited to the common situation where a database must perform many more queries than updates. Read-only replicas handle the vast majority of the workload, and the primary handles only the small number of writes.

In the event of an error in the primary, one of the secondary replicas can be ‘promoted’ to be a new primary (either manually by the system administrator or by automatic detection and fast fail-over).

The specific implementation of the replication strategy can dictate the performance characteristics. The primary can ensure high levels of consistency by waiting for all or most replicas to confirm receipt of updated data before committing a transaction. On the other hand, faster response times for writes can be achieved by sending the updates to secondary replicas asynchronously, at the cost of queries to those replicas potentially returning slightly out-of-date data.
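
The sketch below illustrates how application code might route queries under this strategy. The connection objects are hypothetical placeholders for whatever client your database driver provides; the point is simply that writes always go to the single primary, while reads are spread over the read-only replicas.

```python
import random

class PrimarySecondaryRouter:
    """Route writes to the primary and reads to a randomly chosen replica."""

    def __init__(self, primary, replicas):
        self.primary = primary      # read/write primary connection
        self.replicas = replicas    # list of read-only replica connections

    def execute_write(self, sql, params=()):
        # All updates go to the single read/write primary.
        return self.primary.execute(sql, params)

    def execute_read(self, sql, params=()):
        # Reads are spread across the read-only replicas. With asynchronous
        # replication, the result may be slightly out of date.
        replica = random.choice(self.replicas)
        return replica.execute(sql, params)
```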

Multi-primary replication

In multi-primary replication, many database servers are capable of performing writes. The databases may use a range of strategies:

  • Replicas act in primary or secondary roles based on the table or data (i.e., one server might be the primary for user data and another might be the primary for orders; see the sketch after this list)

  • Each replica allows writes, acting as a primary for that particular update, and synchronizing those updates with all the other replicas

  • Client code directly updates all (or a supermajority) of replicas
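
As an illustration of the first of these strategies, the sketch below routes each write to a per-table primary. The table-to-primary mapping and the connection objects are hypothetical; in practice the database cluster itself would manage this assignment.

```python
class TablePartitionedRouter:
    """Each table has its own primary; writes for that table go only to it."""

    def __init__(self, primaries_by_table, fallback):
        # e.g. {"users": connection_a, "orders": connection_b}
        self.primaries_by_table = primaries_by_table
        self.fallback = fallback

    def execute_write(self, table, sql, params=()):
        primary = self.primaries_by_table.get(table, self.fallback)
        return primary.execute(sql, params)
```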

Eventual consistency

Where consistency or serializability guarantees are not important, the distributed data types described in Chapter 13 (e.g., the grow-only set) are an alternative to serializable write replication. Provided replicas regularly synchronize data, the database will eventually become consistent across the entire system.
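
As a reminder of how such a data type behaves, here is a minimal sketch of a grow-only set: every replica only ever adds elements, and merging two replicas is a simple set union, so all replicas converge once they have exchanged their updates. The class is illustrative rather than a specific library’s API.

```python
class GrowOnlySet:
    """A grow-only set (G-Set): elements can be added but never removed."""

    def __init__(self, elements=()):
        self._elements = set(elements)

    def add(self, element):
        self._elements.add(element)

    def merge(self, other):
        # Merging is a set union, so the order in which replicas synchronize
        # does not matter: every replica eventually holds the same set.
        return GrowOnlySet(self._elements | other._elements)

    def __contains__(self, element):
        return element in self._elements


# Two replicas accept writes independently and later converge.
replica_1 = GrowOnlySet({"alice"})
replica_2 = GrowOnlySet({"bob"})
merged = replica_1.merge(replica_2)
assert "alice" in merged and "bob" in merged
```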

Quorums

A key insight behind failure-tolerant replication is that reading from multiple replicas can reveal the partial failure of an earlier write.

For example, in the diagram below, there are five replicas. If a network failure prevents Client A from updating Replica 4, then Client B will receive out-of-date data if it reads from Replica 4.

[Figure: Client A’s update reaches every replica except Replica 4, so Client B reads stale data from Replica 4]

On the other hand, if Client B ensures that it always checks at least two replicas, it can be sure that it will always see the latest version of the data, even if one server has failed. [2]

[Figure: Client B reads from at least two replicas and so still sees the latest version despite the failed update to Replica 4]

The number of replicas that must be read or written is called a quorum. Different values for the quorum result in different trade-offs between resilience against errors and performance.
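
The sketch below shows a quorum read over versioned values, assuming each replica exposes a hypothetical get(key) method returning a (version, value) pair. The client queries at least a quorum of replicas and keeps the value with the highest version, so a single stale or failed replica cannot hide the latest write (provided the write quorum W and read quorum R are chosen so that W + R exceeds the number of replicas N).

```python
def quorum_read(replicas, key, read_quorum):
    """Read from at least `read_quorum` replicas and return the newest value."""
    responses = []
    for replica in replicas:
        try:
            responses.append(replica.get(key))   # hypothetical (version, value) API
        except Exception:
            continue                             # tolerate failed or unreachable replicas
        if len(responses) >= read_quorum:
            break
    if len(responses) < read_quorum:
        raise RuntimeError("could not reach a read quorum")
    # The highest version wins; with W + R > N the read set is guaranteed
    # to overlap the set of replicas that acknowledged the latest write.
    version, value = max(responses, key=lambda pair: pair[0])
    return value
```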

A database management system can use quorums to manage updates to individual database records. A database management system can also use quorums to automatically nominate a primary replica among a collection of possible candidates without having any single-point-of-failure.

It is impossible to reliably detect whether a server has failed because there is no way of telling the difference between a faulty server and a fault in the server-fault-detection logic. For this reason, if replicas ‘detect’ a faulty primary replica, a democratic process is required to ensure that only one replica is safely elected to be the new primary. If each replica votes on a new primary and the vote results in a majority quorum (more than 50% of the replicas), it is impossible for the system to elect two separate primaries.
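
For example, with five replicas a majority quorum is three votes. Two competing candidates would each need three votes, or six in total, which is more than the five replicas available, so at most one candidate can gather a majority.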


1. The acronym CAP stands for Consistency, Availability and Partition tolerance.
2. To be resilient against multiple failures, the client must read more replicas.