§14.4. Caching

Caching is pervasive in modern computing. It is necessary because of the trade-off between price and performance in storage technology.

For example, the AMD64 instruction set used on 64-bit AMD and Intel processors has just sixteen 64-bit general-purpose registers. These registers contain a total of 1024 bits (or 128 bytes) of ultra-high-speed memory that update at the CPU’s speed. Gigabytes of such ultra-high-speed memory would cost millions of dollars.

Instead, there is a storage hierarchy. We can buy terabytes of slow disk storage for hundreds of dollars, gigabytes of fast RAM for hundreds of dollars, and CPUs have kilobytes of high-speed on-board cache memory, again for hundreds of dollars.

A cache is a copy of data held in high-speed or low-latency memory, to avoid the retrieval or recomputation of the same data stored in low-speed or high-latency devices.

Caching is possible because most data is rarely accessed. A tiny proportion of data is regularly accessed. Furthermore, once data has been accessed, it is often accessed again within a short period. For example, consider an email server. Emails received five years ago are seldom read, emails received today will be read once or twice, and the ‘Subject’, ‘Sender’ and ‘Date’ for the most recent twenty or so emails will be needed every time the user refreshes their screen.

Because caching is pervasive, the systems you build are already benefiting from caches. As a developer, it helps to understand what these do, how to best take advantage of them and when to add additional caching.

Reflection: Caching and benchmarking

Data collected during benchmarking (especially when using performance.now()) often shows that the first few handled requests are slow but that the system then becomes faster. Why might this happen?

Tip
Caching is one explanation for why the system speeds up. However, what specific things are likely to be cached that would cause request handling to become faster?

What to cache

Caching extends beyond the HTTP caches already provided by your browser. You might decide to cache the following:

  • JavaScript objects in memory (or retrieved from a database) when serving requests

  • Complete HTTP responses when handling a request

  • Media such as images, CSS and HTML files stored on disk

  • The results of slow function calls [1]

How to cache

Recall that layering is a design technique that hides implementation details of lower levels from higher levels.

Layering can also guide caching. There is an opportunity to introduce caching whenever the output of a lower-level layer is the same across repeated requests.

At its simplest, a cache is just a mapping from inputs to saved outputs. The following example code demonstrates how a slow layer, slow(...), can be made faster by using a dictionary to cache results:

function slow(input) {
    // ... complex calculations involving `input` go here ...
    return result;
}


let cache = {};

function faster(input) {
    if (input in cache) {
        // Use the cached value
        return cache[input];
    } else {
        // Otherwise, call the lower layer
        let result = slow(input);
        // And save the result for future use
        cache[input] = result;
        return result;
    }
}
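To make the pattern concrete, here is a self-contained sketch in which the slow layer is a deliberately expensive recursive Fibonacci function. It uses a Map rather than a plain object, which avoids accidental clashes with inherited property names such as toString (the function names here are illustrative):

```javascript
// The slow lower layer: naive recursive Fibonacci, which redoes an
// exponential amount of work on every call.
function slowFib(n) {
    return n < 2 ? n : slowFib(n - 1) + slowFib(n - 2);
}

const fibCache = new Map();

function fastFib(n) {
    if (fibCache.has(n)) {
        return fibCache.get(n);   // Cache hit: no recomputation
    }
    const result = slowFib(n);    // Cache miss: call the lower layer
    fibCache.set(n, result);      // Save the result for future use
    return result;
}

console.log(fastFib(30)); // → 832040 (computed the slow way)
console.log(fastFib(30)); // → 832040 (an instant cache lookup)
```

The first call pays the full cost of the computation; every later call with the same input is a single Map lookup.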

Most of the time, development involves working with existing caches (which are also far more sophisticated than the example above), rather than implementing caches from scratch. Frequently encountered caches are listed below:

In the rendering engine (client-side JavaScript)

Data and other responses can be stored in ordinary JavaScript variables or more permanently in the browser’s window.localStorage and window.sessionStorage objects (Web Storage API).
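As a sketch, a computed (or fetched) value can be cached in Web Storage so that it survives page reloads. The storage object is passed in as a parameter here so the same function also works outside a browser; in a page you would pass window.localStorage. The key name and value shape are illustrative:

```javascript
// Return a cached JSON value from a Web-Storage-like object, computing
// and saving it on the first call. `storage` must provide getItem and
// setItem (e.g. window.localStorage in a browser).
function cachedGet(key, compute, storage) {
    const saved = storage.getItem(key);
    if (saved !== null) {
        return JSON.parse(saved);   // Cache hit: reuse the stored value
    }
    const value = compute();        // Cache miss: do the slow work
    storage.setItem(key, JSON.stringify(value));
    return value;
}
```

Note that Web Storage holds only strings, hence the JSON round-trip; values cached this way also persist across visits, so they eventually need invalidating (discussed below).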

By the browser engine

The web browser automatically performs caching of an HTTP response. The Cache-Control header in the HTTP response provides expiration and caching information to indicate an appropriate lifetime for the cached value. Other headers can control validation of the cache.
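On the server side, opting in to this browser caching is a matter of setting the header. A minimal sketch of a Node.js request handler that marks its responses as reusable for one hour (the one-hour lifetime is an arbitrary choice for illustration):

```javascript
// A request handler that tells browsers and intermediate caches they
// may reuse this response for up to 3600 seconds. `res` is any object
// with setHeader/end, such as Node's http.ServerResponse, so the
// handler can be passed directly to http.createServer(handler).
function handler(req, res) {
    res.setHeader('Cache-Control', 'public, max-age=3600');
    res.end('This response may be cached for up to an hour');
}
```

With this header in place, a browser that re-requests the same URL within the hour will typically serve the response from its own cache without contacting the server at all.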

In the network

Rather than clients making requests directly to your server, content distribution networks (CDNs) provide caching services worldwide. End-users connect to a nearby CDN server, which attempts to cache as many responses as possible and only forwards uncacheable requests to your server.

A related idea is to deploy servers around the world, rather than in a single data center. For example, you might deploy the same code to servers on each continent (Australia, Europe, Africa, Asia, North America and South America) to ensure every user experiences low latencies.

In the server

On a server that you manage, there is a range of options for caching:

  • Installing a caching reverse proxy on the server. The proxy handles incoming requests and uses a cache wherever possible, but forwards requests to your Express server when not possible (popular options include Nginx, Varnish and Squid)

  • Saving rendered pages, results or objects in the Node.js process, using JavaScript dictionaries

  • Saving rendered pages, results or objects in the Node.js process, using a specialized caching/memoization library (e.g., lru-cache, cacache, fast-memoize and memoizee)

  • Saving rendered pages, results or objects in an external in-memory caching database (e.g., Redis or Memcached)
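The in-process option can be sketched with a small LRU (least-recently-used) cache, the core idea behind libraries such as lru-cache. This version exploits the fact that a JavaScript Map iterates its keys in insertion order, so the first key is always the least recently used:

```javascript
// A minimal LRU cache: at most maxEntries values are kept, and when the
// cache is full the least-recently-used entry is evicted. Real caching
// libraries add TTLs, size accounting and many other features.
class SimpleLRU {
    constructor(maxEntries) {
        this.maxEntries = maxEntries;
        this.map = new Map();
    }
    get(key) {
        if (!this.map.has(key)) return undefined;
        const value = this.map.get(key);
        this.map.delete(key);        // Move the key to the
        this.map.set(key, value);    // most-recently-used position
        return value;
    }
    set(key, value) {
        this.map.delete(key);
        this.map.set(key, value);
        if (this.map.size > this.maxEntries) {
            // Evict the least-recently-used entry (first key in the Map)
            this.map.delete(this.map.keys().next().value);
        }
    }
}
```

Bounding the cache size matters in a long-running Node.js process: an unbounded dictionary of rendered pages is effectively a memory leak.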

In the database

Databases automatically cache records in memory. In a large database, repeated queries for the same record will typically return faster than the first query for that record.

In addition to the automatic caching included in a database, many databases (including PostgreSQL and MongoDB) include support for creating materialized views. A materialized view is a table that stores the results of a query: it is a cache that stores the precomputed results of a complex query.

Cache invalidation

Cache invalidation is the problem of deciding when and how to delete values stored in a cache.

There is a well-known saying in computer science, attributed to Phil Karlton:

There are only two hard things in Computer Science: cache invalidation and naming things.

Naming is difficult because a good name needs to be concise, clear, easily understood, yet also unique and timeless. It seems deceptively easy, yet causes a great deal of trouble when done poorly.

Cache invalidation is also very difficult.

Consider the home page of a university’s website. It is a good candidate for caching because it is heavily used and only updates every few days (with different news stories). It might be reasonable to use Cache-Control headers that invalidate the cache once per day. However, should a sudden emergency (fire, bomb, terrorism) require an emergency change to the home page, caching may mean that users don’t see the changes for hours.

Some strategies to improve the responsiveness of caching, without altogether eliminating caching, include the following:

Reducing TTL

Lower settings for the expiration or time-to-live (TTL) will ensure a more current cache. If a cached page expires in ten minutes, then updates will be seen by end-users within ten minutes (or five minutes, on average). The design trade-off in reducing TTLs is lower efficiency and performance because the cache can service fewer requests.
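The same TTL idea applies to caches inside your own process. A sketch of a cache whose entries expire after a configurable time-to-live (the clock function is injected purely so the behaviour can be tested without waiting):

```javascript
// A cache whose entries expire ttlMs milliseconds after being stored.
// A shorter TTL means fresher data but more cache misses.
function makeTtlCache(ttlMs, now = Date.now) {
    const entries = new Map();
    return {
        get(key) {
            const entry = entries.get(key);
            if (entry === undefined || now() > entry.expires) {
                entries.delete(key);   // Missing or expired: a miss
                return undefined;
            }
            return entry.value;
        },
        set(key, value) {
            entries.set(key, { value, expires: now() + ttlMs });
        },
    };
}
```

Choosing ttlMs is exactly the trade-off described above: a ten-minute TTL guarantees updates are visible within ten minutes, at the cost of recomputing each value at least every ten minutes.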

Create immutable resources

Rather than attempting to invalidate cached values, each version may have a separate name. For example, http://www.example.com/api/press_releases_today must be invalidated daily. In contrast, http://www.example.com/api/press_releases/2020_1_1 never needs to be updated, because the news from January 1 should never change (it is immutable).

Fragment resources and use different TTLs

The expiration for cached data can be set on a per-resource basis (as opposed to a single setting across the entire cache). Long-lived content can have a prolonged expiration. Short-lived content can have a short expiration. For files that contain both long-lived and short-lived content, it may make sense to break the file up into parts that can be cached separately.
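The last two strategies can be combined by choosing a Cache-Control header per resource. In this sketch, content-hashed (immutable) assets get a year-long lifetime, the frequently changing home page gets ten minutes, and everything else must be revalidated; the URL patterns are hypothetical:

```javascript
// Choose a Cache-Control header based on how long a resource stays valid.
function cacheControlFor(path) {
    // Versioned asset (name ends in a 40-hex-digit content hash):
    // the name changes whenever the content does, so it never goes stale.
    if (/\.[0-9a-f]{40}\.(js|css)$/.test(path)) {
        return 'public, max-age=31536000, immutable';
    }
    // The home page changes often: give it a short TTL.
    if (path === '/') {
        return 'public, max-age=600';
    }
    // Anything else: caches must revalidate before reusing it.
    return 'no-cache';
}
```

The `immutable` directive additionally tells browsers not to revalidate the resource even when the user reloads the page.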

In caches held on servers under your control, there is also the possibility of directly invalidating cached data. You could use custom logic to delete cache entries (so that they need to be fetched again) or preemptively override data in the cache.

Reflection: Revving

The JavaScript code in a single-page application is often bundled into a single large file (e.g., bundle.js).

This file is large, and so should be cached as much as possible. Occasionally, a bug or security issue demands immediate replacement of a file.

Bundlers such as Webpack can incorporate a hash of the bundle contents into the filename (e.g., bundle.2346ad27d7568ba9896f1b7da6b5991251debdf2.js instead of just bundle.js).

Then the HTML for the single-page application may be a very small (156-byte) file:

<!DOCTYPE html>
<title>Single Page Application</title>
<script src="bundle.2346ad27d7568ba9896f1b7da6b5991251debdf2.js"></script>
<noscript>Please enable JavaScript</noscript>

Can you explain how including the hash in the filename can help improve the performance?

Tip
Consider how this approach interacts with HTTP caches and cache invalidation strategies.
Tip
This technique is also known as revving.
Reflection: Materialization

Precomputation or materialization is an extreme form of caching, where every possible request is calculated in advance and stored in a cache. [2]

What are the advantages and disadvantages of this idea? In what situations or applications might this technique be useful?


1. This kind of caching is also known as memoization or tabling (note: be careful not to confuse the word ‘memoization’ with the very similar word ‘memorization’).
2. Static site generators are a simple example of this idea.