§14.3.

Benchmarking

The best place to start with improving performance is to collect data. Knowing a system’s current performance makes it possible to tell whether a new change is an improvement.

Data can guide efforts. If performance is satisfactory, there is no need to improve the scalability of a system. If performance is unsatisfactory, then data can guide efforts towards the most critical parts of the design, where changes will have the most significant impact.

There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, [and] will be wise to look carefully at the critical code; but only after that code has been identified.
— Donald Knuth (1974)
‘Structured programming with go to statements’, Computing Surveys, vol. 6, iss. 4.

Gathering data

The fastest way to begin benchmarking is to use the developer console in your preferred web browser (press Control+Shift+I or Command+Option+I). The networking tab will show precise timing information about each request. The performance tab will also identify slow client-side code. In Google Chrome, the Lighthouse tab provides in-depth performance, accessibility and search engine optimization (SEO) reports.

Going beyond the developer console, benchmarking tools such as ab (Apache Bench), loadtest, artillery.io, Siege and httperf can produce performance reports about your server and API.

The following example shows statistics for 4000 requests to /api/slowrequest (using four simultaneous connections):

$ npm install loadtest
...
$ npx loadtest -c 4 -n 4000 http://localhost:3000/api/slowrequest
INFO Requests: 0 (0%), requests per second: 0, mean latency: 0 ms
INFO Requests: 696 (17%), requests per second: 139, mean latency: 28.6 ms
INFO Requests: 1399 (35%), requests per second: 141, mean latency: 28.4 ms
INFO Requests: 2118 (53%), requests per second: 144, mean latency: 27.8 ms
INFO Requests: 2841 (71%), requests per second: 145, mean latency: 27.6 ms
INFO Requests: 3554 (89%), requests per second: 143, mean latency: 28 ms
INFO
INFO Target URL:          http://localhost:3000/api/slowrequest
INFO Max requests:        4000
INFO Concurrency level:   4
INFO Agent:               none
INFO
INFO Completed requests:  4000
INFO Total errors:        0
INFO Total time:          28.284452056 s
INFO Requests per second: 141
INFO Mean latency:        28.2 ms
INFO
INFO Percentage of the requests served within a certain time
INFO   50%      28 ms
INFO   90%      30 ms
INFO   95%      31 ms
INFO   99%      33 ms
INFO  100%      69 ms (longest request)
$

This report suggests a potential problem worthy of further investigation: the server can only handle 141 requests per second. That throughput may be too low, for example, for an API that must handle 1000 requests per second.

Warning
You can use benchmarking tools on your computer but be careful about interpreting the results. The tools produce artificial and predictable traffic. Benchmarking can predict performance and identify issues before they arise in production. However, the only way to fully understand a production website’s performance is to collect statistics and data from the running system.
Warning
Running the server and the benchmarking software on the same computer excludes networking latency and overheads. It is less realistic because the same CPU handles both the requests and the responses.

The JavaScript Performance API also provides high-resolution timers suitable for small benchmarks without any external tools.

The following code demonstrates how to obtain a high-resolution timer in Node.js:

// Get the JavaScript Performance API (built into Node.js)
const { performance } = require('perf_hooks');

// Record the start time
const start = performance.now();

// ... perform a slow and complex task ...

// Print the number of milliseconds elapsed
const elapsed = performance.now() - start;
console.log(elapsed);

What contributes to low performance?

The following table, by Brendan Gregg [1], lists the time taken to complete common operations on a computer. It also includes a scaled version of the same numbers to give an intuition for the differences involved.

Event                                        Latency     Scaled
1 CPU cycle                                  0.3 ns      1 s
Level 1 cache access                         0.9 ns      3 s
Level 2 cache access                         2.8 ns      9 s
Level 3 cache access                         12.9 ns     43 s
Main memory access (DRAM, from CPU)          120 ns      6 min
Solid-state disk I/O (flash memory)          50–150 μs   2–6 days
Rotational disk I/O                          1–10 ms     1–12 months
Internet: San Francisco to New York          40 ms       4 years
Internet: San Francisco to United Kingdom    81 ms       8 years
Internet: San Francisco to Australia         183 ms      19 years
TCP packet retransmit                        1–3 s       105–317 years
OS virtualization system reboot              4 s         423 years
SCSI command time-out                        30 s        3 millennia
Hardware (HW) virtualization system reboot   40 s        4 millennia
Physical system reboot                       5 min       32 millennia

These numbers continually improve as technology advances, so they are only approximations. As a general guide, however, they help programmers design better systems.

This table has important implications. If performance is critical, it is easy to justify enormous amounts of computation to avoid reading from a disk drive. [2]

Exercise: Benchmarking your computer

Solve the following problems using Node.js and time the results:

  • Add together all the integers from one to one thousand

  • Write the numbers one to one thousand to a file

  • Write the numbers one to one thousand to a file, flushing pending changes to the disk drive after each write (hint: you can flush using fs.fsync(…) or fs.fsyncSync(…))

  • Perform one thousand HTTP requests to a simple web API running on your computer

How much variation is there between the time it takes to run these tasks? Can you explain the results in terms of the latency table?


1. Gregg (2013), Systems performance: enterprise and the cloud, Prentice Hall.
2. For example, columnar databases such as SAP HANA can query a compressed table sequentially in memory many times faster than random access of an uncompressed table on disk.