I’m weary of always comparing the web to so-called “native” platforms like Android and iOS. The web is streaming, meaning it has none of the resources locally available when you open an app for the first time. This is such a fundamental difference, that many architectural choices from native platforms don’t easily apply to the web — if at all.
But regardless of where you look, multithreading is used everywhere. iOS empowers developers to easily parallelize code using Grand Central Dispatch, Android does this via their new, unified task scheduler WorkManager and game engines like Unity have job systems. The reason for any of these platforms to not only support multithreading, but making it as easy as possible is always the same: Ensure your app feels great.
In this article I’ll outline my mental model why multithreading is important on the web, I’ll give you an introduction to the primitives that we as developers have at our disposal, and I’ll talk a bit about architectures that make it easy to adopt multithreading, even incrementally.
The Problem Of Unpredictable Performance
The goal is to keep your app smooth and responsive. Smooth means having a steady and sufficiently high frame rate. Responsive means that the UI responds to user interactions with minimal delay. Both of these are key factors in making your app feel polished and high-quality.
According to RAIL, being responsive means reacting to a user’s action in under 100ms, and being smooth means shipping a stable 60 frames per second (fps) when anything on the screen is moving. Consequently, we as developers have 1000ms/60 = 16.6ms to produce each frame, which is also called the “frame budget”.
I say “we”, but it’s really the browser that has 16.6ms to do everything required to render a frame. Us developers are only directly responsible for one part of the workload that the browser has to deal with. That work consists of (but is not limited to):
Detecting which element the user may or may not have tapped;
firing the corresponding events;
and compositing those layers into the final image the user sees on screen;
(and more …)
Quite a lot of work.
At the same time, we have a widening performance gap. The top-tier flagship phones are getting faster with every new generation that’s released. Low-end phones on the other hand are getting cheaper, making the mobile internet accessible to demographics that previously maybe couldn’t afford it. In terms for performance, these phones have plateaued at the performance of a 2012 iPhone.
Note: RAIL has been a guiding framework for 6 years now. It’s important to note that 60fps is really a placeholder value for whatever the native refresh rate of the user’s display is. For example, some of the newer pixel phones have a 90Hz screen and the iPad Pro has a 120Hz screen, reducing the frame budget to 11.1ms and 8.3ms respectively.
To complicate things further, there is no good way to determine the refresh rate of the device that your app is running on apart from measuring the amount of time that elapses between requestAnimationFrame() callbacks.*
The usual advice here is to “chunk your code” or its sibling phrasing “yield to the browser”. The underlying principle is the same: To give the browser a chance to ship the next frame you break up the work your code is doing into smaller chunks, and pass control back to the browser to allow it to do work in-between those chunks.
There are multiple ways to yield to the browser, and none of them are great. A recently-proposed task scheduler API aims to expose this functionality directly. However, even if we had an API for yielding like await yieldToBrowser() (or something of the sort), the technique itself is flawed: To make sure you don’t blow through your frame budget, you need to do work in small enough chunks that your code yields at least once every frame.
At the same time, code that yields too often can cause the overhead of scheduling tasks to become a net-negative influence on your app’s overall performance. Now combine that with the unpredictable performance of devices, and we have to arrive at the conclusion that there is no correct chunk size that fits all devices. This is especially problematic when trying to “chunk” UI work, since yielding to the browser can render partially complete interfaces that increase the total cost of layout and paint.
const worker = new Worker(“./worker.js”);
Before we get more into that, it’s important to note that Web Workers, Service Workers and Worklets are similar, but ultimately different things for different purposes:
A SharedWorker is a special Web Worker, in that multiple tabs or windows of the same origin can reference the same SharedWorker. The API is pretty much impossible to polyfill and has only ever been implemented in Blink, so I won’t be paying any attention to it in this article.
While Workers are the “thread” primitive of the web, they are very different from the threads you might be used to from C++, Java & co. The biggest difference is that the required isolation means workers don’t have access to any variables or code from the page that created them or vice versa. The only way to exchange data is through message-passing via an API called postMessage, which will copy the message payload and trigger a message event on the receiving end. This also means that Workers don’t have access to the DOM, making UI updates from a worker impossible — at least without significant effort (like AMP’s worker-dom).
Support for Web Workers is nearly universal, considering that even IE10 supported them. Their usage, on the other hand, is still relatively low, and I think to a large extent that is due to the unusual ergonomics of Workers.
Concurrency Model #1: Actors
My personal preference is to think of Workers like Actors, as they are described in the Actor Model. The Actor Model’s most popular incarnation is probably in the programming language Erlang. Each actor may or may not run on a separate thread and fully owns the data it is operating on. No other thread can access it, making rendering synchronization mechanisms like mutexes unnecessary. Actors can only send messages to each other and react to the messages they receive.
As an example, I often think of the main thread as the actor that owns the DOM and consequently all the UI. It is responsible for updating the UI and capturing input events. Another factor could be in charge of the app’s state. The DOM actor converts low-level input events into app-level semantic events and sends them to the state actor. The state actor changes the state object according to the event it has received, potentially using a state machine or even involving other actors. Once the state object is updated, it sends a copy of the updated state object to the DOM actor. The DOM actor now updates the DOM according to the new state object. Paul Lewis and I once explored actor-centric app architecture at Chrome Dev Summit 2018.
Of course, this model doesn’t come without its problems. For example, every message you send needs to be copied. How long that takes not only depends on the size of the message but also on the device the app is running on. In my experience, postMessage is usually “fast enough”, but there are certain scenarios where it isn’t. Another problem is to strike the balance between moving code to a worker to free up the main thread, while at the same time having to pay the cost of communication overhead and the worker being busy with running other code before it can respond to your message. If done without care, workers can negatively affect UI responsiveness.
Additionally, postMessage is a fire-and-forget messaging mechanism with no built-in understanding of request and response. If you want to employ a request/response mechanism (and in my experience most app architectures inevitably lead you there), you’ll have to build it yourself. That’s why I wrote Comlink, which is a library that uses an RPC protocol under the hood to make it seem like the objects from a worker are accessible from the main thread and vice versa. When using Comlink, you don’t have to deal with postMessage at all. The only artifact is that due to the asynchronous nature of postMessage, functions don’t return their result, but a promise for it instead. In my opinion, this gives you the best of the Actor Model and Shared Memory Concurrency.
Concurrency Model #2: Shared Memory
A SAB, like an ArrayBuffer, is a linear chunk of memory that can be manipulated using Typed Arrays or DataViews. If a SAB is sent via postMessage, the other end does not receive a copy of the data, but a handle to the exact same memory chunk. Every change done by one thread is visible to all other threads. To allow you to build your own mutexes and other concurrent data structures, Atomics provide all sorts of utilities for atomic operations or thread-safe waiting mechanisms.
In 2019, my team and I published PROXX, a web-based Minesweeper clone that was specifically targeting feature phones. Feature phones have a small resolution, usually no touch interface, an underpowered CPU, and no proper GPU to speak of. Despite all these limitations, they are increasingly popular as they are sold for an incredibly low price and they include a full-fledged web browser. This opens up the mobile web to demographics that previously couldn’t afford it.
To make sure that the game was responsive and smooth even on these phones, we embraced an Actor-like architecture. The main thread is responsible for rendering the DOM (via preact and, if available, WebGL) and capturing UI events. The entire app state and game logic is running in a worker which determines whether you just stepped on a mine black hole and, if not, how much of the game board to reveal. The game logic even sends intermediate results to the UI thread to give the user a continuous visual update.
Workers are a key tool to keep the main thread responsive and smooth by preventing any accidentally long-running code from blocking the browser to render. Due to the inherent asynchronous nature of communicating with a worker, the adoption of workers requires some architectural adjustments in your web app, but as a pay off you will have an easier time supporting the massive spectrum of devices that the web is accessed from.
You should make sure to adopt an architecture that lets you move code around easily so you can measure the performance impact of off-main-thread architecture. The ergonomics of web workers have a bit of a learning curve but the most complicated parts can be abstracted away with libraries like Comlink.
“The Main Thread Is Overworked And Underpaid,” Surma, Chrome Dev Summit 2019 (Video)
“Green Energy Efficient Progressive Web Apps,” David, Microsoft DevBlogs
“Case Study: Moving A Three.js-Based WebXR App Off-Main-Thread,” Surma
“When Should You Be Using Workers?,” Surma
“Is postMessage Slow?,” Surma
There are some questions and thoughts that are raised quite often, so I wanted to preempt them and record my answer here.
Isn’t postMessage slow?
My core advice in all matters of performance is: Measure first! Nothing is slow (or fast) until you measure. In my experience, however, postMessage is usually “fast enough”. As a rule of thumb: If JSON.stringify(messagePayload) is under 10KB, you are at virtually no risk of creating a long frame, even on the slowest of phones. If postMessage does indeed end up being a bottleneck in your app, consider the following techniques:
Breaking your work into smaller pieces so that you can send smaller messages.
If the message is a state object of which only small parts have changed, send patches (diffs) instead of the whole object.
If you send a lot of messages, it can also be beneficial to batch multiple messages into one.
As a last resort, you can try switching to a numerical representation of your message and transferring an ArrayBuffers instead of sending an object-based message.
Which of these techniques is the right one depends on the context and can only be answered by measuring and isolating the bottleneck.
I want DOM access from the Worker.
This one I get a lot. In most scenarios, however, that just moves the problem. You are at risk of effectively creating a 2nd main thread, with all the same problems, just in a different thread. Making the DOM safe to access from multiple threads would require adding locks which would introduce a slowdown to DOM operations. This would probably hurt a lot of existing web apps.
Additionally, the lock-step model has benefits. It gives the browser a clear signal at what time the DOM is in a valid state that can be rendered to the screen. In a multi-threaded DOM world, that signal would be lost and we’d have to deal with partial renders or other artifacts.
I really dislike having to put code in a separate file for Workers.
Read more: smashingmagazine.com