Cross Stitching: Elegant Concurrency Patterns for JavaScript

This post first appeared on the Big Nerd Ranch blog.

"JavaScript is single-threaded, so it doesn't scale. JavaScript is a toy language because it doesn't support multithreading." Outside (and inside) the web community, statements like these are common.

And in a way, it's true: JavaScript’s event loop means your program does one thing at a time. This intentional design decision shields us from an entire class of multithreading woes, but it has also birthed the misconception that JavaScript can’t handle concurrency.

But in fact, JavaScript's design is well-suited for solving a plethora of concurrency problems without succumbing to the "gotchas" of other multithreaded languages. You might say that JavaScript is single-threaded… just so it can be multithreaded!

Recap: Concurrency

You may want to do some homework if "concurrency" and "parallelism" are new to your vocabulary. TL;DR: for simple programs, we usually write "sequential" or ("serial") code: one step executes at a time, and must complete before the next step begins. If JavaScript could perform a "blocking" AJAX request with ajaxSync(), serial code might look like this:

console.log('About to make a request.');
let json = ajaxSync('https://api.google.com/search.json');
console.log(json);
console.log('Finished the request.');

/*
  => About to make a request.
  ... AJAX request runs ...
  ... a couple seconds later ...
  ... AJAX request finishes ...
  => { all: ['the', 'things'] }
  => Finished the request.
*/

Until the AJAX request completes, JavaScript pauses (or "blocks") any lines below from executing. In contrast, concurrency is when the execution of one series of steps can overlap another series of steps. In JavaScript, concurrency is often accomplished with async Web APIs and a callback:

console.log('About to make a request.');
ajaxAsync('https://api.google.com/search.json', json => {
  console.log(json);
  console.log('Finished the request.');
});
console.log('Started the request.');

/*
  => About to make a request.
  ... AJAX request runs in the background ...
  => Started the request.
  ... a couple seconds later ...
  ... AJAX requests finishes ...
  => { all: ['the', 'things'] }
  => Finished the request.
*/

In this second version, the AJAX request only "blocks" the code inside the callback (logging the AJAX response), but the JavaScript runtime will go on executing lines after the AJAX request.

Recap: Event Loop

The JavaScript runtime uses a mechanism, called the "event loop," to keep track of all in-progress async operations so it can notify your program when an operation finishes. If you are unfamiliar with the event loop, check out Philip Robert's exceptional 20 minute overview from ScotlandJS: "Help, I'm stuck in an event-loop."

Thanks to the event loop, a single thread can perform an admirable amount of work concurrently. But why not just reach for multithreading?

Software is harder to write (and debug) when it constantly switches between different tasks through multithreading. So unlike many languages, JavaScript finishes one thing at a time—a constraint called "run-to-completion"—and queues up other things to do in the background. Once the current task is done, it grabs the next chunk of work off the queue and executes to completion.

Since the JavaScript runtime never interrupts code that is already executing on the call stack, you can be sure that shared state (like global variables) won't randomly change mid-function—reentrancy isn't even a thing! Run-to-completion makes it easy to reason about highly concurrent code, for which reason Node.js is so popular for backend programming.

Although your JavaScript code is single-threaded and only does one thing at a time, the JavaScript Runtime and Web APIs are multithreaded! When you pass a callback function to setTimeout() or start an AJAX request with fetch(), you are essentially spinning up a background thread in the runtime. Once that background thread completes, and once the current call stack finishes executing, your callback function is pushed onto the (now empty) call stack and run-to-completion. So your JavaScript code itself is single-threaded, but it orchestrates legions of threads!

However, we need some patterns to write concurrent code that is performant and readable.

Recap: Promise Chaining

Suppose we are building a media library app in the browser and are writing a function called updateMP3Meta() that will read in an MP3 file, parse out some ID3 metadata (e.g. song title, composer, artist) and update a matching Song record in the database. Assuming the read(), parseMP3() and Song.findByName() functions return Promises, we could implement it like this:

let read     = (path) => { ... }; // returns a Promise
let parseMP3 = (file) => { ... }; // returns a Promise
let Song = {
  findByName(name) { ... } // returns a Promise
};

let updateMP3Meta = (path) => {
  return read(path)
    .then(file => {
      return parseMP3(file).then(meta => {
        return Song.findByName(file.name).then(song => {
          Object.assign(song, meta);
          return song.save();
        });
      });
    });
};

It does the job, but nested .then() callbacks quickly turn into callback hell and obscure intent… and bugs. We might try using Promise chaining to flatten the callback chain:

let updateMP3Meta = (path) => {
  return read(path)
    .then(file => parseMP3(file))
    .then(meta => Song.findByName(file.name))
    .then(song => {
      Object.assign(song, meta);
      return song.save();
    });
};

This reads nicely, but unfortunately it won't work: we can't access the file variable from the second .then() callback, nor meta from the third .then() anymore! Promise chaining can tame callback hell, but only by forfeiting JavaScript's closure superpowers. It's hardly ideal—local variables are the bread-and-butter of state management in functional programming.

Recap: Async Functions

Luckily, ES2017 async functions merge the benefits of both approaches. Rewriting our updateMP3Meta() as an async function yields:

let updateMP3Meta = async (path) => {
  let file = await read(path);
  let meta = await parseMP3(file);
  let song = await Song.findByName(file.name);
  Object.assign(song, meta);
  return song.save();
};

Hurray! async functions give us local scoping back without descending into callback hell.

However, updateMP3Meta() unnecessarily forces some things to run serially. In particular, MP3 parsing and searching the database for a matching Song can actually be done in parallel; but the await operator forces Song.findByName() to run only after parseMP3() finishes.

Working in Parallel

To get the most out of our single-threaded program, we need to invoke JavaScript's event loop superpowers. We can queue two async operations and wait for both to complete:

let updateMP3Meta = (path) => {
  return read(path)
    .then(file => {
      return Promise.all([
        parseMP3(file),
        Song.findByName(file.name)
      ]);
    })
    .then(([meta, song]) => {
      Object.assign(song, meta);
      return song.save();
    });
};

We used Promise.all() to wait for concurrent operations to finish, then aggregated the results to update the Song. Promise.all() works just fine for a few concurrent spots, but code quickly devolves when you alternate between chunks of code that can be executed concurrently and others that are serial. This intrinsic ugliness is not much improved with async functions:

let updateMP3Meta = async (path) => {
  let file = await read(path);
  let metaPromise = parseMP3(file);
  let songPromise = Song.findByName(file.name);

  let meta = await metaPromise;
  let song = await songPromise;

  Object.assign(song, meta);
  return song.save();
};

Instead of using an inline await, we used [meta|song]Promise local variables to begin an operation without blocking, then await both promises. While async functions make concurrent code easier to read, there is an underlying structural ugliness: we are manually telling JavaScript what parts can run concurrently, and when it should block for serial code. It's okay for a spot or two, but when multiple chunks of serial code can be run concurrently, it gets incredibly unruly.

We are essentially deriving the evaluation order of a dependency tree… and hardcoding the solution. This means "minor" changes, like swapping out a synchronous API for an async one, will cause drastic rewrites. That's a code smell!

Real Code

To demonstrate this underlying ugliness, let's try a more complex example. I recently worked on an MP3 importer in JavaScript that involved a fair amount of async work. (Check out my blog post or the parser source code if you're interested in working with binary data and text encodings.)

The main function takes in a File object (from drag-and-drop), loads it into an ArrayBuffer, parses MP3 metadata, computes the MP3's duration, creates an Album in IndexedDB if one doesn't already exist, and finally creates a new Song:

import parser from 'id3-meta';
import read from './file-reader';
import getDuration from './duration';
import { mapSongMeta, mapAlbumMeta } from './meta';
import importAlbum from './album-importer';
import importSong from './song-importer';

export default async (file) => {
  // Read the file
  let buffer = await read(file);

  // Parse out the ID3 metadata
  let meta = await parser(file);
  let songMeta = mapSongMeta(meta);
  let albumMeta = mapAlbumMeta(meta);

  // Compute the duration
  let duration = await getDuration(buffer);

  // Import the album
  let albumId = await importAlbum(albumMeta);

  // Import the song
  let songId = await importSong({
    ...songMeta, albumId, file, duration, meta
  });

  return songId;
};

This looks straightforward enough, but we're forcing some async operations to run sequentially that can be executed concurrently. In particular, we could compute getDuration() at the same time that we parse the MP3 and import a new album. However, both operations will need to finish before invoking importSong().

Our first try might look like this:

export default async (file) => {
  // Read the file
  let buffer = await read(file);

  // Compute the duration
  let durationPromise = getDuration(buffer);

  // Parse out the ID3 metadata
  let metaPromise = parser(file);
  let meta = await metaPromise;

  let songMeta = mapSongMeta(meta);
  let albumMeta = mapAlbumMeta(meta);

  // Import the album
  let albumIdPromise = importAlbum(albumMeta);

  let duration = await durationPromise;
  let albumId = await albumIdPromise;

  // Import the song
  let songId = await importSong({
    ...songMeta, albumId, file, duration, meta
  });

  return songId;
};

That took a fair amount of brain tetris to get the order of awaits right: if we hadn't moved getDuration() up a few lines in the function, we would have created a poor solution since importAlbum() only depends on albumMeta, which only depends on meta. But this solution is still suboptimal! getDuration() depends on buffer, but parser() could be executing at the same time as read(). To get the best solution, we would have to use Promise.all() and .then()s.

To solve the underlying problem without evaluating a dependency graph by hand, we need to define groups of serial steps (which execute one-by-one in a blocking fashion), and combine those groups concurrently.

What if there was a way to define such a dependency graph that's readable, doesn't break closures, doesn't resort to .then(), and doesn't require a library?

Async IIFEs

That's where async IIFEs come in. For every group of serial (dependent) operations, we'll wrap them up into a micro API called a "task":

let myTask = (async () => {
  let other = await otherTask;
  let result = await doCompute(other.thing);
  return result;
})();

Since all async functions return a Promise, the myTask local variable contains a Promise that will resolve to result. I prefer to call these *Task instead of *Promise. Inside the async IIFE, operations are sequential, but outside we aren't blocking anything. Furthermore, inside a task we can wait on other tasks to finish, like otherTask, which could be another async IIFE.

Let's turn the getDuration() section into a task called durationTask:

let durationTask = (async () => {
  let buffer = await readTask;
  let duration = await getDuration(buffer);
  return duration;
})();

Since these tasks are defined inline, they have access to variables in the outer closure, including other tasks!

Refactoring into Async Tasks

Let's refactor the entire importer with async IIFEs, or "tasks":

export default async (file) => {
  // Read the file
  let readTask = read(file);

  // Parse out the ID3 metadata
  let metaTask = (async () => {
    let meta = await parser(file);
    let songMeta = mapSongMeta(meta);
    let albumMeta = mapAlbumMeta(meta);
    return { meta, songMeta, albumMeta };
  })();

  // Import the album
  let albumImportTask = (async () => {
    let { albumMeta } = await metaTask;
    let albumId = await importAlbum(albumMeta);
    return albumId;
  })();

  // Compute the duration
  let durationTask = (async () => {
    let buffer = await readTask;
    let duration = await getDuration(buffer);
    return duration;
  })();

  // Import the song
  let songImportTask = (async () => {
    let albumId = await albumImportTask;
    let { meta, songMeta } = await metaTask;
    let duration = await durationTask;

    let songId = await importSong({
      ...songMeta, albumId, file, duration, meta
    });

    return songId;
  })();

  let songId = await songImportTask;

  return songId;
};

Now reading the file, computing duration, parsing metadata and database querying will automatically run concurrently or serially—we were even able to leave getDuration() in its original spot! By declaring tasks and awaiting them inside other tasks, we defined a dependency graph for the runtime and let it discover the optimal solution for us.

Suppose we wanted to add another step to the import process, like retrieving album artwork from a web service:

// Look up album artwork from a web service
let albumArtwork = await fetchAlbumArtwork(albumMeta);

Prior to the async IIFE refactor, adding this feature would have triggered a lot of changes throughout the file, but now we can add it with just a small isolated chunk of additions!

+// Look up album artwork from a web service
+let artworkTask = (async () => {
+  let { albumMeta } = await metaTask;
+  let artwork = await fetchAlbumArtwork(albumMeta);
+  return artwork;
+})();

 // Import the album
 let albumImportTask = (async () => {
+  let artwork = await artworkTask;
   let { albumMeta } = await metaTask;
-  let albumId = await importAlbum(albumMeta);
+  let albumId = await importAlbum({ artwork, ...albumMeta });
   return albumId;
 })();

Tasks are declarative, so managing concurrent vs. serial execution order becomes an "execution detail" instead of an "implementation detail"!

What if we revamped our parser() function so it could synchronously parse an ArrayBuffer instead of a File object? Before this would have triggered a cascade of line reordering, but now the change is trivial:

 // Parse out the ID3 metadata
 let metaTask = (async () => {
+  let buffer = await readTask;
-  let meta = await parser(file);
+  let meta = parser(buffer);
   let songMeta = mapSongMeta(meta);
   let albumMeta = mapAlbumMeta(meta);
   return { meta, songMeta, albumMeta };
 })();

Objections

It's tempting to take shortcuts and solve the dependency graph yourself. For example, after our changes to parser() above, all of the tasks depend on the file being read in, so you could block the entire function with await read(file) to save a few lines. However, these areas are likely to change, and organizing into serial tasks provides other benefits: these micro APIs make it is easier to read, debug, extract and reason about a complex chunk of concurrency.

Since we wrapped these tasks into async IIFEs, why not extract them into dedicated functions? For the same reason we couldn't use Promise chaining: we have to give up nested closures and lexically scoped variables. Extracting tasks into top level functions also begs a design question: if all these operations were synchronous, would we still perform this extraction?

If you find yourself extracting async functions (as we did with importAlbum() and importSong()) because of their complexity or reusability, bravo! But ultimately, design principles for breaking down functions should be independent of whether the code is async vs. sync.

Also, splitting functions or moving them too far from their context makes code harder to grasp, as Josh discusses in his post about extracting methods.

More to Come

Functional programming is well-suited to multithreading because it minimizes shared state and opts for local variables as the de facto state mechanism. And thanks to JavaScript's event loop, we can deal with shared state by merging results inside a single thread.

Next time, we'll examine functional patterns for throttling concurrency on a single thread, then wrap up with techniques for efficiently managing a cluster of Web Workers… without worrying a shred about "thread safety."

Jonathan Martin

Jonathan Martin

Globe-trotting web developer, instructor, international speaker and fine art landscape photographer from Atlanta, GA.

Read More