Node.js has often been criticized for its design.
Coming from programming languages such as Java, C or Python, it seems strange that Node.js doesn’t give us direct access to threads. So how do we run things concurrently?

Well, before Node.js 11 we could already run concurrent/parallel code using the cluster module, as seen in a previous article.
But what if we’re on a server that only has one core? What can we do?

In Node.js 11 we have the worker_threads module, which allows us to spawn multiple threads on a single core. We could already use this module in Node.js 10 behind the --experimental-worker flag, but with Node.js 11 we can finally drop it!

A simple use case

Let’s say we need to create a file containing one million users with their first, middle and last name.
I’ve found this amazing GitHub repo which provides an array of first, middle and last names. We’re gonna use these JSON files in our project:
github.com/dominictarr/random-name

Let’s create a new project with the following folder structure:

[root]
|
+------ main.js
|
+------ [data]
|       |
|       +-- first_name.json
|       +-- last_name.json
|       +-- middle_name.json
|
+------ [utils]
|       |
|       +-- index.js
|
|
+------ [output]
        |
        +-- data.txt

So let’s begin with the main.js file:

const fs                 = require("fs-extra");
const { getRandomIndex } = require("./utils");
const firstName          = require("./data/first_name.json");
const lastName           = require("./data/last_name.json");
const middleName         = require("./data/middle_name.json");

const limit              = 1000000;
const outputFile         = `${__dirname}/output/data.txt`;

(async () => {

  for (let i = 0; i < limit; i++) {

    const data = [firstName, middleName, lastName]
                 .map(getRandomIndex)
                 .concat("\n")
                 .join(" ");

    await fs.appendFile(outputFile, data);
  }

})();

As you can see, we’re using the fs-extra package. It works just like the fs module, but every function also returns a promise.
It helps with a big problem that comes with this kind of operation: memory usage. If we try to open too many files at once, Node.js will throw an error and kill the main process, because it can’t handle that many files open at the same time (and runs out of memory).
Inside our for loop, the await pauses each iteration until the append operation completes: that way, we only ever have one pending write at a time.

Let’s see the utils/index.js file:

function getRandomIndex(array) {
  return array[Math.floor(Math.random() * array.length)];
}

module.exports = {
  getRandomIndex
}

Here we’re just picking a random element out of an array (despite the name, the function returns a value, not an index). It’s handy whenever we need a random first, middle or last name.

Running the code above on my machine (2016 MacBook Pro, 2.7 GHz Intel Core i7, 16 GB RAM) takes 3 minutes and 32 seconds to complete the task.
Let’s see how to improve performance using Node.js worker threads!

Going Multithread

In order to adopt a multithreaded approach to this simple program, we need to make some changes inside our codebase. Let’s begin with the main.js file:

const { Worker }     = require("worker_threads");
const logUpdate      = require("log-update");

const limit          = 1000000;
const threads        = 10;
const namesPerThread = limit / threads;
const outputFile     = `${__dirname}/output/data.txt`;

let names = [...Array(threads)].fill(0);

for (let i = 0; i < threads; i++) {

  const port = new Worker(require.resolve("./worker.js"), {
    workerData: { namesPerThread, outputFile }
  });

  port.on("message", (data) => handleMessage(data, i));
  port.on("error",   (e)    => console.log(e));
  port.on("exit",    (code) => console.log(`Exit code: ${code}`));

}

First of all, we need to import the Worker class from the worker_threads module. This allows us to spawn a new worker whenever we want.
Then we set the number of threads to spawn: in this case I decided on just 10.
Next we calculate how many names each thread should generate; that’s easy, we just divide the total number of desired names by the number of threads.
For each thread, we spawn a new Worker. As you can see, its code lives in the worker.js file.
We send each new Worker a payload telling it how many names to create and where to put them (the output file).
We also keep listening for messages, errors and exits, so we can track what’s going on inside our workers.
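The snippet above references a handleMessage helper that isn’t shown. Its job is presumably to update a per-thread progress counter; here is a minimal sketch (the counter layout and output format are my assumptions, and returning the summary string stands in for redrawing it with the imported log-update package):

```javascript
// Hypothetical progress tracker for the workers above.
// `names` holds one counter per thread, as in the article's main.js.
const names = Array(10).fill(0);

function handleMessage(data, index) {
  names[index]++; // one more name generated by thread `index`
  // With log-update you would pass this string to logUpdate() so the
  // summary redraws in place instead of scrolling the terminal.
  return names.map((count, i) => `Thread ${i}: ${count} names`).join("\n");
}
```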

Let’s see now how the worker.js file behaves:

const fs                         = require("fs-extra");
const { parentPort, workerData } = require("worker_threads");
const { getRandomIndex }         = require("./utils");
const firstName                  = require("./data/first_name.json");
const lastName                   = require("./data/last_name.json");
const middleName                 = require("./data/middle_name.json");

const { namesPerThread, outputFile } = workerData;

(async () => {

  for (let i = 0; i < namesPerThread; i++) {

    const data = [firstName, middleName, lastName]
                 .map(getRandomIndex)
                 .concat("\n")
                 .join(" ");

    await fs.appendFile(outputFile, data);

    parentPort.postMessage(data);
  }

})();

It is basically the same code as the original main.js file. Every time we store a new name, we send it back to the main thread, so it can keep track of what’s going on inside our workers.

The result? We did the same operation in just 1 minute and 24 seconds: roughly 60% less time (about 2.5× as fast) than the single-threaded version!

Other use cases

Worker Threads are a great solution when you need to perform a CPU-intensive task. They can also speed up heavy filesystem-related workloads, as we just saw, and they help a lot whenever you need any kind of concurrent operation. Best of all, as we said before, they also work on single-core machines, so they can improve performance on any server.

I actually used Worker Threads during a massive upload operation, where I had to check millions of users and store their data in a database. With a multithreaded approach, the operation was about 10 times faster than its single-threaded counterpart.

I also used Worker Threads for image manipulation: I had to build three thumbnails (with different dimensions) out of a single image, and a multithreaded approach again saved me time on that operation.

As you can see, the worker_threads module can help you improve performance a lot, so let me know if it helps you in some way!

Did you like this article? Consider becoming a Patron!

This article is CC0 1.0 (Public Domain) licensed.