How to share cached data with Redis and a Cluster.fork() Node.js app

Hey,

Recently, I've been playing with internal microservices. I used to work with micro, which did the job pretty well for little things.

micro/micro: Micro is a distributed systems runtime for the Cloud and beyond
micro: Asynchronous HTTP microservices

Also recently I've come to work with fastify, which is quite interesting as well. I love the internal hooks system, as well as the data validation and the variety of features it offers out of the box.

fastify/fastify: Fast and low overhead web framework, for Node.js

I have also been doing in-memory caching with Keyv.

lukechilds/keyv: Simple key-value storage with support for multiple backends

Keyv is simple, and it can be backed by a huge variety of adapters, in-memory or not, such as Redis, Mongo, SQLite and so on. I have been using Keyv as my main tool for managing small caching units in my codebase: whenever I can easily identify and cache computed data that can be invalidated later on, it saves a little bit of compute and bandwidth, and makes your stuff work quicker when the same internal data is requested multiple times.

npm install --save @keyv/redis
npm install --save @keyv/mongo
npm install --save @keyv/sqlite
npm install --save @keyv/postgres
npm install --save @keyv/mysql
A bunch of adapter options for Keyv
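
Whichever backend you pick, only the constructor changes; the get/set calls stay the same. Here is a minimal sketch, assuming a Redis instance running on the default local port (the key and value are placeholders):

const Keyv = require("keyv")

// In-memory (default): nothing extra to install or configure
const memory = new Keyv()

// Redis-backed: same API, only the connection string changes
const redis = new Keyv("redis://localhost:6379")

const main = async () => {
  // Identical calls whatever the backend
  await memory.set("answer", 42, 60 * 1000) // 60s TTL, private to this process
  await redis.set("answer", 42, 60 * 1000) // 60s TTL, shared through Redis
  console.log(await memory.get("answer"), await redis.get("answer"))
}

main()
Same Keyv API, different backends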

Most of my internal microservices are single threaded, but some of them I run on every core of my machine. This is why I started using Cluster to spawn multiple processes of an internal service when needed.

Single threaded way

const fastify = require("fastify")({ logger: true })

const PORT = process.env.PORT || 1337

const start = async () => {
  try {
    await fastify.listen(PORT)
    fastify.log.info(`server listening on ${fastify.server.address().port}`)
  } catch (err) {
    fastify.log.error(err)
    process.exit(1)
  }
}

fastify.post("/", async (req, res) => {
  //My workflow comes here
})

start()
Single threaded Fastify server

Multi-process way

const cluster = require("cluster")
const os = require("os")
const numCPUs = os.cpus().length
const fastify = require("fastify")({ logger: true })

const PORT = process.env.PORT || 1337

const start = async () => {
  try {
    await fastify.listen(PORT)
    fastify.log.info(`server listening on ${fastify.server.address().port}`)
  } catch (err) {
    fastify.log.error(err)
    process.exit(1)
  }
}

fastify.post("/", async (req, res) => {
  //My workflow comes here
})

if (cluster.isMaster) {
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork()
  }
  cluster.on("exit", (worker) => log(`worker ${worker.process.pid} died`))
} else {
  start()
}
Multi-process Fastify server

What was the issue

The first question is: how was I doing caching before? Since I was not doing that much caching, needed it quickly, and did not have a huge amount of data to store (all of my data expires after a TTL of a few seconds or minutes), the basic in-memory Keyv store was a good fit. This can be achieved that way:

const Keyv = require("keyv")
const ONE_SEC = 1000
const keyv = new Keyv({
  serialize: JSON.stringify,
  deserialize: JSON.parse,
})
/* Extra use of JSON serialize/deserialize, which is not mandatory */

const workFlow = async () => {
  const key = "71f1fedd-1556-4a7d-9621-7e66a5088095"
  let myComputedValue = await keyv.get(key)
  if (!myComputedValue) {
    // do my work... and cache it
    // myComputedValue = ... (reassign the outer variable; a new `let` here would shadow it)
    await keyv.set(key, myComputedValue, 60 * ONE_SEC)
  }
  //use myComputedValue value
}
Example of using Keyv in an app

That way you have a classic workflow function that naively tries to get a computed value from the cache. If the value has been cached before, take it; if it never has been, compute it the usual way and cache it afterwards. However, what happens when your load is spread across forks and each fork has its own context? You won't be able to share in-memory data easily without a specific mechanism to pass it between processes. An easygoing solution to this is an independent Redis instance holding your cache independently of your running processes.
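
To make the issue concrete, here is a small sketch (the key and the computed value are just placeholders): two forked workers each use a plain in-memory Keyv, and each one misses the cache and recomputes, because the Map behind it lives in its own process.

const cluster = require("cluster")
const Keyv = require("keyv")

// Plain in-memory Keyv: the backing Map is private to each process
const keyv = new Keyv()

const demo = async () => {
  const key = "shared-key"
  let value = await keyv.get(key)
  if (!value) {
    // Cache miss in every worker: nothing is shared between forks
    value = `computed by pid ${process.pid}`
    await keyv.set(key, value, 60 * 1000)
  }
  console.log(value)
}

if (cluster.isMaster) {
  cluster.fork()
  cluster.fork()
} else {
  demo()
}
Two forks, two separate in-memory caches: each worker logs its own pid
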
The easiest way to set up Redis on a server is actually through Docker:

https://hub.docker.com/_/redis/

Everything is explained on the hub, but in short:

  1. Make sure Docker is installed on your machine
  2. Run the Redis container, mapping a host port to the container's default 6379 (here 7001, to match the Keyv example below):

docker run --name redis -p 7001:6379 -d redis

Pay attention: if you are exposing your Redis instance publicly, make sure protected mode is on and password authentication is configured. Check the Security notes on Docker Hub.
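
For example, a minimal sketch assuming the official redis image (the password is a placeholder): pass --requirepass to the Redis server inside the container,

docker run --name redis -p 7001:6379 -d redis redis-server --requirepass s3cret

then carry the credentials in the connection string you hand to Keyv (empty user, password only):

const Keyv = require("keyv")
// redis://:password@host:port (placeholder password, matching the container above)
const keyv = new Keyv("redis://:s3cret@localhost:7001")
Password-protected Redis with Keyv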

That way you'll be able to point Keyv to your Redis instance.
So remember:

npm install --save @keyv/redis

Then just reference your Redis instance in Keyv:

const Keyv = require("keyv")
const keyv = new Keyv("redis://localhost:7001", {
  serialize: JSON.stringify,
  deserialize: JSON.parse,
})
Keyv with serialization/deserialization and redis adapter

The advantage of working with Keyv is that you won't need to change your logic or workflow whatever adapter you are using, and this is very helpful. I was glad to enhance my caching logic, make it independent of any single process, and point it at my Redis instance in the blink of an eye 👀.
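
To put the pieces together, here is a sketch of the forked Fastify server from earlier with its workflow going through the shared Redis-backed Keyv (the key and the computed value are placeholders): the first request, whichever worker serves it, populates the cache, and every other worker then reads the same entry.

const cluster = require("cluster")
const os = require("os")
const Keyv = require("keyv")
const fastify = require("fastify")({ logger: true })

const PORT = process.env.PORT || 1337
const ONE_SEC = 1000

// Every forked worker points to the same Redis instance,
// so a value cached by one worker is visible to all of them
const keyv = new Keyv("redis://localhost:7001", {
  serialize: JSON.stringify,
  deserialize: JSON.parse,
})

fastify.post("/", async (req, res) => {
  const key = "71f1fedd-1556-4a7d-9621-7e66a5088095"
  let myComputedValue = await keyv.get(key)
  if (!myComputedValue) {
    // do my work... (placeholder computation) and cache it for the next worker
    myComputedValue = { computedAt: Date.now(), byWorker: process.pid }
    await keyv.set(key, myComputedValue, 60 * ONE_SEC)
  }
  return myComputedValue
})

const start = async () => {
  try {
    await fastify.listen(PORT)
  } catch (err) {
    fastify.log.error(err)
    process.exit(1)
  }
}

if (cluster.isMaster) {
  for (let i = 0; i < os.cpus().length; i++) cluster.fork()
} else {
  start()
}
Forked Fastify workers sharing one Redis-backed cache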

That's it for today, hope you enjoyed it.