Software – Pal's Blog

Pitfalls of debounced functions

The debounce technique is used to delay processing an event or state until it stops firing for a period of time. It’s best used for reactive functions that depend only on the current state. A common use case is debouncing form validation – say you want to show “weak password” only after the user has stopped typing out a password.

Lodash’s debounce takes a callback to be debounced and wraps it such that the callback is invoked only once for each burst of invocations less than “n” milliseconds apart. It also provides a timeout in case you need a guarantee that an invocation eventually does occur.
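With Lodash, the password example might look like this (checkPasswordStrength and passwordInput are hypothetical names; maxWait is the guarantee mentioned above):

const { debounce } = require('lodash')

// Hypothetical validator that recomputes "password strength" from the current value
const checkPasswordStrength = (password) => { /* ... */ }

// Run only after the user pauses typing for 300ms, but guarantee an
// invocation at least once every 2 seconds during a long burst of keystrokes
const validatePassword = debounce(checkPasswordStrength, 300, { maxWait: 2000 })

passwordInput.addEventListener('input', (e) => validatePassword(e.target.value))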

Parameter aggregation

This “vanilla” debounce assumes that the end application state is unaffected by whether the callback is invoked once or multiple times. This is true if the debounced callback entirely overwrites the same piece of state in your application. In a password validator, the “password strength” state is recalculated each time; the password strength doesn’t depend on past values of any other state. That’s why the validator can safely be debounced.

If you use debounce as a general-purpose optimization, you’ll find that this assumption is often false. A simple example is a settings page with multiple toggles that each map to a different setting in the database.

const updateSettings = debounce((diff/*: Partial<{
  username: string,
  password: string,
}>*/) => {
  fetch('/v1/api/settings', {
    method: 'POST',
    body: JSON.stringify(diff),
  })
})

Say the user changes their username and then their password. The above debounced callback will save only the last change (their password) and drop their username. A correct implementation here would aggregate the modifications from each invocation and send the combined changes to the server at the end. I’ve seen this mistake quite a few times. For another practical example, a debounced undo / redo function will not behave the way you’d want it to (but that one is less subtle than the settings example!)

I propose an alternate debounce implementation that accepts an aggregator. The aggregator is a callback that is not debounced and merges the arguments of each invocation in a lightweight fashion. The aggregator equivalent to no aggregation would be:

const no_aggregation = (previousArgs, args) => args

debounce(passwordValidator, {
  aggregator: no_aggregation, // or simply omit the aggregator
});

But in the case of the settings updates, you would merge each diff into the previous one (a shallow merge is enough here because the settings object is flat):

const updateSettings = debounce((diff/*: Partial<{
  username: string,
  password: string,
}>*/) => {
  fetch('/v1/api/settings', {
    method: 'POST',
    body: JSON.stringify(diff),
  })
}, {
  aggregator: (previousDiff, diff) => ({
    ...previousDiff,
    ...diff
  })
})

This should work for most use-cases. It’s important to make sure the debounced callback doesn’t read global state because that cannot be aggregated easily.
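For reference, here is a minimal sketch of a debounce that accepts an aggregator. It assumes the debounced callback takes a single argument (like the diff above) and a fixed wait; lodash-style options such as leading, trailing, and maxWait are left out:

function debounce(callback, { wait = 200, aggregator = (_previous, next) => next } = {}) {
  let timer = null
  let aggregate

  return (arg) => {
    // Merge this invocation's argument into the running aggregate
    aggregate = aggregator(aggregate, arg)

    clearTimeout(timer)
    timer = setTimeout(() => {
      const finalArg = aggregate
      aggregate = undefined
      callback(finalArg)
    }, wait)
  }
}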

Partitioned debouncing

An interesting puzzle for me was when I added group selection for a 2D editor. I debounced sending new item positions to the server, which worked well for the single-item editor use-case:

const save = debounce((itemId, x, y) => {
  server.send('position-changed', { itemId, x, y })
})

However, when I enabled multiple item selection and dragged a group around, each item would end up in the wrong place (save for one). In hindsight, this “obviously” was because a single debounced save shared across all items meant that only one item’s position ended up being saved at a time.

Figure: a shared debounced save causing dropped item positions.

A way to solve this is debouncing saves per item instead of across all saves. Theoretically, we could have applied this to the settings example too, debouncing the save of each setting individually (although you’d end up with multiple network requests).

A powerful debounce implementation could accept a “partitioning” callback that returns a string unique to each context that should be debounced individually. Somewhat as follows:

const save = debounce((itemId, x, y) => {
  server.send('position-changed', { itemId, x, y })
}, {
  partitionBy: (itemId) => itemId
})

The implementation would internally map each partition string to independent timers.
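A minimal sketch of that idea, keeping a separate timer per partition key (and again leaving out aggregation and lodash-style options):

function debounce(callback, { wait = 200, partitionBy = () => 'default' } = {}) {
  const timers = new Map()

  return (...args) => {
    // Derive the partition key from this invocation's arguments
    const key = partitionBy(...args)

    // Reset only the timer belonging to this partition
    clearTimeout(timers.get(key))
    timers.set(key, setTimeout(() => {
      timers.delete(key)
      callback(...args)
    }, wait))
  }
}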

Another practical use-case for this would be a message queue where you need to debounce messages partitioned by message type or user, so that they can be rate limited.

Debounce ≠ Async queue

Another mistaken assumption I’ve seen is that debounced async callbacks won’t have concurrent executions.

Say, for example, that you are debouncing a toggle that opens a new tab. To open the tab, some data must be loaded asynchronously; closing the tab, on the other hand, is synchronous. You’ve chosen to debounce this toggle to prevent double or triple clicks from borking the application.

Now, what happens when the user clicks the toggle twice (but not a double click)?

  • First click, toggle on
  • Debounce timeout
  • Data starts being loaded
  • Second click, toggle off
  • Debounce timeout
  • Tab isn’t open yet, so second click does nothing
  • Data loaded, tab opened, & user annoyed 😛

The expected behavior was for the tab to close, or never open, after the second click. This issue isn’t really about debouncing: you need to either cancel the first async operation or, if that’s not possible, process each click off a queue. That way the second click is handled only once the data has loaded.

This is a bad example, however, because it’s easy to simply cancel rendering a tab. In distributed applications, cancellation isn’t necessarily possible because messages could be processed on some unknown server.

An async debounce would wait for the last invocation to resolve before doing another invocation. A rough implementation would be as follows:

// "process" stands in for the async task being debounced.
let promise = null;
let queued = false;

function debounced_process() {
  if (!promise) {
    let thisPromise;
    thisPromise = promise = process().then(() => {
      if (promise === thisPromise) // reset unless queued
        promise = null;
    });
  } else if (!queued) {
    queued = true;
    let thisPromise;
    thisPromise = promise = promise.then(() => {
      queued = false;
      return process();
    }).then(() => {
      if (promise === thisPromise)
        promise = null;
    });
  } else {
    // We've already queued "process" to be called again after
    // the current invocation. We shouldn't queue it again.
  }
}
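
With this sketch, a burst of calls collapses into the current run plus at most one queued re-run:

// The first call starts process() immediately, the second queues exactly one
// follow-up run, and the third is dropped because a run is already queued.
debounced_process();
debounced_process();
debounced_process();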

This specialized debounce is perhaps better than the vanilla debounce for long-running async tasks.


Offline documentation with webdoc

Before going on a long flight, I download PDFs of reference documentation for whatever software library I will be programming with. Having the documentation handy means I won’t get stuck on an unfamiliar edge case. It would be very convenient if documentation websites could be cached offline, and that’s why I added an offline storage option in webdoc. The documentation for PixiJS and melonJS can now be downloaded for offline use! I’ll walk you through how I did it; the technique can be replicated for any static website.

How is it done?

It’s done using a service worker!

A service worker is a script that acts as a proxy between a browser and the web server. It can intercept fetches done on the main thread and respond with a cached resource, or a computed value, or allow the request to continue the “normal” way to the web server. The service worker controls a cache storage and can decide to put or delete resources from its caches. Note that this “cache storage” is separate from the browser’s regular HTTP cache.
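As a trivial illustration (this is not webdoc’s policy, which is described below), a cache-first fetch handler can be written in a few lines:

// service-worker.js: serve from the cache when possible, fall back to the network
self.addEventListener("fetch", (event) => {
  event.respondWith(
    caches.match(event.request).then((cached) => cached || fetch(event.request))
  );
});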

If your static website is hosted on a free service like GitHub Pages, being able to control the caching policy can be very handy. GitHub Pages’ caching policy sets a 10-minute TTL for all HTTP requests; this adds redundant downloads to repeat visits. A service worker can be leveraged to evict cached resources only when a web page has been modified.

A service worker runs in a separate thread from the page’s main thread. It stays active when the device is not connected to the Internet, so it can serve cached pages even while offline!

webdoc’s caching policy

webdoc’s service worker uses two separate caches:

  1. The “main cache” holds the core assets of the website – react, react-dom, mermaid.js, material-ui icons. These assets have versioned URLs, so they never need to be evicted from the cache. The main cache is simple and has no eviction policy.
  2. The “ephemeral cache” holds all assets that might be modified when documentation is regenerated: the generated HTML and webdoc’s own CSS / JS. To facilitate cache eviction, webdoc generates an MD5 hash of its documentation manifest and inserts it into the website it generates.
    1. This hash is refetched on the main thread and stored in web storage on each page load.
    2. The service worker tags cached resources with the associated hash when they are downloaded. The tagging is done by appending an “x-manifest-hash” header to responses (sketched just after this list).
    3. A hash mismatch between web storage and a cached response triggers a refetch from the web server, and the cache is updated with the new response.
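
The tagging could be done roughly as follows. This is an illustrative sketch, not webdoc’s exact code (its fetchVersioned helper appears later in the fetch handler), and fetchTagged is a hypothetical name:

// Fetch a resource and tag the response with the current manifest hash so a
// later cache hit can be validated against the hash stored in web storage
async function fetchTagged(request, manifestHash) {
  const response = await fetch(request);

  const headers = new Headers(response.headers);
  headers.set("x-manifest-hash", manifestHash);

  return new Response(await response.blob(), {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}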

Let’s dive into the code

Registration

const registration = await navigator.serviceWorker.register(getResourceURI("service-worker.js"));

The first step is to register the service worker so that the browser downloads and runs it. getResourceURI is a helper to locate a static resource in a webdoc site.

Before the main thread can communicate with the service worker, the browser must activate it, so the second step is to wait for the registration to activate.

const waitOn = navigator.serviceWorker.controller ?
    Promise.resolve() :
    new Promise((resolve, reject) => {
      const worker = registration.installing ?? registration.waiting ?? registration.active;
      if (!worker) return reject(new Error("No worker found"));
      else {
        worker.onstatechange = (e) => {
          if (e.target.state === "active") {
            resolve();
          }
        };
      }
    });

// This hangs on the first page load because the browser doesn't
// activate the service worker until the second visit.
await waitOn;

navigator.serviceWorker.controller is what lets the main thread control and communicate with service workers. Its value is null until the service worker activates, which is signaled by the “statechange” event on the worker object.

Note that the service worker won’t activate on the first page load; the browser activates it on the second page load. That’s why it’s important to wait for the controller to become non-null.
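Once the controller is non-null, the main thread can message the service worker directly. webdoc wraps this in its own postMessage helper, but the underlying call is just the following (payload shown for illustration, matching the lifecycle:init message used below):

navigator.serviceWorker.controller.postMessage({
  type: "lifecycle:init",
  app: APP_NAME,
  manifest: APP_MANIFEST,
});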

Hash verification

Once the service worker is registered, the website hash must be downloaded and compared to what is in web storage. However, the local hash will be null the first time the service worker is registered; this means a hash mismatch will occur (which is desired).

    // https://github.com/webdoc-labs/webdoc/blob/a52570c22fc3161e1f19f4997eb96081d1ea9d34/core/webdoc-default-template/src/protocol/index.js#L47
    if (!APP_MANIFEST) {
      throw new Error("The documentation manifest was not exported to the website");
    }

    // Use webdoc's IndexedDB wrapper to pull out the manifest URL & hash
    const {manifest, manifestHash, version} = await this.db.settings.get(APP_NAME);
    // Download the latest hash from the server
    const {hash: verifiedHash, offline} = await webdocService.verifyManifestHash(manifestHash);

    // If the manifest URL or hash don't match, then we need to update IndexedDB
    // and send a message to the service worker!
    if (manifest !== APP_MANIFEST ||
          manifestHash !== verifiedHash ||
          version !== VERSION) {
      console.info("Manifest change detected, reindexing");
      await this.db.settings.update(APP_NAME, {
        manifest: APP_MANIFEST,
        manifestHash: verifiedHash,
        origin: window.location.origin,
        version: VERSION,
      });
      if (typeof APP_MANIFEST === "string") {
        this.postMessage({
          type: "lifecycle:init",
          app: APP_NAME,
          manifest: APP_MANIFEST,
        });
      }
    }

lifecycle:init

The service worker receives the lifecycle:init message on a hash mismatch and uses it to download the manifest data and recache the website if offline storage is enabled.

  case "lifecycle:init": {
    // Parse the message
    const {app, manifest} = (message: SwInitMessage);

    try {
      // Open the database & fetch the manifest concurrently
      const [db, response] = await Promise.all([
        webdocDB.open(),
        fetch(new URL(manifest, new URL(registration.scope).origin)),
      ]);
      const data = await response.json();

      // Dump all the hyperlinks in the manifest into IndexedDB. This is used by
      // "cachePagesOffline" to locate all the pages in the website that need
      // downloaded for offline use.
      await db.hyperlinks.put(app, data.registry);

      // Caches the entire website if the user has enabled offline storage
      const settings = await db.settings.get(app);
      if (settings.offlineStorage) await cachePagesOffline(app);
    } catch (e) {
      console.error("fetch manifest", e);
    }

    break;
  }

fetch

Now let’s walk through how webdoc caches resources on the website. The caching policy is what makes the website work when a user is offline, and it also makes the pages load instantly otherwise. The fetch event is intercepted by the service worker and a response is returned from the cache if available.

// https://github.com/webdoc-labs/webdoc/blob/a52570c22fc3161e1f19f4997eb96081d1ea9d34/core/webdoc-default-template/src/service-worker/index.js#L21
// Registers the "fetch" event handler
self.addEventListener("fetch", function(e: FetchEvent) {
  // Skip 3rd party resources like analytics scripts. This is because
  // the service worker can only fetch resources from its own origin
  if (new URL(e.request.url).origin !== new URL(registration.scope).origin) {
    return;
  }

The respondWith method on the event is used to provide a custom response for 1st party fetches. The caches global exposes the cache storage API used here.

  // https://github.com/webdoc-labs/webdoc/blob/a52570c22fc3161e1f19f4997eb96081d1ea9d34/core/webdoc-default-template/src/service-worker/index.js#L26
  e.respondWith(
    // Open the main & ephemeral cache together
    Promise.all([
      caches.open(MAIN_CACHE_KEY),
      caches.open(EPHEMERAL_CACHE_KEY),
    ]).then(async ([mainCache, ephemeralCache]) => {
      // Check main cache for a hit first - since we know hash verification
      // isn't required for versioned assets in that cache
      const mainCacheHit = await mainCache.match(e.request);
      if (mainCacheHit) return mainCacheHit;

      // Check the ephemeral cache for the resource and also pull out the hash
      // from IndexedDB
      const ephemeralCacheHit = await ephemeralCache.match(e.request);
      const origin = new URL(e.request.url).origin;
      const db = await webdocDB.open();
      const settings = await db.settings.findByOrigin(origin);

      if (settings && ephemeralCacheHit) {
        // Get the tagged hash on the cached response. Remember responses are
        // tagged using the x-manifest-hash header
        const manifestHash = ephemeralCacheHit.headers.get("x-manifest-hash");

        // If the hash matches, great!
        if (settings.manifestHash === manifestHash) return ephemeralCacheHit;
        // Otherwise continue and fetch the resource again    
        else {
          console.info("Invalidating ", e.request.url, " due to bad X-Manifest-Hash",
            `${manifestHash} vs ${settings.manifestHash}`);
        }
      }

If the main & ephemeral cache don’t get hit, then the resource is fetched from the web server by the service worker. A fetchVersioned helper is used to add the “x-manifest-hash” header to the returned response. The response is put into the appropriate cache so a future page load doesn’t cause a download.

// https://github.com/webdoc-labs/webdoc/blob/a52570c22fc3161e1f19f4997eb96081d1ea9d34/core/webdoc-default-template/src/service-worker/index.js#L49
      try {
        // Fetch from the server and add "x-manifest-hash" header to response
        const response = await fetchVersioned(e.request);

        // Check if the main cache can cache this response
        if (VERSIONED_APP_SHELL_FILES.some((file) => e.request.url.endsWith(file))) {
          await mainCache.put(e.request, response.clone());
        // Check if the ephemeral cache can hold the response (all HTML pages are included)
        } else if (
          settings && (
            EPHEMERAL_APP_SHELL_FILES.some((file) => e.request.url.endsWith(file)) ||
          e.request.url.endsWith(".html"))) {
          await ephemeralCache.put(e.request, response.clone());
        }

        return response;
      } catch (e) {
        // Finish with cached response if network offline, even if we know it's stale
        console.error(e);
        if (ephemeralCacheHit) return ephemeralCacheHit;
        else throw e;
      }
    }),
  );
});

Note that at the bottom, a catch block is used to return a cached response even if we know the hash didn’t match. This occurs when the resource is stale but the user isn’t connected to the Internet so downloading the latest resource from the web server isn’t possible.


webdoc is the only documentation generator with offline storage that I know of. It supports JavaScript and TypeScript codebases. Give it a try and let me know what you think!


How I wrote Node.js bindings for GNU source-highlight

GNU source-highlight is a C++ library for highlighting source code in several different languages and output formats. It reads from “language definition” files to lexically analyze the source code, which means you can add support for new languages without rebuilding the library. Similarly, “output format” specification files are used to generate the final document with the highlighted code.

I found GNU source-highlight while looking for a native highlighting library to use in webdoc. I added sources publishing support to webdoc 1.5.0, but Highlight.js increased webdoc’s single-threaded execution time by more than 6x; even after running Highlight.js with 18 worker threads, the final result was still twice the original execution time. So I decided to finally learn how to write Node.js bindings for native C++ libraries!

The Node.js bindings are available on npm – source-highlight. Here, I want to share the process of creating Node.js bindings for native libraries.

Node Addon API – Creating the C++ addon

node-addon-api is the official module for writing native addons for Node.js in C++. The entry point for an addon is defined by registering an initialization function using NODE_API_MODULE:

#include <napi.h>

Napi::Object Init (Napi::Env env, Napi::Object exports) {
    // TODO: Initialize module and expose APIs to JavaScript

    return exports;
}

NODE_API_MODULE(my_module_name, Init);

Here, the Init function accepts two parameters:

  • env – This represents the context of the Node.js runtime you are working with.
  • exports – This is an opaque handle to module.exports. You can set properties on this object to expose APIs to JavaScript code.

To expose a hello-world “function”, you can set a property on the exports object to a Napi::Function.

// hello_world.cc

// Disable C++ exceptions since we are not using them. Otherwise, we'd need to
// configure node-gyp to enable them when compiling.
#define NAPI_DISABLE_CPP_EXCEPTIONS

#include <iostream>
#include <napi.h>

// Our C++ hello-world function. It takes a single JavaScript
// string and outputs "Hello <string>" to the standard output.
//
// It returns true on success, false otherwise.
Napi::Value helloWorld(const Napi::CallbackInfo& info) {
    // Extract our context early so we can use it to create primitive values.
    Napi::Env env = info.Env();

    // Return false if no arguments were passed.
    if (info.Length() < 1) {
        std::cout << "helloWorld expected 1 argument!";
        return Napi::Boolean::New(env, false);
    }

    // Return false if the first argument is not a string.
    if (!info[0].IsString()) {
        std::cout << "helloWorld expected string argument!";
        return Napi::Boolean::New(env, false);
    }

    // Convert the first argument into a Napi::String,
    // then cast it into a std::string.
    std::string msg = (std::string) info[0].As<Napi::String>();

    // Output our "hello world" message to the standard output.
    std::cout << "Hello " << msg;

    // Return true for success!
    return Napi::Boolean::New(env, true);
}

Napi::Object Init (Napi::Env env, Napi::Object exports) {
    // Wrap helloWorld in a Napi::Function
    Napi::Function helloWorldFn = Napi::Function::New<helloWorld>(env);

    // Set exports.helloWorld to our hello world function
    exports.Set(Napi::String::New(env, "helloWorld"), helloWorldFn);

    return exports;
}

NODE_API_MODULE(hello_world_module, Init);

helloWorld uses the Napi::Boolean wrapper to create boolean values. The wrappers for all JavaScript values are listed here. All of these wrappers extend the abstract type Napi::Value.

Instead of declaring a Napi::String parameter and returning a Napi::Boolean directly, helloWorld accepts a CallbackInfo reference and returns a Napi::Value. This generic signature is required to wrap it in a Napi::Function. The CallbackInfo contains the arguments passed by the caller in JavaScript code. To prevent the native code from throwing an exception, the function validates its arguments.

After creating the binary module for this addon, it should be useable from JavaScript:

// hello_world.js

const { helloWorld } = require('./build/Release/hello_world');

helloWorld('world, you did it!');

node-gyp – Building the binary module from C++ code

node-gyp is a cross-platform tool for compiling native Node.js addons. It uses a fork of the gyp meta-build tool – “a build system that generates other build systems”. More specifically, node-gyp will configure the build toolchain specific to your platform to compile native code. Instead of creating a Makefile for Linux, Xcode project for macOS, and a Visual Studio project for Windows – you need to create a single binding.gyp file. node-gyp will handle the rest for you; this is particularly useful when you want the native code to compile on the user’s machine and not serve prebuilt binaries.

To use node-gyp and node-addon-api, you’ll need to create an npm package (run npm init). Then install node-addon-api locally and node-gyp globally,

npm install --save node-addon-api
npm install -g node-gyp

Now, to build the example hello-world addon, we’ll need a very simple binding.gyp configuration:

{
    "targets": [
        {
            "target_name": "hello_world",
            "sources": ["hello_world.cc"],
            "include_dirs": [
                "<!@(node -p \"require('node-addon-api').include_dir\")"
            ]
        }
    ]
}

This configuration defines one build-target: our “hello_world” addon. We have one source file “hello_world.cc”, and we want to include the header files provided by node-addon-api. Here, the <!@(...) directive tells node-gyp to evaluate the code ... and use the resulting string. node-addon-api exports the include_dir variable, which is the path to the directory containing its header files.

We can finally run node-gyp,

# This will create the Makefile/Xcode/MSVC project
node-gyp configure

# This will invoke the platform-specific project's build toolchain and build the addon
node-gyp build

You can also run node-gyp rebuild instead of running two commands. node-gyp should now have created the binary module for the hello_world addon at ./build/Release/hello_world.node. You can require or import it like any other Node.js module! Run the hello_world.js file to test it!

I created this Replit so you can run the hello_world addon right in your browser!

Linking your C++ addon with another native library

Now that we’ve created a simple addon, we want to write bindings for another library. There are two ways to do this:

  • Include the sources of the native library in your repository (using a git submodule) and include that in your node-gyp sources. node-libpng does this. It has two additional gyp configurations in the deps/ folder for compiling libpng and zlib. Since compiling libraries can slow down an npm install, node-libpng prebuilds the binaries and its install script downloads them. sharp also does this and falls back to locally compiling libvips if there isn’t a prebuilt binary for the client platform.
  • Statically link to a library preinstalled on the client machine. The downside of this approach is that your user must install the native library before using your bindings. This might be necessary if the native library uses a sophisticated build system that’s hard to replicate using node-gyp. I did this for node-source-highlight because source-highlight depends on the Boost library.

To statically link to a preinstalled copy of the library on the client machine, you can add the following snippet to your addon target in binding.gyp:

"link_settings": {
    "libraries": [
        "-l<name>"
    ]
}

where <name> is the “name” in “libname” of the library you are linking. For example, you would use “-lsource-highlight” to link to “libsource-highlight”. Now, assuming you’ve correctly installed the native library on your machine, you can use its headers in your C++ code.

You can wrap the underlying APIs of a native library and expose them to JavaScript. In node-source-highlight, the SourceHighlight class wraps an instance of srchilite::SourceHighlight.

// SourceHighlight.h
#include <napi.h>
#include <srchilite/sourcehighlight.h>

class SourceHighlight : public Napi::ObjectWrap<SourceHighlight> {
  public:
      static Napi::Object Init(Napi::Env env, Napi::Object exports);
      SourceHighlight(const Napi::CallbackInfo& callbackInfo);
      Napi::Value initialize(const Napi::CallbackInfo& callbackInfo);
  private:
      srchilite::SourceHighlight instance;
};

// SourceHighlight.cc
#include <napi.h>
#include "SourceHighlight.h"

SourceHighlight::SourceHighlight(const Napi::CallbackInfo& info)
    : Napi::ObjectWrap<SourceHighlight>(info) {}

Napi::Value SourceHighlight::initialize(const Napi::CallbackInfo& info) {
    this->instance.initialize();
    return info.Env().Undefined();
}

Napi::Object SourceHighlight::Init(Napi::Env env, Napi::Object exports) {
    Napi::Function func = DefineClass(env, "SourceHighlight", {
        InstanceMethod("initialize", &SourceHighlight::initialize),
    });

    Napi::FunctionReference* constructor = new Napi::FunctionReference();
    *constructor = Napi::Persistent(func);

    env.SetInstanceData(constructor);

    exports.Set("SourceHighlight", func);

    return exports;
}

// Initialize native add-on
Napi::Object Init (Napi::Env env, Napi::Object exports) {
    SourceHighlight::Init(env, exports);
    return exports;
}

NODE_API_MODULE(sourcehighlight, Init);

In this snippet, SourceHighlight::Init does the heavy lifting of creating a class constructor function and attaching it to the exports. The SourceHighlight class holds the underlying srchilite::SourceHighlight instance and each method invokes the corresponding method on that instance after validating the arguments passed.
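From JavaScript, the wrapped class can then be used like any other addon. A minimal sketch, assuming a build target named sourcehighlight (the published package wraps more of the underlying API than the single initialize method shown above):

const { SourceHighlight } = require('./build/Release/sourcehighlight');

const highlighter = new SourceHighlight();
highlighter.initialize();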

The full sources are available here.