microsandbox v0.4: Deleting the FUSE Filesystem Yields a 47× Speedup

The FUSE tax was killing performance

A user on Discord complained that microsandbox felt slow. Listing every file in the Python standard library took about 500 ms inside the sandbox; in Docker it took a few milliseconds. The culprit: every file operation inside the VM had to bounce out to the host through FUSE, Linux's mechanism for letting userspace programs act as filesystems. Opening a file required the VM to hand the request to the host process, which walked every OCI layer looking for the file and sent the answer back. A single Python import triggered dozens of these round trips before any code started running, and a ten-layer image multiplied the cost of each one.
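To make the per-lookup cost concrete, here is a minimal sketch of a top-down search across layers. It is an illustrative model only, not microsandbox's code: the `Layer` type and `lookup_across_layers` function are hypothetical, and whiteouts, directories, and the FUSE round trip itself are left out.

```python
from typing import Optional

# Hypothetical model: each layer maps path -> file contents,
# ordered from the topmost layer down to the base layer.
Layer = dict[str, bytes]


def lookup_across_layers(layers: list[Layer], path: str) -> tuple[Optional[bytes], int]:
    """Search the layers top-down; return (contents, layers_probed)."""
    probed = 0
    for layer in layers:
        probed += 1
        if path in layer:
            return layer[path], probed
    return None, probed


# Ten layers, with the file present only in the base layer.
layers: list[Layer] = [{} for _ in range(9)] + [{"/usr/lib/python3/os.py": b"# os"}]
contents, probed = lookup_across_layers(layers, "/usr/lib/python3/os.py")
print(probed)  # 10: every layer is probed before the base layer answers
```

In this model a miss is the worst case too: a path that exists nowhere still probes all ten layers, which is the pattern a Python import produces while it searches for candidate files.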

v0.3's attempt: caching didn't close the gap

The team spent the next stretch of v0.3 trying to make that path faster: better caching, fewer syscalls, smaller responses. Each change shaved a few percent. None changed the order of magnitude. Docker doesn't have this problem because Docker uses the kernel's own layered-filesystem driver (overlayfs), so file operations never leave the kernel. microsandbox was trying to match a kernel filesystem from outside the kernel; no cache could close that gap.

Deleting the filesystem

So they deleted it. The new plan: build a Linux filesystem image ahead of time, hand it to the VM as a virtual disk, and let the VM's own kernel mount it. With FUSE out of the loop, file operations inside the VM stay inside the VM. The filesystem they picked is EROFS: read-only, part of the mainline Linux kernel (where it has long been used on Android devices), and simple to generate. EROFS also solved a macOS problem: the VM's own kernel is Linux regardless of what is running outside it, so once the disk image is built, the host's filesystem stops mattering.

Writing filesystem authors in Rust

microsandbox runs on both Linux and macOS, and macOS lacks the host-side tools you'd normally use to build a filesystem image: no mkfs.ext4, no mkfs.erofs, no loopback mounts. So they wrote the image writers themselves in Rust. Three small pieces do the work:

An EROFS writer that emits the read-only image of an OCI layer.
An ext4 writer that emits the sparse, journaled scratch area each sandbox gets.
A VMDK descriptor that stitches everything into one virtual disk.

Nothing in the pipeline shells out, asks for root, or mounts a loopback device, and the same Rust code path builds the images on Linux and Apple Silicon. The EROFS artifacts round-trip through a reader they also wrote, and CI boots the full stack under the real VM kernel.
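As a rough illustration of the third piece, a VMDK descriptor is a small text file that lists the backing files (extents) that make up one virtual disk. The sketch below generates a flat-extent descriptor in Python. The file names are hypothetical, the `createType` value is an assumption, and a complete descriptor also carries a disk database section that is omitted here. The descriptor microsandbox actually writes may differ.

```python
SECTOR_SIZE = 512  # VMDK extent sizes are expressed in 512-byte sectors


def vmdk_descriptor(extents: list[tuple[str, int]]) -> str:
    """Build a flat-extent descriptor from (filename, size_in_bytes) pairs."""
    lines = [
        "# Disk DescriptorFile",
        "version=1",
        "CID=fffffffe",
        "parentCID=ffffffff",
        'createType="twoGbMaxExtentFlat"',  # assumption: a multi-extent flat type
        "",
        "# Extent description",
    ]
    for filename, size_bytes in extents:
        if size_bytes % SECTOR_SIZE != 0:
            raise ValueError(f"{filename}: size must be a multiple of {SECTOR_SIZE}")
        sectors = size_bytes // SECTOR_SIZE
        # Access mode, sector count, extent type, backing file, offset in that file.
        lines.append(f'RDONLY {sectors} FLAT "{filename}" 0')
    return "\n".join(lines) + "\n"


# Hypothetical file names: a metadata image followed by two layer blobs.
print(vmdk_descriptor([("meta.erofs", 4096), ("layer0.erofs", 8192), ("layer1.erofs", 1024)]))
```

The extents are concatenated in order, so the guest sees one contiguous block device even though the bytes live in separate host files.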

Merged metadata: one disk to rule all layers

The first cut used one EROFS image per OCI layer, but Python images run around ten layers and CUDA images more, pushing past the microVM's virtio device cap. The EROFS folks pointed them at a feature: EROFS can build a metadata-only image, just the merged directory tree plus a pointer per file saying which underlying blob holds its bytes and at what offset. The kernel reads that image, treats the whole bundle as one virtual disk, and answers every lookup with a single calculation instead of a search across layers.
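The idea of a merged tree with a pointer per file can be sketched as a lookup table built once, ahead of time. This is a conceptual model only: `Extent` and `build_merged_index` are hypothetical names, and real EROFS stores this as on-disk inodes and chunk mappings rather than a Python dict. Whiteouts and directories are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    blob: str    # which layer blob holds the file's bytes
    offset: int  # byte offset of the file inside that blob
    length: int  # file size in bytes


def build_merged_index(
    layers: list[tuple[str, dict[str, tuple[int, int]]]],
) -> dict[str, Extent]:
    """Merge layers (base first) into one path -> Extent map; later layers win."""
    merged: dict[str, Extent] = {}
    for blob_name, files in layers:
        for path, (offset, length) in files.items():
            merged[path] = Extent(blob_name, offset, length)
    return merged


layers = [
    ("layer0.blob", {"/bin/sh": (0, 4096), "/etc/hosts": (4096, 128)}),
    ("layer1.blob", {"/etc/hosts": (0, 256)}),  # overrides the base copy
]
index = build_merged_index(layers)
print(index["/etc/hosts"])  # Extent(blob='layer1.blob', offset=0, length=256)
```

The merge cost is paid once when the image is built. At run time each lookup is a single index access, independent of how many layers the image started with.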

The pipeline becomes:

Pull the OCI layers as usual.
Build one small metadata image describing the merged tree.
Hand the VM one virtual disk that stitches the metadata and the layer blobs together.

The VM now only attaches two rootfs block devices, no matter how many layers the original image had: one read-only VMDK-backed stack for the image, and one writable ext4 upper for the sandbox. Overlayfs only ever combines those two. This is the version shipped in v0.4, with a small libkrunfw kernel config tweak (CONFIG_EROFS_FS_XATTR + CONFIG_EROFS_FS_SECURITY) so EROFS exposes the xattrs overlayfs needs for whiteouts.

47× faster, 5,300 lines shorter

Across fourteen mixed guest-visible filesystem workloads, the geometric mean speedup is 47.18×. Eight of the results:

file_delete_1k: 1109.94×
rename_1k: 876.58×
small_file_create_1k: 240.78×
metadata_scan_stdlib: 240.28× (was 500ms, now ~2ms)
read_all_py_stdlib: 116.40×
deep_tree_traverse: 47.16×
concurrent_read_4t: 20.93×
random_read_stdlib: 4.01×
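For readers who want to check how a geometric mean of speedups is computed, here is a short sketch. Only eight of the fourteen per-workload figures appear above, so running it on those eight yields a different number than 47.18×; the other six values are not given in this article.

```python
import math

# The eight speedups listed above. The remaining six workloads are not listed,
# so this does not reproduce the 47.18× figure for all fourteen.
speedups = [1109.94, 876.58, 240.78, 240.28, 116.40, 47.16, 20.93, 4.01]


def geometric_mean(values: list[float]) -> float:
    """n-th root of the product, computed in log space to avoid overflow."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(f"{geometric_mean(speedups):.2f}x")
```

A geometric mean suits ratios like these because a single 1000× outlier cannot dominate the summary the way it would in an arithmetic mean.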

The host filesystem code is about 5,300 lines shorter. Linux's overlayfs has a large set of semantics; v0.3 reimplemented most of them in user space and was still chasing edge cases. v0.4 doesn't reimplement any of it: the VM's own kernel does the merging, and that class of bugs is gone.

Side benefits

macOS case-sensitivity: APFS is case-insensitive by default. Linux images with files differing only by case used to collapse. Now the EROFS writer streams the tar straight into a binary image where both names live as distinct entries.
OCI patches: Rootfs patches get baked into upper.ext4 before boot instead of through a runtime overlay protocol.
Shared lower layers: Per-layer EROFS artifacts are content-addressed by diff ID, so two sandboxes sharing a base image share those bytes.
Snapshots: A sandbox's writable state is a single ext4 file; preserving or copying it is a file copy.
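The shared-lower-layers point relies on content addressing. In the OCI image spec, a layer's diff ID is the SHA-256 digest of its uncompressed tar, so the same layer has the same ID however it was compressed. The sketch below computes one and maps it to a cache path. The cache layout and `artifact_path` are hypothetical, not microsandbox's actual on-disk scheme, and only gzip-compressed layers are handled.

```python
import gzip
import hashlib
import io
from pathlib import Path


def diff_id(compressed_layer: bytes) -> str:
    """OCI diff ID: SHA-256 over the uncompressed layer tar."""
    digest = hashlib.sha256()
    with gzip.GzipFile(fileobj=io.BytesIO(compressed_layer)) as tar_stream:
        # Hash in 1 MiB chunks so large layers are never fully held in memory.
        for chunk in iter(lambda: tar_stream.read(1 << 20), b""):
            digest.update(chunk)
    return f"sha256:{digest.hexdigest()}"


def artifact_path(cache_dir: Path, layer_diff_id: str) -> Path:
    """Hypothetical cache layout: one EROFS artifact per diff ID."""
    return cache_dir / (layer_diff_id.replace(":", "-") + ".erofs")


payload = b"pretend this is a layer tar"
fast = gzip.compress(payload, compresslevel=1)
small = gzip.compress(payload, compresslevel=9)
print(diff_id(fast) == diff_id(small))  # True: same content, same ID
```

Because the key depends only on the uncompressed content, two images that share a base layer resolve to the same artifact path and the image is built once.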

What didn't improve

First pulls aren't faster: more work happens at pull time to build the images, though that work runs in parallel across layers and is bounded by tar decompression. Bind volumes (host directories shared into the VM) still go through the old FUSE path, since their contents can change at any time.

Lessons

The boring primitive in the kernel often beats the clever one in user space. Both monofs and the v0.3 overlay were ambitious designs, but EROFS is a boring, in-kernel file format. They spent months tuning user-space code before accepting that the structural answer was to stop competing with the kernel and use it.

NIH (not invented here) is fine when the existing thing breaks your design. Shelling out to mkfs.ext4 or mkfs.erofs would have meant either a helper VM or a Linux-only split, both of which would have undone microsandbox's "single self-contained binary" promise. Writing the writers themselves was the cost of keeping that promise.

Stay open to better ideas while shipping. The first cut was already a big win, but holding the PR open another week to absorb EROFS's merged metadata turned a one-off optimization into something they are happy to support long term.

Run benchmarks inside the VM. Timing from the host would have hidden the worst of the FUSE round-trip costs and made the win look smaller than it was. Time the thing your user actually waits on.
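A guest-side measurement along these lines can be very small. The sketch below times a metadata scan of the Python standard library from inside whatever environment runs it. It is an assumed stand-in for a workload like metadata_scan_stdlib, not the project's actual benchmark harness, and the function names are hypothetical.

```python
import os
import sysconfig
import time


def metadata_scan(root: str) -> int:
    """Walk a tree and stat every file, returning how many files were seen."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            os.lstat(os.path.join(dirpath, name))
            count += 1
    return count


def time_scan(root: str) -> tuple[int, float]:
    """Return (file_count, elapsed_seconds) for one scan of root."""
    start = time.perf_counter()
    count = metadata_scan(root)
    return count, time.perf_counter() - start


if __name__ == "__main__":
    stdlib = sysconfig.get_paths()["stdlib"]
    files, seconds = time_scan(stdlib)
    print(f"{files} files in {seconds * 1000:.1f} ms")
```

Running the same script inside the sandbox and inside a Docker container gives a like-for-like comparison of what the user's process actually waits on. The first run after boot and later, cache-warm runs are worth recording separately.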

Try it

This ships in microsandbox 0.4 and later. Install the CLI:

curl -sSL https://install.microsandbox.dev | sh

Or use the SDK for your language:

uv add microsandbox       # python
npm install microsandbox  # typescript
cargo add microsandbox    # rust
