AI agents do not only need smarter models. They need better places to run.
Every serious agent workflow eventually hits the same operational problem: the model wants to try things in parallel, execute code, inspect files, run tests, call tools, and sometimes roll back after a bad turn. Containers are convenient, but they are not always the isolation boundary you want for untrusted code. Full virtual machines are stronger, but cold-starting a VM for every branch of thought is too slow and too expensive.
forkd is interesting because it attacks that gap directly. It is an Apache-2.0 microVM sandbox runtime built on Firecracker, designed for AI agent fan-out. A parent microVM boots once, warms the runtime, imports dependencies, loads state, and is paused. Children then fork from that warm snapshot using copy-on-write memory instead of each booting a fresh kernel.
That makes forkd less like another container wrapper and more like a VM-level version of fork(2) for agent infrastructure.
What forkd Is
forkd is a sandbox runtime for workloads where a single warmed environment needs to produce many isolated children quickly.
The repository describes a parent VM that boots once, imports the runtime, and is paused to disk. Each child is launched as a separate Firecracker process that maps the parent’s memory image privately. Linux copy-on-write keeps unchanged memory shared while each child diverges independently.
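The private-mapping behavior can be seen in miniature with an ordinary file. The sketch below is not forkd code; it uses Python's `mmap` with `ACCESS_COPY` (a private, copy-on-write mapping) on a small temporary file that stands in for a parent memory image. The file, the helper name `open_private_view`, and the two "children" are all illustrative assumptions.

```python
import mmap
import os
import tempfile

# Stand-in for a parent memory image: a small file filled with a known byte.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"P" * (4 * mmap.PAGESIZE))
    image_path = f.name


def open_private_view(path):
    """Map the image copy-on-write: writes stay in this mapping only."""
    with open(path, "rb") as image:
        # ACCESS_COPY gives a private mapping; the file itself is never modified.
        return mmap.mmap(image.fileno(), 0, access=mmap.ACCESS_COPY)


# Two hypothetical "children" mapping the same image.
child_a = open_private_view(image_path)
child_b = open_private_view(image_path)

# Child A diverges; only the page it touches gets a private copy.
child_a[0:5] = b"AAAAA"

with open(image_path, "rb") as image:
    on_disk = image.read(5)

results = {
    "child_a": bytes(child_a[0:5]),  # sees its own write
    "child_b": bytes(child_b[0:5]),  # still sees the original image
    "image": on_disk,                # the backing file is unchanged
}
print(results)

child_a.close()
child_b.close()
os.remove(image_path)
```

Child A sees its own write, while child B and the file on disk still hold the original bytes. The same private-mapping idea, applied to a whole guest memory image, is what lets children share unchanged pages.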
For AI systems, that maps cleanly to a familiar pattern:
- Warm a Python, Node, browser, database, or coding-agent environment.
- Fork many children from the same prepared state.
- Run different tool calls, tests, prompts, or patches in each child.
- Keep each branch isolated.
- Throw away losing branches without rebuilding the environment.
This is why the project is framed around code interpreters, SWE-bench-style evaluations, per-user code execution, untrusted CI, and agent framework integrations.
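The warm-then-fork pattern listed above can be approximated at the process level with plain `fork(2)`. This is only an analogy under stated assumptions: it uses Python's `os.fork` on a Unix host, not forkd's CLI or SDK, and it provides process isolation rather than a microVM boundary. The function names and the toy "warm state" are hypothetical.

```python
import os


def warm_parent():
    """Pay the expensive setup once (here, building a lookup table)."""
    return {n: n * n for n in range(100_000)}


def run_child(state, task_id):
    """Each child mutates its own copy-on-write view of the state."""
    state[task_id] = -1
    return 0 if state[task_id] == -1 else 1


def fan_out(state, n_children):
    """Fork n children from the warmed parent and collect their exit codes."""
    pids = []
    for task_id in range(n_children):
        pid = os.fork()
        if pid == 0:
            # Child: do the work, then exit immediately without parent cleanup.
            os._exit(run_child(state, task_id))
        pids.append(pid)

    codes = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        codes.append(os.WEXITSTATUS(status))
    return codes


state = warm_parent()
exit_codes = fan_out(state, 4)

# The children's writes never reach the parent's memory.
parent_untouched = all(state[i] == i * i for i in range(4))
print(exit_codes, parent_untouched)
```

Each child starts with the table already built, changes it freely, and disappears without affecting the parent or its siblings. forkd's pitch is the same shape with a Firecracker microVM in place of a process.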
The Core Idea: Fork From Warm
Cold start is the enemy of agent fan-out.
If an agent wants to evaluate ten possible fixes, launch ten code-interpreter sessions, or test a plan against ten repository states, a traditional stack pays startup cost repeatedly. That startup cost may include booting the runtime, installing packages, importing large libraries, warming a browser, loading model weights, or preparing a database fixture.
forkd moves that cost into the parent. Once the parent is warm, children inherit the already-loaded state. The project reports a benchmark where 100 sandboxes running a warmed numpy workload spawn in 101 ms, with very small additional memory per child because unchanged pages are shared.
The important part is not the exact number on someone else’s hardware. The important part is the primitive: a real VM isolation boundary with a spawn cost closer to that of a process fork than to a VM cold boot.
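A back-of-the-envelope model makes the amortization concrete. The only figure taken from the project is the reported 100 sandboxes in 101 ms, which works out to about 1.01 ms per child when averaged over the batch. The warm-up time, task time, and branch count below are hypothetical numbers chosen for illustration, and the model naively assumes branches run one after another.

```python
def cold_total_ms(n, cold_start_ms, task_ms):
    """Every branch pays the full warm-up before doing its work."""
    return n * (cold_start_ms + task_ms)


def fork_total_ms(n, warm_once_ms, fork_ms, task_ms):
    """The parent warms once; each branch pays only a fork plus its work."""
    return warm_once_ms + n * (fork_ms + task_ms)


# From the reported benchmark: 101 ms for 100 sandboxes, averaged per child.
reported_fork_ms = 101 / 100

# Hypothetical workload numbers, not measurements.
N_BRANCHES = 10
WARM_MS = 2000.0
TASK_MS = 50.0

cold = cold_total_ms(N_BRANCHES, WARM_MS, TASK_MS)
forked = fork_total_ms(N_BRANCHES, WARM_MS, reported_fork_ms, TASK_MS)
print(f"cold: {cold:.0f} ms, forked: {forked:.0f} ms")
```

With these assumed inputs, the cold path costs roughly 20.5 seconds and the forked path roughly 2.5 seconds, and the gap widens as the branch count grows because the warm-up is paid only once.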
BRANCH: Forking Mid-Thought
The most agent-specific feature is BRANCH.
Forking from an initial warm snapshot is useful, but agents often need to split after they have already done work. A coding agent may have cloned a repository, installed dependencies, edited files, and run a failing test. A planning agent may have accumulated reasoning and local artifacts. A browser agent may have navigated into a state that is expensive to reproduce.
BRANCH lets forkd briefly pause a running sandbox, snapshot its in-flight state, and resume it, so that new branches can start from that point. In the repository’s v0.4 live mode notes, forkd reports a 56 ms p50 source-pause window for a 1.5 GiB source on its benchmark workload, with asynchronous background copying available for fire-and-forget branching.
That changes the shape of agent orchestration. Instead of asking the model to serialize every alternative into text and replay state from scratch, the system can branch the actual machine state.
Why This Matters for AI Agents
Text context is not enough for many agent tasks.
An agent’s useful state often lives outside the prompt:
- a checked-out repository;
- generated files;
- installed packages;
- test caches;
- notebook variables;
- browser profile state;
- local databases;
- temporary credentials or tokens;
- a running service listening on a port.
You can summarize that state to a model, but you cannot faithfully place a 50 MiB file tree, a warmed Python process, or a dirty database buffer into a prompt. forkd’s value is that it treats machine state as the thing to branch.
That makes it especially relevant for:
- code interpreters that need fresh but warm sessions;
- coding agents that need to test multiple patches;
- evaluation harnesses that run many isolated attempts;
- browser-driving agents where Chromium startup dominates;
- CI jobs that execute untrusted code;
- database fixture tests where each test needs an isolated copy.
How It Compares to Containers
Containers remain the default choice for many developer workflows because they are easy to build, distribute, and orchestrate. forkd does not make containers obsolete.
The difference is the isolation and snapshot primitive.
With forkd, each child is a Firecracker microVM backed by KVM, with its own guest kernel. That is a stronger boundary than standard container namespaces, which isolate processes that still share the host kernel. The child can still run real Linux workloads, install packages, make outbound network calls when allowed, and execute ordinary tools. The parent-child memory relationship gives forkd its unusual performance profile.
The tradeoff is operational complexity. You need Linux with KVM, cgroup v2, Firecracker support, host setup scripts, and, in some live-fork paths, a vendored Firecracker fork until the required shared-memory backend support lands upstream. This is infrastructure, not a JavaScript library you drop into any serverless function.
Developer Surfaces
forkd exposes several practical entry points:
- CLI commands for building snapshots, forking children, branching sandboxes, packing snapshots, and running benchmarks;
- a controller daemon with a REST API, bearer-token auth, audit logs, and Prometheus metrics;
- a Python SDK with E2B-style ergonomics for sandbox execution;
- a TypeScript SDK for Node.js agent stacks;
- an MCP server via forkd-mcp, so tools like Claude Desktop, Claude Code, Cursor, and Cline can drive sandboxes through MCP.
That surface area matters because sandbox runtimes often fail at the integration layer. Agent teams do not want a beautiful primitive that only works from a bespoke demo script. They need it to fit into LangGraph, AutoGen, CrewAI, Swarm-style handoffs, browser automation, and code-interpreter workflows.
forkd’s recipe directory is useful for that reason. It shows the project is not only a VMM experiment; it is being shaped around the actual ways agents need to fan out and execute.
Where forkd Fits Best
forkd is most compelling when three conditions are true:
- The workload is expensive to warm.
- You need many short-lived isolated children.
- The child state cannot be represented cleanly as prompt text.
That is exactly the profile of modern coding agents and code interpreters. Importing heavy libraries, building repo state, launching browsers, preparing database fixtures, and replaying tool histories are all slow. If many branches can share the same parent state until they diverge, the infrastructure cost changes materially.
It is also useful where stronger isolation is a requirement. Running user code in a standard container may be acceptable for internal tools, but it becomes harder to justify for multi-tenant code execution, third-party plugin execution, or untrusted CI.
Where It Is Still Early
forkd is alpha software. The README is explicit that APIs and on-disk formats may still change before 1.0.
The project also lists production-readiness gaps: multi-node scheduling; stricter default egress controls; CPU, I/O, and process quotas beyond memory limits; and a third-party security audit.
Those are not small details. A sandbox runtime becomes part of the trusted computing base for every agent that uses it. If you plan to expose forkd to untrusted users, the right posture is to treat it as promising infrastructure that still deserves careful threat modeling, layered network policy, host hardening, logging, and upgrade discipline.
My Take
forkd is worth watching because it focuses on a primitive agent systems actually need: fast, isolated branching of real machine state.
A lot of AI infrastructure still assumes the model is the whole system. In practice, useful agents are distributed systems with files, processes, networks, caches, runtimes, and side effects. Once you accept that, “branch the prompt” is not enough. You need to branch the environment too.
That is the sharp idea in forkd. It gives agent builders a way to treat warm execution state as reusable infrastructure while keeping each branch inside a microVM boundary. For local experimentation, it is already interesting. For production agent platforms, the questions are now about hardening, scheduling, policy, and operational maturity.
If forkd continues in this direction, it could become one of the more important open-source building blocks for self-hosted code interpreters and serious AI agent execution.