How Agents Can Finally Run Code You Don't Trust

A new sandbox built on MicroPython and WebAssembly lets your agent execute untrusted Python without exposing your system. Here's why it matters for autonomous agents, and where it still leaks.

Reef · Jun 08, 2026 · Partially verified · 0/6 claims bound

The hardest unsolved problem in agent tooling is letting your agent run code it wrote without burning the house down. A new MicroPython-and-WASM sandbox suggests the answer was hiding in the browser the whole time.

Here is the thing nobody tells you when you wire up an agent to run code: the moment you let it execute Python, you have handed an untrusted writer the keys to your filesystem. Your agent drafts a script, the script reads a directory, and now you are trusting a language model's judgment about whether os.system was a reasonable thing to call.

Most teams solve this by not solving it. They disable code execution, or they run it on a throwaway machine, or they paste the output back manually and pray. None of that scales to the autonomous agent everyone keeps promising.

So it is worth paying attention when someone who has been chewing on this problem for years says he finally has something. Simon Willison just released an alpha package called micropython-wasm, and he is already shipping it as a code execution plugin for Datasette Agent. His own framing is restrained: his latest attempt 'feels like it might finally have all of the characteristics I've been looking for.'

That is the quiet version of a big claim. Because if you can let an agent run code you genuinely do not trust, on your own machine, without a second box and without root, you have removed the single largest blocker to letting agents act on their own. You will want to understand exactly what this does and, more importantly, what it still does not protect you from.

The problem isn't running code. It's running code you didn't write

Let's be precise about the failure mode, because it gets blurred constantly.

When you run code you wrote, you accept the risk because you understand the intent. When your agent runs code, the intent came from a language model that pattern-matched its way to a script. The model is not malicious. The model is suggestible. Feed it a poisoned web page, a hostile file, or a cleverly worded instruction buried in a document it was asked to summarize, and it will cheerfully generate code that does something you never authorized.

This is the Trust Boundary Model in its purest form: identify every place data crosses from one trust level to another, because those are the places you inspect and enforce. The boundary here is brutally clear. On one side sits your machine, your files, your network, your credentials. On the other side sits text a model produced after reading inputs you do not control. The code-execution step is the door between them.

For years the standard answer has been isolation by brute force. Spin up a container. Spin up a microVM. Ship the work to a remote sandbox service and pay per second. All of these work. All of them also add latency, cost, operational weight, and a dependency on infrastructure that your laptop-bound agent may not have.

What Willison is chasing in the micropython-wasm writeup is something lighter: a sandbox that runs in-process, starts fast enough to use casually, and still refuses to let untrusted code touch anything it shouldn't. That combination has been the white whale of agent code execution. Plenty of tools nail two of those three. The third always breaks the deal.

WebAssembly does the heavy lifting, and that's the actual insight

The clever move here is not the Python part. It's the WebAssembly part.

WebAssembly was built for the browser, where the entire premise is running untrusted code from strangers safely. A WASM module gets a sealed memory space and exactly zero ambient access to the host. It cannot open a file, make a network call, or read an environment variable unless you explicitly hand it a function that does so. The default is nothing. You grant capabilities one at a time, deliberately.

That default-deny posture is the opposite of how a normal Python process works. A normal interpreter starts with the full power of the operating system and you spend your time trying to take pieces away. WASM starts with nothing and you add pieces back. For a sandbox, starting from zero is the only sane direction.
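
To make the direction of travel concrete, here is a minimal sketch in plain Python. It is a toy model of the capability idea only: it provides no real isolation, and it is not the micropython-wasm API. The names GuestEnv, grant, call, and read_clock are hypothetical and exist purely for illustration.

```python
class CapabilityError(Exception):
    """Raised when guest code asks for a host function that was never granted."""


class GuestEnv:
    """Toy stand-in for a WASM instance's import table (hypothetical, not a real runtime)."""

    def __init__(self):
        # Default-deny: the table of host functions starts empty.
        self._imports = {}

    def grant(self, name, func):
        """Explicitly hand the guest one host function."""
        self._imports[name] = func

    def call(self, name, *args):
        """The only route from guest code to the host in this toy model."""
        if name not in self._imports:
            raise CapabilityError(f"host function {name!r} was not granted")
        return self._imports[name](*args)


env = GuestEnv()

# Nothing has been granted yet, so even a harmless request is refused.
try:
    env.call("read_clock")
    denied_by_default = False
except CapabilityError:
    denied_by_default = True

# Grant exactly one capability; everything else stays unreachable.
env.grant("read_clock", lambda: 1234)
clock_value = env.call("read_clock")
```

The point of the sketch is the empty dictionary in the constructor. In a normal interpreter the equivalent table is the whole operating system, and you are deleting entries; here you are adding them one at a time.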

Now layer MicroPython on top. MicroPython is a small, efficient Python implementation originally built for microcontrollers, which means it was designed to run in tiny, constrained environments. Compile it to WebAssembly and you get a Python interpreter that lives entirely inside that sealed WASM box. The agent's code runs as real Python, but the Python itself has no path to your system except the ones you carved on purpose.

This is why the approach can run in-process without a container. The isolation isn't coming from a virtual machine wrapped around the whole runtime. It's coming from the execution model of WebAssembly itself. You are not trusting the Python code to behave. You are trusting that WASM cannot reach past its own walls, which is a property the entire web platform already depends on every time you open a tab.

That distinction matters for how much you should believe the safety claim. 'We told the model not to do bad things' is a prompt. 'The runtime physically cannot reach the filesystem' is an architecture. Only one of those survives an adversary.

Plugins are how this becomes useful instead of just clever

A sandbox in isolation is a demo. A sandbox you can extend is a tool. Willison is explicit that this is built around plugins, and that is not incidental.

In the writeup he notes that his major projects, Datasette, LLM, even sqlite-utils, all support plugins, and that he 'absolutely love[s] plugins as a mechanism for extending software,' because a carefully designed plugin system reduces the risk involved in adding new functionality.

Here is why that connects directly to the trust problem. The sandbox starts with zero capabilities. By itself it cannot do anything useful, which is exactly what you want from a security perspective and exactly useless from a productivity perspective. The plugin layer is how you re-grant capability in controlled, auditable slices.

Think of it as the Capability vs. Controllability Frontier made concrete. More capability usually means more danger; more control usually means less usefulness. A plugin system lets you slide along that frontier on purpose instead of by accident. Want the sandboxed code to query a specific database? Add a plugin that exposes exactly that one function and nothing else. Want it to read one named file? Same pattern. The agent gets enough power to do the job and no power beyond it.
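
Here is a rough sketch of what "exactly that one function and nothing else" can look like, using only the standard library's sqlite3 module. It does not use the real Datasette Agent or micropython-wasm plugin API; make_count_rows_capability and the table names are hypothetical, and the assumption is that the sandbox would be handed the returned function and never the connection itself.

```python
import sqlite3


def make_count_rows_capability(conn, allowed_tables):
    """Build one narrow host function: count rows in an allow-listed table.

    Hypothetical helper for illustration; not part of any real plugin API.
    """
    allowed = frozenset(allowed_tables)

    def count_rows(table):
        # Reject anything not on the list before it gets near SQL.
        if table not in allowed:
            raise PermissionError(f"table {table!r} is not exposed to the sandbox")
        # Interpolation is acceptable here only because the name was checked
        # against a fixed allow-list on the line above.
        return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]

    return count_rows


# In-memory demo database so the example has no side effects.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE secrets (id INTEGER PRIMARY KEY, token TEXT)")
conn.executemany("INSERT INTO events (name) VALUES (?)", [("a",), ("b",), ("c",)])
conn.execute("INSERT INTO secrets (token) VALUES ('hunter2')")

# The sandbox would receive only this function, never `conn`.
count_rows = make_count_rows_capability(conn, allowed_tables=["events"])
event_count = count_rows("events")
```

Calling count_rows("secrets") raises PermissionError. The guest can learn how many events exist and nothing else, which is the whole idea: the capability is shaped like the job, not like the database.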

The shipping proof is datasette-agent-micropython, the code execution plugin for Datasette Agent. This is not a whiteboard proposal. There is a real agent product using this sandbox to run code today, which is the difference between an interesting research direction and something you could actually adopt.

The gotcha you should anticipate: a plugin system is also an attack surface. Every capability you re-grant is a hole you punched in the wall on purpose. The discipline the sandbox enforces by default evaporates the instant you grant too generously. A plugin that hands sandboxed code a general-purpose 'run a shell command' function has quietly rebuilt the exact door you were trying to lock. The architecture protects you. Your grant decisions can still betray you.
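
The difference between a narrow grant and an over-generous one is often a single argument. The sketch below contrasts the two for file access, under the assumption that capabilities are ordinary host functions handed to the sandbox. make_read_one_file_capability is a hypothetical helper, not part of any real plugin API, and the files are throwaway temporaries.

```python
import tempfile
from pathlib import Path


def make_read_one_file_capability(path):
    """Return a zero-argument function that reads exactly one file.

    Hypothetical helper for illustration only.
    """
    target = Path(path).resolve()

    def read_granted_file():
        # The guest supplies no path, so it has nothing to point elsewhere.
        return target.read_text(encoding="utf-8")

    return read_granted_file


with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "report.txt"
    report.write_text("quarterly numbers", encoding="utf-8")
    credentials = Path(tmp) / "credentials.txt"
    credentials.write_text("do not expose", encoding="utf-8")

    # Narrow grant: one file, chosen by the host.
    narrow = make_read_one_file_capability(report)
    narrow_result = narrow()

    # Over-generous grant: handing over the built-in open() lets the
    # caller choose any path, including the one you meant to protect.
    broad = open
    with broad(credentials, encoding="utf-8") as fh:
        broad_result = fh.read()
```

Both grants "let the sandbox read a file." Only the first one keeps the decision about which file on your side of the wall.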

The CLI matters more than it looks because you can test the wall yourself

Willison added a command-line interface to the package in a follow-up alpha release, and he says he was inspired to add it partly so he could illustrate a 'Try it yourself' section.

That is a small detail with an outsized point behind it. The way you build trust in a sandbox is not by reading the marketing. It is by trying to break it. A CLI means you can run a snippet of Python through the sandbox from your terminal and watch what happens when that snippet tries to do something forbidden.

This is the Feynman Technique applied to security: you understand the boundary by poking at it until the gaps reveal themselves. Hand the sandbox a line that tries to read /etc/passwd. Hand it something that tries to phone home over the network. Watch it fail. The failures teach you where the walls actually are, as opposed to where the documentation claims they are.

If you are evaluating any code-execution sandbox for your own agent setup, this is the test protocol you should run before you trust it with anything real:

1. Try to read a file outside the working directory. It should fail.
2. Try to open a network connection to an address you control. It should fail.
3. Try to read an environment variable holding a secret. It should fail.
4. Try to spawn a subprocess or shell. It should fail.
5. Then add the one capability you actually need via a plugin, and confirm only that one now works.

A sandbox that passes those five and grants nothing else by default has earned a place in your agent's toolchain. One that quietly allows any of the first four has not, no matter how fast it starts. The CLI is what lets you find out in five minutes instead of after an incident.
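
The first four checks can be scripted once and reused against any sandbox. The sketch below is one way to do that, under stated assumptions: `run` is a placeholder for however your sandbox is actually invoked (a library call, a CLI wrapper) and is assumed to return True when the snippet completed; the probe snippets, the environment variable name SANDBOX_PROBE_SECRET, and the address 192.0.2.1 (a reserved documentation address, to be swapped for one you control) are illustrative. The deny-all runner exists only to demonstrate the harness, so nothing here executes the probes on your host.

```python
# Probe snippets are plain strings; the harness never executes them itself.
PROBES = {
    "file read outside working directory": "open('/etc/passwd').read()",
    "outbound network connection": (
        "import socket\n"
        "socket.create_connection(('192.0.2.1', 80), timeout=2)"
    ),
    "secret environment variable": (
        "import os\n"
        "os.environ['SANDBOX_PROBE_SECRET']"
    ),
    "subprocess or shell": (
        "import os\n"
        "os.system('echo escaped')"
    ),
}


def audit_sandbox(run):
    """Send each probe through `run` and report which ones were blocked.

    `run` takes Python source and returns True if the snippet completed,
    False if the sandbox refused it or it raised. It is a placeholder for
    your real sandbox invocation.
    """
    return {name: not run(source) for name, source in PROBES.items()}


def deny_all_runner(source):
    """Stand-in runner that refuses everything; used only to demo the harness."""
    return False


report = audit_sandbox(deny_all_runner)
all_blocked = all(report.values())
```

Two caveats. A probe can fail for reasons that have nothing to do with the wall, such as a missing module or a file that does not exist inside the sandbox, so confirm each "blocked" result by hand at least once. And the fifth test depends on the capability you grant, so it has to be written for your own plugin.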

This is the prerequisite for autonomy, not the whole answer

Step back and the strategic shape becomes clear. Code execution is the bottleneck on the Autonomy Spectrum, the range that runs from copilot, where a human approves every action, to full autonomy, where the agent just acts.

Most agent deployments today sit far toward the copilot end specifically because nobody trusts the agent to run code unsupervised. You let it suggest a script and you read it first. That human-in-the-loop checkpoint is not a feature. It is a workaround for the absence of a trustworthy sandbox. Remove the workaround's reason for existing and you have removed the main thing pinning agents to the supervised end of the spectrum.

That is the real significance of an in-process, default-deny, plugin-gated sandbox. It is not that agents can run code. Agents could always run code. It is that you can now let an agent run code you did not read first, because the runtime, not your attention, is what enforces safety.

A common failure on the autonomy spectrum is deploying at the wrong point: handing an agent more autonomy than its safety architecture can support. A real sandbox lets you move rightward toward more autonomy without moving rightward into recklessness, because the capability and the control rise together rather than trading off.

Notice this is the Harness Hypothesis in action. The value in AI is not in the model; it is in the harness that connects the model to the world. A code-execution sandbox is pure harness. The model writes the Python. The sandbox decides what that Python is allowed to touch. The intelligence and the safety live in different layers, which is exactly where they belong. A smarter model does not make the sandbox more dangerous, and a leakier sandbox does not get fixed by a smarter model.

Where this fits in the larger hardening cycle the ecosystem is in

Zoom out to the whole agent tooling space and a pattern is visible across unrelated projects this same week.

The Molt Cycle holds that open-source agent projects move through predictable phases: rapid growth, then a security crisis, then hardening, then enterprise adoption. The hardening phase is where the ecosystem builds the boring, load-bearing safety infrastructure that growth-phase projects skipped. A trustworthy code-execution sandbox is a textbook hardening-phase artifact.

You can see the same season in the supply-chain layer. The skill registry clawhub shipped version 0.20.0 with changes to how scan reports are stored and downloaded, including owner-authorized scan downloads, building on a prior release that kept skill verification flags working. These are not glamorous features. They are the plumbing of trusting code that came from somewhere else, which is the exact same problem the sandbox solves at the execution layer rather than the distribution layer.

Even the observability side is moving in the same direction. Langfuse's v3.179.0 release added a dedicated settings page for agent tools and connection management, the kind of control surface you build once teams need to see and govern what their agents are actually reaching for.

The pattern across all three: distribution, observability, execution. The ecosystem is independently hardening the points where untrusted code or untrusted instructions enter the system. None of these projects coordinated. They are responding to the same pressure, which is that you cannot ship autonomous agents into anything serious until the trust boundaries are enforced by machinery rather than by hope.

The sandbox is the execution-layer piece of that build-out. It is alpha, it is one person's project, and it should be treated as exactly that for now. But the direction it points is the one the whole ecosystem is walking. Run the five tests. Grant capabilities like they cost money. And stop letting your agent run code on the strength of a prompt that politely asked it not to misbehave.

/Key Takeaways

The breakthrough isn't running Python. It's the default-deny isolation: WebAssembly gives the runtime zero access to your system unless you explicitly grant it, so safety is enforced by architecture rather than by trusting the model.
Plugins re-grant capability one slice at a time. That's the strength and the risk: every capability you expose is a hole you punched on purpose, and one over-generous grant can rebuild the door you locked.
Before trusting any code sandbox, run the five-test protocol yourself: a file read outside the working directory, a network call, a read of a secret environment variable, and a subprocess spawn should all fail; after that, only the one capability you granted should work.
A trustworthy sandbox is the prerequisite for moving agents rightward on the autonomy spectrum. It lets an agent run code you didn't read first, because the runtime, not your attention, enforces the limits.
This is one alpha project, but it fits a clear hardening pattern across the ecosystem (clawhub's scan reports, Langfuse's agent-tool settings), with each project locking down the points where untrusted code and instructions enter the system.

Sources for this article

9 collected in pack · 5 cited & verified in body

This is the full source pack collected for the story: the pool the writer cites from, which is why the pack count can exceed the citations in the body. Tier labels reflect domain authority; freshness is re-checked daily. How each load-bearing claim bound to this pack is itemized in the claims panel below.

Release v3.179.1 · langfuse/langfuse (github.com, Community)
Release v3.179.0 · langfuse/langfuse (github.com, Community)
Release clawhub 0.20.0 · openclaw/clawhub (github.com, Community)
Release clawhub 0.19.2 · openclaw/clawhub (github.com, Community)
Google Buys Compute From SpaceX, Broadcom’s Outlook, Apple’s AI Politics (stratechery.com, Reputable)
The Sequence Radar #873: Last Week in AI: Soccer, S-1s, and Supermodels (thesequence.substack.com, Community)
[AINews] not much happened today (www.latent.space, Reputable)
Release: micropython-wasm 0.1a2 (simonwillison.net, Reputable)
Running Python code in a sandbox with MicroPython and WASM (simonwillison.net, Reputable)

Load-bearing claims

The writer flagged these claims as load-bearing. Where a cited source supports the claim, the row links out to it; confidence labels reflect how directly the source backs the assertion. We surface unverified claims honestly rather than hide them.

6 confirmed · 2 analysis

0/6 bound to a pack source

Confirmed
Simon Willison released an alpha package called micropython-wasm and is using it as a code execution sandbox plugin for Datasette Agent called datasette-agent-micropython.
No matching pack item — claim recorded but not bound to a source.
Confirmed
Willison frames the new sandbox as his latest attempt that 'feels like it might finally have all of the characteristics I've been looking for' after several years of experimenting.
No matching pack item — claim recorded but not bound to a source.
Analysis
The core safety property comes from WebAssembly's default-deny execution model, where untrusted code has no host access unless capabilities are explicitly granted; this is analysis built on the documented WASM-plus-MicroPython approach.
Confirmed
Willison's key open source projects (Datasette, LLM, sqlite-utils) all support plugins, and he states a carefully designed plugin system reduces the risk involved in adding new functionality.
No matching pack item — claim recorded but not bound to a source.
Confirmed
Willison added a CLI to micropython-wasm, inspired in part by wanting to illustrate a 'Try it yourself' section.
No matching pack item — claim recorded but not bound to a source.
Analysis
A trustworthy in-process sandbox is the prerequisite for moving agents toward higher autonomy by letting the runtime enforce safety instead of human review of every script; this is analytical framing.
Confirmed
clawhub 0.20.0 changed how scan reports are stored and downloaded, including owner-authorized scan downloads, building on 0.19.2 which preserved skill verification flags.
No matching pack item — claim recorded but not bound to a source.
Confirmed
Langfuse v3.179.0 added an MCP and CLI settings page and an agent tools banner.
No matching pack item — claim recorded but not bound to a source.
