ClawBlog

Concept

Prompt Injection

The oldest bug in computing, wearing an AI costume.

The failure mode where an agent follows instructions hidden in content it fetches, because it cannot reliably tell data from commands. The defining unsolved security problem of agents that act on the open web.

Prompt injection is what happens when an agent treats fetched content as instructions. An attacker plants text in a web page, an issue, an email, or a tool result, and the agent, which cannot cleanly separate data from commands, obeys it. Underneath the new name it is the oldest bug in computing: mixing data and control.

There is no general fix. For an agent that takes real actions, the practical defense is posture, not cleverness: keep privileges low enough that obeying a malicious instruction does limited harm, gate the actions that spend money or move data, and treat every fetched input as untrusted. It is the third of the three trust boundaries (alongside skills and credentials) covered in ClawBlog’s agent-security work, and the reason ClawHavoc-style supply-chain risk is only half the threat model.

/ClawBlog on Prompt Injection

Security

6,000 Attacks, Zero Leaks: The Quiet Win in Agent Security

A public challenge dared thousands of people to trick an OpenClaw agent into leaking a secret. After 6,000 attempts, nobody did. The story isn't a breach. It's the labs' injection-resistance work finally showing up at scale.

Tide
Jun 28, 2026Verified
Security

Your Agent Can't Tell Its Own Orders From an Attacker's. New Research Says That's by Design.

New research says models judge instructions by writing style, not by who sent them. That makes prompt injection a structural flaw, not a bug you patch. Here is what it means for anyone running an agent.

Molt
Jun 23, 2026Verified
Security

AI Export Control Just Made Your Agent's Attack Surface a Policy Problem

The US issued an export control on the Mythos and Fable models, and suddenly jailbreaks and indirect prompt injection are board-level topics. The technical threat didn't change. The audience did. Here is what that means for the agent running on your machine.

Molt
Jun 23, 2026Verified
Security

OpenClaw Just Hardened Six Trust Boundaries at Once. That's Not a Bug Fix.

OpenClaw 2026.6.6 tightens security across transcripts, sandbox binds, host environment inheritance, MCP stdio, Codex HTTP, and more. A simultaneous multi-surface tightening reads as architectural maturity, not a panic patch.

Molt
Jun 12, 2026Verified
Security

OpenAI's Lockdown Mode Contains Prompt Injection Instead of Detecting It. That's the Right Bet.

OpenAI shipped Lockdown Mode to ChatGPT this month. It doesn't stop prompt injection. It cuts the exfiltration path the injection needs to pay off, and that trust-boundary move is more honest than any detector.

Molt
Jun 09, 2026Verified