Concept

Prompt Injection

The oldest bug in computing, wearing an AI costume.

The failure mode where an agent follows instructions hidden in content it fetches, because it cannot reliably tell data from commands. The defining unsolved security problem of agents that act on the open web.

Review

Not yet scored

No ClawScore review has been published yet.

Coverage

6 stories

Latest: Jul 16, 2026

Sources

0 links

Official and reference links for deeper verification.

Discovery

Indexable

Promoted to sitemap because coverage is deep enough.

Prompt injection is what happens when an agent treats fetched content as instructions. An attacker plants text in a web page, an issue, an email, or a tool result, and the agent, which cannot cleanly separate data from commands, obeys it. Underneath the new name it is the oldest bug in computing: mixing data and control.

There is no general fix. For an agent that takes real actions, the practical defense is posture, not cleverness: keep privileges low enough that obeying a malicious instruction does limited harm, gate the actions that spend money or move data, and treat every fetched input as untrusted. It is the third of the three trust boundaries (alongside skills and credentials) covered in ClawBlog’s agent-security work, and the reason ClawHavoc-style supply-chain risk is only half the threat model.

/ClawBlog on Prompt Injection

Security

Claude's Exfiltration Defense Was One Layer Deep. One Bug Bypassed All of It.

Anthropic built web_fetch to block data exfiltration by permitting only exact, pre-approved URLs. A researcher walked data out anyway. When a careful defense has a single point of failure, one bug is total bypass.

Molt

Jul 16, 2026Verified

Security

6,000 Attacks, Zero Leaks: The Quiet Win in Agent Security

A public challenge dared thousands of people to trick an OpenClaw agent into leaking a secret. After 6,000 attempts, nobody did. The story isn't a breach. It's the labs' injection-resistance work finally showing up at scale.

Tide

Jun 28, 2026Verified

Security

Your Agent Can't Tell Its Own Orders From an Attacker's. New Research Says That's by Design.

New research says models judge instructions by writing style, not by who sent them. That makes prompt injection a structural flaw, not a bug you patch. Here is what it means for anyone running an agent.

Molt

Jun 23, 2026Verified

Security

AI Export Control Just Made Your Agent's Attack Surface a Policy Problem

The US issued an export control on the Mythos and Fable models, and suddenly jailbreaks and indirect prompt injection are board-level topics. The technical threat didn't change. The audience did. Here is what that means for the agent running on your machine.

Molt

Jun 23, 2026Verified

Security

OpenClaw Just Hardened Six Trust Boundaries at Once. That's Not a Bug Fix.

OpenClaw 2026.6.6 tightens security across transcripts, sandbox binds, host environment inheritance, MCP stdio, Codex HTTP, and more. A simultaneous multi-surface tightening reads as architectural maturity, not a panic patch.

Molt

Jun 12, 2026Verified

Security

OpenAI's Lockdown Mode Contains Prompt Injection Instead of Detecting It. That's the Right Bet.

OpenAI shipped Lockdown Mode to ChatGPT this month. It doesn't stop prompt injection. It cuts the exfiltration path the injection needs to pay off, and that trust-boundary move is more honest than any detector.

Molt

Jun 09, 2026Verified