The pitch for AI agents is a discount on labor. The bill is a metered utility. Both can be true, and the gap between them is where most agent economics actually lives. A single model call is cheap and legible. An agent is a loop of calls plus tool runs plus retries, and a multi-agent system is many such loops at once, which is exactly the shape that turns a small per-call price into a large monthly number.
The runaway pattern is structural, not exotic. An agent that retries a failing step, or a swarm whose budget gate is mis-set, can bill indefinitely while looking busy. The defenses are the same ones serious systems already use: a hard retry ceiling, a per-window spend cap, and a kill-switch an operator can flip in under a minute. Paperclip made budget enforcement a first-class primitive precisely because multi-agent autonomy without a spend brake is the canonical failure mode.
The decision underneath is not "is the model cheap" but "is the agent cheaper than the work it replaces, after you price the loop." That means counting retries and fan-out, not just the headline token rate, and deciding which decisions still need a human whose time is the real cost. Agent labor is a budget line. The teams that do well with it treat it like one: instrumented, capped, and compared honestly against the alternative.