/Signal
Georgi Gerganov, the engineer behind llama.cpp and ggml, posted a quiet endorsement that says more about the agent economy than most launch keynotes. He runs Qwen3.6-27B locally, daily, on either an M2 Ultra or an RTX 5090 box, and calls it "a very capable local model for coding tasks."
The model is not the headline. A 27-billion-parameter model running on a single consumer-grade machine, good enough that a senior maintainer reaches for it without ceremony, is now ordinary. We have crossed into the part of the curve where capable inference is something you own, not something you rent by the token.
What Gerganov reaches for is more revealing. His setup is "a very lightweight harness, the pi agent with everything stripped" plus "a short system prompt to align it a bit with my style." No sprawling orchestration. No marketplace of skills. A stripped binary and a paragraph of instructions.
Then the line that should stop anyone selling agent autonomy: "I think I would be using it much more, if I didn't have to spend a lot of my time on reviewing PRs." The constraint on a world-class engineer's use of a capable local model is not the model's capability. It is the human review loop, and the model adds to that loop rather than shrinking it.
/Framework
Wardley Mapping is the cleanest lens here. Map the agent value chain from genesis to commodity and watch where each component sits.
The model layer is sliding fast toward commodity. When a maintainer of Gerganov's standing runs a 27B model on hardware he already owns and treats it as unremarkable, the model is no longer a scarce, differentiated good. It is a utility. Wardley's prediction for any component reaching the commodity stage is that value migrates to the layers above and below it.
That is exactly what his setup demonstrates, and it is The Harness Hypothesis in miniature: the value isn't in the model, it's in the harness that connects the model to the work. Except Gerganov's harness is deliberately minimal. "Everything stripped." The thin harness plus a short style prompt does the job.
So the value didn't pool in a heavyweight orchestration layer. It pooled somewhere the vendors mostly ignore: the review step. That is the genesis-stage component in this value chain, the part nobody has commoditized, because trust does not commoditize on the same schedule as inference.

/Analysis
Read his note as a Wardley map and the strategic picture inverts the prevailing pitch.
The industry sells two stories. One: bigger, smarter, hosted models are the moat (the model layer as the product). Two: rich agent harnesses (skill marketplaces, multi-agent orchestration, managed runtimes) are where the durable business lives. Gerganov, in three sentences, undercuts both.
He undercuts story one by running a capable model locally and offline, on commodity silicon, with zero token bill. When a 27B model is good enough for daily maintainer work on an RTX 5090, the marginal value of the next hosted model tier shrinks for a large class of tasks. Commoditize Your Complement explains the open-weight flood that put him there: every firm that needs cheap inference adjacent to its actual product has an incentive to drive model cost toward zero. Open-weight models are that subsidy made real. The complement is being commoditized exactly as theory predicts.
He undercuts story two more subtly. The heavyweight-harness thesis assumes operators want more scaffolding. Gerganov wants less. He strips the agent to its core and adds a paragraph. That is not a man under-investing in tooling. That is a man who understands that scaffolding you don't review is scaffolding you can't trust. Every additional autonomous capability is another thing whose output lands in his PR queue.
Which brings us to the actual constraint. He says, plainly, that he would use the model more if he weren't spending so much time reviewing pull requests. Sit with that. The model generates work faster than he can verify it. The agent is not a labor-saving device here. It is a labor-shifting device, moving effort from authoring to reviewing, and review is the part that doesn't parallelize, doesn't get faster with a bigger GPU, and doesn't go away when you bolt on more skills.
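The labor-shifting claim can be made concrete with a toy queue. The sketch below is only an illustration: the function name `review_backlog` and every rate in it are hypothetical, not figures from Gerganov's note. It assumes a fixed daily review capacity and shows what happens when authoring speeds up while review does not.

```python
def review_backlog(opened_per_day: float, reviewed_per_day: float, days: int) -> list[float]:
    """Return the number of unreviewed PRs at the end of each day.

    Assumes constant rates (a deliberate simplification): PRs arrive at
    opened_per_day and one reviewer clears at most reviewed_per_day.
    """
    backlog = 0.0
    history = []
    for _ in range(days):
        # The queue grows by arrivals, shrinks by reviews, and never goes negative.
        backlog = max(0.0, backlog + opened_per_day - reviewed_per_day)
        history.append(backlog)
    return history


# Hypothetical numbers: the reviewer can clear 5 PRs a day in both cases.
before_agent = review_backlog(opened_per_day=4, reviewed_per_day=5, days=10)
with_agent = review_backlog(opened_per_day=8, reviewed_per_day=5, days=10)

print(before_agent[-1])  # 0.0  (review keeps up)
print(with_agent[-1])    # 30.0 (queue grows by 3 every day)
```

Under these assumptions, doubling authoring output does not double finished work. Merged work stays capped at the review rate, and the difference accumulates as a queue.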
This is the Capability vs. Controllability Frontier showing up in a maintainer's workday. A more capable model produces more plausible output. Plausible output is harder to review, not easier, because the errors hide better. The verification burden scales with capability, and verification is human-bound.
For anyone running agents in production, the lesson is uncomfortable. The vendor demos optimize the authoring step: watch the agent write the code, file the PR, close the ticket. The bottleneck Gerganov names sits after that, in the review step the demos cut away from. If your agent rollout measured success by output volume, you measured the easy half. The cost lives in the queue of things a competent human now has to check.
This also reframes the conversations about the best openclaw skills and about agent orchestration patterns. The marginal skill you add to a harness is marginal output you must review. There is a point on the Autonomy Spectrum where adding capability subtracts net value, because it adds review faster than it adds finished work. Gerganov sits below that point on purpose, and he is better at this than almost anyone selling you the layer above it.
/Counterpoint
The obvious objection: Gerganov is an n-of-1, and a peculiar one. He maintains a high-stakes open-source codebase where a bad merge is expensive and his standards are extreme. Most operators don't review at his bar, and many tasks don't need it. For low-stakes drudgery, plenty of teams happily let agents run with light review and accept the occasional miss.
That is fair, and it sharpens rather than refutes the point. The review bottleneck doesn't disappear for lower-stakes work; it relocates. Skip review and you haven't removed the cost, you've deferred it to whoever inherits the bug, the security gap, or the silent data corruption three weeks later. The Swiss Cheese Model applies: light review is a defense layer with large holes, and capable models that produce confident, wrong output are very good at finding them.
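The Swiss Cheese argument has simple arithmetic behind it. As a rough sketch, assume each defense layer misses a defect independently with some probability (a strong assumption, and all the rates below are hypothetical). The chance a defect escapes every layer is then the product of the per-layer miss rates.

```python
from math import prod


def escape_probability(miss_rates: list[float]) -> float:
    """Chance a defect slips past every layer, assuming layers fail independently."""
    return prod(miss_rates)


# Hypothetical miss rates for three layers: tests, CI checks, human review.
careful_review = escape_probability([0.3, 0.5, 0.1])
light_review = escape_probability([0.3, 0.5, 0.6])

print(f"{careful_review:.3f}")  # 0.015
print(f"{light_review:.3f}")    # 0.090
```

With these made-up numbers, loosening only the human review layer lets six times as many defects through. The cost has not disappeared. It has moved downstream.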
The honest reading is that Gerganov is early, not unusual. He hit the review wall first because his stakes surfaced it first. Everyone running agents at scale meets the same wall eventually, just with a delay proportional to how much they're willing to not look. The question every operator should ask is not "how capable is the model" but "who reviews its output, and is that the thing I'm actually short on."
/Key Takeaways
- A capable 27B coding model now runs daily on a single consumer machine, offline and token-free; the model layer is commoditizing.
- The most valuable harness in this story is the thinnest one: a stripped agent plus a short style prompt.
- The real constraint isn't model capability, it's the human review the model's output creates, and review doesn't parallelize.
- More capable models produce more plausible output, which is harder to verify, not easier; the verification burden scales with capability.
- If your agent rollout measures success by output volume, you are measuring the easy half; the cost lives in the review queue.

