Introspection raised money to build infrastructure for agents that fix their own systems. Anthropic said the same thing on stage without naming it. The category shift matters more than the buzzword.
The consensus definition of an AI agent is still "a tool that does a task." That definition just got a competitor.
At the AI Engineer World's Fair this week, a term kept surfacing: autoresearch. The framing came from Roland Gavrilescu, co-founder and CEO of Introspection, in an interview with Latent Space. He described it as building an "outer loop" in which "agents help maintain the system itself." The inner loop does the work. The outer loop studies the inner loop, reads feedback signals and evals, and improves it over time.
Read that again. The agent is no longer just executing your orders. It is maintaining its own execution environment.
That is not a feature. It is a category change, and it moves agents from labor toward governance. Introspection is a new company building infrastructure for deploying these self-improving systems. Gavrilescu built agent infrastructure and cloud agents at xAI before starting it. When someone from that background raises money to make self-maintaining agents a deployable product, this stops being a conference idea.
Anthropic reinforced it from a different stage. Thariq Shihipar, who works on Claude Code, never said "autoresearch" in his keynote, but he described the same continuous-adaptation loop. "The models are grown, not developed," he said.
For anyone who runs agents daily, this changes the security question. You are no longer only asking what your agent can do. You are asking what it is allowed to change about itself.
Autoresearch is a governance primitive wearing a productivity label
Start with the definition, because the word is doing a lot of quiet work. Autoresearch, in Gavrilescu's telling, is a loop where agents help maintain and improve the primary system, using feedback signals, evals, and human input to make progress over time.
Strip the vocabulary and you get two nested loops. The inner loop is the agent doing the job you asked for: writing code, triaging tickets, running a workflow. The outer loop watches that inner loop, measures how well it did, and changes the setup so it does better next time. Prompts, tools, configuration, guardrails. The outer loop's job is the system, not the task.
That distinction is the whole story. A copilot answers a question. An autonomous agent completes a task. An autoresearch loop rewrites the thing that completes the task. On the Autonomy Spectrum, that is a jump most people have not registered. Most agent failures come from deploying at the wrong point on that spectrum, and this is a new point that did not exist in the vendor pitch a year ago.
Call it what it is: self-administration. The agent gets a limited version of the privileges a system administrator holds. It can read logs, evaluate performance, and adjust configuration. Vendors will sell this as maintenance, because maintenance sounds boring and boring sounds safe.
It is neither. Every capability you grant an outer loop is a capability that now edits your inner loop without a human keystroke in between. That is governance, and it is being shipped as a productivity upgrade.
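For readers who think in code, the two nested loops can be sketched in a few lines. This is a toy illustration, not Introspection's design: `AgentConfig`, `inner_loop`, `outer_loop`, `toy_evaluate`, and `toy_propose` are all hypothetical names, and the "agent" is a stand-in that only formats a string. What it shows is the shape of the thing: the outer loop never touches a task, it only rewrites the configuration the inner loop runs with.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentConfig:
    """What the outer loop is allowed to tune (hypothetical fields)."""
    prompt: str
    tools: list[str] = field(default_factory=list)


def inner_loop(config: AgentConfig, task: str) -> str:
    """Stand-in for the agent doing the job; returns a fake result string."""
    return f"[{config.prompt}] handled {task!r} with tools {config.tools}"


def outer_loop(
    config: AgentConfig,
    tasks: list[str],
    evaluate: Callable[[str], float],
    propose: Callable[[AgentConfig], AgentConfig],
    rounds: int = 3,
) -> AgentConfig:
    """Score the inner loop, try a changed config, keep it only if it scores higher."""

    def score(cfg: AgentConfig) -> float:
        return sum(evaluate(inner_loop(cfg, t)) for t in tasks) / len(tasks)

    best = score(config)
    for _ in range(rounds):
        candidate = propose(config)
        candidate_score = score(candidate)
        if candidate_score > best:
            # No human keystroke here: the eval alone decides what ships.
            config, best = candidate, candidate_score
    return config


def toy_evaluate(output: str) -> float:
    """Toy eval (assumption): rewards any output that mentions the search tool."""
    return 1.0 if "search" in output else 0.0


def toy_propose(cfg: AgentConfig) -> AgentConfig:
    """Toy proposal (assumption): add the search tool if it is missing."""
    if "search" in cfg.tools:
        return cfg
    return AgentConfig(cfg.prompt, cfg.tools + ["search"])


if __name__ == "__main__":
    tuned = outer_loop(AgentConfig("triage"), ["ticket-1"], toy_evaluate, toy_propose)
    print(tuned)
```

Note how little the toy eval checks. The outer loop grants itself a new tool because a string match rewarded it, which is the governance problem in miniature.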
The signal is not the demo, it's the funding and the second stage
Conference ideas are cheap. What separates autoresearch from the usual World's Fair buzzword churn is that two independent signals landed at once.
First, capital. Introspection exists specifically to build infrastructure for deploying self-improving systems. Gavrilescu came from agent infrastructure and cloud agents at xAI. When infrastructure people leave a frontier lab to productize a pattern, they are betting the pattern is durable enough to sell to other companies. That is a different bet from a research paper.
Second, an incumbent said the same thing without the label. Anthropic's Claude Code keynote, per Latent Space's dispatch, reflected the same idea of continuous discovery and adaptation. "The models are grown, not developed," Shihipar said. Grown implies an ongoing process of shaping and correction, not a shipped artifact.
When a funded startup and a major lab converge on the same concept from opposite ends, that is the pattern that usually precedes a category becoming baseline. The startup builds the standalone product. The incumbent bakes it into the platform everyone already uses. Reports of both happening in the same week suggest this is moving from R&D to default expectation faster than the marketing has caught up.
The conference itself treated it as a track, not a curiosity. The Day 3 coverage grouped Autoresearch alongside Cursor FDE and Software Factories as headline sessions. That placement matters. It says the people running the fair think this is where deployment is heading, not a fringe experiment.
The takeaway for a power user: assume the agents you already pay for will grow an outer loop within a product cycle or two. The question is whether you will know when it happens.
A self-improving loop is a new trust boundary, and it points inward
Here is where the security desk gets nervous. Apply the Trust Boundary Model. Every place data crosses from one trust level to another is a place you must inspect and enforce.
A normal agent has one obvious boundary: the point where it acts on the outside world. It sends an email, edits a file, calls an API. You watch that edge. You scope permissions there.
An autoresearch loop adds a second boundary, and this one faces inward. The outer loop crosses from "observing the system" to "modifying the system." That crossing is where a self-improving agent can quietly change its own prompts, expand its own tool access, or relax its own guardrails, all in the name of improvement the evals rewarded.
Think about what that means for the feedback signal. Gavrilescu's own framing has the loop using feedback signals, evals, and human input to make progress. If an attacker can poison the feedback signal, they are no longer attacking a single task. They are steering the process that reshapes every future task. Poison the evals and the outer loop will dutifully optimize the inner loop toward the attacker's target and log it as an improvement.
That is the Swiss Cheese Model in action. Individually, a slightly biased eval, an over-broad outer-loop permission, and a missing human checkpoint each look survivable. Line the holes up and a low-severity data-quality problem becomes a system that rewrites its own behavior in the wrong direction, with a clean audit trail that says progress.
Attack Surface Analysis says enumerate every accessible interface and minimize unnecessary exposure. An autoresearch deployment expands the surface in a way most threat models do not yet name: the interface that edits the agent is now an interface an attacker wants. Treat the outer loop's write access to prompts, tools, and configuration as the highest-value target in the stack. Because it is.
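One way to make a poisoned feedback signal harder to slip in is to sign each eval record where it is produced and have the outer loop discard anything that fails verification. The sketch below uses Python's standard `hmac` module. The record fields, the function names, and the assumption that a shared secret key lives somewhere the agent cannot read are all illustrative, not a description of any vendor's pipeline.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, key: bytes) -> str:
    """Return an HMAC-SHA256 hex signature over a canonical JSON encoding."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_record(record: dict, signature: str, key: bytes) -> bool:
    """Check the signature in constant time to avoid timing leaks."""
    expected = sign_record(record, key)
    return hmac.compare_digest(expected, signature)


def trusted_records(signed: list[tuple[dict, str]], key: bytes) -> list[dict]:
    """Keep only the records whose signatures verify; drop the rest."""
    return [rec for rec, sig in signed if verify_record(rec, sig, key)]


if __name__ == "__main__":
    key = b"example-key-kept-outside-the-agent"  # assumption: stored in a secrets manager
    record = {"eval": "ticket-triage", "score": 0.72}
    signature = sign_record(record, key)

    tampered = dict(record, score=0.99)  # attacker inflates the score after signing
    print(trusted_records([(record, signature), (tampered, signature)], key))
```

This only proves a record was not altered after signing by someone without the key. It says nothing about whether the eval was biased to begin with, so it covers one hole in the cheese, not all of them.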
The value moved up the stack, from the model to the loop that maintains it
The Harness Hypothesis holds that the value in AI is not in the model but in the harness that connects the model to the world. Autoresearch is the harness growing a nervous system.
For most of the last two years, the competitive fight was about models: whose weights were smarter. That fight is commoditizing. The pydantic-ai v2.3.0 release adding a native Z.AI (Zhipu AI) provider alongside everyone else's is a small tell. Model providers are becoming interchangeable line items in a framework's config. When you can swap the brain in one line, the brain is not the moat.
So where does the moat go? Up. It goes to the layer that keeps the whole system improving. That is what Introspection is betting on by building infrastructure for the outer loop rather than another model or another agent. Own the process that maintains everyone's agents and you own something stickier than any single model generation.
This is Wardley Mapping made concrete. The model is sliding toward commodity. The harness is still in product. The autoresearch layer is closer to genesis, which is exactly why a startup can plant a flag there before the incumbents finish naming it.
Anthropic's "grown, not developed" framing is the same map from the platform side. If models are grown, the growing apparatus is the product. Whoever controls the outer loop controls the trajectory of the inner loop, and trajectory compounds. A model is a snapshot. A loop is a direction.
For the reader who chooses tools, the implication is blunt. Stop evaluating agents only on how smart they are today. Start asking who controls the loop that decides how smart they are next month, and whether that loop is yours or the vendor's.
This is a Molt Cycle transition, and it's running ahead of the hardening
The Molt Cycle says agent projects move through predictable phases: rapid growth, then a security crisis, then hardening, then enterprise adoption, then commoditization, then the next molt.
Autoresearch is a next molt. The industry spent the growth phase making agents capable and the recent past learning, painfully, that capability without controls produces incidents. Now the pattern is skipping straight to a new capability class, self-maintenance, before the controls for the previous class are fully standardized.
That ordering is the risk. The mundane signals in this same week show the ecosystem is still busy with basic maintenance and localization: the Goose v1.40.0 release led with desktop language selection and new locales. That is healthy, ordinary product work. It is also a reminder that most of the tooling around agents is still hardening the inner loop while the frontier is already selling outer loops.
Meanwhile the plain old web is still leaking. A cache-poisoning advisory for Ghost, CVE-2026-53943, describes how an unauthenticated attacker who manipulates a caching layer could get malicious content served to other visitors, and in some setups take over staff accounts. That is a decades-old class of bug still landing in 2026. Now imagine that same class of trust-boundary failure sitting inside the feedback signal of a loop that rewrites your agent. The failure mode does not get simpler when you automate the maintenance. It gets faster.
The honest read: autoresearch is arriving before the enterprise deployment playbook for it exists. There is no accepted standard yet for scoping an outer loop's permissions, auditing its self-edits, or verifying that the evals it optimizes against are not poisoned. Early adopters will write that playbook by hitting the edges. Plan to be a fast follower, not the one discovering the holes.
What to do before your agent starts editing itself
This is a news story, not a scare piece. The move toward self-improving agents is real, it is funded, and it is probably good for the systems that get it right. But getting it right is the whole job. Here is the security desk's short list.
- Find out if your agent already has an outer loop. Vendors will ship this quietly as "continuous improvement" or "self-tuning." If a product claims it gets better over time on its own, ask what it is allowed to change and where it stores those changes.
- Scope the outer loop separately from the inner loop. The permission to do a task and the permission to modify how tasks get done are different grants. Treat them that way. The outer loop should never inherit write access to its own guardrails by default.
- Protect the feedback signal like production data. The evals and signals the outer loop optimizes against are now attack surface. If those can be poisoned, the loop optimizes toward the attacker. Log them, sign them, and restrict who can write them.
- Keep a human checkpoint on self-edits. Continuous discovery is fine. Continuous unattended self-modification is not, at least not until the audit tooling catches up. Require review before an outer loop changes prompts, tools, or permissions.
- Demand an audit trail you can read. "The models are grown" is a lovely metaphor and a terrible incident report. If your agent changed its own behavior, you need to see what changed, when, and on what signal.
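Three items on that list (separate scoping, a human checkpoint, and a readable audit trail) fit in one small gate that every self-edit has to pass through. The sketch below is a minimal illustration under stated assumptions: `SelfEdit`, `ChangeGate`, the grant sets, and the field names are hypothetical, and a real deployment would persist the log somewhere the agent cannot write.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical grants: what the outer loop may propose changes to.
# "guardrails" is deliberately absent, so it is denied by default.
OUTER_LOOP_GRANTS = {"prompt", "tools"}
# Targets that need a named human reviewer before they are applied.
REQUIRES_REVIEW = {"prompt", "tools", "permissions"}


@dataclass
class SelfEdit:
    target: str   # what the outer loop wants to change
    old: str
    new: str
    signal: str   # the eval or feedback record that motivated the change


@dataclass
class ChangeGate:
    audit_log: list[dict] = field(default_factory=list)

    def submit(self, edit: SelfEdit, approved_by: Optional[str] = None) -> bool:
        """Decide on a self-edit and log the decision; True means applied."""
        if edit.target not in OUTER_LOOP_GRANTS:
            decision = "denied: outside outer-loop grant"
        elif edit.target in REQUIRES_REVIEW and approved_by is None:
            decision = "pending: human review required"
        else:
            decision = "applied"
        # Every attempt is logged, including the denied and pending ones.
        self.audit_log.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "target": edit.target,
            "old": edit.old,
            "new": edit.new,
            "signal": edit.signal,
            "approved_by": approved_by,
            "decision": decision,
        })
        return decision == "applied"


if __name__ == "__main__":
    gate = ChangeGate()
    gate.submit(SelfEdit("guardrails", "strict", "relaxed", "eval-17"))
    gate.submit(SelfEdit("prompt", "v1", "v2", "eval-18"))
    gate.submit(SelfEdit("prompt", "v1", "v2", "eval-18"), approved_by="alice")
    for entry in gate.audit_log:
        print(entry["target"], "->", entry["decision"])
```

The design choice worth copying is the default: anything not explicitly granted is refused, and the log records what changed, when, and on what signal even when nothing was applied.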
Autoresearch resets what the word agent means. It is no longer only a worker. It is starting to be a manager of workers, including itself. That is a genuine advance and a genuine expansion of the attack surface at the same time. Both things are true. Configure accordingly.
Key Takeaways
- Autoresearch is an "outer loop" where agents maintain and improve their own system. It moves agents from doing tasks to governing how tasks get done.
- Introspection raised money to build infrastructure for self-improving agents, and Anthropic's Claude Code keynote described the same loop without naming it. Two independent signals in one week suggest this is heading toward baseline.
- The outer loop is a new, inward-facing trust boundary. It can rewrite prompts, tools, and guardrails, so its write access is the highest-value target in your stack.
- Poison the feedback signal and you steer every future task, not just one. Protect evals and signals like production data.
- Before your agent self-edits: scope the outer loop separately, keep a human checkpoint on self-modifications, and demand a readable audit trail. The deployment playbook for this does not exist yet.


