Monday, June 22, 2026

Loop Engineering – O’Reilly

The following article originally appeared on Addy Osmani’s blog and is being reposted here with the author’s permission.

Loop engineering is replacing yourself as the person who prompts the agent. You design the system that does it instead. A loop here can be thought of as a recursive goal where you define a purpose and the AI iterates until complete. I believe this may be the future of how we work with coding agents. However, it’s still early; I’m skeptical, and you absolutely need to be careful about token costs (usage patterns can differ wildly if you’re token rich or poor), so I want to unpack what it is and what it means.

Peter Steinberger recently said: “You shouldn’t be prompting coding agents anymore. You should be designing loops that prompt your agents.” Similarly, Boris Cherny, head of Claude Code at Anthropic, said, “I don’t prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops.”

Okay, so what does any of that mean?

For about two years, the way you got something out of a coding agent was to write a good prompt and share enough context. You type a thing, you read what came back, you type the next thing. The agent is a tool and you are holding it the entire time, one turn after the other. That part is kind of over, or at least some think it’s going to be.

Now you build a small system that finds the work, hands it out, checks it, writes down what is done and then decides the next thing, and you let that system poke the agents instead of you. I wrote before about the cousins of this: agent harness engineering, which is making the environment one single agent runs inside, and the factory model, the system that builds the software. Loop engineering sits one floor above the harness. It is the harness, but it runs on a timer, it spawns little helpers, and it feeds itself.

The thing that surprised me is this is not really a tool thing anymore. A year ago if you wanted a loop you wrote a pile of bash and you maintained that pile forever and it was yours and only yours. Now the pieces just ship inside the products. Steinberger’s list maps almost exactly onto the Codex app, and then almost the same onto Claude Code. And once you notice the shape is the same, you stop arguing about which tool. You just design a loop that still works no matter which one you happen to be sitting in.

The five pieces, and then notes

A loop needs five things and then one place to remember stuff. Let me list them first and then map them.

  1. Automations that go off on a schedule and do discovery and triage by themselves
  2. Worktrees so two agents working in parallel don’t step on each other
  3. Skills to write down the project knowledge the agent would otherwise just guess
  4. Plugins and connectors to plug the agent into the tools you already use
  5. Subagents so one of them has the idea and a different one checks it

Then the sixth thing, the memory. A Markdown file, or a Linear board, anything that lives outside the single conversation and holds what’s done and what’s next. Sounds too dumb to matter. But it’s the same trick every long-running agent depends on, and I went into it in “Long-Running Agents”: The model forgets everything between runs, so the memory has to be on disk and not in the context. The agent forgets; the repo doesn’t.
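As a minimal sketch of that on-disk memory, assume the state lives in a Markdown checklist. The file name `PROGRESS.md`, the checkbox layout, and the helper names are illustrative assumptions, not something either tool prescribes.

```python
from pathlib import Path
from typing import Optional

# Hypothetical file name; any file that lives outside the conversation works.
STATE_FILE = Path("PROGRESS.md")


def load_state(path: Path = STATE_FILE) -> dict[str, bool]:
    """Parse '- [x] task' and '- [ ] task' lines into {task: done}."""
    state: dict[str, bool] = {}
    if not path.exists():
        return state  # first run: nothing remembered yet
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line.startswith("- [x] "):
            state[line[6:]] = True
        elif line.startswith("- [ ] "):
            state[line[6:]] = False
    return state


def save_state(state: dict[str, bool], path: Path = STATE_FILE) -> None:
    """Write the checklist back to disk so the next run can pick it up."""
    lines = ["# Progress", ""]
    for task, done in state.items():
        mark = "x" if done else " "
        lines.append(f"- [{mark}] {task}")
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def next_task(state: dict[str, bool]) -> Optional[str]:
    """Return the first task that is still open, or None if all are done."""
    for task, done in state.items():
        if not done:
            return task
    return None
```

Each run would call `load_state()`, work on `next_task(...)`, and finish with `save_state(...)`, so nothing depends on what the model happens to remember.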

Both products have all five now.

Primitive | Job in the loop | Codex app | Claude Code
--- | --- | --- | ---
Automations | Discovery + triage on a schedule | Automations tab: pick project, prompt, cadence, environment; results land in a Triage inbox; /goal for run-until-done | Scheduled tasks and cron, /loop, /goal, hooks, GitHub Actions
Worktrees | Isolate parallel features | Built-in worktree per thread | git worktree, --worktree, isolation: worktree on a subagent
Skills | Codify project knowledge | Agent Skills (SKILL.md), invoked with $name or implicitly | Agent Skills (SKILL.md)
Plugins and connectors | Connect your tools | Connectors (MCP) plus plugins for distribution | MCP servers plus plugins
Subagents | Ideate and verify | Subagents defined as TOML in .codex/agents/ | Task subagents in .claude/agents/, agent teams
State | Track what’s done | Markdown or Linear via a connector | Markdown (AGENTS.md, progress files) or Linear via MCP

The names are a bit different here and there, but the capability is the same thing. Let me go one by one because honestly the details are where a loop either holds together or quietly leaks everywhere.

Automations, this is the heartbeat

Automations are what make a loop an actual loop and not just one run you did once. In the Codex app you make one in the Automations tab and you pick the project, the prompt it will run, how often, and whether it runs on your local checkout or on a background worktree. The runs that find something go to a Triage inbox, and the runs that find nothing just archive themselves, which is nice. OpenAI uses them internally for boring stuff like daily issue triage, summarizing CI failures, writing commit briefings, and hunting bugs somebody added last week. And an automation can call a skill, so you keep the recurring thing maintainable; you fire $skill-name instead of pasting a giant wall of instructions into a schedule that nobody will ever update.

Claude Code gets to the same place but through scheduling and hooks. You can run a prompt or a command on an interval with /loop, you can schedule a cron job, you can fire shell commands at certain points in the agent lifecycle with hooks, or you push the whole thing to GitHub Actions if you want it to keep running after you close the laptop. Same idea exactly: you define an autonomous task, you give it a cadence, and the findings come to you so you are not the one going around checking.

There’s a second in-session primitive worth knowing, and it’s the one closer to what this whole post is about. /loop re-runs on a cadence. /goal keeps going until a condition you wrote is actually true, and after every turn a separate small model checks whether you’re done, so the agent that wrote the code isn’t the one grading it. You give it something like “all tests in test/auth pass and lint is clean” and walk away. Codex has the same thing, also called /goal: It keeps working across turns until a verifiable stopping condition holds, with pause and resume and clear. Same primitive, both tools, which is kind of the pattern for this whole article.
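The run-until-done idea can be sketched in a few lines. This is an illustration of the pattern under stated assumptions, not how either tool implements /goal: `work_turn`, `goal_met`, and the `max_turns` budget are hypothetical names, and the turn cap is my own addition to keep token spend bounded.

```python
from typing import Callable


def run_until_goal(
    work_turn: Callable[[int], str],
    goal_met: Callable[[str], bool],
    max_turns: int = 10,
) -> tuple[bool, int]:
    """Take turns until an independent check passes or the budget runs out.

    work_turn stands in for one agent turn (the maker).
    goal_met stands in for a separate checker, so the maker never grades itself.
    Returns (goal reached?, number of turns used).
    """
    for turn in range(1, max_turns + 1):
        output = work_turn(turn)
        if goal_met(output):
            return True, turn
    return False, max_turns


if __name__ == "__main__":
    # Stand-ins for real agent calls: the fake worker only "passes" on turn 3.
    fake_worker = lambda turn: "tests: pass" if turn >= 3 else "tests: fail"
    checker = lambda output: output.endswith("pass")
    print(run_until_goal(fake_worker, checker))  # prints (True, 3)
```

The stop decision sits outside the function that does the work, which mirrors the separate-checker behavior described above.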

So this is the part that surfaces the work. The rest of the loop is what acts on it.

Worktrees, so parallel doesn’t turn into chaos

The moment you run more than one agent, the files start colliding; that becomes the failure. Two agents writing the same file is the exact same headache as two engineers committing to the same lines when nobody talked to each other first. A Git worktree fixes it. It’s a separate working directory on its own branch sharing the same repo history, so one agent’s edits literally can’t touch the other one’s checkout.

Codex builds the worktree support right in so multiple threads hit the same repo at once and don’t bump into each other. Claude Code gives you the same isolation with git worktree, a --worktree flag to open a session in its own checkout, and an isolation: worktree setting you stick on a subagent so each helper gets a fresh checkout that cleans itself up after. (I wrote about the human side of all this in “The Orchestration Tax.”) The worktrees remove the mechanical collision, but YOU are still the ceiling. Your review bandwidth decides how many you can actually run, not the tool.
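If you were wiring this up by hand rather than relying on either product, a rough sketch of one checkout per agent with plain `git worktree` could look like the following. The `agent/<task>` branch naming and the sibling-directory layout are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def _checkout_dir(repo: Path, task: str) -> Path:
    # Hypothetical layout: a sibling directory named <repo>-<task>.
    return repo.parent / f"{repo.name}-{task}"


def worktree_add_cmd(repo: Path, task: str) -> list[str]:
    """Build `git worktree add -b <branch> <path>` for one agent's task."""
    branch = f"agent/{task}"  # hypothetical branch naming scheme
    return ["git", "-C", str(repo), "worktree", "add",
            "-b", branch, str(_checkout_dir(repo, task))]


def worktree_remove_cmd(repo: Path, task: str) -> list[str]:
    """Build `git worktree remove <path>` to clean up after the agent."""
    return ["git", "-C", str(repo), "worktree", "remove",
            str(_checkout_dir(repo, task))]


def run(cmd: list[str]) -> None:
    """Execute a git command, raising CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)
```

Calling `run(worktree_add_cmd(Path("/path/to/repo"), "fix-auth"))` would give one agent its own directory and branch, and `run(worktree_remove_cmd(...))` would drop the directory once the work is merged or abandoned.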

Skills, so you stop explaining your project every single time

A skill is how you stop reexplaining the same project context every session like a goldfish. Both tools use the same format: a folder with a SKILL.md inside holding instructions and metadata, and then optional scripts, references, and assets. Codex runs a skill when you call it with $ or /skills, or on its own when your task matches the skill description, which is the reason a good, boring description beats a clever one. Claude Code does it the same way and I wrote the pattern up in “Agent Skills.”
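To make the folder layout concrete, here is a small scaffold sketch. It assumes the metadata is a front matter block with `name` and `description` fields at the top of SKILL.md; the article only says the file holds "instructions and metadata," so check the exact fields against each tool's documentation. The `scaffold_skill` helper is hypothetical.

```python
from pathlib import Path


def scaffold_skill(root: Path, name: str, description: str, instructions: str) -> Path:
    """Create <root>/<name>/SKILL.md and return the path to the file.

    Assumption: metadata lives in a front matter block delimited by '---'.
    """
    skill_dir = root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    body = (
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"  # keep it plain: this is what gets matched
        "---\n\n"
        f"{instructions.strip()}\n"
    )
    skill_file = skill_dir / "SKILL.md"
    skill_file.write_text(body, encoding="utf-8")
    return skill_file
```

Optional scripts, references, and assets would sit next to SKILL.md in the same folder.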

Skills are also where intent stops costing you over and over. I argued in “The Intent Debt” that an agent starts every session cold and it will fill any hole in your intent with a confident guess. A skill is that intent written down on the outside: the conventions, the build steps, the “we don’t do it like this because of that one incident,” written one time where the agent reads it every run. Without skills the loop rederives your whole project from zero every cycle; with skills it kind of compounds.

One thing to keep straight: The skill is the authoring format, and a plugin is how you ship it. When you want to share a skill across repos or bundle several together, you package them as a plugin. True in Codex, true in Claude Code.

Plugins and connectors, the loop touches your real tools

A loop that can only see the filesystem is a tiny loop. Connectors, which are built on MCP, let the agent read your issue tracker, query a database, hit a staging API, or drop a message in Slack. Codex and Claude Code both speak MCP so the connector you wrote for one usually just works in the other. And plugins bundle connectors and skills together so your teammate installs your setup in one go instead of rebuilding the whole thing from memory.

This is the difference between an agent that says “here is the fix” and a loop that opens the PR, links the Linear ticket, and pings the channel once CI is green on its own. The connectors are the reason the loop can act inside your actual environment instead of just telling you what it would do if it could.

Subagents, keep the maker away from the checker

The most useful structural thing in a loop, by far, is splitting the one who writes from the one who checks. The model that wrote the code is way too generous when grading its own homework. A second agent with different instructions and sometimes a different model catches the stuff the first one talked itself into.

Codex only spawns subagents when you ask, runs them at the same time, and then folds the results back into one answer. You define your own agents as TOML files in .codex/agents/, each with a name, a description, instructions, and optional model and reasoning effort, so your security reviewer can be a strong model on high effort while your explorer is some fast read-only thing. Claude Code does the same with subagents in .claude/agents/ and agent teams that pass work between them. The usual split in both is one agent explores, one implements, and one verifies against the spec.

I made this case twice already, once as “The Code Agent Orchestra” and once as “Adversarial Code Review.” The reason it matters especially inside a loop is that the loop runs while you’re not watching, so a verifier you actually trust is the only reason you can walk away. Subagents do burn more tokens since each one does its own model and tool work, so spend them where a second opinion is worth paying for. This is also basically what Claude Code’s /goal does under the hood: A fresh model decides if the loop is done instead of the one that did the work, the maker and checker split applied to the stop condition itself.

What one loop looks like

Stick it together and a single thread becomes a little control panel. Here is one shape I keep using.

An automation runs every morning on the repo. Its prompt calls a triage skill that reads yesterday’s CI failures, the open issues, and the recent commits and writes the findings into a Markdown file or a Linear board. For each finding that’s worth doing, the thread opens an isolated worktree and sends a subagent to draft the fix, and a second subagent reviews that draft against the project skills and the existing tests.

Connectors let the loop open the PR and update the ticket. Anything the loop can’t handle lands in the triage inbox for me. The state file is the spine of the whole thing; it remembers what got tried, what passed, and what’s still open, so tomorrow morning the run picks up where today stopped.
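The shape of that morning run can be sketched as one function. Everything here is a stand-in: `triage`, `draft_fix`, and `review` are hypothetical callables where a real setup would invoke a skill, a maker subagent in its own worktree, and a checker subagent, and `state` is a plain dict where a real loop would use the state file or a Linear board.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Finding:
    id: str
    summary: str


def morning_run(
    triage: Callable[[], Iterable[Finding]],
    draft_fix: Callable[[Finding], str],
    review: Callable[[Finding, str], bool],
    state: dict[str, str],
) -> list[Finding]:
    """One pass of the loop; returns the findings that need a human."""
    inbox: list[Finding] = []
    for finding in triage():
        if state.get(finding.id) == "passed":
            continue  # already handled on an earlier run
        draft = draft_fix(finding)  # maker: drafts the fix
        if review(finding, draft):  # checker: a different agent verifies it
            state[finding.id] = "passed"  # a connector would open the PR here
        else:
            state[finding.id] = "needs-human"
            inbox.append(finding)
    return inbox


if __name__ == "__main__":
    findings = [Finding("ci-1", "flaky auth test"), Finding("bug-7", "null deref")]
    state: dict[str, str] = {}
    left = morning_run(
        triage=lambda: findings,
        draft_fix=lambda f: f"patch for {f.id}",
        review=lambda f, draft: f.id == "ci-1",  # pretend only one draft passes
        state=state,
    )
    print([f.id for f in left])  # prints ['bug-7']
    print(state)  # prints {'ci-1': 'passed', 'bug-7': 'needs-human'}
```

Because `state` outlives the run, a second call skips whatever already passed and retries only what is still open.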

And look at what you actually did there. You designed it one time. You didn’t prompt any of those steps. That’s Steinberger’s whole point made real, and it’s the same loop in Codex or in Claude Code because the pieces are the same pieces.

What the loop still doesn’t do for you

The loop changes the work; it doesn’t delete you from it. And three things actually get sharper as the loop gets better, not easier.

Verification is still on you. A loop running unattended is also a loop making mistakes unattended. The whole reason you split the verifier subagent from the maker is to make the loop’s “it’s done” mean something, and even then “done” is a claim and not a proof. I keep saying the same line from “Code Review in the Age of AI”: Your job is to ship code you confirmed works.

Your understanding still rots if you let it. The faster the loop ships code you didn’t write, the bigger the gap between what exists and what you actually understand. That’s comprehension debt, and a smooth loop just makes it grow faster unless you read what the loop made.

And the comfortable posture is the dangerous one. When the loop runs itself, it’s very tempting to stop having an opinion and just take whatever it gives back. I called that “cognitive surrender.” Designing the loop is the cure when you do it with judgment and the accelerant when you do it to avoid thinking: same action, opposite result.

Build the loop. Stay the engineer.

I think this is a preview of how our work is going to evolve. That said, if I weren’t reviewing the code myself or if I relied entirely on automated loops to fix it, my product’s quality would suffer. I’d likely end up stuck in a downward spiral, repeatedly digging myself into a deeper hole.

Go ahead and set up your loops, but don’t forget that prompting your agents directly is also effective. It’s all about finding the right balance.

Loops can produce different outcomes depending on who builds them. Two people can build the exact same loop and get completely opposite results. One uses it to move faster on work they understand deeply. The other uses it to avoid understanding the work at all. The loop doesn’t know the difference. You do.

That’s what makes loop design harder than prompt engineering. Cherny’s point isn’t that the work got easier. It’s that the leverage point moved.

Build the loop. But build it like someone who intends to stay the engineer, not just the person who presses go.
