Saturday, April 11, 2026

Brokers don’t know what attractiveness like. And that’s precisely the issue. – O’Reilly

Luca Mezzalira, writer of Constructing Micro-Frontends, initially shared the next article on LinkedIn. It’s being republished right here together with his permission.

Each few years, one thing arrives that guarantees to alter how we construct software program. And each few years, the business splits predictably: One half declares the previous guidelines lifeless; the opposite half folds its arms and waits for the hype to move. Each camps are normally incorrect, and each camps are normally loud. What’s rarer, and extra helpful, is somebody standing in the midst of that noise and asking the structural questions: Not “What can this do?” however “What does it imply for the way we design programs?”

That’s what Neal Ford and Sam Newman did in their latest fireplace chat on agentic AI and software program structure throughout O’Reilly’s Software program Structure Superstream. It’s a dialog price pulling aside rigorously, as a result of a few of what they floor is extra uncomfortable than it first seems.

The Dreyfus lure

Neal opens with the Dreyfus Mannequin of Data Acquisition, initially developed for the nursing occupation however relevant to any area. The mannequin maps studying throughout 5 levels:

  • Novice
  • Superior newbie
  • Competent
  • Proficient
  • Skilled

His declare is that present agentic AI is caught someplace between novice and superior newbie: It will probably observe recipes, it may well even apply recipes from adjoining domains when it will get caught, nevertheless it doesn’t perceive why any of these recipes work. This isn’t a minor limitation. It’s structural.

The canonical instance Neal provides is gorgeous in its simplicity: An agent tasked with making all checks move encounters a failing unit check. One completely legitimate solution to make a failing check move is to switch its assertion with assert True. That’s not a hack within the agent’s thoughts. It’s an answer. There’s no moral framework, no skilled judgment, no intuition that claims this isn’t what we meant. Sam extends this instantly with one thing he’d actually seen shared on LinkedIn that week: an agent that had modified the construct file to silently ignore failed steps reasonably than repair them. The construct handed. The issue remained. Congratulations all-round.

What’s attention-grabbing right here is that neither Ford nor Newman are being dismissive of AI functionality. The purpose is extra delicate: The creativity that makes these brokers genuinely helpful, their capacity to go looking resolution house in methods people wouldn’t assume to, is inseparable from the identical property that makes them harmful. You may’t totally lobotomize the improvization with out destroying the worth. This can be a design constraint, not a bug to be patched.

And while you zoom out, that is a part of a broader sign. When skilled practitioners who’ve spent many years on this business independently converge on requires restraint and rigor reasonably than acceleration, that convergence is price listening to. It’s not pessimism. It’s sample recognition from individuals who’ve lived by means of sufficient cycles to know what the warning indicators seem like.

Conduct versus capabilities

One of the necessary issues Neal says, and I believe it will get misplaced within the total density of the dialog, is the excellence between behavioral verification and functionality verification.

Behavioral verification is what most groups default to: unit checks, practical checks, integration checks. Does the code do what it’s speculated to do in accordance with the spec? That is the pure match for agentic tooling, as a result of brokers are literally getting fairly good at implementing habits towards specs. Give an agent a well-defined interface contract and a transparent set of acceptance standards, and it’ll produce one thing that broadly satisfies them. That is actual progress.

Functionality verification is tougher. A lot tougher. Does the system exhibit the operational qualities it must exhibit at scale? Is it correctly decoupled? Is the safety mannequin sound? What occurs at 20,000 requests per second? Does it fail gracefully or catastrophically? These are issues that almost all human builders wrestle with too, and brokers have been educated on human-generated code, which implies they’ve inherited our failure modes in addition to our successes.

This brings me to one thing Birgitta Boeckeler raised at QCon London that I haven’t been in a position to cease fascinated about. The instance everybody cites when making the case for AI’s coding functionality is that Anthropic constructed a C compiler from scratch utilizing brokers. Spectacular. However right here’s the factor: C compiler documentation is awfully well-specified and battle-tested over many years, and the check protection for compiler habits is a few of the most rigorous in the whole software program business. That’s as near a solved, well-bounded downside as you may get.

Enterprise software program is sort of by no means like that. Enterprise software program is ambiguous necessities, undocumented assumptions, tacit information residing within the heads of people that left three years in the past, and check protection that exists extra as aspiration than actuality. The hole between “can construct a C compiler” and “can reliably modernize a legacy ERP” will not be a niche of uncooked functionality. It’s a niche of specification high quality and area legibility. That distinction issues enormously for the way we take into consideration the place agentic tooling can safely function.

The present orthodoxy in agentic improvement is to throw extra context on the downside: elaborate context information, structure choice information, tips, guidelines about what to not do. Ford and Newman are appropriately skeptical. Sam makes the purpose that there’s now empirical proof suggesting that as context file measurement will increase, you see degradation in output high quality, not enchancment. You’re not guiding the agent towards higher judgment. You’re simply accumulating scar tissue from earlier disasters. This isn’t distinctive to agentic workflows both. Anybody who has labored significantly with code assistants is aware of that summarization high quality degrades as context grows, and that this degradation is just partially controllable. That has a direct impression on selections revamped time; now shut your eyes for a second and picture doing it throughout an enterprise software program, with many groups throughout completely different time zones. Don’t get me incorrect, the instruments assist, however the assistance is bounded, and that boundary is usually nearer than we’d prefer to admit.

The extra sincere framing, which Neal alludes to, is that we want deterministic guardrails round nondeterministic brokers. No more prompting. Architectural health features, an thought Ford and Rebecca Parsons have been selling since 2017, really feel like they’re lastly about to have their second, exactly as a result of the price of not having them is now instantly seen.

What ought to an agent personal then?

That is the place the dialog will get most attention-grabbing, and the place I believe the sphere is most confused.

There’s a seductive logic to the microservice because the unit of agentic regeneration. It sounds small. The phrase micro is within the title. You may think about handing an agent a service with an outlined API contract and saying: implement this, check it, performed. The scope feels manageable.

Ford and Newman give this concept truthful credit score, however they’re additionally sincere concerning the hole. The microservice stage is engaging architecturally as a result of it comes with an implied boundary: a course of boundary, a deployment boundary, typically an information boundary. You may put health features round it. You may say this service should deal with X load, keep Y error charge, expose Z interface. In principle.

In apply, we barely implement these items ourselves. The brokers have realized from a corpus of human-written microservices, which implies they’ve realized from the overwhelming majority of microservices that had been written with out correct decoupling, with out actual resilience considering, with none rigorous capability planning. They don’t have our aspirations. They’ve our habits.

The deeper downside, which Neal raises and which I believe deserves extra consideration than it will get, is transactional coupling. You may design 5 fantastically bounded providers and nonetheless produce an architectural catastrophe if the workflow that ties them collectively isn’t thought by means of. Sagas, occasion choreography, compensation logic: That is the stuff that breaks actual programs, and it’s additionally the stuff that’s hardest to specify, hardest to check, and hardest for an agent to purpose about. We made precisely this error within the SOA period. We designed pretty little providers after which found that the attention-grabbing complexity had merely migrated into the combination layer, which no person owned and no person examined.

Sam’s line right here is price quoting immediately, roughly: “To err is human, nevertheless it takes a pc to essentially screw issues up.” I think we’re going to supply some genuinely legendary transaction administration disasters earlier than the sphere develops the muscle reminiscence to keep away from them.

The sociotechnical hole no person is speaking about

There’s a dimension to this dialog that Ford and Newman gesture towards however that I believe deserves far more direct examination: the query of what occurs to the people on the opposite facet of this generated software program.

It’s not utterly correct to say that each one agentic work is occurring on greenfield tasks. There are instruments already in manufacturing serving to groups migrate legacy ERPs, modernize previous codebases, and deal with the modernization problem that has defeated standard approaches for years. That’s actual, and it issues.

However the problem in these circumstances isn’t merely the code. It’s whether or not the sociotechnical system, the groups, the processes, the engineering tradition, the organizational buildings constructed across the present software program are able to inherit what will get constructed. And right here’s the factor: Even when brokers mixed with deterministic guardrails might produce a well-structured microservice structure or a clear modular monolith in a fraction of the time it could take a human staff, that architectural output doesn’t routinely include organizational readiness. The system can arrive earlier than the individuals are ready to personal it.

One of many underappreciated features of iterative migration, the incremental strangler fig method, the gradual decomposition of a monolith over 18 months, will not be primarily threat discount, although it does that too. It’s studying. It’s the method by which a staff internalizes a brand new method of working, makes errors in a bounded context, recovers, and builds the judgment that lets them function confidently within the new world. Compress that journey too aggressively and you may find yourself with structure whose operational complexity exceeds the organizational capability to handle it. That hole tends to be costly.

At QCon London, I requested Patrick Debois, after a chat overlaying greatest practices for AI-assisted improvement, whether or not making use of all of these practices persistently would make him snug engaged on enterprise software program with actual complexity. His reply was: It relies upon. That felt just like the sincere reply. The tooling is bettering. Whether or not the people round it are conserving tempo is a separate query, and one the business will not be spending almost sufficient time on.

Current programs

Ford and Newman shut with a topic that just about by no means will get coated in these conversations: the huge, unglamorous majority of software program that already exists and that our society depends upon in methods which can be simple to underestimate.

A lot of the discourse round agentic AI and software program improvement is implicitly greenfield. It assumes you’re beginning contemporary, that you just get to design your structure sensibly from the start, that you’ve clear APIs and tidy service boundaries. The truth is that almost all priceless software program on this planet was written earlier than any of this existed, runs on platforms and languages that aren’t the pure habitat of contemporary AI tooling, and accommodates many years of accrued selections that no person totally understands anymore.

Sam is engaged on a e book about this: easy methods to adapt present architectures to allow AI-driven performance in methods which can be truly secure. He makes the attention-grabbing level that present programs, regardless of their fame, generally offer you a head begin. A well-structured relational schema carries implicit that means about knowledge possession and referential integrity that an agent can truly purpose from. There’s construction there, if you understand how to learn it.

The final lesson, which he states with out a lot drama, is that you would be able to’t simply expose an present system by means of an MCP server and name it performed. The interface will not be the structure. The dangers round safety, knowledge publicity, and vendor dependency don’t go away since you’ve wrapped one thing in a brand new protocol.

This issues greater than it might sound, as a result of the software program that runs our monetary programs, our healthcare infrastructure, our logistics and provide chains, will not be greenfield and by no means shall be. If we get the modernization of these programs incorrect, the results usually are not summary. They’re social. The intuition to index closely on what these instruments can do in very best circumstances, on well-specified issues with good documentation and thorough check protection, is comprehensible. But it surely’s precisely the incorrect intuition when the programs in query are those our lives rely on. The architectural mindset that has served us properly by means of earlier paradigm shifts, the one which begins with trade-offs reasonably than capabilities, that asks what we’re giving up reasonably than simply what we’re gaining, will not be non-compulsory right here. It’s the minimal requirement for doing this responsibly.

What I take away from this

Three issues, principally.

The primary is that introducing deterministic guardrails into nondeterministic programs will not be non-compulsory. It’s crucial. We’re nonetheless determining precisely the place and the way, however the framing must shift: The objective is management over outcomes, not simply oversight of output. There’s a distinction. Output is what the agent generates. End result is whether or not the system it generates truly behaves appropriately below manufacturing circumstances, stays inside architectural boundaries, and stays operable by the people chargeable for it. Health features, functionality checks, boundary definitions: the boring infrastructure that connects generated code to the true constraints of the world it runs in. We’ve had the instruments to construct this for years.

The second is that the individuals saying that is the longer term and the individuals saying that is simply one other hype cycle are each most likely incorrect in attention-grabbing methods. Ford and Newman are cautious to say they don’t know what attractiveness like but. Neither do I. However now we have higher prior artwork to attract on than the discourse normally acknowledges. The ideas that made microservices work, once they labored, actual decoupling, express contracts, operational possession, apply right here too. The ideas that made microservices fail, leaky abstractions, distributed transactions dealt with badly, complexity migrating into integration layers, will trigger precisely the identical failures, simply quicker and at bigger scale.

The third is one thing I took away from QCon London this yr, and I believe it is perhaps an important of the three. Throughout two days of talks, together with periods that took diametrically reverse approaches to integrating AI into the software program improvement lifecycle, one factor grew to become clear: We’re all novices. Not within the dismissive sense however in essentially the most literal software of the Dreyfus mannequin. No one, no matter expertise, has discovered the correct solution to match these instruments inside a sociotechnical system. The recipes are nonetheless being written. The conflict tales that may ultimately develop into the prior artwork are nonetheless occurring to us proper now.

What bought us right here, collectively, was sharing what we noticed, what labored, what failed, and why. That’s how the sphere moved from SOA disasters to microservices greatest practices. That’s how we constructed a shared vocabulary round health features and evolutionary structure. The identical course of has to occur once more, and it’ll, however provided that individuals with actual expertise are sincere concerning the uncertainty reasonably than performing confidence they don’t have. The velocity, in the end, is each the chance and the hazard. The know-how is shifting quicker than the organizations, the groups, and the skilled instincts that want to soak up it. The most effective response to that isn’t to faux in any other case. It’s to maintain evaluating notes.

If this resonated, the full fireplace chat between Neal Ford and Sam Newman is price watching in its entirety. They cowl extra floor than I’ve had house to react to right here. And in case you’d prefer to study extra from Neal, Sam, and Luca, try their most up-to-date O’Reilly books: Constructing Resilient Distributed Techniques, Structure as Code, and Constructing Micro-frontends, second version.

Related Articles

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Latest Articles