We kicked off our new weekly series This Week in AI on Monday, and we covered a lot of ground in half an hour, including an AI model that found security holes faster than decades of human auditing, a data center in Utah the size of two Manhattans, and a practical argument for why the harness you build around a model now matters more than which model you pick.
Here are a few takeaways from the conversation between host Eric Freeman, faculty member at UT Austin and a longtime friend of O’Reilly, and guest John Berryman, founder of Arcturus Labs, an early production engineer on GitHub Copilot, and coauthor of O’Reilly’s Prompt Engineering for LLMs. Watch the entire episode to find out why you should be building your own agent and why John believes eventually there will be no internet for humans.
AI’s security problem is now a policy problem
You’ve probably already heard about Mythos. Anthropic’s internal testing of the frontier model surfaced thousands of previously unknown security vulnerabilities across major operating systems, browsers, and financial infrastructure, including a 27-year-old bug in OpenBSD. Anthropic chose not to release the model publicly and instead launched Project Glasswing, a limited program giving monitored access to a small group of trusted partners for defensive patching.
That decision moved fast in Washington. In roughly six weeks, the conversation shifted from the light-touch national AI policy released in March to reported White House discussions of an executive order review process modeled on how the FDA handles drugs. Security researcher Bruce Schneier has questioned whether Mythos is uniquely capable here or whether similar results are achievable with cheaper public models, but as Freeman noted (paraphrasing Schneier), either way, it’s a problem that’s coming.
The compute race is getting stranger
Anthropic leased xAI’s entire Colossus 1 supercluster in Memphis: more than 200,000 GPUs and 300 megawatts of power. A month before that deal, Anthropic expanded its agreement with Google and Broadcom for 3.5 gigawatts of capacity coming online in 2027. For context, that’s more than 10 times the power capacity of the Colossus 1 deal, in a single contract. After this episode aired, Anthropic announced that the xAI deal has been expanded to include Colossus 2 as well.
Box Elder County, Utah, just approved a 40,000-acre AI data center called the Stratos project, backed by investor and TV personality Kevin O’Leary (a.k.a. Mr. Wonderful). It’s planned for 9 gigawatts at full buildout. That’s a footprint more than twice the size of Manhattan, powered by the equivalent of nine commercial nuclear reactors. And like many data center deals going forward, including Colossus above, it was approved over local protests.
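For readers who want to check the scale comparisons in this section, here is a back-of-the-envelope sketch. The megawatt, gigawatt, and acreage figures come from the text; Manhattan's land area (about 22.8 square miles) and a 1-gigawatt output per large commercial reactor are outside assumptions added for illustration, not figures from the episode.

```python
# Back-of-the-envelope check of the scale comparisons quoted in this section.
# Values marked "assumed" are NOT from the episode.

COLOSSUS_1_MW = 300          # from the text: Colossus 1 power
GOOGLE_BROADCOM_GW = 3.5     # from the text: capacity coming online in 2027
STRATOS_GW = 9               # from the text: Stratos at full buildout
STRATOS_ACRES = 40_000       # from the text: Stratos footprint

MANHATTAN_SQ_MILES = 22.8    # assumed: approximate land area of Manhattan
ACRES_PER_SQ_MILE = 640      # exact by definition
REACTOR_GW = 1.0             # assumed: rough output of one large reactor

# How many Colossus 1 deals fit into the Google/Broadcom capacity?
power_ratio = (GOOGLE_BROADCOM_GW * 1000) / COLOSSUS_1_MW

# How many Manhattans fit into the Stratos footprint?
manhattan_acres = MANHATTAN_SQ_MILES * ACRES_PER_SQ_MILE
area_ratio = STRATOS_ACRES / manhattan_acres

# How many assumed 1 GW reactors match the planned Stratos power?
reactor_equivalents = STRATOS_GW / REACTOR_GW

print(f"Google/Broadcom vs. Colossus 1 power: {power_ratio:.1f}x")
print(f"Stratos vs. Manhattan area: {area_ratio:.1f}x")
print(f"Stratos in reactor equivalents: {reactor_equivalents:.0f}")
```

Under these assumptions the power ratio lands between 11 and 12 and the area ratio between 2.5 and 3, so both comparisons in the text hold as round numbers.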
Infrastructure at this incredible scale takes years to come online, and the companies making these bets are pricing in a world where model capability keeps scaling. Whether that assumption holds will determine a lot about what’s economically viable to build in the next decade.
The harness matters more than the model
John was on hand to rethink the agent harness, which, as he pointed out, entered a new phase with the step change in model capability that happened in November and December of last year. He took Eric through the arc of AI product development, from document completion and chat loops to tool-calling agents, DAG-based workflows, and now the harness era represented by tools like Claude Code. Each progression added capability, John noted, but also complexity, and each generated a new class of problems around reliability and control. In our current moment, which John has dubbed the “age of the unharnessed agent,” agents are now within reach of everyone, not just software developers.
The payoff of this “unharnessed” era is control. John described a client engagement where he replaced a bespoke application with a skills-driven agent. Now domain experts with no development experience can read the agent’s behavior written in plain English and better understand it. As John explained,
Rather than building a bespoke agent. . ., I just built something that was just the agent harness, the agent, and I just gave it skills that describe what basically I learned in interviewing their experts, how they would work with these agents. And it worked perfectly. Not only does the agent stay on track and do what it needs to do these days, but it’s coded, as far as my client is concerned, in English.
The experts don’t have to complain to developers “this doesn’t work.” The experts can look at the English description of what’s going on and see problems, and maybe even fix it themselves. And I’m really excited to basically give that power into the hands of the people that know best how to change it, the experts.
That’s a different relationship between the experts and the tool than anything a wrapped commercial product offers.
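To make the idea of an agent “coded in English” concrete, here is a minimal sketch of how plain-English skills might be assembled into the instructions a harness hands to a model. It is not John's implementation: the `Skill` class, the `build_system_prompt` function, and the two example skills are all hypothetical, and no model is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """One plain-English skill; every field is prose a domain expert can edit."""
    name: str
    when_to_use: str
    instructions: str


def render_skill(skill: Skill) -> str:
    """Format a single skill as a readable section of the system prompt."""
    return (
        f"## Skill: {skill.name}\n"
        f"Use when: {skill.when_to_use}\n"
        f"{skill.instructions.strip()}"
    )


def build_system_prompt(role: str, skills: list[Skill]) -> str:
    """Join a short role statement with every skill, in the order given."""
    sections = [role.strip()] + [render_skill(s) for s in skills]
    return "\n\n".join(sections)


# Hypothetical skills, standing in for what expert interviews might produce.
SKILLS = [
    Skill(
        name="triage-request",
        when_to_use="a new customer request arrives",
        instructions="Summarize the request in one sentence, then decide "
                     "whether it is routine or needs a human reviewer.",
    ),
    Skill(
        name="escalate",
        when_to_use="the request is not routine",
        instructions="Stop working and write a short handoff note for the reviewer.",
    ),
]

prompt = build_system_prompt("You are an assistant for the operations team.", SKILLS)
print(prompt)
```

The design choice worth noticing is that all of the behavior lives in the skill text. An expert who disagrees with how requests are triaged would edit a sentence in `SKILLS`, not the Python around it.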
As Eric pointed out, recent Stanford research supports this broader point: Performance gaps between a bare model and a well-designed harness now often matter more than which underlying model you’re using. The benchmark that used to dominate buying decisions, which model scores highest, has been displaced by a harder question about which harness fits the task.
John closed with a demo of his personal agent moving from an Obsidian notebook into Wikipedia and back, carrying context across environments. He used it to illustrate a concept he called the “open agent protocol,” his term for a not-yet-existing standard where an agent receives environment-specific skills as it moves between contexts. The protocol doesn’t exist yet, but the demo made the direction clear.
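Since no such protocol exists, the following is only a toy illustration of the idea as described: an agent keeps its own notes as it moves, while each environment supplies the skills that apply there. The `Environment` and `Agent` classes, the `enter` method, and the skill names are all invented for this sketch and do not reflect John's demo or any real standard.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Environment:
    """A hypothetical environment that advertises skills to visiting agents."""
    name: str
    skills: dict[str, str]  # skill name -> plain-English instructions


@dataclass
class Agent:
    """A toy agent: notes persist across environments, skills are swapped."""
    notes: list[str] = field(default_factory=list)
    active_skills: dict[str, str] = field(default_factory=dict)
    location: Optional[str] = None

    def enter(self, env: Environment) -> None:
        """Replace the active skills with the environment's; keep all notes."""
        self.location = env.name
        self.active_skills = dict(env.skills)
        self.notes.append(f"entered {env.name}")


obsidian = Environment("obsidian", {"read-note": "Open a note and summarize it."})
wikipedia = Environment("wikipedia", {"search-articles": "Find articles on a topic."})

agent = Agent()
agent.enter(obsidian)
agent.notes.append("topic of interest: agent harnesses")  # context to carry along
agent.enter(wikipedia)   # skills change, the note above is still there
agent.enter(obsidian)    # back again, with Obsidian's skills restored

print(agent.location, sorted(agent.active_skills), agent.notes)
```

A real standard would also have to cover discovery, trust, and permissions, none of which this sketch attempts.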
What’s next
Join us and a rotating lineup of expert guests for weekly live tool demos and deeper dives into the topics that matter in AI. We’re taking next week off for Memorial Day in the US, but we’ll be back on June 1 with host Andreas Welsch and guests Maya Mikhailov and Doug Shannon to cut through another week of AI headlines and separate what actually drives business value from what looks good in a demo but goes nowhere in production. Our first few episodes are free and open to all if you’d like to attend live: register here.
We’ll continue to share full episodes and post our takeaways here on Radar each Friday. You can also watch or listen on YouTube, Spotify, Apple, or wherever you get your podcasts.
