Cat Wu leads product for Claude Code and Cowork at Anthropic, so she’s well-versed in building dependable, interpretable, and steerable AI systems. And since 90% of Anthropic’s code is now written by Claude Code, she’s also deeply familiar with fitting these tools into routine day-to-day work. Last month, Cat joined Addy Osmani at AI Codecon for a fireside chat on the future of agentic coding and, equally important, agentic code review: how Anthropic actually uses the tools it’s building, and which skills matter now for developers.
The feedback loop is itself a product
Boris Cherny originally built Claude Code as a side project to test Anthropic’s APIs. Then he shared the tool in a notebook, and within two months the entire company was using it. That organic growth, Cat said, was part of what convinced the team it was worth releasing externally.
But what really made that internal adoption visible was the response on Anthropic’s internal “dog-fooding” Slack channel. The Claude Code channel gets a new message every 5 to 10 minutes around the clock, and this feedback directly and immediately informs the product experience. Cat described it this way:
We hire for people who love polishing the user experience. And so a lot of our engineers actually live in this channel and notice when there’s issues with new features that they’ve worked on and they proactively lay out the fixes.
The team ships new versions of Claude Code to internal users many times a day. The feedback loop is tight enough that it functions as a continuous integration system for product quality, not just code quality.
Cat told Addy how she once accidentally introduced a small interaction bug between prompts and auto-suggestions. But by the time she started working on a fix, she found another team member had already beaten her to it. It turns out he had set up a scheduled task in Claude Code to scan the feedback channel for anything that hadn’t been responded to in 24 hours and open a PR for it. Since Cat hadn’t gotten to it yet (whoops!), her teammate’s Claude saw the unaddressed issue and fixed it for her. And Cat only found out when “[her own] Claude noticed that his Claude had already landed a change.”
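The interview doesn’t describe how the scheduled task works internally, but the triage logic it implies is simple: collect channel messages, keep the ones with no reply after 24 hours, and hand each one to an agent. A minimal sketch of that filtering step, with stand-in message data rather than a real Slack integration:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class Message:
    text: str
    posted_at: datetime
    replies: list = field(default_factory=list)

def stale_reports(messages, now, max_age=timedelta(hours=24)):
    """Return feedback messages that have gone unanswered past max_age."""
    return [m for m in messages if not m.replies and now - m.posted_at > max_age]

now = datetime(2025, 6, 2, 12, 0, tzinfo=timezone.utc)
channel = [
    Message("Autosuggest flickers when a prompt is edited", now - timedelta(hours=30)),
    Message("Dark mode contrast issue", now - timedelta(hours=2)),
    Message("Crash on paste", now - timedelta(hours=48), replies=["on it"]),
]

for report in stale_reports(channel, now):
    # In the workflow Cat describes, each stale report would be passed to an
    # agent instructed to investigate and open a PR.
    print(report.text)  # prints "Autosuggest flickers when a prompt is edited"
```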
The infrastructure for rapid improvement, in other words, is now partly automated. The agents are writing the code, then monitoring the feedback and closing the loop.
The bottleneck has shifted to review
There’s no question that AI-assisted coding has created a boom in output. Anthropic engineers are producing roughly 200% more code than they were a year ago, Cat noted. Today the main constraint is reviewing all that code to make sure it’s production-ready.
Cat’s team concluded that you can buy a lot of extra robustness for not that much extra cost.
We opted for the heaviest, most robust version [of code review]. We actually plot how many agents and how comprehensive of a review Claude does and then how many bugs does it recall. And we picked a range of very high recall and decided we should ship this, because if you actually want AI code review to be a load-bearing part of your process, you actually probably just want the most comprehensive possible review.
The review agent doesn’t just look at the diff. It traces code across multiple files and catches bugs in adjacent code that has nothing to do with the change in question. Cat gave two examples. One was a ZFS encryption refactor where the agent found a key cache invalidation bug that wasn’t related to the author’s change at all but would have invalidated it. The other was a routine auth update that turned out to have a bad side effect, caught premerge. In both cases, engineers manually reviewing the code likely would have missed the bugs.
The human review that remains is deliberately small in scope. For most PRs, the human reviewer skims for design principle violations and obvious problems and assumes functional correctness has been handled. Five to ten agents run in parallel, each given slightly different tasks, returning independently and then deduplicating what they found.
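The fan-out-and-deduplicate pattern Cat describes can be sketched in a few lines. Everything here is illustrative: `review` is a hypothetical stand-in for a real agent call, and the hardcoded findings exist only to show how independent agents can surface overlapping results that get merged before a human sees them.

```python
from concurrent.futures import ThreadPoolExecutor

def review(focus, diff):
    """Hypothetical agent call: review the same diff with a given focus,
    returning (file, line, issue) findings. Canned results for illustration."""
    findings = {
        "security": [("auth.py", 42, "token not invalidated on logout")],
        "concurrency": [("cache.py", 17, "key cache not invalidated on rekey")],
        # A second agent independently finds the same cache bug:
        "style": [("cache.py", 17, "key cache not invalidated on rekey")],
    }
    return findings.get(focus, [])

def run_review(diff, focuses):
    # Fan out: one reviewer per focus area, all running in parallel.
    with ThreadPoolExecutor(max_workers=len(focuses)) as pool:
        results = pool.map(lambda f: review(f, diff), focuses)
    # Deduplicate: several agents often report the same underlying bug.
    unique = {finding for agent_findings in results for finding in agent_findings}
    return sorted(unique)

print(run_review("...", ["security", "concurrency", "style"]))
# Two distinct findings survive deduplication, not three.
```

The design choice mirrors the interview: varying each agent’s task raises recall, and deduplication keeps the merged report small enough for the lightweight human skim that follows.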
The cultural shift that made this work, though, was ownership. The team moved to a model where the engineer who authors a PR owns it end to end, including postdeploy bugs, and doesn’t lean on peer reviewers to catch errors. “Otherwise,” as Cat pointed out, “you have situations where junior engineers put out a bunch of PRs and then your senior engineers are like drowning in AI-generated stuff where they’re not sure how thoroughly it’s been tested.”
Full ownership meant the AI review had to actually be trustworthy, which drove the decision to go for high recall rather than a lighter touch. That said, engineers are still expected to understand every line of code an agent creates…for now. As Cat explained, it’s the only way to actually prevent “unknown security vulnerabilities and to be able to quickly respond to incidents if they were to happen.”
Everybody’s kind of an engineer now
Cowork, Anthropic’s agent tool for nontechnical users, is the company’s attempt to take what Claude Code does for engineers and bring it to knowledge work more broadly. Cat sketched a picture of someone watching five or six agent tasks running concurrently in a side panel, managing a fleet of agents the way a senior engineer manages a PR queue.
In the nearer term, she’s keeping tabs on the shift toward people using Claude Code to build things for themselves, their teams, or their families that wouldn’t have justified professional development effort or “otherwise been possible.” The prototypical example is the garage project, the family expense tracker, the tool that a small team actually needs but that no SaaS product quite addresses. Cat’s goal and hope is that Claude Code helps people “solve their own problems for themselves” and “stewards a new future of personal software.”
Product taste as the new technical skill
More people building more software is unambiguously good. Boris Cherny has even floated the idea that coding as we know it is “solved.” But what does that mean for the craft of software engineering? Cat’s read of the current moment is more nuanced:
I think pre-AI, the skills that were important were being able to take a spec and implement it well. And I think now the really important skill is product taste. Even for engineers. Can you use code to ingest a huge amount of user feedback? Do you have good intuition about which feature to build to address those needs, because it’s often different than exactly what users are asking you for? And then, when Claude builds it, are you setting the right bar so that what you ship people actually love?
Cat’s not alone in highlighting the importance of taste in a world where code is a commodity. Steve Yegge, Wes McKinney, and many others, myself included, see taste and judgment as a uniquely human value. This has practical implications for how engineers should spend their time now, and for what the next generation needs to learn.
For junior engineers especially, Cat described a progression: Start by using Claude Code to understand the codebase (ask all the “dumb questions” without embarrassment), take those answers to a senior engineer for calibration, and then close the loop by updating the CLAUDE.md with whatever was missing.
Think of Claude Code as your intern that you’re trying to level up. Like, teach it back to Claude. Add a /confirm slash command. Put it in the CLAUDE.md or the agent README. Approach this as senior engineers helping you level up, and then you helping Claude and other agents level up.
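In Claude Code, a custom slash command is just a markdown file under `.claude/commands/`; its contents become the prompt whenever the command is invoked. Cat doesn’t spell out what her `/confirm` command contains, so the checklist below is an illustrative guess at the kind of self-check she’s describing, not Anthropic’s actual file:

```markdown
<!-- .claude/commands/confirm.md (hypothetical contents) -->
Before marking this task done:
1. Re-read the diff and list every behavior change it introduces.
2. Run the test suite and report any failures verbatim.
3. Check CLAUDE.md for project conventions this change might violate,
   and flag anything you're unsure about instead of guessing.
```

Invoking `/confirm` in a session expands to this checklist, and whatever the senior-engineer calibration step surfaces can accumulate in CLAUDE.md the same way, so every future session inherits it.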
The improvement process, in other words, needs to be bidirectional. Engineers get better at using the tools, and the tools get better through the engineers’ accumulated knowledge. And significantly, this process keeps humans firmly in the loop, playing a role that’s “active, continuous, and skilled.”
You can watch Cat and Addy’s full chat, plus everything else from AI Codecon, on the O’Reilly learning platform. Not a member? Sign up for a free 10-day trial, no strings attached.
