Your AI Agent Already Forgot Half of What You Told It – O’Reilly

May 28, 2026


This is the seventh article in a series on agentic engineering and AI-driven development. Read part one here, part two here, part three here, part four here, part five here, and part six here.

This is the latest article in my Radar series on AI-driven development and agentic engineering, and I have to admit that this one took a turn I wasn’t expecting.

In my last article I talked about context and context management, and I promised to give you some real, practical tips for using it. This article was originally going to be about specific context management techniques that had been really helpful to me while building Octobatch and the Quality Playbook, two open source projects where I work with AIs to plan and orchestrate all the work and every line of code is written by AI tools like Claude Code and Cursor.

But as I was writing this, I realized I’d adapted those same techniques to my work writing articles like this one. Which surprised me! I’ve been doing all this work finding ways to help people developing AI skills improve context management, so their skills run more efficiently. It turns out that those exact same techniques apply to anyone using AI tools, even when you’re using chatbots like Claude.ai or ChatGPT.

Full disclosure: I use several AI tools to manage this article series. My primary tools are Claude Cowork for brainstorming and managing my article research, notes, and backlog, and Gemini’s mobile app for reading drafts aloud and taking my notes while I’m away from my desk. And I want to tell you about something that happened while I was using these tools, because I think it really helps show why context management isn’t just a problem for developers.

While I was writing this article, I was using Gemini’s mobile app to read the draft aloud and take my notes. Partway through the session I asked it to go back and check whether there were earlier notes it hadn’t included yet. It told me it didn’t have access to the earlier notes, which seemed bizarre, since we had taken those notes just a few prompts earlier in the same session. I could scroll back up and see them in the conversation, but somehow it didn’t “know” about them.

Here’s what happened. Gemini had compacted our conversation without telling me, and the notes from the first half of the session were just… gone.

If you’ve ever had an online chat AI seem to forget things you mentioned earlier, you’ve experienced context compaction, just like I did. Understanding even the basics of context and context windows can make a big difference in preventing that kind of frustration.

This all reminded me of something I wrote more than 20 years ago in Applied Software Project Management (back in 2005!): “Important information is discovered during the discussion that the team will need to refer back to throughout the development process, and if that information is not written down, the team will have to have the discussion all over again.”

Jenny Greene and I wrote that about human teams and project meetings, but it applies to AI sessions just as well.

Which brings me back to context, which I wrote about in my last article and will write more about in the next one, because it’s one of the most important concepts to keep top of mind when working with AI.

Context loss may be invisible, but that doesn’t make it any less frustrating

Context is everything the AI is holding in its working memory during a conversation: what you’ve told it, what it’s told you, any files or instructions it’s read, and whatever internal notes the system has made along the way. All of that lives in a fixed-size context window (think of it as your AI’s short-term memory, the stuff it’s thinking about right now), and when the window fills up, the AI has to start letting things go. Different tools handle this differently: Some truncate older messages, some compress the conversation into a summary (which means details get lost even though the summary looks complete), and some just start behaving inconsistently, so you can’t tell whether the AI forgot something or never understood it in the first place. The result is the same: The AI loses track of things you told it, decisions you made together, or details it noticed earlier in the session. And it won’t tell you it forgot. It’ll just keep producing confident-sounding output based on whatever it still has.
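To make the “letting things go” part concrete, here is a toy model of a fixed-size context window in Python. It’s a minimal sketch under loud assumptions: it counts whitespace-separated words as a stand-in for tokens, and it implements only one of the strategies above, oldest-first truncation. Real tools tokenize differently and may summarize instead. The ContextWindow class is hypothetical and doesn’t correspond to any tool’s API.

```python
from collections import deque


class ContextWindow:
    """Toy fixed-size context window with oldest-first truncation.

    Assumption: one whitespace-separated word counts as one token.
    Real tools tokenize differently and may summarize instead of dropping.
    """

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.messages: deque[str] = deque()
        self.dropped: list[str] = []  # bookkeeping real chat tools don't show you

    @staticmethod
    def _tokens(text: str) -> int:
        return len(text.split())

    def used(self) -> int:
        return sum(self._tokens(m) for m in self.messages)

    def add(self, message: str) -> None:
        self.messages.append(message)
        # Evict the oldest messages until the window fits again.
        # Nothing is reported to the user: the loss is silent.
        while self.used() > self.max_tokens and len(self.messages) > 1:
            self.dropped.append(self.messages.popleft())


window = ContextWindow(max_tokens=12)
window.add("note one: rename the backlog file")
window.add("note two: move part four earlier")
window.add("note three: check the 2005 quote")
print(list(window.messages))
print(window.dropped)
```

The third note pushes the window over budget, so the first note is evicted. Nothing in the remaining messages hints that it ever existed; you only see the loss here because the sketch records evictions in dropped.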

Before we dive in a bit deeper, I want to do a quick jargon check. If you’ve seen the terms “skills” and “agents” floating around but aren’t sure what they are, think of skills as libraries for AIs and agents as interactive executables. These aren’t perfectly precise definitions, but if you’re a developer they’re close enough for this discussion.

When you’re coding skills and agents, you run into context problems quickly. The work you’re asking the AI to do is often complex enough that the context window fills up, and the AI has to start compacting: compressing or dropping older parts of the conversation to make room for new ones. Compaction always seems to happen at the most frustrating and inconvenient time, which makes sense when you think about it. You hit context limits precisely when you’ve put the most information into the conversation, which is exactly when losing that information costs you the most.

That’s why I think it often helps to think of AIs as having the same shortcomings that human teams do, except those shortcomings are exaggerated by their AI nature. A person who forgets something from a meeting last week might remember it when you remind them. An AI that lost something to context compaction won’t, because the information is gone. But there’s something you can do about it, and it turns out the techniques that help are the same whether you’re building autonomous AI skills or just trying to get a chatbot to remember what you told it 20 minutes ago.

I’ve landed on four techniques that I come back to again and again. Each one exists because at some point the AI forgot something important and I responded by putting that thing in a file where it couldn’t be forgotten. None of them require special tooling. And to my surprise, all of these techniques have turned out to be useful both for building software and for managing a writing project like this one, whether I’m chatting with Claude, ChatGPT, or Gemini, or using a desktop tool like Claude Cowork or Codex. These are the techniques I find most helpful:

Split discovery from documentation: Don’t ask the AI to figure something out and produce polished output in the same pass.
Use handoff documents, not continuation prompts: Before closing a stale session, have the AI write down everything the next session needs to know.
Give the AI an acceptance criterion, not a procedure: Tell it what “done” looks like instead of spelling out the steps.
Use spec documents as the bridge between AI tools: Make a shared document the single source of truth that all your tools read from.

Split discovery from documentation

When you ask an AI to do something complex, you’re often asking it to do two things at once without realizing it: figure something out and produce polished output at the same time. The problem is that figuring things out takes attention, producing output takes attention, and the model only has so much of it. When you combine both tasks in the same prompt, the model starts cutting corners on one of them, and you can’t tell which one it shortchanged.

I ran into this with the Quality Playbook, an open source AI coding skill I built that runs structured code reviews against any codebase. One of the things it does is derive requirements from source code: It reads through the code, identifies what the code promises to do (I call these behavioral contracts), and then produces a requirements document. Originally this all happened in a single pass. The problem was that single-pass requirement generation ran out of attention after about 70 requirements. The model forgot behavioral contracts it had seen earlier in the code, and the forgetting was completely invisible. There was no stack trace or error message, just incomplete output and no way to know what was missing. I fixed it by splitting the work into two separate prompts:

Read each source file and write down every behavioral contract you observe as a simple list in CONTRACTS.md.

Read CONTRACTS.md and the documentation, then derive requirements from them and write REQUIREMENTS.md.

Then a third pass checks whether every contract has a corresponding requirement, and if there are gaps, goes back to the first step for the files with gaps.

The key idea is that CONTRACTS.md is external memory. When the model “forgets” a behavioral contract it noticed earlier, that forgetting is usually invisible. With a contracts file, every observation is written down before any requirements work begins, so an uncovered contract is a visible, greppable gap. You can see what was forgotten and fix it.
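The three passes can be sketched in Python. This is an illustration under stated assumptions, not the Quality Playbook’s actual implementation: the model calls are injected as plain functions (observe and derive are hypothetical stand-ins for the two prompts), each contract gets an ID like C1, and each requirement line is assumed to cite the contract IDs it covers in square brackets. The coverage check is ordinary string matching, which is exactly what makes the gap greppable.

```python
import re
import tempfile
from pathlib import Path
from typing import Callable

# Assumed file formats (hypothetical, for illustration only):
#   CONTRACTS.md:    "- C1: rejects empty input (parser.py)"
#   REQUIREMENTS.md: "- R1: Reject empty input. [C1]"
CONTRACT_RE = re.compile(r"^- (C\d+):", re.MULTILINE)
CITATION_RE = re.compile(r"\[(C\d+(?:,\s*C\d+)*)\]")


def discovery_pass(sources: dict[str, str],
                   observe: Callable[[str, str], list[str]],
                   out: Path) -> None:
    """Pass 1: write down every observed contract before any requirements work."""
    lines, n = [], 0
    for name, code in sources.items():
        for contract in observe(name, code):
            n += 1
            lines.append(f"- C{n}: {contract} ({name})")
    out.write_text("\n".join(lines) + "\n")


def documentation_pass(contracts: Path, derive: Callable[[str], str], out: Path) -> None:
    """Pass 2: derive requirements from the written contracts, not from memory."""
    out.write_text(derive(contracts.read_text()))


def uncovered_contracts(contracts: Path, requirements: Path) -> list[str]:
    """Pass 3: list contract IDs that no requirement cites."""
    ids = CONTRACT_RE.findall(contracts.read_text())
    cited: set[str] = set()
    for group in CITATION_RE.findall(requirements.read_text()):
        cited.update(part.strip() for part in group.split(","))
    return [cid for cid in ids if cid not in cited]


# Demo with stub "model" functions; the stub derive() forgets contract C3.
with tempfile.TemporaryDirectory() as tmp:
    contracts_md = Path(tmp) / "CONTRACTS.md"
    requirements_md = Path(tmp) / "REQUIREMENTS.md"
    discovery_pass(
        {"parser.py": "...", "cache.py": "..."},
        lambda name, code: ["rejects empty input"] if name == "parser.py"
        else ["evicts oldest entry", "never returns stale data"],
        contracts_md,
    )
    documentation_pass(
        contracts_md,
        lambda text: "- R1: Reject empty input. [C1]\n- R2: Evict oldest entry. [C2]\n",
        requirements_md,
    )
    gaps = uncovered_contracts(contracts_md, requirements_md)
print(gaps)
```

Because the stub skipped C3, the check reports it by ID. Going back to the first pass for the file that produced C3 is the loop described earlier; the sketch leaves that step out.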

The principle: Don’t ask the AI to figure out what exists and write formatted output in the same pass. The model runs out of attention trying to do both at once. Whenever you’re asking an AI to do something complex, consider whether you’re actually asking it to do two things at once. “Analyze this codebase and write a report” is two tasks. “Read this document and suggest improvements” is two tasks. Split them, and let the first pass write its observations to a file before the second pass starts working with them.

Use handoff documents, not continuation prompts

Anyone who’s spent a long session with an AI coding tool has felt the moment when the context starts to go stale. The AI stops tracking details it was handling fine an hour ago, or it contradicts something it said earlier. The session gets sluggish, and you often end up restarting because the AI seems to have gotten bogged down and mixed up about what you told it. You get the sense that if you keep going, you’ll spend more time correcting it than making progress.

Most developers respond to an overlong session in one of two ways: They push through, or they start a fresh session and try to reexplain everything from scratch. Both approaches can cause the AI to lose context. The first loses it to compaction; the second loses it to incomplete reexplanation. And both are frustrating! Especially since you just spent so much time building up all that context with the AI.

There’s a third option. Before you close the session, ask the AI to write a handoff document: a file that captures everything the next session needs to know, written while the current session still has full context. The key is that you’re asking the AI to write this while the relevant details are still fresh in its working context, and in a form that it or another AI can read.

I built this into the Quality Playbook as a core part of how phases communicate. When I split the playbook from a single prompt into independent phases, I needed each phase to run as a fully independent session with no context carryover. So each phase got its own kickoff prompt as a standalone file. Here’s the instruction each one is built around:

Write a handoff document that a fresh session could use to pick up this work cold. Include everything it would need to know.

Every kickoff opens with what prior phases accomplished, includes explicit boundaries about what’s frozen, and names which future phase owns each piece of remaining work, because without that, the AI will helpfully start doing Phase 3 work while you’re still in Phase 2. Each phase also ends with a required forward-looking handoff where the finishing agent writes down what the next session needs to know.
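Here is one way to assemble a kickoff prompt with that structure, sketched in Python. The Kickoff record, its field names, the section headings, and the HANDOFF.md filename are all my assumptions for illustration, not the Quality Playbook’s actual kickoff format.

```python
from dataclasses import dataclass, field


@dataclass
class Kickoff:
    """Hypothetical structure for a standalone phase kickoff prompt."""
    phase: int
    goal: str
    completed_by_prior_phases: list[str]
    frozen: list[str]  # artifacts this phase must not modify
    owned_later: dict[str, int] = field(default_factory=dict)  # work item -> owning phase

    def render(self) -> str:
        lines = [f"# Phase {self.phase} kickoff", "", f"Goal: {self.goal}", "",
                 "## What prior phases completed"]
        lines += [f"- {item}" for item in self.completed_by_prior_phases]
        lines += ["", "## Frozen (do not modify)"]
        lines += [f"- {item}" for item in self.frozen]
        # Naming the owner of future work keeps this phase from doing it early.
        lines += ["", "## Not your job (owned by a later phase)"]
        lines += [f"- {item}: Phase {p}" for item, p in self.owned_later.items()]
        # The required forward-looking handoff closes every phase.
        lines += ["", "## Before you finish",
                  f"Write HANDOFF.md with everything Phase {self.phase + 1} needs to know."]
        return "\n".join(lines) + "\n"


prompt = Kickoff(
    phase=2,
    goal="Derive requirements from CONTRACTS.md.",
    completed_by_prior_phases=["Phase 1 wrote CONTRACTS.md"],
    frozen=["CONTRACTS.md"],
    owned_later={"Code review against requirements": 3},
).render()
print(prompt)
```

Writing the kickoff to a file, rather than typing it into a chat, is what lets each phase start in a brand-new session.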

The principle: Each handoff is a complete state snapshot. The incoming AI agent never needs to read prior kickoff prompts or chat history. Everything it needs is in the current handoff file: current state, uncommitted changes, immediate next task, pending tasks, file locations, and anything that was discovered during the prior session. A fresh AI session can pick it up cold.
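The complete-snapshot rule lends itself to a mechanical check before you close a stale session. This Python sketch assumes the handoff is Markdown with one level-two heading per item in the list above (the heading names are my assumption) and reports which sections are missing or empty, so you can ask the AI to fill them in while it still has the context.

```python
# One heading per item the principle above says a handoff must carry.
# The heading names are assumptions; match them to your own handoffs.
REQUIRED_SECTIONS = [
    "Current state",
    "Uncommitted changes",
    "Immediate next task",
    "Pending tasks",
    "File locations",
    "Discoveries",
]


def missing_sections(handoff_md: str) -> list[str]:
    """Return required sections that are absent or have no content under them."""
    bodies: dict[str, list[str]] = {}
    current = None
    for line in handoff_md.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            bodies[current] = []
        elif current is not None and line.strip():
            bodies[current].append(line)
    return [s for s in REQUIRED_SECTIONS if not bodies.get(s)]


draft = """## Current state
Phase 2 done; REQUIREMENTS.md written.
## Uncommitted changes
None.
## Immediate next task
Start the Phase 3 review.
## Pending tasks
## File locations
CONTRACTS.md and REQUIREMENTS.md in the repo root.
"""
print(missing_sections(draft))
```

Here the draft has an empty “Pending tasks” section and no “Discoveries” section at all, which are exactly the gaps a fresh session would otherwise trip over.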

If you’re deep into a Claude Code or Copilot session and you can feel the context getting stale, ask the AI to write a handoff document before you close the session. Tell it to include everything a fresh session would need to continue the work. Then start a new session and point it at that file. A fresh session with a good handoff document will usually outperform a stale one, because it starts with clean context instead of compacted, fragmented context.

Give the AI an acceptance criterion, not a procedure

When you give an AI a multistep task, the natural instinct is to spell out the steps: First do this, then do that, then combine the results. The problem is that step-by-step procedures are the first thing the AI forgets when the context window fills up. It’ll skip steps, merge phases, or quietly drop tasks, and there’s nothing in the procedure itself that can help the AI notice what it missed. The procedure tells the AI what to do, but it doesn’t tell the AI what “done” looks like.

I learned this the hard way with the Quality Playbook. The playbook runs multiple iteration passes over a codebase, and the results need to be cumulative. It keeps a list of all the bugs it finds in the code under review in a file called BUGS.md. Early on, I gave the AI a procedure for running four iteration passes and then updating that file:

First run the initial pass, then run 4 iteration passes, then merge the findings into BUGS.md.

The AI didn’t respond well to that instruction.

It turns out that when you ask an AI to do a very complex task a specific number of times, it can lose count. In fact, from my experiments, count seems to be one of the first casualties of context compaction. Most of the time the AI decided three iterations was enough, or merged findings from only two passes, and no matter how many different ways I rephrased that instruction, nothing I came up with prevented the problem.

However, everything changed when I replaced the “run 4 times” instruction with an acceptance criterion, a specific condition that tells the AI when to stop looping:

You are done only when BUGS.md contains the cumulative findings from the initial run plus all 4 iteration passes.

Even when the AI lost track of intermediate steps, it could check the output against the criterion and know whether it was finished. And I could verify the output against the same criterion, which gave me a way to audit the agent’s work without watching every step.

In developer terms, the AI is really bad at loops like for (i = 0; i < 4; i++) because it loses track of the value of the iterator i when it compacts its context. But it’s really good at loops like while (!done) because it can check done against the current state without relying on history.
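The difference between those two loops can be sketched in Python. This is an illustration under assumptions, not the playbook’s code: run_pass is a hypothetical stand-in for one model pass, and BUGS.md is assumed to record each completed pass under its own heading, such as “## Pass 2”. The point is that is_done() inspects the file as it exists right now, so the loop never depends on a counter that compaction could corrupt.

```python
import tempfile
from pathlib import Path

REQUIRED_PASSES = ["Initial run", "Pass 1", "Pass 2", "Pass 3", "Pass 4"]


def completed_passes(bugs_md: Path) -> set[str]:
    """Read which passes are already recorded, using only the file on disk."""
    if not bugs_md.exists():
        return set()
    return {line[3:].strip() for line in bugs_md.read_text().splitlines()
            if line.startswith("## ")}


def is_done(bugs_md: Path) -> bool:
    """The acceptance criterion: every required pass has findings in BUGS.md."""
    return set(REQUIRED_PASSES) <= completed_passes(bugs_md)


def run_pass(name: str, bugs_md: Path) -> None:
    """Hypothetical stand-in for one model pass; appends its findings."""
    with bugs_md.open("a") as f:
        f.write(f"## {name}\n- (findings from {name})\n")


def run_until_done(bugs_md: Path) -> None:
    # while (!done): the next step comes from the current state, not from history.
    while not is_done(bugs_md):
        recorded_now = completed_passes(bugs_md)
        next_pass = next(p for p in REQUIRED_PASSES if p not in recorded_now)
        run_pass(next_pass, bugs_md)


with tempfile.TemporaryDirectory() as tmp:
    bugs = Path(tmp) / "BUGS.md"
    run_pass("Initial run", bugs)
    run_pass("Pass 1", bugs)  # pretend the session was compacted or closed here
    run_until_done(bugs)      # a fresh loop picks up at Pass 2
    recorded = completed_passes(bugs)
    text = bugs.read_text()
print(sorted(recorded))
```

Restarting the loop from scratch is safe: it reruns nothing that’s already recorded and stops at the same criterion you can verify by reading the file yourself.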

The principle behind all this is that an acceptance criterion survives context pressure because the AI can always check “Am I done?” against a concrete test. This is the same principle behind test-driven development: Write the test before the code so you know when you’re done. The acceptance criterion is the test for your AI session. When you’re giving an AI a task that has multiple steps, don’t describe the steps. Describe what “done” looks like, and let the AI figure out how to get there.

Use spec documents as the bridge between AI tools

Most developers working with AI don’t use just one tool. You might use Claude for design, Cursor for coding, and Copilot for quick edits. You might even use multiple models within the same tool, like GPT-5.5 and Opus 4.7 in separate Copilot chats inside VS Code. It’s common to have one model for coding, another for review, and a third for orchestration and project management. The problem is that none of these tools or chats know what you told the others. Claude doesn’t know what you decided with Cursor. Two separate Copilot chats in the same editor don’t share context. You’re the one carrying context between them, and that’s exactly the kind of lossy handoff that causes drift. A design decision you made in one conversation gets lost or distorted by the time it reaches the tool that has to implement it.

The fix is to make the spec document the single source of truth that all your AI tools read from. I did this when building a game prototype, where I had Claude handling design and planning and Cursor doing the coding. They never talked to each other directly, so the spec documents served as the shared contract: Claude wrote the specs, and Cursor read them. The rule I followed was simple:

Never tell the AI coder something that isn’t already in the specs. If you make a design decision in conversation, write it into the spec first, then point the coder at the spec.

If I made a design decision in a conversation with Claude, that decision had to be written into the spec before I told Cursor about it. If I discovered something during implementation, I wrote it into the appropriate document first, then pointed the coder at it. The spec was always the single source of truth. When Claude and I changed the wound topology (removing one wound type, promoting another), we updated the docs first, then told Cursor to reread them. When we decided to add a new UI element, we wrote it into the UI spec first, then told Cursor to reread that document.
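If you drive the coding tool from a script, the spec-first rule can be enforced with a small guard. In this Python sketch, the function name, the UI_SPEC.md filename, the prompt wording, and the example decision are all hypothetical. A decision is only passed along if it already appears in the spec; otherwise the guard refuses, which is the cue to update the spec first. Exact substring matching is a deliberately crude assumption: the point is the ordering, not the matching.

```python
import tempfile
from pathlib import Path


def coder_instruction(decision: str, spec: Path) -> str:
    """Build a prompt for the AI coder that points at the spec, never around it."""
    if decision not in spec.read_text():
        raise ValueError(f"Write this into {spec.name} first: {decision!r}")
    # The coder is told to reread the spec rather than being handed the decision.
    return f"Reread {spec.name} and implement the changes it now describes."


refused = False
with tempfile.TemporaryDirectory() as tmp:
    ui_spec = Path(tmp) / "UI_SPEC.md"
    ui_spec.write_text("# UI spec\n")
    decision = "Add a settings button to the title screen."  # made-up example
    try:
        coder_instruction(decision, ui_spec)  # not in the spec yet
    except ValueError as err:
        refused = True
        print(err)
    ui_spec.write_text(f"# UI spec\n- {decision}\n")  # spec first...
    prompt = coder_instruction(decision, ui_spec)     # ...then the coder
print(prompt)
```

Note that the generated prompt never contains the decision itself. The coder only learns about it by reading the spec, so the spec stays the single source of truth.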

The key was including rationale in the specs. Not just “show five progressive labels” but why: “The player isn’t told what they’re fighting. They should discover it.” This helps the AI coder make better decisions when the spec doesn’t cover an edge case, because it knows the intent behind the requirement.
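Rationale is also easy to check for mechanically if every spec entry pairs a requirement with a “Why:” line. The sketch below assumes that convention (it’s my assumption, not a standard) and flags entries that say what to build without saying why. The first entry is based on the example above; the second is a made-up entry that’s missing its rationale.

```python
from dataclasses import dataclass


@dataclass
class SpecEntry:
    requirement: str
    rationale: str = ""  # the intent the coder falls back on for edge cases

    def to_markdown(self) -> str:
        why = self.rationale or "TODO: explain why"
        return f"- {self.requirement}\n  Why: {why}"


def entries_missing_rationale(entries: list[SpecEntry]) -> list[str]:
    """Return the requirements that don't say why they exist."""
    return [e.requirement for e in entries if not e.rationale.strip()]


spec = [
    SpecEntry("Show five progressive labels.",
              "The player isn't told what they're fighting. "
              "They should discover it."),
    SpecEntry("Add a settings button to the title screen."),  # hypothetical
]
print("\n".join(e.to_markdown() for e in spec))
print(entries_missing_rationale(spec))
```

Rendering a visible TODO instead of silently omitting the Why: line keeps the missing intent in front of whoever, human or AI, reads the spec next.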

The principle: The spec document is the shared context that all your tools can read. It prevents the drift that happens when design intent lives only in chat history that the other tools can’t see. This approach works any time you’re using more than one AI tool on the same project, which at this point is most projects.

How these techniques combine: Managing this article series

These four practices came out of AI-driven development work, but they apply to almost any AI work. And while they emerged for me while working on agents and skills, I think it’s useful to demonstrate them in a nondevelopment context, so I’ll share an example from my work on the article series you’re reading now.

Over time, the process my AI assistant and I use to manage this article backlog evolved organically in conversation, but it was never written down anywhere except in the AI’s context window. That meant every time the session compacted or I started a fresh chat, the process was gone and I had to reexplain it. I caught this when the AI did something slightly wrong and I wanted to make sure we were on the same page. So I asked:

Every time I suggest a new article idea, you add an entry to the backlog, and then create a new markdown file with the source material, right?

That’s splitting discovery from documentation. I didn’t say “document our process.” I said “confirm what we do.” Discovery first, then documentation as a separate step. If I’d said “write up our process” without confirming first, the AI might have written something plausible but wrong, and I wouldn’t have caught the discrepancy.

Once we’d confirmed the process, I asked the AI to create two files. AGENTS.md is an emerging standard for AI-readable project context: a single file that tells any AI session what it needs to know about a project. You can learn more about the convention at agents.md. CONTEXT.md serves a similar role as a bootstrapping document. It’s less established as a standard, but the practice of asking the AI to dump everything it knows into a context file so the next session can pick it up cold has been one of the most useful habits I’ve developed. Here’s the prompt I used:

Update the backlog file to explain what it is and how we maintain it. Create a CONTEXT.md with everything you’d need to bootstrap a new chat. Create an AGENTS.md to make it easy to bootstrap with a single-line prompt.

That prompt asks for a handoff document. I was explicitly asking the AI to write down everything it knew while it still had full context, specifically because I knew that context would be lost to compaction. The CONTEXT.md file is a handoff from this session to whatever fresh session picks up the work next week.

Notice what I didn’t say. I didn’t give step-by-step instructions for what should go in those files. I said “everything you would need to bootstrap this process again in case we lost it” and “a complete dump of all the context you would need to bootstrap a new chat and get it to the point where this current chat is.” These are acceptance criteria, not procedures. The AI had to figure out what belonged in those files. If I’d given it a procedure (“first write the publication history, then the voice rules, then the file locations”), it would have followed the list and missed anything I forgot to include. The acceptance criterion is harder to satisfy but more robust: The test is “Could a fresh session bootstrap from these files alone?”

And the AGENTS.md file itself puts a spec document to work as the bridge between tools. It’s the shared contract that any AI session, whether it’s Claude, Gemini, Cowork, or a fresh chat, can read to get aligned with the project. This session wrote it; the next session reads it. The two sessions never communicate directly, so the file bridges the gap between them.

That’s all four practices in two prompts, applied to something as ordinary as managing a writing project. It didn’t require pipelines or codebases or batch orchestration. The practices work because they solve the same underlying problem regardless of the domain: important information living in the AI’s context window instead of on disk.

Context management is a development skill

Every practice I’ve described in this article and the last one is something developers have always been told to do: Write things down, record your rationale, be deliberate about what you save and what you let go, and write ADRs, design docs, and inline comments explaining nonobvious choices. We’ve always known we should do more of it. When you’re working with AI, the cost of not doing it becomes immediate and visible.

The practices in this article all come down to the same thing: putting important information in files where compaction can’t touch it, so you can see what the AI knows and verify that it matches reality. In the next article, I’ll go deeper on the debugging angle: how to use externalized files to understand what your AI is actually doing, with practical techniques that work even if you’re not building agents and are just using a chatbot.

The Quality Playbook is open source and works with GitHub Copilot, Cursor, and Claude Code. It’s also available as part of awesome-copilot.

Disclosure: Aspects of the approach described in this article are the subject of US Provisional Patent Application No. 64/044,178, filed April 20, 2026, by the author. The open source Quality Playbook project (Apache 2.0) includes a patent grant to users of that project under the terms of the Apache 2.0 license.
