The follow of tokenmaxxing seems to be dying out, even earlier than I had an opportunity to write down about it. Good riddance. Burning tokens to create the looks of productiveness was fated to final solely till the accountants discovered about it, and the strictest of all accountants is one’s private checkbook. What received many builders eager about the price of AI was the change in GitHub Copilot’s utilization prices. The price of Copilot went from a month-to-month payment with limitless use to a month-to-month payment that bought a restricted variety of credit, that are used to pay the AI supplier of your alternative. One credit score is equal to US$0.01; while you’ve used up your credit, you may improve your account or pay for added credit as you go.
The query isn’t why this didn’t occur earlier; it’s why this occurred now. Tokenmaxxing is each the creation and sufferer of two large-scale traits in AI. First, beginning with OpenAI, the foremost AI suppliers had been all taking part in a blitzscaling recreation that prioritized consumer development over profitability. Giving AI providers away totally free received you extra customers, and in the long term, scalers would work out how you can earn cash from end-user charges, promoting consumer information, or promoting. This course of inevitably ends in enshittification, and remains to be very a lot the street we’re on.
Second, token utilization exploded late in 2025. The looks of “reasoning fashions,” which use tokens to take care of an inside dialog in the middle of fixing an issue, elevated the variety of tokens used to reply to every immediate. Reasoning tokens are a mannequin’s dialog with itself about potential responses to the immediate, and are sometimes extra quite a few than the immediate and response themselves. Whether or not or not customers see the reasoning course of (typically they don’t), reasoning tokens add to the invoice. They’re steadily counted as “output tokens” as a result of they’re generated by the mannequin, and are costlier than enter tokens.
The looks of brokers additionally multiplied the speed at which customers consumed tokens. In Might, 2025, Simon Willison quoted Anthropic’s Hannah Moran’s definition of an agent: “Brokers are fashions utilizing instruments in a loop.” The Tredence weblog writes: “The agent loop is a repeating cycle during which the AI reads the present information, thinks by way of what it means, chooses an motion, carries it out, checks what occurs and begins over.” For those who’ve ever watched Claude Code, OpenClaw, or every other agent work, a single request can turn out to be many calls to a mannequin, each utilizing lots of of tokens, if not hundreds. Along with the present request, one agent-generated invocation can comprise the duty’s whole accrued context and related paperwork. Between reasoning tokens and brokers, token utilization goes up by an element of lots of.
The rise in token utilization won’t be a difficulty if it leads to issues being solved and duties accomplished extra successfully. But it surely collides with the loss-leader pricing of the blitzscalers; their willingness to function at a loss to achieve management of a market has limits. No matter whether or not the variety of AI customers is rising, the quantity of computation, and subsequently price, per consumer grows as using brokers will increase. Reasoning fashions elevated token utilization; brokers compounded the issue; and that led to cost will increase.1 Microsoft/GitHub doesn’t need to pay Copilot clients’ AI payments. We haven’t but seen across-the-board value will increase from the AI suppliers themselves. However now we have seen GitHub’s token credit, and now we have seen Anthropic and OpenAI value extra succesful fashions considerably larger than older or much less succesful fashions. Fable is twice as costly as Opus 4.8, and whereas some writers have referred to as this pricing “implausible,” that’s in all probability as a result of they had been anticipating a good better enhance. Whereas Fable can delegate duties to Anthropic’s inexpensive fashions, most early customers observe that with Fable, token use goes up quite than down. Anthropic’s swap to token-based billing for its agent SDK (at the moment on maintain) is one other sign that the times of cheap AI are coming to an finish. OpenAI’s story is analogous: GPT 5.5 prices twice as a lot GPT 5.4 per million tokens.
It’s additionally vital to take capability into consideration. Enormous information facilities have been within the information, however these information facilities haven’t been constructed but. Extra vital, {the electrical} infrastructure wanted to help these information facilities—transmission traces, mills—hasn’t been constructed both, and that’s not an funding over which AI corporations have a lot management. They will construct their very own energy era amenities on a knowledge middle campus, however that’s an enormous funding in applied sciences that they’re not aware of. And even if you happen to generate energy regionally, you want other forms of infrastructure: rail for coal, pipelines for gasoline. This isn’t (but) an essay about information middle energy consumption and its penalties, however it’s one other issue that limits elevated token utilization. We’ve seen Anthropic’s outages blamed on capability, and Anthropic has responded by leasing unused information middle capability from SpaceX. However the different method to reply to elevated demand that may’t be met by present capability is to extend costs, limiting clients to those that can afford to pay. That enhance is being seen by managers, accountants, and unbiased builders.
Token optimization and accountability are the inevitable consequence of upward stress on token value. One method to construct accountability is thru higher governance, a route Bennie Haelen describes in “The Subsidy Ended: What Software-Utilizing Brokers Really Value.” Higher governance is achieved by way of constructing an observability layer that permits you to see precisely what the brokers and fashions are doing. With a well-designed observability layer, you may see whether or not the info despatched to the mannequin is rising with every invocation, whether or not the mannequin is utilizing acceptable instruments, whether or not instruments are being referred to as repeatedly, and numerous different info that may let you know whether or not your agent is working effectively.
One other piece of token accountability is knowing which fashions are working your agent’s requests. Common-purpose reasoning fashions vary from costly high-performance fashions like Claude Fable or Opus 4.8 to fashions like Gemma 4 26B that may run on a well-equipped laptop computer, and a few fashions which might be even smaller. Whereas it’s tempting to say “I would like the very best; I’ll run Opus 4.8 or Fable with most reasoning,” most requests don’t require that degree of reasoning or expense. Brokers will have the ability to determine what mannequin is greatest for processing each request. Fable can delegate, and we anticipate different frontier suppliers to observe as fashions incorporate agent capabilities. And there’s an energetic world of open fashions exterior of the frontier AI suppliers. Vicki Boykis writes that fashions working regionally now work virtually in addition to frontier fashions. Instruments like OpenRouter provide you with a model-independent method of routing requests to totally different fashions, together with open fashions that run regionally. OpenRouter may be built-in with OpenClaw, Claude Code, Cursor, Codex, and different brokers to supply clever routing.
Tokenmaxxing is dying. It can little doubt take time for its vestiges to die away, and there’ll all the time be builders who assume they’ll recreation the trail to a promotion, together with managers who insist on being “all in” with AI. However spending tokens responsibly is now the norm, whether or not you pay with your individual checkbook or an organization account. Token optimization will solely turn out to be extra vital as per-token prices enhance. They undoubtedly will.
