June 3, 2026 · Blog

Spend on Purpose: Controlling AI Cost in the Agent Era

Kevin Riley · CEO & Co-Founder

A company's accidental $500 million Claude invoice shows how quickly unmonitored AI usage can spiral out of control. Autonomous agents trigger cascading workflows that can consume up to 1,000 times as many tokens as a standard human query, so spreadsheet budgeting can no longer keep up. Organizations need real-time tracking, policy-based model routing, and hard quotas so that AI spend is driven by purpose and proven ROI. Those are the controls actAVA.ai provides.

In May 2026, an AI consultant told Axios that one of their clients spent half a billion dollars on Claude in a single month. The cause was not an expensive model. It was a missing control: nobody had set usage limits on employee licenses, and nobody could see what the spend was buying until the invoice arrived.

This is not an isolated story. Uber's CEO has said there is no clear link yet between rising AI token spend and shipping a useful product. Amazon scrapped an internal AI usage leaderboard after employees ran needless tasks to climb it. The pattern is consistent: organizations are spending heavily on AI and struggling to connect that spend to value.

The lesson is not "AI costs too much." The lesson is that AI spending without attribution, routing, and limits is an infrastructure gap — and that gap is about to get much larger.

THE SCALE SHIFT

A Sidekick Is Predictable. An Agent Is Not.

A person using an AI sidekick generates roughly predictable, visible spend. An agent does not. A single agentic workflow can fan out into hundreds or thousands of model calls and tool invocations. Industry reporting now puts agentic token consumption at up to 1,000× a standard query.

That means the cost-control approach that barely worked for sidekicks — a license count and a quarterly review — breaks completely for agents. You cannot govern a thousand-fold increase in nonlinear, machine-generated spend with a spreadsheet. It has to be instrumented at the point where the spend happens: the agent.
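
To make the scale shift concrete, here is a minimal sketch of how fan-out multiplies cost. Every number in it (the blended price, the tokens per call, the call counts) is an illustrative assumption, not a figure from any vendor or from the reporting cited here.

```python
# Illustrative only: the price and token counts below are assumptions.
PRICE_PER_MILLION_TOKENS = 10.00  # assumed blended price in USD


def run_cost(calls: int, tokens_per_call: int) -> float:
    """Cost in USD of one workflow that makes `calls` model calls."""
    total_tokens = calls * tokens_per_call
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS


# One person asking one question: a single model call.
single_query = run_cost(calls=1, tokens_per_call=2_000)

# One agentic workflow that fans out into 1,000 calls of the same size.
agent_workflow = run_cost(calls=1_000, tokens_per_call=2_000)

print(f"single query:   ${single_query:.2f}")
print(f"agent workflow: ${agent_workflow:.2f}")
print(f"multiplier:     {agent_workflow / single_query:.0f}x")
```

The point of the sketch is that nothing about the model got more expensive. Only the number of calls changed, and a license count cannot see that number.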

$500M: Spent on Claude AI in a single month, accidentally, due to missing usage controls on employee licenses
1,000×: Reported multiplier for agentic token consumption vs. a standard single-turn AI query
0: Clear link identified by Uber's CEO between rising AI token spend and shipping a successful product
TCAC: Total Cost of Agent Capacity, actAVA's per-agent cost metric that ties every dollar to the work it performed

A NEW CATEGORY OF SPEND

Non-Human Resources Need a Salary Line

This concept goes well beyond a simplistic ROI measure. Agents are digital co-workers that organizations pay a salary — in token cost — to either perform work or augment a human worker. CFOs justify and compare every salary dollar spent on their human workforce. They must start thinking the same way for their non-human counterparts.

This post is the first in a series we are launching to help CFOs understand how to properly budget for the proliferation of their non-human resources. The framing matters: you are not buying software licenses. You are paying wages to a workforce that scales nonlinearly, works continuously, and generates a bill with every action it takes.

Spend without judgment is the real risk. The $500M story is not a story about a company that paid too much for AI. It is a story about a company that handed a corporate card to a workforce with no approval flow, no attribution, and no statement until the bill arrived. That is a governance failure wearing a technology costume.
FOUR MOVES

Four Moves That Put You in Control

At the highest level, here is how to think about getting a return on your agents. These are not features to buy. They are structural practices that separate governed AI spend from ungoverned AI spend.

01. Reframe the problem

The risk is not the size of the bill. It is spend you cannot see, route, or tie to an outcome. Fixing visibility comes first — everything else builds on it.

02. Attribute every dollar at the agent level

Every run must be tied to the work it performed. Total Cost of Agent Capacity (TCAC) is the per-agent view of what capacity actually costs — so cost is never a mystery discovered after the fact.

03. Route work and set hard caps

Policy-based model routing matches model tier to the value and complexity of the task — routine work runs on lower-cost models, high-stakes work reaches frontier models. Enterprise quota management sets hard usage limits per team and per agent. This is the exact control the $500M company did not have.

04. Prove the return, continuously

Agent Spend/ROI dashboards tie spend to hours saved, dollars saved, and run cost. Define the ROI assumptions up front and measure against them as agents run — so the business case is evidence, not a projection.
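
The attribution, routing, and hard-cap moves can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not KORA's API or the TCAC methodology: the tier names, the prices, the `AgentLedger` class, and the `route` policy are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical tiers and prices (USD per million tokens), for illustration only.
MODEL_PRICES = {"small": 0.50, "frontier": 15.00}


class QuotaExceeded(Exception):
    """Raised when a run would push an agent past its hard cap."""


def route(task_stakes: str) -> str:
    """Policy-based routing: only high-stakes work reaches the frontier tier."""
    return "frontier" if task_stakes == "high" else "small"


@dataclass
class AgentLedger:
    agent_id: str
    monthly_cap_usd: float
    spent_usd: float = 0.0
    runs: list = field(default_factory=list)

    def record_run(self, task: str, task_stakes: str, tokens: int) -> float:
        model = route(task_stakes)
        cost = tokens / 1_000_000 * MODEL_PRICES[model]
        # Hard cap: refuse the run before the spend happens, not after.
        if self.spent_usd + cost > self.monthly_cap_usd:
            raise QuotaExceeded(
                f"{self.agent_id} would exceed ${self.monthly_cap_usd:.2f}"
            )
        self.spent_usd += cost
        # Attribution: every dollar is tied to the task it paid for.
        self.runs.append({"task": task, "model": model, "cost_usd": cost})
        return cost


ledger = AgentLedger(agent_id="invoice-triage", monthly_cap_usd=5.00)
ledger.record_run("classify inbound invoice", "low", tokens=200_000)
ledger.record_run("draft contract dispute reply", "high", tokens=200_000)

try:
    ledger.record_run("re-review entire contract archive", "high", tokens=1_000_000)
except QuotaExceeded as exc:
    print("blocked:", exc)

print(f"spent so far: ${ledger.spent_usd:.2f}")
```

In this sketch the token count is known before the run, which is a simplification. A real system would more likely estimate usage up front and reconcile against actual usage afterward. The design point is the ordering: the cap is checked before the spend, and each run leaves a record of what it paid for.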

THE INFERENCE COST CURVE

Own Your Cost Curve. Don't Just Watch the Meter.

Most tools help you watch the meter. The structural win is driving the cost down, so the meter matters less. Because KORA, actAVA's platform, is model-agnostic, customers can A/B test quality against cost for every agent and decide what level of inference they are willing to pay for, by workflow and even by employee group.
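
One way to picture that A/B decision is as a selection rule: among the models that clear a quality bar for a given workflow, pick the cheapest. The sketch below assumes hypothetical evaluation results and a hypothetical helper named `cheapest_acceptable`. How quality is scored is left open, and none of the figures are measured.

```python
# Hypothetical evaluation results for one workflow; all figures are assumed.
candidates = [
    {"model": "frontier", "quality": 0.96, "cost_per_run_usd": 0.40},
    {"model": "mid", "quality": 0.93, "cost_per_run_usd": 0.08},
    {"model": "small", "quality": 0.81, "cost_per_run_usd": 0.01},
]


def cheapest_acceptable(results, min_quality):
    """Return the lowest-cost candidate that clears the quality bar, or None."""
    acceptable = [r for r in results if r["quality"] >= min_quality]
    return min(acceptable, key=lambda r: r["cost_per_run_usd"], default=None)


choice = cheapest_acceptable(candidates, min_quality=0.90)
print(choice["model"] if choice else "no candidate meets the bar")
```

The quality bar is the business decision. Setting it per workflow, or per employee group, is what turns model choice from a default into a policy.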

Over time, that path leads to customer-owned models that are post-trained on the customer's data and workflows, so the expensive frontier call becomes unnecessary for most routine work. That is the inference cost curve in action: not spending less on AI, but spending progressively less per unit of work as the system learns what each task actually requires.

[Figure: Cost per Unit of Agent Work Over Time with Governance. X-axis: governance maturity (Ungoverned → Visibility → Attribution → Routing → Owned Models). Y-axis: cost per unit of work. Labels: governed spend, recoverable cost gap.]
As governance maturity increases — from visibility through attribution, routing policy, and eventually owned models — the cost per unit of agent work declines. The gap between the ungoverned baseline and the governed curve is recoverable value.
HOW ACTAVA APPROACHES THIS

What This Looks Like Inside KORA

Quota Management has been a priority at actAVA since day one. Admins can set usage limits, monitor consumption, and allocate agent resources across teams and departments with precision. They can also track performance and adjust distribution in real time based on demand and strategic priorities.

KORA tracks context-window token usage in real time across the models customers deploy. It computes a usage percentage from total tokens over max tokens, with a breakdown across system prompt, messages, native tools, MCP tools, and skills. Every agent run is attributed. Every dollar has a job description.
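
The usage calculation described above is simple arithmetic, sketched here with assumed numbers. The category names follow the text. The token counts, the `MAX_TOKENS` value, and the `usage_percent` function are hypothetical and do not represent KORA's implementation.

```python
# Assumed token counts per category; only the category names come from the text.
breakdown = {
    "system_prompt": 3_000,
    "messages": 42_000,
    "native_tools": 5_000,
    "mcp_tools": 8_000,
    "skills": 2_000,
}
MAX_TOKENS = 200_000  # assumed context window size


def usage_percent(tokens_by_category: dict, max_tokens: int) -> float:
    """Usage percentage = total tokens / max tokens * 100."""
    return sum(tokens_by_category.values()) / max_tokens * 100


total = sum(breakdown.values())
print(f"{total} of {MAX_TOKENS} tokens ({usage_percent(breakdown, MAX_TOKENS):.1f}%)")
for category, tokens in breakdown.items():
    print(f"  {category}: {tokens / total:.0%} of used tokens")
```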

ROI tracking at the agent level is built in — not as an afterthought or a reporting add-on, but as the foundation of how capacity is allocated. Spend dashboards tie run cost to hours saved and dollars saved in real time, so the business case is measured against actual outcomes, not projected at the start of a pilot and then abandoned.
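
As a rough sketch of what tying run cost to hours saved and dollars saved can look like, the function below compares a measured period against an ROI assumption set up front. The function name, the loaded hourly rate, and every input value are assumptions for illustration.

```python
def agent_roi(run_cost_usd: float, hours_saved: float, loaded_hourly_rate_usd: float):
    """Return (dollars_saved, net_return, roi_ratio) for one reporting period."""
    if run_cost_usd <= 0:
        raise ValueError("run cost must be positive to compute a ratio")
    dollars_saved = hours_saved * loaded_hourly_rate_usd
    net_return = dollars_saved - run_cost_usd
    roi_ratio = net_return / run_cost_usd
    return dollars_saved, net_return, roi_ratio


# The assumption agreed before the agent went live (hypothetical value).
ASSUMED_ROI = 2.0

# Measured inputs for the period (hypothetical values).
saved, net, roi = agent_roi(
    run_cost_usd=1_200.0, hours_saved=80, loaded_hourly_rate_usd=60.0
)

print(f"dollars saved: ${saved:,.0f}, net return: ${net:,.0f}, ROI: {roi:.1f}x")
print("meets assumption" if roi >= ASSUMED_ROI else "below assumption")
```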

actAVA adds a platform cost on top of the token cost. The case for that is straightforward: governed spend, even with our fee added, comes in below ungoverned spend. On top of that, the platform replaces separate tooling you would otherwise buy and maintain for evaluation, monitoring, voice, and reporting. Total cost of ownership, not token price alone, is the right comparison.
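
The total-cost comparison can be written out as a small check. All dollar figures below are invented for illustration. They are not actAVA pricing or customer results, and `total_cost` is a hypothetical helper.

```python
def total_cost(token_spend: float, platform_fee: float = 0.0,
               other_tooling: float = 0.0) -> float:
    """Total cost of ownership for one period, in USD."""
    return token_spend + platform_fee + other_tooling


# Ungoverned: higher token spend, plus separately purchased tooling (assumed).
ungoverned = total_cost(token_spend=100_000, other_tooling=15_000)

# Governed: lower token spend, plus a platform fee (assumed).
governed = total_cost(token_spend=70_000, platform_fee=20_000)

# The largest fee at which the governed option would still break even.
break_even_fee = ungoverned - 70_000

print(f"ungoverned: ${ungoverned:,.0f}")
print(f"governed:   ${governed:,.0f}")
print(f"break-even platform fee: ${break_even_fee:,.0f}")
```

The comparison only holds if governance actually reduces token spend or replaces other tooling, so the inputs are worth measuring rather than assuming.
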
THE BOTTOM LINE

The Companies That Win Won't Have the Biggest Model

They will be the ones who control cost-per-outcome, and who can prove it to their board. The next phase of enterprise AI is not about finding a bigger model. It is about knowing what your agents are doing, what it costs, and what it returns — at the agent level, in real time, with hard limits in place before the invoice arrives.

The goal was never to spend less. It is to spend on purpose — and prove the return.

This is the first post in our CFO-focused series on budgeting for non-human resources in the agent era. Future posts will go deeper on TCAC methodology, model routing strategy, and how to structure the ROI assumptions that your board will actually believe.


Sources:
Business Insider, "Amazon says it shut down a token leaderboard: 'Don't use AI just to use AI'," Brent D. Griffiths
Mint.com, "When AI costs spiral: A company accidentally spent $500 million in one month on Claude AI," Sayantani Biswas
Yahoo Finance, "Uber chief warns no link yet between AI tokenmaxxing and shipping successful products," Mark Tyson