
Dominic Phillips
Software Developer
Your Engineers Have a Burn Rate Now
I pulled up our Anthropic Console on Monday morning to check April spend. 22 days in, just under $4,000 on the clock, cap column set to Unlimited. The total bothered me less than the shape underneath it: two engineers doing the same kind of work on the same codebase, one burning a quiet, disciplined amount of tokens, the other burning 8x as much. Same output, different harness. That is the new line finance is missing. Not AI tools. AI compute.
Salary, laptop, compute
For twenty years the cost of a software engineer was two lines on a spreadsheet. Fully loaded salary. A laptop and a boring bundle of SaaS seats that rounded to a few hundred dollars a month. You could forecast it. You could compare teams against each other. It showed up in the engineering P&L as a tidy, almost uninteresting line item.
There is a third line now, and it behaves nothing like the other two. It is variable, metered, and highly sensitive to configuration choices made inside an IDE at 11pm on a Tuesday. In our portfolio, a single senior engineer’s monthly AI coding stack now ranges from around $50 at the low end to five figures at the top, depending almost entirely on how they set the tools up and how hard they use them. That is a real range, and no one’s CFO is modeling it properly yet.
This is not an argument that AI coding tools are too expensive. In most teams, they are a bargain. The problem is that they turned part of engineering spend from a SaaS-shaped fixed cost into something that behaves like cloud infrastructure. Most operators are still budgeting it like a seat.
The numbers are bigger than you think
Before getting to frameworks, it is worth being concrete about what these bills actually look like in 2026. The screenshot below is from one of our own Enterprise consoles, taken mid-morning on the day I started writing this piece. 22 days into April. Unlimited cap. Just under $4,000 already spent, a straight-line pace of roughly $5,500 for the month.
A Cade Partners Anthropic Console, 22 days into April 2026. The spend limit column used to say a dollar number. On Enterprise it now says Unlimited, which is its own editorial statement.
That is one account. The wider market looks like this. A founder and CTO posting on Hacker News in November: $638 on Cursor in six weeks, on track for over $5,500 a year for a single seat, and he described himself as genuinely shocked. Another developer reporting that a friend was burning $2,000 a month on Cursor. A commenter under an Opus thread noting that an agentic loop on the API can run at roughly $100 an hour if left unattended. Inside portfolio companies, we regularly see individual power users clearing $3,000 to $5,000 a month on their own, and one account last quarter cleared $10,000.
None of these are hypothetical. They are the new ceiling. A year ago a senior engineer cost a company fully loaded salary plus maybe $50 a month in tooling. Today the same engineer, used hard, can book five figures a year in tokens before anyone signs off on it. The decision-rights question this raises is real: in almost no organization we work with has anyone formally authorized that spend, yet in almost every one it is happening anyway.
What the new line item actually looks like
The market has split into three pricing shapes, and most engineering teams are already running all three at once without noticing.
Seats with usage inside them. GitHub Copilot Business at $19 per user per month and Enterprise at $39. Cursor Teams at $40. ChatGPT Business standard seats at $25 monthly, or $20 on annual billing. These look familiar to finance because they still start as seats. The catch is that more of the work behind the seat is now metered: included credits first, overages after that, or usage-based Codex seats with no fixed monthly price.
Premium personal subscriptions. Anthropic’s Claude Max plans at $100 and $200 a month. ChatGPT Pro at $200. These are aimed at individuals and they are priced that way, because the vendors know a single power user can extract more value from them than an average team seat. Plenty of your engineers are on these already. Most of them are paying out of pocket. We will come back to that.
Pure usage-based compute. Claude Code billed directly against the Anthropic API. OpenAI’s Codex CLI, billed the same way against OpenAI’s API. Open-source agents like Aider and Cline that plug into whichever provider you give them keys for. This is the line item that is genuinely new. There is no per-seat number. You are buying compute by the token, the same way you buy it from AWS, and it behaves the same way: cheap when it is disciplined, terrifying when it is not.
The useful mental shift is that these are not really competing products. They are different cost shapes for different kinds of work. Copilot and Cursor are seats with usage hidden inside. Claude Max is a compensation line item disguised as a subscription. Claude Code and Codex are cloud compute with an editor wrapped around them. Operators who treat them as one budget line end up managing none of them well.
| Tool | Shape | Price | Ceiling |
|---|---|---|---|
| GitHub Copilot Business | Seat + credits | $19 / user / month | AI credits and paid overages from June 1 |
| GitHub Copilot Enterprise | Seat + credits | $39 / user / month | AI credits and paid overages from June 1 |
| Cursor Teams | Seat + usage | $40 / user / month | Included usage, usage charges above |
| ChatGPT Business | Seat + optional Codex | $25 monthly / $20 annual | Standard seat, Codex can meter separately |
| Claude Pro | Personal | $20 / month | 5-hour and weekly limits |
| Claude Max 5x | Personal, premium | $100 / month | 5x Pro limits; ~300M Sonnet input tokens / month reported |
| Claude Max 20x | Personal, premium | $200 / month | 20x Pro limits, weekly cap |
| ChatGPT Pro | Personal, premium | $200 / month | Generous but throttled under load |
| Anthropic API / Claude Code | Usage-based | Per token | Whatever cap you set. Unlimited on Enterprise. |
| OpenAI Codex / API | Usage-based | Per token | Whatever cap you set |
The same senior engineer, running roughly the same week of work, on common 2026 options. List prices as of May 2026. Caps, credits, and throttling behavior change often.
“The new developer cost line is not the tool seat. It is the meter behind the agent.”
Why the spread is 10x
The first time I saw an engineer burn a four-figure bill in a weekend I assumed something had gone wrong. The second time I assumed the same. By the fifth time I accepted that this is simply what usage-based agentic tooling looks like in the absence of discipline.
The variance is almost entirely a configuration story. A well-configured Claude Code or Codex session uses prompt caching, loads only the files it needs, summarizes old conversation turns instead of replaying them, and reserves high-reasoning modes for the problems that actually need them. A poorly configured session does none of that. It re-ingests the monorepo every turn. It fans out to half a dozen MCP servers that each pull thousands of tokens of tool definitions into every single prompt. It leaves reasoning budgets on maximum for tasks that needed a two-line edit. It keeps running background processes whose output quietly drips into context until the window overflows.
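Of those habits, replaying every old turn is the easiest to underestimate, because the cost compounds. The sketch below is back-of-the-envelope arithmetic under hypothetical assumptions: every turn adds the same number of tokens, a full-replay session resends the entire history on each turn, and a summarizing session keeps only the last few turns verbatim plus a fixed-size summary. The token counts are illustrative, not measurements from any real session.

```python
def replay_input_tokens(turns: int, base: int, per_turn: int) -> int:
    """Total input tokens when every turn resends the full history.

    Turn k (1-indexed) sends the base context plus all k turns so far,
    so the total grows with the square of the turn count.
    """
    return sum(base + k * per_turn for k in range(1, turns + 1))


def summarized_input_tokens(
    turns: int, base: int, per_turn: int, window: int, summary: int
) -> int:
    """Total input tokens when only the last `window` turns stay verbatim.

    Older turns are assumed to collapse into a fixed `summary`-token recap,
    a simplification: real summaries vary in length.
    """
    total = 0
    for k in range(1, turns + 1):
        verbatim = min(k, window) * per_turn
        recap = summary if k > window else 0
        total += base + verbatim + recap
    return total


# Hypothetical session: 20k tokens of base context, 2k tokens per turn, 50 turns.
full = replay_input_tokens(50, base=20_000, per_turn=2_000)
lean = summarized_input_tokens(50, base=20_000, per_turn=2_000, window=5, summary=1_500)
print(full, lean)  # 3550000 1547500
```

With these made-up numbers the summarizing session sends less than half the input tokens of the full-replay one, and the gap widens as sessions get longer, since replay grows quadratically while the summarized version grows linearly.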
The result of those choices is a spread that, in our experience inside portfolio companies, runs roughly 10x between the best- and worst-configured developers on the same team. The worst case is not usually malice or laziness. It is almost always an engineer who installed four MCP servers in the first week because a thread on X told them to, and has never opened the configuration file since.
That gap is the actual management problem. A team where everyone burns $100 is a boring line item. A team where the average is $200 but half the spend comes from two people is a team where you are paying for misconfiguration, not for work.
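One cheap way to tell which of those two teams you have is to ask what share of the bill the top few spenders account for. The sketch below is a minimal illustration with invented per-engineer figures chosen to match the example above: ten engineers averaging $200, two of whom account for half the spend. In practice the input would come from whatever per-engineer export your gateway or vendor console provides.

```python
def top_n_share(spend: dict[str, float], n: int) -> float:
    """Fraction of total spend accounted for by the n highest spenders."""
    total = sum(spend.values())
    if total == 0:
        return 0.0
    top = sorted(spend.values(), reverse=True)[:n]
    return sum(top) / total


# Invented monthly spend in USD: two heavy users, eight moderate ones.
team = {"eng_a": 500.0, "eng_b": 500.0}
team.update({f"eng_{i}": 125.0 for i in range(8)})

average = sum(team.values()) / len(team)
print(average, top_n_share(team, 2))  # 200.0 0.5
```

A share that high is the cue to look at those two engineers' harness configuration before looking at the team's seat count.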
The conspiracy theories, and what is probably true
If you spend any time reading developer forums about AI coding tools right now, you will notice a second thing going on underneath the cost conversation. A steady drumbeat of engineers claiming that the model they are talking to today is not the model they were talking to a month ago. That Claude got dumber. That the Max plan quietly routes to smaller or quantized variants when demand peaks. That the API gives you the real thing and the subscription gives you something downsampled. That Enterprise is a different tier of service entirely, not just a different SKU.
Almost none of this is officially confirmed. Some of it is probably wrong. A meaningful fraction of it is probably right. It is worth understanding the specific theories in circulation, because they shape how operators should actually route their spend.
The subscription-as-spot-instance theory. A widely shared comment on Hacker News put it cleanly: “Claude subscription is the equivalent of a spot instance. APIs are on-demand. Priority is set to APIs, and leftover compute is used by subscription plans. When there is no capacity, subscriptions are routed to highly quantized cheaper models behind the scenes.” No vendor has confirmed this literally. The shape of the complaints suggests something close to it is happening in practice.
The quantization-on-Max theory. Anthropic has never said the Max plan routes to a lower precision of the same model. Engineers on forums keep testing the same prompts on Max and on the raw API and reporting visibly different outputs. Some of that is almost certainly expectation bias. Some of it is almost certainly real. Either way, if your procurement strategy treats the $200 Max plan and the Enterprise API as interchangeable, you are making an assumption the vendor has not committed to in writing.
The weekly-limits-are-a-nerf narrative. In late July 2025 Anthropic emailed Max customers to announce weekly rate limits, effective at the end of August, alongside the existing five-hour windows. The official framing was that the change affected “less than 5% of users” and was aimed at “policy violations like account sharing and reselling access, and advanced usage patterns like running Claude 24/7 in the background.” The HN thread on the email cleared 600 points in a day. The engineering internet read it as the start of subscription tightening, not a one-off.
The “enterprise is better” theory. There is a persistent view that enterprise and first-party API traffic sits on a different, less-contended capacity pool with less aggressive throttling, longer effective context, and more consistent model behavior than consumer subscriptions. This is consistent with how every other cloud provider runs tiered SLAs. We cannot prove it holds for Anthropic or OpenAI specifically. We can say that, in a year of routing portfolio traffic through both paths, the API side is noticeably more predictable than the subscription side.
The underlying reality most of these theories are groping at is simpler than any individual rumor. Frontier model vendors are severely compute constrained. Data center buildouts are lagging demand. Subsidized subscription plans are a growth and distribution tool, not a high-margin product. Some HN commenters have estimated they can get the equivalent of $1,000 of Opus API usage out of a $200 Max plan, which, if even approximately right, means Anthropic is losing money on its heaviest subscription customers and has every incentive to find quiet ways to throttle them.
For an operator the implication is not outrage. It is routing. If model quality and consistency matter to your product, put your production and your serious coding load on the API or enterprise tier, not on a personal subscription. If you want the cheap subsidized subscription, treat it as exactly that, a subsidy you are welcome to while it lasts, priced more like a promo than a contract. Build your budget on the assumption that subscription behavior will get worse, not better, over the next twelve months.
Brute force and grounding
There are two mental models floating around inside engineering teams right now for how to use an agentic coding tool, and almost the entire 10x cost spread sits on the difference between them.
The first model is brute force. Hand the agent the whole monorepo. Install every MCP server you have heard of. Leave reasoning on max. Let a background agent run while you sleep. Run three parallel sessions in case one of them gets it right. This is the style we see in the engineers clearing four-figure monthly bills on their own, and it is seductive because it genuinely does sometimes work. The model will often, eventually, stumble into the right answer. The cost of getting there is buried in the invoice.
The second model is grounding. Give the agent the specific files it needs, a short and honest description of the task, one or two real examples of the pattern you want it to follow, and a small set of tools it can actually use. Keep the context narrow. Run the cheapest model that can plausibly do the work, and escalate only when the task demands it. Let the agent ask, not scan. This is the style we see in the engineers burning hundreds rather than thousands, and the difference in output quality is, honestly, in favor of grounding. Brute force produces PRs that technically run. Grounded sessions produce PRs that a human reviewer is willing to approve.
The illustration below is the argument in one picture. Same target. Three ways to reach it.
Harness quality: brute force vs grounding
| Approach | Setup | Cost | Outcome |
|---|---|---|---|
| Brute force | Whole monorepo in context. Every MCP server loaded. Max reasoning on every task. Most tokens do nothing. | ~$4,000 / month | The PR still needs rework |
| Grounding | Narrow context. Right model for the job. Prompt caching on. Tools the agent actually uses, nothing it does not. | ~$400 / month | The PR ships |
| Grounding + local | The same grounded harness, pointed at an on-prem model. Variable cost collapses. The context never leaves the network. | ~$0 per token | Inside the network |
Brute force sprays context and reasoning at a target and mostly misses. A grounded harness lands in one shot. Pointed at a local model, the same grounded harness stops paying per token at all.
The third row is where this ties back to a point we made in MedGemma and the Local Models Moment. Once you are actually grounded, running the model on an on-prem or on-device stack stops being a research project and becomes a pure cost decision. The hard work of agentic engineering is picking the right context, not paying to re-read it on a hosted API. A grounded harness pointed at a local 4B or 27B model is, for a meaningful slice of coding work, both cheaper and more private than the same harness pointed at a frontier API, and the gap is widening every quarter.
None of this means local replaces hosted. It means the teams that have done the grounding work earn the right to route more of their traffic local, and pocket the difference. The teams still in brute force mode cannot, because a brute force agent pointed at a small local model just fails louder and cheaper. Grounding is the prerequisite. Local is the dividend.
The harness is the new build system
Every decade or so, engineering organizations acquire a new discipline that used to be optional and is now non-negotiable. Source control in the nineties. CI/CD in the 2010s. Observability in the late 2010s. What we are watching now is the same arc for what the Anthropic and OpenAI teams call the harness: the configuration around the model that decides how much of your codebase gets loaded, which tools are available, which model answers which kind of question, and how context is managed across a long session.
The harness is a first-class engineering artifact now. It lives in a repo, is versioned, gets reviewed like any other code, and has measurable quality. The specific levers that matter most in our experience:
Prompt caching on by default. Anthropic’s cache discount cuts repeated-context cost by up to 90%. Teams that forget to enable it are paying full price for the same system prompt thousands of times a week. This is the single biggest controllable lever in the stack.
MCP hygiene. Every MCP server an engineer installs adds tool definitions to every prompt whether or not they are used. We treat MCP servers like npm dependencies: audited, shared at the team level, and pruned aggressively. Installing one is a team decision, not an individual preference.
Model routing, not model loyalty. Haiku or a small OpenAI model for grunt work. The reasoning-heavy frontier models only when the task warrants it. This is the single biggest lever on agent costs after caching, and almost no one defaults to it out of the box.
Context discipline. Narrow the files the agent can see. Keep CLAUDE.md or its equivalent short and specific. Use sub-agents with their own small contexts for isolated tasks rather than dragging the whole session along. Every thousand tokens saved on context is a thousand tokens you are not paying for on every single turn.
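The caching and MCP levers interact, because tool definitions sit in the same stable prefix that caching discounts. The sketch below estimates input-token cost for one long session under stated assumptions: a base input price of $3 per million tokens, cache writes billed at 1.25x and cache reads at 0.1x the base price (the multipliers Anthropic published for its five-minute cache at the time of writing; check current pricing), a cache that never expires mid-session, and a made-up 5,000 tokens of tool definitions per MCP server. Output tokens and minimum cacheable lengths are ignored.

```python
PRICE_PER_MTOK = 3.00     # assumed base input price, USD per million tokens
CACHE_WRITE_MULT = 1.25   # assumed cost multiplier for writing the cached prefix
CACHE_READ_MULT = 0.10    # assumed cost multiplier for reading it back


def session_input_cost(
    turns: int,
    prefix_tokens: int,
    fresh_tokens_per_turn: int,
    cached: bool,
    price_per_mtok: float = PRICE_PER_MTOK,
) -> float:
    """Estimated input-token cost in USD for one agent session.

    The prefix (system prompt, tool definitions, CLAUDE.md) is identical on
    every turn; the fresh part is new each turn and always billed at base
    price. With caching on, the prefix is written once and then read at the
    discounted rate, assuming the cache stays warm all session.
    """
    if turns <= 0:
        return 0.0
    per_token = price_per_mtok / 1_000_000
    fresh = turns * fresh_tokens_per_turn * per_token
    if not cached:
        return turns * prefix_tokens * per_token + fresh
    write = prefix_tokens * per_token * CACHE_WRITE_MULT
    reads = (turns - 1) * prefix_tokens * per_token * CACHE_READ_MULT
    return write + reads + fresh


LEAN_PREFIX = 10_000                    # system prompt + short CLAUDE.md (hypothetical)
HEAVY_PREFIX = LEAN_PREFIX + 4 * 5_000  # plus four MCP servers' tool definitions (hypothetical)

for prefix in (LEAN_PREFIX, HEAVY_PREFIX):
    for cached in (False, True):
        cost = session_input_cost(200, prefix, 2_000, cached)
        print(f"prefix={prefix:>6,} cached={cached!s:<5} ${cost:6.2f}")
```

Under these assumptions, a 200-turn session with the heavy prefix costs about $19 uncached and about $3 cached, and dropping the unused MCP servers brings the cached figure under $2. The absolute numbers are invented; the ordering is the point, and it repeats on every session every engineer runs.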
None of this is exotic. It is roughly as complicated as writing a good Dockerfile. But it has to be written down, shared, and enforced, and almost no engineering organization we work with has put that discipline in place yet.
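For the model-routing lever, the written-down version can start as a small lookup table that lives in the harness repo next to CLAUDE.md. The sketch below is hypothetical throughout: the task labels, the three tier names, and the escalate-one-tier-per-failed-attempt rule are placeholders, and a real setup would map each tier to whichever model IDs your gateway exposes.

```python
from dataclasses import dataclass

# Hypothetical tiers, cheapest first; map each to a real model ID in your gateway.
TIERS = ("small", "mid", "frontier")

# Hypothetical default tier per task type, kept in the harness repo.
DEFAULT_TIER = {
    "rename": "small",
    "format": "small",
    "write_tests": "mid",
    "refactor": "mid",
    "debug_concurrency": "frontier",
    "architecture": "frontier",
}


@dataclass
class Task:
    kind: str
    failed_attempts: int = 0


def route(task: Task) -> str:
    """Start at the cheapest plausible tier; escalate one tier per failed attempt."""
    start = TIERS.index(DEFAULT_TIER.get(task.kind, "mid"))
    return TIERS[min(start + task.failed_attempts, len(TIERS) - 1)]


print(route(Task("rename")))                     # small
print(route(Task("rename", failed_attempts=1)))  # mid
print(route(Task("architecture")))               # frontier
```

Defaulting unknown task types to the middle tier rather than the top one is a deliberate choice in this sketch: it keeps new or unlabeled work from silently landing on the most expensive model.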
The Pro plan arbitrage hiding on your team
Here is the quiet story almost nobody in finance has noticed. A meaningful fraction of your best engineers are paying for AI coding tools out of their own pockets, and using them for work. Claude Pro at $20. Claude Max at $100 or $200. ChatGPT Pro at $200. They are not doing it to cheat. They are doing it because procurement is slow, the enterprise tier they were given is underpowered, and the productivity gap between a disciplined power user and an average seat license is genuinely career-defining.
From an individual engineer’s perspective this is rational. From an operator’s perspective it is a compliance and security failure in slow motion.
No BAA, no zero-retention. Consumer and prosumer subscriptions do not ship with the data agreements your enterprise contract does. In healthcare that means any PHI-adjacent code, schema, or log that touches a personal plan is a notifiable event waiting to happen.
No audit trail. Your security team cannot see what left the building, when, through whose account. A breach investigation that starts with “we think someone pasted part of our codebase into a personal Claude account” is a bad way to start a Tuesday.
No IP clarity. Code generated on a personal subscription, used for company work, governed by the wrong terms of service, sitting in production. Your general counsel does not want this email.
The answer is not to ban personal plans. That just pushes the behavior further underground. The answer is a reimbursed AI stipend that routes through a corporate-managed account: same model access, same power-user limits, but on your BAA, your SSO, your audit log, your retention policy. The stipend costs roughly what the engineer was already paying. The operational risk drops to something a CISO will actually sign off on.
Put a meter in the path
You cannot manage what you cannot see, and until recently the default visibility into AI coding spend was a monthly invoice and a vague sense of dread. The observability layer for this has matured fast in the last twelve months.
The pattern we are putting into portfolio companies looks roughly the same every time. Route every agent and IDE through an LLM gateway. LiteLLM, Helicone, and Portkey are the three most common choices. The gateway gives you per-engineer token visibility, per-project cost attribution, hard dollar caps, and a single place to enforce zero-data-retention flags. It turns the bill from an end-of-month surprise into something the team can see while the work is happening.
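The budget part of a gateway is conceptually small. The sketch below is a toy, in-memory stand-in written for illustration; it is not the configuration or API of LiteLLM, Helicone, or Portkey, each of which ships its own budget and attribution features. It shows three jobs: attribute cost to an engineer and a project, check a hard cap before a request goes out, and deny by default anyone without a configured cap. The engineer and project names are hypothetical.

```python
from collections import defaultdict


class BudgetExceeded(Exception):
    """Raised when a request would push an engineer past their cap."""


class SpendLedger:
    """Toy stand-in for a gateway's budget layer (hypothetical, in-memory)."""

    def __init__(self, monthly_cap_usd: dict[str, float]):
        self.caps = monthly_cap_usd
        self.by_engineer: dict[str, float] = defaultdict(float)
        self.by_project: dict[str, float] = defaultdict(float)

    def authorize(self, engineer: str, estimated_usd: float) -> None:
        """Refuse a request that would exceed the cap; no cap means a cap of zero."""
        cap = self.caps.get(engineer, 0.0)
        if self.by_engineer[engineer] + estimated_usd > cap:
            raise BudgetExceeded(f"{engineer} would exceed ${cap:.2f} cap")

    def record(self, engineer: str, project: str, cost_usd: float) -> None:
        """Attribute the actual cost once the response comes back."""
        self.by_engineer[engineer] += cost_usd
        self.by_project[project] += cost_usd


ledger = SpendLedger({"alice": 500.0})           # hypothetical engineer and cap
ledger.authorize("alice", estimated_usd=1.50)    # passes: well under the cap
ledger.record("alice", "billing-service", 1.42)  # actual cost once the response returns
print(dict(ledger.by_project))  # {'billing-service': 1.42}
```

Checking before the call and recording after it mirrors the split between an estimate and the real bill; a production gateway would also persist the ledger and reset it each month.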
On the individual engineer side, ccusage and similar small CLIs give developers a local readout of their own Claude Code usage in real time. This matters more than it sounds. Most overspend comes from the moment an agent quietly enters a loop and nobody notices for two hours. A meter ticking on the engineer’s own screen catches this in minutes rather than days.
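The alarm logic behind a live meter can be as simple as a rolling window. The sketch below is hypothetical and is not how ccusage is implemented; it assumes you can observe (timestamp, tokens) pairs in time order from your own logs or gateway, and the ten-minute window and two-million-token threshold are made-up defaults to tune per team.

```python
from collections import deque


class BurnRateAlarm:
    """Flags when tokens used inside a rolling time window exceed a threshold.

    Hypothetical sketch: the defaults (10 minutes, 2M tokens) are made up,
    and timestamps are assumed to arrive in non-decreasing order.
    """

    def __init__(self, window_seconds: float = 600.0, max_tokens: int = 2_000_000):
        self.window = window_seconds
        self.max_tokens = max_tokens
        self.events: deque = deque()  # (timestamp, tokens) pairs, oldest first
        self.total = 0

    def add(self, timestamp: float, tokens: int) -> bool:
        """Record one request's usage; return True if the window is over budget."""
        self.events.append((timestamp, tokens))
        self.total += tokens
        # Evict events that have aged out of the rolling window.
        while self.events and self.events[0][0] <= timestamp - self.window:
            _, old_tokens = self.events.popleft()
            self.total -= old_tokens
        return self.total > self.max_tokens


alarm = BurnRateAlarm(window_seconds=60, max_tokens=100_000)
for ts, tokens in [(0, 50_000), (30, 40_000), (50, 20_000), (100, 10_000)]:
    print(ts, alarm.add(ts, tokens))
```

An agent stuck in a loop that resends a large context every few seconds trips a check like this within one window, which is the minutes-not-days detection the paragraph describes.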
The more interesting metric, once the basic plumbing is in place, is not raw spend. It is spend, in tokens or dollars, per merged pull request. An engineer spending $400 a month and shipping 30 reviewed PRs is well configured. An engineer spending the same amount and shipping 2 is burning money on a thrashing harness. You cannot tell these apart without the gateway. Once you have it, the gap between teams is usually the single most useful signal a VP of engineering gets all quarter.
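Once the gateway exports per-engineer spend and your Git host exports merged-PR counts (both assumed data sources here), the metric is one division plus a guard for engineers who merged nothing. A minimal sketch using the paragraph's own $400 example:

```python
from typing import Optional


def per_merged_pr(amount: float, merged_prs: int) -> Optional[float]:
    """Spend (tokens or dollars) per merged PR; None if nothing merged."""
    if merged_prs <= 0:
        return None
    return amount / merged_prs


# The paragraph's two engineers: same monthly spend in USD, different output.
engineers = {"well_configured": (400.0, 30), "thrashing": (400.0, 2)}
for name, (usd, prs) in engineers.items():
    print(name, round(per_merged_pr(usd, prs), 2))
# well_configured 13.33
# thrashing 200.0
```

The 15x gap in cost per merged PR is invisible on the invoice, where both engineers show up as the same $400.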
What I would put in place
We are writing this from inside portfolio companies where we are either running engineering or advising the people who do. The checklist is not complicated. The hard part is getting someone to own it.
1. Put AI compute on its own line
Pull it out of the generic tools budget and give it the same treatment as cloud infrastructure. Separate flat-rate seats, premium subscriptions, and usage-based compute. Forecast each one differently. The first quarter you present it this way, the conversation with finance gets dramatically easier.
2. Route everything through a gateway
No engineer should be hitting a model vendor directly from a corporate device. Every request through LiteLLM or Portkey or Helicone, every key rotated through the gateway, every cap set at the gateway. This is the first thing I would install.
3. Audit the harness, not just the bill
Treat team-level harness configuration as a code artifact: review it, version it, share good CLAUDE.md files across teams, and prune MCP servers quarterly. The spread between a well-configured and a poorly-configured engineer is where your budget is actually leaking, and you cannot see it from the invoice.
4. Offer a stipend, kill the arbitrage
Assume every strong engineer on your team either is, or will be, paying out of pocket for a premium personal plan. Meet them where they are. A $200 monthly AI stipend, gated on use through the gateway, is cheaper than pretending the behavior is not happening and much easier for security to govern.
5. Revisit the budget quarterly, not annually
The underlying models change faster than your annual planning cycle. Prices move. New tiers appear. A model gets deprecated and the one that replaces it is 3x cheaper per token or 2x as capable at the same price. Assume the allocation you set in January will be wrong by April, and treat that as a feature.
The model I use now
A year ago, AI coding tools were a productivity experiment you ran on the side of your engineering budget. Today they are a variable cost line with a range that runs from a rounding error to a real number, and the spread depends on choices your operators have almost certainly not been asked to make yet.
The companies we see doing this well have stopped treating AI coding as a tools problem. They treat it like cloud compute: metered, useful, dangerous when nobody watches it, and boring when the defaults are good. The ones doing it badly are still pretending it is a SaaS seat, and quietly absorbing 10x variance inside a line item they do not look at.
If you are running engineering, or sitting on a board asking what the AI line on the P&L actually represents, get in touch. This is the kind of operating work we do inside portfolio companies. The gap between disciplined and undisciplined teams on this line item is already large enough to show up in the budget.
