Anthropic's Claude Code Max sparked a $5,000/month panic. The real compute cost? Closer to $500. The gap between retail token pricing and actual infrastructure cost is the defining tension of AI-native SaaS — and most PMMs are on the wrong side of it.
This is Part 1 of the AI Pricing Economics series. Previously: How AI Agents Are Breaking Your Pricing Model covered the what — per-seat is dying, new models are emerging. Credit Anxiety: The Shadow Tax on Developer Productivity covered the how — usage pricing creates psychological friction that kills adoption. This piece covers the why: the infrastructure economics that explain both.
A viral Hacker News post hit 288 points and 207 comments last week. The question: does Anthropic's Claude Code Max plan cost $5,000 per month per user in raw compute?
The answer was no. Not even close.
Anthropic charges $5 per million input tokens and $25 per million output tokens at retail API rates for Opus 4.6. But comparable open-weight models on OpenRouter run at roughly 10% of those rates. That means a heavy Claude Code user burning through what looks like $5,000 in retail API pricing is actually consuming about $500 in compute — possibly less if Anthropic is running optimized inference on dedicated accelerator hardware.
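The napkin math is simple enough to write down. A quick sketch using the retail rates above and the assumed ~10% cost ratio; the token volumes are illustrative, not measured:

```python
# Retail spend vs. estimated compute cost for a heavy Claude Code user.
# Assumes, as the article does, that open-weight rates on OpenRouter
# approximate true inference cost at roughly 10% of retail.
RETAIL_INPUT_PER_M = 5.00    # $ per million input tokens (Opus 4.6 retail)
RETAIL_OUTPUT_PER_M = 25.00  # $ per million output tokens (Opus 4.6 retail)
COST_RATIO = 0.10            # assumed: compute cost as a fraction of retail

def monthly_economics(input_mtok: float, output_mtok: float) -> tuple[float, float]:
    """Return (retail-priced spend, estimated compute cost) in dollars."""
    retail = input_mtok * RETAIL_INPUT_PER_M + output_mtok * RETAIL_OUTPUT_PER_M
    return retail, retail * COST_RATIO

# Hypothetical heavy user: 600M input + 80M output tokens in a month.
retail, compute = monthly_economics(600, 80)
print(f"retail: ${retail:,.0f}  est. compute: ${compute:,.0f}")
# → retail: $5,000  est. compute: $500
```

The HN thread argued about the first number. The second is the one that matters for pricing strategy.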
Meanwhile, Anthropic launched Claude Max 20x at $200/month with full Claude Code access. And it's adding premium review workflows like automated security reviews inside Claude Code for higher-value enterprise use cases.
Three price points. Three completely different cost signals. One product.
This is the pricing illusion. And it's not unique to Anthropic. It's the defining structural tension of AI-native SaaS — and most people, including most PMMs, are anchored to the wrong number.
Here's the core problem: customers — including sophisticated technical buyers — anchor to retail token prices when they evaluate AI product economics. They see $25 per million output tokens and do napkin math that makes flat-rate plans look like money-losing charity.
But retail API pricing is not cost. It never was.
The gap between retail price and marginal cost is enormous — and it's the gap that makes every new AI pricing model work. Flat-rate subscriptions, credit bundles, per-outcome pricing — none of these are possible if the company's cost structure matches what the API price page says.
This is why Anthropic can offer $200/month unlimited Claude Code access. This is why OpenAI can sell ChatGPT Pro at $200/month. This is why Perplexity can offer 10,000 credits at the same price point. The retail API rate is a profit center, not a cost floor.
Understanding this gap isn't academic. It's the single most important input to AI product pricing strategy — and most PMMs don't have it.
We've been here before. Not with AI, but with cloud infrastructure.
In the mid-2000s, AWS launched with pay-as-you-go compute. Twilio launched with per-API-call pricing. Stripe charged per transaction. These were the first wave of usage-based SaaS, and the dynamics rhyme with what's happening in AI right now:
What worked: Usage pricing aligned cost with value at scale. Startups could start at $0 and grow into six-figure annual contracts without a single sales call. Twilio grew revenue from $167M in 2015 to $2.84B in 2021 on the back of per-message and per-minute pricing. Stripe's 2.9% + $0.30 per transaction meant its revenue scaled linearly with customer success.
What broke: Customers who scaled got punished. Large Twilio users built their own telephony infrastructure. Enterprises negotiated committed-use discounts that cut per-unit revenue by 40–60%. AWS introduced Reserved Instances and Savings Plans because pure on-demand pricing created exactly the anxiety I wrote about in Credit Anxiety — the finance team couldn't forecast, so they throttled usage.
What emerged: The hybrid. Base commitment + overage. Reserved capacity + burst pricing. Committed spend tiers with declining per-unit costs. Every mature usage-based company eventually added predictability mechanisms because pure usage pricing has a ceiling: the point where the customer's finance team says enough.
AI pricing is speed-running this exact arc. We're watching a decade of cloud pricing evolution compressed into 18 months.
Here's where the AI story diverges from cloud 1.0. Traditional SaaS has gross margins of 70–85%. The marginal cost of serving one more user on a Salesforce seat or a Notion workspace is almost zero — the software is already built, the infrastructure cost per user is negligible.
AI-native products have a fundamentally different cost structure. Every interaction burns compute. Every prompt is a real cost event. And the cost varies wildly based on model selection, context length, and output complexity.
But — and this is the part that most analysis gets wrong — the cost is falling at a rate that has no precedent in SaaS history.
Consider the trajectory:
| Period | Comparable Capability | Cost per M Output Tokens |
|---|---|---|
| Early 2024 | GPT-4 / Claude 2 era | $60–$120 |
| Late 2024 | GPT-4o / Claude 3.5 Sonnet | $15–$30 |
| Early 2025 | Claude 3.5 / GPT-4o-mini | $5–$15 |
| Early 2026 | Opus 4.6 retail / open-weight equivalents | $2.50–$25 (retail) / $0.25–$2.50 (inference) |
That's a 50–100x cost reduction in two years for equivalent capability. Nothing in enterprise software has ever deflated this fast — not storage, not compute, not bandwidth.
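As a sanity check on that claim, the implied annualized decline can be computed from the table. The midpoints here are my own rough reading of the ranges:

```python
# Implied deflation rate, using rough midpoints from the table above:
# ~$90/M output tokens (early 2024 retail) down to ~$1.40/M
# (early 2026 inference-cost estimate).
start_cost, end_cost, years = 90.0, 1.40, 2.0

total_reduction = start_cost / end_cost                       # ≈ 64x
annual_decline = 1 - (end_cost / start_cost) ** (1 / years)   # ≈ 88%/yr

print(f"{total_reduction:.0f}x cheaper overall, ~{annual_decline:.0%} decline per year")
```

Pick different endpoints from the ranges and you land anywhere in the 50–100x band, but the shape of the curve doesn't change.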
This means two things for pricing strategy:
First, margins are expanding even as prices drop. Anthropic can charge less per user today and make more per user in profit than they could a year ago. The retail API price is dropping, but the gap between retail and cost is widening. This is the opposite of commodity compression — it's margin expansion disguised as price deflation.
Second, any price set today will look expensive in 12 months. This creates an existential challenge for PMMs: how do you anchor customers to value when the cost floor is in freefall? Price too high and you look greedy next quarter. Price too low and you leave margin on the table while costs are still meaningful.
Anthropic's current pricing architecture is a masterclass in navigating this tension — whether or not it was designed that way.
The API layer ($5/$25 per million tokens for Opus 4.6) is the profit engine. It's priced at 5–10x marginal cost. This is where Anthropic makes money from developers and companies building on the platform. It's also the anchor price that makes everything else look like a deal.
Claude Max 20x ($200/month, unlimited Claude Code) is the adoption play. A typical power user might consume $300–$800 in retail API value per month, which is only $30–$80 in actual compute. Anthropic is profitable on the vast majority of Max users and underwater only on the extreme tail: the $5,000-retail-value users from the HN thread, whose roughly $500 in compute exceeds the subscription price. It's the classic insurance model — the 80% of users who consume moderately subsidize the 20% who go all out.
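The insurance model above reduces to a blended-margin calculation. The segment shares and consumption figures below are illustrative, following the article's 80/20 framing and the assumed ~10% cost ratio:

```python
# Blended unit economics for a flat-rate plan where moderate users
# subsidize heavy users. Segment numbers are illustrative.
PRICE = 200.0       # $/month for the Max plan
COST_RATIO = 0.10   # assumed compute cost as a fraction of retail value

segments = [
    # (share of users, retail API value consumed per month)
    (0.80, 400.0),    # moderate users → ~$40 actual compute
    (0.20, 5000.0),   # extreme users  → ~$500 actual compute (underwater)
]

blended_cost = sum(share * retail * COST_RATIO for share, retail in segments)
blended_margin = (PRICE - blended_cost) / PRICE

print(f"blended cost ${blended_cost:.0f}/user, margin {blended_margin:.0%}")
# → blended cost $132/user, margin 34%
```

Even with a fifth of the user base consuming 2.5x the price in compute, the plan clears a positive blended margin — and that margin widens every quarter the cost ratio falls.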
Automated security reviews are the outcome-priced wedge. This isn't framed as tokens or minutes of compute. It's framed around a high-value review workflow that can catch costly issues before production. That is the direction of travel for AI pricing: charge around outcomes buyers care about, not raw model consumption.
Three layers. Three pricing philosophies. One product surface. This is what the pricing model evolution looks like in practice — not a clean migration from per-seat to usage-based, but a simultaneous multi-model strategy where different segments get different cost structures.
If you're a PMM or product leader pricing AI capabilities right now, the perception gap is your most dangerous blind spot. Here's why, and what to do about it.
Your finance team is probably modeling AI feature costs based on API pricing. That's like modeling cloud costs based on on-demand EC2 rates — technically accurate for a single instance, wildly misleading at scale. Build your cost models on committed/negotiated rates, not retail. If you're using a third-party model provider, ask for volume pricing. If you're running open-weight models, benchmark actual inference cost on your infrastructure.
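One way to operationalize this: keep retail and committed rates side by side in the cost model, so finance sees both numbers in every forecast. The rates and the 60% discount below are hypothetical placeholders, not real quotes:

```python
# Model per-feature cost at both retail and committed rates so the gap
# stays visible. All rates are hypothetical placeholders — substitute
# your provider's list price and your negotiated discount.
def feature_cost(input_mtok: float, output_mtok: float,
                 retail_in: float = 5.0, retail_out: float = 25.0,
                 committed_discount: float = 0.60) -> dict[str, float]:
    """Monthly cost in dollars for a given fleet-wide token volume."""
    retail = input_mtok * retail_in + output_mtok * retail_out
    return {
        "retail": retail,
        "committed": retail * (1 - committed_discount),
    }

# 10,000 users each burning 2M input / 0.3M output tokens per month.
fleet = feature_cost(10_000 * 2, 10_000 * 0.3)
print(fleet)  # the committed rate cuts the modeled cost by more than half
```

The point isn't the specific discount; it's that a model built only on the retail column will reject pricing structures that are perfectly viable at your actual rates.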
The HN thread about Claude Code's costs was driven entirely by token-price anchoring. Nobody in that thread asked "how much value does a month of Claude Code create?" — they only asked "how many tokens does it burn?" This is a messaging failure. If your pricing page leads with tokens-per-dollar, you're inviting customers to do the wrong math. Lead with outcomes: reviews completed, code shipped, hours saved.
Most SaaS companies model AI features as margin-compressive — "this costs us real money per interaction, so margins will shrink." That's true in the short term and catastrophically wrong in the medium term. Model cost is falling 50%+ annually. If you set prices today based on today's cost, you'll be running 90%+ margins on those features within 18 months. Build that into your pricing architecture. Consider declining credit costs, loyalty pricing, or volume tiers that let you share the windfall with customers who stick around.
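To see why "margin-compressive today" flips so fast, hold the price fixed and let the cost decay. The starting margin and decline rate below are illustrative, with the decline chosen to be in line with the ~50–100x-over-two-years deflation described above:

```python
# Margin trajectory for an AI feature at a fixed price while compute
# cost deflates. Starting cost and decline rate are illustrative.
PRICE = 20.0           # $/month charged for the feature
START_COST = 10.0      # $/month compute today (a thin 50% gross margin)
ANNUAL_DECLINE = 0.85  # cost falls ~85%/year

for month in (0, 6, 12, 18):
    cost = START_COST * (1 - ANNUAL_DECLINE) ** (month / 12)
    margin = (PRICE - cost) / PRICE
    print(f"month {month:2d}: cost ${cost:5.2f}, gross margin {margin:.0%}")
```

A feature launched at a nervous 50% margin crosses 90% before the next annual pricing review — which is exactly why the windfall-sharing mechanisms (declining credit costs, volume tiers) belong in the architecture from day one.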
The gap between what customers think AI costs and what it actually costs is a positioning asset. A $200/month flat-rate plan that customers believe delivers $5,000/month of compute value is an incredibly strong value proposition — but only if you let the anchoring work in your favor. Don't correct the perception. Don't publish your cost structure. Let the retail API price be the reference point that makes your subscription plans look like bargains.
The real threat to AI pricing isn't competition from other proprietary vendors. It's open-weight models closing the capability gap at 10% of the price. When a fine-tuned Llama or Mistral model can do 90% of what Opus 4.6 does at 10% of the cost, every pricing assumption built on proprietary model margins evaporates. PMMs who aren't tracking open-weight benchmarks are pricing blind.
This piece is the economics layer — the infrastructure reality that sits beneath everything I've written about pricing model transitions and credit anxiety. Understanding the perception gap explains why per-seat pricing is collapsing (the value-per-seat is exploding while the cost-per-seat is imploding) and why credit anxiety exists (customers anchor to retail prices and think every prompt is expensive).
Next in the series: "The Margin Mirage" — how AI cost deflation is creating a false sense of security in SaaS gross margins, and why the companies celebrating 85% gross margins on AI features today might be the ones disrupted by open-weight alternatives tomorrow.