BEYONDFEATURES


Tags: AI pricing, token economics, SaaS pricing, Claude Code, usage-based pricing, AI infrastructure costs, developer tools, PMM, AI margins, compute costs

The Pricing Illusion: Why AI Product Costs Aren't What You Think

Anthropic's Claude Code Max sparked a $5,000/month panic. The real compute cost? Closer to $500. The gap between retail token pricing and actual infrastructure cost is the defining tension of AI-native SaaS — and most PMMs are on the wrong side of it.

March 10, 2026 · 10 min read · by Beatriz


[Image: Analytics dashboard with charts and metrics. Photo by Stephen Dawson on Unsplash]

This is Part 1 of the AI Pricing Economics series. Previously: How AI Agents Are Breaking Your Pricing Model covered the what — per-seat is dying, new models are emerging. Credit Anxiety: The Shadow Tax on Developer Productivity covered the how — usage pricing creates psychological friction that kills adoption. This piece covers the why: the infrastructure economics that explain both.


A Hacker News post went viral last week, hitting 288 points and drawing 207 comments. The question: does Anthropic's Claude Code Max plan cost $5,000 per month per user in raw compute?

The answer was no. Not even close.

Anthropic charges $5 per million input tokens and $25 per million output tokens at retail API rates for Opus 4.6. But comparable open-weight models on OpenRouter run at roughly 10% of those rates. That means a heavy Claude Code user burning through what looks like $5,000 in retail API pricing is actually consuming about $500 in compute — possibly less if Anthropic is running optimized inference on its own silicon.
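The napkin math behind that gap can be sketched directly. The usage figures below are invented to reproduce the post's $5,000 example, and the 10% inference discount is the OpenRouter estimate cited above:

```python
# Back-of-envelope sketch of the retail-vs-compute gap described above.
# Usage figures are invented to reproduce the post's $5,000 example;
# the 10% inference discount is the post's OpenRouter estimate.

RETAIL_INPUT = 5.00        # $ per 1M input tokens, retail API rate
RETAIL_OUTPUT = 25.00      # $ per 1M output tokens
INFERENCE_DISCOUNT = 0.10  # open-weight equivalents at ~10% of retail

def monthly_cost(input_mtok, output_mtok, discount=1.0):
    """Dollar cost for a month of usage, measured in millions of tokens."""
    return (input_mtok * RETAIL_INPUT + output_mtok * RETAIL_OUTPUT) * discount

# A hypothetical heavy Claude Code user: 400M input, 120M output tokens/month
retail_value = monthly_cost(400, 120)
compute_cost = monthly_cost(400, 120, discount=INFERENCE_DISCOUNT)
print(f"retail API value: ${retail_value:,.0f}")   # $5,000
print(f"estimated compute: ${compute_cost:,.0f}")  # $500
```

Swap in your own token counts; the point is only that the discount factor, not the usage level, drives the illusion.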

Meanwhile, Anthropic launched Claude Max 20x at $200/month with full Claude Code access. And it's adding premium review workflows like automated security reviews inside Claude Code for higher-value enterprise use cases.

Three price points. Three completely different cost signals. One product.

This is the pricing illusion. And it's not unique to Anthropic. It's the defining structural tension of AI-native SaaS — and most people, including most PMMs, are anchored to the wrong number.

The Perception Gap

Here's the core problem: customers — including sophisticated technical buyers — anchor to retail token prices when they evaluate AI product economics. They see $25 per million output tokens and do napkin math that makes flat-rate plans look like money-losing charity.

But retail API pricing is not cost. It never was.

The gap between retail price and marginal cost is enormous — and it's the gap that makes every new AI pricing model work. Flat-rate subscriptions, credit bundles, per-outcome pricing — none of these are possible if the company's cost structure matches what the API price page says.

This is why Anthropic can offer $200/month unlimited Claude Code access. This is why OpenAI can sell ChatGPT Pro at $200/month. This is why Perplexity can offer 10,000 credits at the same price point. The retail API rate is a profit center, not a cost floor.

Understanding this gap isn't academic. It's the single most important input to AI product pricing strategy — and most PMMs don't have it.

The Historical Parallel: Usage Pricing 1.0

We've been here before. Not with AI, but with cloud infrastructure.

In the mid-2000s, AWS launched with pay-as-you-go compute. Twilio launched with per-API-call pricing. Stripe charged per transaction. These were the first wave of usage-based SaaS, and the dynamics rhyme with what's happening in AI right now:

What worked: Usage pricing aligned cost with value at scale. Startups could start at $0 and grow into six-figure annual contracts without a single sales call. Twilio grew from $49M to $1.76B in revenue between 2015 and 2021 on the back of per-message and per-minute pricing. Stripe's 2.9% + $0.30 per transaction meant their revenue scaled linearly with customer success.

What broke: Customers who scaled got punished. Large Twilio users built their own telephony infrastructure. Enterprises negotiated committed-use discounts that cut per-unit revenue by 40–60%. AWS introduced Reserved Instances and Savings Plans because pure on-demand pricing created exactly the anxiety I wrote about in Credit Anxiety — the finance team couldn't forecast, so they throttled usage.

What emerged: The hybrid. Base commitment + overage. Reserved capacity + burst pricing. Committed spend tiers with declining per-unit costs. Every mature usage-based company eventually added predictability mechanisms because pure usage pricing has a ceiling: the point where the customer's finance team says enough.

AI pricing is speed-running this exact arc. We're watching a decade of cloud pricing evolution compressed into 18 months.

AI-Native Margins: A Different Animal

Here's where the AI story diverges from cloud 1.0. Traditional SaaS has gross margins of 70–85%. The marginal cost of serving one more user on a Salesforce seat or a Notion workspace is almost zero — the software is already built, the infrastructure cost per user is negligible.

AI-native products have a fundamentally different cost structure. Every interaction burns compute. Every prompt is a real cost event. And the cost varies wildly based on model selection, context length, and output complexity.

But — and this is the part that most analysis gets wrong — the cost is falling at a rate that has no precedent in SaaS history.

Consider the trajectory:

Period       Comparable Capability                        Cost per M Output Tokens
Early 2024   GPT-4 / Claude 2 era                         $60–$120
Late 2024    GPT-4o / Claude 3.5 Sonnet                   $15–$30
Early 2025   Claude 3.5 / GPT-4o-mini                     $5–$15
Early 2026   Opus 4.6 retail / open-weight equivalents    $2.50–$25 (retail) / $0.25–$2.50 (inference)

That's a 50–100x cost reduction in two years for equivalent capability. Nothing in enterprise software has ever deflated this fast — not storage, not compute, not bandwidth.

This means two things for pricing strategy:

First, margins are expanding even as prices drop. Anthropic can charge less per user today and make more per user in profit than they could a year ago. The retail API price is dropping, but the gap between retail and cost is widening. This is the opposite of commodity compression — it's margin expansion disguised as price deflation.

Second, any price set today will look expensive in 12 months. This creates an existential challenge for PMMs: how do you anchor customers to value when the cost floor is in freefall? Price too high and you look greedy next quarter. Price too low and you leave margin on the table while costs are still meaningful.
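The margin-expansion dynamic in the first point can be made concrete with a toy projection. The decline rates and starting figures here are assumptions for illustration, not published numbers:

```python
# Illustrative projection: margin expansion disguised as price deflation.
# Both decline rates are assumptions; cost is assumed to fall faster than price.
def project(price, cost, price_decline=0.30, cost_decline=0.60, years=3):
    """Return (year, price, cost, gross_margin) rows under compounding declines."""
    rows = []
    for year in range(years):
        rows.append((year, price, cost, (price - cost) / price))
        price *= (1 - price_decline)
        cost *= (1 - cost_decline)
    return rows

# Hypothetical starting point: $25/Mtok retail price, $2.50/Mtok compute cost
for year, p, c, m in project(price=25.0, cost=2.5):
    print(f"year {year}: price ${p:.2f}/Mtok, cost ${c:.2f}/Mtok, margin {m:.0%}")
```

Price falls every year, yet gross margin climbs, because the cost floor falls faster than the sticker price.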

The Claude Code Case Study

Anthropic's current pricing architecture is a masterclass in navigating this tension — whether or not it was designed that way.

The API layer ($5/$25 per million tokens for Opus 4.6) is the profit engine. It's priced at 5–10x marginal cost. This is where Anthropic makes money from developers and companies building on the platform. It's also the anchor price that makes everything else look like a deal.

Claude Max 20x ($200/month, unlimited Claude Code) is the adoption play. At $200/month with heavy coding use, a power user might consume $300–$800 in retail API value per month. But the actual compute cost is $30–$80. Anthropic is profitable on most Max users and underwater on the true power users — the classic insurance model. The 80% of users who consume moderately subsidize the 20% who go all out.
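The insurance model above is easy to sanity-check with a blended-margin sketch. The segment shares and per-user compute costs below are invented for illustration, not Anthropic's actuals:

```python
# Sketch of the flat-rate "insurance model": moderate users subsidize power users.
# Segment shares and per-user costs are invented for illustration.
PLAN_PRICE = 200.0  # $/month flat rate

segments = [
    (0.80, 20.0),   # moderate users: cheap to serve, highly profitable
    (0.20, 400.0),  # power users: individually underwater at $200/month
]

blended_cost = sum(share * cost for share, cost in segments)
blended_margin = (PLAN_PRICE - blended_cost) / PLAN_PRICE
print(f"blended cost per user: ${blended_cost:.0f}")
print(f"blended gross margin:  {blended_margin:.0%}")
```

Even with the power-user segment losing money per head, the blend clears a healthy margin, which is exactly why the plan can be "unlimited."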

Automated security reviews are the outcome-priced wedge. This isn't framed as tokens or minutes of compute. It's framed around a high-value review workflow that can catch costly issues before production. That is the direction of travel for AI pricing: charge around outcomes buyers care about, not raw model consumption.

Three layers. Three pricing philosophies. One product surface. This is what the pricing model evolution looks like in practice — not a clean migration from per-seat to usage-based, but a simultaneous multi-model strategy where different segments get different cost structures.

What This Means for PMMs Pricing AI Features

If you're a PMM or product leader pricing AI capabilities right now, the perception gap is your most dangerous blind spot. Here's why, and what to do about it.

1. Stop anchoring to retail token prices in internal planning

Your finance team is probably modeling AI feature costs based on API pricing. That's like modeling cloud costs based on on-demand EC2 rates — technically accurate for a single instance, wildly misleading at scale. Build your cost models on committed/negotiated rates, not retail. If you're using a third-party model provider, ask for volume pricing. If you're running open-weight models, benchmark actual inference cost on your infrastructure.
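A minimal version of that cost-model correction looks like this. Both rates here are made-up placeholders; substitute your provider's actual retail price and volume quote:

```python
# Hypothetical comparison: the same projected usage costed at retail vs.
# a negotiated committed rate. Both rates are placeholders, not real quotes.
def monthly_model_cost(output_mtok, rate_per_mtok):
    """Monthly spend for a given output volume (millions of tokens)."""
    return output_mtok * rate_per_mtok

usage = 2_000                                 # fleet-wide output, Mtok/month
retail = monthly_model_cost(usage, 25.0)      # public API rate
committed = monthly_model_cost(usage, 10.0)   # assumed negotiated volume rate
print(f"retail-rate model:    ${retail:,.0f}/month")
print(f"committed-rate model: ${committed:,.0f}/month")
```

If your finance team's spreadsheet only contains the first number, every downstream pricing decision inherits the error.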

2. Anchor customers to value, not tokens

The HN thread about Claude Code's costs was driven entirely by token-price anchoring. Nobody in that thread asked "how much value does a month of Claude Code create?" — they only asked "how many tokens does it burn?" This is a messaging failure. If your pricing page leads with tokens-per-dollar, you're inviting customers to do the wrong math. Lead with outcomes: reviews completed, code shipped, hours saved.

3. Plan for margin expansion, not compression

Most SaaS companies model AI features as margin-compressive — "this costs us real money per interaction, so margins will shrink." That's true in the short term and catastrophically wrong in the medium term. Model cost is falling 50%+ annually. If you set prices today based on today's cost, you'll be running 90%+ margins on those features within 18 months. Build that into your pricing architecture. Consider declining credit costs, loyalty pricing, or volume tiers that let you share the windfall with customers who stick around.
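The "90%+ margins within 18 months" claim can be checked with one compounding formula. The starting price, cost, and decline rate below are hypothetical, chosen to match the 50%+ annual deflation the post assumes:

```python
# Sketch: hold price fixed while compute cost declines ~50%/year
# (the post's assumed deflation rate; price and cost are hypothetical).
def margin_after_months(price, cost_today, months, annual_decline=0.50):
    """Gross margin after `months`, with cost compounding downward."""
    cost = cost_today * (1 - annual_decline) ** (months / 12)
    return (price - cost) / price

price, cost = 10.0, 2.5   # hypothetical per-unit price and today's cost
for months in (0, 12, 18):
    m = margin_after_months(price, cost, months)
    print(f"month {months}: gross margin {m:.0%}")
```

A feature priced at a 75% margin today drifts past 90% by month 18 without anyone touching the price, which is the windfall the text suggests sharing via declining credit costs or volume tiers.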

4. Use the perception gap strategically

The gap between what customers think AI costs and what it actually costs is a positioning asset. A $200/month flat-rate plan that customers believe delivers $5,000/month of compute value is an incredibly strong value proposition — but only if you let the anchoring work in your favor. Don't correct the perception. Don't publish your cost structure. Let the retail API price be the reference point that makes your subscription plans look like bargains.

5. Watch the open-weight convergence

The real threat to AI pricing isn't competition from other proprietary vendors. It's open-weight models closing the capability gap at 10% of the price. When a fine-tuned Llama or Mistral model can do 90% of what Opus 4.6 does at 10% of the cost, every pricing assumption built on proprietary model margins evaporates. PMMs who aren't tracking open-weight benchmarks are pricing blind.

Where This Series Goes Next

This piece is the economics layer — the infrastructure reality that sits beneath everything I've written about pricing model transitions and credit anxiety. Understanding the perception gap explains why per-seat pricing is collapsing (the value-per-seat is exploding while the cost-per-seat is imploding) and why credit anxiety exists (customers anchor to retail prices and think every prompt is expensive).

Next in the series: "The Margin Mirage" — how AI cost deflation is creating a false sense of security in SaaS gross margins, and why the companies celebrating 85% gross margins on AI features today might be the ones disrupted by open-weight alternatives tomorrow.


Sources

  • Hacker News: Claude Code Max compute economics discussion
  • Anthropic Claude Pricing — Opus 4.6 at $5/$25 per million tokens (input/output)
  • Anthropic Max plan — Max 5x and Max 20x tiers
  • Anthropic automated security reviews in Claude Code
  • OpenRouter Model Pricing — Comparative open-weight model pricing
  • Twilio FY2021 results — Revenue growth context
  • Tomasz Tunguz on AI Agent Pricing
  • a16z: The Economics of AI
  • Bessemer Cloud 100 Benchmarks Report — SaaS margin benchmarks
