
AI in 15 — April 20, 2026

April 20, 2026 · 17m 21s
Kate

Anthropic said the price hasn't changed. Independent developers just measured what actually happens when you run the same prompts. Your bill jumped forty percent. Welcome to the end of flat pricing.

Kate

Welcome to AI in 15 for Monday, April 20, 2026. I'm Kate, your host.

Marcus

And I'm Marcus, your co-host.

Kate

Monday kickoff, Marcus. Heavy follow-through on stories we've been tracking, plus some genuinely new angles. Simon Willison finally published concrete measurements on that Opus 4.7 tokenizer change, and it's worse than we thought. Uber burned its entire annual AI budget in four months. The RAM shortage is about to make every laptop, phone, and server more expensive through 2027. Federal prosecutors just indicted the first major AI-bubble fraudsters. A six-thousand-executive study says AI has had basically no impact on productivity. Switzerland is breaking up with Microsoft. And Gemma 4 is running in your browser. Let's go.

Kate

The Opus 4.7 tokenizer story goes from rumor to receipts.

Kate

Uber torches its 2026 AI budget in four months and goes back to the drawing board.

Kate

And the DRAM shortage turns AI capex into your next phone bill.

Kate

Lead story, Marcus. Simon Willison updated his Claude token counter over the weekend and the numbers are finally concrete. We flagged this Saturday. What's new today?

Marcus

Concrete measurements replacing rumor. Willison's counter now shows the Opus 4.7 tokenizer uses on average one-point-four-six times as many tokens as Opus 4.6 for the same input. Real-world workloads range from one-point-three-two-five for Claude Code sessions to one-point-four-seven for technical documentation. Anthropic's own migration guide concedes a range of one to one-point-three-five. Independent measurement puts the upper bound much higher.

Kate

So translate that to bills.

Marcus

Same sticker price, same workload, roughly thirty to forty-five percent more money out the door. Five dollars per million input and twenty-five per million output hasn't moved. But the meter is counting faster. That's a silent price hike on every customer who upgraded.
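The back-of-envelope math Marcus describes can be written out as a quick sanity check. This is an illustrative sketch using the figures cited in this episode (five dollars per million input tokens, twenty-five per million output, and the measured 1.46x average token inflation); the workload volumes are hypothetical.

```python
# Effective-cost sketch using the prices and tokenizer ratio cited in the
# episode. Workload volumes (100M in / 20M out) are hypothetical examples.

PRICE_IN = 5.00    # $ per million input tokens (unchanged sticker price)
PRICE_OUT = 25.00  # $ per million output tokens (unchanged sticker price)

def monthly_bill(in_mtok, out_mtok, tokenizer_ratio=1.0):
    """Dollar bill for a workload. tokenizer_ratio inflates the token count
    for the same underlying text (1.46 = episode's measured Opus 4.7 average)."""
    return (in_mtok * PRICE_IN + out_mtok * PRICE_OUT) * tokenizer_ratio

old = monthly_bill(100, 20)        # same text on the old tokenizer
new = monthly_bill(100, 20, 1.46)  # same text, new tokenizer
print(f"old=${old:.2f} new=${new:.2f} increase={(new / old - 1):.0%}")
# old=$1000.00 new=$1460.00 increase=46%
```

Same sticker price in both calls; only the meter changed.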

Kate

And there's a why behind the change.

Marcus

A Hacker News commenter, handle kouteiheika, argues Anthropic made the tokenizer more semantically aware to help the model reason better. Plausible. But the cost of that choice is passed straight to paying customers without disclosure. Separately, a community site called keepmyprompts dot com catalogued the Opus 4.7 system prompt diff. Less pushy end-of-conversation behavior, a new acting-versus-clarifying section, expanded child safety wrapping, and references to Claude running inside PowerPoint. Useful changes, but the tokenizer shift is the one that hits your invoice.

Kate

Editorial read?

Marcus

This is the cleanest example yet of a pricing illusion in the LLM economy. Flat headline price, rising effective price. If token definitions can change silently between versions, price per million tokens becomes a misleading benchmark. Enterprise buyers need contractual language now. Guaranteed tokenizer stability, or a fixed-cost-per-workload clause. The loss-leader era of premium AI subscriptions is ending, and the industry just made that end arrive without announcing it.

Kate

Which pairs perfectly with story two. Uber's CTO Praveen Neppalli Naga told staff the company is, quote, back to the drawing board on AI spending. Marcus, how did Uber torch an entire annual budget in four months?

Marcus

They rolled out Claude Code to roughly five thousand engineers in December. Usage nearly doubled by February. By April, four months in, they had burned through the full 2026 AI tooling budget. Total R&D came in at three-point-four billion. Critically, Uber had gamified adoption with internal leaderboards ranking engineers by token usage. Hacker News commenters called it token-maxxing. If you tell talented engineers to maximize a number, they will absolutely maximize that number.

Kate

Any upside in the data?

Marcus

Real upside. Eleven percent of Uber's live backend code updates are now written by AI agents. Ride matching, pricing logic, bug fixes. That's not a pilot, that's production. But the cost curve crushed the productivity argument faster than the productivity gains could compound. Uber is now testing OpenAI's Codex and other alternatives to break single-vendor dependency, and reworking how tokens are metered internally.

Kate

Read across to every Fortune 500.

Marcus

Every CTO is reading this memo. Combine it with the tokenizer story we just walked through. You signed a flat-priced contract, encouraged your engineers to adopt, then watched the vendor quietly raise your effective cost per prompt by forty percent. The honeymoon is over. Expect large enterprises to demand contractual guarantees against tokenizer drift, dual-sourcing across labs, and a hard look at open-source options they can run locally where the meter is theirs.

Kate

Speaking of costs you didn't see coming. The Verge published a deep dive this weekend. The RAM shortage could last years. Two hundred and forty-nine points on Hacker News. Marcus, why now?

Marcus

Samsung, SK Hynix, and Micron make essentially all of the world's DRAM. They're shifting wafer capacity to high-bandwidth memory, HBM, for Nvidia's data-center GPUs. Producing one bit of HBM uses roughly three times the wafer capacity of a bit of DDR5. So every H100 or Rubin that ships is a starved smartphone or laptop somewhere. AI will consume about twenty percent of all DRAM wafer capacity in 2026. Hyperscalers absorb roughly seventy percent of total memory chip output.
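The wafer trade-off Marcus cites is starker than the raw percentage suggests. A quick sketch, using only the episode's two ratios (20 percent of wafer capacity diverted, and a 3-to-1 wafer cost per HBM bit), shows why: the diverted fifth of the wafers yields far fewer bits than it removes from the consumer supply.

```python
# Wafer-capacity squeeze sketch, using the ratios cited in the episode.
# Units are normalized: 100 = total DRAM wafer capacity.

TOTAL_WAFER = 100.0     # all DRAM wafer capacity
AI_SHARE = 0.20         # ~20% of wafer capacity going to AI/HBM in 2026
HBM_COST_RATIO = 3.0    # one HBM bit uses ~3x the wafer area of a DDR5 bit

hbm_wafer = TOTAL_WAFER * AI_SHARE
ddr_wafer = TOTAL_WAFER - hbm_wafer

# Bits produced, expressed in DDR5-bit-equivalent units:
hbm_bits = hbm_wafer / HBM_COST_RATIO   # ~6.7 units of HBM bits
ddr_bits = ddr_wafer                    # 80 units of consumer bits

print(f"consumer bits: {ddr_bits:.1f}, HBM bits: {hbm_bits:.1f}")
# Diverting 20% of wafers cuts consumer supply by 20 units
# but yields only ~6.7 units of HBM bits in return.
```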

Kate

Consumer prices have already moved.

Marcus

Brutally. Counterpoint Research says DRAM prices rose eighty to ninety percent in the first quarter of 2026 alone. TrendForce projects another fifty to fifty-five percent jump in Q2. IDC does not see prices stabilizing until mid-2027 at the earliest. Major manufacturers expect to meet only about sixty percent of global demand by late 2027.

Kate

Is there any software-side relief?

Marcus

Some. A Hacker News commenter flagged Google's newly published TurboQuant technique. A six-times key-value cache reduction that's already landing in llama dot cpp. Memory-efficient architectures, quantization, cache compression, small models running in browsers, this is suddenly the hot frontier. If you can run the same model on a third of the RAM, you're effectively printing capacity during a drought.
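To see why a six-times cache reduction is such a big deal during a memory drought, here is a rough KV-cache sizing sketch. The formula is the standard keys-plus-values accounting; the model dimensions below are hypothetical, not those of any specific model, and the 6x factor is simply the reduction cited in the episode.

```python
# KV-cache memory sketch. Model dimensions are hypothetical examples;
# the 6x reduction factor is the figure cited for TurboQuant in the episode.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem):
    # Keys and values each store (layers x kv_heads x head_dim) per token,
    # hence the leading factor of 2.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

full = kv_cache_bytes(32, 8, 128, 32_768, 2)  # fp16 cache, 32k context
compressed = full / 6                          # ~6x reduction claimed
print(f"{full / 2**30:.1f} GiB -> {compressed / 2**30:.1f} GiB")
# 4.0 GiB -> 0.7 GiB
```

Multiply that saving across every concurrent session on a server and it is, as Marcus says, effectively printed capacity.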

Kate

Bigger picture.

Marcus

AI infrastructure is no longer an abstraction. It is squeezing every other part of computing. Your next laptop, your next phone, your next home server will cost more because Nvidia is consuming the world's memory wafers. That is the clearest cost-to-consumer transmission the AI boom has produced so far.

Kate

Story four, Marcus. The first major AI-bubble fraud prosecution. Federal prosecutors in Brooklyn unsealed a ten-count indictment on Friday.

Marcus

The defendants are Harish Chidambaran, fifty-seven, founder and former CEO of iLearningEngines, and former CFO Sayyed Farhan Ali Naqvi, forty-four. The Department of Justice alleges they fabricated, quote, virtually all of the AI company's four hundred and twenty million dollars in claimed 2023 revenue. The mechanism was sham customer contracts and round-trip cash transfers. Money leaves the company, passes through a fake customer, and comes back as revenue.

Kate

The company went public.

Marcus

Through a SPAC in April 2024 at a one-point-four billion dollar valuation. Climbed to one-point-five billion. Then Hindenburg Research published a short report in 2024 that specifically flagged the customer list as suspect. Chapter 11 in December 2024. Converted to Chapter 7 liquidation in March 2025. Chidambaran was arrested at his home in Potomac, Maryland. Naqvi in San Jose. Charges include continuing financial crimes enterprise, securities fraud, wire fraud, and conspiracy.

Kate

The fraud was hiding in plain sight for nearly two years.

Marcus

That's the part regulators have to answer for. Hindenburg published. The pattern was legible. And the SPAC-to-AI-hype-to-implosion arc had a familiar shape that should have drawn scrutiny earlier. This is the first major AI fraud prosecution. It will not be the last. Expect more enforcement against companies that rode the 2023 and 2024 AI wave with fabricated metrics. If you're holding SPAC-sourced AI exposure, read the audit carefully.

Kate

Reality check story, Marcus. And it's a big one. A February 2026 National Bureau of Economic Research study of six thousand executives found something the market does not want to hear.

Marcus

Executives across the US, UK, Germany, and Australia. Nearly ninety percent of firms reported AI has had no impact on employment or productivity over the last three years. About two-thirds of those executives said they personally use AI, but they average just one and a half hours a week. A quarter don't use it at all. And yet the same executives project AI will boost productivity by one-point-four percent and output by zero-point-eight percent within the next three years. Classic just-around-the-corner optimism.

Kate

Fortune drew a pointed historical parallel.

Marcus

Economist Robert Solow's 1987 line. You can see the computer age everywhere but in the productivity statistics. Apollo chief economist Torsten Slok reinforced the theme this week. Outside the big-tech sector, quote, there are no signs of AI, end quote, in employment data, productivity metrics, or profit margins. And that's despite over two hundred and fifty billion dollars in corporate AI investment in 2024 alone.

Kate

So is AI a bust?

Marcus

No, and I want to be careful here. Earlier general-purpose technologies, electricity, the PC, took about a decade to show up in aggregate productivity statistics. That pattern is real. But the market is pricing AI as if the transformation is already underway. OpenAI at eight hundred and fifty-two billion. Nvidia at trillions. The gap between measured productivity and asset prices is now the single most important macro question in AI. Combine this with the Uber overspend. Combine it with the tokenizer story. You start to see why cautious CFOs are getting nervous.

Kate

From macro to geopolitics. The Swiss Federal Chancellery announced over the weekend that they aim to reduce dependency on Microsoft, step by step, long-term. Marcus, context.

Marcus

Switzerland only just finished installing Microsoft 365 on roughly fifty-four thousand administration workstations. They paid Microsoft one hundred and fifty million Swiss francs, about one hundred and eighty-seven million US dollars, in license fees at the end of 2024. No public tender because there was no viable alternative. Cyber Defence Command head Thomas Süssli has been driving the sovereignty push. The Hacker News discussion surfaced a community site called mxmap dot ch. A map tracking which Swiss municipalities remain dependent on Microsoft and US cloud services.

Kate

Why now?

Marcus

Two drivers. The US CLOUD Act, which European governments view as a direct legal risk to any data held by US cloud providers. And a broader shift in how Europe reads the US as a geopolitical partner. Fair or unfair, that perception is hardening.

Kate

Read across to AI.

Marcus

Immediate. If European governments are breaking up with Microsoft over general cloud dependency, they are absolutely going to want sovereign AI rather than US hyperscaler APIs. Winners, open-source models, European cloud providers, on-premise inference frameworks. The Mistral bet just got more interesting. So did every stack that lets a Swiss bank run a capable model on hardware it physically owns.

Kate

Last quick hit, Marcus. A developer demo went viral on Hacker News this weekend. Prompt-to-Excalidraw diagram generation running Google's Gemma 4 E2B model entirely in the browser via WebAssembly.

Marcus

Three-point-one gigabytes to load, which is enormous. But once cached, fully local, no server round-trip, no API key, no bill. Gemma 4 launched April second under Apache 2.0. Four sizes. Edge tier E2B and E4B for phones and embedded hardware. Workstation tier twenty-six B mixture-of-experts and thirty-one B dense. The thirty-one B fits on a single eighty gigabyte Nvidia H100 in bfloat16.
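The fits-on-one-H100 claim checks out on paper: two bytes per parameter in bfloat16. A quick verification of the arithmetic, using only the figures stated in the episode:

```python
# Weight-memory check for the "31B dense fits on one 80 GB H100 in
# bfloat16" claim. This counts weights only, not KV cache or activations.

params = 31e9           # 31B-parameter dense model
bytes_per_param = 2     # bfloat16 is 2 bytes per parameter

weights_gb = params * bytes_per_param / 1e9
print(f"{weights_gb:.1f} GB of weights")  # 62.0 GB of weights
# Leaves ~18 GB of the 80 GB card for KV cache, activations, and overhead.
```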

Kate

And browser models are genuinely different.

Marcus

Different optimization problem. At batch size one, kernel launch overhead and memory bandwidth dominate, not raw floating-point operations. One commenter reported running Gemma 4 E2B on a Pixel 10 Pro during a flight with results comparable to cloud Gemini or ChatGPT. That is the moment edge AI stops being a demo.

Kate

Why it matters in one sentence.

Marcus

RAM shortages are driving server costs up. Anthropic and OpenAI are raising effective prices. The economic logic of running a capable model locally just got sharply stronger. Expect iOS 20, Android 17, and Windows 12 all to ship with built-in on-device LLMs this cycle.

Kate

Monday big picture, Marcus. Three threads.

Marcus

First, the end of flat pricing. The Opus 4.7 tokenizer hike, Uber's blowout, the DRAM shortage. AI compute is getting more expensive, not cheaper, even as models get better. The gravy train is ending for enterprise buyers.

Kate

Second?

Marcus

Reality versus hype. Six thousand executives say no impact on productivity, yet OpenAI is valued at eight hundred and fifty-two billion and iLearningEngines shows what happens when fabricated metrics meet enforcement. The gap between measured output and market optimism is the most important question in AI right now.

Kate

And third?

Marcus

Sovereignty and decentralization. Switzerland walking away from Microsoft, Gemma 4 running in a browser, Uber diversifying off single-vendor AI. Customers want optionality. That favors open-source models, edge inference, European clouds, and anyone who lets you own your own meter.

Kate

The meter metaphor is doing a lot of work today, Marcus.

Marcus

Because that's what the whole week is really about, Kate. Who holds the meter, who sets the rate, and whether the number on your invoice actually reflects the work you asked for.

Kate

That's your AI in 15 for Monday, April 20, 2026. See you tomorrow.