AI in 15 — March 04, 2026
"Shock! Shock!" Those are the opening words of a paper published this week by Donald Knuth, the ninety-three-year-old godfather of computer science, describing the moment an AI solved a math problem he'd been stuck on for weeks.
Welcome to AI in 15 for Wednesday, March 4, 2026. I'm Kate, your host.
And I'm Marcus, your co-host.
Marcus, what a day. We've got one of the most respected minds in computing crediting an AI with a genuine mathematical breakthrough. Plus OpenAI just dropped GPT-5.3 Instant with the tagline "more accurate, less cringe." Google fired back with the cheapest Gemini yet. Apple published research questioning whether AI reasoning models actually reason at all. The Supreme Court just settled a major AI copyright question. And AI coding tools are printing money at a pace that's hard to believe. Here's the rundown.
Donald Knuth, author of The Art of Computer Programming, published a paper called "Claude's Cycles" documenting how Claude Opus solved an open combinatorics problem he couldn't crack.
OpenAI's GPT-5.3 Instant is now the default ChatGPT model, focused on being less preachy and more accurate.
Google launched Gemini 3.1 Flash-Lite at up to sixteen times cheaper than Gemini Pro.
Apple dropped a paper called "The Illusion of Thinking" arguing that AI reasoning models may not actually be reasoning.
The Supreme Court declined to hear the AI copyright case, and AI coding tools just hit some staggering revenue numbers. Let's get into it.
Marcus, I want to start with the Knuth story because this feels genuinely historic. Donald Knuth, the man who literally wrote the book on algorithms, published a paper crediting Claude with solving a problem he couldn't.
And the way the paper opens tells you everything about the impact. "Shock! Shock!" This is a man who has been doing mathematics for over sixty years. He doesn't use exclamation marks lightly. The problem involved decomposing a directed graph into Hamiltonian cycles, deep combinatorics intended for a future volume of The Art of Computer Programming. Knuth had been working on it for weeks. Claude Opus solved it in about ninety minutes.
Walk me through what actually happened.
Claude made thirty-one systematic explorations. It tried brute-force approaches, invented what it called serpentine patterns, hit dead ends, changed strategies, and eventually found a construction that works for all odd-numbered cases. What's fascinating is the process. It wasn't just spitting out an answer. It was exploring, failing, adapting, and finding something genuinely novel.
But Knuth had to write the proof himself?
And that's the critical caveat. Claude found the construction but couldn't formally prove it was correct. Knuth did the rigorous mathematical proof and discovered Claude's solution was actually just one of seven hundred and sixty valid approaches. The even-numbered case remains unsolved. So Claude found the answer but couldn't verify it, and a human had to close the loop.
The Hacker News discussion was enormous. Five hundred and sixty-eight points, two hundred and thirty-four comments. And people were divided.
Fiercely divided. The core debate is whether Claude was doing genuine problem-solving or sophisticated pattern matching on similar constructions in its training data. And here's the thing: when Donald Knuth himself calls Claude's approach "quite admirable" and describes the result as "a dramatic advance in automatic deduction and creative problem solving," you have to take that seriously. This isn't a benchmark demo. This isn't a curated example. This is one of the most credentialed mathematicians alive, working on his own research problem, being genuinely surprised by what an AI produced. That carries weight that no leaderboard ever could.
It also draws a really precise line around what AI can and can't do right now in mathematics.
Almost surgically precise. Can find novel constructions? Yes. Can prove they're correct? No. That boundary is the most honest assessment of AI mathematical capability I've seen, and it came not from a benchmark but from a real researcher's real experience.
Now, the model that solved Knuth's problem, Claude Opus, that's the frontier. But the everyday models are getting interesting updates too. OpenAI just made GPT-5.3 Instant the default in ChatGPT. Marcus, "more accurate, less cringe" is quite a tagline.
It's remarkably candid. OpenAI is essentially admitting that their previous model was patronizing users. The overly cautious responses, the unsolicited lectures, the lengthy disclaimers before answering a straightforward question. They've dialed all of that back significantly. On the accuracy side, hallucination rates are down almost twenty-seven percent when using web search and about twenty percent on internal knowledge.
That's a meaningful improvement.
It is, and the strategic framing matters. OpenAI isn't chasing benchmark records with this release. They're chasing usability. After the Pentagon backlash, after the uninstall surge we've been covering all week, they need to remind people why they used ChatGPT in the first place. Making it less annoying to talk to is a very practical way to do that. And they teased that 5.4 is coming "sooner than you think," which is clearly meant to keep people from switching while they're still shopping around.
Google isn't sitting still either. Gemini 3.1 Flash-Lite launched yesterday. Marcus, the pricing on this is aggressive.
Twenty-five cents per million input tokens. That's twelve to sixteen times cheaper than Gemini Pro in high-context scenarios. And it's fast. Three hundred and sixty-three tokens per second, which is a forty-five percent throughput improvement over the previous Flash model. One million token context window, natively multimodal across text, image, speech, and video.
Despite being the budget option, it actually outperforms the previous generation on several benchmarks?
On reasoning and multimodal benchmarks, yes. Demis Hassabis called it "small but mighty" on X, which is a nice soundbite but also accurate. This model isn't competing with Opus or GPT-5 for frontier capability. It's competing for the workload that matters most commercially, the eighty percent of production tasks that don't need frontier intelligence. Content moderation, classification, real-time translation, e-commerce processing. The boring stuff that makes money. Google is saying you can run those workloads at a fraction of current costs, and that's a message enterprises care about deeply.
Now here's a fascinating counterpoint to the Knuth story. Apple's machine learning team published a paper called "The Illusion of Thinking." Marcus, what did they find?
They tested reasoning models from OpenAI, Anthropic, and Google on controllable puzzle environments, things like Tower of Hanoi and River Crossing where you can precisely dial complexity up and down. And they found three distinct performance regimes. At low complexity, standard models actually outperformed the reasoning models. At medium complexity, extended thinking showed genuine advantages. But at high complexity, everything collapsed. Both types of models fell apart completely.
So the thinking traces, the chain-of-thought reasoning that companies have been marketing as a breakthrough, Apple is saying that might not be real reasoning?
They're saying the models "fail to use explicit algorithms and reason inconsistently across puzzles." And here's the strangest finding. As complexity increased, the models' reasoning effort actually decreased despite having plenty of token budget left. As if they were giving up. The tweet announcing the paper got seventy-five hundred likes. And then someone published a rebuttal paper called "The Illusion of the Illusion of Thinking" that went viral before the author admitted it was a joke.
That's wonderful. But seriously, this cuts at how these companies market their most expensive products.
It does, and it's worth noting who's saying it. Apple doesn't sell a reasoning model. They have no competitive incentive to validate the category. The Knuth paper and the Apple paper together paint a nuanced picture. AI can find remarkable solutions to specific problems, as Knuth experienced. But the general reasoning capability that the marketing suggests? That may be considerably more limited than the demonstrations imply.
Quick legal story. The Supreme Court declined to hear the AI copyright case. No human author, no copyright. That's now settled law, Marcus.
Thaler v. Perlmutter is done. Stephen Thaler spent eight years trying to copyright art made entirely by his AI system. Every court said no. The Supreme Court won't even hear the appeal. The practical effect is significant. Fully AI-generated works enter the public domain immediately. No company can claim exclusive rights to content made without meaningful human creative input.
So if you're building a business on purely AI-generated content?
You have no intellectual property protection for that content. Anyone can copy it, use it, sell it. The legal framework is now clearly incentivizing hybrid human-AI creation rather than fully autonomous generation. For businesses, the message is: keep humans meaningfully in the creative loop, or lose your IP rights entirely.
Now for some jaw-dropping business numbers. Cursor just crossed two billion in annualized revenue. Claude Code hit two point five billion. Marcus, those growth rates.
Cursor doubled in three months. Claude Code more than doubled since January. These are among the fastest revenue growth trajectories in enterprise software history. Cursor's enterprise clients now represent about sixty percent of revenue. And Claude Code just launched a voice mode where developers can speak commands instead of typing them.
So AI coding tools aren't just popular, they're a real business.
They're becoming core infrastructure. When three products, Cursor, Claude Code, and GitHub Copilot, are all generating billions in annual revenue, that's not a trend. That's a validated market category. And it's interesting given the METR study we discussed Monday showing developers were nineteen percent slower with AI tools. Companies are paying billions for tools that might be making some developers slower, because the ones who use them well are dramatically more productive. The average masks the distribution.
Last quick hit. Alibaba's Qwen team lost its tech lead and two senior researchers in what appear to be involuntary departures. Shares dropped four and a half percent.
Junyang Lin posted "me stepping down. bye my beloved qwen" on X, and a colleague strongly hinted the departure wasn't voluntary. This matters because Qwen has been one of the most competitive open-source model families globally. Losing multiple senior technical leaders simultaneously, the day after releasing new models, raises real questions about what's happening internally. And frankly, it fits a pattern. The narrative around Chinese AI labs often looks stronger from the outside than the organizational reality supports.
Wednesday big picture, Marcus. Donald Knuth says an AI solved a problem he couldn't. Apple says AI reasoning might be an illusion. The Supreme Court says AI can't own what it creates. And the tools developers use to code with AI are generating billions in revenue. What's the thread?
The thread is that we're finally getting honest data about what AI actually is. Not what the marketing says, not what the hype suggests, but what it demonstrably does and doesn't do. Knuth's paper is honest. Claude found something remarkable and couldn't prove it was right. Apple's paper is honest. Reasoning models have hard structural limits. The Supreme Court ruling is honest. Creation without human authorship doesn't get legal protection. Even OpenAI's "less cringe" framing is a kind of honesty, admitting the previous version was annoying. After years of AI being sold on promises, we're entering the era of AI being evaluated on evidence. And the evidence is more interesting, more nuanced, and frankly more useful than the hype ever was.
The truth is more interesting than the pitch. That might be the best thing anyone's said about AI this year.
Give it a week. The pitch will try to catch up.
That's your AI in 15 for Wednesday, March 4, 2026. See you tomorrow.