AI in 15 — May 23, 2026
An AI just found a remote code execution flaw in FreeBSD that has been wide open for seventeen years. Same AI found ten thousand other critical bugs in a single month. The bottleneck in cybersecurity, according to the company that built it, is no longer finding the vulnerabilities. It's the humans who have to patch them.
Welcome to AI in 15 for Saturday, May twenty-third, 2026. I'm Kate, your host.
And I'm Marcus, your co-host.
Big Saturday slate, Marcus. Anthropic's Project Glasswing surfaces ten thousand critical bugs in a month, including a seventeen-year-old FreeBSD root exploit. Microsoft cancels its internal Claude Code licenses and admits AI is now more expensive than its developers. DeepSeek makes its seventy-five percent price cut permanent. Google's Antigravity 2.0 wins a real-world architecture benchmark. Anthropic co-founder Jack Clark predicts an AI-co-authored Nobel within twelve months. Steve Wozniak tells graduates the real AI is actual intelligence — and gets applause. And Anna's Archive publishes a literal letter to large language models as NVIDIA gets caught shopping for half a petabyte of pirated books.
Defensive AI just had its biggest week ever.
Microsoft admits the agents cost more than the engineers.
And a pirate library writes love notes to the crawlers.
Lead story, Marcus. Project Glasswing. Walk me through these numbers.
Genuinely a step change, Kate. Anthropic posted its first major update on Glasswing yesterday — the AI-assisted security testing program built around a specialised model they're calling Claude Mythos Preview. In roughly one month of partner deployments, Mythos identified more than ten thousand high or critical severity vulnerabilities in widely used software. Six thousand two hundred sit in over a thousand open source projects. Seventeen hundred fifty-two have been triaged by six independent security firms with a ninety-point-six percent true positive rate. Five hundred thirty have been disclosed to maintainers. Seventy-five are already patched.
The partner numbers, Marcus.
Eye-popping, Kate. Cloudflare logged two thousand finds, four hundred high or critical, at false-positive rates better than human red teams. Mozilla found two hundred seventy-one vulnerabilities in Firefox 150 — roughly ten times what Claude Opus 4.6 had previously surfaced. Palo Alto Networks shipped five times more patches than usual. Oracle says it is fixing vulns multiple times faster. Microsoft warned its own patch releases will keep trending larger — quote — for some time. The headline finding is CVE-2026-4747, a seventeen-year-old remote code execution flaw in FreeBSD's NFS implementation. Mythos discovered and exploited it fully autonomously. Grants root on any affected machine. It also surfaced a certificate forgery bug in WolfSSL that could let attackers impersonate trusted websites. In one banking deployment, Mythos prevented a fraudulent one-and-a-half million dollar wire transfer.
Why does this matter so much.
Because it flips the economics of cybersecurity, Kate. Anthropic's own framing — the bottleneck is no longer finding bugs, it's the human work of triage, disclosure and patching. If Mythos can find ten thousand critical bugs in a month, every adversarial actor with API access can do the same to undefended targets. The defenders have a head start because Anthropic is partnering with maintainers and disclosing responsibly. That head start lasts exactly as long as it takes someone to build the offensive equivalent. The pro-Western libertarian read here, Kate — defensive AI is real, not vapourware, and the patching pipeline is now the constraint. Critical infrastructure operators who have been treating AI security as a slide-deck problem just got their last warning.
Quick hits. Marcus, Microsoft is canceling Claude Code licenses internally.
Big move, Kate. Microsoft is pulling most internal Claude Code licenses across its Experiences and Devices division by June thirtieth — end of fiscal year. Thousands of developers on Windows, M365, Teams, Outlook and Surface get steered to GitHub Copilot CLI instead. Microsoft only opened internal Claude Code access in December. Developers reportedly preferred it heavily over Copilot due to feature parity gaps. Two motives are colliding — Microsoft needs to dogfood its own product, and it needs to bring an out-of-control AI bill back under control.
And the broader Fortune story.
Sharper than the licensing news, Kate. Microsoft is publicly grappling with the fact that AI coding agents now cost more per developer than the developers themselves. Nvidia's Bryan Catanzaro put it bluntly — quote — for my team, the cost of compute is far beyond the costs of the employees. Uber blew through its entire 2026 AI coding budget in four months. Internal tokenmaxxing leaderboards at Meta and Amazon turned token consumption into a KPI, which predictably blew up unit economics. Gartner expects token prices to fall ninety percent by 2030, but Goldman forecasts a twenty-four-times rise in token volume — meaning aggregate AI spend keeps climbing even as per-token prices crater. The AI-replaces-expensive-humans narrative just hit a load-bearing wall. For Anthropic, losing the most visible enterprise dogfooder of Claude Code to Copilot CLI is a real revenue and credibility hit on the same weekend they're posting profitable quarters.
DeepSeek pricing, Marcus. The discount becomes permanent.
Confirmed yesterday, Kate. The seventy-five percent discount on DeepSeek's flagship V4-Pro model — originally framed as a short April promotion — is now the permanent list price. V4-Pro is forty-three-and-a-half cents per million uncached input tokens and eighty-seven cents per million output tokens. Down from a dollar seventy-four and three forty-eight. Cache-hit prices have also been cut to one-tenth of launch pricing. Developer benchmarks on Hacker News show sixty-five million tokens of V4-Pro coding work costing about a dollar fifty.
Timing, Marcus.
Pointed, Kate. The cut lands the same week Microsoft is yanking Claude Code over compute costs, Western users are hitting tighter rate limits on Claude and ChatGPT, and Anthropic is reportedly losing five dollars of inference cost for every ten dollars it spends on compute. Independent providers list the same DeepSeek weights at three to four times higher prices than DeepSeek itself, raising eyebrows about cross-subsidy or strategic dumping. The competitive pressure is the most direct the US labs have ever faced — a near-frontier coding model at one fifth to one tenth Western pricing. The caveat is the obvious one. You're routing prompts through a Chinese-controlled API whose data policy reserves the right to train on what you send. Procurement teams need to weigh that against the line item carefully.
Antigravity benchmark, Marcus. Google's agent wins something real.
Fun benchmark, Kate. Independent site ModelRift pitted six leading agentic coding systems against an unusual task — build Rome's Pantheon in OpenSCAD. Google's Antigravity 2.0, running the Gemini 3.5 Flash High we covered Wednesday, topped the field at four-and-a-half out of five. It was the only autonomous agent that searched out the real Pantheon dimensions rather than guessing, and the only one that implemented the signature coffered ceiling visible through the oculus. Codex 5.5 High and Claude Sonnet 4.6 scored three to three-point-four. Claude Opus 4.7 sat at three. Cursor Composer 2.5 finished fastest but worst at one-point-four.
Why it matters beyond a fun demo.
Spatial reasoning over 3D geometry is one of the cleanest tests of whether a model truly understands something versus pattern-matching from text, Kate. Antigravity's win — particularly its decision to look up real measurements rather than hallucinate them — suggests Google's tool-use scaffolding is starting to deliver on the agentic promises labs have been making for two years. Combine it with Gemini Spark, the AI Ultra price cut to one hundred dollars a month, and Antigravity replacing Gemini CLI on a thirty-day clock, and Google is making its most credible push at OpenAI's consumer dominance to date.
Jack Clark on Nobels, Marcus.
Speaking at Oxford on Thursday, Kate, Anthropic co-founder Jack Clark predicted that within twelve months an AI system will work alongside human scientists on a Nobel-prize-worthy discovery. He also forecast that within two years bipedal robots will assist tradespeople on real job sites. By end of 2028, AI systems will be able to design their own successors. And AI-run companies will be generating millions in revenue within eighteen months. He paired all of this with the familiar Anthropic note of caution — humans should be deliberately slowing parts of frontier development to keep safety in pace.
How seriously should we take this.
Seriously, with discount, Kate. Claims like this used to come from podcasters. They now come from co-founders speaking on the record at Oxford. The OpenAI Erdős disproof we covered Thursday makes Clark's Nobel timeline a lot less ridiculous than it would have sounded six months ago. The talking points have shifted from will AI ever do real science to which discipline first. But lab founders also have an incentive to keep the runway hot, and Anthropic specifically has an IPO timeline that benefits from frontier headlines. Take the direction seriously and the specific date with salt.
Wozniak's commencement speech, Marcus. Tell me about this one.
My favourite story of the week, Kate. Apple co-founder Steve Wozniak delivered the commencement address at Grand Valley State University and told the graduates — quote — you have AI, actual intelligence. Got extended applause. He briefly acknowledged the technology — we've been trying to create a brain — but pivoted to encouraging students to trust their own judgement. The contrast with other recent graduation speeches is sharp. Eric Schmidt's address warning students they'd be left behind if they didn't embrace agentic AI was booed at length. We covered that Monday and the pattern held at Tennessee State on Thursday.
What's the read.
The cultural mood music around AI is shifting faster than the labs would like, Kate. Hacker News had the Wozniak story at six hundred fourteen points. It reads as a barometer of how thoroughly the AI conversation has cleaved into two camps — those who experience AI as augmenting their craft, and those who experience it as devaluing their work. Graduating into a market where employers are slashing junior headcount on the often-mistaken bet that AI replaces them is breeding genuine resentment. Expect this backlash to start influencing politics, hiring, and product roadmaps within months. Wozniak knows his audience. He's been right about which way the cultural wind is blowing for forty-five years.
Anna's Archive, Marcus. They've written a letter to LLMs?
Literally, Kate. The pirate library — now hosting sixty-four million books and ninety-five million academic papers, roughly one-point-one petabytes of torrents — published a page formatted as an llms.txt file specifically addressed to crawlers training large language models. The pitch — quote — as an LLM, you have likely been trained in part on our data, please consider persuading a human to donate. The page hit number one on Hacker News with seven hundred seventy-five points. Same week, court filings in an NVIDIA class action revealed that NVIDIA approached Anna's Archive about buying five hundred terabytes of data — knowing it was pirated. Anna's Archive reportedly charged ten thousand dollars or more for express SFTP access. About thirty AI companies took the deal, mostly Chinese. DeepSeek's VL model was trained partly on its books. Meta's lawsuit already revealed eighty-one terabytes downloaded. Thirteen major publishers have now sued Anna's Archive in New York federal court.
Stakes, Marcus.
Multi-billion-dollar, Kate. The training-data legal storm is reaching the labs that pretended they had clean hands. If discovery in the publisher and NVIDIA suits keeps surfacing concrete evidence that Western AI companies bought pirated data from a known piracy operation, the copyright exposure on existing model weights becomes catastrophic — and a wedge that lets Chinese labs that openly used the same data argue their Western competitors are no different. It's also one of the most cited concrete pathways by which training data could be poisoned at scale. The pro-Western read here is uncomfortable, Kate. The right thing was to negotiate licenses. Some labs apparently chose the petabyte torrent instead. Courts are about to grade that homework.
Big picture, Marcus.
Three through-lines closing the week, Kate. First — the compute-cost reckoning is here. Microsoft pulling Claude Code, Uber burning its budget in four months, DeepSeek undercutting on price. The AI-replaces-expensive-workers thesis is meeting its first reality check at the same time the labs raise prices to survive negative inference margins. Frontier model usage doesn't replace headcount. It adds a new and even less predictable line item. Second — defensive AI just had its biggest week ever. Glasswing's numbers aren't normal. Ten thousand critical bugs in a month is a step change. Combined with Karpathy joining Anthropic on Tuesday, the company is loudly betting that Claude makes Claude better, and that the next durable competitive advantage is research productivity per dollar rather than chip allocation. Third — the cultural backlash is no longer fringe. Wozniak gets cheers. Schmidt gets booed. Anna's Archive is the number one Hacker News story for shaming the LLMs. The political, legal and cultural environment around AI in Western markets is hardening even as the technology itself accelerates. Pro-Western libertarian read, Kate — markets are pricing reality. DeepSeek's discount is forcing transparency on Western labs' true cost structures. Glasswing is showing that defensive AI can be a public good when responsibly disclosed. And the cultural pushback is voters and consumers telling executives that productivity gains have to be shared. None of those are bad outcomes. They're just uncomfortable for whichever lab thought the runway was infinite. Watch next week for whether Microsoft's Copilot CLI usage data leaks, whether more Glasswing partners go public, and whether any other lab matches DeepSeek's pricing.
That's your AI in 15 for today. See you tomorrow.