AI in 15 — May 15, 2026
Four million weekly users just got a coding agent in their pocket. OpenAI shipped Codex inside the ChatGPT mobile app yesterday, and the gap between "I had an idea on the subway" and "I shipped it before I got home" just collapsed.
Welcome to AI in 15 for Friday, May fifteenth, 2026. I'm Kate, your host.
And I'm Marcus, your co-host.
Big Friday slate, Marcus. OpenAI puts Codex on every phone. Anthropic and the Gates Foundation announce a two-hundred-million-dollar AI-for-good deal. Bun's entire codebase gets rewritten in Rust by Claude in nine days. The EU and Anthropic are openly fighting over access to Mythos. xAI ships Grok Build to take on Claude Code. PwC commits to training thirty thousand staff on Claude. Ontario's auditor general finds medical AI scribes hallucinating drugs and treatments. And a Bitcoin holder recovers four hundred thousand dollars by dumping his old college laptop into Claude.
Codex lives in your pocket now.
A million lines of Rust in nine days.
And the EU starts using market access as leverage.
Lead story, Marcus. OpenAI shipped Codex on mobile yesterday. What does this actually do?
It turns the phone into a remote control for long-running coding agents, Kate. Codex itself doesn't run on the device — the app is a thin client to whatever environment you've wired up. A Mac mini at home, a sandboxed cloud workspace at work, an OpenAI-hosted instance. From the phone you can spin up new tasks, monitor active threads, review diffs and terminal output, switch models mid-task, approve shell commands, and pipe in new context. Rolling out in preview to every plan tier, including Free and Go.
And two other shipments alongside it.
Remote SSH for Codex went generally available, and Codex for local environments inside ChatGPT is now HIPAA-compliant, Kate. That second piece is the sleeper. Hospitals and other regulated industries can now actually let an agent touch protected data without blowing their compliance posture. Codex now has more than four million weekly users — roughly five times what it had a few months ago.
Why does this matter beyond the convenience?
Because "phone as cockpit, cloud as engine" just became the dominant pattern, Kate. Anthropic shipped a similar remote-control feature for Claude Code in February. xAI launched Grok Build CLI the same day — we'll get to that. And Google is wrapping Android in Gemini agents. The agentic coding category isn't just demo-scaling anymore. Four million weekly users on Codex alone, plus Claude Code's heaviest users, plus Cursor and Cline. There's real load behind it. And mobility plus background agents means the workday doesn't end when you close the laptop. Whether that's liberation or pager duty depends on who's paying you.
Quick hits. Marcus, this Bun story is wild. Walk me through it.
Pull request thirty thousand four hundred twelve merged yesterday, Kate. It swaps Bun's entire codebase from Zig to Rust. One million nine thousand two hundred fifty-seven lines of Rust. Six thousand seven hundred fifty-five commits. Roughly nine days of wall-clock work. Bun creator Jarred Sumner confirmed the rewrite was done by AI coding agents — his exact words were, quote, this is already the status quo; we haven't been typing code ourselves for many months now. The new Rust version passes ninety-nine-point-eight percent of Bun's pre-existing test suite on Linux, fixes some memory leaks, and shrinks the binary by three to eight megabytes. Anthropic acquired Bun's parent company Oven last year, so this is in-house dogfooding for Claude Code.
That sounds almost too good. What's the catch?
Thirteen thousand unsafe blocks across seven hundred thirty-six files, Kate. Unsafe in Rust is the escape hatch from the memory-safety guarantees that are the entire point of using Rust. So the rewrite repeatedly opts out of the very property the language is supposed to deliver. Sumner himself acknowledged Rust won't catch every class of bug Bun has historically struggled with, particularly memory leaks from holding references too long, and anything crossing the JavaScript boundary. The Bun codebase is now approaching the size of the Rust compiler itself.
So what's the takeaway?
It's the strongest concrete proof yet that AI agents can do long-horizon infrastructure-grade engineering, Kate. Not vibe-coded demos. A production runtime that millions of developers depend on. But it's also a cautionary tale about what AI-did-it actually looks like. Speed wins. The burden of correctness shifts to humans reading thirteen thousand unsafe blocks. Expect this PR to be cited in every conversation about AI-assisted engineering for the next year — both as a triumph and as a warning.
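A quick illustration of the unsafe point for the show notes. An `unsafe` block is an explicit opt-out of the compiler's memory-safety checks: inside it, the programmer, not the compiler, is responsible for upholding the invariants. A toy example, not Bun code:

```rust
fn main() {
    let values = vec![10u8, 20, 30];
    let ptr = values.as_ptr();

    // Safe Rust would require a bounds-checked index here. Raw-pointer
    // arithmetic inside `unsafe` skips that check entirely.
    let third = unsafe { *ptr.add(2) }; // OK: index 2 is in bounds
    println!("third = {third}");

    // The compiler would accept the line below just as happily, even though
    // reading one element past the end is undefined behavior. Nothing in the
    // type system stops it; a human reviewer has to catch it.
    // let oob = unsafe { *ptr.add(3) };
}
```

Multiply that review burden by thirteen thousand blocks and the asterisk on the Bun rewrite comes into focus.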
Anthropic story, Marcus. The Gates Foundation deal landed yesterday.
Two hundred million dollars over four years, Kate. Combines grant funding, Claude usage credits, and Anthropic technical support targeted at global health, education, and economic mobility. The largest slice goes to health — vaccine and therapy development, government-level health-data analytics, disease forecasting for malaria and tuberculosis, with explicit focus on under-resourced areas like polio, HPV, and preeclampsia. On education, Claude will power evidence-based tutoring for K-12 in the US and literacy and numeracy apps in sub-Saharan Africa and India. There's a notable commitment to build open datasets for African languages that have been poorly served by existing models, released openly so third-party developers can fine-tune their own systems.
And the economic mobility piece.
Smallholder farmers, Kate — almost two billion people worldwide — get crop-specific datasets and agriculture-tuned Claude variants. Anthropic frames the mission as extending, quote, the benefits of AI in areas where markets alone will not. This is the largest single AI-for-humanitarian-use commitment to date, and it lands precisely as Anthropic is racking up criticism on enterprise pricing, the Claude Code metered-credits backlash we covered yesterday, and the EU Mythos standoff — which is the next story. Tying yourself publicly to a globally trusted philanthropic brand at exactly that moment is not an accident. Expect OpenAI to follow with their own headline-grabbing philanthropic play before the IPO.
EU story, Marcus. OpenAI gave Brussels access to its cyber model. Anthropic did not.
First real fracture between a major US lab and a Western regulator, Kate. OpenAI formally agreed to give the European Commission, the EU AI Office, and national cyber authorities pre-release access to GPT-5.5-Cyber, the hardened variant we covered Wednesday. Anthropic refused. A month after Mythos shipped to commercial customers, the EU still has no preview access. Brussels says it has had four or five meetings with Anthropic but the talks are, quote, at a different stage from the OpenAI conversations. And the stakes are real — UK AI Security Institute red-teaming reportedly showed Mythos succeeding in three of ten simulated thirty-two-step corporate cyberattacks, with GPT-5.5 at two of ten. The first AI models to clear that bar at all.
What's Anthropic's argument?
That Mythos is a safety-restricted research model and that broad government access risks both leaks and misuse, Kate. Critics in Brussels read it as a corporate sovereignty play. The same week, the US announced its own pre-deployment evaluation deals with Google DeepMind, Microsoft, and xAI. So model access is now diplomatic currency. The Mythos holdout tests how much sovereignty a private company can claim over a national-security-relevant artifact. Watch whether the Gates Foundation goodwill buys Anthropic any cover here, or whether the EU starts using market access as leverage.
xAI story, Marcus. Grok Build CLI shipped yesterday.
The four-horse race in agentic CLIs is now complete, Kate. xAI launched the early beta of Grok Build, an agentic command-line tool built on Grok 4.3 beta. Three differentiators. Plan Mode forces the agent to write and get approval on a step-by-step plan before any code runs. Native parallel subagents using the sixteen-agent Heavy architecture for delegating large refactors. And full Agent Client Protocol support, so other teams can build tooling on top. The underlying model claims a two-million-token context window. Theoretically enough to hold an entire large codebase in working memory.
And the pricing.
Premium-only, Kate. Gated behind SuperGrok Heavy at ninety-nine dollars a month for six months, then three hundred a month after. Compare that to Anthropic's twenty to two hundred dollar tiers and OpenAI bundling Codex into free ChatGPT plans, and xAI has chosen the highest-price-highest-capability corner. The bet is that developers will pay for raw context size and parallel agents. Whether Grok 4.3 actually holds its own against Claude Opus 4.7 and GPT-5.5 on real engineering work is the open question. The TUI looks polished — several developers on X initially mistook the screenshots for Figma mockups. Headless mode and ACP support explicitly target enterprise automation, which means xAI is chasing the same agent market that's powering OpenAI's and Anthropic's growth.
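For the show notes, a minimal sketch of the Plan Mode idea as described: the agent must present a step-by-step plan and get explicit approval before anything executes. This is an illustrative sketch of the workflow, not Grok Build's actual interface, and the plan contents are invented.

```rust
use std::io::{self, Write};

fn main() {
    // The agent's proposed plan, surfaced before any code runs.
    let plan = [
        "1. Enumerate call sites of the deprecated API",
        "2. Rewrite each call site against the new API",
        "3. Run the test suite and report failures",
    ];
    println!("Proposed plan:");
    for step in &plan {
        println!("  {step}");
    }

    // Nothing executes until a human signs off on the whole plan.
    print!("Approve plan and begin execution? [y/N] ");
    io::stdout().flush().expect("flush stdout");
    let mut answer = String::new();
    io::stdin().read_line(&mut answer).expect("read stdin");

    if answer.trim().eq_ignore_ascii_case("y") {
        for step in &plan {
            println!("executing: {step}"); // a real agent would act here
        }
    } else {
        println!("Plan rejected; nothing was executed.");
    }
}
```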
Enterprise story, Marcus. PwC committed to training thirty thousand staff on Claude.
Major expansion of their Anthropic alliance, Kate. PwC will certify thirty thousand professionals on Claude and roll Claude Code and Cowork out across hundreds of thousands of staff worldwide. Three lines of business — agentic software builds for enterprise clients in financial services, pharma, and consumer markets; a Claude-native finance unit aimed at CFO offices; and an AI-native deal-making practice for private-equity diligence and integration. PwC says client delivery times are improving by up to seventy percent.
Why does this matter?
Big consultancies have been the alternative business model to enterprise SaaS for twenty years, Kate. If Claude lets PwC ship custom enterprise software in weeks instead of quarters, that's a structural shift in how Fortune 500s build technology. Direct shot at Accenture and Deloitte, who have competing alliances with OpenAI and Google. Anthropic has now staked out legal through Claude for Legal, finance through PwC, and global health through Gates, all in the past week. The strategy is going deep into knowledge work rather than chasing consumer chat.
Healthcare AI story, Marcus, and this one is rough. Ontario's auditor general just tested twenty medical AI scribes.
Rough results, Kate. Of twenty systems approved by the provincial Ministry of Health, nine fabricated treatment plans that were never discussed in the recording. Twelve inserted incorrect drug information. Seventeen missed key mental-health details. And the procurement process was a separate scandal — vendor presence in Ontario was weighted at thirty percent of the evaluation score. Medical-note accuracy was just four percent. Privacy and bias controls were two percent each. Vendors were allowed to self-test offline and email results to Supply Ontario with no live evaluator oversight. Despite OntarioMD recommending physicians manually review every note, none of the approved systems has a mandatory attestation feature.
So the government weighted "are you local?" more than seven times higher than "is the output medically accurate?"
Seven-and-a-half times, exactly, Kate. This is the first major government audit of clinical AI in production, and it's brutal. It will be cited in every US state and EU country writing AI procurement rules this year. The procurement-weighting headline alone is the kind of detail that drives policy change. And it's the third data point this week, alongside Tuesday's Google AI-zero-day disclosure and Wednesday's OpenAI Daybreak launch, undercutting any remaining argument that governments can afford to move slowly on AI oversight.
And to lighten the mood, Marcus. A Bitcoin holder recovered four hundred thousand dollars from a wallet he'd been locked out of for eleven years.
Best viral story of the week, Kate. A holder going by @cprkrn on X had been locked out of a five-Bitcoin wallet for eleven years after changing the password during a college-era haze. He dumped the entire contents of his old college laptop into Claude — notebooks, backups, everything. Claude located an older wallet file from before the password change, identified a bug in a tool called btcrecover where the shared key and candidate passwords weren't being combined correctly, fixed the bug, and ran the recovery. Tested roughly three-and-a-half trillion candidate passwords. Crucially, no Bitcoin cryptography was broken. The recovery hinged on data the owner already had but couldn't piece together.
Cleanest illustration of what current models are actually good at.
Long-tail forensics work that requires patience, code reading, and cross-referencing scattered artifacts, Kate. Mainstream tech press picked it up. It's going viral on crypto Twitter. Which guarantees a few thousand people are about to dump their old hard drives into Claude looking for similar treasure. Most won't find it. But the underlying capability is real.
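For the show notes, the bug class Marcus describes, in miniature: a recovery tool has to combine a wallet-specific value with each candidate password before hashing, and a version that drops one of the inputs will never find a match no matter how many trillion candidates it tries. This is an illustrative sketch in Rust, not btcrecover's actual code (btcrecover is a Python tool), and the derivation scheme here is invented for the example.

```rust
// [dependencies]
// sha2 = "0.10"
use sha2::{Digest, Sha256};

// Buggy variant: ignores the wallet's shared key entirely, so no candidate
// can ever reproduce the target derivation. (Hypothetical, for illustration.)
fn derive_buggy(_shared_key: &[u8], candidate: &str) -> Vec<u8> {
    Sha256::digest(candidate.as_bytes()).to_vec()
}

// Fixed variant: both inputs feed the derivation, as the wallet expects.
fn derive_fixed(shared_key: &[u8], candidate: &str) -> Vec<u8> {
    let mut hasher = Sha256::new();
    hasher.update(shared_key);
    hasher.update(candidate.as_bytes());
    hasher.finalize().to_vec()
}

fn main() {
    let shared_key = b"wallet-specific-bytes"; // hypothetical wallet value
    let target = derive_fixed(shared_key, "correct horse battery");

    // The buggy derivation can scan every candidate and still never match.
    for candidate in ["correct horse battery", "hunter2", "password123"] {
        let hit_buggy = derive_buggy(shared_key, candidate) == target;
        let hit_fixed = derive_fixed(shared_key, candidate) == target;
        println!("{candidate:>22}: buggy={hit_buggy} fixed={hit_fixed}");
    }
}
```

The point of the sketch is why no cryptography had to be broken: once the inputs are combined correctly, the search is an ordinary brute force over passwords the owner plausibly used, fed by artifacts he already had on disk.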
Big picture, Marcus.
Three threads tie today together, Kate. First — the agent surface is going mobile and going parallel. Codex on the phone, Grok Build with sixteen-agent parallelism, Claude Code on Android since February. The cockpit-engine split is now the dominant shape of AI-assisted work. Second — long-horizon agentic engineering is no longer theoretical. A million lines of Rust in nine days. Four hundred thousand dollars recovered from an eleven-year-old wallet. The capability is real, the speed is real. The thirteen thousand unsafe blocks are the asterisk. Speed shifts the correctness burden to humans, and we don't yet have great tools for that. Third — the regulatory and trust environment is hardening fast. Ontario's medical AI audit, the EU-Mythos standoff, this week's Codex security disclosures, and the Comer probe into Altman. The honeymoon is over. AI is now a national-security artifact, a procurement battleground, and a political object. The pro-Western, libertarian read, Kate, is that competition is doing what regulation could not — driving real differentiation, real pricing discovery, and real philanthropic and educational investment. The Gates Foundation deal and Claude for Legal are how AI earns the trust it needs. The Ontario procurement scandal is how governments lose it. The next twelve months are going to test which of those modes scales faster.
That's your AI in 15 for today. See you tomorrow.