
AI in 15 — May 12, 2026

May 12, 2026 · 20m 19s
Kate

It's here. The era of AI-driven vulnerability discovery and exploitation is already here. That's John Hultquist, chief analyst at Google's Threat Intelligence Group, confirming today that criminal hackers used a large language model to discover a real zero-day, write the working exploit, and prepare a mass-exploitation event. Not a research demo. Not a hypothetical. Live ammunition.

Kate

Welcome to AI in 15 for Tuesday, May twelfth, 2026. I'm Kate, your host.

Marcus

And I'm Marcus, your co-host.

Kate

Loaded Tuesday lineup, Marcus. Mira Murati's Thinking Machines Lab finally broke cover with something the voice-AI world has been chasing for two years. Google confirmed the first in-the-wild zero-day built with an LLM. OpenAI launched a four-billion-dollar professional services arm and bought a London consultancy to staff it. Anthropic's native Claude Platform went generally available on AWS. NVIDIA quietly blessed Rust as a first-class CUDA language. A self-replicating npm worm hit forty-two TanStack packages with valid cryptographic provenance attestations. GitLab and GM both restructured around AI with mass layoffs. A YC startup is pitching specialized small models as the answer to giant generalist LLMs. And Alphabet briefly leapfrogged Nvidia to become the most valuable company on Earth.

Kate

Thinking Machines drops a model that listens while it talks.

Kate

An AI worm produces validly attested malicious packages.

Kate

And Alphabet's AI rally hits a hundred and sixty percent.

Kate

Lead story, Marcus. Mira Murati's stealth-mode Thinking Machines Lab has been quiet for over a year. Today they showed their hand. Walk me through it.

Marcus

Big debut, Kate. Thinking Machines published a research preview of what they're calling interaction models, a new architectural class designed to replace the alternating you-talk-then-I-talk pattern that every voice AI on the market still uses. The headline model — TML-Interaction-Small, reported to be two hundred seventy-six billion parameters with roughly twelve billion active in a mixture-of-experts setup — is a single transformer that natively ingests text, image, and audio, and emits text and audio interleaved in two-hundred-millisecond micro-turns. Genuine full duplex. The model can keep talking while you talk, decide to pause when you sip your coffee, or interrupt itself if it sees you shake your head on camera.

Kate

So how fast does it actually feel.

Marcus

Turn-taking latency lands at four hundred milliseconds, Kate, which matches average human conversation. The demos are pointed. One shows a user starting a sentence — I'm going to tell you a story — then taking a long luxurious sip of coffee, and the model just waits. No barge-in. It's paired with a slower Background Model that does heavy reasoning asynchronously while the interaction model keeps the conversation flowing. Mirrors how humans use small-talk filler while thinking. Access is limited to partners now. Broader release later in 2026.
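Nothing about TML's internals is public beyond what Marcus just described, so here's a toy sketch, in plain Python, of the micro-turn idea: instead of waiting for a voice-activity detector to declare end-of-utterance, the model re-decides every 200-millisecond tick whether to speak, hold, or listen. Function and variable names are invented for illustration.

```python
# Illustrative sketch of a full-duplex micro-turn loop (not TML's actual API).
# The model emits ~200 ms chunks and re-decides each tick whether to keep
# talking or hold, rather than relying on an end-of-utterance VAD boundary.

TICK_MS = 200  # micro-turn size reported for TML-Interaction-Small

def duplex_loop(model_chunks, user_activity):
    """Interleave model speech with live user input, tick by tick.

    model_chunks: queued 200 ms chunks the model wants to emit.
    user_activity: per-tick flag, True while the user is audibly speaking.
    """
    timeline = []
    chunks = iter(model_chunks)
    pending = next(chunks, None)
    for tick, user_speaking in enumerate(user_activity):
        if user_speaking:
            # Full duplex: the model hears the user mid-stream and can
            # choose to hold its next chunk instead of barging in.
            timeline.append((tick, "model_listens"))
            continue
        if pending is not None:
            timeline.append((tick, pending))
            pending = next(chunks, None)
        else:
            timeline.append((tick, "silence"))
    return timeline
```

The contrast with today's pipelines is that the hold-or-speak decision happens every tick inside the model, not once per utterance in an external VAD stage, which is why the coffee-sip demo works without barge-in.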

Kate

Why does this matter beyond the demo.

Marcus

Because it's the first credible challenge to OpenAI's Realtime API and Google's Gemini Live since they shipped, Kate, and it's coming from a company that until yesterday hadn't shown a product. More importantly, the two-hundred-millisecond chunked architecture kills the standard voice-activity-detection pipeline that every voice-agent stack today — Vapi, LiveKit, Retell — is built around. If interaction models work, a huge chunk of the voice infrastructure ecosystem is about to be obsolete. It's also the strongest signal yet that Murati's twelve-billion-dollar startup is going to ship, not just hire.

Kate

Quick hits. Marcus, our cold-open quote. Google Threat Intelligence Group confirmed the first in-the-wild LLM-built zero-day.

Marcus

Sober announcement, Kate. GTIG said with high confidence that a criminal group used a large language model to find a previously unknown vulnerability in a popular open-source web-administration tool — one that lets attackers bypass two-factor authentication — and to write the working Python exploit. Google hasn't named the tool publicly. The tell-tale signs in the code. Textbook-Pythonic structure, educational docstrings, and a hallucinated CVSS score embedded in a comment. The threat actor was preparing a mass-exploitation event before Google notified the affected company and law enforcement and shut it down.

Kate

And we know it wasn't Gemini or Mythos.

Marcus

Google specifically said it does not believe Gemini was used, Kate, and Anthropic's Mythos doesn't fit the fingerprint either. Mythos is explicitly withheld from public release because of its hacking ability. So this came from somewhere else — an open-weight model, or a jailbroken commercial one. GTIG's report also says Chinese and North Korean state groups are now routinely using LLMs across the kill chain. This is the inflection point security researchers have been warning about. Proof, not theory, that frontier models can produce working zero-days from a cold start in the hands of run-of-the-mill criminals. It strengthens the case for export controls and limited-distribution release patterns like Anthropic's Project Glasswing. Expect a fresh wave of regulatory pressure on open-weight releases this week.

Kate

OpenAI story, Marcus. They launched a four-billion-dollar deployment company today.

Marcus

Real strategic shift, Kate. OpenAI announced a majority-owned entity called the OpenAI Deployment Company, capitalized at more than four billion dollars and backed by nineteen partners. TPG leads, with Bain Capital, Advent, and Brookfield as co-leads. BBVA is among the early enterprise customers. The model is consulting-style. Drop OpenAI deployment engineers into client companies to identify high-value workflows, build the agents, then run them. To staff day one, OpenAI is acquiring Tomoro, a London-based AI consulting firm founded in 2023 — the team behind Virgin Atlantic's AI concierge — gaining about a hundred and fifty deployment engineers instantly. Denise Dresser, former Salesforce executive, is running the unit.

Kate

So they're now competing with Accenture and Deloitte.

Marcus

More like co-opting them, Kate. Bain Capital is investing, and the partners deploy alongside. The bottleneck on enterprise AI revenue isn't model quality anymore. It's integration. Somebody has to physically wire agents into SAP, ServiceNow, and Salesforce. OpenAI just decided it would rather own that workflow than license it out. Real margin pressure on the Big-Four consultancies who've made billions on AI implementations. And it's a tell on unit economics. Four billion is a lot of capital to put behind professional services if the API business were growing fast enough on its own.

Kate

Anthropic side, Marcus. Claude Platform on AWS hit general availability.

Marcus

Notable arrangement, Kate. AWS became the first hyperscaler to offer Anthropic's native Claude Platform — APIs, console, beta features, all of it — billed through customers' existing AWS accounts. This is not Bedrock. The key wrinkle is that Anthropic operates the service and data is processed outside the AWS security boundary. So effectively it's Anthropic's first-party offering with AWS as the storefront. Day-one access includes Claude Opus 4.7, Sonnet 4.6, Haiku 4.5, Managed Agents in beta, the advisor tool, web search and fetch, the MCP connector, Agent Skills, and code execution. Pricing matches direct-from-Anthropic.

Kate

So Bedrock and Claude Platform now coexist under AWS.

Marcus

As separate products, Kate. Bedrock — AWS-operated, data stays in the AWS boundary, regional residency available. Claude Platform on AWS — Anthropic-operated, every native feature ships day-one. Amazon is essentially letting Anthropic operate inside its billing house in exchange for keeping the workload and the credit consumption in AWS. It also formalizes Anthropic as a co-equal partner — a hedge against Anthropic eventually drifting toward Google Cloud. Following the SpaceX Colossus lease and the imminent nine-hundred-billion-dollar raise we've covered the past week, Anthropic is now sitting at the center of a remarkable infrastructure web.

Kate

Supply chain story, Marcus. A new npm worm hit forty-two packages overnight.

Marcus

This one matters, Kate. Between nineteen-twenty and nineteen-twenty-six UTC last evening, an attacker pushed eighty-four malicious versions across forty-two @tanstack packages, including TanStack React Router with twelve million weekly downloads. The vector was a combination of GitHub Actions' pull-request-target pattern, cache poisoning across the fork-base trust boundary, and runtime memory extraction of an OIDC token from the running Actions runner. The payload — a two-point-three-megabyte single-line blob — steals from over a hundred credential file paths. Cloud keys, SSH, npm tokens, GitHub tokens, crypto wallets, and importantly AI tool configs from Claude Code, Cursor, Continue. It installs a dead-man's switch as a systemd service that pings GitHub every sixty seconds with the stolen token. The worm has since spread to Mistral's npm client, UiPath, DraftLab, and others. Mistral's package was pulled from the registry.

Kate

And the chilling detail.

Marcus

The malicious tarballs shipped with valid SLSA Build Level Three provenance attestations, Kate. The cryptographic this-was-really-built-from-this-source stamp the security community has been pushing as the answer to supply-chain attacks. First documented worm to produce validly-attested malicious packages. Trusted Publishing and SLSA aren't broken exactly. But they don't protect you from a CI pipeline that's already been compromised by a malicious fork. The AI angle is sharp. The credential stealer specifically targets AI tool config files because those auth tokens are now some of the highest-value secrets on a developer machine. Expect AI vendors to start treating Cursor and Claude Code configs like SSH keys.
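To make the limitation concrete, here's a minimal sketch of what a provenance check actually establishes. The field names follow the SLSA v1 in-toto provenance layout, but the values are invented for illustration, not taken from the TanStack incident.

```python
# Minimal sketch of an SLSA provenance check. Field names follow the
# SLSA v1 / in-toto provenance layout; values here are invented examples.
import json

def check_provenance(statement, expected_builder, expected_repo):
    """Return True if the attestation binds the artifact to the expected
    builder and source repo. Note what this does NOT check: whether the
    CI run itself was poisoned before the attestation was signed."""
    pred = statement["predicate"]
    builder_ok = pred["runDetails"]["builder"]["id"].startswith(expected_builder)
    repo = pred["buildDefinition"]["externalParameters"]["workflow"]["repository"]
    return builder_ok and repo == expected_repo

statement = json.loads("""{
  "predicate": {
    "buildDefinition": {
      "externalParameters": {
        "workflow": {"repository": "https://github.com/TanStack/router"}
      }
    },
    "runDetails": {
      "builder": {"id": "https://github.com/actions/runner/github-hosted"}
    }
  }
}""")

ok = check_provenance(
    statement,
    expected_builder="https://github.com/actions/runner",
    expected_repo="https://github.com/TanStack/router",
)
# A poisoned-but-genuine CI run still passes this check, which is the
# worm's whole point: the attestation truthfully says "built by this
# workflow from this repo" even when the workflow run was compromised.
```

The check passes because the attestation is honest about where the build ran; it has no view into what the run's cache or runner memory had been seeded with.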

Kate

Layoffs paired with AI restructure today, Marcus. GitLab and GM both moved.

Marcus

Same narrative, different industries, Kate. GitLab — whose stock is down roughly fifty percent over twelve months — announced a workforce reduction positioned as a restructuring for the agentic era. The plan cuts its country footprint by about thirty percent, flattens management, and reorganizes R&D into sixty smaller autonomous teams that lean on internal AI agents. The CEO also retired GitLab's longstanding CREDIT values framework — Collaboration, Results-for-Customers, Efficiency, Diversity-Inclusion-Belonging, Iteration, Transparency. Replaced with three: Speed with Quality, Ownership Mindset, Customer Outcomes. Observers immediately noted the dropped D, I, B, and T.

Kate

And GM.

Marcus

GM cut roughly five to six hundred salaried IT workers across Austin and Warren — about ten percent of the IT org, Kate. Company memos describe it as a skills swap, not a cost cut. Open requisitions explicitly target AI-native development, data engineering, prompt engineering, agent and model development. Behrad Toghi, hired from Apple in October as GM's AI lead, is leading the rebuild. Laying off to rehire for AI is becoming the standard 2026 corporate playbook: IBM, Salesforce, Klarna, now GM and GitLab.

Kate

The skeptical read.

Marcus

Wage compression, Kate. Repricing the same engineering work under a more fashionable job title. The optimistic read is that traditional enterprise IT — keep-the-lights-on application maintenance — really is being automated by Copilot-class agents, freeing budget for the model and agent layer. Either way, every IT director is now writing a ten-percent-AI-reskill slide for their next board meeting. And on the cultural side, GitLab joining the quiet retirement of DEI language in tech-company values statements is a marker of where the political center has moved in this industry.

Kate

Developer story, Marcus. NVIDIA quietly released CUDA-Oxide.

Marcus

Significant move, Kate. NVIDIA Labs released the first public version of CUDA-Oxide, an experimental compiler that lets developers write SIMT GPU kernels in idiomatic Rust and compile straight to PTX. No DSL, no nvcc round-trip, no CMake. The middle stages use Pliron, an MLIR-like IR framework written entirely in Rust. The whole compiler builds with cargo, with a custom rustc backend called rusc. Version zero-point-one is explicitly alpha. But it's the first time NVIDIA has officially blessed Rust as a first-class CUDA host and device language. Rust has been chipping away at C and C++ in systems software for years. The GPU stack was one of the last holdouts. NVIDIA officially supporting Rust pulls a huge category of Rust ML and HPC developers — and projects like Burn and Candle — onto NVIDIA hardware natively, instead of through awkward FFI layers. Fits NVIDIA's broader pattern of swallowing every credible alternative to its toolchain so the moat stays wide.

Kate

Counter-architecture story, Marcus. A YC startup called Interfaze launched today.

Marcus

Interesting bet, Kate. Interfaze unveiled an architecture aimed at developer tasks where determinism matters — OCR on complex PDFs, scraping, classification, multilingual ASR, structured output. Rather than asking one giant generalist LLM to do everything, they wire together specialized DNNs and CNNs as perception modules, then feed a small language model on top for reasoning. The company claims it outperforms Gemini-3-Flash, Claude Sonnet 4.6, GPT-5.4-Mini, and Grok-4.3 on nine head-to-head benchmarks, sometimes by orders of magnitude in cost. The research paper has been accepted at IEEE CAI 2026.

Kate

Counter-narrative to one-model-to-rule-them-all.

Marcus

Exactly, Kate. For high-volume deterministic workloads — invoice OCR, document extraction, voice-to-text in call centers — generalist LLMs are expensive overkill. Small specialized models with smart routing can crush them on cost and reliability. If Interfaze's benchmarks hold up under independent review, it's a meaningful proof point for the compound AI systems thesis. And it pairs neatly with the DELEGATE-52 finding we covered Sunday — that frontier models silently corrupt twenty-five percent of long documents. The architectural answer might not be more autonomy. It might be thinner LLM layers over deterministic specialized models.
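Interfaze hasn't published its architecture in detail, so this is only a hedged sketch of the compound-systems idea Marcus is describing: route deterministic perception work to cheap specialized models and reserve the small language model for the reasoning step on top. All handler names are stand-ins, stubbed with placeholder strings.

```python
# Hedged sketch of a compound AI pipeline: specialized perception models
# first, a small language model for reasoning second. All handlers are
# hypothetical stubs, not Interfaze's actual components.

def classify_task(task):
    """Trivial router; a production system would use a learned classifier."""
    if task["type"] == "ocr":
        return run_ocr_cnn
    if task["type"] == "asr":
        return run_asr_dnn
    return run_small_lm  # fall back to the reasoning layer directly

def run_ocr_cnn(task):
    # Stand-in for a deterministic OCR CNN over complex PDFs.
    return {"handler": "ocr_cnn", "text": f"extracted:{task['payload']}"}

def run_asr_dnn(task):
    # Stand-in for a multilingual speech-to-text DNN.
    return {"handler": "asr_dnn", "text": f"transcribed:{task['payload']}"}

def run_small_lm(task):
    # Stand-in for the small language model doing reasoning / structuring.
    return {"handler": "small_lm", "text": f"reasoned:{task['payload']}"}

def pipeline(task):
    # Perception first, then reasoning over the structured output — the
    # "thin LLM layer over deterministic specialists" shape.
    perceived = classify_task(task)(task)
    return run_small_lm({"type": "reason", "payload": perceived["text"]})
```

The cost argument falls out of the shape: the expensive generalist never touches raw pixels or audio, only the short structured text the specialists emit.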

Kate

Big picture, Marcus. Alphabet just briefly became the most valuable company on Earth.

Marcus

Remarkable turn, Kate. A year after being written off as the AI laggard, Alphabet briefly leapfrogged Nvidia in after-hours trading Friday before settling at a four-point-eight-trillion-dollar market cap. Nvidia at five-point-two. Q1 net income was sixty-two-point-five-eight billion, up eighty-one percent year-over-year. Google Cloud revenue jumped sixty-three percent to twenty billion. The cloud backlog nearly doubled sequentially to over four hundred sixty billion. Stock up roughly a hundred and sixty percent in twelve months. The thesis investors have converged on — Alphabet owns every layer of the AI stack. TPUs, the data centers, Gemini, the cloud, YouTube data for training, Search distribution for inference monetization, and Waymo for the embodied frontier. None of the other megacaps owns the full set.

Kate

And the cultural counterpoint today.

Marcus

Graduates at UCF's College of Arts and Humanities booed a commencement speaker yesterday for framing AI as the next industrial revolution and an opportunity to embrace, Kate. The footage spread quickly. One Hacker News commenter captured it. If you want people to like AI, show them a future that doesn't leave them in abject poverty.

Kate

And that's the tension, isn't it.

Marcus

It is, Kate. Today's stories trace one arc. Trillions in market cap concentrating in the companies that own the AI stack. OpenAI deploying engineers inside Fortune-500s to make sure that capex actually converts to revenue. Anthropic embedded in AWS billing. NVIDIA owning Rust now, and the silicon, and the toolchain. GitLab and GM restructuring around AI agents and quietly shedding both headcount and the DEI vocabulary that defined the last decade. Meanwhile a graduating class boos a man on a stage for telling them they're entering an exciting new revolution. The AI economy is producing real revenue, real productivity, and real market cap — that's no longer a narrative, it's posted in earnings releases. But the people graduating into it aren't buying the optimism. The pro-Western, libertarian read, Kate, is that the answer is faster deflation passed to consumers, expanded supply of new work, and credible safety stories like the Anthropic alignment paper we covered Saturday. The wrong answer is to let the political backlash freeze deployment in democracies while authoritarian-aligned developers set the global defaults. That tension is going to reshape regulation faster than the model labs expect.

Kate

That's your AI in 15 for today. See you tomorrow.