AI in 15 — March 19, 2026
Eight hundred and forty billion dollars. That's OpenAI's valuation as it barrels toward an IPO. But one of tech's sharpest commentators says the company is juicing growth the same way Facebook did. And not in a good way.
Welcome to AI in 15 for Thursday, March 19, 2026. I'm Kate, your host.
And I'm Marcus, your co-host.
Marcus, packed show today. OpenAI's IPO strategy is drawing fire from all sides. A major Snowflake AI security flaw shows prompt injection is a real-world threat, not just a theory. Google DeepMind wants to define how we measure AGI, and they're crowdsourcing the answer. Apple's Gemini-powered Siri overhaul is finally arriving this month. And a viral essay is comparing AI coding to gambling. Let's get into it.
OpenAI pivots hard to IPO mode, but critics say ChatGPT is becoming Facebook in a trench coat.
Snowflake's AI coding tool gets prompt-injected into executing malicious code on users' machines.
And Google DeepMind launches a two hundred thousand dollar hackathon to figure out how to actually measure AGI.
So Marcus, Om Malik published a widely shared essay this week essentially accusing OpenAI of turning ChatGPT into a dopamine machine. His argument is that under Fidji Simo, who ran the Facebook app before joining OpenAI, the company is optimizing for engagement over substance. What's your read?
Malik's critique is pointed and worth taking seriously. He calls OpenAI an "eight hundred and forty billion dollar company running several unrelated experiments." The Atlas browser, hardware ventures, a TikTok-style content feature. And at the center of it, ChatGPT has become, in his words, a sycophant that generates options and follow-ups specifically designed to keep you engaged. That's the Facebook playbook. Create a feedback loop that feels productive but is actually just sticky.
And this is all happening as OpenAI pushes toward an IPO, reportedly targeting Q4 this year.
Which is the context that makes it make sense. IPO investors want growth metrics. Monthly active users, engagement time, conversion from free to paid. The fastest way to boost those numbers is exactly what Malik describes. Make the product addictive rather than useful. Nine hundred million weekly active users is a staggering number. But only ten billion of OpenAI's twenty-five billion in annual recurring revenue comes from enterprise. The rest is consumer. And consumer revenue driven by engagement optimization is a very different story than enterprise revenue driven by productivity gains.
The Hacker News discussion was brutal. People saying ChatGPT has "LinkedIn lunatic energy" now.
And comparing it unfavorably to Claude, which is ironic given the market dynamics. Because here's the contrast that investors will be studying. Anthropic just hit nineteen billion in annualized revenue, up from nine billion at year-end 2025, extending a run of roughly ten-x year-over-year growth now in its third year. And Claude Code alone is generating two and a half billion in annualized revenue. That's focused, developer-driven growth versus ChatGPT's sprawling consumer super-app approach. Three AI companies are racing toward IPOs in a limited window: OpenAI, Anthropic, and xAI. How the market distinguishes between engagement-driven growth and productivity-driven growth will determine which of them commands the highest multiples.
So the question is whether Wall Street rewards the Facebook playbook or the enterprise playbook.
Exactly. And history suggests Wall Street loves engagement metrics right up until it doesn't. Facebook learned that lesson. The question is whether OpenAI learns it before going public or after.
From business strategy to security. This next story is genuinely alarming. Security researchers at Prompt Armor disclosed that Snowflake's Cortex Code AI tool could be manipulated through prompt injection to escape its sandbox and execute arbitrary code on a user's machine. Marcus, walk us through what happened.
This was discovered just three days after Cortex Code launched in February. Through prompt injection, an attacker could manipulate the AI into executing shell commands outside its intended sandbox. Once you have code execution, the payload could access cached authentication tokens, run SQL queries with the victim's privileges, and potentially exfiltrate or destroy data. Snowflake patched it on February twenty-eighth, but during testing the attack succeeded roughly half the time. That's not a theoretical concern. That's a coin flip away from full system compromise.
Fifty percent success rate. So every other attempt, the attacker gets in.
And that highlights the fundamental challenge with AI security. These systems are non-deterministic. You can't write a traditional firewall rule against prompt injection because the attack surface changes with every inference call. The Hacker News discussion was scathing. Several commenters questioned whether Snowflake had implemented a real sandbox at all. One said, "If the user has access to a lever that enables access, that lever is not providing a sandbox. Poor security design all around."
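For listeners checking the show notes, here's a minimal sketch of that design point. This isn't Snowflake's code, and every name in it is invented, but it shows the difference between filtering what the model says and enforcing a deny-by-default allowlist at the execution boundary, where no amount of prompt injection can reach.

```python
# A minimal sketch for the show notes -- not Snowflake's actual design.
# The allowlist contents and function names are invented. The point:
# enforcement lives at the execution boundary, outside anything the
# model can talk its way around.

import shlex
import subprocess

# Deny by default: only these binaries may ever run, no matter what
# the model was prompt-injected into requesting.
ALLOWED_BINARIES = {"ls", "cat", "grep"}

def run_model_command(command: str) -> str:
    """Run a shell command requested by the AI assistant, but only if
    its binary is explicitly allowlisted."""
    parts = shlex.split(command)
    if not parts:
        raise PermissionError("blocked: empty command")
    if parts[0] not in ALLOWED_BINARIES:
        raise PermissionError(f"blocked: {parts[0]} is not allowlisted")
    # No shell=True, so injected metacharacters (;, &&, |) stay inert.
    result = subprocess.run(parts, capture_output=True, text=True, timeout=10)
    return result.stdout
```

If the model can reach a lever that bypasses that check, the check isn't a sandbox. That's exactly the commenter's complaint.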
We covered the Glassworm supply chain attack on Monday, invisible Unicode malware hiding in GitHub repos. Now we have prompt injection escaping sandboxes. The AI security picture is getting worse, not better.
It's the attack surface expanding faster than the defenses. As AI coding tools proliferate, Cortex Code, Claude Code, GitHub Copilot, Cursor, every one of them represents a potential entry point. And Snowflake's response raises its own questions. Their security advisory requires an account just to read it. That's not the transparency the industry needs when we're dealing with novel attack vectors that affect every AI tool builder.
Let's shift to something more constructive. Google DeepMind published a new paper proposing a framework for measuring progress toward AGI. And they're putting money behind it. A two hundred thousand dollar Kaggle hackathon.
The framework identifies ten cognitive abilities they consider essential for general intelligence. Perception, generation, attention, learning, memory, reasoning, metacognition, executive functions, problem solving, and social cognition. The idea is to run AI models and humans through identical benchmarks and generate a cognitive profile mapping strengths and weaknesses empirically.
So instead of every lab claiming they're approaching AGI based on their own benchmarks, DeepMind wants a universal yardstick.
That's the pitch. And the hackathon focuses on the five areas where evaluation gaps are largest. Learning, metacognition, attention, executive functions, and social cognition. Top submissions win ten thousand dollars each, with four grand prizes of twenty-five thousand. Submissions close April sixteenth, results June first.
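For the show notes, here's a rough sketch of what a cognitive profile comparison could look like under that framework. The ten ability names come from the paper; the scoring scheme, the normalization, and the numbers are invented for illustration.

```python
# Hypothetical sketch of a "cognitive profile" comparison. The ten
# ability names come from DeepMind's paper; the scores and the
# normalization scheme are invented for illustration.

ABILITIES = [
    "perception", "generation", "attention", "learning", "memory",
    "reasoning", "metacognition", "executive_functions",
    "problem_solving", "social_cognition",
]

def cognitive_profile(model_scores: dict, human_scores: dict) -> dict:
    """Express each ability as a ratio to the human baseline on the
    same benchmark, so strengths and gaps are directly comparable."""
    return {a: round(model_scores[a] / human_scores[a], 2) for a in ABILITIES}

# Toy numbers: a model that reasons above the human baseline but lags
# badly on social cognition.
model = {a: 80.0 for a in ABILITIES}
model.update(reasoning=95.0, social_cognition=40.0)
human = {a: 75.0 for a in ABILITIES}
print(cognitive_profile(model, human))
# ... 'reasoning': 1.27 ... 'social_cognition': 0.53 ...
```

The value of a shared yardstick like this is that "approaching AGI" stops being a press-release claim and becomes a profile you can inspect, gap by gap.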
The inclusion of social cognition is interesting. That's not what most people think of when they hear AGI.
It's a broader definition than the industry typically uses. Most companies define AGI as "beats humans at reasoning and coding." DeepMind is saying that's not enough. True general intelligence includes understanding social dynamics, emotional context, and theory of mind. Whether you agree with that definition or not, whoever sets the benchmarks shapes the race. And if this framework gains adoption, it could fundamentally change how we measure and compare AI systems.
Some Hacker News commenters found it ironic that Google's approach to evaluating AGI is to crowdsource the work to a Kaggle competition.
Fair point. But there's a pragmatic logic to it. These evaluation gaps exist precisely because the research community hasn't figured out how to test them. Paying two hundred thousand dollars for the crowd's best ideas is arguably more honest than claiming you've solved the measurement problem in-house.
Apple news. The Gemini-powered Siri overhaul is targeted for release this month with iOS 26.4. Marcus, this has been a long time coming.
Years. And the big strategic story here isn't the features, it's the partnership. Apple is paying roughly one billion dollars annually for access to Google's Gemini model, a one-point-two trillion parameter system running on Apple's Private Cloud Compute servers. That's Apple acknowledging it can't build a frontier model in-house and choosing to license one from a competitor rather than ship an inferior product.
The new Siri will have on-screen context awareness. Reading what's on your display and acting on it.
Making restaurant reservations from Safari, adding flights from email confirmations, that kind of thing. It's what Siri should have been years ago. But reports from 9to5Mac say some features are already slipping to iOS 26.5 in May and iOS 27 in September. Which continues Apple's pattern of announcing AI features and then delivering them on a delayed timeline.
So Apple chose Google over building its own model. That validates Google's AI capabilities even as Apple admits its own gap.
It's a fascinating strategic triangle. Apple gets a capable AI assistant without the multi-billion dollar R&D investment. Google gets a billion dollars a year and validation from the world's most valuable company. And users, hopefully, finally get a Siri that doesn't embarrass itself when you ask it something more complex than setting a timer.
Quick hit. Google engineers open-sourced Sashiko, an AI system that reviews Linux kernel patches. The headline number: it caught fifty-three percent of bugs when tested against a thousand recent upstream issues. And every single one of those bugs had already passed human code review.
That's a meaningful result for a project that underpins virtually all servers, cloud infrastructure, and Android devices. Sashiko monitors public mailing lists, ingests patches, and generates detailed reviews covering architecture, security, concurrency, and resource management. It's designed for Gemini Pro but works with Claude and other models. Google is moving it to the Linux Foundation.
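The pipeline reduces to a simple loop, sketched here for the show notes. The function names and the prompt are hypothetical, not Sashiko's actual interface, and the real system layers much more on top.

```python
# Hypothetical sketch of the monitor-ingest-review loop described above.
# Function names and the prompt are invented, not Sashiko's actual
# interface; the real pipeline is considerably more involved.

from typing import Callable

REVIEW_ASPECTS = ("architecture", "security", "concurrency",
                  "resource management")

def review_patch(patch_text: str, complete: Callable[[str], str]) -> str:
    """Ask a language model for a structured review of one kernel patch.
    `complete` is any prompt-in, text-out callable, which is why the
    same loop can target Gemini Pro, Claude, or another model."""
    prompt = (
        "Review the following Linux kernel patch. For each aspect "
        f"({', '.join(REVIEW_ASPECTS)}), list concrete issues with "
        "file and line references, or state that you found none.\n\n"
        + patch_text
    )
    return complete(prompt)
```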
The false positive rate is the obvious question.
It is. As one commenter put it, you could build a system that flags everything as a bug and claim a hundred percent detection rate. But even with a moderate false positive rate, catching half the bugs that slip past experienced kernel developers is a genuine contribution to infrastructure security.
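Here's that point as arithmetic, for the show notes. The bug counts below are illustrative, not from Google's evaluation.

```python
# The commenter's point, as arithmetic. All counts below are
# illustrative, not from Google's evaluation.

def precision_recall(true_pos: int, false_pos: int, false_neg: int):
    precision = true_pos / (true_pos + false_pos)  # flagged items that are real bugs
    recall = true_pos / (true_pos + false_neg)     # real bugs that get flagged
    return precision, recall

# A "flag everything" reviewer on 1,000 patches with 100 real bugs:
# all 100 caught, but 900 clean patches flagged too.
print(precision_recall(100, 900, 0))   # ~(0.10, 1.00): perfect recall, useless precision

# A reviewer that catches 53 of the 100 bugs with, say, 50 false flags:
print(precision_recall(53, 50, 47))    # ~(0.51, 0.53): far more usable
```

Recall alone is a headline number. Precision is what decides whether kernel maintainers actually keep the tool in the loop.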
And finally, a viral essay titled "AI Coding Is Gambling" hit the top of Hacker News with over three hundred points. The argument is that prompting LLMs to write code mirrors gambling psychology. Variable rewards, addictive feedback loops, a false sense of control.
The dopamine hit of getting working code on the first try keeps you pulling the lever rather than deeply understanding what you're building. It connects directly to the Carnegie Mellon study we covered Tuesday, which found that AI coding tools deliver a temporary speed boost but a lasting forty-two percent degradation in code quality. And one commenter adapted Kenny Rogers for the AI age: "You got to know when to ship it, know when to re-prompt, know when to clear the context, and know when to RLHF."
That might be the best comment of the week.
What's worth noting is the scale of the industry being questioned here. Claude Code at two and a half billion ARR, Cursor growing rapidly, Codex expanding. These tools are generating enormous revenue. The question of whether they create gambling-like behavioral patterns isn't just philosophical. It affects code quality, developer wellbeing, and how we train the next generation of engineers.
Thursday big picture. OpenAI is chasing engagement metrics toward an IPO. Prompt injection is escaping sandboxes. DeepMind wants to standardize how we measure intelligence. And developers are asking whether AI coding tools are genuinely productive or psychologically addictive. Marcus, what ties it all together?
Measurement and honesty. OpenAI measuring success by engagement rather than productivity. Snowflake measuring security by sandbox labels rather than actual isolation. The industry measuring AI progress by cherry-picked benchmarks rather than standardized cognitive profiles. DeepMind's framework is an attempt to inject rigor into a conversation drowning in hype. And the gambling essay asks whether developers are measuring their own productivity honestly or chasing the dopamine of quick results. The technology keeps getting more powerful. The question is whether we're getting more honest about what it actually does.
Measure twice, prompt once.
I'd settle for measure once. Right now we're not even doing that.
That's your AI in 15 for Thursday, March 19, 2026. See you tomorrow.