AI in 15 — March 25, 2026
OpenAI just killed Sora and a billion-dollar Disney deal died with it. Six months from launch to shutdown. That might be a record, even for Silicon Valley.
Welcome to AI in 15 for Wednesday, March 25, 2026. I'm Kate, your host.
And I'm Marcus, your co-host.
Marcus, Wednesday is packed. OpenAI is shutting down Sora and Disney is walking away from a billion-dollar investment. The LiteLLM supply chain attack we first covered Sunday just got dramatically worse. Arm is making its own chips now with a 136-core monster. Google's new embedding model lets you search raw video in under a second. There's an open-source tool that lets you run models way too big for your Mac's memory. And developers are asking a very pointed question: where are all the AI apps? Let's get into it.
OpenAI kills Sora. Disney's billion-dollar deal collapses.
The LiteLLM attack escalates with Kubernetes credential theft.
And Arm builds its first chip ever, a 136-core data center CPU.
Marcus, let's start with Sora because this is a genuinely shocking reversal. Six months ago, this was one of the most hyped AI products on the planet. Now it's dead. What happened?
OpenAI says they're "simplifying their portfolio," which is corporate speak for admitting they spread themselves too thin. Sora launched as a standalone app after that incredible preview back in February 2024 that had everyone convinced AI video was about to transform Hollywood. But the reality never matched the demo. The product struggled with consistency, had safety challenges, and faced intense competition from Google's Veo and Runway, which kept iterating while Sora tried to be a consumer product.
And the Disney deal. A billion dollars just evaporating. That's not a small thing.
It's a massive blow to OpenAI's narrative. Disney had agreed to license Mickey Mouse, Cinderella, and other iconic characters for use on Sora, plus take a billion-dollar equity stake in OpenAI. But no money ever changed hands. The deal was never finalized. Disney's statement was diplomatically brutal. They "respect OpenAI's decision to exit the video generation business." That's Disney politely saying, you pulled the rug out from under us.
This is interesting timing given that just Monday we were talking about OpenAI's twenty-five billion in annualized revenue and a potential trillion-dollar IPO. Does this change that picture?
It should raise questions. Sam Altman was announcing massive partnerships left and right six months ago. Now one of the biggest is dead. The IPO narrative depends on OpenAI being the platform that everyone builds on. When a partner like Disney walks away, even if it's because the product was discontinued rather than because the relationship failed, it signals that OpenAI's product strategy isn't as stable as investors might want. You can't pitch yourself as a reliable enterprise partner and then shut down products six months after launch.
Hacker News noticed something odd too. OpenAI published a Sora safety primer literally the day before the shutdown.
Which suggests this decision may have been abrupt, even internally. You don't publish safety documentation for a product you're about to kill unless the left hand doesn't know what the right hand is doing. The AI video generation market isn't dead. Google, Runway, and others are pushing forward. But OpenAI just ceded the entire space. That's a strategic retreat that's going to be hard to explain to investors.
From one OpenAI story to an attack that could have leaked OpenAI's API keys along with everyone else's. The LiteLLM supply chain attack. We first covered the Trivy compromise on Sunday, but Marcus, this has escalated significantly.
Dramatically. LiteLLM is one of the most popular Python libraries for working with multiple LLM APIs. Three point four million downloads per day. The same TeamPCP attackers who compromised Trivy used that access to inject credential-stealing malware into LiteLLM versions 1.82.7 and 1.82.8, published to PyPI on Monday.
Walk me through what the malware actually does because this is sophisticated.
Two stages. Version 1.82.7 embedded a base64-encoded payload inside the proxy server module, which executes whenever anything imports litellm.proxy. Version 1.82.8 escalated by dropping a .pth file into site-packages, a mechanism that runs code on every Python interpreter startup. No import required. Just starting Python on a compromised machine triggers the malware. It dumps environment variables, queries cloud metadata endpoints, encrypts everything with a 4096-bit RSA key, and exfiltrates to a domain that looks like legitimate LiteLLM infrastructure but isn't.
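For listeners following along in the show notes, the .pth trick is standard (if obscure) Python behavior, not anything exotic. This is an illustrative, harmless sketch, not the actual malware: any line in a .pth file that starts with "import" is executed by Python's site module. We trigger it in a temp directory via site.addsitedir() instead of touching real site-packages, where it would fire on every startup.

```python
import os
import site
import tempfile

# Harmless demo of the mechanism: a line starting with "import" in a
# .pth file is exec()'d by the site module when the directory is
# processed. In real site-packages, that happens on every interpreter
# startup, with no package import required -- which is what the
# malicious 1.82.8 release reportedly abused.
d = tempfile.mkdtemp()
with open(os.path.join(d, "demo.pth"), "w") as f:
    f.write("import os; os.environ['PTH_FIRED'] = 'yes'\n")

site.addsitedir(d)  # processes demo.pth and executes its import line
print(os.environ["PTH_FIRED"])  # -> yes
```

The payoff for an attacker is obvious: the victim never has to touch the compromised package at all.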
And the Kubernetes part. That's what makes this truly scary.
If it detects it's running in Kubernetes, it reads all cluster secrets across every namespace and attempts to create privileged pods on every node. Think about what that means for a production AI deployment. Your LLM proxy, which by definition has API keys for OpenAI, Anthropic, and every other provider you use, just handed all of those credentials to an attacker. Plus every other secret in your Kubernetes cluster.
As we reported Sunday, the attack vector was Trivy, the security scanner. So the fix isn't just updating LiteLLM.
No. LiteLLM has deleted all their PyPI publishing tokens and is moving to trusted publishing via JWT tokens. But any organization that ran the compromised versions needs to rotate every credential in that environment. Every API key, every cloud token, every Kubernetes secret. And they need to audit whether the attacker used the initial access to establish persistence. This is going to be an expensive cleanup for a lot of companies.
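A first triage step, sketched for the show notes: check whether an environment has one of the compromised releases installed, without importing litellm itself (importing a bad build is exactly what triggers the stage-one payload). The version numbers come from this episode; everything else here is our own illustrative scaffolding.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions named in the report; membership check only, no import of the
# package itself.
COMPROMISED = {"1.82.7", "1.82.8"}

def litellm_status(pkg: str = "litellm") -> str:
    """Report whether the installed build of `pkg` is a known-bad release."""
    try:
        installed = version(pkg)
    except PackageNotFoundError:
        return "not installed"
    if installed in COMPROMISED:
        return f"{installed}: COMPROMISED - rotate every credential"
    return f"{installed}: not in the known-bad list"

print(litellm_status())
```

Finding a bad version is only the start, of course: per the guidance above, the real work is rotating every credential the environment ever held.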
Let's shift to hardware. Arm, the company that licenses chip designs to basically everyone, just announced it's going to start making and selling its own silicon. A 136-core data center CPU. Marcus, this is historic for a thirty-five-year-old company that has never done this before.
It's a fundamental transformation of Arm's business model. They've always been the Switzerland of semiconductors. License the architecture, let everyone else build the chips. Now they're competing directly with their own licensees. The chip itself is impressive. 136 Neoverse V3 cores on TSMC 3nm, three point seven gigahertz, 300-watt TDP, twelve channels of DDR5 memory delivering over 825 gigabytes per second of bandwidth, and 96 lanes of PCIe 6.0.
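That bandwidth figure checks out as a back-of-envelope calculation if you assume DDR5-8600; the transfer rate is our assumption, since only the channel count and the total come from the announcement.

```python
# Sanity check on the quoted memory bandwidth. Each DDR5 channel is
# 8 bytes (64 bits) wide; DDR5-8600 (8600 MT/s) is assumed, not stated.
channels = 12
bytes_per_transfer = 8
megatransfers_per_s = 8600  # assumed

bandwidth_gb_s = channels * bytes_per_transfer * megatransfers_per_s / 1000
print(bandwidth_gb_s)  # -> 825.6
```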
And they're calling it the AGI CPU. Which does not stand for what everyone thinks it stands for.
It stands for "Agentic AI Infrastructure." Which is clearly, deliberately designed to make people think of Artificial General Intelligence. Hacker News commenters called it "bordering on securities fraud" and honestly, I'm sympathetic to that criticism. It's cynical marketing that exploits investor confusion. But the underlying product is genuinely significant. In liquid-cooled rack configurations, you get over forty-five thousand cores per rack. That's more than double the core density of Nvidia's competing Vera racks.
Meta is the lead customer?
Meta, plus OpenAI, Cerebras, Cloudflare, SAP, and others. That's a who's-who of AI infrastructure. This directly threatens Nvidia's Grace CPU, AMD's EPYC, and Intel's Xeon in the data center. And it gives Arm recurring silicon revenue instead of just licensing fees, which could dramatically change their financial profile.
Google quietly released something that I think is going to be huge. Gemini Embedding 2 is the first model that natively embeds text, images, video, and audio into a single vector space. A developer built sub-second video search with it over the weekend. Marcus?
This is a fundamental capability unlock. Previously, if you wanted to search video content semantically, you had to extract frames, caption them with one model, embed the captions with another model, and search the text embeddings. Lossy, slow, and expensive. Gemini Embedding 2 takes raw video pixels and projects them directly into a 768-dimensional vector space. No middleman.
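For the show notes, here's why a single vector space matters: once text and video land in the same space, search collapses to nearest-neighbor cosine similarity. This toy sketch uses random 768-dimensional stand-ins where a real system would call the embedding model; nothing here is Google's API.

```python
import math
import random

random.seed(0)
DIM = 768  # matches the dimensionality mentioned for the model

def rand_vec():
    # Stand-in for an embedding; a real pipeline would embed clips and text.
    return [random.gauss(0, 1) for _ in range(DIM)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

clip_vecs = [rand_vec() for _ in range(5)]                  # five "video clips"
query = [x + random.gauss(0, 0.01) for x in clip_vecs[3]]   # a query "about" clip 3

# Search is just: which clip vector is closest to the query vector?
best = max(range(len(clip_vecs)), key=lambda i: cosine(query, clip_vecs[i]))
print(best)  # -> 3
```

Swap the random vectors for real embeddings and a vector database, and you have the skeleton of the sub-second video search described above.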
And someone built a working demo that got almost three hundred points on Hacker News.
SentrySearch. A CLI tool that indexes hours of footage into a vector database, then lets you search with natural language and auto-trims matching clips. Sub-second retrieval. Early adopters are reporting seventy percent latency reduction and twenty percent recall improvement over the old multi-model approach. The dual-use implications are significant though. This is incredibly powerful for media production and content management. It's also incredibly powerful for surveillance. Every security camera and dashcam just became searchable with natural language.
Quick hit on a clever open-source project. Hypura lets you run AI models that are too big for your Mac's memory by streaming weights from your SSD. Marcus, after Flash-MoE on Monday and the iPhone demo yesterday, local inference keeps getting more creative.
Hypura profiles your hardware, figures out the bandwidth of your GPU, RAM, and NVMe drive, then optimally places each tensor on the right tier. GPU gets attention layers and embeddings. RAM gets overflow. NVMe streams the rest with prefetching. A 31 gigabyte Mixtral runs on a 32 gig Mac Mini at two point two tokens per second. A 40 gigabyte Llama 70B runs at zero point three tokens per second. Not fast, but where vanilla llama.cpp would just crash, you actually get output.
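A sketch of that placement logic for the show notes, with illustrative capacities rather than Hypura's actual code: order tiers by bandwidth, then greedily assign tensors, hottest first, to the fastest tier with free capacity.

```python
# Hypothetical tiered-placement sketch. Capacities and tensor sizes are
# made up for illustration; tiers are listed fastest-first by bandwidth.
TIERS = [("gpu", 8.0), ("ram", 24.0), ("nvme", 1024.0)]  # (name, capacity GB)

def place(tensors):
    """tensors: list of (name, size_gb), pre-sorted hottest-first."""
    free = dict(TIERS)
    placement = {}
    for tensor_name, size in tensors:
        for tier_name, _capacity in TIERS:
            if free[tier_name] >= size:       # first (fastest) tier that fits
                free[tier_name] -= size
                placement[tensor_name] = tier_name
                break
    return placement

# Attention and embeddings land on the GPU; bulk expert weights overflow
# to RAM and finally stream from NVMe.
demo = place([("attention", 6.0), ("embeddings", 1.5),
              ("experts_a", 22.0), ("experts_b", 30.0)])
print(demo)
```

The real tool adds prefetching so NVMe reads overlap with compute, which is why you get slow output instead of a crash.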
The trajectory from Monday to today is remarkable. Flash-MoE, then iPhone inference, now NVMe streaming.
The local inference community is solving this problem from every angle simultaneously. It's slow, it's hacky, but the direction is clear. The gap between what you need a data center for and what you can run at home shrinks every week.
Last one, and I think this is an important temperature check. Two posts went viral on Hacker News this week. One asking "where are all the AI apps?" and another titled "Is anybody else bored of talking about AI?" Six hundred and fourteen points on that second one. Marcus, is the shine wearing off?
It's more nuanced than AI fatigue. The answer.ai analysis looked at the top fifteen thousand PyPI packages and found relatively few AI-native applications. But commenters pointed out that iOS app submissions jumped twenty-four percent in 2025, the first meaningful increase in years. So apps are being built, just maybe not as Python libraries. The more interesting signal is the "bored of talking about AI" post. The top comment nailed it. It's incredibly easy to get an idea to prototype stage now, but making it production-ready still needs boring old software engineering. Not a single person they know who followed the vibe-coding-my-own-business trend actually shipped.
So the gap between demo and product is the real story.
Exactly. AI is spectacular at generating prototypes and demos. The hard, unglamorous work of making software reliable, scalable, and maintainable hasn't been automated. And that's where actual value lives. The skepticism is healthy. Show me the shipped product, not the impressive demo.
Wednesday big picture. OpenAI retreats from video and loses Disney. A supply chain attack compromises the AI ecosystem's plumbing. Arm enters silicon manufacturing. Video becomes searchable in milliseconds. And developers ask where the actual products are. Marcus, what ties this together?
The difference between announcements and outcomes. Sora was announced to rapturous applause and died in six months. The LiteLLM attack shows that the infrastructure we've been announcing as enterprise-ready has fundamental security gaps. Arm announces a chip with deliberately misleading AGI branding. And the developer community is finally asking the uncomfortable question: where are the products that match all these announcements? The companies and projects that survive this phase will be the ones where the product matches the press release. Flash-MoE works. SentrySearch works. Hypura works. Small tools, real results, no hype. That's where the actual future is being built.
Less sizzle, more steak.
The sizzle got us here. The steak is what keeps us going.
That's your AI in 15 for Wednesday, March 25, 2026. See you tomorrow.