AI in 15 — March 13, 2026

Kate

An AI just solved math problems that stumped the world's best mathematicians for over a decade. And it didn't use any technique a human designed. It invented its own.

Kate

Welcome to AI in 15 for Friday, March 13, 2026. I'm Kate, your host.

Marcus

And I'm Marcus, your co-host.

Kate

Happy Friday the 13th, Marcus. We've got a great show. Google DeepMind's AlphaEvolve just cracked five classical math problems that haven't budged in a decade. Anthropic shipped interactive visuals inside Claude conversations. NVIDIA just dropped two billion dollars on an AI cloud startup. A grandmother spent a hundred and eight days in jail because facial recognition got it wrong. A major study says AI is making people work more, not less. And the creator of Kotlin just launched a programming language designed for the AI age. Let's preview.

Kate

AlphaEvolve improves the bounds on five Ramsey numbers by inventing its own search strategies.

Kate

An innocent woman loses her home, her car, and her dog after AI misidentifies her face.

Kate

And a new analysis says LLM coding abilities have quietly stopped improving. Let's get into it.

Kate

Marcus, AlphaEvolve. Google DeepMind published results on Tuesday showing their system improved lower bounds for five classical Ramsey numbers. For people who haven't thought about combinatorics since college, explain why this is a big deal.

Marcus

Ramsey numbers are among the hardest problems in combinatorics. They describe the minimum size a structure needs to be before a certain pattern is guaranteed to appear. The bounds for these numbers have been notoriously difficult to improve. Some of the records AlphaEvolve just broke had stood for over a decade. The famous mathematician Paul Erdos once joked that if aliens demanded we compute a specific Ramsey number or they'd destroy Earth, we should marshal all our resources to solve it. These are genuinely hard problems.

Kate

And what makes AlphaEvolve's approach different from just throwing compute at the problem?

Marcus

This is the key part. AlphaEvolve didn't solve these problems using a human-designed algorithm. It's an evolutionary coding agent powered by Gemini that searches for the search strategies themselves. It generates candidate algorithms, evaluates them, and uses language models to mutate the promising ones. So it's not just finding better answers, it's discovering the methods for finding better answers. That meta-algorithmic layer is what makes this genuinely novel.

Kate

Demis Hassabis called it a big milestone for AI in mathematics.

Marcus

And he's right. Beyond the five new records, AlphaEvolve also recovered known exact values for all Ramsey numbers where we already know the answer, and matched best-known results across many other cases. That consistency is important. It means the system isn't just getting lucky on a few problems. It has a general capability for this class of mathematics. This is AI making contributions to pure math that push the frontier of human knowledge, not just verifying what we already knew.

Kate

This ties back to the AlphaGo anniversary we covered Wednesday. Hassabis warned about self-learning AI, and now his own lab is demonstrating exactly that capability in mathematics.

Marcus

The trajectory from AlphaGo to AlphaFold to AlphaEvolve is remarkable. Game-playing to protein folding to pure mathematics, each time applying self-discovery methods to harder domains. And the new DeepMind headquarters, Platform 37, named after AlphaGo's legendary Move 37, opens this summer. Ten years from a Go move that shocked the world to an AI that invents mathematical search strategies. The naming is deliberate and, honestly, earned.

Kate

Now for a story that's harder to celebrate. Angela Lipps, a grandmother from Tennessee, spent a hundred and eight days in jail because AI facial recognition identified the wrong person.

Marcus

She was arrested at gunpoint by U.S. Marshals last July for bank fraud in Fargo, North Dakota. The problem? Bank records proved she was over twelve hundred miles away at her home in Tennessee when the crimes were committed. A detective used AI facial recognition software, looked at her social media and driver's license photo, and wrote that she "appeared to be the suspect." The surveillance footage apparently showed a significantly younger woman.

Kate

A hundred and eight days. And the consequences went beyond jail time.

Marcus

She lost her home, her car, and her dog while incarcerated. Officers from North Dakota didn't even pick her up from her Tennessee jail until October, three and a half months after the arrest. She spent nearly six months locked up before charges were dismissed. And here's the pattern that makes this systemic, not isolated. This is the eighth documented wrongful arrest involving facial recognition in the United States. Seven of the eight victims have been Black.

Kate

That's not a coincidence.

Marcus

It's not. A NIST study found these systems are ten to a hundred times more likely to misidentify Black and Asian faces compared to white faces. The technology has a measurable, documented racial bias. And in this case, the detective didn't even do basic due diligence. A simple check of the suspect's location would have cleared Lipps immediately. The AI gave a wrong answer, and the human just accepted it. That combination of biased technology and lazy verification is where the real danger lives.

Kate

We covered the McKinsey hack yesterday, where an AI agent found a vulnerability from the nineties. Here we have facial recognition ruining someone's life because nobody questioned the machine's output.

Marcus

Different domains, same underlying problem. AI tools are being deployed with a level of trust they haven't earned. When the AI is wrong in a consulting platform, you get a data breach. When it's wrong in law enforcement, you get an innocent person in jail for months. The stakes demand different safeguards, and those safeguards clearly don't exist yet.

Kate

Anthropic news. Claude can now generate interactive visualizations right inside conversations. Charts, diagrams, timelines, all clickable and explorable.

Marcus

It uses HTML, SVG, and React to render inline graphics. Users can hover, click, adjust parameters. Think interactive compound interest curves, clickable periodic tables, decision tree diagrams. Claude decides on its own when a visual would help, though you can also just ask for one. Available across all pricing tiers.

Kate

How is this different from Artifacts, which Claude already had?

Marcus

Artifacts are persistent, downloadable, shareable. These inline visuals are temporary conversation aids designed to help you understand something in the moment. Early reactions have been enthusiastic. One user on Hacker News described Claude creating beautiful tabbed interactive charts completely unprompted during portfolio analysis. Though some users on lower-tier plans reported hitting usage limits before visuals could finish rendering.

Kate

It's a meaningful expansion of how an AI can communicate. Not just telling you about data but showing you.

Marcus

And it positions Claude as more than a text generator. If your AI assistant can create an interactive visualization that helps you understand a concept faster than three paragraphs of explanation, that changes how people learn and make decisions through these interfaces.

Kate

NVIDIA invested two billion dollars in Nebius Group for an eight percent stake. Marcus, who is Nebius and why does NVIDIA care?

Marcus

Nebius is an Amsterdam-based company building what NVIDIA calls neoclouds, purpose-built cloud infrastructure optimized specifically for AI workloads. Not general-purpose like AWS or Azure, but designed from the ground up for training and running AI models. Together they plan to deploy five gigawatts of data center capacity by 2030. Nebius already has approval for a one-point-two-gigawatt facility near Independence, Missouri.

Kate

Five gigawatts. For context, that's roughly the power consumption of a small country.

Marcus

And it illustrates NVIDIA's strategy of investing in the ecosystem around its chips, not just selling GPUs. By backing neocloud providers, NVIDIA ensures demand for its hardware while building out infrastructure the industry needs. Nebius stock jumped sixteen percent on the announcement. And with GTC kicking off Monday and Jensen's keynote coming, expect more infrastructure deals like this. Every data center decision in the industry is waiting on what NVIDIA reveals about Vera Rubin next week.

Kate

A study from ActivTrak analyzed over a hundred and sixty thousand employees and concluded that AI does not reduce workloads. Period.

Marcus

The numbers are striking. After AI adoption, email volume increased a hundred and four percent. Chat and messaging surged a hundred and forty-five percent. Time in business management tools rose ninety-four percent. The pattern is clear. AI accelerated individual tasks, but the freed time was immediately filled with more work. Management used the productivity gains to increase output expectations.

Kate

Amazon employees are saying the same thing. Over a thousand signed a petition calling the company's mandatory AI tools half-baked.

Marcus

One developer said, and I quote, "I and many of my colleagues don't feel that it actually makes us that much faster." The tools generate errors that need manual verification and correction, which extends task completion time. Harvard Business Review published a supporting piece titled "AI Doesn't Reduce Work, It Intensifies It." This undercuts the core selling point of enterprise AI. The promise was that AI frees people for higher-value work. The reality appears to be that AI makes people run faster on a bigger treadmill.

Kate

As we covered yesterday, Atlassian just cut sixteen hundred jobs citing AI investment. The workers losing jobs might want to look at these numbers.

Kate

Here's one that sparked a massive debate. A new analysis claims LLM coding abilities have actually stopped improving in real-world terms.

Marcus

The blog post examines SWE-bench data and finds that while models score higher on automated benchmarks, the rate at which their pull requests are actually merged by human maintainers has stagnated since early 2025. Research from METR backs this up. There's a twenty-four-percentage-point gap between automated scores and maintainer merge rates. Automated grading improves about fifteen points per year. Actual merge rates improve at only five.

Kate

So the models are getting better at passing tests but not at writing code humans would ship.

Marcus

That's the argument. Critics point out this doesn't capture newer models that haven't been measured yet, and that the subjective experience of using coding agents has improved substantially. One Hacker News commenter captured the contradiction perfectly: "Something happened during 2025 that made the models much better. I only type in the terminal anymore. But the quality of the code is still quite often terrible." If the gap between benchmark performance and production readiness keeps growing, the industry needs better evaluation approaches.

Kate

The creator of Kotlin launched something called CodeSpeak. It's a new programming approach designed for the AI era.

Marcus

Andrey Breslav's idea is a middle layer between natural language prompting and traditional code. You write structured specification files describing what a program should do, and CodeSpeak compiles those specs into working Python, Go, or JavaScript. The philosophy is that you capture only what the human uniquely knows, and the machine handles everything else. He claims roughly ten-x code reduction.

Kate

The Hacker News debate was fierce. Two hundred and eighty-seven points, two hundred and forty-six comments.

Marcus

The core skepticism is valid. If a spec has to be precise enough to generate correct code, isn't it just programming with extra steps? And since LLMs aren't deterministic, the same spec might produce different code each run. But the pedigree is real. Breslav built one of the most successful modern programming languages. Whether CodeSpeak works or not, it's asking the right question. What's the right interface between human intent and machine-generated code?

Kate

Last one. RAG document poisoning. A technical analysis is getting serious attention in the security community.

Marcus

The research shows that adding just five malicious documents to a corpus of millions causes a RAG system to return attacker-controlled false answers ninety percent of the time for targeted questions. Poisoning zero-point-zero-four percent of a corpus achieves a ninety-eight percent attack success rate. And prompt hardening barely helps, only reducing success from ninety-five to eighty-five percent. The fundamental problem is the model has no way to distinguish retrieved documents from system instructions.

Kate

After the McKinsey hack story yesterday, this is another reminder that AI security is way behind AI deployment.

Marcus

The attack surface keeps expanding. Yesterday it was SQL injection in AI platforms. Today it's poisoned knowledge bases. Email is identified as a particularly easy vector. If an agent reads emails as context, anyone can send instructions embedded in a message. As enterprises rush to deploy RAG systems for internal knowledge management, this is a critical vulnerability most haven't even considered.

Kate

Friday big picture, Marcus. AI inventing math strategies. AI wrongly jailing grandmothers. AI increasing workloads instead of reducing them. What's the thread?

Marcus

The gap between what AI can do at its best and what happens when it's deployed carelessly. AlphaEvolve is AI at its most impressive, pushing the boundaries of human mathematical knowledge with techniques no human designed. Angela Lipps is AI at its worst, a biased system rubber-stamped by a lazy detective destroying an innocent person's life. And the workload study sits in the middle, showing that even when AI works as intended, the human systems around it can turn productivity gains into productivity pressure.

Kate

The technology isn't the bottleneck anymore. It's how we choose to use it.

Marcus

Exactly. AlphaEvolve succeeds because DeepMind built careful evaluation systems around it. Facial recognition fails because law enforcement deploys it without safeguards. Enterprise AI intensifies work because management captures the gains instead of sharing them. The AI is only as good as the human decisions surrounding it. And right now, those decisions range from brilliant to negligent, sometimes in the same week.

Kate

That's your AI in 15 for Friday, March 13, 2026. Have a great weekend. See you Monday.