AI in 15 — July 05, 2026

July 5, 2026 · 12m 07s
Kate

A free AI model you can download for nothing just beat OpenAI's best on a real coding benchmark. It was trained entirely on Chinese chips. And here's the kicker — developers were already using it, and loving it, before anyone knew where it came from.

Kate

Welcome to AI in 15 for Sunday, July fifth, 2026. I'm Kate, your host.

Marcus

And I'm Marcus, your co-host. Slow Sunday on the calendar, but the technology news did not get the memo.

Kate

It really didn't, Marcus. Our lead today is the model we flagged yesterday as one to watch — LongCat 2.0, out of China, and the verification is starting to come in. Then three more.

Kate

Developers found the smoking gun that OpenAI's Codex has been quietly getting dumber — and it comes down to one very specific number.

Kate

Anthropic is reportedly in talks with Samsung to build its own custom chip.

Kate

And Sam Altman goes looking for a whole new world order for AI, as OpenAI loses ground at home.

Kate

Lead story, Marcus. LongCat 2.0. We name-checked this yesterday. What actually landed?

Marcus

So this is out of Meituan, Kate — China's giant food-delivery company, which is not who you'd expect to ship a frontier model. And they open-sourced it, under a permissive license. It's a mixture-of-experts model, one-point-six trillion total parameters, though only about forty-eight billion fire on any given token, so it's efficient to run. Native one-million-token context window. And on SWE-bench Pro — that's a hard, real-world software-engineering test — it scored fifty-nine-point-five, edging out GPT-5.5's fifty-eight-point-six.
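
To put the "efficient to run" point in numbers, here is a minimal arithmetic sketch using only the two parameter counts quoted above. Both figures are vendor-reported, and the function name is illustrative rather than anything from Meituan's release.

```python
# Figures as quoted in the episode (vendor-reported, not independently verified).
TOTAL_PARAMS = 1.6e12   # total parameters in the mixture-of-experts model
ACTIVE_PARAMS = 48e9    # approximate parameters that fire on any given token


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used to process a single token."""
    return active / total


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"Active per token: {fraction:.1%} of all parameters")
```

On those figures, roughly three percent of the weights do the work for each token. That is where the compute saving comes from, although all of the weights still have to be stored and served.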

Kate

So a free download just beat a US flagship on real coding work.

Marcus

On that one benchmark, yes, Kate. And I want to be careful, because these are vendor-reported numbers, and SWE-bench Pro is one narrow slice of what a model does. But it's not the only score — seventy-point-eight on Terminal-Bench, seventy-seven on multilingual coding. This isn't a fluke on a single test.

Kate

Now the detail everyone fixated on — the chips.

Marcus

That's the load-bearing claim, Kate. Meituan says LongCat was trained end to end on a cluster of roughly fifty thousand domestic Chinese AI chips. No Nvidia at the top of the stack. They didn't name the chipmaker, but the software they used — Huawei's communication library — points hard at Huawei's Ascend hardware. If that holds up under independent scrutiny, it's the most concrete crack yet in the whole export-control theory. The bet in Washington was that denying China the best chips would keep frontier training out of reach. This says maybe not.

Kate

And you flagged the OpenRouter twist, which I love.

Marcus

It's the best part, Kate. Before Meituan revealed what it was, the model was quietly deployed on the OpenRouter marketplace under the anonymous name "Owl Alpha." Developers picked it up and started using it purely on merit — it was fast, it was good — with no idea it was Chinese. So it earned its adoption blind, before any flag was attached to it. That's a very different story from a government press release.

Kate

Okay, keep me honest. This connects straight back to the Alibaba story we covered yesterday, doesn't it?

Marcus

It does, and this is where I'd stay skeptical, Kate. Remember, Anthropic told the Senate that Alibaba ran nearly twenty-nine million queries against Claude to distill its capabilities. So the open question hanging over every one of these Chinese models is — is the catch-up organic engineering, or is it partly farmed from Western labs? And the honest answer, for LongCat specifically, is we can't tell yet. There's no evidence it was distilled. But you can't rule it out either. What's not in dispute is the trajectory — a capable model is now free to download, and the cost of intelligence keeps collapsing. DeepSeek's latest is reportedly running something like a hundred times cheaper than GPT-5.5 on output. Capability is commoditizing whether the West likes it or not.

Kate

Alright, next one, Marcus, and this is a proper detective story. OpenAI's Codex — its coding tool — has been quietly getting worse, and one developer found the fingerprint. Walk me through it.

Marcus

I love this one, Kate, because it's data instead of vibes. A developer pulled nearly four hundred thousand response records across eight hundred and sixty-five Codex sessions and noticed something bizarre. GPT-5.5's responses disproportionately stop at exactly five hundred and sixteen reasoning tokens. Not around there: exactly there. There are smaller clusters at ten-thirty-four and fifteen-fifty-two, so each cluster sits five hundred and eighteen tokens above the last.

Kate

Wait — exactly the same number, over and over? That's not random.

Marcus

That's the tell, Kate. Genuine reasoning doesn't keep stopping at the very same token count. GPT-5.5 is only about nineteen percent of the responses in that dataset, but it accounts for eighty-two percent of these exact-516 events. And when a run hits that ceiling and stops, it strongly correlates with getting the hard questions wrong. Meanwhile the average amount of reasoning has quietly fallen off a cliff: mean intensity dropped from about two hundred and sixty-eight tokens in February to a hundred and seven by May.
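
The checks described here can be sketched in a few lines. This is a minimal illustration, not the developer's actual analysis: the function names and the synthetic sample are hypothetical, and only the cluster values and the two percentages come from the episode.

```python
from collections import Counter


def exact_value_spikes(token_counts, min_share=0.05):
    """Return {value: share} for exact token counts holding at least min_share of responses."""
    counts = Counter(token_counts)
    total = len(token_counts)
    return {value: n / total for value, n in counts.items() if n / total >= min_share}


def over_representation(share_of_events, share_of_responses):
    """How many times more often a model shows up in the events than in the data overall."""
    return share_of_events / share_of_responses


# Spacing between the cluster values quoted in the episode.
clusters = [516, 1034, 1552]
spacings = [b - a for a, b in zip(clusters, clusters[1:])]
print(spacings)

# GPT-5.5: 82% of exact-516 events versus 19% of all responses (figures from the episode).
lift = over_representation(0.82, 0.19)
print(round(lift, 1))

# Hypothetical synthetic sample: 30 responses stop at 516, 70 are spread over distinct values.
sample = [516] * 30 + list(range(100, 170))
spikes = exact_value_spikes(sample)
print(spikes)
```

On the quoted figures, the clusters sit exactly 518 tokens apart, and GPT-5.5 appears in the exact-516 events a little over four times as often as its share of the dataset would predict. A naturally varying stopping point would spread across many values, as the 70 distinct counts in the synthetic sample do, rather than piling up on one.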

Kate

So the model is thinking less, and there's a hard cap kicking in.

Marcus

That's what the numbers suggest, Kate. And the reason this matters beyond one grumpy developer — this is the recurring, under-covered story of the whole frontier era. Silent, unannounced changes to models that people build real businesses on. Was it a deliberate cost-saving move? Fewer reasoning tokens means cheaper inference. Or is it an unintended bug? From the outside, you genuinely can't tell. And that ambiguity is corrosive, because premium pricing depends on trust.

Kate

And OpenAI's said what?

Marcus

Nothing documented yet, Kate. The thread hit the top of Hacker News, developers piled on with their own stories of daily quality "step-downs," and some people who'd switched to Codex from Claude Code are now eyeing the exit again. But here's the silver lining the community actually pointed to: because the Codex command-line tool is open source, this kind of problem can be surfaced and argued out in public, with real telemetry. That's reproducible evidence winning over "the model feels worse lately." I'll take that trade.

Kate

Okay, business story, Marcus, though it's really a hardware story. Anthropic and Samsung.

Marcus

Right, so per The Information, Anthropic has held early-stage talks with Samsung to manufacture a custom AI chip, Kate. And they're reportedly eyeing Samsung's two-nanometer process — that's the cutting edge — plus advanced packaging. Now, big caveat: this is early. Anthropic apparently hasn't decided what the chip is even for, how powerful it'll be, or exactly how it slots into the server. And they were at pains to say Nvidia GPUs, Google's TPUs, and Amazon's Trainium all stay central to the strategy.

Kate

So why bother, if they've already got three chip sources?

Marcus

Because everyone at the frontier is discovering the same thing, Kate — owning the silicon, or at least having your own second source, is becoming a strategic necessity. OpenAI's doing it, Amazon's doing it. A custom chip would likely handle specific Claude inference workloads at massive scale — inference being the recurring cost of actually answering queries, forever, as opposed to training the model once. And when your compute bill is ballooning into the tens of billions, shaving the cost of every single answer compounds enormously.

Kate

It pairs with that Meta number we keep coming back to.

Marcus

It's the same pressure, Kate — Meta's talking a hundred and forty-five billion in capital spending this year. When the numbers get that big, nobody wants to route every dollar through a single vendor. So you get this quiet arms race underneath the model race — everybody trying to depend on Nvidia just a little bit less. Samsung, for its part, would love the win; its chip foundry has been chasing exactly this kind of marquee customer.

Kate

Last one, Marcus, and it's a callback with a new frame. Fortune's got a piece on Sam Altman seeking, quote, a new world order for AI.

Marcus

Right, and the concrete piece underneath the headline is one we broke down on Friday, Kate — the reported idea that OpenAI would hand Washington a roughly five percent stake, about forty-two-and-a-half billion dollars, with other leading labs pooling similar slices into a sovereign-wealth-style fund modeled on Alaska's oil fund. What's new in the Fortune framing is the why-now — they paint Altman doing this precisely because OpenAI is slowly losing ground at home to Google and Anthropic.

Kate

So the grand vision and the competitive squeeze arrive together.

Marcus

And I'd read those two things as connected, Kate, not coincidental. When you're out front, you preach light-touch. When you feel the lead slipping, suddenly you're floating a new global structure with the government as a partner. I'd keep the same skepticism we had Friday, though — this is reported, it's conceptual, and handing government an equity stake in the labs it's supposed to regulate blurs a line that really shouldn't be blurry. A state that profits from AI has a very different incentive to police it. Worth watching precisely because it's such an odd idea to take seriously — and yet here we are, taking it seriously.

Kate

Raised eyebrow, then.

Marcus

Firmly raised, Kate. A trial balloon from a company that would love a friendlier reception in Washington. File it under "watch, don't believe yet."

Kate

One to watch tomorrow, Marcus.

Marcus

Same as our lead, Kate — LongCat 2.0, and specifically whether independent researchers confirm that "trained end to end on domestic Chinese chips" claim. If it holds, that's the most concrete evidence yet that export controls aren't the wall people assumed. Watch for verification, or pushback, in the coming days.

Kate

Agree, or counter?

Marcus

Agree, Kate — but I'd add a companion. Watch whether OpenAI says a single word about that Codex 516-token bug. Silence tells you as much as a fix would. If a model people pay premium prices for is quietly thinking less, the answer to "did you change it" is the whole ballgame.

Kate

That's your AI in 15 for today. See you tomorrow.