The $8 AI That Just Beat a $2,600 One

May 28, 2026

Subquadratic just built the LLM the industry said wasn’t possible. Here’s what it can actually do.

There is a limit inside every AI system you’ve ever used. You can’t see it, but it’s there: a hard architectural ceiling on how much a model can hold in its head at once. Feed it too much, and it starts forgetting. Push further, and the cost becomes prohibitive. Most of what passes for “AI” today is really artificial intelligence operating under severe memory constraints, propped up by a scaffolding of workarounds nobody particularly wants to talk about.

A Miami-based startup called Subquadratic thinks it’s broken through that ceiling.

On May 5, 2026, the company emerged from stealth with $29 million in seed funding and a single, very loud claim: it has built the first frontier large language model that doesn’t behave the way every other LLM does under the hood. Its model, SubQ, offers a 12 million token context window at roughly one-fifth the cost of leading frontier models. If those numbers hold up at scale, this is the most significant architectural shift in large language models since the original transformer paper in 2017.

The Problem Nobody Was Solving

To understand what Subquadratic built, you need to understand what everyone else was living with.

Every mainstream AI model today (GPT, Claude, Gemini) is built on something called a transformer architecture. Transformers are extraordinarily capable, but they have a fundamental scaling problem: to understand language, they compare every token in the input with every other token. The number of those comparisons grows quadratically with input length. Double the context length, and compute quadruples. At the scale of millions of tokens, the numbers become untenable.
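To make the quadratic growth concrete, here is a minimal sketch that counts pairwise comparisons in full attention. The function name `attention_pairs` is made up for illustration, and the count ignores constant factors such as the number of layers and heads.

```python
def attention_pairs(num_tokens: int) -> int:
    """Token-to-token comparisons in full (dense) self-attention.

    Simplified cost model: every token is compared with every token,
    so the count is num_tokens squared.
    """
    return num_tokens * num_tokens


# Doubling the context length quadruples the number of comparisons.
for n in (1_000, 2_000, 4_000):
    print(f"{n:>6} tokens -> {attention_pairs(n):>12,} comparisons")

growth = attention_pairs(2_000) / attention_pairs(1_000)
print(f"2x the tokens costs {growth:.0f}x the comparisons")
```

The same arithmetic is why a million-token input needs a trillion comparisons per attention pass under this simplified model.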

The industry adapted by building around this limitation rather than solving it.

Retrieval-augmented generation (RAG) became the dominant workaround. Instead of feeding a model everything, you build a search layer that retrieves the most relevant fragments first, then passes those to the model. It’s effective. It’s also expensive, brittle, and fundamentally limiting. You’re not asking the AI to reason across all of your data. You’re asking it to guess which slice of your data matters, then reason across that.
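The retrieve-then-reason pattern can be sketched in a few lines. This is a toy illustration, not how any production RAG system works: real systems typically rank chunks with embedding similarity, while this sketch uses plain word overlap. The names `tokenize`, `overlap_score`, and `retrieve`, and the sample chunks, are all made up for the example.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    return {word.strip(".,?!").lower() for word in text.split()}


def overlap_score(query: str, chunk: str) -> int:
    """Toy relevance score: how many distinct words the two texts share."""
    return len(tokenize(query) & tokenize(chunk))


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks ranked by the toy relevance score."""
    ranked = sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)
    return ranked[:top_k]


# Hypothetical document fragments standing in for a real corpus.
chunks = [
    "The indemnification clause caps liability at two million dollars.",
    "The office lease renews every five years.",
    "Liability for data breaches is excluded from the cap.",
    "Employees receive twenty days of paid leave.",
]

query = "What is the liability cap?"
selected = retrieve(query, chunks, top_k=2)

# Only the selected fragments reach the model; the rest is never seen.
prompt = "\n".join(selected) + "\n\nQuestion: " + query
print(prompt)
```

The model answers from two fragments out of four. If the retrieval step ranks the wrong fragment first, the model never sees the right one, which is the brittleness described above.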

As Subquadratic’s own team put it: developers spend more time and money on the workarounds than on the actual problem.

The Architectural Breakthrough

SubQ is built on what the company calls Subquadratic Sparse Attention (SSA). Rather than processing every possible relationship between tokens, SSA dynamically routes attention only to the relationships that actually matter, based on content. The result is a model that scales linearly rather than quadratically. At 12 million tokens, that difference reduces attention compute by almost 1,000 times.
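A rough way to see where a figure like 1,000 times could come from is to compare two cost models. This is a back-of-the-envelope sketch, not Subquadratic's method: it assumes each token attends to a fixed budget of `k` other tokens, and the value `k = 12_000` is a hypothetical number chosen only because it reproduces the article's figure. The company has not published such a parameter.

```python
def dense_cost(n: int) -> int:
    """Full attention: every token is compared with every token."""
    return n * n


def sparse_cost(n: int, k: int) -> int:
    """Assumed sparse model: each token attends to at most k tokens."""
    return n * min(k, n)


n = 12_000_000  # context length from the article
k = 12_000      # hypothetical per-token budget, not a published figure

reduction = dense_cost(n) / sparse_cost(n, k)
print(f"dense / sparse at {n:,} tokens: {reduction:,.0f}x")

# Under this model, doubling the context doubles the sparse cost
# but quadruples the dense cost.
sparse_growth = sparse_cost(2 * n, k) / sparse_cost(n, k)
dense_growth = dense_cost(2 * n) / dense_cost(n)
print(f"sparse grows {sparse_growth:.0f}x, dense grows {dense_growth:.0f}x")
```

Under this assumption the saving is simply `n / k`, so it widens as the context grows. The model leaves out whatever it costs to decide which tokens matter, which any content-based routing scheme also has to pay.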

In practical terms: SubQ runs 52 times faster than existing state-of-the-art attention mechanisms at one million tokens. On a standard long-context benchmark, it achieved comparable accuracy to Claude Opus at a cost of roughly $8, versus approximately $2,600 for Opus on the same task. That is about 325 times cheaper, a different cost category entirely.

The benchmarks are promising. On MRCR v2 — a demanding test of multi-round reasoning across long contexts — SubQ scored 86, outperforming Claude Opus (78), GPT-5.4 (37), and Gemini 3.1 Pro (26). Results were third-party validated, though independent external reproduction is still forthcoming.

What This Actually Unlocks

Here is where the architecture stops being a technical story and becomes a business one.

Legal and professional services: A firm running M&A due diligence today feeds documents into a retrieval system, retrieves fragments, and hopes the right clause surfaces. With a 12 million token context window, you feed in the entire data room — every contract, filing, email, disclosure — and ask a single question across all of it.

Software engineering: Today’s AI coding agents operate with a narrow slice of a codebase at a time. They make changes without full visibility into how those changes ripple through thousands of other files. SubQ’s Code product — designed to plug directly into Claude Code, Cursor, and Codex — lets an agent hold an entire repository in context simultaneously, alongside months of pull requests and test history.

Financial analysis: An analyst could load a company’s complete transaction history, every earnings call transcript from the past five years, and all regulatory filings simultaneously and ask nuanced questions across the entire evidence base.

The common thread across all of these: the workaround disappears. You stop engineering around the model’s memory limitations and start asking it the actual question.

The Bet

The team behind Subquadratic comes from Meta, Google, Oxford, Cambridge, and BYU. CEO Justin Dangel and CTO Alex Whedon (previously Head of Generative AI at Meta) founded the company on the thesis that the transformer’s quadratic bottleneck wasn’t a feature of intelligence but a constraint on it. Backers include Justin Mateen (Tinder co-founder) and Javier Villamizar, former partner at SoftBank Vision Fund.

The pre-seed valuation is reported at approximately $500 million, an extraordinary number for a company that launched weeks ago, and a signal of how seriously sophisticated investors are taking the architectural argument.

SubQ is currently in early access, with two products: a full-context API for developers and enterprise teams, and SubQ Code, the long-context layer for coding agents. The technical report is forthcoming.

The claims are bold. The benchmarks are promising. Independent verification is still catching up. But Subquadratic’s case is a simple one: they’ve built the first model designed from the ground up for long-context reasoning — and at a $500 million pre-seed valuation, investors are betting they’ve pulled it off.
