For all the benefits promised by model routing, many users of GPT-5 bristled at what they perceived as a lack of control; some even suggested OpenAI might purposefully be trying to pull the wool over their eyes.
In response to the GPT-5 uproar, OpenAI moved quickly to bring back GPT-4o, its main earlier model, for pro users. It also said it had fixed buggy routing and increased usage limits, and it promised continual updates to restore user trust and stability.
Anand Chowdhary, co-founder of AI sales platform FirstQuadrant, summed the situation up bluntly: “When routing hits, it feels like magic. When it whiffs, it feels broken.”
Jiaxuan You, an assistant professor of computer science at the University of Illinois Urbana-Champaign, told Fortune his lab has studied both the promise—and the inconsistency—of model routing. In GPT-5’s case, he said, he believes (though he can’t confirm) that the model router sometimes sends parts of the same query to different models. A cheaper, faster model might give one answer while a slower, reasoning-focused model gives another, and when the system stitches those responses together, subtle contradictions slip through.
The model routing idea is intuitive, he explained, but “making it really work is very non-trivial.” Perfecting a router, he added, can be as challenging as building Amazon-grade recommendation systems, which take years and many domain experts to refine. “GPT-5 is supposed to be built with maybe orders of magnitude more resources,” he explained, pointing out that even if the router picks a smaller model, it shouldn’t produce inconsistent answers.
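The basic routing idea You describes can be sketched in a few lines. The following is a hypothetical illustration, not OpenAI's actual router: a crude heuristic scores each query and dispatches it to either a fast, cheap model or a slower reasoning model (the model names and hint words here are invented for the example).

```python
# A minimal sketch of query routing (hypothetical logic, not OpenAI's router):
# score an incoming query and pick a model tier for it.

REASONING_HINTS = ("prove", "derive", "step by step", "why", "compare")

def route(query: str) -> str:
    """Pick a model tier for the query using crude heuristics."""
    q = query.lower()
    # Long queries or queries with reasoning keywords go to the slower model.
    needs_reasoning = len(q.split()) > 30 or any(h in q for h in REASONING_HINTS)
    return "reasoning-model" if needs_reasoning else "fast-model"

print(route("What's the capital of France?"))                   # fast-model
print(route("Prove that the square root of 2 is irrational."))  # reasoning-model
```

Even this toy version hints at the failure mode You suspects: if different parts of one request land on different tiers, nothing in the router itself guarantees the two models' answers will agree.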
Still, You believes routing is here to stay. “The community also believes model routing is promising,” he said, pointing to both technical and economic reasons. Technically, single-model performance appears to be hitting a plateau: You pointed to the commonly cited scaling laws, which say that models get better as they are given more data and compute. “But we all know that the model wouldn’t get infinitely better,” he said. “Over the past year, we have all witnessed that the capacity of a single model is actually saturating.”
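The saturation You describes follows from the power-law shape of those scaling laws. As a rough illustration only, assuming a Chinchilla-style form L = E + A/N^α + B/D^β with made-up constants, each tenfold increase in parameter count buys a smaller reduction in loss than the last:

```python
# Illustrative only: a Chinchilla-style scaling law with invented constants,
# showing diminishing returns as model size grows (tokens held fixed).

def loss(n_params: float, n_tokens: float,
         E: float = 1.7, A: float = 400.0, B: float = 410.0,
         alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted loss: an irreducible floor E plus two shrinking power-law terms."""
    return E + A / n_params**alpha + B / n_tokens**beta

prev = None
for n in [1e9, 1e10, 1e11, 1e12]:  # 1B -> 1T parameters
    cur = loss(n, 1e12)
    if prev is not None:
        print(f"{n:.0e} params: improvement over previous size = {prev - cur:.3f}")
    prev = cur
```

With these (fabricated) constants, each step up in scale yields roughly half the improvement of the step before it, which is the "saturating" behavior You points to.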
Economically, routing lets AI providers keep using older models rather than discarding them when a new one launches. Queries about current events need frequently updated models, but static facts remain accurate for years. Directing those evergreen queries to older models avoids wasting the enormous time, compute, and money already spent on training them.
There are hard physical limits, too. GPU memory has become a bottleneck for training ever-larger models, and chip technology is approaching the maximum memory that can be packed onto a single die. In practice, You explained, physical limits mean the next model can’t be ten times bigger.
William Falcon, founder and CEO of AI platform Lightning AI, points out that the idea of using an ensemble of models is not new; it has been around since roughly 2018. And because OpenAI’s models are a black box, he noted, we can’t rule out that GPT-4 also used a model routing system.
In a press pre-briefing, OpenAI CEO Sam Altman touted the model router as a way to tackle what had been a hard-to-decipher list of models to choose from. Altman called the previous model picker interface a “very confusing mess.”
But Falcon said the core problem was that GPT-5 simply didn’t feel like a leap. “GPT-1 to 2 to 3 to 4 — each time was a massive jump. Four to five was not noticeably better. That’s what people are upset about.”
Robert Nishihara, CEO of AI production platform Anyscale, says scaling is still progressing in AI, but the idea of one all-powerful AI model remains elusive. “It’s hard to build one model that is the best at everything,” he said. That’s why GPT-5 currently runs on a network of models linked by a router, not a single monolith.
OpenAI has said it hopes to unify these into one model in the future, but Nishihara points out that hybrid systems have real advantages: you can upgrade one piece at a time without disrupting the rest, and you get most of the benefits without the cost and complexity of retraining an entire giant model. As a result, Nishihara thinks routing will stick around.
Aiden Chaoyang He agrees. In theory, scaling laws still hold — more data and compute make models better — but in practice, he believes development will “spiral” between two approaches: routing specialized models together, then trying to consolidate them into one. The deciding factors will be engineering costs, compute and energy limits, and business pressures.
The hyped-up AGI narrative may need to adjust, too. “If anyone does anything that’s close to AGI, I don’t know if it’ll literally be one set of weights doing it,” Falcon said, referring to the “brains” behind LLMs. “If it’s a collection of models that feels like AGI, that’s fine. No one’s a purist here.”