The city of San Sebastián, in Spain’s Basque region, is a relaxed surfers’ haven that feels a world removed from any war. Yet atop a pine-forested hill overlooking the city, engineers in a conference room at Multiverse Computing are training their focus on combat of the kind raging at the other end of Europe, in Ukraine. They’re demonstrating one of their latest creations: a small AI model designed to help drones communicate from high above a chaotic battlefield.
Multiverse, like its AI models, is currently small—predicted sales this year are a modest $25 million. But it's on to a big idea. Its work focuses on compressing large language models, or LLMs, into smaller models, in the belief that most consumers and business customers can do just fine with lower-powered but thoughtfully designed AI that needs less energy and fewer chips to run.
"There is a big problem with the way we are doing AI," says Román Orús, 42, Multiverse's chief scientific officer. "It is fundamentally wrong." He and CEO Enrique Lizaso see an opportunity to get it right, while it's still early days for the technology.
Orús and Lizaso believe that the AI arms race is foolish. They argue that the great majority of AI users have constrained needs that could be met with small, affordable, less energy-hungry models. In their view, millions of users are needlessly turning to giant LLM-powered services like ChatGPT for simple tasks like booking air tickets or solving arithmetic problems.
Six years after its launch, Multiverse now calls its products "quantum inspired": The team uses quantum-physics algorithms to train regular computers, a combination they say enables faster, smarter operations than traditional programming does. These algorithms enable Multiverse to create SLMs—small language models that can operate on a few million parameters, rather than the billions found in LLMs.
Multiverse's core business is compressing open-source LLMs—shrinking them so drastically that most of its versions can run on CPUs, or central processing units, of the kind used in smartphones and regular computers, rather than on GPUs, or graphics processing units. Because it works with open-source models, it doesn't need the cooperation of those models' creators to do the shrinking.
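The article doesn't disclose Multiverse's proprietary algorithms, so as a rough illustration of the general idea, here is a minimal sketch of low-rank weight compression using truncated SVD—a classical relative of the tensor-network factorizations that "quantum inspired" methods draw on. The matrix size (512×512) and the rank (64) are arbitrary stand-ins for this example, not Multiverse's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy weight matrix standing in for one layer of a language model.
W = rng.standard_normal((512, 512))

def compress(W, rank):
    """Approximate W by two thin factors A @ B, keeping only the
    top `rank` singular directions."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # shape (512, rank)
    B = Vt[:rank, :]             # shape (rank, 512)
    return A, B

A, B = compress(W, rank=64)

original_params = W.size                  # 512 * 512 = 262,144
compressed_params = A.size + B.size       # 2 * 512 * 64 = 65,536
print(f"parameters: {original_params} -> {compressed_params}")
```

Cutting the rank shrinks the parameter count fourfold here, but it discards information—exactly the accuracy-versus-size trade-off that skeptics of aggressive compression point to.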
In August, the company launched another product in its "model zoo," called ChickenBrain, a compressed version of Meta's Llama 3.1 model that includes some reasoning capabilities. Stephen Phillips, a senior principal engineer at Intel, says the company chose to work with Multiverse among others because "its models did not appear to lose accuracy when compressed, as SLMs often do."
Switching AI applications to small, CPU-based models might help curb AI's growing appetite for energy, according to Multiverse. Lizaso believes tech companies are less concerned about the environment than the costs. But the two issues are converging. "If green means cheaper, they are fully green," he says. "The energy crisis is coming."
Some experts question Multiverse's claim that, for most people, its small models are just as good as LLMs running on GPUs. "That's a big statement that no one has proven yet," says Théo Alves Da Costa, AI sustainability head at Ekimetrics, an AI solutions company in Paris. "When you use that kind of compression, it is always at the cost of something." He says he has not found a small language model that works in French as well as an LLM does, for example, and that in his own tests models slowed down markedly when switching to CPUs. It's also generally the case that open-source models of the kind that Multiverse compresses don't perform quite as well as proprietary LLMs.
Multiverse says that, thanks to compression, its version of DeepSeek will be cheaper still. And true to its desire to "challenge the status quo," Multiverse has tweaked DeepSeek in another way, too, removing government-imposed censorship. Unlike with the original LLM, users will be able to gain access to information about politically charged events like the 1989 massacre of protesters in Beijing's Tiananmen Square. "We have removed its filters," says Lizaso—another parameter stripped away.