There are also open questions about model costs, depending on how much computing power diffusion requires. For some tasks, such as generating computer code, diffusion will simply be more efficient, said Dave Nicholson, chief analyst at Futurum Group. “All this will eventually be measured against each model’s running costs,” he explained. Once true costs are reflected in pricing (which is not necessarily the case today, as AI companies and their backers fight for market share), customers will become much more selective about choosing the model best suited to the task at hand, Nicholson said.
Besides simple FOMO regarding access to the new model, the excitement stems from the "diffusion" technique the model is based on. Diffusion models are a different type of LLM from the kind used in products like ChatGPT; diffusion is the AI method that gave birth to the first popular AI image-generation tools, such as DALL-E 2 and Stable Diffusion.
Diffusion models convert random noise (images that look like static on a TV screen) into high-quality images based on text prompts. Until recently, the diffusion technique, which has been described as more like sculpting than writing, had not seen much success in generating text. Instead of predicting text directly, like the traditional LLMs we have come to rely on since ChatGPT launched in 2022, diffusion models learn to generate words and sentences by refining random gibberish into coherent text. One reason they can do this so quickly is that they perform this "de-noising" process across many different parts of the text at the same time.
Traditional LLMs like ChatGPT, on the other hand, are based on a different AI technique known as the Transformer, which researchers at Google pioneered in 2017. Transformers can only generate one "token," or chunk of text, at a time, from left to right. Each new word depends on all the previous ones; the model can't skip ahead, nor can it go back and revise the text it generated earlier. (The new "reasoning" models based on Transformers can revise their outputs, but only by generating a completely new sequence. They don't revise parts of an existing sequence on the fly.) Diffusion models are more holistic: they guess the entire output at once (even though it starts as gibberish) and refine all of it simultaneously. That means they can generate output faster, because the model is not working one word at a time.
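The contrast between the two decoding styles can be sketched in a few lines of code. This is a toy illustration only, not how Gemini Diffusion or any real model works: the "model" here is just random choice, and the denoising schedule that settles a few more positions each step is an invented stand-in for a learned denoiser.

```python
import random

VOCAB = ["the", "model", "refines", "noise", "into", "text"]

def autoregressive_generate(length, seed=0):
    """Transformer-style decoding: one token at a time, left to right.
    In a real model each pick is conditioned on all earlier tokens,
    and earlier tokens are never revised. (Toy stand-in: random picks.)"""
    rng = random.Random(seed)
    tokens = []
    for _ in range(length):
        tokens.append(rng.choice(VOCAB))  # would depend on `tokens` so far
    return tokens

def diffusion_generate(target, steps=4, seed=0):
    """Diffusion-style decoding: start from pure noise over the whole
    sequence, then refine every position in parallel at each step.
    (Toy stand-in: each step settles a larger fraction of positions
    onto the 'correct' token; a real model predicts the denoised text.)"""
    rng = random.Random(seed)
    seq = [rng.choice(VOCAB) for _ in target]      # all-noise first draft
    for step in range(1, steps + 1):
        keep = step * len(target) // steps         # positions settled so far
        seq = [target[i] if i < keep else rng.choice(VOCAB)
               for i in range(len(target))]        # whole sequence updated at once
    return seq
```

The key difference the sketch highlights: the autoregressive loop runs once per token, while the diffusion loop runs a small, fixed number of refinement steps regardless of sequence length, touching every position on each pass.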
There are tradeoffs, however. Some researchers have noted that while diffusion models are fast and flexible, they can only generate text segments of a fixed length, and so may struggle with writing essays or multi-paragraph narratives. Because they don’t build sentences one word at a time, diffusion models can lose the kind of natural flow and logical progression that transformer-based models are optimized for.
When it comes to computer code, though, narrative flow matters less than logic and syntax. And for developers focused on building and shipping, the speed of diffusion models is a big advantage.
The buzz among techies was evident soon after Google showed off the model Tuesday. Gemini Diffusion, said fans on social media, is "insane" and like "ChatGPT on steroids." "It's a bit like getting a draft and then rework/edit it," Alexander Doria, cofounder of the Paris-based Pleias, told Fortune in a message. "So much faster, potentially better for some tasks."
Gemini Diffusion is part of a trajectory that many in the AI field had anticipated, according to Stefano Ermon, an associate professor in the department of computer science at Stanford University who has been working on diffusion models for the past five years. He is also the co-founder of Inception Labs, which a few months ago announced Mercury, the first diffusion large language model. Mercury matched the performance of frontier models optimized for speed while running five to ten times faster.
“Google’s entry into this space validates the direction we’ve been pursuing,” he told Fortune by email. “It’s exciting to see the broader industry embracing these techniques, though we’re already working on training the next generation of text diffusion models.”
He added that he expects that, within a few years, "all frontier models will be diffusion models."
But other experts pointed out that the public still does not have access to the model, and that, while promising, Gemini Diffusion remains a research experiment with few published details.
According to Nathan Lambert of AI2, Gemini Diffusion is the "biggest endorsement yet of the [text diffusion] model, but we have no details so can't compare well."