The company says it spent just $534,700 renting the data-center computing resources needed to train M1. That is roughly 1/200th of the estimated training cost of OpenAI's GPT-4o, which industry experts say likely exceeded $100 million (OpenAI has not released its training cost figures).
One key difference is that independent developers have yet to confirm MiniMax's claims about M1. In the case of DeepSeek's R1, developers quickly verified that the model's performance was indeed as good as the company said. With Butterfly Effect's Manus, by contrast, the initial buzz faded fast after developers testing Manus found the model error-prone and unable to match what the company had demonstrated. The coming days will prove critical in determining whether developers embrace M1 or respond more tepidly.
Geopolitical and national security concerns have also dampened the enthusiasm of some Western businesses for deploying Chinese-developed AI models. O'Leary, for instance, claimed that DeepSeek's R1 could potentially allow Chinese officials to spy on U.S. users.
But few things win customers more than free access. Right now, those who want to try MiniMax's M1 can do so for free through an API that MiniMax hosts. Developers can also download the entire model for free and run it on their own computing resources (although in that case, they have to pay for the compute time). If M1's capabilities are what the company claims, it will no doubt gain some traction.
The other big selling point for M1 is that it has a “context window” of 1 million tokens. A token is a chunk of data, equivalent to about three-quarters of one word of text, and a context window is the limit of how much data the model can use to generate a single response. One million tokens is equivalent to about seven or eight books or one hour of video content. The 1 million–token context window for M1 means it can take in more data than some of the top-performing models: OpenAI’s o3 and Anthropic’s Claude Opus 4, for example, both have context windows of only about 200,000 tokens. Gemini 2.5 Pro, however, also has a 1 million–token context window, and some of Meta’s open-source Llama models have context windows of up to 10 million tokens.
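The book-count comparison above can be sanity-checked with a quick back-of-the-envelope calculation. The sketch below assumes the common rule of thumb of about 0.75 words per token and roughly 100,000 words per book; these are illustrative figures, not official tokenizer measurements:

```python
# Rough conversion from a context window (in tokens) to "books' worth" of text.
# Assumptions (illustrative, not from any vendor's documentation):
#   ~0.75 words per token, ~100,000 words per typical book.
WORDS_PER_TOKEN = 0.75
WORDS_PER_BOOK = 100_000

def tokens_to_books(tokens: int) -> float:
    """Approximate number of books that fit in a context window of `tokens`."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_BOOK

# Context-window sizes as reported in the article.
context_windows = {
    "MiniMax M1": 1_000_000,
    "OpenAI o3": 200_000,
    "Claude Opus 4": 200_000,
    "Gemini 2.5 Pro": 1_000_000,
}

for model, window in context_windows.items():
    print(f"{model}: ~{tokens_to_books(window):.1f} books")
```

Under these assumptions, a 1 million–token window works out to about 7.5 books, consistent with the "seven or eight books" figure cited above.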