The next great leap in artificial intelligence will not come from better language models. It will come from machines that understand how the physical world works and how to control it.
The binding constraint on embodied AI isn’t compute or architecture. It’s a specific kind of data that barely exists.
What tipped Google’s hand is not what Genie 3 does well, but what it compromises on: environments that last only a few minutes, noticeable latency, physics that behaves strangely. For now, these are acceptable limitations, because the real purpose isn’t entertainment. Google told us explicitly that Genie 3 is “a key stepping stone on the path to AGI,” infrastructure for training SIMA, their generalist agent that needs endless diverse environments to learn navigation, object manipulation, and real-world physics. Spawning objects mid-session and changing environmental conditions on the fly isn’t a gaming feature. It’s a curriculum generator for reinforcement learning.
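To make the "curriculum generator" idea concrete, here is a minimal sketch of what such a system does in principle: sample environment variations whose difficulty ramps up as an agent trains. Everything below (the `EnvSpec` fields, the difficulty knob) is a hypothetical illustration, not Google's actual API.

```python
import random
from dataclasses import dataclass

# Hypothetical illustration of a curriculum generator: each call
# samples a fresh training environment, with harder settings adding
# more mid-session object spawns and slipperier surfaces.
@dataclass
class EnvSpec:
    terrain: str
    friction: float
    spawned_objects: list

def sample_environment(difficulty: float) -> EnvSpec:
    """Sample one training environment at the given difficulty in [0, 1]."""
    return EnvSpec(
        terrain=random.choice(["warehouse", "kitchen", "street"]),
        friction=max(0.1, 1.0 - difficulty * random.random()),
        spawned_objects=[f"object_{i}" for i in range(int(difficulty * 5))],
    )

# A curriculum: a sequence of environments ramping from easy to hard,
# so the agent never trains on the same static world twice.
curriculum = [sample_environment(d / 10) for d in range(1, 11)]
```

The point of the sketch is the economics: generating the next environment costs one function call, not months of hand-built simulation.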
What Google has built is an environment factory, a system that collapses the months of hand-coding traditionally required to create training simulations into seconds of text prompting.
To understand why that distinction matters, zoom out. For all the upheaval of the digital revolution, remarkably little has changed about how we physically interact with reality. The leap from early desktop computing to the smartphone to the transformer architecture was enormous in terms of information flow. But we’re still mostly poking at glass screens.
Consider the squirrel outside your window, leaping branch to branch, adjusting mid-flight for wind and flex. It possesses an extraordinarily sophisticated internal model of physics, of gravity, momentum, and friction, and it can plan complex action sequences. Yet it has no language. It simply knows, in the way that knowing existed long before describing ever could.
AI has ignored this kind of knowing almost entirely. Today’s large language models can write sonnets and debug code. But ask one to fold a towel and you’ll discover the gulf between knowing about the world and knowing how to act within it. Language is but a compression of human experience. Text captures only a thin slice of what we know.
But no one has yet solved the binding constraint: nobody has the data to build agents.
Training an agent requires action-conditioned data. Not just what the world looked like, but what someone did and what happened next: observation, decision, action, consequence. The complete loop. The pivot to agents requires millions of hours of human decision-making captured at the source, frame-aligned with resulting state changes, self-selected for edge cases.
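The "complete loop" described above can be sketched as a data record. The schema below is purely illustrative, not any real dataset's format: the essential property is that each action is frame-aligned with the state change it caused.

```python
from dataclasses import dataclass

# Illustrative schema for one step of action-conditioned data:
# what the world looked like, what the actor did, and what happened next.
@dataclass
class Step:
    observation: bytes       # e.g., an encoded video frame before the action
    action: dict             # the logged input (keys, mouse, controller state)
    next_observation: bytes  # the frame-aligned resulting state
    timestamp_ms: int        # aligns the decision with its consequence

# A trajectory is the complete loop, repeated over time. Plain video
# gives you only the observations; the action field is what's missing
# from almost all recorded human experience.
trajectory = [
    Step(b"frame0", {"key": "W"}, b"frame1", 0),
    Step(b"frame1", {"key": "space"}, b"frame2", 16),
]
```

Games log exactly this structure natively, which is why the next paragraph nominates them as the unlikely answer.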
Games may be the unlikely answer. They provide complete records of human agency, every input logged and labeled, in environments that capture physics and decision-making under uncertainty. Millions of hours of human judgment, already digitized.
The deepest value isn’t physics. It’s human intuition. A physics engine models how a drone moves; it can’t model how a skilled operator reacts when surprised. In surgery, it’s the feel for how the tissue responds to the scalpel. Train on human decision-making and you capture expertise that can’t be described with words, only shown, felt.
Get this right and the consequences echo what software did to information.
When a machine can learn a manipulation task from hours of demonstration instead of months of programming, manufacturing economics flip. Small-batch production becomes viable. Custom goods cost what mass goods cost today. A master electrician’s lifetime of knowledge deploys in a thousand cities at once. The best surgeon’s judgment scales to rural hospitals that have no access today. The bottleneck was never scalpels. It was hands.
Agriculture, logistics, eldercare. Every domain where physical skill is scarce becomes a candidate for transformation. The common thread: expertise locked in individual bodies becomes transferable.
The digital revolution made information free. The world-model revolution will make capability free. I can’t think of a more consequential bet to make.
The opinions expressed in Fortune.com commentary pieces are solely the views of their authors and do not necessarily reflect the opinions and beliefs of Fortune.