Exclusive: A former Apple engineer thinks AI infrastructure is built for the wrong future. Investors just gave him $80 million to fix it

For months, Kleiner Perkins partner Aditya Naganath had been mulling over his investing thesis that the next wave of AI wasn’t going to be a chatbot—it was going to be software that does the work autonomously, for hours at a time, across thousands of tasks at once. The trouble was, nobody had built the plumbing for it yet. Then he met Neil Movva.

“It felt obvious to both of us that you’re going to need a different, specific inference platform built for these long-running agents,” Naganath told Fortune.

Now, six months after Naganath and Movva first chatted, Movva’s startup, Sail Research, has launched from stealth with $80 million in seed and Series A funding at a $450 million valuation, Fortune learned exclusively. Kleiner Perkins led the Series A. Sequoia, Redpoint, Theory Ventures, Vine Ventures, and CRV also participated.

Movva’s solution is an end-to-end infrastructure platform built from the lowest level of the chip up. Sail writes the software that orchestrates and optimizes how AI models run on existing chips. Think of it like a highly efficient traffic system that tells the hardware exactly how to allocate its resources, squeezing far more work out of the same physical computing power.

Most AI serving platforms optimize for low latency, meaning they prioritize getting you an answer fast. Sail does the opposite, sacrificing real-time responsiveness to pack far more computing work into every unit of power. The tradeoff is deliberate: Sail can’t power a voice assistant or a live chatbot. But for agents that run for hours? Movva claims customers often see 3x to 10x cost improvements over comparable alternatives.

“We only care about efficiency,” Movva told Fortune. “It’s quite difficult to build an inference engine for both throughput and latency at the same time. Everyone else is optimizing for latency, and we just care about throughput.”

Co-founder and CTO Samir Menon also comes from Apple, where he worked in security engineering at scale. The two met on the first day of freshman year at Stanford—they took the same classes, and saw the same academic counselor. Movva jokes that Menon got slightly better grades. They reunited in late 2025 to rebuild the inference stack from scratch.

Sail launched its inference service in March and has already ramped to processing trillions of tokens per week. One early customer, Detail.dev, uses Sail to run code-review agents that spend three to four hours—sometimes longer—digging through an entire codebase hunting for bugs that five-minute reviews miss. “The abundance of tokens that we provide lets them be maximally ambitious in how they scan through code bases,” Movva said.

Movva’s counter: token prices have been flat or rising for six months, demand for compute is growing faster than supply, and the world needs someone focused obsessively on squeezing the most intelligence out of every available GPU. “We feel an emotional pain when we see a GPU be idle or wasted in any way,” he said.

Naganath’s bull case is simple: “The belief that inference is going to be a 10x—even 100x—bigger market than it is today.”

