For months, Kleiner Perkins partner Aditya Naganath had been mulling over his investing thesis that the next wave of AI wasn’t going to be a chatbot—it was going to be software that does the work autonomously, for hours at a time, across thousands of tasks at once. The trouble was, nobody had built the plumbing for it yet. Then he met Neil Movva.
“It felt obvious to both of us that you’re going to need a different, specific inference platform built for these long-running agents,” Naganath told Fortune.
Now, six months after Naganath and Movva first chatted, Movva’s startup, Sail Research, has launched from stealth with $80 million in seed and Series A funding at a $450 million valuation, Fortune has learned exclusively. Kleiner Perkins led the Series A. Sequoia, Redpoint, Theory Ventures, Vine Ventures, and CRV also participated.
Movva’s solution is an end-to-end infrastructure platform built from the lowest level of the chip up. Sail writes the software that orchestrates and optimizes how AI models run on existing chips. Think of it like a highly efficient traffic system that tells the hardware exactly how to allocate its resources, squeezing far more work out of the same physical computing power.
Most AI serving platforms optimize for low latency, meaning they prioritize getting you an answer fast. Sail does the opposite, sacrificing real-time responsiveness to pack far more computing work into every unit of power. The tradeoff is deliberate: Sail can’t power a voice assistant or a live chatbot. But for agents that run for hours? Movva claims customers often see cost improvements of 3x to 10x over comparable alternatives.
“We only care about efficiency,” Movva told Fortune. “It’s quite difficult to build an inference engine for both throughput and latency at the same time. Everyone else is optimizing for latency, and we just care about throughput.”
Co-founder and CTO Samir Menon also comes from Apple, where he worked in security engineering at scale. The two met on the first day of freshman year at Stanford, where they took the same classes and saw the same academic counselor. Movva jokes that Menon got slightly better grades. They reunited in late 2025 to rebuild the inference stack from scratch.
Sail launched its inference service in March and has already ramped to processing trillions of tokens per week. One early customer, Detail.dev, uses Sail to run code-review agents that spend three to four hours—sometimes longer—digging through an entire codebase hunting for bugs that five-minute reviews miss. “The abundance of tokens that we provide lets them be maximally ambitious in how they scan through code bases,” Movva said.
Movva’s counter: token prices have been flat or rising for six months, demand for compute is growing faster than supply, and the world needs someone focused obsessively on squeezing the most intelligence out of every available GPU. “We feel an emotional pain when we see a GPU be idle or wasted in any way,” he said.
Naganath’s bull case is simple: “The belief that inference is going to be a 10x—even 100x—bigger market than it is today.”



