Thinking Machines Lab, the AI research and product startup founded by former OpenAI CTO Mira Murati, has launched its inaugural research initiative: eliminating nondeterminism in large language models (LLMs). On September 10, 2025, the lab published the first post on its new research blog, Connectionism, titled “Defeating Nondeterminism in LLM Inference.” The work, authored by researcher Horace He, targets a core challenge in AI: the variability in model outputs even when given identical inputs, which has long been viewed as an inherent trait of modern LLMs.
The research traces the primary source of this randomness to the orchestration of GPU kernels, the small programs that execute computations on Nvidia chips during inference, when users interact with models like ChatGPT. Because floating-point arithmetic is not associative, subtle differences in how these kernels are stitched together, such as varying batch sizes or tile configurations in attention mechanisms, change the order of operations and introduce inconsistencies. He proposes practical solutions, including updating the key-value (KV) cache and page tables before attention kernels to ensure uniform data layouts, and adopting consistent reduction strategies for parallelism. These tweaks aim to create “batch-invariant” implementations, making responses reproducible without sacrificing performance.
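To see why reduction order matters, consider a minimal, self-contained sketch (illustrative, not code from the post): floating-point addition is not associative, so two mathematically equivalent ways of summing the same values, like the different orders a change in batch size or tile split can trigger, round differently and disagree in the last bits.

```python
import random

random.seed(0)
# Values spanning many orders of magnitude make rounding differences visible.
values = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-6, 6)
          for _ in range(10_000)]

# Order 1: sequential left-to-right sum, as one kernel configuration might compute it.
seq_sum = 0.0
for v in values:
    seq_sum += v

# Order 2: pairwise (tree) reduction, as a parallel split-reduction might compute it.
def pairwise_sum(xs):
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])

tree_sum = pairwise_sum(values)

print(f"sequential: {seq_sum!r}")
print(f"pairwise:   {tree_sum!r}")
print(f"difference: {seq_sum - tree_sum!r}")  # tiny, but typically nonzero
```

On a GPU, the same effect surfaces whenever the library picks a different kernel or tiling for a different batch size; in an LLM, a last-bit difference in one logit can flip a sampled token and cascade into an entirely different response, which is why He’s fix targets batch-invariant kernels rather than the sampler.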
This breakthrough could have far-reaching implications. Consistent outputs would enhance user trust in AI for applications like customer service, scientific research, and enterprise tools, where predictability is crucial. It also promises to streamline reinforcement learning (RL): numeric discrepancies between training and inference silently turn nominally “on-policy” RL into “off-policy” training, and bitwise-reproducible inference closes that gap, enabling truly on-policy RL. Thinking Machines Lab plans to leverage this for customizing AI models for businesses, aligning with its mission to democratize advanced AI through open research and products.
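As a concrete illustration of the RL point, here is a hedged sketch (the function name and tensor shapes are mine, not from the post) of the check that deterministic inference makes meaningful: the logprobs recorded while sampling and those recomputed during training should match bitwise, so the KL divergence between them is exactly zero; any positive value means training is quietly off-policy.

```python
import torch

def sampler_trainer_gap(sampler_logprobs: torch.Tensor,
                        trainer_logprobs: torch.Tensor) -> float:
    """Mean per-position KL(sampler || trainer) over the vocabulary.

    With bitwise-identical numerics between inference and training this is
    exactly 0.0; any positive value means gradients are computed against a
    subtly different policy than the one that generated the samples.
    """
    p = sampler_logprobs.exp()
    kl = (p * (sampler_logprobs - trainer_logprobs)).sum(dim=-1)
    return kl.mean().item()

# Toy illustration (shapes are arbitrary): identical numerics give exactly zero...
logits = torch.randn(8, 50_000)  # (positions, vocab)
lp = torch.log_softmax(logits, dim=-1)
print(sampler_trainer_gap(lp, lp))  # 0.0

# ...while a tiny per-element perturbation, of the size kernel nondeterminism
# can produce, already yields a nonzero gap.
lp_noisy = torch.log_softmax(logits + 1e-6 * torch.randn_like(logits), dim=-1)
print(sampler_trainer_gap(lp, lp_noisy))  # small but positive
```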
Founded in February 2025, the lab has quickly assembled a powerhouse team of over 30 experts, including former OpenAI leaders such as Barret Zoph (VP of research), Lilian Weng, and OpenAI cofounder John Schulman. Backed by a record $2 billion seed round at a $12 billion valuation from investors including Andreessen Horowitz, Nvidia, AMD, Cisco, and Jane Street, the startup emphasizes multimodality, adaptability, and transparency. Unlike the closed-door approaches of some rivals, Thinking Machines Lab has committed to frequently publishing blog posts, papers, and code via Connectionism, fostering community collaboration.
Mira Murati, who teased the lab’s first product in July 2025 as a tool for researchers and startups building custom models, has hinted it could incorporate these consistency techniques. While details remain under wraps, the product is slated to be unveiled soon and may include a significant open-source component. The initiative has sparked excitement in the AI community, with Reddit discussions on r/singularity praising the lab’s talent pool and open ethos, though some question whether it can truly differentiate itself from giants like OpenAI.
As AI adoption surges, Thinking Machines Lab’s focus on reliability positions it as a key innovator. By addressing nondeterminism, the lab not only tackles a technical hurdle but also paves the way for safer, more scalable AI deployment across industries. Future posts on Connectionism are expected to explore related topics, from kernel numerics to multimodal systems, reinforcing the lab’s role in advancing ethical and effective AI.