End-to-End Reinforcement Learning (RL) Training for Emerging Agentic Capabilities (Moonshot AI, Kimi-Researcher)

Kimi-Researcher is an autonomous AI agent developed by Moonshot AI for multi-turn search and complex reasoning tasks. It performs an average of 23 reasoning steps and explores over 200 URLs per task. Through end-to-end reinforcement learning (RL), its Pass@1 score on the challenging Humanity’s Last Exam benchmark rose from an initial 8.6% to 26.9%, a state-of-the-art result.

The model is built on an internal Kimi k-series foundation and trained entirely via end-to-end agentic RL, which allows it to learn planning, perception, and tool use holistically without relying on hand-crafted rules. It uses three main tools: a parallel, real-time internal search engine, a text-based browser for interactive web tasks, and a coding tool for automated code execution. This enables Kimi-Researcher to solve complex problems requiring multi-step planning and tool orchestration effectively.
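As a rough illustration of what orchestrating these three tools can look like, here is a minimal sketch of an agent loop. Everything in it is hypothetical: the tool stubs, the `run_episode` function, the `(tool, argument)` action format, and the scripted policy are stand-ins for illustration, not Moonshot AI's implementation. In the trained system the policy would be the model itself, choosing each action from the accumulated context.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stubs for the three tools named above. Real tools would call a
# search backend, a text-based browser, and a sandboxed interpreter.
def search(query: str) -> str:
    return f"[search results for: {query}]"

def browse(url: str) -> str:
    return f"[page text of: {url}]"

def run_code(source: str) -> str:
    return f"[output of {len(source)} chars of code]"

TOOLS: Dict[str, Callable[[str], str]] = {
    "search": search,
    "browse": browse,
    "code": run_code,
}

# An action is (tool name, argument); the name "answer" ends the episode.
Action = Tuple[str, str]
Policy = Callable[[List[str]], Action]

def run_episode(policy: Policy, task: str, max_steps: int = 30) -> Tuple[str, List[str]]:
    """Let the policy pick tools until it answers or the step budget runs out."""
    context = [f"task: {task}"]
    for _ in range(max_steps):
        name, arg = policy(context)
        if name == "answer":
            return arg, context
        observation = TOOLS[name](arg)
        # Append the tool call and its result so the next decision can use it.
        context.append(f"{name}({arg!r}) -> {observation}")
    return "", context  # no answer within the budget

def scripted_policy(context: List[str]) -> Action:
    """Toy policy that replays a fixed plan; a learned model would go here."""
    plan = [
        ("search", "example query"),
        ("browse", "https://example.com"),
        ("code", "print(1 + 1)"),
        ("answer", "done"),
    ]
    return plan[len(context) - 1]

answer, trace = run_episode(scripted_policy, "demo task")
```

The point of the sketch is that no rule decides which tool comes next: the loop only executes whatever the policy selects, which is the part end-to-end RL is meant to learn.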

Kimi-Researcher has demonstrated strong performance across multiple real-world benchmarks, including 69% Pass@1 on xbench-DeepSearch, outperforming other models with search capabilities. It also performs strongly on multi-turn search reasoning and factual question answering tasks.

To train the agent, Moonshot AI developed a large, diverse, and high-quality dataset emphasizing tool-centric and reasoning-intensive tasks, generated through a fully automated pipeline designed to ensure accuracy and diversity. Training uses the REINFORCE algorithm with outcome rewards and gamma-decay to improve stability and efficiency.
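To make the combination of outcome rewards and gamma-decay concrete, here is a minimal sketch of one plausible reading. The text above does not give the exact formula, so the decay schedule (step i of T receives the outcome reward times gamma^(T-1-i)), the default gamma of 0.95, and both function names are assumptions made for illustration. The loss is the standard REINFORCE surrogate: the negative sum of per-step log-probabilities weighted by their rewards.

```python
from typing import List

def gamma_decayed_rewards(outcome: float, num_steps: int, gamma: float = 0.95) -> List[float]:
    """Spread a single end-of-episode reward back over the steps.

    Assumed schedule: step i (0-indexed) gets outcome * gamma ** (num_steps - 1 - i),
    so the final step gets the full outcome and earlier steps get less.
    """
    return [outcome * gamma ** (num_steps - 1 - i) for i in range(num_steps)]

def reinforce_loss(log_probs: List[float], rewards: List[float]) -> float:
    """REINFORCE surrogate loss: minimizing it raises the probability of rewarded actions."""
    return -sum(lp * r for lp, r in zip(log_probs, rewards))

# Two successful trajectories of different length, same outcome reward of 1.0.
short = gamma_decayed_rewards(1.0, num_steps=2, gamma=0.5)
long = gamma_decayed_rewards(1.0, num_steps=4, gamma=0.5)

# A failed trajectory (outcome 0.0) contributes no learning signal under this scheme.
failed = gamma_decayed_rewards(0.0, num_steps=3, gamma=0.5)

loss = reinforce_loss([-1.0, -2.0], short)
```

Under this schedule the first action of a shorter successful trajectory is credited more than the first action of a longer one, which is one way a decay factor can favor efficient exploration.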

Kimi-Researcher is currently being rolled out to users gradually, enabling deep, comprehensive research on any topic within the platform. Moonshot AI plans to expand the agent’s capabilities and to open-source both the base pretrained model and the reinforcement-learned model soon, with the aim of evolving Kimi-Researcher into a general-purpose agent capable of solving a wide range of complex tasks.

In summary, Kimi-Researcher represents a cutting-edge AI agent that combines powerful multi-step reasoning, extensive tool use, and end-to-end reinforcement learning to deliver state-of-the-art autonomous research and problem-solving capabilities.