Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization

Existing Large Language Model (LLM) agent frameworks face two significant challenges: high configuration costs and static capabilities. Building a high-quality agent often requires extensive manual effort in tool integration and prompt engineering, while deployed agents struggle to adapt to dynamic environments without expensive fine-tuning.

To address these challenges, Tencent has introduced Youtu-Agent, a modular framework for building, running, and evaluating autonomous agents. It is designed for flexibility and extensibility, allowing developers to create custom agents, tools, and environments. The design centers on a clear separation of concerns between these components, so each can be developed and replaced independently.

At a high level, the framework is a configurable agent system in which an Agent (defined by an AgentConfig) operates within an Environment, uses Toolkits to perform actions, and is assessed through an Evaluation Framework. Key components include:
1. Configuration: YAML-based setup using Pydantic and Hydra to define agents, experiments, and components.
2. Agent Paradigms: Supports a SimpleAgent (single-agent ReAct loop) and an OrchestraAgent (multi-agent Plan-and-Execute with Planner, Workers, and Reporter).
3. Environments: Provide state and context, such as local filesystem access or web interaction.
4. Toolkits: Bundles of tools enabling capabilities like web search, file operations, code execution, and document analysis.
5. Evaluation Framework: Standardized benchmarking pipeline covering data management, processing, and automated execution and scoring.
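To make the single-agent ReAct loop more concrete, here is a minimal Python sketch of how a configured agent might alternate between choosing an action and observing a tool result. It is not the Youtu-Agent API: `SketchAgentConfig`, `Step`, `run_react_loop`, `scripted_policy`, and `add_tool` are all hypothetical names introduced for illustration, and the scripted policy stands in for what would be an LLM call in a real agent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# All names below are hypothetical; this is NOT the Youtu-Agent API.
Tool = Callable[[str], str]


@dataclass
class SketchAgentConfig:
    """Stand-in for an agent config (the framework itself uses YAML)."""
    name: str
    max_turns: int = 5


@dataclass
class Step:
    thought: str   # free-text reasoning
    action: str    # a tool name, or "finish" to stop
    argument: str  # tool input, or the final answer when action == "finish"


History = List[Tuple[Step, str]]
Policy = Callable[[str, History], Step]


def run_react_loop(config: SketchAgentConfig, toolkit: Dict[str, Tool],
                   policy: Policy, task: str) -> str:
    """Alternate reasoning and acting until the policy finishes or turns run out."""
    history: History = []
    for _ in range(config.max_turns):
        step = policy(task, history)
        if step.action == "finish":
            return step.argument
        tool = toolkit.get(step.action)
        if tool is None:
            observation = f"unknown tool: {step.action}"
        else:
            observation = tool(step.argument)
        # The observation is fed back to the policy on the next turn.
        history.append((step, observation))
    return "max turns reached"


def scripted_policy(task: str, history: History) -> Step:
    """Assumption: a fixed script replaces the LLM call for this sketch."""
    if not history:
        return Step("I need to add the numbers.", "add", "2 3")
    _, last_observation = history[-1]
    return Step("I have the result.", "finish", last_observation)


def add_tool(argument: str) -> str:
    """Toy tool: add two space-separated integers."""
    a, b = argument.split()
    return str(int(a) + int(b))


config = SketchAgentConfig(name="demo")
answer = run_react_loop(config, {"add": add_tool}, scripted_policy, "What is 2 + 3?")
print(answer)
```

Keeping the toolkit as a plain mapping from tool names to callables mirrors the separation of concerns described above: under this sketch's assumptions, tools can be swapped without touching the loop, and the turn limit lives in the config rather than in the loop itself.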

Youtu-Agent demonstrates strong performance across multiple benchmarks, achieving state-of-the-art results on WebWalkerQA (71.47%) and GAIA (72.8%) with open-weight models. Its automated tool-generation pipeline reaches a success rate of over 81%, while the Practice module boosts AIME 2024 and AIME 2025 performance by 2.7% and 5.4%, respectively.

Paper: Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization