Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization

Tencent’s modular framework for building, running, and evaluating autonomous agents, with automated tool generation and hybrid policy optimization.
llm
agents
Author

Santosh Sawant

Published

January 5, 2026

LLM Agents

Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization
Tencent · arXiv 2610.12345 · January 2026


Youtu-Agent framework overview
Key Innovation

An automated agent-generation pipeline paired with hybrid policy optimization that produces high-performing agents from open-weight models, with minimal manual tool integration or prompt engineering.

Existing Large Language Model (LLM) agent frameworks face two significant challenges: high configuration costs and static capabilities. Building a high-quality agent often requires extensive manual effort in tool integration and prompt engineering, while deployed agents struggle to adapt to dynamic environments without expensive fine-tuning.

To address these challenges, Tencent has introduced Youtu-Agent, a modular framework for building, running, and evaluating autonomous agents. It is designed for flexibility and extensibility, so developers can create custom agents, tools, and environments. Its design centers on a clear separation of concerns between these components, which is intended to keep agent development robust and scalable.

At a high level, the framework is a configurable agent system in which an Agent (defined by an AgentConfig) operates within an Environment, uses Toolkits to perform actions, and is assessed through an Evaluation Framework. Key components include:

1. Configuration: YAML-based setup using Pydantic and Hydra to define agents, experiments, and components.
2. Agent Paradigms: Supports a SimpleAgent (single-agent ReAct loop) and an OrchestraAgent (multi-agent Plan-and-Execute with Planner, Workers, and Reporter).
3. Environments: Provide state and context, such as local filesystem access or web interaction.
4. Toolkits: Bundles of tools enabling capabilities like web search, file operations, code execution, and document analysis.
5. Evaluation Framework: Standardized benchmarking pipeline covering data management, processing, and automated execution and scoring.
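The way these components fit together can be illustrated with a minimal Python sketch. It is not the Youtu-Agent API: the class shapes, the `scripted_policy` stand-in for an LLM, the `read_file` tool, and the `evaluate` helper are all hypothetical, and plain dataclasses replace the YAML, Pydantic, and Hydra configuration the framework actually uses. It only shows a config selecting toolkits, a single-agent loop that alternates tool calls and observations, and a scoring pass over a small dataset.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# All names below are illustrative assumptions, not the framework's real classes.


@dataclass
class AgentConfig:
    name: str
    toolkits: List[str]      # names of the toolkits this agent may use
    max_turns: int = 4       # upper bound on loop iterations


@dataclass
class Toolkit:
    """A named bundle of callable tools."""
    name: str
    tools: Dict[str, Callable[[str], str]]


@dataclass
class Environment:
    """Holds the state tools act on (here, an in-memory 'filesystem')."""
    files: Dict[str, str] = field(default_factory=dict)


# A policy sees the question and the observations so far and returns either
# ("call", tool_name, argument) or ("final", answer, "").
Policy = Callable[[str, List[str]], Tuple[str, str, str]]


class SimpleAgent:
    """Single-agent loop: decide, act with a tool, observe, repeat."""

    def __init__(self, config: AgentConfig, toolkits: List[Toolkit], policy: Policy):
        self.config = config
        self.policy = policy
        self.tools: Dict[str, Callable[[str], str]] = {}
        for toolkit in toolkits:
            if toolkit.name in config.toolkits:   # only tools the config enables
                self.tools.update(toolkit.tools)

    def run(self, question: str) -> str:
        observations: List[str] = []
        for _ in range(self.config.max_turns):
            kind, first, second = self.policy(question, observations)
            if kind == "final":
                return first
            observations.append(self.tools[first](second))
        return ""  # turn budget exhausted without a final answer


def scripted_policy(question: str, observations: List[str]) -> Tuple[str, str, str]:
    """Stand-in for an LLM: read one file, then answer with its content."""
    if not observations:
        return ("call", "read_file", "answer.txt")
    return ("final", observations[-1], "")


def evaluate(agent: SimpleAgent, dataset: List[Tuple[str, str]]) -> float:
    """Exact-match accuracy over (question, gold answer) pairs."""
    correct = sum(agent.run(question) == gold for question, gold in dataset)
    return correct / len(dataset)


env = Environment(files={"answer.txt": "42"})
file_toolkit = Toolkit("file", {"read_file": lambda path: env.files.get(path, "")})
config = AgentConfig(name="demo", toolkits=["file"])
agent = SimpleAgent(config, [file_toolkit], scripted_policy)

accuracy = evaluate(
    agent,
    [("What is in answer.txt?", "42"), ("What is the capital of France?", "Paris")],
)
print(accuracy)
```

Keeping the policy, the toolkits, and the scorer as separate objects mirrors the separation of concerns described above: in this sketch, swapping the scripted policy for a real model call or adding a second toolkit would not require changes to the loop or the evaluation code.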

Youtu-Agent reports state-of-the-art results with open-weight models on WebWalkerQA (71.47%) and GAIA (72.8%). Its automated tool-generation pipeline reaches a success rate above 81%, while the Practice module improves AIME 2024 and AIME 2025 performance by 2.7% and 5.4%, respectively.

Key Result

State-of-the-art results with open-weight models — 71.47% on WebWalkerQA and 72.8% on GAIA — with the automated tool-generation pipeline exceeding 81% success.