HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale

llm
research paper
Author: Santosh Sawant

Published: September 26, 2024

Large Language Models (LLMs) have revolutionized software engineering (SE), demonstrating remarkable capabilities in various coding tasks. While recent efforts have produced autonomous software agents based on LLMs for end-to-end development tasks, these systems are typically designed for specific SE tasks.

Researchers have now introduced HyperAgent, a novel generalist multi-agent system designed to address a wide spectrum of SE tasks across different programming languages by mimicking human developers’ workflows.

Comprising four specialized agents (Planner, Navigator, Code Editor, and Executor), HyperAgent manages the full lifecycle of SE tasks, from initial conception to final verification.

  1. Planner: Acts as the central decision-maker, processing task prompts and generating strategies. It coordinates the activities of the other agents and iteratively refines plans until tasks are completed or a limit is reached.
  2. Navigator: Focuses on fast information retrieval from codebases, using IDE-like tools to quickly address challenges in private or unfamiliar repositories.
  3. Code Editor: Handles code modification and generation across files. It creates and applies code patches based on input from the Planner, using various editing tools.
  4. Executor: Validates solutions and reproduces issues, managing environment setup and testing through an interactive shell and access to documentation.

Together, these agents streamline task management, code navigation, editing, and validation processes.
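The division of labor described above can be illustrated with a toy sketch. This is not HyperAgent's implementation: in the real system each agent is driven by an LLM and real tooling, whereas here every agent is a plain function operating on an in-memory dictionary. All names (`Repo`, `navigator`, `code_editor`, `executor`, `Planner.solve`, `max_iterations`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical in-memory "repository": file name -> source text.
Repo = Dict[str, str]


def navigator(repo: Repo, query: str) -> List[str]:
    """Navigator stand-in: return files whose text contains the query."""
    return [name for name, text in repo.items() if query in text]


def code_editor(repo: Repo, path: str, old: str, new: str) -> None:
    """Code Editor stand-in: apply a simple textual patch to one file."""
    repo[path] = repo[path].replace(old, new)


def executor(repo: Repo, check: Callable[[Repo], bool]) -> bool:
    """Executor stand-in: validate the repo with a caller-supplied check."""
    return check(repo)


@dataclass
class Planner:
    """Planner stand-in: coordinates the other agents until the check
    passes or the iteration limit is reached."""
    max_iterations: int = 3
    log: List[str] = field(default_factory=list)

    def solve(self, repo: Repo, query: str, old: str, new: str,
              check: Callable[[Repo], bool]) -> bool:
        for step in range(1, self.max_iterations + 1):
            paths = navigator(repo, query)          # locate relevant code
            for path in paths:
                code_editor(repo, path, old, new)   # patch it
            passed = executor(repo, check)          # verify the patch
            self.log.append(f"step {step}: edited {paths}, passed={passed}")
            if passed:
                return True
        return False


def add_works(repo: Repo) -> bool:
    """Toy validation: run the file and test the function it defines."""
    namespace: dict = {}
    exec(repo["math_utils.py"], namespace)
    return namespace["add"](2, 3) == 5


if __name__ == "__main__":
    repo = {
        "math_utils.py": "def add(a, b):\n    return a - b\n",  # buggy on purpose
        "README.md": "Utility functions.",
    }
    planner = Planner()
    print(planner.solve(repo, "return a - b", "a - b", "a + b", add_works))
    print(planner.log)
```

The sketch keeps only the control flow: the Planner loops over navigate, edit, and execute steps and stops on success or when the limit is hit. Everything that makes the real agents capable (LLM reasoning, IDE-like tools, an interactive shell) is deliberately left out.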

Through extensive evaluations, HyperAgent achieves state-of-the-art performance across diverse SE tasks: it attains a 25.01% success rate on SWE-Bench-Lite and 31.40% on SWE-Bench-Verified for GitHub issue resolution, surpassing existing methods. Furthermore, HyperAgent demonstrates superior performance in code generation at repository scale (RepoExec), and in fault localization and program repair (Defects4J), often outperforming specialized systems.

Paper: https://arxiv.org/pdf/2409.16299