We are going Agentless 🥷
So essentially,
We don’t need an “Agents” paradigm to solve a problem
Paper:
AGENTLESS: Demystifying LLM-based Software Engineering Agents
Github:
https://github.com/OpenAutoCoder/Agentless
Researchers from the University of Illinois Urbana-Champaign are interested in applying LLMs to more complex, repository-level software engineering tasks. Tasks such as bug fixing, feature addition, and test generation go beyond simple, self-contained problems: they require understanding code within large files as well as dependencies that span multiple files in a repository.
Hmm... What’s the background?
Recent advancements in large language models (LLMs) have significantly propelled the automation of software development tasks. LLMs like GPT-4 and Claude-3.5 have shown remarkable abilities in synthesizing code snippets from user descriptions.
The emergence of the SWE-bench benchmark, particularly SWE-bench Lite, offers a standardized way to evaluate tools designed to automatically address real-world software engineering problems. To tackle the challenges presented by SWE-bench, the field has seen a surge in agent-based approaches. These approaches equip LLMs with tools and enable them to autonomously plan and execute actions, receive feedback, and adapt their strategies iteratively.
However, this paper argues that the current capabilities of LLMs may not yet warrant the complexity of agent-based systems. The authors question the necessity of such intricate agent-based approaches for software development.
OK, so what is proposed in the research paper?
The paper advocates for a shift away from complex, autonomous LLM agents and proposes AGENTLESS, an agentless system for solving software development problems. This system employs a straightforward two-phase process: localization and repair.
Hierarchical Localization: AGENTLESS uses a three-step hierarchical process to pinpoint the location of code requiring modification. It first narrows down the potential files, then identifies relevant classes and functions within those files, and finally, pinpoints specific edit locations.
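Simplified Repair with Search/Replace Diff: Instead of generating entire code blocks, AGENTLESS has the LLM produce edits in a Search/Replace diff format, where each edit gives the existing code to search for and the code to replace it with.
Filtering and Ranking Patches: AGENTLESS generates multiple candidate patches, filters out those with syntax errors or those that fail regression tests, and ranks the remaining candidates by majority voting to select the final patch.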
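As a rough illustration of the repair and filtering steps above, here is a minimal Python sketch. It is not the AGENTLESS implementation: the function names (apply_search_replace, is_valid_python, surviving_patches) and the toy candidate edits are made up for this example, and only the syntax-error filter is shown (running regression tests is left out).

```python
import ast


def apply_search_replace(source: str, search: str, replace: str) -> str:
    """Apply one search/replace edit; the search text must match exactly once."""
    if source.count(search) != 1:
        raise ValueError("search block must match exactly once")
    return source.replace(search, replace, 1)


def is_valid_python(source: str) -> bool:
    """Return True if the patched source still parses as Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def surviving_patches(source: str, edits: list[tuple[str, str]]) -> list[str]:
    """Apply each candidate edit and keep only results that apply and parse."""
    patched = []
    for search, replace in edits:
        try:
            new_source = apply_search_replace(source, search, replace)
        except ValueError:
            continue  # the edit does not apply cleanly, so drop it
        if is_valid_python(new_source):
            patched.append(new_source)
    return patched


# Toy example (hypothetical): a buggy function and three candidate edits.
original = "def add(a, b):\n    return a - b\n"
candidates = [
    ("    return a - b\n", "    return a + b\n"),  # applies and parses
    ("    return a - b\n", "    return a +\n"),    # applies, but syntax error
    ("    return a * b\n", "    return a + b\n"),  # search text not found
]

for patch in surviving_patches(original, candidates):
    print(patch)
```

Requiring the search text to match exactly once is a design choice of this sketch: it keeps an edit from silently landing in the wrong place when the same snippet appears more than once in a file.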
To address the limitations of SWE-bench Lite, the paper introduces SWE-bench Lite-S, a refined subset that excludes problematic problems, such as those whose issue description already contains the exact ground-truth patch or is insufficient or misleading. This new benchmark aims to provide a more rigorous and reliable evaluation platform for future research.
On SWE-bench Lite, AGENTLESS resolves 27.33% of the problems. Notably, this performance surpasses all other open-source solutions evaluated on the same benchmark.
Furthermore, the authors find that AGENTLESS operates at a significantly lower cost, averaging $0.34 per problem, compared to the more expensive agent-based approaches.
What’s next?
In terms of future work, the authors propose several avenues for improving AGENTLESS and, more broadly, advancing the field of agentless software development:
AGENTLESS currently uses majority voting to select the best patch from a pool of candidates; better selection strategies may exist.
Future iterations of AGENTLESS could incorporate robust code search capabilities, allowing the system to better tackle more complex problems.
The authors propose working with the maintainers of SWE-bench Lite to rectify the problematic problems found in the benchmark, contributing to a more robust and reliable benchmark for evaluating software development tools.
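To make the majority-voting step mentioned above concrete, here is a small Python sketch. It is an illustration under assumptions, not the authors' code: the normalize helper (which only strips trailing whitespace and blank lines) and the sample patches are hypothetical.

```python
from collections import Counter


def normalize(patch: str) -> str:
    """Drop trailing whitespace and blank lines so trivially different patches compare equal."""
    lines = [line.rstrip() for line in patch.splitlines()]
    return "\n".join(line for line in lines if line)


def majority_vote(patches: list[str]) -> str:
    """Return the first patch whose normalized form occurs most often."""
    if not patches:
        raise ValueError("no candidate patches")
    counts = Counter(normalize(p) for p in patches)
    winner, _ = counts.most_common(1)[0]
    return next(p for p in patches if normalize(p) == winner)


# Hypothetical candidates: two agree after normalization, one differs.
samples = ["return a + b\n", "return a+b\n", "return a + b  \n\n"]
print(majority_vote(samples))
```

A stronger selection strategy would slot in by replacing majority_vote while keeping the same input (a list of candidate patches) and output (one chosen patch).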
So essentially,
We don’t need an “Agents” paradigm to solve a problem