SWE-agent

Coding & Software Engineering

#56

SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]

SWE-agent scored 82.8 on the Agentic Leaderboard, ranking #56 out of 200 evaluated agents, driven largely by its strong Reliability score (97.4%).
Rank: #56
Score: 82.8
Category: Coding & Software Engineering
Developer: SWE-agent
Reliability: 97.4%
Tool Selection: 71.2%
Avg Steps: 25
Cost / Task: $0.05
Latency: 500 ms
GitHub Stars: 18.8k
Mindshare: 80.5