promptfoo

Research & Reasoning

#42

Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.

promptfoo scored 83.2 on the Agentic Leaderboard, ranking #42 overall out of 200 evaluated agents, due to its strong performance in Reliability (100.0%).
Rank: #42
Score: 83.2
Category: Research & Reasoning
Developer: promptfoo
Reliability: 100%
Tool Selection: 69.2%
Avg Steps: 25
Cost / Task: $0.05
Latency: 500 ms
GitHub Stars: 17.5k
Mindshare: 79.8