promptfoo

Name: promptfoo
Rating: 83.2 (200 reviews)

Research & Reasoning

Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.

promptfoo scored 83.2 on the Agentic Leaderboard, ranking #42 overall out of 200 evaluated agents, helped by its strong performance in Reliability (100.0%).

Rank: #42
Score: 83.2