Evaluation

Articles

Libraries

  • vibrantlabsai/ragas - Supercharge Your LLM Application Evaluations
  • promptfoo/promptfoo - Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI
  • confident-ai/deepeval - The LLM Evaluation Framework
  • AgentEvalHQ/AgentEval - AgentEval is the comprehensive .NET toolkit for AI agent evaluation (tool usage validation, RAG quality metrics, stochastic evaluation, and model comparison), built first for Microsoft Agent Framework (MAF) and Microsoft.Extensions.AI. What RAGAS, PromptFoo, and DeepEval do for Python, AgentEval does for .NET.
  • elbruno/elbruno-ai-evaluation - AI Testing & Observability Toolkit for .NET - deterministic evaluators, synthetic data, golden datasets, regression detection