☑️ A curated list of tools, methods & platforms for evaluating AI reliability in real applications.
Updated Feb 12, 2026
Comprehensive AI Model Evaluation Framework with advanced techniques including Temperature-Controlled Verdict Aggregation via Generalized Power Mean. Supports multiple LLM providers and 15+ evaluation metrics for RAG systems and AI agents.
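The generalized power mean behind that aggregation is easy to sketch. The snippet below is a minimal illustration, not the framework's actual API: the function names and the direct mapping from "temperature" to the power-mean exponent `p` are assumptions made for demonstration; verdict scores are assumed to lie in (0, 1].

```python
import math
from typing import List

def power_mean(scores: List[float], p: float) -> float:
    """Generalized power mean M_p of verdict scores in (0, 1].

    p -> -inf approaches the minimum (strict), p = 1 is the arithmetic mean,
    p -> +inf approaches the maximum (lenient); p = 0 is the geometric mean.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if p == 0:  # limiting case: geometric mean
        return math.exp(sum(math.log(s) for s in scores) / len(scores))
    return (sum(s ** p for s in scores) / len(scores)) ** (1.0 / p)

def aggregate_verdicts(scores: List[float], temperature: float = 1.0) -> float:
    """Aggregate per-verdict scores with a temperature-controlled exponent.

    Hypothetical mapping for illustration: the temperature is used directly
    as the exponent p, so lower (negative) temperatures penalize weak
    verdicts more aggressively.
    """
    return power_mean(scores, p=temperature)

# Example: three judge verdicts, one weak
print(aggregate_verdicts([0.9, 0.8, 0.3], temperature=1.0))   # arithmetic mean ~0.67
print(aggregate_verdicts([0.9, 0.8, 0.3], temperature=-2.0))  # strict: pulled toward 0.3
```

The appeal of this family of means is that a single scalar smoothly interpolates between "every verdict must pass" (min-like) and "any strong verdict suffices" (max-like) aggregation.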
A comprehensive, implementation-focused guide to evaluating Large Language Models, RAG systems, and Agentic AI in production environments.
prompt-evaluator is an open-source toolkit for evaluating, testing, and comparing LLM prompts. It provides a GUI-driven workflow for running prompt tests, tracking token usage, visualizing results, and ensuring reliability across models from OpenAI, Anthropic (Claude), and Google (Gemini).
🤖 Evaluate AI systems effectively with our comprehensive guide to methods, tools, and frameworks for assessing Large Language Models and agents.
Dataset of 4,368 AI-generated images based on COCO for assessing coherence and realism in synthetic imagery.
AI evaluation tool with suicide-prevention safeguards, an automatic database for reinforcement learning, and scoring for ethical alignment, inclusivity, complexity, and sentiment.
Configurable evidence-alignment engine for evaluating AI outputs and news content against user-defined trusted sources.
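As a rough illustration of the general idea (not this project's implementation), a toy evidence-alignment score might simply measure what fraction of cited evidence resolves to a user-defined trusted domain. The function name, scoring rule, and example domains below are all assumptions.

```python
from typing import Iterable
from urllib.parse import urlparse

def evidence_alignment(evidence_urls: Iterable[str], trusted_domains: set[str]) -> float:
    """Fraction of cited evidence drawn from user-defined trusted sources.

    Hypothetical scoring rule: 1.0 means every citation resolves to a
    trusted domain, 0.0 means none do.
    """
    urls = list(evidence_urls)
    if not urls:
        return 0.0
    hits = sum(
        1 for u in urls
        if urlparse(u).netloc.removeprefix("www.") in trusted_domains
    )
    return hits / len(urls)

# Example usage with an assumed trusted-source list
trusted = {"reuters.com", "nature.com"}
print(evidence_alignment(
    ["https://www.reuters.com/article/x", "https://example-blog.net/post"],
    trusted,
))  # 0.5
```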