llm-evaluation


Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.


By Dokhacgiakhoa, updated 2/11/2026


version: 4.1.0-fractal
name: llm-evaluation
description: Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.

LLM Evaluation

Master comprehensive evaluation strategies for LLM applications, from automated metrics to human evaluation and A/B testing.

Do not use this skill when

The task is unrelated to LLM evaluation
The task calls for a domain or tool outside this scope

Instructions

Clarify goals, constraints, and required inputs.
Apply relevant best practices and validate outcomes.
Provide actionable steps and verification.
If detailed examples are required, open resources/implementation-playbook.md.

Use this skill when

Measuring LLM application performance systematically
Comparing different models or prompts
Detecting performance regressions before deployment
Validating improvements from prompt changes
Building confidence in production systems
Establishing baselines and tracking progress over time
Debugging unexpected model behavior
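
The comparison and regression-detection use cases above can be illustrated with a minimal sketch. It assumes each evaluation run yields a list of per-example scores in the range 0 to 1; the function name `detect_regression` and the `tolerance` parameter are hypothetical and not part of this skill. A production check would more likely use a statistical test than a fixed threshold.

```python
from statistics import mean


def detect_regression(baseline_scores, candidate_scores, tolerance=0.02):
    """Return True if the candidate's mean score falls below the
    baseline's mean score by more than `tolerance`.

    Hypothetical helper: the fixed threshold is an illustrative
    assumption, not a method prescribed by this skill.
    """
    if not baseline_scores or not candidate_scores:
        raise ValueError("both score lists must be non-empty")
    drop = mean(baseline_scores) - mean(candidate_scores)
    return drop > tolerance


# Illustrative scores only, not real evaluation results.
baseline = [0.82, 0.79, 0.88, 0.91]
candidate = [0.70, 0.75, 0.80, 0.78]
print(detect_regression(baseline, candidate))
```

Running a check like this before deployment gives a simple pass/fail gate; the tolerance should be set from the observed run-to-run noise of the metric.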

Core Evaluation Types

🧠 Knowledge Modules (Fractal Skills)

1. Automated Metrics

2. Human Evaluation

3. LLM-as-Judge

4. BLEU Score

5. ROUGE Score

6. BERTScore

7. Custom Metrics

8. Single Output Evaluation

9. Pairwise Comparison

10. Annotation Guidelines

11. Inter-Rater Agreement

12. Statistical Testing Framework

13. Regression Detection

14. Running Benchmarks
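
As one illustration of the modules listed above, inter-rater agreement is commonly measured with Cohen's kappa, which corrects the raw agreement between two annotators for agreement expected by chance. The sketch below is a plain-Python version written for this page; the function name `cohens_kappa` and the sample labels are assumptions, and the skill's own module may implement agreement differently.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected)
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # so returning 1.0 here is a convention assumed for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Illustrative labels only, not real annotation data.
rater_a = ["good", "good", "bad", "bad"]
rater_b = ["good", "bad", "bad", "bad"]
print(cohens_kappa(rater_a, rater_b))
```

Values near 1 indicate strong agreement and values near 0 indicate agreement no better than chance, which is a signal that the annotation guidelines may need tightening.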

Install via CLI

npx skills add https://github.com/Dokhacgiakhoa/antigravity-ide --skill llm-evaluation

Repository Details

Stars: 444

Forks: 137

Branch: main

Path: SKILL.md
