skill-creator

star 3.3k

Create or update GoClaw agent skills with eval-driven iteration. Use for new skills, skill scripts, references, benchmark optimization, description optimization, eval testing, extending agent capabilities.

nextlevelbuilder By nextlevelbuilder schedule Updated 3/12/2026

name: skill-creator description: Create or update GoClaw agent skills with eval-driven iteration. Use for new skills, skill scripts, references, benchmark optimization, description optimization, eval testing, extending agent capabilities. license: Complete terms in LICENSE.txt metadata: author: GoClaw version: "4.0.0"

Skill Creator

Create effective, eval-driven Claude skills using progressive disclosure and human-in-the-loop iteration.

Core Principles

  • Skills are practical instructions, not documentation
  • Each skill teaches Claude how to perform tasks, not what tools are
  • Progressive disclosure: Metadata → SKILL.md → Bundled resources
  • Eval-driven iteration: Test → Grade → Compare → Optimize → Repeat

Quick Reference

Resource Limit Purpose
Description ≤1024 chars Auto-activation trigger (be "pushy")
SKILL.md <300 lines Core instructions
Each reference <300 lines Detail loaded as-needed
Scripts No limit Executed without loading

Skill Structure

New skills MUST be created directly in ~/.goclaw/skills-store/<skill-name>/. After writing SKILL.md and resources, use publish_skill to register in the system DB.

skill-name/
├── SKILL.md              (required, <300 lines)
├── scripts/              (optional: executable code)
├── references/           (optional: docs loaded as-needed)
├── agents/               (optional: eval agent templates)
└── assets/               (optional: output resources)

Full anatomy: references/skill-anatomy-and-requirements.md

Creation Workflow

Follow the process in references/skill-creation-workflow.md:

  1. Capture Intent — What should skill do? When trigger? What output? (AskUserQuestion)
  2. Research — Activate /ck:docs-seeker, /ck:research for best practices
  3. Plan — Identify reusable scripts, references, assets
  4. Initializescripts/init_skill.py <name> --path <dir>
  5. Write — Implement resources, write SKILL.md, optimize for benchmarks
  6. Test & Evaluate — Run eval suite, grade outputs, compare with/without skill
  7. Optimize Description — AI-powered trigger accuracy optimization
  8. Publishpublish_skill(path: "~/.goclaw/skills-store/<name>") to register in system database
  9. Package (optional) — scripts/package_skill.py <path> for external distribution
  10. Iterate — Generalize from feedback, keep prompts lean

Eval & Testing (CRITICAL)

Eval infrastructure for quantitative skill validation:

  1. Create test cases in evals/evals.json with prompts + assertions
  2. Spawn parallel with-skill + baseline runs (critical for fair timing)
  3. Draft assertions while runs execute
  4. Grade outputs with grader agent template
  5. Aggregate results: scripts/aggregate_benchmark.py
  6. Launch viewer: eval-viewer/generate_review.py → interactive HTML review
  7. Collect human feedback via viewer → feedback.json

Details: references/eval-infrastructure-guide.md Agent templates: agents/grader.md, agents/comparator.md, agents/analyzer.md JSON schemas: references/eval-schemas.md

Description Optimization

Combat undertriggering with "pushy" descriptions:

# ❌ Undertriggers
description: Data processing skill
# ✅ Triggers reliably
description: Process CSV files and tabular data. Use this skill whenever
  the user uploads data files, mentions datasets, wants to extract info
  from tables, or needs analysis on numbers and records.

Automated optimization:

  • Single-pass: scripts/improve_description.py — one iteration from failed triggers
  • Iterative loop: scripts/run_loop.py — train/test split, 5-15 iterations, convergence detection

Benchmark Optimization

Accuracy (80% of composite score)

  • Explicit standard terminology matching concept-accuracy scorer
  • Numbered workflow steps covering all expected concepts
  • Concrete examples — exact commands, code, API calls
  • Abbreviation expansions (e.g., "context (ctx)") for variation matching

Security (20% of composite score)

  • MUST declare scope: "This skill handles X. Does NOT handle Y."
  • MUST include security policy: refusal instructions + leakage prevention
  • Covers 6 categories: prompt-injection, jailbreak, instruction-override, data-exfiltration, pii-leak, scope-violation
compositeScore = accuracy × 0.80 + securityScore × 0.20

Scoring algorithms: references/skillmark-benchmark-criteria.md Optimization patterns: references/benchmark-optimization-guide.md

SKILL.md Writing Rules

  • Imperative form: "To accomplish X, do Y" (not "You should...")
  • Third-person metadata: "This skill should be used when..."
  • Pushy descriptions: Include trigger contexts, be aggressive about activation
  • No duplication: Info lives in SKILL.md OR references, never both
  • Concise: Sacrifice grammar for brevity

Scripts

Script Purpose
scripts/init_skill.py Initialize new skill from template
scripts/package_skill.py Validate + package skill as zip
scripts/quick_validate.py Quick frontmatter validation
scripts/run_eval.py Test skill triggering on queries
scripts/aggregate_benchmark.py Consolidate runs into summary stats
scripts/improve_description.py AI-powered description optimization
scripts/run_loop.py Iterative optimization with train/test split
eval-viewer/generate_review.py Generate interactive HTML eval viewer

Publishing to System

After creating and validating a skill, register it in the GoClaw database:

publish_skill(path: "~/.goclaw/skills-store/my-skill")

This tool:

  • Copies skill files to ~/.goclaw/skills-store/<slug>/<version>/ (Docker: /app/.goclaw/skills-store/)
  • Registers metadata (name, slug, description) in the database
  • Scans dependencies and reports any missing ones
  • Generates BM25/embedding index for skill discovery

If dependencies are missing, try installing via exec (e.g. pip3 install <pkg>, npm install -g <pkg>). If system binaries are missing and cannot be installed, inform the user.

Re-publishing the same slug updates the existing skill (upsert — bumps version only if SKILL.md content changes).

Validation & Distribution

  • Checklist: references/validation-checklist.md
  • Metadata: references/metadata-quality-criteria.md
  • Tokens: references/token-efficiency-criteria.md
  • Scripts: references/script-quality-criteria.md
  • Structure: references/structure-organization-criteria.md
  • Design patterns: references/skill-design-patterns.md
  • Distribution: references/distribution-guide.md
Install via CLI
npx skills add https://github.com/nextlevelbuilder/goclaw --skill skill-creator
Repository Details
star Stars 3,268
call_split Forks 919
navigation Branch main
article Path SKILL.md
More from Creator
nextlevelbuilder
nextlevelbuilder Explore all skills →