case-studies


End-to-end case studies showing how to implement the full training pipeline for different skill types. Covers three complete worked examples — tool-calling training, essay-style training, and agentic search (RAG agent) training — demonstrating dataset design, synthetic generation, validation, fine-tuning, evaluation, and iteration. Use when onboarding to the project, understanding how all components fit together, explaining the pipeline to others, or planning a new training capability. This skill is about UNDERSTANDING the system holistically — reference the other skills for specific CLI commands.

By ProfSynapse. Updated 5/29/2026

name: case-studies
allowed-tools: Read, Bash, Write, Grep, Glob

Case Studies: Implementing the Training Pipeline

Three end-to-end worked examples showing how to take a capability from concept to trained model.

Why Case Studies?

The other skills teach you how to use individual tools:

  • synthetic-data-generation — how to run SynthChat
  • fine-tuning — how to run trainers
  • evaluation — how to run evals
  • upload-deployment — how to ship models

This skill shows you how they all connect — the decisions, the iteration, and the order of operations that turn an idea into a trained capability.

The Three Case Studies

| Case Study | What It Teaches | Reference |
|---|---|---|
| Tool Calling | Structured output training: teaching a model to call APIs with correct syntax, context objects, and parameters | reference/tool-calling-pipeline.md |
| Essay Style | Creative output training: teaching a model to transform messy brainstorms into structured outlines with voice and personality | reference/essay-style-pipeline.md |
| Agentic Search | RAG agent training: teaching a model to search a corpus, select relevant documents, and answer grounded in sources | reference/agentic-search-pipeline.md |

The Universal Pipeline

All three case studies follow the same high-level pipeline, but diverge in dataset design and validation:

┌──────────────────────────────────────────────────────────┐
│  1. DEFINE THE CAPABILITY                                 │
│     What should the model do? What does good look like?   │
│     → Rubrics, schemas, specifications                    │
└────────────────────┬─────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────────────────────┐
│  2. CREATE TRAINING DATA                                  │
│     How do we generate enough high-quality examples?      │
│     → SynthChat scenarios, handcrafted seeds, self-play   │
└────────────────────┬─────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────────────────────┐
│  3. VALIDATE & IMPROVE                                    │
│     How do we ensure quality before training?             │
│     → Schema validation, rubric scoring, manual review    │
└────────────────────┬─────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────────────────────┐
│  4. TRAIN                                                 │
│     SFT first (learn the format), then KTO (learn         │
│     preferences), optionally GRPO (optimize rewards)      │
│     → Trainers with YAML config                           │
└────────────────────┬─────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────────────────────┐
│  5. EVALUATE                                              │
│     Does the model do what we trained it to do?           │
│     → Evaluator with YAML scenarios                       │
└────────────────────┬─────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────────────────────┐
│  6. ITERATE                                               │
│     What failed? Generate more data targeting weaknesses. │
│     → Failure analysis → targeted generation → retrain    │
└──────────────────────────────────────────────────────────┘

Key Design Principles

1. Schema-First, Not Example-First

Define what "correct" looks like before writing any training data. For tools, this means JSON schemas. For essays, this means rubrics. The schema is the source of truth — everything validates against it.
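
As a minimal sketch of schema-first validation, the snippet below checks a tool call against a hand-written schema using only the standard library. The tool name `search_notes`, its fields, and the `validate_call` helper are hypothetical illustrations, not the project's actual schemas or validator.

```python
# Hypothetical schema: tool name -> required and optional parameter types.
TOOL_SCHEMAS = {
    "search_notes": {
        "required": {"query": str, "limit": int},
        "optional": {"folder": str},
    },
}


def validate_call(call: dict) -> list[str]:
    """Return a list of problems; an empty list means the call is valid."""
    schema = TOOL_SCHEMAS.get(call.get("name"))
    if schema is None:
        return [f"unknown tool: {call.get('name')!r}"]

    errors = []
    args = call.get("arguments", {})
    allowed = {**schema["required"], **schema["optional"]}

    # Every required field must be present.
    for field in schema["required"]:
        if field not in args:
            errors.append(f"missing required field: {field}")

    # Every supplied field must be known and have the expected type.
    for field, value in args.items():
        if field not in allowed:
            errors.append(f"unexpected field: {field}")
        elif not isinstance(value, allowed[field]):
            errors.append(
                f"wrong type for {field}: expected {allowed[field].__name__}"
            )
    return errors


good = {"name": "search_notes", "arguments": {"query": "kto", "limit": 5}}
bad = {"name": "search_notes", "arguments": {"query": "kto", "limit": "5", "page": 2}}
```

Here `validate_call(good)` returns an empty list, while `validate_call(bad)` reports the string-typed `limit` and the unknown `page` field. Because the schema is written first, the same check can be applied to handcrafted seeds and generated examples alike.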

2. SFT Teaches Format, KTO Teaches Judgment

SFT (Supervised Fine-Tuning) teaches the model WHAT to do: tool call syntax, output structure, response format. KTO (Kahneman-Tversky Optimization) teaches the model WHICH responses are better, such as preferring clarification over reckless action or concise outlines over bloated ones. Do not try to teach both in one stage: run SFT first so the format is learned, then KTO.

3. Paired Contrastive Examples

For KTO, every good example needs a realistic bad counterpart using the SAME user request. The bad example should be a plausible mistake, not garbage — wrong tool selected, missing context fields, overly verbose outline. This is what teaches the model judgment.
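
One way to picture a contrastive pair is as two records that share a prompt and differ only in completion and label. The sketch below assumes a `prompt` / `completion` / `label` record layout, which is an assumption for illustration and may not match the project's dataset format; the `make_kto_pair` helper and the `delete_folder` tool are likewise hypothetical.

```python
def make_kto_pair(prompt: str, good: str, bad: str) -> list[dict]:
    """Build one desirable and one undesirable record for the same prompt."""
    if good.strip() == bad.strip():
        raise ValueError("good and bad completions must differ")
    return [
        {"prompt": prompt, "completion": good, "label": True},
        {"prompt": prompt, "completion": bad, "label": False},
    ]


# The bad completion is a plausible mistake: acting without clarifying first.
pair = make_kto_pair(
    prompt="Delete the old drafts folder.",
    good="Which folder do you mean: 'Drafts/2023' or 'Drafts/archive'?",
    bad='{"name": "delete_folder", "arguments": {"path": "Drafts"}}',
)
```

Keeping the prompt identical across both records isolates the difference the model should learn from: the response choice, not the request.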

4. Validate Before You Train

Training on bad data is worse than not training at all. Every dataset goes through structural validation (schema checks) and quality validation (rubric scoring) before it touches a trainer.
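
The two validation passes can be sketched as a single gate that a dataset must clear before training. Everything named here is hypothetical: `gate_dataset`, the two callables, and the `0.7` threshold are illustrative stand-ins, not the project's validate command or rubric scale.

```python
from typing import Callable


def gate_dataset(
    records: list[dict],
    is_well_formed: Callable[[dict], bool],
    rubric_score: Callable[[dict], float],
    min_score: float = 0.7,  # arbitrary illustrative threshold
) -> tuple[list[dict], list[dict]]:
    """Split records into (accepted, rejected) using both checks."""
    accepted, rejected = [], []
    for record in records:
        # The structural check runs first, so the scorer never sees malformed data.
        if is_well_formed(record) and rubric_score(record) >= min_score:
            accepted.append(record)
        else:
            rejected.append(record)
    return accepted, rejected


# Toy stand-ins for a real schema check and rubric scorer.
records = [
    {"completion": "short outline", "score": 0.9},
    {"completion": "", "score": 0.95},
    {"completion": "bloated outline", "score": 0.4},
]
accepted, rejected = gate_dataset(
    records,
    is_well_formed=lambda r: bool(r["completion"]),
    rubric_score=lambda r: r["score"],
)
```

In this toy run only the first record is accepted: the second fails the structural check despite its high score, and the third is well formed but scores below the threshold. Keeping the rejected list makes it possible to review why examples were dropped.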

5. Iterate on Failures

After evaluation, the failure analysis tells you exactly what to generate next. If the model keeps producing empty memory fields, make more examples that demonstrate rich session memory. If outlines are too long, add negative examples of bloated outlines.
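
A rough sketch of turning failure analysis into a generation plan: tally failures by category, then split a generation budget in proportion to how often each occurs. The tag names, the `plan_generation` helper, and the budget of 200 are hypothetical; the evaluator's actual output format is not assumed here.

```python
from collections import Counter


def plan_generation(failure_tags: list[str], budget: int) -> dict[str, int]:
    """Allocate a generation budget across failure categories by frequency."""
    counts = Counter(failure_tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Integer share per category; rounding down may leave some budget unspent.
    return {tag: budget * n // total for tag, n in counts.most_common()}


failures = ["empty_memory", "empty_memory", "empty_memory", "outline_too_long"]
plan = plan_generation(failures, budget=200)
```

With three of four failures tagged `empty_memory`, that category receives 150 of the 200 new examples and `outline_too_long` receives 50. Proportional allocation is only one possible policy; a rare but severe failure may deserve more weight than its frequency suggests.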

Progressive Reference

| Reference | When to Load |
|---|---|
| Tool Calling Pipeline | Understanding the full tools training journey, from schema to trained model |
| Essay Style Pipeline | Understanding the full essay training journey, from brainstorm to outline model |
| Agentic Search Pipeline | Understanding the full RAG agent training journey, from corpus to grounded-answer model |
| Pipeline Comparison | Side-by-side comparison of how the pipelines differ at each stage |

Cross-References to Other Skills

At each stage of the pipeline, you'll use tools documented in the other skills:

| Pipeline Stage | Skill to Reference |
|---|---|
| Generate data | synthetic-data-generation |
| Validate data | synthetic-data-generation (rubrics, validate command) |
| SFT / KTO / GRPO training | fine-tuning |
| Evaluate model | evaluation |
| Upload & deploy | upload-deployment |

Tips

  • Read the tool-calling case study first; it is the simplest, most mechanical pipeline
  • The essay case study shows how to adapt the pipeline for creative/subjective outputs
  • The agentic search case study shows how to train multi-step reasoning where tools are means to an end
  • All three pipelines use the same trainers, evaluator, and upload tools — only the data differs
  • When planning a new capability, map it to whichever case study is closest, then adapt
Install via CLI
npx skills add https://github.com/ProfSynapse/Synaptic-Tuner --skill case-studies