dataset-annotation - SKILL.md Agent Skill

name: dataset-annotation
description: "AI-assisted dataset annotation with COCO export: bbox, SAM2, DINOv3 methods"
version: 1.0.0

parameters:
  - name: method
    label: "Annotation Method"
    type: select
    options: ["bbox", "sam2", "dinov3"]
    default: "dinov3"
    group: Annotation
  - name: export_format
    label: "Export Format"
    type: select
    options: ["coco", "yolo", "voc"]
    default: "coco"
    group: Export
  - name: auto_detect
    label: "Auto-detect Before Annotation"
    type: boolean
    default: true
    description: "Run detection first, then human corrects"
    group: Annotation
  - name: detection_model
    label: "Detection Model"
    type: select
    options: ["yolov8n", "yolov11n", "dinov3"]
    default: "yolov8n"
    group: Annotation
  - name: dataset_dir
    label: "Dataset Directory"
    type: string
    default: "~/datasets"
    group: Storage

capabilities:
  annotation:
    script: scripts/annotate.py
    description: "Dataset annotation with AI assistance and COCO export"

Dataset Annotation

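AI-assisted dataset creation for training custom detection models. Three annotation methods are supported (bbox, SAM2, DINOv3). COCO is the default export format; YOLO and VOC are also selectable through the export_format parameter.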

What You Get

BBox annotation — draw bounding boxes, AI auto-suggests
SAM2 annotation — click to segment, get pixel-perfect masks
DINOv3 annotation — click a patch, find similar objects across frames via visual grounding
Object tracking — annotate keyframes, DINOv3 interpolates across the video
COCO export — standard images[], annotations[], categories[] format
Kaggle/HuggingFace upload — push datasets directly to platforms
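
As a rough illustration of the COCO layout mentioned above, the sketch below assembles images[], annotations[], and categories[] from per-frame annotations. It is not the skill's actual exporter: build_coco and its arguments are hypothetical names. The protocol examples do not say whether bbox is [x1, y1, x2, y2] or [x, y, width, height], so the sketch assumes corner format by default and converts to COCO's [x, y, width, height].

```python
def build_coco(frames, annotations_by_frame, bbox_is_xyxy=True):
    """Assemble a COCO-style dict from per-frame annotations (illustrative only).

    frames: list of dicts with frame_number, frame_path, width, height
    annotations_by_frame: {frame_number: [{"category": str, "bbox": [a, b, c, d]}]}
    bbox_is_xyxy: ASSUMPTION flag. True means boxes arrive as [x1, y1, x2, y2]
                  and are converted; False means they are already [x, y, w, h].
    """
    category_ids = {}  # category name -> COCO category id (1-based)
    images, annotations = [], []

    for frame in frames:
        image_id = frame["frame_number"] + 1  # 1-based ids are a convention here
        images.append({
            "id": image_id,
            # A real export would likely copy the image and store a relative name.
            "file_name": frame["frame_path"],
            "width": frame["width"],
            "height": frame["height"],
        })
        for ann in annotations_by_frame.get(frame["frame_number"], []):
            cat_id = category_ids.setdefault(ann["category"], len(category_ids) + 1)
            a, b, c, d = ann["bbox"]
            x, y, w, h = (a, b, c - a, d - b) if bbox_is_xyxy else (a, b, c, d)
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": cat_id,
                "bbox": [x, y, w, h],  # COCO boxes are [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })

    categories = [{"id": cid, "name": name} for name, cid in category_ids.items()]
    return {"images": images, "annotations": annotations, "categories": categories}
```

Passing the result to json.dump would write a single annotations file; segmentation masks from the SAM2 method are left out of this sketch.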

Annotation Loop

1. Feed frames from clips → auto-detect objects
2. Human reviews → corrects bboxes, adds labels
3. Save as COCO dataset
4. Train improved model
5. Repeat with better auto-detection
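
Steps 1 and 2 of the loop above can be sketched as a single function. This is only an outline under stated assumptions: annotation_round, detect, and review are hypothetical names, standing in for whatever detector and review UI are actually used.

```python
def annotation_round(frames, detect, review):
    """Run auto-detection and human review for every frame.

    frames: iterable of (frame_number, frame_path) pairs
    detect: hypothetical callable, frame_path -> list of proposed annotations
    review: hypothetical callable, (frame_path, proposals) -> corrected list
    """
    corrected = {}
    for frame_number, frame_path in frames:
        proposals = detect(frame_path)                # step 1: auto-detect objects
        corrected[frame_number] = review(frame_path, proposals)  # step 2: human corrects
    return corrected  # step 3 would serialize this mapping, e.g. as COCO
```

Steps 4 and 5 then amount to retraining on the saved dataset and passing the improved model in as detect for the next round.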

Protocol

Aegis → Skill (stdin)

{"event": "frame", "camera_id": "...", "frame_path": "/tmp/frame.jpg", "frame_number": 0, "width": 1920, "height": 1080}
{"event": "detections", "frame_number": 0, "detections": [{"class": "person", "bbox": [100, 50, 200, 350], "confidence": 0.9, "track_id": "t1"}]}
{"event": "save_dataset", "name": "front_door_people", "format": "coco"}

Skill → Aegis (stdout)

{"event": "ready", "methods": ["bbox", "sam2", "dinov3"], "export_formats": ["coco", "yolo", "voc"]}
{"event": "annotation", "frame_number": 0, "annotations": [{"category": "person", "bbox": [100, 50, 200, 350], "track_id": "t1", "is_keyframe": true}]}
{"event": "dataset_saved", "format": "coco", "path": "~/datasets/front_door_people/", "stats": {"images": 150, "annotations": 423, "categories": 5}}

Setup

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt