name: benchmark-logging
description: Define benchmark runs and log outcomes with consistent metrics, acceptance criteria, and reproducible artifact references.
Benchmark Logging
Use this skill to run and document benchmark comparisons between sciClaw and baseline workflows.
When to use
- "run benchmark"
- "compare baseline vs sciclaw"
- "log benchmark outcomes"
- "add acceptance criteria"
Minimum benchmark record
- Benchmark ID and date.
- Task category and scenario definition.
- Baseline command sequence.
- sciClaw command sequence.
- Metrics: task success, reproducibility, latency, and resource usage.
- Acceptance decision (pass/fail) with rationale.
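
One way to keep records consistent is to hold the fields above in a single typed structure and check it for gaps before logging. The following is a minimal sketch, not a prescribed schema: the class name `BenchmarkRecord`, the field names, and the metric keys in `REQUIRED_METRICS` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, asdict
import json

# Hypothetical metric keys mirroring the four metrics listed above.
REQUIRED_METRICS = ("task_success", "reproducibility", "latency_s", "resource_usage")


@dataclass
class BenchmarkRecord:
    """Illustrative container for one benchmark record (field names are assumptions)."""
    benchmark_id: str
    run_date: str            # ISO 8601 date string
    task_category: str
    scenario: str            # frozen scenario definition or a reference to it
    baseline_commands: list  # baseline command sequence, in execution order
    sciclaw_commands: list   # sciClaw command sequence, in execution order
    metrics: dict            # metric name -> recorded value
    accepted: bool           # acceptance decision: True = pass, False = fail
    rationale: str           # why the decision was made


def missing_fields(record: BenchmarkRecord) -> list:
    """Return names of empty fields and absent metrics; an empty list means complete."""
    problems = []
    for name, value in asdict(record).items():
        if name == "accepted":
            continue  # False is a valid decision, not a missing value
        if not value:
            problems.append(name)
    for key in REQUIRED_METRICS:
        if key not in record.metrics:
            problems.append("metrics." + key)
    return problems


def to_json_line(record: BenchmarkRecord) -> str:
    """Serialize the record as one JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

Calling `missing_fields` before writing a record makes an incomplete entry visible at logging time, for example a pass/fail decision with no rationale.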
Workflow
- Freeze scenario definitions before running.
- Execute baseline and sciClaw runs with the same inputs.
- Record metric values and artifact paths.
- Log failures with root-cause notes and the retry policy that was applied.
- Add manuscript-ready summary sentences only after data is logged.
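
The workflow steps can be sketched in a few helpers: fingerprint the scenario so later changes are detectable, time each command against identical input, and append results to a log file. This is an illustrative sketch under stated assumptions, not sciClaw's own tooling: the function names, the JSON Lines log format, and the choice to pass inputs via standard input are all hypothetical.

```python
import hashlib
import json
import subprocess
import sys
import time
from pathlib import Path


def scenario_fingerprint(scenario: dict) -> str:
    """Hash a canonical JSON form of the scenario so any later edit changes the digest."""
    canonical = json.dumps(scenario, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_timed(command: list, stdin_text: str) -> dict:
    """Run one command with the given input and record exit code, latency, and output."""
    start = time.perf_counter()
    proc = subprocess.run(command, input=stdin_text, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return {
        "command": command,
        "returncode": proc.returncode,  # non-zero values mark failures to annotate
        "latency_s": elapsed,           # wall-clock seconds
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


def append_log(log_path: Path, entry: dict) -> None:
    """Append one entry as a JSON line; earlier entries are never rewritten."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")


if __name__ == "__main__":
    # Placeholder command: the real baseline and sciClaw commands go here.
    demo = run_timed([sys.executable, "-c", "print('ok')"], "")
    append_log(Path("benchmark_log.jsonl"), demo)
```

Storing the fingerprint alongside each logged run gives a simple check that the baseline and sciClaw runs used the same frozen scenario. An append-only log also keeps failed attempts and retries in the record instead of overwriting them.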