benchmark-logging


Define benchmark runs and log outcomes with consistent metrics, acceptance criteria, and reproducible artifact references.


By drpedapati. Updated 2/13/2026.


name: benchmark-logging
description: Define benchmark runs and log outcomes with consistent metrics, acceptance criteria, and reproducible artifact references.

Benchmark Logging

Use this skill to run and document benchmark comparisons between sciClaw and baseline workflows.

When to use

"run benchmark"
"compare baseline vs sciclaw"
"log benchmark outcomes"
"add acceptance criteria"

Minimum benchmark record

Benchmark ID and date.
Task category and scenario definition.
Baseline command sequence.
sciClaw command sequence.
Metrics: task success, reproducibility, latency, and resource usage.
Acceptance decision (pass/fail) with rationale.
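
The record fields above could be captured in a small typed structure so that every run is logged in the same shape. The following is a minimal sketch, not a schema defined by the skill: the class name, field names, metric keys, and the JSON-lines serialization are all assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical metric keys covering task success, reproducibility,
# latency, and resource usage. The skill does not prescribe these names.
REQUIRED_METRICS = ("task_success", "reproducibility", "latency_s", "peak_memory_mb")
ARMS = ("baseline", "sciclaw")


@dataclass
class BenchmarkRecord:
    # All field names are illustrative assumptions.
    benchmark_id: str
    run_date: str                          # ISO 8601 date string
    task_category: str
    scenario: str
    baseline_commands: list[str]
    sciclaw_commands: list[str]
    metrics: dict[str, dict[str, float]]   # arm -> metric name -> value
    accepted: bool                         # pass/fail decision
    rationale: str


def missing_metrics(record: BenchmarkRecord) -> list[tuple[str, str]]:
    """Return (arm, metric) pairs that the record does not yet contain."""
    return [
        (arm, metric)
        for arm in ARMS
        for metric in REQUIRED_METRICS
        if metric not in record.metrics.get(arm, {})
    ]


def to_json_line(record: BenchmarkRecord) -> str:
    """Serialize the record as one JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

Checking `missing_metrics(record)` before writing the acceptance decision is one way to keep incomplete records out of the log; appending the output of `to_json_line` to a file gives one record per line.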

Workflow

Freeze scenario definitions before running.
Execute baseline and sciClaw runs with the same inputs.
Record metric values and artifact paths.
Log failures with root-cause notes and retry policy.
Add manuscript-ready summary sentences only after data is logged.
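
The workflow steps above can be sketched as a small harness. This is an illustration under stated assumptions, not the skill's implementation: `freeze_scenario`, `run_arm`, and `run_comparison` are hypothetical names, each workflow is modeled as a Python callable that returns artifact paths (in practice it might wrap the real command sequence), and retry handling is left out. Freezing is approximated by fingerprinting the scenario definition so that both arms can be tied to the same definition.

```python
import hashlib
import json
import time


def freeze_scenario(definition: dict) -> str:
    """Return a SHA-256 fingerprint of the scenario definition."""
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_arm(name, workflow, inputs, scenario_hash):
    """Run one arm (baseline or sciClaw) and return a log entry."""
    entry = {
        "arm": name,
        "scenario_sha256": scenario_hash,
        "success": False,
        "latency_s": None,
        "artifact_paths": [],
        "failure_note": None,
    }
    start = time.perf_counter()
    try:
        # The callable is assumed to return an iterable of artifact paths.
        entry["artifact_paths"] = list(workflow(inputs))
        entry["success"] = True
    except Exception as exc:
        # Starting point for a root-cause note; a human should expand it.
        entry["failure_note"] = f"{type(exc).__name__}: {exc}"
    entry["latency_s"] = time.perf_counter() - start
    return entry


def run_comparison(definition, baseline, sciclaw, inputs):
    """Run both arms against the same frozen scenario and inputs."""
    scenario_hash = freeze_scenario(definition)
    return [
        run_arm("baseline", baseline, inputs, scenario_hash),
        run_arm("sciclaw", sciclaw, inputs, scenario_hash),
    ]
```

Because both entries carry the same `scenario_sha256`, a later reader can confirm that the two runs used one scenario definition. A failed arm still produces an entry, so failures are logged rather than dropped.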

Install via CLI

npx skills add https://github.com/drpedapati/sciclaw --skill benchmark-logging

Repository Details

Stars: 83
Forks: 16
Branch: main
Path: SKILL.md

