benchmark-readme-sync - SKILL.md Agent Skill

name: "benchmark-readme-sync" description: "Use when a user wants to refresh benchmark results in `README.md` from the latest successful GitHub Actions `Benchmark` workflow run for the current repository. Use a robust `gh` and shell workflow to find the newest successful run, extract final Markdown tables from job logs, and update the README run URL, date, versions, and tables."

Benchmark README Sync

When to use

Update README.md benchmark results from GitHub Actions.
Replace stale benchmark tables, versions, run links, or dates.
The user does not need to provide an Actions URL.

Workflow

Read README.md and .github/workflows/benchmark.yml to confirm the benchmark cases and the current results layout.
Resolve the canonical GitHub repository and default branch with gh repo view, then find the latest successful Benchmark workflow run on that branch unless the user gave a specific run ID.
List the jobs for that run and map each matrix case to its job ID before touching README.md.
For each case job, extract only the final benchmark tables from the log using the robust shell pipeline below.
Update README.md carefully:
- Replace the GitHub Actions run URL with the latest run URL.
- Replace the date with today's local date.
- Replace each case table with the extracted table from the matching job.
- Preserve the case heading, prose, command block, and the final --- separator before ## Run locally.
- Do not recalculate rankings or rewrite the numbers manually.
Validation is required after the edit:
- Every case in the workflow matrix is represented in README.md.
- No duplicated headings, tables, or rows were introduced.
- The separator before ## Run locally is still present.
- The diff only changes benchmark data, the run URL, and the date.

Commands

Prefer gh because it is authenticated and exposes both run metadata and logs. Use these to execute or debug the workflow manually.

Resolve the canonical repository and default branch:

repo=$(gh repo view --json nameWithOwner --jq .nameWithOwner)
branch=$(gh repo view --json defaultBranchRef --jq .defaultBranchRef.name)

Find the latest successful benchmark run:

gh run list \
  --workflow Benchmark \
  --branch "$branch" \
  --limit 20 \
  --json databaseId,conclusion,url,createdAt \
  --jq 'map(select(.conclusion == "success")) | sort_by(.createdAt) | last' \
  -R "$repo"

Fetch jobs for a run:

gh api repos/<owner>/<repo>/actions/runs/<run_id>/jobs --paginate

Map case names to job IDs:

gh api repos/<owner>/<repo>/actions/runs/<run_id>/jobs --paginate \
  | jq -r '.jobs[] | [.id, .name] | @tsv'

Extract a job's final benchmark tables:

gh run view <run_id> --job <job_id> --log \
  -R <owner>/<repo> \
  | cut -f3- \
  | perl -pe 's/\e\[[0-9;]*[A-Za-z]//g' \
  | sed -E 's/^\xef\xbb\xbf//; s/^[0-9T:.\-]+Z //' \
  | awk 'BEGIN{capture=0} /^Development metrics:$/ || /^Build metrics:$/ {capture=1} capture && $0!="Post job cleanup." {print} $0=="Post job cleanup." {exit}'

Notes:

Do not rely on the second log column being Run Benchmark. Current gh run view --log output may label lines as UNKNOWN STEP, while the third column still contains the benchmark output you need.
Capture starts at the first Development metrics: or Build metrics: heading so preamble noise is excluded.
Stop at Post job cleanup. so cleanup lines do not leak into the tables.
Keep both Development metrics and Build metrics when present, and only Build metrics for build-only cases.
Prefer replacing one case section at a time or using a temporary one-off local command; do not add repository scripts just to complete a single sync.
The brittle part of the edit is preserving section boundaries, especially the final --- before ## Run locally.

Failure handling

If no successful Benchmark run exists, stop and report that blocker.
If a job log is missing or truncated, stop and report which case could not be extracted.
If the workflow matrix and README sections do not match, update only the cases backed by logs and call out the mismatch clearly.
If README.md already points to the latest successful run and the extracted tables match, the expected result is an empty diff.
If the workflow breaks down, include the failing step in the report: repo resolution, run lookup, job mapping, log extraction, README replacement, or structural validation.