benchmark-and-docs-refresh

Run or continue model benchmarks, collect measured results, and refresh README/docs benchmark sections from generated artifacts. Use when benchmark tables in model docs need to be created, updated, or corrected.

By open-edge-platform. Updated 4/10/2026

name: benchmark-and-docs-refresh
description: Run or continue model benchmarks, collect measured results, and refresh README/docs benchmark sections from generated artifacts. Use when benchmark tables in model docs need to be created, updated, or corrected.

Benchmark and Docs Refresh

Use this skill to update benchmark sections in model documentation from real benchmark outputs.

Scope

This skill focuses on:

running or continuing benchmarks
collecting benchmark CSV results from results/
updating benchmark tables in model READMEs
updating matching docs pages when benchmark status changes

It does not own sample image export. Use model-sample-image-export for that.

Request changes when

incomplete benchmark coverage is presented as complete;
README or docs benchmark status drifts from the actual run state.

Preferred Benchmark Workflow

Always prefer:

tools/experimental/benchmarking/benchmark.py

with an appropriate config file.

If the stock benchmark path is insufficient for a specific model:

derive a small helper script from the benchmark workflow
keep it model-specific unless multiple models clearly need the same pattern
save measurable outputs such as CSV files under results/
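
A derived helper can stay small. The sketch below shows one way such a script might persist its measurements; the function name `save_results` and the column names in `FIELDS` are hypothetical and should be aligned with whatever the stock benchmark CSV actually uses.

```python
import csv
from pathlib import Path

# Hypothetical column names; match the columns of the stock benchmark CSV.
FIELDS = ["model", "category", "image_AUROC", "pixel_AUROC"]


def save_results(rows, model, results_dir="results"):
    """Write measured metric rows to <results_dir>/<model>_benchmark.csv."""
    out_dir = Path(results_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{model}_benchmark.csv"
    with out_path.open("w", newline="") as f:
        # restval="" leaves unmeasured metrics as empty cells, never a guess.
        writer = csv.DictWriter(f, fieldnames=FIELDS, restval="")
        writer.writeheader()
        for row in rows:
            writer.writerow({"model": model, **row})
    return out_path
```

Metrics that were not measured are simply omitted from a row and end up as empty cells, so later steps can tell them apart from real values.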

Required Evidence

Only publish benchmark values when they come from actual artifacts, for example:

results/<model>_benchmark.csv
benchmark-generated CSV files under runs/ or results/
model-specific run outputs that clearly record the measured metrics

Never infer missing values.
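
As an illustration of this rule, a reader for the artifact can return only the cells that were actually recorded. This is a minimal sketch, assuming the CSV has a `category` column and one column per metric; both the function name `load_measured` and that layout are assumptions, not the repo's confirmed schema.

```python
import csv
from pathlib import Path


def load_measured(csv_path, metric, key_column="category"):
    """Return {key: raw text} for rows where `metric` was actually recorded."""
    path = Path(csv_path)
    if not path.is_file():
        # No artifact means nothing may be published.
        raise FileNotFoundError(f"No benchmark artifact at {path}")
    measured = {}
    with path.open(newline="") as f:
        for row in csv.DictReader(f):
            raw = (row.get(metric) or "").strip()
            if raw:  # skip blank or absent cells instead of guessing
                measured[row[key_column]] = raw
    return measured
```

Values are kept as the raw text from the artifact so that anything copied into a table matches it exactly.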

Update Rules

When refreshing benchmark tables:

Read the target README and matching docs page first.
Read the benchmark artifact source.
Fill only the shot-settings and metrics that actually exist.
Leave unavailable rows blank or marked TODO.
Update status wording if the benchmark is still partial or still running.
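
The fill-or-TODO rule above can be sketched as a small rendering step. The helper name `render_rows` and the two-column table layout are hypothetical; real README tables may have more columns.

```python
def render_rows(row_names, measured, placeholder="TODO"):
    """Render markdown table rows, using `placeholder` where no value exists.

    `measured` maps a row name to the raw text copied from the artifact.
    """
    lines = []
    for name in row_names:
        cell = measured.get(name) or placeholder
        lines.append(f"| {name} | {cell} |")
    return "\n".join(lines)
```

Because the cell text is passed through unchanged, no rounding or reformatting is introduced between the artifact and the table.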

Table Conventions

Common sections to refresh:

### Image-Level AUC
### Pixel-Level AUC
### Image F1 Score
### Pixel F1 Score

If a README only contains placeholders, replace only the rows supported by measured results.

Docs Synchronization Rules

If the README benchmark state changes, update the matching docs page under:

docs/source/markdown/guides/reference/models/image/<model>.md
docs/source/markdown/guides/reference/models/video/<model>.md

The docs page may stay shorter than the README, but it must not contradict it.
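
To locate the matching page, one option is to probe both locations listed above. This is only a sketch; `find_docs_pages` is a hypothetical helper, and it assumes the docs file is named after the model.

```python
from pathlib import Path

# Directory taken from the paths listed above.
DOCS_ROOT = Path("docs/source/markdown/guides/reference/models")


def find_docs_pages(model, repo_root="."):
    """Return the existing docs pages for `model` (image and/or video)."""
    candidates = [
        Path(repo_root) / DOCS_ROOT / kind / f"{model}.md"
        for kind in ("image", "video")
    ]
    return [p for p in candidates if p.is_file()]
```

An empty result means there is no docs page to synchronize yet, which is worth reporting rather than silently skipping.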

Quality Checks

Before finishing:

Confirm the benchmark artifact still exists.
Confirm copied values exactly match the artifact.
Confirm averages are computed from measured values only.
Confirm incomplete rows remain clearly incomplete.
Confirm README/docs wording matches reality.
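
Two of these checks are easy to automate. The sketch below assumes published and artifact values are available as plain dictionaries of raw text, and that a missing measurement is represented as `None`; the helper names are hypothetical.

```python
from statistics import mean


def average_measured(values):
    """Average only the measured entries; None marks a missing measurement."""
    present = [v for v in values if v is not None]
    # With nothing measured there is no average to report.
    return mean(present) if present else None


def mismatches(published, artifact):
    """Return the keys whose published text differs from the artifact text."""
    return sorted(k for k, v in published.items() if artifact.get(k) != v)
```

A non-empty result from `mismatches` indicates that a table value was altered, rounded, or has no backing entry in the artifact.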

Reviewer checklist

Check that the artifact exists.
Check that every copied value matches.
Check that partial runs are labeled clearly.
Check README and docs wording for consistency.

Repo-Specific Notes

Some benchmark jobs in this repo may require derived helper scripts.
Some long runs are better continued in tmux/background sessions.
A partially complete benchmark can justify filling a subset of rows, but not all rows.
Never replace TODOs with fabricated numbers.

Install via CLI

npx skills add https://github.com/open-edge-platform/anomalib --skill benchmark-and-docs-refresh
