name: lvms-ci:doctor argument-hint: [release1,release2,...] description: Analyze CI for LVMS periodic jobs and produce an HTML summary user-invocable: true allowed-tools: Skill, Bash, Read, Write, Glob, Grep, Agent
lvms-ci:doctor
Synopsis
/lvms-ci:doctor main
/lvms-ci:doctor 4.20,4.21,4.22,main
Description
Accepts a comma-separated list of release versions (or main), runs analysis for each release, and produces a single HTML summary file consolidating all results. Uses deterministic scripts for data collection, artifact download, aggregation, and HTML generation. LLM agents are used only for per-job root cause analysis.
Arguments
<ARGUMENTS>(required): Comma-separated list of release versions (e.g.,mainor4.20,4.21,4.22,main)
Work Directory
Compute once at the start by running date +%y%m%d and substituting into the path below. In all commands, replace <WORKDIR> with the computed path — do not use shell variables.
/tmp/lvm-operator-ci-claude-workdir.<YYMMDD>
Implementation Steps
Step 1: Prepare — Collect and Download All Artifacts
Goal: Deterministically collect all failed jobs and download their artifacts before any LLM analysis.
Actions:
Determine today's
<WORKDIR>by runningdate +%y%m%dand substituting into/tmp/lvm-operator-ci-claude-workdir.<YYMMDD>. Use this value in all subsequent commands.Run the prepare script:
bash plugins/lvms-ci/scripts/doctor.sh prepare --component lvm-operator --workdir <WORKDIR> <ARGUMENTS>The script deterministically:
- For each release: fetches failed periodic jobs, downloads artifacts, writes
<WORKDIR>/jobs/release-<version>-jobs.json - Outputs a JSON summary listing all releases, job counts, and file paths
- For each release: fetches failed periodic jobs, downloads artifacts, writes
Read the JSON output to know which releases have jobs to analyze and how many
Job JSON field names (use these exactly — do NOT guess alternatives like job_name):
job— full job namebuild_id— unique build identifierartifacts_dir— local path to downloaded artifactsurl— Prow job URLstatus— job result (failure,FAILURE,SUCCESS,PENDING)
Error Handling:
- If
<ARGUMENTS>is empty, show usage and stop - If a release has no failed jobs, its jobs JSON will be an empty array — skip analysis for that release
- If a release has an
"error"field in the JSON summary, data collection failed for that release — report the error to the user but continue with other releases
Step 2: Analyze Each Job Using /lvms-ci:prow-job
Goal: Get detailed root cause analysis for each failed job using pre-downloaded artifacts.
Actions:
Use the JSON summary output from Step 1 to build agent prompts. Do NOT read the job JSON files into the main conversation — the prepare script already printed all job details (artifacts_dir, build_id, job name) and agents receive artifacts_dir directly in their prompt.
For every failed job across all releases, launch a separate Agent (using the
Agenttool, NOT theSkilltool):Agent: subagent_type=general_purpose, prompt="Analyze this Prow job and save the report: 1. Run /lvms-ci:prow-job <ARTIFACTS_DIR> 2. After the analysis completes, save the FULL report output (including the --- STRUCTURED SUMMARY --- block) to: <WORKDIR>/jobs/release-<RELEASE>-job-<N>-<JOB_ID>.txt Use the Write tool to save the file. The file must contain the complete analysis report."Launch ALL agents in a single message as foreground agents (do NOT use
run_in_background). Foreground agents in the same message run concurrently — this is just as fast as background agents but keeps your turn active until all complete.Say "Analyzing N jobs in parallel..." in your message text alongside the Agent tool calls.
When all agents return, immediately proceed to Step 3 in the same turn. Do NOT stop or end your turn between Step 2 and Step 3.
Step 3: Finalize — Aggregate and Generate HTML Report
IMPORTANT: This step is MANDATORY. The task is incomplete without it. You MUST run this even if previous steps produced errors.
Goal: Deterministically aggregate results and generate the HTML report.
Actions:
Run the finalize script:
bash plugins/lvms-ci/scripts/doctor.sh finalize --component lvm-operator --workdir <WORKDIR> <ARGUMENTS>The script deterministically:
- Runs
aggregate.pyfor each release →summary.jsonfiles - Runs
create-report.py→report-lvm-operator-ci-doctor.html
- Runs
Report the script's output to the user
Step 4: Report Completion
Actions:
- Display the path to the generated HTML file
- Summarize: failed job counts per release
Example Output:
Summary:
Periodics:
Release main: 3 failed periodic jobs
Release 4.22: 0 failed periodic jobs
HTML report generated: <WORKDIR>/report-lvm-operator-ci-doctor.html
Examples
Example 1: Analyze Main Branch Only
/lvms-ci:doctor main
Example 2: Analyze Multiple Releases
/lvms-ci:doctor 4.20,4.21,4.22,main
Prerequisites
gsutilCLI must be installed for GCS access (uses anonymous access on public buckets)- Internet access to fetch job data from Prow/GCS
- Bash shell, Python 3
Related Skills
- lvms-ci:prow-job: Single job analysis (used by Step 2 agents)
Notes
- Deterministic scripts handle: data collection, artifact download, aggregation, HTML generation
- LLM agents handle: per-job root cause analysis (Step 2)
- All agents are launched in a single parallel wave
- The
preparescript downloads all artifacts upfront so prow-job agents use local paths (no redundant downloads) - The
finalizescript runs aggregation and HTML generation in one call - All intermediate files use prescribed filenames in
<WORKDIR>— no improvised names - The HTML report is self-contained (no external CSS/JS dependencies)
- If a release analysis fails, it is noted in the report but does not block other releases