name: payload-snapshot
description: Snapshot OpenShift payload data (release controller, PR diffs, comments, CI jobs, JUnit results, regression tracking) to a local directory for offline analysis
Payload Snapshot
This skill downloads all data needed to analyze an OpenShift payload into a local directory tree. The resulting snapshot can be navigated entirely via file reads — no live API calls required during analysis.
When to Use This Skill
Use this skill when you need to:
- Analyze a rejected payload and want all data available locally before starting
- Create a reproducible snapshot of payload state at a point in time
- Track test failure regressions across multiple payloads
- Work offline or reduce API calls during payload analysis
Prerequisites
- Python 3 (3.10 or later)
  - Uses only standard library (no external dependencies)
- GitHub CLI (gh), for PR diff, comment, and job data
  - Install: brew install gh (macOS) or see https://cli.github.com
  - Authenticate: gh auth login
  - Without gh, release controller data is still fetched; PR data is skipped
- Google Cloud SDK (gcloud), for JUnit test result download
  - Install: brew install google-cloud-sdk (macOS) or see https://cloud.google.com/sdk
  - Authenticate: gcloud auth login
  - Without gcloud, JUnit data is skipped; job directories are still created
- Network access to:
  - *.ocp.releases.ci.openshift.org (release controller)
  - api.github.com (via gh CLI)
  - storage.googleapis.com (via gcloud CLI)
Implementation Steps
Step 1: Run the Snapshot Script
script_path="plugins/ci/skills/payload-snapshot/scripts/payload_snapshot.py"
# Snapshot a specific payload
python3 "$script_path" 4.22.0-0.nightly-2026-02-25-152806
# Custom output directory
python3 "$script_path" 4.22.0-0.nightly-2026-02-25-152806 --output-dir .work/snapshot
# Limit chain depth
python3 "$script_path" 4.22.0-0.nightly-2026-02-25-152806 --max-chain 5
# Skip JUnit download (faster, still generates job structure and summary)
python3 "$script_path" 4.22.0-0.nightly-2026-02-25-152806 --no-junit
The script will:
- Parse the payload tag to determine version, stream, and architecture
- Probe all available streams for the version (nightly, ci, across architectures)
- Chain backwards through previous payloads until finding one where all blocking jobs passed
- For each payload in the chain, download release controller data and the changelog (PR diff)
- Split jobs into blocking/informing directories with metadata and GCS browser links
- For each failed blocking job, download and parse JUnit XML test results
- For each failed blocking job, download build-log.txt from GCS and extract error/warning lines + log tail
- Track test failure regressions — when did each failure first appear?
- Track per-job failure streaks — consecutive failures, originating payload, failure pattern
- For each unique PR across all changelogs, fetch the git diff, comments, and CI jobs via gh
- Generate summary.json with comprehensive triage data, plus AGENTS.md/CLAUDE.md for agent orientation
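The first step in the list above, parsing the payload tag, can be sketched as follows. This is a minimal illustration, not the script's actual parser: the tag shape `<x.y.z>-0.<stream>[-<arch>]-<timestamp>`, the `amd64` default, and the names `PayloadTag` and `parse_payload_tag` are all assumptions made for this example.

```python
import re
from typing import NamedTuple


class PayloadTag(NamedTuple):
    version: str       # e.g. "4.22"
    stream: str        # e.g. "nightly" or "ci"
    architecture: str  # assumed to default to "amd64" when the tag names none
    timestamp: str     # e.g. "2026-02-25-152806"


# Assumed tag shape: <x.y.z>-0.<stream>[-<arch>]-<YYYY-MM-DD-HHMMSS>
TAG_RE = re.compile(
    r"^(?P<major>\d+)\.(?P<minor>\d+)\.\d+-0\."
    r"(?P<stream>[a-z]+)"
    r"(?:-(?P<arch>[a-z0-9]+))?"
    r"-(?P<ts>\d{4}-\d{2}-\d{2}-\d{6})$"
)


def parse_payload_tag(tag: str) -> PayloadTag:
    """Split a payload tag into version, stream, architecture, and timestamp."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unrecognized payload tag: {tag!r}")
    return PayloadTag(
        version=f"{match['major']}.{match['minor']}",
        stream=match["stream"],
        architecture=match["arch"] or "amd64",
        timestamp=match["ts"],
    )
```

For the tag used in the examples above, `parse_payload_tag("4.22.0-0.nightly-2026-02-25-152806")` yields version `4.22` and stream `nightly`.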
Step 2: Navigate the Snapshot
The output directory is structured for easy navigation:
payload/
  <version>/
    <stream>/
      summary.json                  # START HERE: full triage data
      CLAUDE.md                     # Imports AGENTS.md for Claude Code
      AGENTS.md                     # Dynamic snapshot orientation doc
      streams.json                  # All streams for this version
      <tag>/                        # Each payload in the chain
        payload.json                # Release controller API response
        changelog.json              # PRs that changed vs. previous payload
        regressions.json            # Test failure regression tracking
        jobs/
          blocking/
            <job-name>/
              job.json              # Job metadata (state, URLs, GCS link, retries)
              build_log.json        # Error/warning lines + log tail (failed only)
              junit/                # Only for failed jobs
                junit_operator.xml    # CI phase results
                junit-aggregated.xml  # Aggregated jobs only
                results.json          # Parsed test failures (full output)
          informing/
            <job-name>/
              job.json              # Job metadata only (no JUnit/build log)
        <component>/                # e.g., machine-config-operator
          prs/
            <pr_number>/
              code.diff             # Git diff of the PR
              comments.json         # PR comments and reviews
              jobs.json             # CI check runs
Step 3: Use the Data
Find failed blocking jobs (with streaks):
jq '.blocking_jobs.failed_jobs[] | {name, state, streak: .streak.streak_length, pattern: .streak.failure_pattern}' payload/<version>/<stream>/summary.json
Check test failures and when they started:
jq '.[] | {test: .test_name, first_failed: .first_failed_in, payloads: .payloads_failing, jobs: .jobs}' payload/<version>/<stream>/<tag>/regressions.json
List PRs in a payload:
jq '.changeLogJson.updatedImages[].commits[] | {component: .name, pr: .pullURL, subject: .subject}' payload/<version>/<stream>/<tag>/changelog.json
Read a specific PR's diff:
cat payload/<version>/<stream>/<tag>/<component>/prs/<number>/code.diff
Check JUnit failures for a specific job:
jq '.[].name' payload/<version>/<stream>/<tag>/jobs/blocking/<job-name>/junit/results.json
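The same data can be read from Python when jq is not convenient. The sketch below mirrors the first jq query (failed blocking jobs with streaks) using only the field names documented for summary.json. The helper names `load_summary` and `failed_blocking_jobs` are made up for this example.

```python
import json
from pathlib import Path


def load_summary(path: str) -> dict:
    """Read summary.json from a snapshot directory."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def failed_blocking_jobs(summary: dict) -> list[dict]:
    """Return name, state, and streak info for each failed blocking job."""
    rows = []
    for job in summary.get("blocking_jobs", {}).get("failed_jobs", []):
        # "streak" is an optional field, so fall back to an empty dict.
        streak = job.get("streak") or {}
        rows.append({
            "name": job.get("name"),
            "state": job.get("state"),
            "streak": streak.get("streak_length"),
            "pattern": streak.get("failure_pattern"),
        })
    return rows
```

A typical call would be `failed_blocking_jobs(load_summary("payload/<version>/<stream>/summary.json"))`, with the placeholders filled in for the snapshot at hand.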
CLI Reference
python3 payload_snapshot.py <payload_tag> [OPTIONS]
Positional:
payload_tag Payload tag (e.g., 4.22.0-0.nightly-2026-02-25-152806)
Options:
--output-dir DIR Base output directory (default: payload)
--max-chain N Maximum backward chain depth (default: 20)
--workers N Parallel workers for API calls (default: 8)
--no-junit Skip JUnit download and regression tracking
Output Files
streams.json
Lists all available streams for the payload's version.
summary.json
Comprehensive stream-level triage data — start here. Contains:
- Payload metadata: payload_tag, phase, release_url, architecture, stream, version
- Chain data: chain_length, baseline_tag, hours_since_baseline
- blocking_jobs.failed_jobs[]: detailed objects with name, state, prow_url, gcs_url, and relative path job_json. May include: rhcos_version, streak (with streak_length, originating_payload, is_new_failure, failure_pattern), build_log_errors, test_failure_count, and relative paths junit_results, build_log
- informing_jobs.failed_jobs[]: job name strings
- test_failures.blocking[]: test_name, jobs, first_failed_in, payloads_failing, failure_message, failure_text (full, not truncated)
- payloads[]: per-payload entries with tag, phase, relative file paths, and prs[] with component/diff/comments paths
AGENTS.md / CLAUDE.md
Dynamic orientation document generated at snapshot time. Contains the specific payload tag, chain, failed jobs, file layout, key concepts, and summary.json schema. CLAUDE.md imports AGENTS.md via @AGENTS.md.
payload.json
Full release controller response including blockingJobs, informingJobs, and asyncJobs with their states, Prow URLs, and retry attempt URLs.
changelog.json
Release controller diff response with changeLogJson.updatedImages listing every PR that changed between this payload and its predecessor.
regressions.json
Per-payload regression tracking data. For each failing test in the target payload:
- test_name: the failing test
- jobs: which jobs it fails in
- first_failed_in: the earliest payload in the chain where it was failing
- payloads_failing: how many consecutive payloads it has been failing
- failure_message: the error message
- failure_text: full failure output
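One way first_failed_in and payloads_failing could be derived is to walk the chain from the target payload backwards and count how long each failure persists. The sketch below shows that idea only. The input shape (a newest-first list of tag and failed-test-set pairs) and the name `track_regressions` are assumptions, and the script's real bookkeeping may differ.

```python
def track_regressions(chain: list[tuple[str, set[str]]]) -> list[dict]:
    """Compute regression info for every test failing in the target payload.

    chain is ordered newest first: chain[0] is the target payload.
    Each entry is (payload_tag, set of failing test names).
    """
    if not chain:
        return []
    target_tag, target_failures = chain[0]
    results = []
    for test in sorted(target_failures):
        first_failed_in = target_tag
        consecutive = 0
        for tag, failures in chain:
            if test not in failures:
                break  # the streak ends at the first payload where the test passed
            first_failed_in = tag
            consecutive += 1
        results.append({
            "test_name": test,
            "first_failed_in": first_failed_in,
            "payloads_failing": consecutive,
        })
    return results
```

Counting only an unbroken run matches the "consecutive payloads" wording above: a test that failed, passed, and failed again is reported as a new one-payload failure.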
job.json
Per-job metadata including name, state, lifecycle (blocking/informing), Prow URL, GCS browser URL (gcs_url), retry count, whether it's an aggregated job, GCS bucket path, and rhcos_version.
The rhcos_version field is determined from the job name and OCP version:
- rhcos9_10: heterogeneous cluster (mixed RHCOS 9 and 10 node pools)
- rhcos10: RHCOS 10 only
- rhcos9: RHCOS 9 (explicit)
- rhcos9-default: no explicit fragment; defaults to RHCOS 9 for OCP 4.x installs (including major upgrades to 5.x)
- rhcos10-default: no explicit fragment; defaults to RHCOS 10 for OCP 5.x fresh installs
build_log.json (failed blocking jobs only)
Extracted from build-log.txt in GCS (handles gzip decompression). Contains:
- total_lines: total line count of the build log
- error_warning_count: number of lines matching error/warning patterns
- error_warning_lines[]: each with line_number and text
- tail_start_line, tail_lines[]: last 20% of the log for context
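The extraction described above can be sketched in a few lines of standard-library Python. The output keys follow the field list, but the matching pattern `ERROR_WARNING_RE`, the 1-based line numbering, and the function name `summarize_build_log` are assumptions; the script's actual patterns are not documented here.

```python
import gzip
import re

# Assumed pattern; the real error/warning patterns may be broader or narrower.
ERROR_WARNING_RE = re.compile(r"\b(error|warn(ing)?)\b", re.IGNORECASE)


def summarize_build_log(raw: bytes, tail_percent: int = 20) -> dict:
    """Build a build_log.json-style summary from raw build-log.txt bytes."""
    # gzip streams start with the magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    lines = raw.decode("utf-8", errors="replace").splitlines()
    total = len(lines)

    matches = [
        {"line_number": number, "text": line}
        for number, line in enumerate(lines, start=1)  # 1-based line numbers
        if ERROR_WARNING_RE.search(line)
    ]

    # Keep the last tail_percent of the log, and at least one line if any exist.
    tail_count = max(1, total * tail_percent // 100) if total else 0
    tail_start = total - tail_count + 1 if total else 0
    tail_lines = lines[tail_start - 1:] if total else []

    return {
        "total_lines": total,
        "error_warning_count": len(matches),
        "error_warning_lines": matches,
        "tail_start_line": tail_start,
        "tail_lines": tail_lines,
    }
```

Checking the magic bytes rather than the file name lets the same function handle both compressed and plain downloads.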
results.json (in junit/ subdirectory)
Parsed JUnit test failures for a specific job. Only includes failed/error tests. For aggregated jobs, includes per-run pass/fail/skip data with Prow URLs for each run.
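A minimal sketch of how failed and errored test cases might be pulled out of a JUnit XML file is shown below. Only the `name` key is confirmed by the jq example earlier in this document; the other output keys, the `aggregated` flag, and the function name are assumptions. For aggregated jobs the sketch collects run URLs from `<system-out>` with a plain regular expression instead of parsing the YAML, which is a simplification.

```python
import re
import xml.etree.ElementTree as ET

# Simplification: grab anything that looks like a URL instead of parsing YAML.
URL_RE = re.compile(r"https?://[^\s\"']+")


def parse_junit_failures(xml_text: str, aggregated: bool = False) -> list[dict]:
    """Return one entry per <testcase> that has a <failure> or <error> child."""
    root = ET.fromstring(xml_text)
    failures = []
    for case in root.iter("testcase"):
        # Compare with None explicitly: childless elements are falsy.
        problem = case.find("failure")
        if problem is None:
            problem = case.find("error")
        if problem is None:
            continue  # passed or skipped tests are not recorded
        entry = {
            "name": case.get("name"),
            "failure_message": problem.get("message", ""),
            "failure_text": (problem.text or "").strip(),
        }
        if aggregated:
            entry["run_urls"] = URL_RE.findall(case.findtext("system-out") or "")
        failures.append(entry)
    return failures
```

`root.iter("testcase")` works whether the document root is `<testsuites>` or a single `<testsuite>`, so the caller does not need to know which layout a job produced.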
code.diff, comments.json, jobs.json
PR artifacts from GitHub: the PR's git diff, its comments and reviews, and its CI check runs (unchanged from previous version).
Chain Logic
The script chains backwards from the target payload until it finds a payload where all blocking jobs succeeded. This is stricter than the Accepted phase — a payload can be force-accepted with failed blocking jobs, which does not count as a stop point.
For terminal payloads (Accepted/Rejected), jobs showing Pending on the release controller are cross-checked against the actual Prow prowjob.json artifact to get their real state.
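The stop rule for the backward walk can be illustrated with a small sketch. It assumes each payload is a dict with a `tag` and a `blockingJobs` mapping of job name to an object carrying a `state`, and that a passing job is reported as `Succeeded`. That simplified shape, the handling of payloads with no blocking jobs, and the names `all_blocking_succeeded` and `walk_chain` are all assumptions for this example.

```python
def all_blocking_succeeded(payload: dict) -> bool:
    """True only if every blocking job in the payload succeeded."""
    jobs = payload.get("blockingJobs") or {}
    # Assumption: a payload with no blocking jobs recorded is not a baseline.
    return bool(jobs) and all(
        job.get("state") == "Succeeded" for job in jobs.values()
    )


def walk_chain(payloads_newest_first: list[dict], max_chain: int = 20):
    """Collect tags from the target backwards until a clean payload is found.

    Returns (chain, baseline_tag); baseline_tag is None if no clean payload
    was found within max_chain payloads. The phase field is deliberately
    ignored, so a force-accepted payload with failures does not stop the walk.
    """
    chain = []
    for payload in payloads_newest_first[:max_chain]:
        chain.append(payload["tag"])
        if all_blocking_succeeded(payload):
            return chain, payload["tag"]
    return chain, None
```

Here the baseline is included as the last element of the chain; whether the script counts it toward chain_length is not stated in this document.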
Aggregated Jobs
Aggregated jobs run the same underlying test multiple times with statistical analysis. The script:
- Detects aggregated jobs by the aggregated- name prefix
- Downloads junit-aggregated.xml, which contains per-run pass/fail/skip data
- Parses the YAML in <system-out> to extract individual run URLs
Error Handling
- Tag not found: Exits with code 2 and a descriptive error
- Release controller unreachable: Exits with code 1
- gh not authenticated: Prints a warning and continues without PR data
- gcloud not available: Prints a warning and skips JUnit download
- Individual job/PR fetch failure: Logs a warning and continues
- Idempotent: Re-running skips files that already exist
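The skip-if-exists and warn-and-continue behaviors listed above combine naturally in a single write helper. The following is a hypothetical sketch of that pattern, not the script's code: `write_json_once` and its `fetch` callback are names invented for this example.

```python
import json
import sys
from pathlib import Path
from typing import Any, Callable


def write_json_once(path: Path, fetch: Callable[[], Any]) -> bool:
    """Fetch and write JSON to path unless the file already exists.

    Returns True if a file was written, False if it was skipped or the
    fetch failed. Failures are reported on stderr and never raised.
    """
    if path.exists():
        return False  # idempotent: keep what an earlier run produced
    try:
        data = fetch()
    except Exception as exc:  # any single fetch failure should not abort the run
        print(f"warning: could not fetch {path.name}: {exc}", file=sys.stderr)
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return True
```

Because the existence check comes before the fetch, a re-run costs no network calls for files that are already on disk.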
Notes
- The script uses only Python standard library — no pip dependencies
- PR data is deduplicated across payloads — each PR is fetched once
- JUnit and build-log download are scoped to failed blocking jobs only (informing jobs get job.json but no JUnit or build log)
- The --workers flag controls parallelism for all subprocess calls (default 8)
- Summary is always regenerated on re-run (not skipped like other files)
- Progress is printed to stderr; the script produces no stdout output
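As an illustration of how a worker pool with per-item failure tolerance might look, the sketch below uses a thread pool from the standard library. Whether the script uses threads in this way is an assumption, and `fetch_all` and `fetch_one` are hypothetical names.

```python
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Hashable, Iterable


def fetch_all(
    items: Iterable[Hashable],
    fetch_one: Callable[[Any], Any],
    workers: int = 8,
) -> dict:
    """Run fetch_one over items in parallel; skip items whose fetch fails."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_one, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:
                # Progress and warnings go to stderr, leaving stdout empty.
                print(f"warning: {item}: {exc}", file=sys.stderr)
    return results
```

Threads suit this workload because the time is spent waiting on subprocesses and network I/O rather than on Python computation.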
See Also
- Related Skill: fetch-payloads (fetches recent payloads from the release controller)
- Related Skill: fetch-new-prs-in-payload (fetches PRs new in a specific payload)
- Related Skill: payload-analysis (analyzes a payload snapshot for revert candidates)