ci-workflow-guide

name: ci-workflow-guide
description: Guide to SGLang CI workflow orchestration (stage ordering, fast-fail, gating, partitioning, execution modes, and debugging CI failures). Use when modifying CI workflows, adding stages, debugging CI pipeline issues, or understanding how tests are dispatched and gated across stages.

SGLang CI Workflow Orchestration Guide

This skill covers the CI infrastructure layer — how tests are dispatched, gated, and fast-failed across stages. For test authoring (templates, fixtures, registration, model selection), see the write-sglang-test skill.

Naming Conventions

Suite: base-{a,b,c}-test-{gpu_count}-gpu-{hardware} (e.g., base-b-test-1-gpu-small)
Test group: Directory-level registered test group under test/registered/ (e.g., hicache maps to test/registered/hicache/test_*.py)
CI runner: {gpu_count}-gpu-{hardware} (e.g., 1-gpu-5090, 4-gpu-h100, 8-gpu-h200)

Key Files

File	Role
`.github/workflows/pr-test.yml`	Main workflow — all stages, jobs, conditions, matrix definitions
`.github/workflows/pr-test-extra.yml`	Extra workflow — gated by BOTH `run-ci` and `run-ci-extra` labels
`.github/workflows/pr-gate.yml`	PR gating: draft check, `run-ci` label, per-user rate limiting
`.github/actions/check-pr-test-health/action.yml`	Cross-job fast-fail: queries API for any failed job
`.github/actions/wait-for-jobs/action.yml`	Stage gating: polls API until stage jobs complete
`.github/actions/check-maintenance/action.yml`	Maintenance mode check
`test/run_suite.py`	Suite runner: collects, filters, partitions, executes tests
`python/sglang/test/ci/ci_register.py`	Test registration (AST-parsed markers), LPT auto-partition
`python/sglang/test/ci/ci_utils.py`	`run_unittest_files()`: execution, retry, continue-on-error
`scripts/ci/utils/slash_command_handler.py`	Handles slash commands from PR comments

Architecture Overview

 ┌──────────────┐
 │ build kernel │
 └──────┬───────┘
        │
        ├─ check-changes ──── detects which packages changed
        │                      (main_package, sgl_kernel, jit_kernel, multimodal_gen)
        │
        ├─ call-gate ──────── pr-gate.yml (draft? label? rate limit?)
        │
        ├─────────────────────────────────────────────────────┐
        │                                                     │
        ▼                                                     │
 ┌─────────────────────────────────────┐                      │
 │          Base A (~3 min)            │                      │
 │         pre-flight check            │                      │
 │                                     │                      │
 │  ┌─────────────────────────────┐    │                      │
 │  │ base-a-test-1-gpu-small     │    │                      │
 │  │ (small GPUs)                │    │                      │
 │  └─────────────────────────────┘    │                      │
 │  ┌─────────────────────────────┐    │                      │
 │  │ base-a-test-cpu             │    │                      │
 │  │ (CPU)                       │    │                      │
 │  └─────────────────────────────┘    │                      │
 └──────┬──────────────────────────────┘                      │
        │                                                     │
        ▼                                                     ▼
 ┌─────────────────────────────────────┐          ┌──────────────────────────┐
 │          Base B (~30 min)           │          │      kernel test         │
 │            base tests               │          └──────────────────────────┘
 │                                     │          ┌──────────────────────────┐
 │  ┌─────────────────────────────┐    │          │   multimodal gen test    │
 │  │ base-b-test-1-gpu-small     │    │          └──────────────────────────┘
 │  │ (small GPUs, e.g. 5090)     │    │
 │  └─────────────────────────────┘    │
 │  ┌─────────────────────────────┐    │
 │  │ base-b-test-1-gpu-large     │    │
 │  │ (large GPUs, e.g. H100)     │    │
 │  └─────────────────────────────┘    │
 │  ┌─────────────────────────────┐    │
 │  │ base-b-test-2-gpu-large     │    │
 │  │ (large GPUs, e.g. H100)     │    │
 │  └─────────────────────────────┘    │
 └──────┬──────────────────────────────┘
        │
        ▼
 ┌─────────────────────────────────────┐
 │          Base C (~30 min)           │
 │          advanced tests             │
 │                                     │
 │  ┌─────────────────────────────┐    │
 │  │ base-c-test-4-gpu-h100      │    │
 │  │ (H100 GPUs)                 │    │
 │  └─────────────────────────────┘    │
 │  ┌─────────────────────────────┐    │
 │  │ base-c-test-8-gpu-h200      │    │
 │  │ (8 x H200 GPUs)             │    │
 │  └─────────────────────────────┘    │
 │  ┌─────────────────────────────┐    │
 │  │ base-c-test-4-gpu-b200      │    │
 │  │ (4 x B200 GPUs)             │    │
 │  └─────────────────────────────┘    │
 │  ┌─────────────────────────────┐    │
 │  │ Other advanced tests        │    │
 │  │ (DeepEP, PD Disagg, GB200)  │    │
 │  └─────────────────────────────┘    │
 └──────┬──────────────────────────────┘
        │
        ▼
 ┌─────────────────────────────────────┐
 │         pr-test-finish              │
 │  aggregates all results, fails if   │
 │  any job failed/cancelled           │
 └─────────────────────────────────────┘

Every stage test job includes a check-pr-test-health step after checkout — if any job in the run has already failed, the job fast-fails (red X) with a root cause annotation.

Scheduled runs skip wait-for-base-* jobs, running all stages in parallel. Fast-fail is also disabled.

Fast-Fail Layers

4 layers of fast-fail, from fine to coarse:

Layer	Mechanism	Granularity	Disabled on schedule?
1. Test method → file	`unittest -f` (failfast)	One test method fails → entire test file stops immediately	Yes
2. File → suite	`run_unittest_files()` default	One test file fails → entire suite stops (`--continue-on-error` off)	Yes
3. Job → job (same stage)	`check-pr-test-health` action	One job fails → other waiting jobs in same stage fast-fail (red X)	Yes
4. Stage → stage (cross-stage)	`wait-for-base-*` + `needs`	Base A fails → base B/C jobs skip entirely (never get a runner)	Yes (wait jobs skipped)

Layer 1: -f flag appended to all python3 -m pytest / unittest invocations in ci_utils.py
Layer 2: --continue-on-error flag in run_suite.py — off for PRs, on for scheduled runs
Layer 3: check-pr-test-health auto-detects schedule event and skips; filters out cascade failures to show only root cause jobs
Layer 4: wait-for-base-* jobs are conditioned on github.event_name == 'pull_request' — skipped for scheduled runs

Execution Modes

Aspect	PR (`pull_request`)	Scheduled (`cron`, every 6h)	`/rerun-stage` (`workflow_dispatch`)
Stage ordering	Sequential: A → B → C via `wait-for-base-*`	Parallel (all at once)	Single target stage only
Cross-job fast-fail	Yes (`check-pr-test-health`)	No (health check skips on `schedule`)	Yes
continue-on-error	No (stop at first failure within suite)	Yes (run all tests)	No
Retry	Enabled	Enabled	Enabled
max_parallel	3 (default), 14 if `high priority` label	14	3 (default), 14 if `high priority`
PR gate	Yes (draft, label, rate limit)	Skipped	Skipped
Concurrency	`cancel-in-progress: true` per branch	Queue (no cancel)	Isolated per stage+SHA
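
As a reading aid, the max_parallel row reduces to a small rule. The sketch below is illustrative only: the helper name `resolve_max_parallel` and its inputs are hypothetical, and the real value is computed inside the workflow YAML.

```python
from typing import Iterable


def resolve_max_parallel(event_name: str, labels: Iterable[str] = ()) -> int:
    """Mirror the max_parallel row of the table (hypothetical helper)."""
    if event_name == "schedule":
        return 14  # scheduled runs always use the wide setting
    # PR and /rerun-stage runs widen only with the `high priority` label.
    return 14 if "high priority" in set(labels) else 3
```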

Stage Gating (`wait-for-jobs` action)

wait-for-base-a and wait-for-base-b are lightweight ubuntu-latest jobs that poll the GitHub Actions API.

How it works:

Calls listJobsForWorkflowRun to list all jobs in the current run
Matches jobs by exact name or prefix (for matrix jobs, e.g., base-b-test-1-gpu-small (3))
If any matched job has conclusion === 'failure' → fail immediately (fast-fail)
If all matched jobs are completed and count matches expected_count → success
Otherwise → sleep poll-interval-seconds (default: 60s) and retry
Timeout after max-wait-minutes (240 min for base-a, 480 min for base-b)
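
The decision made on each poll can be sketched in Python. This is a minimal illustration under stated assumptions, not the action's actual code: `evaluate_stage` is a hypothetical name, the job dicts carry only the `name`, `status`, and `conclusion` fields, and matrix jobs are assumed to be named `<prefix> (<index>)`.

```python
from typing import Dict, List


def evaluate_stage(jobs: List[Dict], specs: List[Dict]) -> str:
    """Return 'failure', 'success', or 'pending' for one poll iteration."""
    all_done = True
    for spec in specs:
        prefix = spec["prefix"]
        # Exact name (single job) or "<prefix> (<n>)" (matrix job): assumption.
        matched = [
            j for j in jobs
            if j["name"] == prefix or j["name"].startswith(prefix + " (")
        ]
        # Any failed job ends the wait immediately (fast-fail).
        if any(j.get("conclusion") == "failure" for j in matched):
            return "failure"
        completed = [j for j in matched if j.get("status") == "completed"]
        if len(completed) != len(matched) or len(completed) < spec["expected_count"]:
            all_done = False  # keep scanning: a later spec may contain a failure
    return "success" if all_done else "pending"
```

A caller would loop on this, sleeping poll-interval-seconds while the result is 'pending' and giving up after max-wait-minutes.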

Job specs example (base-b):

[
  {"prefix": "base-b-test-1-gpu-small", "expected_count": 8},
  {"prefix": "base-b-test-1-gpu-large", "expected_count": 14},
  {"prefix": "base-b-test-2-gpu-large", "expected_count": 4},
  {"prefix": "base-b-test-4-gpu-b200", "expected_count": 1}
]

Critical: expected_count must match the matrix size. If you add/remove matrix entries, update the wait job's spec accordingly.
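
A quick consistency check along these lines can catch a stale spec before it turns into a timeout. This is only a sketch: `find_count_mismatches` is a hypothetical helper, and `matrix_sizes` is assumed to be copied by hand from the matrix definitions (the numbers here come from the partition table in this guide).

```python
from typing import Dict, List


def find_count_mismatches(specs: List[Dict], matrix_sizes: Dict[str, int]) -> List[str]:
    """Describe every spec whose expected_count differs from the matrix size."""
    problems = []
    for spec in specs:
        prefix = spec["prefix"]
        actual = matrix_sizes.get(prefix)
        if actual is None:
            problems.append(f"{prefix}: no matrix size known")
        elif actual != spec["expected_count"]:
            problems.append(
                f"{prefix}: expected_count={spec['expected_count']} but matrix has {actual}"
            )
    return problems


specs = [
    {"prefix": "base-b-test-1-gpu-small", "expected_count": 8},
    {"prefix": "base-b-test-1-gpu-large", "expected_count": 14},
    {"prefix": "base-b-test-2-gpu-large", "expected_count": 4},
    {"prefix": "base-b-test-4-gpu-b200", "expected_count": 1},
]
# Hand-copied partition counts; a job without a matrix counts as 1.
matrix_sizes = {
    "base-b-test-1-gpu-small": 8,
    "base-b-test-1-gpu-large": 14,
    "base-b-test-2-gpu-large": 4,
    "base-b-test-4-gpu-b200": 1,
}
print(find_count_mismatches(specs, matrix_sizes))  # prints []
```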

PR only: Condition github.event_name == 'pull_request' && !inputs.target_stage — scheduled runs and /rerun-stage skip these entirely, allowing parallel execution.

Cross-Job Fast-Fail (`check-pr-test-health` action)

Composite action called after checkout in every stage test job (21 jobs total across pr-test.yml, pr-test-multimodal-gen.yml, pr-test-sgl-kernel.yml, pr-test-jit-kernel.yml).

How it works:

Queries listJobsForWorkflowRun for the current workflow run
Filters for root cause failures only — jobs with conclusion === 'failure' whose failing step is NOT check-pr-test-health (excludes cascade failures)
If root cause failures found → calls core.setFailed() with the list of root cause job names
If none → does nothing (step succeeds)

Cascade filtering: When job A fast-fails due to health check, it also has conclusion: failure. Without filtering, job B would list both the original failure AND job A's fast-fail. The filter checks each failed job's steps array — if the failing step name contains check-pr-test-health or Check PR test health, it's excluded from the root cause list.
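
The cascade filter can be sketched in Python as follows. It is an illustration rather than the action's real code: `root_cause_jobs` is a hypothetical name, and each job dict is assumed to carry `name`, `conclusion`, and a `steps` list whose entries have `name` and `conclusion`.

```python
from typing import Dict, List

# Step-name fragments that identify the health check itself (from the text above).
HEALTH_STEP_MARKERS = ("check-pr-test-health", "Check PR test health")


def root_cause_jobs(jobs: List[Dict]) -> List[str]:
    """Return names of failed jobs whose failing step is not the health check."""
    roots = []
    for job in jobs:
        if job.get("conclusion") != "failure":
            continue
        failed_steps = [
            step for step in job.get("steps", [])
            if step.get("conclusion") == "failure"
        ]
        is_cascade = any(
            marker in step.get("name", "")
            for step in failed_steps
            for marker in HEALTH_STEP_MARKERS
        )
        if not is_cascade:
            roots.append(job["name"])
    return roots
```

An empty result means the health check step succeeds; a non-empty one is what would be reported in the failure message.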

Usage pattern:

steps:
  - name: Checkout code
    uses: actions/checkout@v4
    ...

  - uses: ./.github/actions/check-pr-test-health
    id: pr-test-health

  - name: Install dependencies        # skipped automatically if health check failed
    ...                                # (default if: success() is false)

  - name: Run test                     # also skipped
    ...

Visual effect: Job shows red X (failure) with error annotation showing root cause job names. Subsequent steps are naturally skipped (default if: success() is false after a failed step). No per-step if guards needed.

No stage filtering: Checks ALL jobs in the run, not just the current stage. Any failure anywhere triggers fast-fail.

Error message example:

Fast-fail: skipping — root cause job(s): base-b-test-1-gpu-small (0), base-b-test-1-gpu-small (1)

Within-Suite Failure Handling

Controlled by run_unittest_files() in python/sglang/test/ci/ci_utils.py.

Flags

Flag	PR default	Scheduled default	Effect
`--continue-on-error`	Off	On	Off: stop at first failure. On: run all files, report all failures at end
`--enable-retry`	On	On	Retry retriable failures (accuracy/perf assertions)
`--max-attempts`	2	2	Max attempts per file including initial run
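
How the three flags interact can be shown with a small loop. This is a sketch under stated assumptions, not the body of run_unittest_files(): `run_files`, `run_one`, and `is_retriable` are hypothetical names, where `run_one` returns a (passed, output) pair and `is_retriable` stands in for the retry classification described in this section.

```python
from typing import Callable, List, Tuple


def run_files(
    files: List[str],
    run_one: Callable[[str], Tuple[bool, str]],
    is_retriable: Callable[[str], bool],
    continue_on_error: bool = False,
    enable_retry: bool = True,
    max_attempts: int = 2,
) -> List[str]:
    """Return the files that still failed after retries."""
    failed = []
    for path in files:
        passed = False
        for _attempt in range(max_attempts):  # includes the initial run
            passed, output = run_one(path)
            if passed or not enable_retry or not is_retriable(output):
                break
        if not passed:
            failed.append(path)
            if not continue_on_error:
                break  # PR default: stop the suite at the first failed file
    return failed
```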

Retry Classification

When a test fails and retry is enabled, the output is classified:

Non-retriable (checked first — real code errors): SyntaxError, ImportError, ModuleNotFoundError, NameError, TypeError, AttributeError, RuntimeError, CUDA out of memory, OOM, Segmentation fault, core dumped, ConnectionRefusedError, FileNotFoundError

Retriable (accuracy/performance): AssertionError with comparison patterns (not greater than, not less than, not equal to), accuracy, score, latency, throughput, timeout

Default: Unknown AssertionError → retriable. Other unknown failures → not retriable.
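
The classification order can be sketched as below. The pattern lists are copied from the text above, but the plain substring matching and the function name `is_retriable` are assumptions; the real logic lives in ci_utils.py and may match differently.

```python
NON_RETRIABLE_PATTERNS = (
    "SyntaxError", "ImportError", "ModuleNotFoundError", "NameError",
    "TypeError", "AttributeError", "RuntimeError", "CUDA out of memory",
    "OOM", "Segmentation fault", "core dumped", "ConnectionRefusedError",
    "FileNotFoundError",
)
COMPARISON_PATTERNS = ("not greater than", "not less than", "not equal to")
RETRIABLE_KEYWORDS = ("accuracy", "score", "latency", "throughput", "timeout")


def is_retriable(output: str) -> bool:
    """Classify failed-test output; non-retriable patterns are checked first."""
    if any(pattern in output for pattern in NON_RETRIABLE_PATTERNS):
        return False  # real code error: retrying will not help
    lowered = output.lower()
    if "AssertionError" in output and any(p in lowered for p in COMPARISON_PATTERNS):
        return True  # accuracy/perf style comparison failure
    if any(keyword in lowered for keyword in RETRIABLE_KEYWORDS):
        return True
    # Default: unknown AssertionError is retriable, anything else is not.
    return "AssertionError" in output
```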

How `continue_on_error` is set

In pr-test.yml's check-changes job:

schedule runs or run_all_tests flag → continue_on_error = 'true'
PR runs → continue_on_error = 'false'

Each test job propagates via:

env:
  CONTINUE_ON_ERROR_FLAG: ${{ needs.check-changes.outputs.continue_on_error == 'true' && '--continue-on-error' || '' }}
run: |
  python3 run_suite.py --hw cuda --suite <name> $CONTINUE_ON_ERROR_FLAG
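
The expression above is a ternary written with `&&` and `||`. An equivalent Python reading, offered only as a sketch (the helper name `continue_on_error_flag` is hypothetical), combines it with the rule from check-changes:

```python
def continue_on_error_flag(event_name: str, run_all_tests: bool = False) -> str:
    """Return the CLI flag a test job would append, or an empty string."""
    # check-changes: scheduled runs or run_all_tests turn continue_on_error on.
    continue_on_error = event_name == "schedule" or run_all_tests
    # Test job: 'true' maps to the flag, anything else to an empty string.
    return "--continue-on-error" if continue_on_error else ""
```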

Test Partitioning

Large suites are split across matrix jobs using the LPT (Longest Processing Time) heuristic in ci_register.py:auto_partition():

Sort tests by est_time descending, filename as tie-breaker (deterministic)
Greedily assign each test to the partition with smallest cumulative time
Result: roughly equal total time per partition
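
The three steps above can be sketched in a few lines of Python. This is a minimal illustration of the LPT heuristic, not the code of auto_partition(); the function name `lpt_partition` and the (filename, est_time) tuple shape are assumptions.

```python
from typing import List, Tuple


def lpt_partition(tests: List[Tuple[str, float]], size: int) -> List[List[str]]:
    """Split (filename, est_time) pairs into `size` partitions with LPT."""
    # Longest first; filename breaks ties so the result is deterministic.
    ordered = sorted(tests, key=lambda t: (-t[1], t[0]))
    partitions: List[List[str]] = [[] for _ in range(size)]
    totals = [0.0] * size
    for name, est_time in ordered:
        idx = totals.index(min(totals))  # lightest partition, lowest index on ties
        partitions[idx].append(name)
        totals[idx] += est_time
    return partitions


tests = [("a.py", 100), ("b.py", 60), ("c.py", 50), ("d.py", 40), ("e.py", 10)]
print(lpt_partition(tests, 2))
# prints [['a.py', 'd.py'], ['b.py', 'c.py', 'e.py']]  (totals 140 and 120)
```

Each matrix job would then run only the list at its own partition index.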

Partition table (CUDA per-commit suites):

Suite	Partitions	Runner	max_parallel
`base-a-test-1-gpu-small`	1 (no matrix)	`1-gpu-5090`	—
`base-a-test-cpu`	4	`ubuntu-latest`	—
`base-b-test-1-gpu-small`	8	`1-gpu-5090`	8
`base-b-test-1-gpu-large`	14	`1-gpu-h100`	dynamic (3 or 14)
`base-b-test-2-gpu-large`	4	`2-gpu-h100`	—
`base-b-test-4-gpu-b200`	1 (no matrix)	`4-gpu-b200`	—
`base-b-kernel-unit-1-gpu-large`	1 (no matrix)	`1-gpu-h100`	—
`base-b-kernel-unit-1-gpu-b200`	1 (no matrix)	`4-gpu-b200`	—
`base-b-kernel-unit-8-gpu-h200`	1 (no matrix)	`8-gpu-h200`	—
`base-b-kernel-benchmark-1-gpu-large`	1 (no matrix)	`1-gpu-h100`	—
`base-c-test-4-gpu-h100`	3	`4-gpu-h100`	—
`base-c-test-8-gpu-h200`	4	`8-gpu-h200`	—
`base-c-test-8-gpu-h20`	2	`8-gpu-h20`	—
`base-c-test-deepep-4-gpu-h100`	1 (no matrix)	`4-gpu-h100`	—
`base-c-test-4-gpu-b200`	3	`4-gpu-b200`	—
`base-c-test-4-gpu-b200-small`	3	`4-gpu-b200-low-disk`	—
`base-c-test-8-gpu-b200`	registered only	`8-gpu-b200`	—
`base-c-test-4-gpu-gb200`	registered only	`4-gpu-gb200`	—

Note: Kernel suites (base-b-kernel-*) run via pr-test-jit-kernel.yml and pr-test-sgl-kernel.yml, not the main pr-test.yml. base-c-test-8-gpu-b200 is registered in test/run_suite.py but not wired to PR CI. The GB200 job is currently commented out in pr-test.yml until a company-owned runner is provisioned. Multimodal diffusion uses python/sglang/multimodal_gen/test/run_suite.py, not test/run_suite.py.

Workflow usage:

strategy:
  matrix:
    partition: [0, 1, 2, 3, 4, 5, 6, 7]
steps:
  - run: python3 run_suite.py --hw cuda --suite base-b-test-1-gpu-small \
           --auto-partition-id ${{ matrix.partition }} --auto-partition-size 8

check-changes Job

Determines which test suites to run based on file changes.

Detection Methods

Trigger	Method	Details
`pull_request`	`dorny/paths-filter`	Detects changes via GitHub diff
`workflow_dispatch` (with `pr_head_sha`)	GitHub API	`repos/{repo}/compare/main...{sha}`
`schedule` / `run_all_tests`	Force all true	Runs everything

Output Flags

Output	Triggers
`main_package`	Base A/B/C test suites
`sgl_kernel`	Kernel wheel builds + kernel test suites; also switches B200 jobs to kernel-build runner labels outside `target_stage` mode
`jit_kernel`	JIT kernel test workflow
`multimodal_gen`	Multimodal-gen test workflow

Note: In target_stage mode, sgl_kernel is only active when include_wheel_build=true. Without that opt-in, kernel-change reruns fail validation instead of running a target stage without freshly built wheels. Outside target_stage, sgl_kernel=true switches B200 jobs from 4-gpu-b200 / 4-gpu-b200-low-disk to 4-gpu-b200-kernel / 4-gpu-b200-kernel-low-disk.
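
The runner switch described in the note amounts to a label rewrite. The sketch below is illustrative: `resolve_b200_runner` is a hypothetical helper, and the real selection is expressed in the workflow YAML.

```python
def resolve_b200_runner(base_label: str, sgl_kernel: bool, target_stage: str = "") -> str:
    """Pick the B200 runner label for a job (hypothetical helper)."""
    if sgl_kernel and not target_stage:
        # 4-gpu-b200 -> 4-gpu-b200-kernel, keeping any suffix such as -low-disk.
        return base_label.replace("4-gpu-b200", "4-gpu-b200-kernel", 1)
    return base_label
```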

Concurrency Control

group: pr-test-{event_name}-{branch}-{pr_sha}-{stage}

Segment	Source	Purpose
`event_name`	`github.event_name`	Prevents scheduled runs colliding with fork PRs named `main`
`branch`	`github.head_ref \|\| github.ref_name`	Per-branch isolation
`pr_sha`	`inputs.pr_head_sha \|\| 'current'`	Isolates `/rerun-stage` from main runs
`stage`	`inputs.target_stage \|\| 'all'`	Allows parallel stage dispatches

cancel-in-progress: true for pull_request events (new push cancels old run), false for workflow_call.
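
To see how the segments combine, here is a small Python sketch that builds the group key from the same inputs. The function name `concurrency_group` is hypothetical; the fallbacks mirror the `||` expressions in the table.

```python
def concurrency_group(
    event_name: str,
    head_ref: str,
    ref_name: str,
    pr_head_sha: str = "",
    target_stage: str = "",
) -> str:
    """Build the concurrency group string from its four segments."""
    branch = head_ref or ref_name        # github.head_ref || github.ref_name
    pr_sha = pr_head_sha or "current"    # inputs.pr_head_sha || 'current'
    stage = target_stage or "all"        # inputs.target_stage || 'all'
    return f"pr-test-{event_name}-{branch}-{pr_sha}-{stage}"
```

Two runs share a group, and can therefore cancel or queue behind each other, only when all four segments are equal.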

How To: Add a New Stage Job

Define the job in pr-test.yml with needs: [check-changes, call-gate, wait-for-base-X, ...]
Copy the if: condition pattern from an existing same-stage job (handles target_stage, schedule, main_package)
Add checkout step
Add check-pr-test-health step (after checkout) — if any prior job failed, core.setFailed() fires and all subsequent steps auto-skip via default if: success()
Add check-maintenance step
Add download-artifact step if sgl_kernel changed
Add install dependencies step
Add run test step with $CONTINUE_ON_ERROR_FLAG
Add upload-cuda-coredumps step with if: always()
Register the suite name in PER_COMMIT_SUITES in test/run_suite.py
If using matrix, add --auto-partition-id and --auto-partition-size to the run command
Update wait-for-base-X job spec with the new job name and expected_count (if matrix)
Add the job to pr-test-finish.needs list

How To: Debug CI Failures

Symptom	Likely cause	What to check
Stage-B/C jobs show a red X with every step after the health check skipped	Earlier job failed, `check-pr-test-health` triggered	Read the error annotation for the root cause job name(s), then open that job
`wait-for-base-b` timeout	`expected_count` doesn't match matrix size	Verify job spec counts match `matrix:` array length
`pr-test-finish` fails but all jobs green	A job was `cancelled` (counts as failure in finish)	Check concurrency cancellation
Tests pass locally but fail in CI	Partition assignment, runner GPU type, or `est_time` inaccuracy	Check which partition the test lands in; verify runner label
Flaky test retried and passed	Retriable failure (accuracy/perf)	Check `[CI Retry]` markers in job logs
Flaky test NOT retried	Matched non-retriable pattern	Check if error matches `NON_RETRIABLE_PATTERNS` in `ci_utils.py`

Slash Commands

Command	Effect
`/tag-run-ci-label`	Adds `run-ci` label to PR
`/tag-run-ci-label extra`	Adds both `run-ci` and `run-ci-extra` labels
`/rerun-failed-ci`	Reruns failed jobs in the latest workflow run
`/tag-and-rerun-ci`	Adds `run-ci` label + reruns failed
`/tag-and-rerun-ci extra`	Adds both `run-ci` and `run-ci-extra` labels + reruns failed
`/rerun-stage <stage>`	Deprecated; posts deprecation notice
`/rerun-test <test-file> [<test-file> ...]`	Reruns specific test file(s) via `rerun-test.yml`. A file arg containing a glob metacharacter (`*`, `?`, `[...]`) expands against `test/registered/` and the multimodal test dir to every matching `test_*.py` (e.g. `/rerun-test test_backend.py`, wrapped in backticks so GitHub doesn't italicize the `*`); matches are deduped, grouped by dispatch shape, and can't carry a `::test` selector. No match → single ⛔ reply, nothing dispatched. Each reply echoes its originating command (`Results for …`) so concurrent commands stay distinguishable
`/rerun-group <group> [<group> ...]`	Expands registered test groups, then reuses `/rerun-test`

Handled by scripts/ci/utils/slash_command_handler.py → .github/workflows/slash-command-handler.yml.

Label-gated workflow dispatch (pr-test, pr-test-extra)

pr-test.yml and pr-test-extra.yml both listen for pull_request.labeled (in addition to opened/synchronize/reopened). The check-changes.if gate has two clauses:

For labeled events: the just-added label must be one of the gating labels (run-ci for pr-test, run-ci or run-ci-extra for pr-test-extra) — otherwise every unrelated label addition would dispatch a full CI run.
All events: the PR must currently carry the required labels.
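
The two clauses can be written as one predicate. The sketch below is an illustration, not the workflow's actual `if:` expression: `gate_allows` and the two config dicts are hypothetical names, with label sets taken from the descriptions in this guide.

```python
from typing import Iterable, Optional

PR_TEST = {"gating_labels": {"run-ci"}, "required_labels": {"run-ci"}}
PR_TEST_EXTRA = {
    "gating_labels": {"run-ci", "run-ci-extra"},
    "required_labels": {"run-ci", "run-ci-extra"},
}


def gate_allows(
    action: str,
    added_label: Optional[str],
    current_labels: Iterable[str],
    gating_labels: Iterable[str],
    required_labels: Iterable[str],
) -> bool:
    """Decide whether a pull_request event should start a CI run."""
    # Clause 1: a `labeled` event counts only if the new label is a gating label.
    if action == "labeled" and added_label not in set(gating_labels):
        return False
    # Clause 2: the PR must currently carry every required label.
    return set(required_labels).issubset(current_labels)
```

For example, adding an unrelated label to a PR that already has `run-ci` returns False under `PR_TEST`, so no new run is dispatched.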

This is what lets /tag-run-ci-label (and the extra variant) trigger a fresh CI run without an extra push.

Caveat: skipped runs cannot be un-skipped by run.rerun(). GitHub's rerun API reuses the original event payload, so rerunning a pull_request-event run that was skipped for missing labels skips again, because the label set in the frozen payload never updates. The only way to recover a label-skipped run is to add the missing label, which fires a fresh labeled event carrying the current label set. handle_rerun_failed_ci in the slash handler reruns failed runs that were not label-skipped; it cannot revive label-skipped ones.