---
name: ci-workflow-guide
description: "Guide to SGLang CI workflow orchestration: stage ordering, fast-fail, gating, partitioning, execution modes, and debugging CI failures. Use when modifying CI workflows, adding stages, debugging CI pipeline issues, or understanding how tests are dispatched and gated across stages."
---
# SGLang CI Workflow Orchestration Guide

This skill covers the CI infrastructure layer: how tests are dispatched, gated, and fast-failed across stages. For test authoring (templates, fixtures, registration, model selection), see the `write-sglang-test` skill.
## Naming Conventions

- **Suite**: `base-{a,b,c}-test-{gpu_count}-gpu-{hardware}` (e.g., `base-b-test-1-gpu-small`)
- **Test group**: directory-level registered test group under `test/registered/` (e.g., `hicache` maps to `test/registered/hicache/test_*.py`)
- **CI runner**: `{gpu_count}-gpu-{hardware}` (e.g., `1-gpu-5090`, `4-gpu-h100`, `8-gpu-h200`)
## Key Files

| File | Role |
|---|---|
| `.github/workflows/pr-test.yml` | Main workflow: all stages, jobs, conditions, matrix definitions |
| `.github/workflows/pr-test-extra.yml` | Extra workflow, gated by BOTH the `run-ci` and `run-ci-extra` labels |
| `.github/workflows/pr-gate.yml` | PR gating: draft check, `run-ci` label, per-user rate limiting |
| `.github/actions/check-pr-test-health/action.yml` | Cross-job fast-fail: queries the API for any failed job |
| `.github/actions/wait-for-jobs/action.yml` | Stage gating: polls the API until stage jobs complete |
| `.github/actions/check-maintenance/action.yml` | Maintenance mode check |
| `test/run_suite.py` | Suite runner: collects, filters, partitions, executes tests |
| `python/sglang/test/ci/ci_register.py` | Test registration (AST-parsed markers), LPT auto-partition |
| `python/sglang/test/ci/ci_utils.py` | `run_unittest_files()`: execution, retry, continue-on-error |
| `scripts/ci/utils/slash_command_handler.py` | Handles slash commands from PR comments |
## Architecture Overview

```
┌──────────────┐
│ build kernel │
└──────┬───────┘
       │
       ├─ check-changes ──── detects which packages changed
       │                     (main_package, sgl_kernel, jit_kernel, multimodal_gen)
       │
       ├─ call-gate ──────── pr-gate.yml (draft? label? rate limit?)
       │
       ├───────────────────────────────────────────────┐
       │                                               │
       ▼                                               │
┌─────────────────────────────────────┐                │
│ Base A (~3 min)                     │                │
│ pre-flight check                    │                │
│                                     │                │
│  ┌─────────────────────────────┐    │                │
│  │ base-a-test-1-gpu-small     │    │                │
│  │ (small GPUs)                │    │                │
│  └─────────────────────────────┘    │                │
│  ┌─────────────────────────────┐    │                │
│  │ base-a-test-cpu             │    │                │
│  │ (CPU)                       │    │                │
│  └─────────────────────────────┘    │                │
└──────┬──────────────────────────────┘                │
       │                                               │
       ▼                                               ▼
┌─────────────────────────────────────┐  ┌──────────────────────────┐
│ Base B (~30 min)                    │  │ kernel test              │
│ base tests                          │  └──────────────────────────┘
│                                     │  ┌──────────────────────────┐
│  ┌─────────────────────────────┐    │  │ multimodal gen test      │
│  │ base-b-test-1-gpu-small     │    │  └──────────────────────────┘
│  │ (small GPUs, e.g. 5090)     │    │
│  └─────────────────────────────┘    │
│  ┌─────────────────────────────┐    │
│  │ base-b-test-1-gpu-large     │    │
│  │ (large GPUs, e.g. H100)     │    │
│  └─────────────────────────────┘    │
│  ┌─────────────────────────────┐    │
│  │ base-b-test-2-gpu-large     │    │
│  │ (large GPUs, e.g. H100)     │    │
│  └─────────────────────────────┘    │
└──────┬──────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────┐
│ Base C (~30 min)                    │
│ advanced tests                      │
│                                     │
│  ┌─────────────────────────────┐    │
│  │ base-c-test-4-gpu-h100      │    │
│  │ (H100 GPUs)                 │    │
│  └─────────────────────────────┘    │
│  ┌─────────────────────────────┐    │
│  │ base-c-test-8-gpu-h200      │    │
│  │ (8 x H200 GPUs)             │    │
│  └─────────────────────────────┘    │
│  ┌─────────────────────────────┐    │
│  │ base-c-test-4-gpu-b200      │    │
│  │ (4 x B200 GPUs)             │    │
│  └─────────────────────────────┘    │
│  ┌─────────────────────────────┐    │
│  │ Other advanced tests        │    │
│  │ (DeepEP, PD Disagg, GB300)  │    │
│  └─────────────────────────────┘    │
└──────┬──────────────────────────────┘
       │
       ▼
┌─────────────────────────────────────┐
│ pr-test-finish                      │
│ aggregates all results, fails if    │
│ any job failed/cancelled            │
└─────────────────────────────────────┘
```
Every stage test job runs a `check-pr-test-health` step right after checkout. If any job in the run has already failed, the job fast-fails (red X) with an annotation naming the root-cause job(s).

Scheduled runs skip the `wait-for-base-*` jobs and run all stages in parallel. Fast-fail is also disabled for scheduled runs.
## Fast-Fail Layers

Four layers of fast-fail, from finest to coarsest:

| Layer | Mechanism | Granularity | Disabled on schedule? |
|---|---|---|---|
| 1. Test method → file | `unittest -f` (failfast) | One test method fails → entire test file stops immediately | Yes |
| 2. File → suite | `run_unittest_files()` default | One test file fails → entire suite stops (`--continue-on-error` off) | Yes |
| 3. Job → job (same stage) | `check-pr-test-health` action | One job fails → other waiting jobs in the same stage fast-fail (red X) | Yes |
| 4. Stage → stage (cross-stage) | `wait-for-base-*` + `needs` | Base A fails → Base B/C jobs skip entirely (never get a runner) | Yes (wait jobs skipped) |
- **Layer 1**: `-f` flag appended to all `python3 -m pytest`/`unittest` invocations in `ci_utils.py`
- **Layer 2**: `--continue-on-error` flag in `run_suite.py`, off for PRs and on for scheduled runs
- **Layer 3**: `check-pr-test-health` auto-detects the `schedule` event and skips; it filters out cascade failures so that only root-cause jobs are shown
- **Layer 4**: `wait-for-base-*` jobs are conditioned on `github.event_name == 'pull_request'`, so they are skipped for scheduled runs
## Execution Modes

| Aspect | PR (`pull_request`) | Scheduled (cron, every 6h) | `/rerun-stage` (`workflow_dispatch`) |
|---|---|---|---|
| Stage ordering | Sequential: A → B → C via `wait-for-base-*` | Parallel (all at once) | Single target stage only |
| Cross-job fast-fail | Yes (`check-pr-test-health`) | No (the health check skips on `schedule`) | Yes |
| `continue-on-error` | No (stop at first failure within suite) | Yes (run all tests) | No |
| Retry | Enabled | Enabled | Enabled |
| `max_parallel` | 3 (default), 14 with the high-priority label | 14 | 3 (default), 14 if high priority |
| PR gate | Yes (draft, label, rate limit) | Skipped | Skipped |
| Concurrency | `cancel-in-progress: true` per branch | Queue (no cancel) | Isolated per stage + SHA |
## Stage Gating (`wait-for-jobs` action)

`wait-for-base-a` and `wait-for-base-b` are lightweight `ubuntu-latest` jobs that poll the GitHub Actions API.

How it works:

- Calls `listJobsForWorkflowRun` to list all jobs in the current run
- Matches jobs by exact name or by prefix (for matrix jobs, e.g., `base-b-test-1-gpu-small (3)`)
- If any matched job has `conclusion === 'failure'` → fail immediately (fast-fail)
- If all matched jobs are completed and their count matches `expected_count` → success
- Otherwise → sleep `poll-interval-seconds` (default: 60s) and retry
- Time out after `max-wait-minutes` (240 min for base-a, 480 min for base-b)
Job specs example (base-b):

```json
[
  {"prefix": "base-b-test-1-gpu-small", "expected_count": 8},
  {"prefix": "base-b-test-1-gpu-large", "expected_count": 14},
  {"prefix": "base-b-test-2-gpu-large", "expected_count": 4},
  {"prefix": "base-b-test-4-gpu-b200", "expected_count": 1}
]
```
**Critical:** `expected_count` must match the matrix size. If you add or remove matrix entries, update the wait job's spec accordingly.

**PR only:** Because of the condition `github.event_name == 'pull_request' && !inputs.target_stage`, scheduled runs and `/rerun-stage` skip these wait jobs entirely, which lets their stages run in parallel.
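The polling decision can be sketched as a pure function plus a loop. This is a hedged Python approximation, not the action's actual JavaScript. `evaluate_specs`, `wait_for_jobs`, and the caller-supplied `fetch_jobs` callable are hypothetical names, and the job dicts only mimic the `name`, `status`, and `conclusion` fields returned by `listJobsForWorkflowRun`.

```python
import time
from typing import Callable


def evaluate_specs(jobs: list[dict], specs: list[dict]) -> str:
    """One polling round: return "failure", "success", or "pending"."""
    all_done = True
    for spec in specs:
        # An exact name also satisfies startswith, so one check covers both cases.
        matched = [j for j in jobs if j["name"].startswith(spec["prefix"])]
        # Fast-fail: any failed match ends the wait immediately.
        if any(j.get("conclusion") == "failure" for j in matched):
            return "failure"
        completed = [j for j in matched if j["status"] == "completed"]
        # Done only when every matched job finished and the count is as expected.
        if len(completed) != len(matched) or len(completed) != spec["expected_count"]:
            all_done = False
    return "success" if all_done else "pending"


def wait_for_jobs(
    fetch_jobs: Callable[[], list[dict]],
    specs: list[dict],
    poll_interval_s: float = 60,
    max_wait_min: float = 240,
) -> str:
    """Poll until success or failure, or return "timeout" after max_wait_min."""
    deadline = time.monotonic() + max_wait_min * 60
    while True:
        state = evaluate_specs(fetch_jobs(), specs)
        if state != "pending":
            return state
        if time.monotonic() >= deadline:
            return "timeout"
        time.sleep(poll_interval_s)
```

In this sketch, an `expected_count` that disagrees with the real matrix size can never produce "success", so the wait ends in a timeout. That is the `wait-for-base-b` timeout symptom covered in the debugging table.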
## Cross-Job Fast-Fail (`check-pr-test-health` action)

A composite action called after checkout in every stage test job (21 jobs total across `pr-test.yml`, `pr-test-multimodal-gen.yml`, `pr-test-sgl-kernel.yml`, and `pr-test-jit-kernel.yml`).

How it works:

- Queries `listJobsForWorkflowRun` for the current workflow run
- Filters for root-cause failures only: jobs with `conclusion === 'failure'` whose failing step is NOT `check-pr-test-health` (this excludes cascade failures)
- If root-cause failures are found → calls `core.setFailed()` with the list of root-cause job names
- If none → does nothing (the step succeeds)

**Cascade filtering:** When job A fast-fails because of the health check, it also ends with `conclusion: failure`. Without filtering, job B would list both the original failure AND job A's fast-fail. The filter inspects each failed job's `steps` array. If the failing step's name contains `check-pr-test-health` or `Check PR test health`, that job is excluded from the root-cause list.
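The cascade filter can be sketched in Python as follows. The job and step dicts only mimic the shape of the GitHub API response (`conclusion`, `steps[].name`, `steps[].conclusion`). `root_cause_jobs` and `failing_step_name` are hypothetical names, and the real action runs as JavaScript.

```python
from typing import Optional

# Step-name fragments that identify the health-check step itself.
HEALTH_STEP_MARKERS = ("check-pr-test-health", "Check PR test health")


def failing_step_name(job: dict) -> Optional[str]:
    """Name of the first step that concluded with 'failure', if any."""
    for step in job.get("steps") or []:
        if step.get("conclusion") == "failure":
            return step.get("name", "")
    return None


def root_cause_jobs(jobs: list[dict]) -> list[str]:
    """Failed jobs, minus those that failed inside the health check (cascades)."""
    causes = []
    for job in jobs:
        if job.get("conclusion") != "failure":
            continue
        step = failing_step_name(job) or ""
        if any(marker in step for marker in HEALTH_STEP_MARKERS):
            continue  # this job only fast-failed because something else failed
        causes.append(job["name"])
    return causes
```

In the action, a non-empty result is what gets passed to `core.setFailed()`. An empty list lets the step succeed.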
Usage pattern:

```yaml
steps:
  - name: Checkout code
    uses: actions/checkout@v4
  # ...
  - uses: ./.github/actions/check-pr-test-health
    id: pr-test-health
  - name: Install dependencies  # skipped automatically if the health check failed
    # ...                         (the default `if: success()` is false)
  - name: Run test              # also skipped
    # ...
```
**Visual effect:** The job shows a red X (failure) with an error annotation listing the root-cause job names. Subsequent steps are skipped naturally, because the default `if: success()` is false after a failed step, so no per-step `if` guards are needed.

**No stage filtering:** The action checks ALL jobs in the run, not just the current stage. Any failure anywhere triggers fast-fail.

Error message example:

```text
Fast-fail: skipping — root cause job(s): base-b-test-1-gpu-small (0), base-b-test-1-gpu-small (1)
```
## Within-Suite Failure Handling

Controlled by `run_unittest_files()` in `python/sglang/test/ci/ci_utils.py`.

### Flags

| Flag | PR default | Scheduled default | Effect |
|---|---|---|---|
| `--continue-on-error` | Off | On | Off: stop at the first failure. On: run all files and report all failures at the end |
| `--enable-retry` | On | On | Retry retriable failures (accuracy/perf assertions) |
| `--max-attempts` | 2 | 2 | Max attempts per file, including the initial run |
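To show how these three flags interact, here is a minimal sketch of a run loop. It is not the real `run_unittest_files()`. `run_files` is hypothetical, and `run_one` and `is_retriable` are caller-supplied callables standing in for launching a test file and for the retry classification described next.

```python
from typing import Callable


def run_files(
    files: list[str],
    run_one: Callable[[str], tuple[bool, str]],  # returns (passed, output)
    is_retriable: Callable[[str], bool],
    continue_on_error: bool = False,
    enable_retry: bool = True,
    max_attempts: int = 2,
) -> list[str]:
    """Run files in order and return the files that ultimately failed."""
    failed = []
    for path in files:
        attempts = max_attempts if enable_retry else 1
        ok = False
        for _ in range(attempts):
            passed, output = run_one(path)
            if passed:
                ok = True
                break
            if not is_retriable(output):
                break  # real code error: another attempt would not help
        if not ok:
            failed.append(path)
            if not continue_on_error:
                break  # PR default: stop the suite at the first failing file
    return failed
```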
### Retry Classification

When a test fails and retry is enabled, its output is classified:

**Non-retriable** (checked first; these indicate real code errors): `SyntaxError`, `ImportError`, `ModuleNotFoundError`, `NameError`, `TypeError`, `AttributeError`, `RuntimeError`, `CUDA out of memory`, `OOM`, `Segmentation fault`, `core dumped`, `ConnectionRefusedError`, `FileNotFoundError`

**Retriable** (accuracy/performance): `AssertionError` with comparison patterns (`not greater than`, `not less than`, `not equal to`), `accuracy`, `score`, `latency`, `throughput`, `timeout`

**Default:** An unknown `AssertionError` → retriable. Any other unknown failure → not retriable.
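The classification order can be sketched as follows. The pattern lists are copied from the text above. The function name, the tuple names, and the plain substring matching are assumptions, and the real `NON_RETRIABLE_PATTERNS` in `ci_utils.py` may be matched differently.

```python
# Hypothetical copies of the documented patterns (not the real constants).
NON_RETRIABLE = (
    "SyntaxError", "ImportError", "ModuleNotFoundError", "NameError",
    "TypeError", "AttributeError", "RuntimeError", "CUDA out of memory",
    "OOM", "Segmentation fault", "core dumped", "ConnectionRefusedError",
    "FileNotFoundError",
)
RETRIABLE = (
    "not greater than", "not less than", "not equal to",
    "accuracy", "score", "latency", "throughput", "timeout",
)


def is_retriable(output: str) -> bool:
    # 1. Real code errors win, even if an accuracy keyword also appears.
    if any(p in output for p in NON_RETRIABLE):
        return False
    # 2. Known accuracy/performance signals are worth another attempt.
    if any(p in output for p in RETRIABLE):
        return True
    # 3. Fallback: an unknown AssertionError is retriable; anything else is not.
    return "AssertionError" in output
```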
### How `continue_on_error` is set

In the `check-changes` job of `pr-test.yml`:

- `schedule` runs or the `run_all_tests` flag → `continue_on_error = 'true'`
- PR runs → `continue_on_error = 'false'`

Each test job propagates it via:

```yaml
env:
  CONTINUE_ON_ERROR_FLAG: ${{ needs.check-changes.outputs.continue_on_error == 'true' && '--continue-on-error' || '' }}
run: |
  python3 run_suite.py --hw cuda --suite <name> $CONTINUE_ON_ERROR_FLAG
```
## Test Partitioning

Large suites are split across matrix jobs using the LPT (Longest Processing Time first) heuristic in `ci_register.py:auto_partition()`:

- Sort tests by `est_time` descending, with the filename as a tie-breaker (deterministic)
- Greedily assign each test to the partition with the smallest cumulative time
- Result: roughly equal total time per partition
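A minimal LPT sketch, assuming each test is a `(filename, est_time)` pair. `lpt_partition` and `select_partition` are hypothetical stand-ins for `auto_partition()`, whose real signature is not reproduced here. A heap keyed on (cumulative time, partition index) makes tie-breaking between equally loaded partitions deterministic as well.

```python
import heapq


def lpt_partition(tests: list[tuple[str, float]], size: int) -> list[list[str]]:
    """Split tests into `size` partitions with roughly equal total est_time."""
    # Longest first; the filename breaks ties so every runner sees the same order.
    ordered = sorted(tests, key=lambda t: (-t[1], t[0]))
    parts: list[list[str]] = [[] for _ in range(size)]
    # Min-heap of (cumulative seconds, partition index); a sorted list is a valid heap.
    heap = [(0.0, i) for i in range(size)]
    for name, est in ordered:
        total, idx = heapq.heappop(heap)
        parts[idx].append(name)
        heapq.heappush(heap, (total + est, idx))
    return parts


def select_partition(tests: list[tuple[str, float]], partition_id: int, size: int) -> list[str]:
    # Each matrix job recomputes the full split and keeps only its own slice.
    return lpt_partition(tests, size)[partition_id]
```

Determinism is the key property for CI here. Every matrix job runs the same computation independently, so the slices are disjoint and together cover the whole suite without any coordination between runners.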
Partition table (CUDA per-commit suites):

| Suite | Partitions | Runner | max_parallel |
|---|---|---|---|
| `base-a-test-1-gpu-small` | 1 (no matrix) | `1-gpu-5090` | n/a |
| `base-a-test-cpu` | 4 | `ubuntu-latest` | n/a |
| `base-b-test-1-gpu-small` | 8 | `1-gpu-5090` | 8 |
| `base-b-test-1-gpu-large` | 14 | `1-gpu-h100` | dynamic (3 or 14) |
| `base-b-test-2-gpu-large` | 4 | `2-gpu-h100` | n/a |
| `base-b-test-4-gpu-b200` | 1 (no matrix) | `4-gpu-b200` | n/a |
| `base-b-kernel-unit-1-gpu-large` | 1 (no matrix) | `1-gpu-h100` | n/a |
| `base-b-kernel-unit-1-gpu-b200` | 1 (no matrix) | `4-gpu-b200` | n/a |
| `base-b-kernel-unit-8-gpu-h200` | 1 (no matrix) | `8-gpu-h200` | n/a |
| `base-b-kernel-benchmark-1-gpu-large` | 1 (no matrix) | `1-gpu-h100` | n/a |
| `base-c-test-4-gpu-h100` | 3 | `4-gpu-h100` | n/a |
| `base-c-test-8-gpu-h200` | 4 | `8-gpu-h200` | n/a |
| `base-c-test-8-gpu-h20` | 2 | `8-gpu-h20` | n/a |
| `base-c-test-deepep-4-gpu-h100` | 1 (no matrix) | `4-gpu-h100` | n/a |
| `base-c-test-4-gpu-b200` | 3 | `4-gpu-b200` | n/a |
| `base-c-test-4-gpu-b200-small` | 3 | `4-gpu-b200-low-disk` | n/a |
| `base-c-test-8-gpu-b200` | registered only | `8-gpu-b200` | n/a |
| `base-c-test-4-gpu-gb200` | registered only | `4-gpu-gb200` | n/a |
**Note:** Kernel suites (`base-b-kernel-*`) run via `pr-test-jit-kernel.yml` and `pr-test-sgl-kernel.yml`, not the main `pr-test.yml`. `base-c-test-8-gpu-b200` is registered in `test/run_suite.py` but not wired to PR CI. The GB200 job is currently commented out in `pr-test.yml` until a company-owned runner is provisioned. Multimodal diffusion uses `python/sglang/multimodal_gen/test/run_suite.py`, not `test/run_suite.py`.
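Workflow usage (a literal block scalar `|` keeps the shell line continuation intact; in a plain YAML scalar the `\` would be folded into an escaped space):

```yaml
strategy:
  matrix:
    partition: [0, 1, 2, 3, 4, 5, 6, 7]
steps:
  - run: |
      python3 run_suite.py --hw cuda --suite base-b-test-1-gpu-small \
        --auto-partition-id ${{ matrix.partition }} --auto-partition-size 8
```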
## `check-changes` Job

Determines which test suites to run based on file changes.

### Detection Methods

| Trigger | Method | Details |
|---|---|---|
| `pull_request` | `dorny/paths-filter` | Detects changes via the GitHub diff |
| `workflow_dispatch` (with `pr_head_sha`) | GitHub API | `repos/{repo}/compare/main...{sha}` |
| `schedule` / `run_all_tests` | Force all `true` | Runs everything |

### Output Flags

| Output | Triggers |
|---|---|
| `main_package` | Base A/B/C test suites |
| `sgl_kernel` | Kernel wheel builds + kernel test suites; outside `target_stage` mode, also switches B200 jobs to kernel-build runner labels |
| `jit_kernel` | JIT kernel test workflow |
| `multimodal_gen` | Multimodal-gen test workflow |
**Note:** In `target_stage` mode, `sgl_kernel` is only active when `include_wheel_build=true`. Without that opt-in, reruns with kernel changes fail validation instead of running a target stage without freshly built wheels. Outside `target_stage`, `sgl_kernel=true` switches B200 jobs from `4-gpu-b200`/`4-gpu-b200-low-disk` to `4-gpu-b200-kernel`/`4-gpu-b200-kernel-low-disk`.
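The flag rules can be sketched as below. The path prefixes in `PATH_RULES` are illustrative assumptions, not the repository's actual `dorny/paths-filter` configuration, and `compute_flags` / `b200_runner` are hypothetical names. The sketch only encodes the rules stated above: force-all for `schedule`/`run_all_tests`, `sgl_kernel` requiring `include_wheel_build` in `target_stage` mode, and the B200 runner-label switch.

```python
# Hypothetical path rules for illustration only; the real filters differ.
PATH_RULES = {
    "main_package": ("python/sglang/", "test/"),
    "sgl_kernel": ("sgl-kernel/",),
    "jit_kernel": ("python/sglang/jit_kernel/",),
    "multimodal_gen": ("python/sglang/multimodal_gen/",),
}


def compute_flags(changed_files, *, force_all=False, target_stage=None, include_wheel_build=False):
    """Map changed paths to check-changes outputs (sketch)."""
    if force_all:  # schedule runs or run_all_tests
        return dict.fromkeys(PATH_RULES, True)
    flags = {
        name: any(path.startswith(prefixes) for path in changed_files)
        for name, prefixes in PATH_RULES.items()
    }
    if target_stage and flags["sgl_kernel"] and not include_wheel_build:
        # Rerunning a stage with kernel changes needs freshly built wheels.
        raise ValueError("sgl_kernel changed: target_stage requires include_wheel_build=true")
    return flags


def b200_runner(base_label, flags, target_stage=None):
    """Swap in the kernel-build B200 label when sgl_kernel is set (outside target_stage)."""
    if flags["sgl_kernel"] and not target_stage:
        # "4-gpu-b200" -> "4-gpu-b200-kernel"; "4-gpu-b200-low-disk" -> "4-gpu-b200-kernel-low-disk"
        return base_label.replace("4-gpu-b200", "4-gpu-b200-kernel", 1)
    return base_label
```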
## Concurrency Control

```yaml
group: pr-test-{event_name}-{branch}-{pr_sha}-{stage}
```

| Segment | Source | Purpose |
|---|---|---|
| `event_name` | `github.event_name` | Prevents scheduled runs from colliding with fork PRs whose head branch is named `main` |
| `branch` | `github.head_ref \|\| github.ref_name` | Per-branch isolation |
| `pr_sha` | `inputs.pr_head_sha \|\| 'current'` | Isolates `/rerun-stage` from main runs |
| `stage` | `inputs.target_stage \|\| 'all'` | Allows parallel stage dispatches |

`cancel-in-progress` is `true` for `pull_request` events (a new push cancels the old run) and `false` for `workflow_call`.
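A small sketch of how the segments combine. Python's `or` stands in for the expression's `||` fallback. `concurrency_group` and `cancel_in_progress` are hypothetical helpers, and the inputs are plain strings rather than real `github`/`inputs` contexts.

```python
def concurrency_group(event_name, head_ref="", ref_name="", pr_head_sha="", target_stage=""):
    """Rebuild the group key from its four segments."""
    branch = head_ref or ref_name        # github.head_ref || github.ref_name
    pr_sha = pr_head_sha or "current"    # inputs.pr_head_sha || 'current'
    stage = target_stage or "all"        # inputs.target_stage || 'all'
    return f"pr-test-{event_name}-{branch}-{pr_sha}-{stage}"


def cancel_in_progress(event_name):
    # Only PR events cancel an older run in the same group; everything else queues.
    return event_name == "pull_request"


if __name__ == "__main__":
    # A scheduled run on main and a fork PR whose branch is also "main" stay separate.
    print(concurrency_group("schedule", ref_name="main"))
    # pr-test-schedule-main-current-all
    print(concurrency_group("pull_request", head_ref="main"))
    # pr-test-pull_request-main-current-all
    # A stage rerun pinned to a SHA never shares a group with the branch's full run.
    print(concurrency_group("workflow_dispatch", ref_name="feat", pr_head_sha="abc123", target_stage="base-b"))
    # pr-test-workflow_dispatch-feat-abc123-base-b
```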
## How To: Add a New Stage Job

1. Define the job in `pr-test.yml` with `needs: [check-changes, call-gate, wait-for-base-X, ...]`
2. Copy the `if:` condition pattern from an existing same-stage job (it handles `target_stage`, `schedule`, and `main_package`)
3. Add a `checkout` step
4. Add a `check-pr-test-health` step (after checkout). If any prior job failed, `core.setFailed()` fires and all subsequent steps auto-skip via the default `if: success()`
5. Add a `check-maintenance` step
6. Add a `download-artifact` step if `sgl_kernel` changed
7. Add an `install dependencies` step
8. Add a `run test` step with `$CONTINUE_ON_ERROR_FLAG`
9. Add an `upload-cuda-coredumps` step with `if: always()`
10. Register the suite name in `PER_COMMIT_SUITES` in `test/run_suite.py`
11. If using a matrix, add `--auto-partition-id` and `--auto-partition-size` to the run command
12. Update the `wait-for-base-X` job spec with the new job name and `expected_count` (if matrix)
13. Add the job to the `pr-test-finish.needs` list
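Steps 12 and 13 are easy to forget when editing YAML by hand. Below is a hedged sketch of a pre-push consistency check. It operates on plain dicts shaped like a parsed `jobs:` mapping (for example, from a YAML loader). `check_stage_wiring`, `matrix_size`, and the dict layout are hypothetical, not an existing SGLang script. The sketch also assumes each job's display name equals its job ID and that matrices use a single `partition` axis.

```python
def matrix_size(job: dict) -> int:
    """Number of matrix instances for a single `partition` axis (1 if no matrix)."""
    matrix = job.get("strategy", {}).get("matrix", {})
    return len(matrix.get("partition", [None]))


def check_stage_wiring(jobs: dict, wait_specs: list[dict], finish_needs: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the wiring looks consistent."""
    problems = []
    spec_by_prefix = {s["prefix"]: s["expected_count"] for s in wait_specs}
    for job_id, job in jobs.items():
        if job_id in spec_by_prefix and spec_by_prefix[job_id] != matrix_size(job):
            problems.append(
                f"{job_id}: expected_count={spec_by_prefix[job_id]} but matrix has {matrix_size(job)} entries"
            )
        if job_id not in finish_needs:
            problems.append(f"{job_id}: missing from pr-test-finish.needs")
    for prefix in spec_by_prefix:
        if prefix not in jobs:
            problems.append(f"{prefix}: listed in wait spec but no such job")
    return problems
```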
## How To: Debug CI Failures

| Symptom | Likely cause | What to check |
|---|---|---|
| All stage-B/C jobs green but steps skipped | An earlier job failed and `check-pr-test-health` triggered | Find the actual failed job (red X) |
| `wait-for-base-b` timeout | `expected_count` doesn't match the matrix size | Verify that the job spec counts match the `matrix:` array length |
| `pr-test-finish` fails but all jobs green | A job was cancelled (counts as a failure in finish) | Check for concurrency cancellation |
| Tests pass locally but fail in CI | Partition assignment, runner GPU type, or an inaccurate `est_time` | Check which partition the test lands in; verify the runner label |
| Flaky test retried and passed | Retriable failure (accuracy/perf) | Check for `[CI Retry]` markers in the job logs |
| Flaky test NOT retried | Matched a non-retriable pattern | Check whether the error matches `NON_RETRIABLE_PATTERNS` in `ci_utils.py` |
## Slash Commands

| Command | Effect |
|---|---|
| `/tag-run-ci-label` | Adds the `run-ci` label to the PR |
| `/tag-run-ci-label extra` | Adds both the `run-ci` and `run-ci-extra` labels |
| `/rerun-failed-ci` | Reruns failed jobs in the latest workflow run |
| `/tag-and-rerun-ci` | Adds the `run-ci` label + reruns failed jobs |
| `/tag-and-rerun-ci extra` | Adds both the `run-ci` and `run-ci-extra` labels + reruns failed jobs |
| `/rerun-stage <stage>` | Deprecated; posts a deprecation notice |
| `/rerun-test <test-file> [<test-file> ...]` | Reruns specific test file(s) via `rerun-test.yml`. A file argument containing a glob metacharacter (`*`, `?`, `[...]`) expands against `test/registered/` and the multimodal test dir to every matching `test_*.py` (e.g. `/rerun-test test_*backend*.py`; wrap it in backticks so GitHub doesn't italicize the `*`). Matches are deduped, grouped by dispatch shape, and can't carry a `::test` selector. No match → a single ⛔ reply, and nothing is dispatched. Each reply echoes its originating command (`Results for …`) so concurrent commands stay distinguishable |
| `/rerun-group <group> [<group> ...]` | Expands registered test groups, then reuses `/rerun-test` |

Handled by `scripts/ci/utils/slash_command_handler.py` → `.github/workflows/slash-command-handler.yml`.
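The `/rerun-test` glob rule can be approximated with the standard library. This is a sketch under assumptions, not the handler's code. `expand_test_args` is hypothetical, patterns are matched against file basenames with `fnmatch`, and the search roots are passed in rather than hard-coded (the real handler uses `test/registered/` and the multimodal test directory). Grouping by dispatch shape is omitted.

```python
from fnmatch import fnmatch
from pathlib import Path

GLOB_CHARS = set("*?[")


def expand_test_args(args: list[str], roots: list[Path]) -> list[str]:
    """Expand glob args to matching test_*.py files; pass plain args through."""
    result: list[str] = []
    for arg in args:
        if not GLOB_CHARS & set(arg):
            result.append(arg)  # plain file, possibly with a ::test selector
            continue
        if "::" in arg:
            raise ValueError(f"glob {arg!r} cannot carry a ::test selector")
        matches = sorted(
            str(p)
            for root in roots
            for p in root.rglob("test_*.py")
            if fnmatch(p.name, arg)
        )
        if not matches:
            # Mirrors "no match -> nothing dispatched": fail before returning anything.
            raise ValueError(f"no test file matches {arg!r}")
        result.extend(matches)
    # Dedupe while keeping first-seen order.
    return list(dict.fromkeys(result))
```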
### Label-gated workflow dispatch (`pr-test`, `pr-test-extra`)

`pr-test.yml` and `pr-test-extra.yml` both listen for `pull_request.labeled` (in addition to `opened`/`synchronize`/`reopened`). The `check-changes.if` gate has two clauses:

- **For `labeled` events:** the just-added label must be one of the gating labels (`run-ci` for pr-test; `run-ci` or `run-ci-extra` for pr-test-extra). Otherwise every unrelated label addition would dispatch a full CI run.
- **All events:** the PR must currently carry the required labels.

This is what lets `/tag-run-ci-label` (and its `extra` variant) trigger a fresh CI run without an extra push.
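The two-clause gate can be restated as a small predicate. This is an illustrative Python sketch of the condition described above, not the workflow's YAML expression. `should_run` and the `GATES` table are hypothetical, with label names taken from this guide: pr-test needs `run-ci`, and pr-test-extra needs both `run-ci` and `run-ci-extra`.

```python
from typing import Optional

# workflow -> (labels whose addition may trigger a run, labels the PR must carry)
GATES = {
    "pr-test": ({"run-ci"}, {"run-ci"}),
    "pr-test-extra": ({"run-ci", "run-ci-extra"}, {"run-ci", "run-ci-extra"}),
}


def should_run(workflow: str, action: str, pr_labels: set, added_label: Optional[str] = None) -> bool:
    trigger_labels, required = GATES[workflow]
    # Clause 1: a `labeled` event only counts if the added label is a gating label.
    if action == "labeled" and added_label not in trigger_labels:
        return False
    # Clause 2 (all events): the PR must carry every required label right now.
    return required <= set(pr_labels)
```

Because `pr_labels` comes from the event payload, re-evaluating a skipped run against the same frozen payload gives the same answer. This is why a label-skipped run can only be recovered by a fresh `labeled` event.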
**Caveat: skipped runs cannot be un-skipped by `run.rerun()`.** GitHub's rerun API reuses the original event payload. Rerunning a `pull_request`-event run that was skipped because of missing labels will therefore skip again, because the label set in the frozen payload doesn't update. The only way to recover a label-skipped run is to add the missing label, which fires a fresh `labeled` event with the current label set. `handle_rerun_failed_ci` in the slash handler is for rerunning failed (not label-skipped) runs; it cannot revive label-skipped ones.