snakemake-workflow-architecture

star 2

Snakemake workflow design for KINTSUGI SLURM processing: lambda resource routing, GPU-only cycle pre-assignment, multi-account scheduling, registration + QC aggregate rules. Trigger: Snakemake rules, workflow config, ruleorder, cycle assignment, DAG design, sentinel files, workflow run/check, registration, VALIS, QC reports.

smith6jt-cop By smith6jt-cop schedule Updated 2/24/2026

name: snakemake-workflow-architecture description: "Snakemake workflow design for KINTSUGI SLURM processing: lambda resource routing, GPU-only cycle pre-assignment, multi-account scheduling, registration + QC aggregate rules. Trigger: Snakemake rules, workflow config, ruleorder, cycle assignment, DAG design, sentinel files, workflow run/check, registration, VALIS, QC reports." author: KINTSUGI Team date: 2026-02-14

Snakemake Workflow Architecture for Multi-Account SLURM

Experiment Overview

Item Details
Date 2026-02-12
Goal Replace submit.sh orchestration with Snakemake while supporting multi-account GPU+CPU scheduling
Environment HiPerGator HPC, Snakemake >= 8.0, snakemake-executor-plugin-slurm, multiple SLURM accounts
Status Implemented

Context

The original submit.sh (~1500 lines) handled job orchestration, dependency wiring, skip-existing logic, and dual-pool slot calculation. Snakemake can handle all of this declaratively, but the multi-account GPU+CPU architecture requires careful design since Snakemake has no built-in concept of routing jobs to different accounts based on resource type.

Verified Workflow

3 Per-Cycle Rules with Lambda Resources + 1 Aggregate Rule (NOT 6 rules + ruleorder)

Each per-cycle processing step (stitch, deconvolve, edf) is a single rule. Account, partition, and resource allocation are set per-wildcard via lambda functions in the resources: block. Registration is an aggregate rule with static resources (see below).

rule stitch:
    input: ...
    output: "{project}/.snakemake_complete/stitch_cyc{cycle}"
    resources:
        slurm_partition=lambda wc: _assign(wc)["partition"],
        slurm_account=lambda wc: _assign(wc)["account"],
        gpus=lambda wc: 1 if _is_gpu(wc) else 0,
        cpus_per_task=lambda wc: 4 if _is_gpu(wc) else CPU_CPUS,
        runtime=lambda wc: RES.get("time_stitch", 240) * (1 if _is_gpu(wc) else CPU_TIME_MULT),
    envmodules: ...
    shell: "KINTSUGI_DEVICE_MODE={params.device_mode} python workflow/scripts/stitch.py ..."

Cycle Pre-Assignment (_build_cycle_assignment()) — GPU-Only

Cycles are assigned to accounts at DAG creation time (not runtime). All cycles route through GPU — CPU fallback was removed because GPU is 15-20x faster (see gpu-only-scheduling skill).

  1. GPU queue: Each account contributes gpu_slots entries
  2. Overflow: Round-robin across GPU accounts (still GPU mode, cycles queue in SLURM)

No CPU queue — overflow cycles wait for GPU slots to free up, which is dramatically faster than running on CPU.

Sentinel Files for Outputs

Rules use .snakemake_complete/stitch_cyc{cycle} sentinel files because:

  • Stitching/deconvolution produce hundreds of files (channels x z-planes)
  • EDF produces marker-named files that vary per cycle
  • Declaring every output would create an enormous, fragile DAG

A separate validate rule checks that all expected files exist.

Per-Cycle Pipeline Dependencies

Dependencies flow per-cycle, enabling pipelining:

stitch cyc01 → decon cyc01 → edf cyc01 ─┐
stitch cyc02 → decon cyc02 → edf cyc02 ─┼─→ registration (all cycles)
stitch cyc03 → decon cyc03 → edf cyc03 ─┘

Cycle 1's deconvolution starts the moment cycle 1's stitching finishes.

Rule 4: Registration (Aggregate, No Cycle Wildcard)

Registration (multi-cycle alignment via VALIS) processes ALL cycles in a single SLURM job. Unlike rules 1-3, it has no {cycle} wildcard — resources are static, not lambda-driven.

Rule 5: Signal Isolation (Aggregate, CPU-Only)

Added 2026-02-24. Signal isolation (autofluorescence subtraction) runs after registration, processing all registered channels. CPU-only (numpy/scipy), uses _QC_CPU_ASSIGN for partition routing (same as QC rules). See snakemake-signal-isolation skill for details.

Full pipeline: stitch → decon → edf (per-cycle) → registrationsignal_isolation (both aggregate) + 5 QC rules.

_REG_ASSIGN = _registration_assignment()  # Computed once at DAG creation

rule registration:
    input: all_edf_sentinels(),  # Depends on ALL EDF cycles completing
    output: sentinel="{project}/data/processed/registered/.snakemake_complete",
    resources:
        slurm_partition=_REG_ASSIGN["partition"],  # Static, not lambda
        slurm_account=_REG_ASSIGN["account"],
        gres="gpu:1" if _REG_ASSIGN["mode"] == "gpu" else "",
    script: "scripts/registration.py"

Key design decisions:

  • _registration_assignment(): Prefers first account with gpu_slots > 0 (for VggFD feature detector). Falls back to CPU with OrbFD.
  • Static resources: No lambda wildcards because there's no {cycle} wildcard to dispatch on.
  • Single-cycle handling: If len(CYCLES) < 2, copies EDF → registered without VALIS (no alignment needed).
  • Error fallback: On VALIS failure, copies EDF images unchanged (graceful degradation).
  • f-string output: Uses f"{PROJECT}/..." since there's no wildcard — same pattern as QC rules.

Cycle Directory Resolution

_resolve_raw_cycle_dir() handles multiple naming conventions at DAG creation time:

  • Long-form: cyc001_reg001_200210_170925
  • Short 3-digit: cyc001
  • Short 2-digit: cyc01
  • Capitalized: Cyc01

Config Format (workflow/config.yaml)

resources:
  accounts:
    - name: clive
      partition_gpu: "hpg-b200,hpg-turin"
      partition_cpu: hpg-default
      gpu_slots: 3
      cpu_slots: 11
    - name: maigan
      partition_gpu: "hpg-b200,hpg-turin"
      partition_cpu: hpg-default
      gpu_slots: 2
      cpu_slots: 8
  total_gpu_slots: 5
  total_cpu_slots: 19
  total_slots: 24
  cpu_utilization_cap: 0.85
  cpu_time_multiplier: 5
  cpu_cpus_per_task: 8

Legacy fallback: If accounts list is missing, reads old account_gpu/account_cpu scalars.

Registration parameters are also in config.yaml:

registration:
  reference_cycle: 1
  max_image_dim: 2048
  rigid_only: false
  feature_detector: "VggFD"

resources:
  mem_registration: 64000
  time_registration: 120
  cpu_mem_registration: 64000

CLI Commands

Command What it does
kintsugi workflow config . Discovers accounts via sacctmgr, generates config + copies Snakefile
kintsugi workflow check . Shows live per-account availability (allocation, in-use, available)
kintsugi workflow run . Submits via Snakemake with auto-calculated -j from live availability

workflow config always overwrites the Snakefile (so pipeline updates propagate) but only copies scripts/profiles if they don't already exist.

SLURM Profile (workflow/profiles/slurm/config.yaml)

Account and partition are set per-rule in the Snakefile via lambda resources, NOT in the profile. The profile provides:

  • executor: slurm
  • default-resources (mem_mb, runtime, cpus_per_task)
  • latency-wait: 120 (NFS propagation tolerance)
  • retries: 2 (automatic retry)
  • keep-going: true

Failed Attempts (Critical)

Attempt Why it Failed Lesson Learned
6 rules + ruleorder (stitch_gpu > stitch_cpu) ruleorder always picks the preferred rule — CPU variants NEVER execute Use lambda resources in a single rule to route per-wildcard
--resources gpus=N to limit GPU jobs Doesn't work with multi-account routing; Snakemake can't track per-account budgets Bake GPU budget into cycle pre-assignment
Runtime cycle assignment Race conditions, no reproducibility Pre-assign at DAG creation time for deterministic scheduling
Declaring all output files per rule Hundreds of files per stitch/decon (channels x z-planes), fragile DAG Use sentinel files + separate validation rule
sacctmgr show user USERNAME format=account Returns empty pipe on HiPerGator Use sacctmgr show associations user=USERNAME format=account -n -P
Including burst accounts (-b suffix) Burst QOS has unreliable memory; OOM kills Filter out accounts ending with -b
Setting account/partition in profile config.yaml Same settings for all rules — can't route GPU vs CPU Must be per-rule via lambda in Snakefile
gpus=1 or gpu=1 resource Both trigger SLURM_TRES_PER_TASK conflict on SLURM >= 24.11 Use gres="gpu:1" which maps to --gres=gpu:1
slurm_extra="'--gres=gpu:1'" Plugin blocks --gres in slurm_extra validation Use gres resource, not slurm_extra
No precommand in SLURM profile srun on compute nodes inherits bare shell — no conda env, cupy not importable Add precommand: "module load conda && conda activate KINTSUGI" to profile
No SLURM_TRES_PER_TASK cleanup SLURM >= 24.11 sets this env var in GPU jobs; jobstep plugin's srun inherits it and crashes Patch jobstep plugin __post_init__() to os.environ.pop("SLURM_TRES_PER_TASK", None)
CPU fallback for overflow cycles 15-20x slower — BaSiC fit() bottleneck (500 DCT iterations/z-plane) Route ALL cycles through GPU; queue in SLURM (see gpu-only-scheduling skill)
Hardcoded max_workers=4 in wrapper scripts Wasted half of 8 allocated CPU cores Read from snakemake.resources.cpus_per_task dynamically

Per-Channel Skip-Existing (Inside Wrapper Scripts)

Snakemake controls the DAG at the cycle level via sentinel files. If a sentinel is missing, Snakemake reruns the entire cycle. But if a job was interrupted after completing 3 of 4 channels, wrapper scripts now skip completed channels internally:

Script Completeness Check What Counts as "Complete"
stitch.py channel_complete(ch) All z-plane TIFs exist + result_df.pkl for CH1
deconvolve.py channel_decon_complete(ch) Decon TIF count >= stitched input TIF count
edf.py channel_edf_complete(ch) Marker-named output file exists

Key behaviors:

  • Partially-complete channels are fully reprocessed (only skip when ALL expected files exist)
  • If ALL channels were already complete, sentinel is still written and job exits 0
  • Sentinel files include skipped=N for log visibility
  • When the sentinel already exists, Snakemake skips the entire rule (coarser, still works)
# Pattern used in all 3 wrapper scripts:
channels_to_process = []
skipped_channels = []
for ch in CHANNELS:
    if channel_X_complete(ch):
        print(f"  Channel {ch} SKIPPED (...)")
        skipped_channels.append(ch)
    else:
        channels_to_process.append(ch)

# Process only incomplete channels
successful = sum(1 for _, ok in results if ok) + len(skipped_channels)

QC Report Rules (Aggregate)

Added 2026-02-13. Three aggregate rules produce rich QC reports after each processing stage completes across all cycles:

rule qc_stitch:
    input:
        stitch_done=_all_stitch_sentinels(),
    output:
        sentinel=f"{PROJECT}/qc_plots/.snakemake_complete_stitch",
    params:
        stage="stitch", ...
    resources:
        mem_mb=RES.get("mem_qc", 64000),
        slurm_partition=_REG_ASSIGN["partition"],
        slurm_account=_REG_ASSIGN["account"],
        gres="gpu:1" if _REG_ASSIGN["mode"] == "gpu" else "",
    script: "scripts/qc_report.py"

Key Design Decisions

Decision Rationale
Aggregate (all cycles), not per-cycle QC heatmaps need cross-cycle comparison — per-cycle would miss the point
Single dispatch script (qc_report.py) Avoids duplicating GPU init, path setup, and caching logic across 3 scripts
Reuses _REG_ASSIGN from registration QC rules have no cycle wildcard, can't use _assign(wc) — same pattern as registration
Cross-stage cache pickles (cache/stitch_stats.pkl) Each stage loads prior stage's stats for comparison plots
Included in rule all via all_qc_sentinels() QC runs automatically after processing; rule qc is standalone shortcut
matplotlib.use("Agg") before Kprocess import Compute nodes have no display; must set headless backend first
f-string outputs (not wildcard) QC sentinels use f"{PROJECT}/..." since there's no cycle wildcard

Config Resources

resources:
  mem_qc: 64000    # 64 GB
  time_qc: 120     # 2 hours

What Snakemake Replaces vs Keeps

Replaced by Snakemake Kept from submit.sh
Orchestration & dependency wiring Python processing scripts (workflow/scripts/*.py)
.complete marker polling KINTSUGI_DEVICE_MODE env var pattern
--array limit calculation Device-adaptive backends (CuPy/NumPy)
Dual-pool slot calculation (shared via hpc.py) Per-channel skip-existing (inside wrapper scripts)

Both systems produce files in the same data/processed/ tree. They can coexist but should NOT run simultaneously on the same project.

Registration is only available via Snakemake (not in submit.sh). It runs as a single aggregate job after all per-cycle EDF processing completes.

Key Insights

  • Lambda resources are the key mechanism — Snakemake has no built-in multi-account concept, but lambda functions in resources: give per-job control
  • Pre-assignment beats runtime scheduling — Deterministic, reproducible, and avoids race conditions
  • Sentinel files are a pragmatic compromise — Trade strict output tracking for a manageable DAG
  • Per-channel skip-existing complements sentinel-level control — Snakemake handles cross-rule dependencies; wrapper scripts prevent re-doing completed work within a single interrupted job
  • sacctmgr show associations is the correct SLURM queryshow user format doesn't work on all clusters
  • GPU-only scheduling — CPU is 15-20x slower for BaSiC; overflow cycles queue for GPU instead (see gpu-only-scheduling skill)
  • Dynamic worker counts — Wrapper scripts read cpus_per_task from Snakemake resources, not hardcoded 4
  • Always overwrite the Snakefile — Pipeline logic changes must propagate; user customization goes in config.yaml, not the Snakefile
  • precommand activates the existing conda env — Compute nodes need module load conda && conda activate KINTSUGI to access cupy, torch, etc.
  • SLURM_TRES_PER_TASK must be unset for jobstep srun — SLURM >= 24.11 bug; patch survives until upstream fix in snakemake-executor-plugin-slurm-jobstep
  • GPU resource: use gres="gpu:1" — Maps to --gres=gpu:1; both gpus and gpu resources trigger SLURM 24.11 TRES conflicts
  • Aggregate rules use static assignment, not lambdas — Registration and QC rules have no {cycle} wildcard, so use _REG_ASSIGN (from _registration_assignment()) computed once at DAG creation
  • Per-file script copy in workflow config — New scripts auto-propagate to projects without overwriting user-customized scripts

References

  • KINTSUGI CLAUDE.md - "Snakemake Workflow" and "Multi-Account Architecture" sections
  • gpu-only-scheduling skill - Why CPU fallback was removed (15-20x performance data)
  • slurm-concurrent-processing skill - Dual-pool resource calculation (submit.sh version, historical)
  • slurm-workflow-integration skill - Original SLURM CLI integration
  • workflow/Snakefile - Implementation
  • src/kintsugi/hpc.py - detect_multi_account_resources(), detect_live_multi_account()
  • src/kintsugi/cli.py - workflow config/check/run commands
Install via CLI
npx skills add https://github.com/smith6jt-cop/Skills_Registry --skill snakemake-workflow-architecture
Repository Details
star Stars 2
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator
smith6jt-cop
smith6jt-cop Explore all skills →