name: snakemake-rule
description: Generate or edit a Snakemake rule following project conventions. Invoke when adding a new rule, modifying an existing rule, or debugging a rule in code/rules/.
argument-hint: "[rule description or rule name]"
Snakemake Rule Conventions
Always invoked from code/ directory
All paths in rules are relative to code/. Key relative path prefixes:
- scratch/: large intermediates (not git-tracked)
- config/: config files
- ../analysis/: notebooks
- ../output/: small committed outputs
- ../logs/: job logs
- ../data/: raw data
Required rule structure
Every rule MUST have a log: directive, and the shell/script command MUST redirect stderr to it. Stdout goes to the log as well unless it carries the data output, as in this example:
rule my_rule:
    input:
        bam = "scratch/alignments/{sample}.bam",
        bai = "scratch/alignments/{sample}.bam.bai",
    output:
        counts = "scratch/counts/{sample}.txt",
    log:
        "../logs/my_rule.{sample}.log"
    conda:
        "envs/rnaseq.yaml"
    resources:
        mem_mb = 8000
    shell:
        """
        some_command {input.bam} > {output.counts} 2> {log}
        """
For multi-command shell blocks, redirect each command:
    shell:
        """
        cmd1 {input} > {output.tmp} 2> {log}
        cmd2 {output.tmp} >> {output.result} 2>> {log}
        """
For Python scripts where data output goes to a file (via --out flag), and stdout is only progress messages, capture both to the rule log so nothing is lost in unreliable slurm .out files:
    shell:
        "python scripts/myscript.py --in {input} --out {output} >{log} 2>&1"
For scripts that write data to stdout, use the usual split (>{output} 2>{log}).
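The redirect convention above can be spot-checked mechanically. The following is a minimal sketch, not part of the project tooling: the helper name commands_missing_log is hypothetical, and it only checks that each command mentions {log}, not that both streams are actually redirected. Shell comment lines would also be flagged.

```python
def commands_missing_log(shell_block):
    """Return the commands in a shell block that never mention {log}.

    Hypothetical helper for illustration; it does a plain text check only.
    """
    # Join backslash-continued lines so each command is checked as a whole.
    joined = shell_block.replace("\\\n", " ")
    commands = [line.strip() for line in joined.splitlines() if line.strip()]
    return [cmd for cmd in commands if "{log}" not in cmd]


# Both commands redirect to the rule log.
good = """
cmd1 {input} > {output.tmp} 2> {log}
cmd2 {output.tmp} >> {output.result} 2>> {log}
"""

# The second command forgets the log redirect.
bad = """
cmd1 {input} > {output.tmp} 2> {log}
cmd2 {output.tmp} > {output.result}
"""

print(commands_missing_log(good))
print(commands_missing_log(bad))
```

An empty list means every command in the block references the rule log.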
Conda env directive
- Use conda: "envs/myenv.yaml" for rule-specific envs
- Check code/envs/ for existing envs before creating a new one
- Do NOT create py_general.yaml: that name collides with the user's managed system env and causes confusion. Use a descriptive name like pysam_utils.yaml
- Always reference with path relative to Snakefile location (i.e., relative to code/)
Resources
- Default: mem_mb = 4000 (profile default)
- Specify higher for memory-intensive steps: mem_mb = 32000
- For multithreaded: add threads: 8 and use {threads} in shell command
Wildcard conventions
- {sample}: sample name
- {chrom}: chromosome (e.g., chr1)
- Use expand() in rule all or aggregate rules
Rule all / target rules
rule all:
    input:
        expand("scratch/counts/{sample}.txt", sample=config["samples"])
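To make the target list concrete, here is a minimal pure-Python sketch of the kind of list expand() yields for the pattern above. It is a stand-in written for illustration, not Snakemake's own implementation; the helper name expand_sketch and the sample names are hypothetical.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Illustrative stand-in for expand(): fill every combination of the
    given wildcard values into the pattern (hypothetical helper)."""
    names = list(wildcards)
    # Accept either a single string or a list of values per wildcard.
    value_lists = [
        [v] if isinstance(v, str) else list(v) for v in wildcards.values()
    ]
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*value_lists)
    ]


# Hypothetical sample names standing in for config["samples"].
samples = ["sampleA", "sampleB"]
targets = expand_sketch("scratch/counts/{sample}.txt", sample=samples)
print(targets)
```

Each element of the resulting list is one concrete file that rule all asks Snakemake to build.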
Config access
configfile: "config/config.yaml"
# Access as: config["key"]
When writing a new rule
- Check code/rules/ for existing rules to follow patterns
- Check code/envs/ before creating a new conda env
- Place rule in the most appropriate existing .smk file, or create a new one in code/rules/
- Add target outputs to rule all in the main Snakefile
Before submitting to cluster (IMPORTANT)
For rules that run scripts or commands on large files — always test locally first on one small sample before targeting all samples. This catches conda env issues, script bugs, and bad args cheaply.
Step 1: validate the conda env has the right packages
conda run -n <env> python -c "import pysam, numpy; print('ok')"
If creating a new code/envs/<name>.yaml, verify all script imports are present in that yaml.
Step 2: run the script directly on the smallest available sample
conda run -n <env> python scripts/myscript.py \
    --input alignment/smallest_sample.bam \
    --output scratch/test_out.tsv.gz 2>&1 | tail -20
Run to completion. Write test outputs to scratch/, not output/.
Step 3: target a single output for the first snakemake run
conda run -n sm_splicingmodulators snakemake --profile slurm_midway3 \
    output/path/smallest_sample.ext -T 0
Only after this passes → submit all samples.
Debugging cluster failures
- Slurm logs first: logs/slurm/<rule>.<jobid>.err (the actual error)
- Rule log: logs/<rule>/<wildcard>.log (stdout/stderr from the command)
- OOM: sacct ExitCode=1:0 with MaxRSS near the ReqMem ceiling = OOM kill. samtools sort -m X (memory per thread) overshoots by ~50%; rule of thumb: mem_mb ≥ (threads × -m value × 1.5) + 2G, with every term converted to MB
- sbatch not in PATH: slurm module loads only for login shells. Fix: module load slurm/current in ~/.zshrc_local
- PermissionError on scratch: shadow-prefix in profile config must match the node's actual scratch mount
- Unexpected re-runs after adding log: (adding or changing log: changes the rule code hash → snakemake re-runs the rule and all downstream)
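The samtools sort rule of thumb from the OOM item can be worked through as follows. This is a minimal sketch: the function name is hypothetical, and it assumes 1G = 1024 MB for both the -m value and the 2G headroom.

```python
import math


def samtools_sort_mem_mb(threads, per_thread_mb, overshoot=1.5, headroom_mb=2048):
    """Rule-of-thumb mem_mb for samtools sort: threads x -m x 1.5 + 2G.

    per_thread_mb is the `-m` value in MB; headroom_mb assumes 2G = 2048 MB.
    """
    return math.ceil(threads * per_thread_mb * overshoot) + headroom_mb


# Example: threads: 8 with `samtools sort -m 2G` (2048 MB per thread).
# 8 * 2048 * 1.5 = 24576, plus 2048 headroom = 26624
print(samtools_sort_mem_mb(threads=8, per_thread_mb=2048))
```

The result is a lower bound for resources: mem_mb; rounding up to a convenient value (for example 28000) leaves a little extra margin.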