name: hpc
version: 1.0
description: "Use when submitting jobs to UVA HPC (Rivanna/Afton), writing Slurm scripts (sbatch/srun/squeue), converting SGE to Slurm, running compute on any Slurm-managed cluster, or building WRDS data pipelines with polars on HPC. Triggers: 'submit to HPC', 'sbatch', 'squeue', 'slurm job', 'run on Rivanna', 'run on Afton', 'HPC array job', 'convert SGE to Slurm', 'polars on HPC', 'WRDS from HPC'."
user-invocable: false
Contents
- When to Use What
- Login Node Enforcement
- Cluster Reference
- Slurm Job Submission
- Array Jobs
- SGE to Slurm Translation
- Environment Variables
- Monitoring & Debugging
- Resource Billing
- WRDS Data Access
When to Use What
Three compute environments, each with a clear role:
| Environment | Use For | Examples |
|---|---|---|
| Local / RJDS | Exploration, prototyping, notebooks | EDA, quick plots, marimo/Jupyter, test on small samples, iterate on code |
| WRDS (SGE) | Data access, SAS ETL, file parsing | SAS jobs against WRDS libraries, SEC filing parsers on /wrds/sec/, scan_covers, ad-hoc SQL |
| UVA HPC (Slurm) | Scale compute | Model estimation (PIN), large polars pipelines, anything needing >10 cores or >1 hour |
The Workflow
1. EXPLORE (local/RJDS) → Prototype code, test on 5-10 items
2. BUILD DATA (WRDS) → SAS ETL or PostgreSQL queries (data lives there)
3. ESTIMATE AT SCALE (HPC) → sbatch when you need 100+ cores
4. ANALYZE RESULTS (local) → Pull results back, notebooks, regressions, tables
Decision Rules
- Does it need WRDS filesystem access (/wrds/sec/, SAS libraries)? → WRDS
- Is it CPU-intensive and embarrassingly parallel? → HPC
- Is it exploratory or iterative? → Local / RJDS
- Is it a quick SQL query? → Either WRDS or HPC (both have PostgreSQL access)
HPC Interactive Partition
The interactive partition (42 nodes, 12h max) is for testing sbatch scripts on one chunk before submitting 176 tasks, not for replacing local dev work:
salloc -p interactive --cpus-per-task=4 --mem=16G --time=1:00:00
# test your script, then exit and sbatch the real job
Why This Split Matters
PIN estimation proved it: WRDS SGE has 10 concurrent slots and took 8+ hours without starting OWR. UVA HPC ran 70+ OWR tasks simultaneously and finished in 30 minutes. But WRDS is still the right place to build the data — the SAS libraries and SEC filings live there.
Login Node Enforcement
IRON LAW: NEVER RUN COMPUTE ON THE LOGIN NODE
ALWAYS write a Slurm submission script and submit via sbatch. No exceptions.
- ssh uva-hpc 'python3 est.py owr 2020' → WRONG. Use sbatch.
- ssh uva-hpc 'nohup ./process &' → WRONG. Still the login node. Use sbatch.
- ssh uva-hpc 'for year in 2003..2024; do python3 ...; done' → WRONG. Use sbatch --array.
- sbatch run_est.sh owr → CORRECT.
The login node is for: sbatch, squeue, scancel, sinfo, scp, ls, head, short queries.
Login Node Facts
- Tests go through the scheduler too: write the sbatch script first and test with --array=1-1. The login-node "quick test" is the run that flags the account: one stock becomes 5,000 when the args change, and you don't know it "only takes 30 seconds" until it runs.
Red Flags — STOP If You're About To
- Write ssh uva-hpc 'python3 ... > output' → STOP. Write a submit script.
- Write ssh uva-hpc 'nohup ... &' → STOP. Use sbatch.
- Run a loop over years/permnos interactively → STOP. Use --array.
Cluster Reference
UVA HPC (Rivanna/Afton)
- SSH: ssh uva-hpc (configured with ProxyJump through Mac via tailnet)
- User: vwh7mb
- Home: /home/vwh7mb (GPFS, 12PB shared, no per-user quota displayed)
- Scratch: /scratch/vwh7mb/ (Weka, 12TB)
- Allocation: 10M SUs (service units ≈ weighted CPU-core-hours)
Partitions
| Partition | Nodes | CPUs/Node | RAM/Node | MaxTime | MinNodes | MaxNodes | Use For |
|---|---|---|---|---|---|---|---|
| standard | 301 | 40+ | 384GB+ | 7 days | 0 | 1 | Single-node jobs, array tasks |
| parallel | 179 | 96 | 768GB | 3 days | 2 | 64 | Multi-node MPI jobs only |
| gpu | 44 | 36+ | 257GB+ | 3 days | - | - | GPU workloads |
| interactive | 42 | 32+ | 128GB+ | 12 hrs | - | - | Interactive/debugging |
CRITICAL: Partition Selection
The parallel partition requires MinNodes=2 — it will reject single-node jobs with "Node count specification invalid". It is designed for MPI jobs that span multiple nodes.
Wrong: #SBATCH --partition=parallel for array jobs → submission fails
Right: #SBATCH --partition=standard for array jobs → 301 nodes available
When to Use Each Partition
standard (default choice for most research computing):
- Embarrassingly parallel work: array jobs where each task is independent (PIN estimation, file parsing, per-stock/per-year processing)
- Single-node Python/R with ProcessPoolExecutor, multiprocessing, mclapply
- Any job where tasks don't communicate with each other
- MaxNodes=1, so each array element runs on exactly one node
parallel (multi-node distributed computing):
- MPI jobs where processes on different nodes exchange messages (mpi4py, OpenMPI, MVAPICH)
- Dask distributed or Ray clusters spanning multiple nodes
- Large linear algebra / matrix factorizations that exceed single-node RAM (ScaLAPACK, PETSc)
- Simulations with inter-process communication (CFD, molecular dynamics)
- Key requirement: your code must explicitly coordinate across nodes (MPI, Dask scheduler, etc.); ProcessPoolExecutor and multiprocessing are single-node only
- MinNodes=2, 96 CPUs and 768GB RAM per node: use when one node isn't enough
gpu (GPU-accelerated workloads):
- Deep learning training/inference (PyTorch, TensorFlow, JAX)
- GPU-accelerated linear algebra (CuPy, RAPIDS)
- LLM inference or fine-tuning
interactive (debugging and development):
- Testing job scripts before full submission: salloc -p interactive --cpus-per-task=4 --mem=16G --time=1:00:00
- Debugging segfaults or data loading issues
- 12-hour max — not for production runs
Python/R Environment
- pixi: Install to $HOME/.pixi/bin/pixi via curl -fsSL https://pixi.sh/install.sh | bash
- Project envs: $HOME/projects/<name>/.pixi/envs/default/bin/python
- Modules (alternative): module load python (pixi is preferred for reproducibility)
- NEVER install Jupyter kernels globally on HPC
Slurm Job Submission
Basic Submit Script
#!/bin/bash
#SBATCH --job-name=my_job
#SBATCH --partition=standard
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=3:00:00
#SBATCH --output=logs/job-%A_%a.log
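# logs/ must exist before you run sbatch: Slurm opens the --output file before this
# script starts, so the mkdir below only covers later submissions.
mkdir -p logs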
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
PYTHON=$HOME/projects/my-project/.pixi/envs/default/bin/python
$PYTHON -u my_script.py --workers ${SLURM_CPUS_PER_TASK:-8}
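The submit script hands the allocated core count to the Python process through --workers. As a minimal sketch of how my_script.py might consume that flag, assuming a hypothetical per-item function `process_item` and a toy list of inputs (neither comes from the original script):

```python
import argparse
import os
from concurrent.futures import ProcessPoolExecutor


def process_item(item: int) -> int:
    """Hypothetical stand-in for the real per-item computation."""
    return item * item


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Fall back to the Slurm allocation, then to 1, when --workers is omitted.
    default_workers = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
    parser.add_argument("--workers", type=int, default=default_workers)
    return parser.parse_args(argv)


def run(items, workers):
    # One process per allocated core. The submit script pins BLAS/OpenMP
    # threads to 1, so the processes do not oversubscribe the node.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_item, items))


if __name__ == "__main__":
    args = parse_args()
    results = run(range(100), args.workers)
    print(f"processed {len(results)} items with {args.workers} workers", flush=True)
```

Sizing the pool from the Slurm allocation rather than `os.cpu_count()` matters on shared nodes, where the machine has more cores than the job was granted.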
Submission
sbatch script.sh # submit
sbatch script.sh arg1 arg2 # args passed to script as $1, $2
Note: as with SGE's qsub run.sh <model>, sbatch passes everything after the script name to the script as positional arguments. Use ${1:?Usage: sbatch script.sh <arg>} to enforce a required argument.
Array Jobs
Pattern
#SBATCH --array=1-176 # tasks 1 through 176
#SBATCH --array=1-176%50 # max 50 concurrent tasks
#SBATCH --array=1,5,9,13 # specific tasks only
Year × Chunk Sharding (PIN estimation pattern)
#SBATCH --array=1-176
# 22 years × 8 chunks = 176 tasks
# Decode: year = START_YEAR + (id-1)/NCHUNKS, chunk = (id-1)%NCHUNKS
NCHUNKS=8
START_YEAR=2003
idx=$((SLURM_ARRAY_TASK_ID - 1))
year=$((START_YEAR + idx / NCHUNKS))
chunk=$((idx % NCHUNKS))
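Off-by-one mistakes in this decode are easy to make, so it can help to check the mapping before submitting. The sketch below mirrors the shell arithmetic above in Python and confirms that tasks 1 through 176 cover each (year, chunk) pair exactly once; the function name `decode` is only for illustration.

```python
NCHUNKS = 8
START_YEAR = 2003


def decode(task_id: int) -> tuple[int, int]:
    """Map a 1-based SLURM_ARRAY_TASK_ID to (year, chunk)."""
    idx = task_id - 1
    return START_YEAR + idx // NCHUNKS, idx % NCHUNKS


# Every (year, chunk) pair should appear exactly once across tasks 1..176.
pairs = [decode(t) for t in range(1, 177)]
assert len(set(pairs)) == 176

print(decode(1), decode(8), decode(9), decode(176))
# (2003, 0) (2003, 7) (2004, 0) (2024, 7)
```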
Task List Sharding (file processing pattern)
# Equivalent to SGE's sed -n "${SGE_TASK_ID}p" pattern
ITEM=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$TASK_LIST")
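The same lookup can be done inside the Python worker instead of the shell. This is a sketch, assuming the submit script exports TASK_LIST as an environment variable; the helper name `nth_item` is hypothetical. Unlike sed, which prints nothing for an out-of-range line, it fails loudly when the array index exceeds the list length.

```python
import os
from pathlib import Path


def nth_item(task_list_text: str, task_id: int) -> str:
    """Return line `task_id` (1-based) of the task list, like sed -n "${N}p"."""
    lines = task_list_text.splitlines()
    if not 1 <= task_id <= len(lines):
        # sed would print nothing here; raising avoids a silent no-op task.
        raise IndexError(f"task {task_id} is outside 1..{len(lines)}")
    return lines[task_id - 1].strip()


if __name__ == "__main__" and "SLURM_ARRAY_TASK_ID" in os.environ:
    # TASK_LIST is assumed to be exported by the submit script.
    text = Path(os.environ["TASK_LIST"]).read_text()
    print(nth_item(text, int(os.environ["SLURM_ARRAY_TASK_ID"])))
```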
Re-running Failed Tasks
# Re-run specific tasks
sbatch --array=5,12,87 script.sh
# Re-run a range
sbatch --array=10-20 script.sh
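When many tasks fail, typing the list by hand gets error-prone. The helper below is a small sketch (the name `array_spec` is made up for this example) that compresses a list of task IDs into the comma-and-range syntax that --array accepts.

```python
def array_spec(task_ids) -> str:
    """Compress task IDs into an --array spec, e.g. [5, 6, 7, 12] -> '5-7,12'."""
    ids = sorted(set(task_ids))
    if not ids:
        return ""
    parts = []
    start = prev = ids[0]
    for i in ids[1:]:
        if i == prev + 1:
            prev = i  # still inside a consecutive run
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = i
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


print(array_spec([5, 12, 87]))       # 5,12,87
print(array_spec([10, 11, 12, 20]))  # 10-12,20
```

The result can be pasted directly into a resubmission, for example sbatch --array=10-12,20 script.sh.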
SGE to Slurm Translation
Directives
| SGE | Slurm | Notes |
|---|---|---|
| #$ -N job_name | #SBATCH --job-name=job_name | |
| #$ -cwd | (default behavior) | Slurm runs from submit dir by default |
| #$ -l m_mem_free=4G | #SBATCH --mem=4G | Per-node memory |
| #$ -pe onenode N | #SBATCH --ntasks=1 --cpus-per-task=N | Single-node parallel |
| #$ -j y | (default behavior) | Slurm merges stderr into stdout by default |
| #$ -o logs/out-$TASK_ID.log | #SBATCH --output=logs/out-%A_%a.log | %A=job, %a=array task |
| #$ -t 1-176 | #SBATCH --array=1-176 | |
| (no equivalent) | #SBATCH --partition=standard | Required; no default partition |
| (no equivalent) | #SBATCH --time=3:00:00 | Default 5h, max 7d on standard |
Environment Variables
| SGE | Slurm | Description |
|---|---|---|
| $SGE_TASK_ID | $SLURM_ARRAY_TASK_ID | Array task index |
| $JOB_ID | $SLURM_JOB_ID | Job ID |
| $NSLOTS | $SLURM_CPUS_PER_TASK | Allocated CPUs |
| $HOSTNAME | $SLURM_NODELIST | Assigned node(s) |
| $SGE_TASK_FIRST | $SLURM_ARRAY_TASK_MIN | First array index |
| $SGE_TASK_LAST | $SLURM_ARRAY_TASK_MAX | Last array index |
Commands
| SGE | Slurm | Description |
|---|---|---|
| qsub script.sh | sbatch script.sh | Submit job |
| qstat -u $USER | squeue -u $USER | List running jobs |
| qdel job_id | scancel job_id | Cancel job |
| qstat -j job_id | scontrol show job job_id | Job details |
| qacct -j job_id | sacct -j job_id | Job accounting |
| (no equivalent) | sinfo -p partition | Partition info |
Conversion Checklist
When converting an SGE script to Slurm:
- Replace #$ directives with #SBATCH equivalents (see table above)
- Add #SBATCH --partition=standard (SGE has no equivalent; partition is implicit)
- Add #SBATCH --time= (SGE defaults to unlimited on WRDS)
- Replace $SGE_TASK_ID → $SLURM_ARRAY_TASK_ID
- Replace $NSLOTS → $SLURM_CPUS_PER_TASK
- Replace $JOB_ID → $SLURM_JOB_ID
- Remove #$ -cwd and #$ -j y (Slurm defaults)
- Update log path variables: $TASK_ID → %a, $JOB_ID → %A
- Update data paths from WRDS scratch to HPC scratch
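Most of the checklist is mechanical text substitution, so a first pass can be scripted. The following is a rough sketch only, covering just the directives and variables listed in the tables above; the function name `convert` and the default time limit are assumptions for illustration. Data paths still need a manual review, and the output should be read before submitting.

```python
import re

# Directive rewrites taken from the translation table above.
DIRECTIVES = [
    (r"^#\$ -N (\S+)", r"#SBATCH --job-name=\1"),
    (r"^#\$ -l m_mem_free=(\S+)", r"#SBATCH --mem=\1"),
    (r"^#\$ -pe onenode (\d+)", r"#SBATCH --ntasks=1 --cpus-per-task=\1"),
    (r"^#\$ -t (\S+)", r"#SBATCH --array=\1"),
    (r"^#\$ -o (\S+)", r"#SBATCH --output=\1"),
]
# Environment variable renames; \b leaves names like MY_JOB_ID untouched.
VARIABLES = [
    (r"\bSGE_TASK_ID\b", "SLURM_ARRAY_TASK_ID"),
    (r"\bNSLOTS\b", "SLURM_CPUS_PER_TASK"),
    (r"\bJOB_ID\b", "SLURM_JOB_ID"),
]


def convert(script: str, time_limit: str = "3:00:00") -> str:
    """First-pass SGE -> Slurm conversion of a submit script (sketch)."""
    out = []
    for line in script.splitlines():
        if re.match(r"^#\$ -(cwd|j y)\s*$", line):
            continue  # both are Slurm defaults, so the directive is dropped
        if line.startswith("#$ -o "):
            # Log paths use filename patterns, not shell variables.
            line = line.replace("$TASK_ID", "%a").replace("$JOB_ID", "%A")
        for pattern, repl in DIRECTIVES:
            line = re.sub(pattern, repl, line)
        if not line.startswith("#SBATCH"):
            for pattern, repl in VARIABLES:
                line = re.sub(pattern, repl, line)
        out.append(line)
    # Partition and time have no SGE counterpart; add them after the shebang.
    at = 1 if out and out[0].startswith("#!") else 0
    out[at:at] = ["#SBATCH --partition=standard", f"#SBATCH --time={time_limit}"]
    return "\n".join(out) + "\n"
```

Unrecognized #$ lines pass through unchanged, which makes them easy to spot with grep afterwards.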
Monitoring & Debugging
Check Job Status
squeue -u $USER # all my jobs
squeue -j 12345678 # specific job
squeue -h -j 12345678 -t R | wc -l # count running tasks (-h drops the header line)
squeue -j 12345678 -t PD # show pending tasks + reasons
squeue -u $USER --format='%.10i %.9P %.12j %.2t %.10M %.4C %R' # detailed
Common Pending Reasons
| Reason | Meaning |
|---|---|
| (Priority) | Lower priority than other queued jobs; will run eventually |
| (Resources) | Not enough free nodes/CPUs; waiting for running jobs to finish |
| (QOSMaxCpuPerUserLimit) | Hit per-user CPU limit on this QOS |
| (AssocMaxJobsLimit) | Hit max concurrent jobs for this account |
Job Accounting (after completion)
sacct -j 12345678 --format=JobID,State,ExitCode,Elapsed,MaxRSS,NCPUS
sacct -j 12345678 -X --format=JobID,State,ExitCode # one row per array task (-X hides job steps)
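For large arrays it is easier to parse the accounting output than to read it. The sketch below assumes the text came from sacct with --parsable2 --noheader and --format=JobID,State,ExitCode, which gives pipe-delimited rows; the function name and the set of states treated as failures are choices made for this example.

```python
# States treated as failures in this sketch; adjust to taste.
FAILURE_STATES = ("FAILED", "TIMEOUT", "OUT_OF_MEMORY", "NODE_FAIL", "CANCELLED")


def failed_array_tasks(sacct_text: str) -> dict[int, str]:
    """Map array task ID -> state for tasks that ended in a failure state."""
    failed = {}
    for line in sacct_text.splitlines():
        fields = line.split("|")
        if len(fields) < 2:
            continue  # blank or malformed row
        job_id, state = fields[0], fields[1]
        if "." in job_id or "_" not in job_id:
            continue  # skip job steps (.batch, .extern) and non-array rows
        task = job_id.split("_", 1)[1]
        # Pending ranges such as 12345678_[5-176] are not digits and are skipped.
        if task.isdigit() and state.startswith(FAILURE_STATES):
            failed[int(task)] = state
    return failed
```

The keys of the returned dict are the task IDs to pass back to sbatch --array for a rerun.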
Log Files
Output goes to --output path. With %A_%a pattern:
- logs/est-12345678_1.log: job 12345678, array task 1
- Check for errors: grep -rl 'Error\|Traceback' logs/est-12345678_*.log
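The grep above lists file names; turning those into task IDs is one more step. A sketch of doing both in Python, assuming logs follow the <prefix>-<jobid>_<taskid>.log naming shown above (the function name `failed_tasks` is hypothetical):

```python
import re
from pathlib import Path

LOG_NAME = re.compile(r"-(\d+)_(\d+)\.log$")  # ...-<jobid>_<taskid>.log
ERROR = re.compile(r"Error|Traceback")


def failed_tasks(log_dir: Path, job_id: int) -> list[int]:
    """Return sorted array task IDs whose log mentions Error or Traceback."""
    failed = []
    for path in log_dir.glob(f"*-{job_id}_*.log"):
        match = LOG_NAME.search(path.name)
        # errors="replace" keeps stray binary output from aborting the scan.
        if match and ERROR.search(path.read_text(errors="replace")):
            failed.append(int(match.group(2)))
    return sorted(failed)
```

A call such as `failed_tasks(Path("logs"), 12345678)` returns the IDs whose logs need a closer look. A clean log does not prove success, so it is worth cross-checking against sacct states.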
Resource Billing
UVA HPC bills in Service Units (SUs), which are weighted CPU-core-hours:
SU = (CPU_cores × 4.6369 + Memory_GB × 0.2842) × hours
Cost Examples (standard partition)
| Config | SU/hour | 176 tasks × 3 hrs |
|---|---|---|
| 1 CPU, 4GB | ~5.8 | ~3,062 |
| 8 CPU, 32GB | ~46.2 | ~24,394 |
| 40 CPU, 160GB | ~231 | ~121,968 |
With 10M SUs allocated, even aggressive usage (8 CPU × 176 tasks × 3 hrs = ~24K SUs) is negligible (<0.25% of allocation).
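To estimate the cost of a different configuration before submitting, the billing formula can be evaluated directly. This is a small sketch using the two weights from the formula above; the function name `su_cost` is made up for the example.

```python
CPU_WEIGHT = 4.6369  # SUs per core-hour, from the billing formula
MEM_WEIGHT = 0.2842  # SUs per GB-hour, from the billing formula


def su_cost(cores: int, mem_gb: float, hours: float, tasks: int = 1) -> float:
    """Service units billed for `tasks` identical jobs of the given size."""
    return (cores * CPU_WEIGHT + mem_gb * MEM_WEIGHT) * hours * tasks


for cores, mem in [(1, 4), (8, 32), (40, 160)]:
    per_hour = su_cost(cores, mem, hours=1)
    total = su_cost(cores, mem, hours=3, tasks=176)
    print(f"{cores:>2} CPU, {mem:>3} GB: {per_hour:7.1f} SU/hour, {total:10,.0f} SUs total")
```

The unrounded totals (about 3,049, 24,388, and 121,941 SUs) come out slightly below the table, which multiplies the rounded hourly rate.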
Check Balance
allocations # show allocation balance
allocations -a myallocation # specific allocation
WRDS Data Access
WRDS PostgreSQL is accessible from HPC compute nodes. Use polars + connectorx for fast data pipelines that replace SAS entirely.
Connection
- Host: wrds-pgdata.wharton.upenn.edu:9737
- Credentials: ~/.pgpass (chmod 600)
- User: edwin_hu (UVA account)
Quick Start
from wrds_conn import read_wrds
# WRDS SQL → polars DataFrame in one line
df = read_wrds("SELECT * FROM crsp.msf WHERE date >= '2020-01-01'")
# Write to Parquet for reuse
df.write_parquet("/scratch/vwh7mb/data/crsp_msf.parquet")
wrds_conn.py (see examples/wrds_conn.py) parses .pgpass and builds a connectorx-compatible URI — connectorx doesn't read .pgpass natively.
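The actual helper lives in examples/wrds_conn.py and is not reproduced here. As a rough illustration of the idea only, the sketch below parses .pgpass text (format hostname:port:database:username:password) and builds a postgresql:// URI. The function name `wrds_uri` and the default database name `wrds` are assumptions, and escaped colons inside .pgpass fields are not handled.

```python
from urllib.parse import quote

WRDS_HOST = "wrds-pgdata.wharton.upenn.edu"
WRDS_PORT = "9737"


def wrds_uri(pgpass_text: str, database: str = "wrds") -> str:
    """Build a postgresql:// URI from the first .pgpass line matching WRDS."""
    for line in pgpass_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        parts = line.split(":", 4)
        if len(parts) != 5:
            continue  # malformed entry
        host, port, db, user, password = parts
        if host in (WRDS_HOST, "*") and port in (WRDS_PORT, "*"):
            db = database if db == "*" else db
            # Percent-encode credentials so characters like @ or / stay valid.
            return (
                f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
                f"@{WRDS_HOST}:{WRDS_PORT}/{db}"
            )
    raise LookupError(f"no .pgpass entry for {WRDS_HOST}:{WRDS_PORT}")
```

A URI of this shape is what connectorx-backed readers such as polars' `read_database_uri` expect. Keep the resulting string out of logs, since it contains the password.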
Pipeline: SQL → polars → Parquet (replaces SAS)
Old: WRDS SAS → .sas7bdat (7GB) → Python HDF5 conversion → .h5 (390MB)
New: WRDS PostgreSQL → polars/connectorx → .parquet
No SAS license needed. Single step. Portable output.
See references/wrds-polars-pipeline.md for full examples (joins, partitioned output, Slurm submission for large queries).