name: lrz-remote
description: Manage R-based HPC workflows on the LRZ Linux Cluster (CoolMUC-4) via SSH from the user's local machine. Use this skill whenever the user wants to run R scripts on the LRZ cluster, submit SLURM jobs, install R packages on LRZ, monitor running jobs, fetch results, or set up their LRZ compute environment. Also trigger when the user mentions CoolMUC, LRZ HPC, parallelized R, mclapply on a cluster, SLURM + R, or remote HPC job management. This skill assumes Claude Code is running locally and reaches LRZ through a pre-configured SSH multiplexed connection.
LRZ Remote: R on the LRZ Linux Cluster (CoolMUC-4)
How This Works
You (Claude Code) are running on the user's laptop. The user has an SSH alias lrz configured with ControlMaster multiplexing — meaning a single authenticated session stays open, and all your ssh lrz "..." commands piggyback on it with no password prompts.
You never run computation on the login node. Everything goes through SLURM batch jobs. The login node is only for file management, job submission, queue inspection, and small setup tasks.
Prerequisites — Verify Before Doing Anything
Before your first remote command, verify the connection is alive:
ssh lrz "hostname"
If this hangs or errors, tell the user:
"Your SSH tunnel to LRZ doesn't seem to be active. Please open a terminal and run ssh lrz to authenticate (you'll need your password + 2FA). Leave that session open, then come back here."
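The plain hostname check can hang if the master connection has died. A minimal sketch of a non-blocking variant follows; the function name lrz_check is hypothetical, and the alias is assumed to be lrz unless another is passed in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the multiplexed connection is usable.
# The alias defaults to "lrz"; pass another alias as the first argument.
lrz_check() {
  local alias="${1:-lrz}"
  # BatchMode=yes: fail instead of prompting for a password or 2FA token.
  # ConnectTimeout=10: give up after 10 seconds instead of hanging.
  if ssh -o BatchMode=yes -o ConnectTimeout=10 "$alias" hostname >/dev/null 2>&1; then
    echo "connected"
  else
    echo "not connected: ask the user to run 'ssh $alias' in a terminal"
    return 1
  fi
}
```

Calling lrz_check before the first remote command gives a clear pass/fail result without risking an interactive prompt.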
User Configuration
The user MUST provide the following before you can do real work. If any are missing, ask for them.
| Setting | Example | How to discover |
|---|---|---|
| LRZ username | rs86ghi2 | Given |
| SSH alias | lrz | Check ~/.ssh/config |
| Home directory | /dss/dsshome1/lxc04/rs86ghi2 | ssh lrz "echo \$HOME" |
| Work/scratch directory | /dss/dssfs04/... or similar | ssh lrz "echo \$SCRATCH_DSS \$WORK" |
| Project account (if needed) | pn12ab | ssh lrz "sacctmgr show assoc user=\$USER format=Account -n" |
| Email for SLURM notifications | bla.blub@stat.uni-muenchen.de | Ask user |
Store these in a local project file (e.g., .lrz_config) so you don't have to re-ask.
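The layout of .lrz_config is not prescribed here. One possible sketch, assuming plain KEY=VALUE lines and hypothetical helper names (lrz_config_set, lrz_config_get) and key names:

```shell
#!/usr/bin/env bash
# Assumed format: one KEY=VALUE pair per line in .lrz_config.
# Key names such as LRZ_USER are illustrative, not prescribed.

# lrz_config_set FILE KEY VALUE: add or replace a single key.
lrz_config_set() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"
  # Keep every line except an existing entry for this key.
  if [ -f "$file" ]; then
    grep -v "^${key}=" "$file" > "$tmp" || true
  fi
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}

# lrz_config_get FILE KEY: print the value, or fail if the key is missing.
lrz_config_get() {
  local file="$1" key="$2" line
  line="$(grep "^${key}=" "$file" 2>/dev/null | tail -n 1)"
  [ -n "$line" ] || return 1
  printf '%s\n' "${line#*=}"
}
```

For example, lrz_config_set .lrz_config LRZ_USER rs86ghi2 records the username, and lrz_config_get .lrz_config LRZ_USER reads it back in a later session.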
Local ↔ Remote Directory Convention
Local project directories and their LRZ counterparts share the same folder name and are typically clones of the same git repository. Example:
- Local: ~/myproject/
- Remote: ~/myproject/ (same name, under the LRZ home)
Prefer git pull over scp to sync script changes when both sides are git clones — it's cleaner and keeps history intact. Use scp only for data files or when the LRZ copy isn't a git repo.
# Sync R script changes via git (preferred)
# local: git push
# then: ssh lrz "cd ~/myproject && git pull"
# Fallback: scp a specific file
scp local_script.R lrz:~/myproject/scripts/
The R Environment on LRZ
Loading R
R is available through the module system. Always load slurm_setup first in batch scripts:
module load slurm_setup
module load r/{{VERSION}}-gcc13-mkl # e.g. r/4.3.3-gcc13-mkl
To discover available R versions:
ssh lrz "module av r 2>&1 | grep -Ei '^\s*r/[0-9]'"
Modules follow the pattern r/<version>-<compiler>-mkl, e.g., r/4.3.3-gcc13-mkl. The MKL-linked versions give better linear algebra performance. Always use the explicit versioned name (e.g., module load r/4.3.3-gcc13-mkl) rather than module load r. The -gcc13-mkl variant already bundles gcc — no separate gcc load/unload needed.
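To turn the module listing into a single explicit module name, the output can be filtered locally. This is a sketch under two assumptions: module names match r/<version>-gcc13-mkl with a three-part version, and the local sort supports -V (version sort). The function name latest_r_module is hypothetical.

```shell
#!/usr/bin/env bash
# Reads `module av r` output on stdin and prints the newest
# r/<version>-gcc13-mkl module name found in it.
# Assumes `sort -V` (version sort) is available on the local machine.
latest_r_module() {
  grep -oE 'r/[0-9]+\.[0-9]+\.[0-9]+-gcc13-mkl' | sort -V | tail -n 1
}
```

Usage: ssh lrz "module av r 2>&1" | latest_r_module. The result can be passed straight to module load, which keeps job scripts pinned to an explicit version.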
Installing R Packages
Package installation is lightweight — run it on the login node via SSH. Use pak::pkg_install(), not install.packages(): pak handles dependency version conflicts correctly when older package versions are already loaded in the R session; install.packages() can silently install to the wrong library or fail with namespace conflicts.
Step 1: Discover the active user library — do this before creating any directories or editing .Rprofile. The user may already have a working setup, possibly with a versioned library path from an older R installation.
# Check for existing .Rprofile and actual .libPaths()
ssh lrz "cat ~/.Rprofile 2>/dev/null || echo 'No .Rprofile'"
ssh lrz 'module load slurm_setup && module load r/<VERSION>-gcc13-mkl && Rscript -e ".libPaths()"'
The first writable path in .libPaths() is where packages will be installed. Pass that path explicitly via the lib argument of pak::pkg_install(). Do not overwrite an existing .Rprofile; append to it only if the user library path is missing.
Step 2: Install packages with pak
ssh lrz 'module load slurm_setup && module load r/<VERSION>-gcc13-mkl && \
Rscript -e "pak::pkg_install(c(\"data.table\", \"future\", \"future.apply\"), \
lib=\"~/R/x86_64-pc-linux-gnu-library/<MAJOR.MINOR>\", ask=FALSE)"'
Replace <VERSION> with the R version used in the module name (e.g., 4.3.3, giving r/4.3.3-gcc13-mkl) and <MAJOR.MINOR> with the R major.minor version (e.g., 4.3). If pak is not yet installed, install it first with install.packages("pak").
Watch out: If .Rprofile hardcodes a versioned library directory (e.g., 4.4/) from a previous R installation that is no longer the active version, that directory takes precedence in .libPaths(). The right fix is to make .Rprofile version-dynamic, i.e., derive the directory name from R.version$major and R.version$minor instead of hardcoding it (this case is also listed under Common Issues). As an immediate workaround, install packages to whichever path is first in .libPaths().
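A version-dynamic .Rprofile could look like the sketch below. It appends a marked block to the remote ~/.Rprofile only if that marker is absent, so an existing file is never overwritten. The helper name lrz_fix_rprofile and the marker comment are hypothetical, and the library location ~/R/<platform>-library/<major.minor> is assumed to match the path used in the install example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: appends a version-dynamic user-library block to the
# remote ~/.Rprofile, but only if the marker comment is not already there.
lrz_fix_rprofile() {
  local alias="${1:-lrz}"
  # grep reads the file, not stdin, so the here-document is consumed by cat
  # only when the marker is missing. Existing content is left untouched.
  ssh "$alias" 'grep -qs "lrz-remote: dynamic libpath" ~/.Rprofile || cat >> ~/.Rprofile' <<'EOF'

# lrz-remote: dynamic libpath
local({
  minor <- sub("[.].*$", "", R.version$minor)
  lib <- path.expand(file.path(
    "~", "R", paste0(R.version$platform, "-library"),
    paste(R.version$major, minor, sep = ".")
  ))
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)
  .libPaths(c(lib, .libPaths()))
})
EOF
}
```

Because the block runs last and prepends its directory, it comes first in .libPaths() even if an older hardcoded line is still present. That stale line is not removed automatically, so review the file with the user afterwards.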
For packages with system dependencies (like sf, terra), you may need additional modules:
ssh lrz "module av 2>&1 | grep -i <depname>"
Submitting SLURM Jobs
Choosing a Partition
| Partition | Use case | Nodes shared? | Max cores/job | Max time |
|---|---|---|---|---|
| serial_std | Small R jobs (1-few cores) | Yes (shared) | Limited | 48h |
| serial_long | Long-running small jobs | Yes (shared) | Limited | 96h+ |
| cm4_tiny | Multi-core, single-node | Yes (shared) | up to full node | 24-48h |
| cm4_std | Full-node exclusive | No (exclusive) | full node | 48h |
| cm4_inter | Interactive testing | Yes (shared) | varies | short |
Default recommendation for parallelized R: Use serial_std for jobs needing ≤16 cores, cm4_tiny for single-node multi-core jobs needing more, and cm4_std only when you truly need a whole node.
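The recommendation above can be written down as a small decision function. This is only a sketch: the name pick_partition is hypothetical, and the 16-core threshold is the one stated in the text, not a limit queried from SLURM.

```shell
#!/usr/bin/env bash
# pick_partition NCPUS [FULL_NODE]
# Prints "<cluster> <partition>" following the default recommendation.
# FULL_NODE=yes selects the exclusive full-node partition.
pick_partition() {
  local ncpus="$1" full_node="${2:-no}"
  if [ "$full_node" = "yes" ]; then
    echo "cm4 cm4_std"
  elif [ "$ncpus" -le 16 ]; then
    echo "serial serial_std"
  else
    echo "cm4 cm4_tiny"
  fi
}
```

The first word of the output is the value for --clusters and the second the value for --partition, which keeps the two directives consistent with each other.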
SLURM Job Script Template for R
Every job script you generate should follow this template. Adapt the resource requests to the task.
#!/bin/bash
#SBATCH -J {{JOB_NAME}}
#SBATCH -o ./logs/%x.%j.out
#SBATCH -e ./logs/%x.%j.err
#SBATCH -D {{WORKING_DIR}}
#SBATCH --get-user-env
#SBATCH --clusters=cm4
#SBATCH --partition={{PARTITION}}
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task={{NCPUS}}
#SBATCH --mem={{MEMORY_MB}}mb
#SBATCH --time={{HH:MM:SS}}
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user={{EMAIL}}
#SBATCH --export=NONE
# Load modules
module load slurm_setup
module load r/{{R_VERSION}}-gcc13-mkl # e.g. r/4.3.3-gcc13-mkl
# Run the R script
Rscript {{SCRIPT_PATH}}
Critical rules for SLURM scripts:
- ALWAYS include --export=NONE (LRZ policy: the environment should be set up inside the script via modules)
- ALWAYS include --get-user-env
- ALWAYS create the logs directory before submitting: ssh lrz "mkdir -p {{WORKING_DIR}}/logs"
- ALWAYS specify --clusters=cm4 for CoolMUC-4
- Use --clusters=serial for the serial partitions
- Set --mem explicitly on shared partitions (memory scales with cores by default, which may not be enough)
- LRZ requires --mail-user to be set
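Before uploading a generated job script, the rules above can be checked mechanically. The following sketch uses a hypothetical function name, lint_job_script, and only verifies that each option appears on an #SBATCH line; it does not validate the values.

```shell
#!/usr/bin/env bash
# lint_job_script FILE
# Prints one line per missing directive and returns non-zero if any is missing.
lint_job_script() {
  local file="$1" status=0 opt
  for opt in --export=NONE --get-user-env --clusters= --mem= --mail-user=; do
    # "--" stops option parsing so the pattern is not read as a grep flag.
    if ! grep -Eq -- "^#SBATCH[[:space:]].*${opt}" "$file"; then
      echo "missing: ${opt}"
      status=1
    fi
  done
  return "$status"
}
```

Running lint_job_script job.slurm locally before the scp step catches a forgotten directive without spending a queue slot on a job that would fail immediately.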
The Workflow: Upload → Submit → Monitor → Download
Step 1: Upload files
# Upload R script and any data files
scp local_script.R lrz:~/project/scripts/
scp data.csv lrz:~/project/data/
Step 2: Create and upload job script
scp job.slurm lrz:~/project/
Step 3: Submit
ssh lrz "cd ~/project && sbatch job.slurm"
# Returns: "Submitted batch job XXXXXX on cluster cm4"
# Save the job ID!
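To save the job ID without copying it by hand, the sbatch message can be parsed. This sketch assumes the "Submitted batch job <id> on cluster <name>" wording shown above; the function name parse_job_id is hypothetical.

```shell
#!/usr/bin/env bash
# Reads sbatch output on stdin and prints the numeric job ID.
# Prints nothing if the submission message is not found (e.g. on an error).
parse_job_id() {
  sed -n 's/^Submitted batch job \([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

Usage: JOB_ID="$(ssh lrz "cd ~/project && sbatch job.slurm" | parse_job_id)". An empty JOB_ID signals that the submission failed and the sbatch error output should be inspected. sbatch also has a --parsable flag that prints the ID (and the cluster name, separated by a semicolon) in machine-readable form, which may be a simpler route.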
To resubmit with a different R script (e.g. after fixing an error) without editing or reuploading the .slurm file:
ssh lrz "cd ~/project && sed 's/old_script/new_script/' job.slurm | sbatch"
Step 4: Monitor
# Check queue status ({{CLUSTER}} must match --clusters in the job script: cm4 or serial)
ssh lrz "squeue --clusters={{CLUSTER}} -j {{JOB_ID}} 2>/dev/null"
Important: if the job is no longer in squeue, it has already completed (or failed) — this is normal for short jobs. Do not interpret absence from the queue as an error. Immediately check the logs:
# stdout — R output and progress messages
ssh lrz "cat ~/project/logs/{{JOB_NAME}}.{{JOB_ID}}.out"
# stderr — module load messages appear here even on success; look for R errors/warnings
ssh lrz "grep -v 'Loading\|requirement' ~/project/logs/{{JOB_NAME}}.{{JOB_ID}}.err || true"
# Job accounting (confirms exit code and resource use; {{CLUSTER}} is cm4 or serial, as in the job script)
ssh lrz "sacct --clusters={{CLUSTER}} -j {{JOB_ID}} --format=JobID,State,ExitCode,Elapsed,MaxRSS"
The .err log always contains module load lines (e.g. Loading r/4.3.3-gcc13-mkl) — these are not errors. A successful job has ExitCode 0:0 in sacct and the expected output in .out.
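Since a finished job disappears from squeue, polling sacct is a more reliable way to wait for completion. The sketch below uses hypothetical function names (lrz_job_state, lrz_wait_job) and assumes the SSH alias is lrz. The poll interval and maximum poll count are arbitrary defaults.

```shell
#!/usr/bin/env bash
# lrz_job_state CLUSTER JOB_ID
# Prints the state of the job allocation (e.g. COMPLETED, FAILED, RUNNING).
# -X limits output to the allocation, -n drops the header,
# -P prints unpadded, pipe-delimited fields.
lrz_job_state() {
  local cluster="$1" job_id="$2"
  ssh lrz "sacct --clusters=$cluster -j $job_id -X -n -P --format=State" \
    | head -n 1 | awk '{print $1}'
}

# lrz_wait_job CLUSTER JOB_ID [INTERVAL_SECONDS] [MAX_POLLS]
# Polls until the job reaches a final state, then prints that state.
# Prints TIMEOUT and returns 1 if MAX_POLLS is exhausted first.
lrz_wait_job() {
  local cluster="$1" job_id="$2" interval="${3:-60}" max_polls="${4:-120}"
  local state i=0
  while [ "$i" -lt "$max_polls" ]; do
    state="$(lrz_job_state "$cluster" "$job_id")"
    case "$state" in
      # An empty state means accounting has no record yet; keep waiting.
      PENDING|RUNNING|COMPLETING|REQUEUED|"") sleep "$interval" ;;
      *) echo "$state"; return 0 ;;
    esac
    i=$((i + 1))
  done
  echo "TIMEOUT"
  return 1
}
```

For example, lrz_wait_job cm4 "$JOB_ID" 30 checks every 30 seconds. Any result other than COMPLETED is a cue to read the .err log before doing anything else.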
Step 5: Download results
# Quote the remote glob so the local shell does not try to expand it
scp 'lrz:~/project/results/*' ./results/
# Or for many files:
scp -r lrz:~/project/results/ ./local_results/
Parallelized R Patterns
Pattern 1: Single-node multicore (most common)
Use parallel::mclapply or future.apply::future_lapply. This is the simplest and usually sufficient.
R script template:
library(parallel)
# Detect cores allocated by SLURM (not all cores on the node!)
ncores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
cat("Using", ncores, "cores\n")
# Your computation
results <- mclapply(1:100, function(i) {
# expensive computation here
Sys.sleep(0.1)
return(i^2)
}, mc.cores = ncores)
# Save results
saveRDS(results, "results/output.rds")
Or with the future framework:
library(future)
library(future.apply)
ncores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
plan(multicore, workers = ncores)
results <- future_lapply(1:100, function(i) {
# expensive computation here
return(i^2)
})
saveRDS(results, "results/output.rds")
Pattern 2: SLURM Job Arrays (embarrassingly parallel)
When you have many independent tasks (e.g., run the same analysis on 50 datasets), use SLURM job arrays instead of R-level parallelism. This is more efficient and robust.
Job script:
#!/bin/bash
#SBATCH -J array_job
#SBATCH -o ./logs/%x.%A_%a.out
#SBATCH -e ./logs/%x.%A_%a.err
#SBATCH -D /path/to/workdir
#SBATCH --get-user-env
#SBATCH --clusters=serial
#SBATCH --partition=serial_std
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4000mb
#SBATCH --time=02:00:00
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=user@example.com
#SBATCH --export=NONE
#SBATCH --array=1-50
module load slurm_setup
module load r/{{R_VERSION}}-gcc13-mkl # e.g. r/4.3.3-gcc13-mkl
Rscript run_task.R $SLURM_ARRAY_TASK_ID
R script (run_task.R):
args <- commandArgs(trailingOnly = TRUE)
task_id <- as.integer(args[1])
cat("Running task", task_id, "\n")
# Load the task-specific data or parameters
# ... do work ...
saveRDS(result, paste0("results/result_", task_id, ".rds"))
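With job arrays, individual tasks can fail while the rest succeed. A sketch for finding the gaps, assuming the result_<id>.rds naming used above and a hypothetical function name missing_tasks:

```shell
#!/usr/bin/env bash
# missing_tasks DIR N
# Prints a comma-separated list of task IDs in 1..N whose result file
# DIR/result_<id>.rds does not exist (empty output means all are present).
missing_tasks() {
  local dir="$1" n="$2" i=1 missing=""
  while [ "$i" -le "$n" ]; do
    if [ ! -f "$dir/result_$i.rds" ]; then
      missing="${missing:+$missing,}$i"
    fi
    i=$((i + 1))
  done
  echo "$missing"
}
```

The output has the format that --array expects, so after downloading the results, missing_tasks ./results 50 yields a list such as 3,17 that can be passed as sbatch --array=3,17 on the command line to rerun only the failed tasks. Command-line options take precedence over the #SBATCH directives in the script.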
Pattern 3: Bundled parallel tasks (many short jobs)
LRZ strongly discourages submitting thousands of small jobs. Instead, bundle them:
#!/bin/bash
#SBATCH --clusters=cm4
#SBATCH --partition=cm4_tiny
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1
#SBATCH --time=04:00:00
# ... other directives ...
module load slurm_setup
module load r/{{R_VERSION}}-gcc13-mkl # e.g. r/4.3.3-gcc13-mkl
for i in $(seq 1 8); do
srun -n1 -c1 --exclusive Rscript task.R $i &
done
wait
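The loop above starts exactly as many tasks as there are slots. When there are more tasks than slots, one simple option is to run them in batches. The sketch below is meant to be placed inside the job script; the function name run_bundled is hypothetical, and task.R is the same placeholder script as above.

```shell
#!/usr/bin/env bash
# run_bundled TOTAL SLOTS
# Launches task.R for IDs 1..TOTAL, at most SLOTS at a time, waiting for
# each batch to finish before starting the next one.
run_bundled() {
  local total="$1" slots="$2" i=1
  while [ "$i" -le "$total" ]; do
    srun -n1 -c1 --exclusive Rscript task.R "$i" &
    # After every SLOTS launches, block until the whole batch is done.
    if [ $((i % slots)) -eq 0 ]; then
      wait
    fi
    i=$((i + 1))
  done
  wait  # catch the final, possibly partial, batch
}
```

Inside the job script this could be called as run_bundled 100 "$SLURM_NTASKS". Each batch waits for its slowest task, so slots sit idle when runtimes are uneven. For very uneven tasks a job array may be the better fit.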
Monitoring & Troubleshooting
Common SLURM Commands
# Your jobs across all clusters
ssh lrz "squeue --clusters=all -u \$USER"
# Estimated start time for pending job
ssh lrz "squeue --clusters=cm4 -j {{JOB_ID}} --start"
# Job history (completed jobs)
ssh lrz "sacct --clusters=cm4 -j {{JOB_ID}} --format=JobID,JobName,State,ExitCode,Elapsed,MaxRSS"
# Cancel a job
ssh lrz "scancel --clusters=cm4 {{JOB_ID}}"
# Cancel all your jobs
ssh lrz "scancel --clusters=cm4 -u \$USER"
# Check disk quota
ssh lrz "quota -s"
Common Issues
| Symptom | Likely cause | Fix |
|---|---|---|
| Job pending forever | Wrong partition or too many resources requested | Check squeue --start, reduce resources |
| module: command not found | Missing --get-user-env or --export=NONE mismatch | Ensure both flags are in the job script |
| R package not found | User library not on .libPaths() | Check .Rprofile, add .libPaths() call in script |
| Wrong package version loaded | .Rprofile hardcodes a stale versioned lib dir (e.g., 4.4/ when only R 4.3 is available) | Fix .Rprofile so the library path is derived from R.version instead of being hardcoded (see Installing R Packages section) |
| Out of memory (OOM killed) | Didn't request enough --mem | Increase --mem, check sacct --format=MaxRSS |
| Permission denied on output | Output directory doesn't exist | mkdir -p before submitting |
| scp fails | Outgoing SSH blocked from LRZ | Always scp FROM your laptop, never from LRZ |
Safety Rules
- NEVER run heavy computation on the login node. File management, pak::pkg_install(), .libPaths() checks, and short Rscript -e calls are fine. Simulations, model fitting, and anything CPU/memory-intensive must go through SLURM.
- NEVER submit thousands of tiny jobs. Bundle them or use job arrays.
- NEVER use empty SSH key passphrases. LRZ will ban the account.
- ALWAYS use --export=NONE in SLURM scripts.
- ALWAYS create output directories before submitting jobs.
- ALWAYS specify --clusters= in SLURM commands; CoolMUC-4 uses cm4 or serial.