name: hpc-numerics
description: "Practitioner knowledge base for the numerical and algorithmic theory of high-performance scientific computing, the science beneath the parallel-programming mechanics. Use when reasoning about numerical correctness, algorithm design, or performance modeling: floating-point arithmetic and round-off error (machine epsilon, catastrophic cancellation, non-associativity, Kahan summation); conditioning vs stability (condition number, backward stability); ODE/PDE discretization (finite differences, stencils, explicit vs implicit Euler, stiffness, CFL condition, method of lines); numerical linear algebra (LU factorization, pivoting, sparse matrices, fill-in, reordering); iterative and Krylov solvers (Jacobi/Gauss-Seidel, CG, GMRES, preconditioning, multigrid); performance programming (the memory wall, cache blocking/tiling, the roofline model, arithmetic intensity); high-performance linear algebra (BLAS levels, gemm, block algorithms); combinatorial algorithms (parallel sorting networks, graph algorithms as sparse linear algebra, graph coloring); and N-body (cutoffs, cell lists, Barnes-Hut, FMM) and Monte Carlo methods (1/sqrt(N) error, variance reduction). Covers the algorithmic theory and error/stability/performance analysis, not the MPI/OpenMP/CUDA implementation mechanics."
allowed-tools:
  - Read
  - Grep
argument-hint: [topic, method (CG/multigrid/FMM), or chapter (e.g. ch04)]
HPC Numerics — The Science of Scientific Computing
Scope: floating-point & error analysis · conditioning & stability · ODE/PDE discretization · numerical linear algebra · iterative/Krylov solvers · performance modeling · BLAS/block algorithms · combinatorial & graph algorithms · N-body & Monte Carlo | Chapters: 12 | Generated: 2026-06-09
How to Use This Skill
- Without arguments: load the core diagnostic rules below.
- With a topic: ask about cancellation, conditioning, stiffness, CFL, fill-in, preconditioning, roofline, or BLAS levels; I find and read the relevant chapter.
- With a method: ask about conjugate gradient, multigrid, or Barnes-Hut; I load that chapter.
- With a chapter: ask for ch04; I load that file.
When you ask about something not in the Core section, I read the relevant chapter (and cheatsheet.md / patterns.md / glossary.md).
Core Diagnostic Framework
Scientific computing = three branches (Ch 1)
Modeling × numerical mathematics × computer architecture. A wrong/slow result is a failure in one of them — diagnose which. Everything funnels into numerical linear algebra; computation in finite precision makes error analysis fundamental.
Diagnose a bad numerical result: problem or algorithm? (Ch 3, 4)
- Conditioning (problem): κ large → ill-conditioned; no algorithm helps → reformulate/precondition. Rule of thumb: relative output error ≲ κ × relative input error.
- Stability (algorithm): round-off grows over steps → unstable → switch algorithm. Backward-stable = the computed result is the exact answer to a slightly perturbed problem.
- Floating point: never test computed values for exact equality (compare against a tolerance); (a+b)+c ≠ a+(b+c) in general, so reassociation and parallel reductions break bitwise reproducibility; hunt catastrophic cancellation in subtractions of near-equal values and rewrite the expression; use Kahan summation for long sums whose terms differ widely in magnitude.
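The floating-point rules above can be illustrated with a minimal sketch in plain Python. It is not taken from the chapters: the helper names `naive_sum` and `kahan_sum` are hypothetical, and `math.fsum` from the standard library is used only as a correctly rounded reference.

```python
import math


def naive_sum(values):
    """Left-to-right accumulation; small terms can be absorbed entirely."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation: carry the lost low-order bits forward."""
    total = 0.0
    comp = 0.0  # running estimate of the round-off lost so far
    for v in values:
        y = v - comp            # re-inject the previously lost part
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover what was lost (algebraically zero)
        total = t
    return total


# Non-associativity: the same three numbers, two groupings.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print("reassociation changes the result:", left != right)

# One large term followed by many terms below half an ulp of 1.0.
values = [1.0] + [1e-16] * 100_000
reference = math.fsum(values)
print("naive error:", abs(naive_sum(values) - reference))
print("Kahan error:", abs(kahan_sum(values) - reference))
```

In this setup the naive loop never moves off 1.0, because each 1e-16 term is rounded away on its own, while the compensated loop stays within a few units of round-off of the reference.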
Time-stepping (Ch 5, 6)
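Explicit (cheap per step, conditionally stable: Δt < 2/λ for forward Euler on y′ = −λy with λ > 0, or CFL-limited for PDEs) vs implicit (a linear or nonlinear solve per step; backward Euler is unconditionally stable). Stiffness decides: widely separated timescales force implicit. The method of lines discretizes a PDE in space with stencils, leaving a system of ODEs in time; implicit steps and steady-state problems then reduce to sparse linear systems.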
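The stability limit can be seen on the scalar test problem y′ = −λy. The sketch below is an assumption-laden illustration, not the chapters' own code: the function names and the choice λ = 1000 are hypothetical, and the implicit scheme shown is backward Euler specifically.

```python
def explicit_euler(lam, dt, steps, y0=1.0):
    """Forward Euler: y_{n+1} = y_n + dt * (-lam * y_n) = (1 - lam*dt) * y_n."""
    y = y0
    for _ in range(steps):
        y = y + dt * (-lam * y)
    return y


def implicit_euler(lam, dt, steps, y0=1.0):
    """Backward Euler: solve y_{n+1} = y_n - dt*lam*y_{n+1} for y_{n+1}."""
    y = y0
    for _ in range(steps):
        y = y / (1.0 + lam * dt)
    return y


lam = 1000.0               # fast decay rate; forward Euler limit is dt < 2/lam = 0.002
for dt in (0.0015, 0.0025):
    print(f"dt={dt}: explicit={explicit_euler(lam, dt, 50):.3e}, "
          f"implicit={implicit_euler(lam, dt, 50):.3e}")
```

With the step just above 2/λ the explicit amplification factor has magnitude 1.5 and the iterate grows geometrically, although the true solution decays; backward Euler decays for both step sizes, at the cost of a solve per step (a scalar division here, a linear system in general).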
Linear solvers (Ch 7, 8)
Small/moderate dense or many-RHS → direct LU (always with pivoting; factor once, reuse for each right-hand side). Large sparse → iterative Krylov (CG for SPD, GMRES for general; needs only matvecs, so no fill-in). The CG iteration count grows like √κ, so the preconditioner dominates (Jacobi → ILU → multigrid, which is optimal for elliptic PDEs). Sparse direct → watch fill-in, reorder to cut it.
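To make the "matvec-only" point concrete, here is a minimal unpreconditioned CG sketch in plain Python. It assumes the SPD 1D Poisson matrix (2 on the diagonal, −1 off it) as the test operator; the names `laplacian_1d_matvec`, `dot`, and `conjugate_gradient` are hypothetical, and a production code would call a library solver instead.

```python
def laplacian_1d_matvec(x):
    """y = A x for tridiag(-1, 2, -1), applied without storing A."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for SPD systems; returns (x, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)                  # residual b - A x for the zero initial guess
    p = list(r)                  # first search direction
    rs_old = dot(r, r)
    b_norm = rs_old ** 0.5
    if b_norm == 0.0:
        return x, 0
    for k in range(1, max_iter + 1):
        Ap = matvec(p)           # the only place the matrix is touched
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new ** 0.5 <= tol * b_norm:
            return x, k
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


n = 50
b = [1.0] * n
x, iters = conjugate_gradient(laplacian_1d_matvec, b)
residual = max(abs(ai - bi) for ai, bi in zip(laplacian_1d_matvec(x), b))
print("iterations:", iters, "max residual:", residual)
```

The solver sees the matrix only through `matvec`, which is why sparse or matrix-free operators fit naturally and no fill-in arises. A preconditioner would enter as one extra solve per iteration and is omitted here to keep the sketch short.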
Performance (Ch 2, 9, 10)
The memory wall limits most code: it is memory-bound, not compute-bound. Engineer locality (spatial: unit stride/SoA; temporal: blocking/tiling). The roofline (arithmetic intensity = FLOPs per byte of memory traffic) triages memory- vs compute-bound. Dense LA → cast as BLAS-3 (gemm, near-peak); matrix-vector and sparse matvec are memory-bound by nature. Never hand-code gemm or LU: call an optimized BLAS (gemm) and LAPACK (LU).
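The roofline triage reduces to one formula: attainable performance = min(peak compute, arithmetic intensity × memory bandwidth). The sketch below assumes a hypothetical machine (1000 GFLOP/s peak, 100 GB/s bandwidth) and idealized traffic counts in double precision; the function names are illustrative only.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound in GFLOP/s: the lower of the compute and memory roofs."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (FLOPs/byte) where the two roofs meet."""
    return peak_gflops / bandwidth_gbs


def classify(intensity, peak_gflops, bandwidth_gbs):
    if intensity < ridge_point(peak_gflops, bandwidth_gbs):
        return "memory-bound"
    return "compute-bound"


PEAK = 1000.0  # GFLOP/s, hypothetical machine
BW = 100.0     # GB/s, hypothetical machine

# axpy (y = a*x + y): 2 FLOPs per element, 3 doubles moved (read x, read y, write y).
axpy_ai = 2.0 / (3 * 8.0)

# gemm with ideal cache reuse: 2*n^3 FLOPs over 3*n^2 doubles (read A, B; write C).
n = 1200
gemm_ai = (2.0 * n**3) / (3 * 8.0 * n**2)

for name, ai in (("axpy", axpy_ai), ("gemm", gemm_ai)):
    print(name, f"AI={ai:.3f} FLOPs/byte,",
          f"bound={attainable_gflops(ai, PEAK, BW):.1f} GFLOP/s,",
          classify(ai, PEAK, BW))
```

Under these assumptions axpy sits far left of the ridge point, so no amount of vectorization lifts it above the bandwidth roof, whereas gemm's intensity grows with n, which is why casting work as BLAS-3 pays off.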
Beyond linear algebra (Ch 11, 12)
The best parallel algorithm is often not the best sequential algorithm parallelized (e.g. sorting networks). Graphs = adjacency matrices (one BFS level = one sparse matvec over a Boolean semiring). N-body: never naive O(N²) → cutoffs/cell-lists, Barnes-Hut (O(N log N)), or FMM (O(N)). Monte Carlo: error ∝ 1/√N independent of dimension, so it suits high-dimensional integration.
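The "BFS = sparse matvec" identification can be sketched as follows: each level multiplies the frontier vector by the adjacency structure, with OR in place of addition and AND in place of multiplication, then masks out vertices already visited. The adjacency-list storage and the name `bfs_levels` are assumptions made for this illustration.

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS written as repeated Boolean sparse matvecs.

    adj[j] lists the neighbours of vertex j (the nonzeros of column j of the
    adjacency matrix). Returns the level of each vertex, -1 if unreachable.
    """
    n = len(adj)
    level = [-1] * n
    level[source] = 0
    frontier = [False] * n       # Boolean vector x: the current frontier
    frontier[source] = True
    depth = 0
    while any(frontier):
        depth += 1
        nxt = [False] * n        # y = A x over the (OR, AND) semiring
        for j in range(n):
            if frontier[j]:
                for i in adj[j]:
                    if level[i] == -1:   # mask: keep only unvisited vertices
                        nxt[i] = True
        for i in range(n):
            if nxt[i]:
                level[i] = depth
        frontier = nxt
    return level


# Small undirected example graph; vertex 5 is isolated.
graph = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3], []]
print(bfs_levels(graph, 0))
```

Writing BFS this way exposes the same data-parallel structure as a sparse matvec, so the partitioning and communication patterns used for sparse linear algebra carry over to the graph problem.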
Chapter Index
| # | Title | Key Topics |
|---|---|---|
| ch01 | Foundations | the three branches, continuous→discrete→linear algebra |
| ch02 | Architecture & Memory | von Neumann, memory wall, cache, locality, pipelining |
| ch03 | Floating-Point Arithmetic | ε_mach, cancellation, non-associativity, Kahan |
| ch04 | Conditioning & Stability | κ, backward stability, problem vs algorithm |
| ch05 | ODEs & Time-Stepping | explicit/implicit Euler, stability, stiffness |
| ch06 | PDEs & Discretization | stencils, 5-point star, sparse systems, CFL |
| ch07 | Numerical Linear Algebra | LU, pivoting, sparse, fill-in, reordering |
| ch08 | Iterative & Krylov Solvers | Jacobi/GS, CG, GMRES, preconditioning, multigrid |
| ch09 | Performance & Roofline | cache blocking, tiling, arithmetic intensity, roofline |
| ch10 | HP Linear Algebra | BLAS levels, gemm, block algorithms |
| ch11 | Combinatorial & Graph | sorting networks, graphs as sparse LA, coloring |
| ch12 | N-Body & Monte Carlo | cutoffs, Barnes-Hut, FMM, 1/√N sampling |
Topic Index
- arithmetic intensity / roofline → ch09
- BLAS levels / gemm / block algorithms → ch10
- Barnes-Hut / FMM / N-body → ch12
- cache blocking / tiling / locality → ch02, ch09
- catastrophic cancellation → ch03
- CFL condition → ch06
- condition number / conditioning → ch04
- Conjugate Gradient / GMRES / Krylov → ch08
- explicit vs implicit / stiffness → ch05
- fill-in / reordering → ch07
- finite difference / stencils → ch06
- floating-point / machine epsilon / Kahan → ch03
- graph algorithms / coloring → ch11
- LU factorization / pivoting → ch07
- memory wall / von Neumann → ch02
- method of lines → ch05, ch06
- Monte Carlo / variance reduction → ch12
- multigrid → ch08
- non-associativity / reproducibility → ch03, ch09
- preconditioning → ch08
- sorting (networks/parallel) → ch11
- sparse matrices → ch06, ch07, ch08
- stability (algorithm/numerical) → ch04, ch05
- truncation error / discretization → ch05, ch06
Supporting Files
- glossary.md — every key term with its defining chapter
- patterns.md — concrete techniques (diagnose problem-vs-algorithm, avoid cancellation, preconditioning, cache blocking, cast-as-BLAS-3, roofline triage, beat O(N²))
- cheatsheet.md — decision rules: solver picker, time-stepping picker, floating-point rules, BLAS levels, roofline triage
Scope & Limits
Covers the numerical and algorithmic theory of HPC — error analysis, stability, discretization, linear-algebra algorithms, performance modeling, and the algorithm-design principles behind scientific computing. It is the "science" layer beneath the parallel-programming mechanics. For the implementation tooling, see the sibling skills: gpu-multithreading and cpp-hpc (parallel programming models, MPI/OpenMP/CUDA/Kokkos), python-hpc (Python performance), and the spec skills mpi-5.0 / openmp-6.0 / cuda-programming. For exact numerical-library APIs (BLAS/LAPACK/PETSc) consult their documentation; this skill explains the algorithms they implement.