hpc-numerics


By szaghi · Updated 6/9/2026

name: hpc-numerics
description: "Practitioner knowledge base for the numerical and algorithmic theory of high-performance scientific computing (the science beneath the parallel-programming mechanics). Use when reasoning about numerical correctness, algorithm design, or performance modeling: floating-point arithmetic and round-off error (machine epsilon, catastrophic cancellation, non-associativity, Kahan summation); conditioning vs stability (condition number, backward stability); ODE/PDE discretization (finite differences, stencils, explicit vs implicit Euler, stiffness, CFL condition, method of lines); numerical linear algebra (LU factorization, pivoting, sparse matrices, fill-in, reordering); iterative and Krylov solvers (Jacobi/Gauss-Seidel, CG, GMRES, preconditioning, multigrid); performance programming (the memory wall, cache blocking/tiling, the roofline model, arithmetic intensity); high-performance linear algebra (BLAS levels, gemm, block algorithms); combinatorial algorithms (parallel sorting networks, graph algorithms as sparse linear algebra, graph coloring); and N-body (cutoffs, cell lists, Barnes-Hut, FMM) and Monte Carlo methods (1/sqrt(N) error, variance reduction). Covers the algorithmic theory and error/stability/performance analysis, not the MPI/OpenMP/CUDA implementation mechanics."
allowed-tools:
  - Read
  - Grep
argument-hint: [topic, method (CG/multigrid/FMM), or chapter (e.g. ch04)]

HPC Numerics — The Science of Scientific Computing

Scope: floating-point & error analysis · conditioning & stability · ODE/PDE discretization · numerical linear algebra · iterative/Krylov solvers · performance modeling · BLAS/block algorithms · combinatorial & graph algorithms · N-body & Monte Carlo | Chapters: 12 | Generated: 2026-06-09

How to Use This Skill

  • Without arguments — load the core diagnostic rules below.
  • With a topic — ask about cancellation, conditioning, stiffness, CFL, fill-in, preconditioning, roofline, BLAS levels; I find and read the relevant chapter.
  • With a method — ask about conjugate gradient, multigrid, Barnes-Hut; I load that chapter.
  • With a chapter — ask for ch04; I load that file.

When you ask about something not in the Core section, I read the relevant chapter (and cheatsheet.md / patterns.md / glossary.md).

Core Diagnostic Framework

Scientific computing = three branches (Ch 1)

Modeling × numerical mathematics × computer architecture. A wrong/slow result is a failure in one of them — diagnose which. Everything funnels into numerical linear algebra; computation in finite precision makes error analysis fundamental.

Diagnose a bad numerical result: problem or algorithm? (Ch 3, 4)

  • Conditioning (problem): κ large → ill-conditioned; no algorithm can recover the lost accuracy → reformulate/precondition. Relative output error ≲ κ × relative input error.
  • Stability (algorithm): round-off is amplified from step to step → unstable → switch algorithm. Backward-stable = the computed result is the exact answer to a slightly perturbed problem.
  • Floating point: never test floats for exact equality (compare within a tolerance); in general (a+b)+c ≠ a+(b+c), so reassociation and parallel reductions break bitwise reproducibility; hunt catastrophic cancellation in subtractions of near-equal values and rewrite the expression; use Kahan (compensated) summation for long sums of terms with disparate magnitudes.
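The floating-point rules above can be checked directly. The following is a minimal sketch in plain Python (standard library only); the helper names `naive_sum`, `kahan_sum`, `one_minus_cos_naive`, and `one_minus_cos_stable` are illustrative and not taken from the chapters. The explicit summation loop is deliberate, since the built-in `sum` may already apply compensation to floats in recent CPython versions.

```python
import math

def naive_sum(values):
    """Plain left-to-right accumulation; round-off accumulates with each add."""
    total = 0.0
    for v in values:
        total += v
    return total

def kahan_sum(values):
    """Kahan (compensated) summation: carry the low-order bits lost by each add."""
    total = 0.0
    comp = 0.0  # running estimate of the round-off lost so far
    for v in values:
        y = v - comp            # apply the correction to the incoming term
        t = total + y           # low-order bits of y may be lost here
        comp = (t - total) - y  # recover what was lost
        total = t
    return total

def one_minus_cos_naive(x):
    return 1.0 - math.cos(x)    # subtracts two near-equal values for small x

def one_minus_cos_stable(x):
    s = math.sin(0.5 * x)       # identity: 1 - cos(x) = 2 sin^2(x/2)
    return 2.0 * s * s

# Non-associativity: the two groupings round differently.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

values = [0.1] * 1_000_000
reference = math.fsum(values)   # correctly rounded sum, used as ground truth
naive_err = abs(naive_sum(values) - reference)
kahan_err = abs(kahan_sum(values) - reference)

print("groupings equal:", left == right)
print("naive error:", naive_err, "Kahan error:", kahan_err)
print("1 - cos(1e-8):", one_minus_cos_naive(1e-8), "vs", one_minus_cos_stable(1e-8))
```

The rewrite of `1 - cos(x)` removes the subtraction entirely instead of trying to compute it more carefully, which is the usual cure for cancellation.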

Time-stepping (Ch 5, 6)

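Explicit (cheap per step, conditionally stable: Δt < 2/λ for explicit Euler on y′ = −λy with λ > 0, or CFL-limited for PDEs) vs implicit (a solve per step; implicit Euler is unconditionally stable). Stiffness decides: widely separated timescales force implicit. PDEs are discretized in space via stencils, leaving a system of ODEs in time (method of lines); implicit steps on that system become sparse linear solves.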
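The stability limit can be seen on the scalar test equation y′ = −λy. The sketch below assumes that test problem, with λ = 100 chosen arbitrarily; the function names are illustrative. Explicit Euler multiplies y by (1 − λΔt) each step, so it decays only when |1 − λΔt| < 1, while implicit Euler multiplies by 1/(1 + λΔt), which is below 1 for any Δt > 0.

```python
def explicit_euler(lam, dt, steps, y0=1.0):
    """Explicit Euler for y' = -lam*y: y_{n+1} = y_n + dt * (-lam * y_n)."""
    y = y0
    for _ in range(steps):
        y = y + dt * (-lam * y)
    return y

def implicit_euler(lam, dt, steps, y0=1.0):
    """Implicit Euler: solve y_{n+1} = y_n - dt*lam*y_{n+1} for y_{n+1}."""
    y = y0
    for _ in range(steps):
        y = y / (1.0 + lam * dt)  # the per-step "solve" is a scalar division here
    return y

lam = 100.0                       # assumed decay rate; stability limit is dt < 2/lam = 0.02
below_limit = explicit_euler(lam, 0.019, 200)   # |1 - 1.9| = 0.9, decays
above_limit = explicit_euler(lam, 0.021, 200)   # |1 - 2.1| = 1.1, blows up
implicit = implicit_euler(lam, 0.021, 200)      # decays at the same step size

print(abs(below_limit), abs(above_limit), implicit)
```

For a system of ODEs the division becomes a linear solve per step, which is the cost that implicit methods trade for stability.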

Linear solvers (Ch 7, 8)

Small/moderate dense or many-RHS → direct LU (always pivot). Large sparse → iterative Krylov (CG for SPD, GMRES for general; both need only matvecs and create no fill-in). CG iteration count grows like √κ, so the preconditioner dominates (Jacobi → ILU → multigrid, which is optimal for elliptic PDEs). Sparse direct → watch fill-in and reorder to cut it.
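To make "matvec-only, no fill-in" concrete, here is a minimal, unpreconditioned conjugate gradient sketch in plain Python. It assumes the 1D Poisson matrix tridiag(−1, 2, −1) as the SPD test problem and a nonzero right-hand side; the matrix is applied through a function and never stored. All names are illustrative, and a production code would call a library solver instead.

```python
def laplacian_matvec(x):
    """y = A x for A = tridiag(-1, 2, -1); A itself is never formed."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for SPD systems; returns (x, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)                   # residual b - A x for the zero initial guess
    p = list(r)                   # first search direction
    rs_old = dot(r, r)
    b_norm = rs_old ** 0.5        # assumes b is nonzero
    for k in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new ** 0.5 <= tol * b_norm:
            return x, k
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter

n = 50
b = [1.0] * n
x, iters = conjugate_gradient(laplacian_matvec, b)
print("iterations:", iters)
```

Because the solver only ever calls `matvec`, the same routine works for any SPD operator, and a preconditioner would enter as one extra operator application per iteration.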

Performance (Ch 2, 9, 10)

The memory wall limits most code (memory-bound). Engineer locality (spatial: unit stride/SoA; temporal: blocking/tiling). The roofline (arithmetic intensity = FLOPs/byte) triages memory- vs compute-bound. Dense LA → cast as BLAS-3 (gemm, near-peak); matrix-vector and sparse matvec are memory-bound by nature. Never hand-code gemm or LU: call an optimized BLAS (gemm) or LAPACK (LU).
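The roofline triage reduces to one `min`. The sketch below uses a hypothetical machine (1000 GFLOP/s peak, 100 GB/s memory bandwidth) and idealized traffic counts that assume each operand moves to or from memory exactly once; real kernels typically move more data, so these intensities are upper bounds.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: the lower of the compute roof and the bandwidth roof."""
    return min(peak_gflops, bandwidth_gbs * intensity)

def classify(intensity, peak_gflops, bandwidth_gbs):
    ridge = peak_gflops / bandwidth_gbs   # FLOPs/byte where the two roofs meet
    return "memory-bound" if intensity < ridge else "compute-bound"

# Hypothetical machine parameters (assumptions for illustration only).
PEAK = 1000.0   # GFLOP/s
BW = 100.0      # GB/s

# daxpy (y = a*x + y) in float64: 2 FLOPs per element,
# 24 bytes of traffic (read x, read y, write y).
daxpy_ai = 2.0 / 24.0

# dgemm on n x n float64 matrices, idealized: 2*n^3 FLOPs,
# 32*n^2 bytes (read A, B, C and write C once each).
n = 1600
dgemm_ai = (2.0 * n**3) / (32.0 * n**2)

for name, ai in (("daxpy", daxpy_ai), ("dgemm", dgemm_ai)):
    print(name, ai, classify(ai, PEAK, BW), attainable_gflops(ai, PEAK, BW))
```

With these numbers the ridge point sits at 10 FLOPs/byte: daxpy falls far to the left of it and dgemm far to the right, which is why casting work as BLAS-3 pays off.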

Beyond linear algebra (Ch 11, 12)

The best parallel algorithm is often not the best sequential algorithm parallelized (sorting networks). Graphs = adjacency matrices (a BFS step = a sparse matvec). N-body: avoid the naive O(N²) sum → cutoffs/cell lists, Barnes-Hut (O(N log N)), FMM (O(N)). Monte Carlo: error ∝ 1/√N, independent of dimension, so it suits high-dimensional integration.
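The 1/√N behaviour of Monte Carlo can be illustrated with the classic estimate of π from points sampled in the unit square. This is a minimal sketch with an arbitrary fixed seed; the function name `mc_pi` is illustrative. The reported standard error comes from the binomial variance of the hit fraction, so multiplying the sample count by 100 should shrink it by roughly a factor of 10.

```python
import math
import random

def mc_pi(n_samples, rng):
    """Estimate pi from the fraction of unit-square samples inside the quarter circle.

    Returns (estimate, standard error of the estimate).
    """
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    p = hits / n_samples
    std_err = 4.0 * math.sqrt(p * (1.0 - p) / n_samples)  # binomial standard error
    return 4.0 * p, std_err

rng = random.Random(12345)        # fixed seed so runs are repeatable
results = {n: mc_pi(n, rng) for n in (1_000, 100_000)}

for n, (estimate, std_err) in results.items():
    print(n, estimate, std_err)
```

Nothing in the error formula depends on the dimension of the sample space, which is the property that makes the method attractive when grid-based quadrature becomes infeasible.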


Chapter Index

| # | Title | Key Topics |
|---|---|---|
| ch01 | Foundations | the three branches, continuous→discrete→linear algebra |
| ch02 | Architecture & Memory | von Neumann, memory wall, cache, locality, pipelining |
| ch03 | Floating-Point Arithmetic | ε_mach, cancellation, non-associativity, Kahan |
| ch04 | Conditioning & Stability | κ, backward stability, problem vs algorithm |
| ch05 | ODEs & Time-Stepping | explicit/implicit Euler, stability, stiffness |
| ch06 | PDEs & Discretization | stencils, 5-point star, sparse systems, CFL |
| ch07 | Numerical Linear Algebra | LU, pivoting, sparse, fill-in, reordering |
| ch08 | Iterative & Krylov Solvers | Jacobi/GS, CG, GMRES, preconditioning, multigrid |
| ch09 | Performance & Roofline | cache blocking, tiling, arithmetic intensity, roofline |
| ch10 | HP Linear Algebra | BLAS levels, gemm, block algorithms |
| ch11 | Combinatorial & Graph | sorting networks, graphs as sparse LA, coloring |
| ch12 | N-Body & Monte Carlo | cutoffs, Barnes-Hut, FMM, 1/√N sampling |

Topic Index

  • arithmetic intensity / roofline → ch09
  • BLAS levels / gemm / block algorithms → ch10
  • Barnes-Hut / FMM / N-body → ch12
  • cache blocking / tiling / locality → ch02, ch09
  • catastrophic cancellation → ch03
  • CFL condition → ch06
  • condition number / conditioning → ch04
  • Conjugate Gradient / GMRES / Krylov → ch08
  • explicit vs implicit / stiffness → ch05
  • fill-in / reordering → ch07
  • finite difference / stencils → ch06
  • floating-point / machine epsilon / Kahan → ch03
  • graph algorithms / coloring → ch11
  • LU factorization / pivoting → ch07
  • memory wall / von Neumann → ch02
  • method of lines → ch05, ch06
  • Monte Carlo / variance reduction → ch12
  • multigrid → ch08
  • non-associativity / reproducibility → ch03, ch09
  • preconditioning → ch08
  • sorting (networks/parallel) → ch11
  • sparse matrices → ch06, ch07, ch08
  • stability (algorithm/numerical) → ch04, ch05
  • truncation error / discretization → ch05, ch06

Supporting Files

  • glossary.md — every key term with its defining chapter
  • patterns.md — concrete techniques (diagnose problem-vs-algorithm, avoid cancellation, preconditioning, cache blocking, cast-as-BLAS-3, roofline triage, beat O(N²))
  • cheatsheet.md — decision rules: solver picker, time-stepping picker, floating-point rules, BLAS levels, roofline triage

Scope & Limits

Covers the numerical and algorithmic theory of HPC — error analysis, stability, discretization, linear-algebra algorithms, performance modeling, and the algorithm-design principles behind scientific computing. It is the "science" layer beneath the parallel-programming mechanics. For the implementation tooling, see the sibling skills: gpu-multithreading and cpp-hpc (parallel programming models, MPI/OpenMP/CUDA/Kokkos), python-hpc (Python performance), and the spec skills mpi-5.0 / openmp-6.0 / cuda-programming. For exact numerical-library APIs (BLAS/LAPACK/PETSc) consult their documentation; this skill explains the algorithms they implement.

Install via CLI
npx skills add https://github.com/szaghi/dotfiles --skill hpc-numerics