worked-example-walkthrough

star 121

Produces step-by-step computational walkthroughs of vector and matrix operations as a sequence of numbered "frames", showing the explicit state at each step. The text-equivalent of a 3Blue1Brown animation — each frame shows what changed and why, so the learner can re-trace the operation by hand. Use when the learner needs to *see* a computation unfold (eigenvalue computation, attention with 3 tokens, gradient descent step, SVD on a 2×2, layer norm on a 3-vector, softmax of a small input), when an explanation has been given but the learner needs to ground it in a worked example, or when introducing an operation that's intimidating in symbol form but trivial in pencil-and-paper form.

lyndonkl By lyndonkl schedule Updated 5/9/2026

name: worked-example-walkthrough description: Produces step-by-step computational walkthroughs of vector and matrix operations as a sequence of numbered "frames", showing the explicit state at each step. The text-equivalent of a 3Blue1Brown animation — each frame shows what changed and why, so the learner can re-trace the operation by hand. Use when the learner needs to see a computation unfold (eigenvalue computation, attention with 3 tokens, gradient descent step, SVD on a 2×2, layer norm on a 3-vector, softmax of a small input), when an explanation has been given but the learner needs to ground it in a worked example, or when introducing an operation that's intimidating in symbol form but trivial in pencil-and-paper form.

Worked Example Walkthrough

Table of Contents

  1. Workflow
  2. Frame Anatomy
  3. Choosing the Right Example
  4. Common Patterns
  5. Guardrails
  6. Quick Reference

A walkthrough is a sequence of numbered frames showing the state of a computation at each step. Each frame answers three questions: what's the current state?, what operation produced it?, and what should the learner notice?

This is the closest pure text gets to a 3Blue1Brown animation. The learner can re-trace the operation by hand, and the explicit intermediate states catch confusions that a final-result-only treatment hides.

Quick example (Eigenvalues of a 2×2 matrix):

Compute the eigenvalues of A = [[4, 1], [2, 3]].

Frame 1. State: A = [[4, 1], [2, 3]]. We want λ such that det(A − λI) = 0.

Frame 2. Form A − λI:

A - λI = [[4-λ,  1 ],
         [ 2,  3-λ]]

Frame 3. Compute the determinant: det = (4 − λ)(3 − λ) − 1·2 = (12 − 7λ + λ²) − 2 = λ² − 7λ + 10

Frame 4. Solve λ² − 7λ + 10 = 0: Factor: (λ − 5)(λ − 2) = 0. So λ = 5 or λ = 2.

Frame 5. Eigenvalues: 5 and 2. Quick sanity check: trace(A) = 4 + 3 = 7 = sum of eigenvalues ✓; det(A) = 12 − 2 = 10 = product of eigenvalues ✓.

What to notice: The characteristic polynomial is just det(A − λI) expanded; finding eigenvalues is just root-finding. No magic.

Five frames, ~80 words. The learner can redo this on paper in under a minute.

Workflow

Copy this checklist and track your progress:

Walkthrough Progress:
- [ ] Step 1: Pick the smallest possible example that exercises the concept
- [ ] Step 2: Write the goal — what we'll compute and why
- [ ] Step 3: Plan the frames (3-7 typically)
- [ ] Step 4: Write each frame: state, operation, notice
- [ ] Step 5: End with a sanity check or invitation to verify
- [ ] Step 6: Optional — invite the learner to redo with a variant

Step 1: Pick the smallest possible example that exercises the concept

The example must be small. 2×2 matrices, 3-vector inputs, 3-token sequences. Bigger examples obscure the structure with arithmetic.

The example must also exercise the concept — not just trivially demonstrate it. A diagonal matrix has trivial eigenvectors; pick a 2×2 that's not diagonal. A scalar attention is trivial; pick at least 3 tokens.

For a catalog of recommended example sizes per concept, see resources/examples.md.

Step 2: Write the goal — what we'll compute and why

Single sentence. "Compute the eigenvalues of A = [[4, 1], [2, 3]]." or "Apply attention to a 3-token sequence with random Q, K, V vectors." The goal frames everything that follows.

Step 3: Plan the frames (3-7 typically)

Each frame is one operation. Sketch the frames before writing them out — this catches the "skipped step" problem before it hits the page.

A frame budget that works for most operations:

  • 3 frames: trivial chain (one transformation)
  • 5 frames: typical single-concept walkthrough (eigenvalues, softmax, single SGD step)
  • 7 frames: compound walkthrough (attention with all sub-steps shown)

If you find yourself wanting >7 frames, either the example is too big (shrink) or the operation has multiple sub-operations that each deserve their own walkthrough.

Step 4: Write each frame: state, operation, notice

Each frame has three parts (see Frame Anatomy):

  • State: the current values, shown explicitly.
  • Operation: what we just did to get here (or what we're about to do).
  • Notice: one sentence pointing out something the learner might miss.

The "notice" is what distinguishes a walkthrough from a worked solution. A worked solution shows the steps; a walkthrough also says what to look at.

Step 5: End with a sanity check or invitation to verify

Every walkthrough ends with one of:

  • A sanity check: "trace ✓, determinant ✓".
  • A consistency check: "the result has the expected shape".
  • An invitation: "try the same operation on A = [[1, 2], [3, 4]]; the answer should be λ ≈ −0.37 and λ ≈ 5.37."

The sanity check is what tells the learner the result is right. It also doubles as a reusable verification trick they can apply to similar problems.

Step 6 (optional): Invite the learner to redo with a variant

If the learner has time and the example is short, invite them to redo with a small change:

  • "Same matrix, different starting vector: redo with v = (0, 1) instead of (1, 0)."
  • "Same attention computation, but with one token's K replaced by zero: predict the change to the output before computing."

Variants check whether the picture transferred, not just the arithmetic.

Frame Anatomy

Each frame has three parts. Keep them visually distinct.

**Frame N.** [Operation: short verb phrase, what we're doing.]
[State: explicit values, in a code block if needed.]
[Notice: one sentence — what to look at.]

State block

Use a code block for matrices, vectors, and equations. Show actual numbers; resist the urge to leave things symbolic. The point is concreteness.

v = (3, 4)
|v| = √(3² + 4²) = √25 = 5

Operation phrase

Short verb phrase: "Compute…", "Apply…", "Substitute…", "Solve…". One line.

Notice sentence

One sentence pointing at the most important feature of this frame.

  • "The cross terms cancelled — that's because A is symmetric."
  • "Notice the second eigenvalue is negative — A reflects along that direction."
  • "The softmax peaked on the first row — token 1's query strongly matched key 1."

If a frame doesn't earn a notice sentence, it might not need to be its own frame. Consider merging.

Choosing the Right Example

The example you choose makes or breaks the walkthrough. Heuristics for choosing:

Use specific small numbers, not symbols

A walkthrough with a, b, c, d is a derivation, not a walkthrough. Pick numbers like 2, 3, 1, −1 — small enough to compute by eye, varied enough to expose pattern.

Avoid the simplest possible case

Identity matrix, zero vector, all-equal scores — these are too trivial; they don't exercise the operation. The walkthrough learner needs to see what happens when the operation is non-trivially active.

Pick examples with verifiable properties

Symmetric matrices have real eigenvalues — easy to spot a bug. Stochastic matrices have a stationary distribution — easy to verify. Pick examples with these checkable properties so Step 5's sanity check is meaningful.

When in doubt, use the smallest non-trivial size

  • Matrices: 2×2 first; 3×3 only if 2×2 doesn't exercise the concept.
  • Vectors: 2D or 3D.
  • Sequences: 3 tokens minimum (so attention is interesting), 4 if you need parity.
  • Networks: 1-layer, 2-input, 1-output minimum (for backprop demos).

For a recommended example per concept, see resources/examples.md.

Common Patterns

Pattern A: Single computation walkthrough

Used for one-shot operations: eigenvalue compute, single SGD step, single attention forward pass, softmax of a vector. Length: 3-5 frames. Closing: sanity check.

Pattern B: Iterative walkthrough

Used for processes that loop: gradient descent over multiple steps, power iteration finding eigenvectors, diffusion sampling. Length: 5-7 frames showing 2-3 iterations explicitly, then "and so on…". Closing: convergence comment + invitation to predict the limit.

Pattern C: Comparative walkthrough

Used to show the contrast between two operations: matrix-vector mul as row-dot vs as column-combination; layer norm vs batch norm; SGD vs Adam on the same gradient. Length: parallel frames in two columns or two passes. Closing: bridge sentence on what makes them equivalent or different.

For one filled walkthrough per pattern, see resources/examples.md.

Guardrails

  • Show actual numbers, not symbols. A walkthrough with Av is a derivation; a walkthrough with [5, 11] is a walkthrough. The learner needs to see the values.
  • Don't skip arithmetic the learner might be uncertain about. "(4−λ)(3−λ) − 2 = λ² − 7λ + 10" is fine; "(4−λ)(3−λ) − 2 = (λ−5)(λ−2)" skips the expansion. Give them the polynomial first, then factor.
  • Each frame should advance the state. A frame whose state is identical to the previous one (just rephrased) is wasted.
  • Use only one operation per frame. Two operations in one frame is two frames.
  • Don't bury the punchline in arithmetic. End with the result clearly stated and the sanity check explicit.
  • For long computations, consider Bash. If the arithmetic is genuinely long (e.g., a 4×4 eigenvalue problem), use Bash to numpy the intermediate states — but still show the operations as frames. The Bash output is the verification, not the walkthrough.

Quick Reference

Operation Recommended example Frame count
Matrix-vector mul A = [[1, 2], [3, 4]], v = (5, 6) 3
Eigenvalues A = [[4, 1], [2, 3]] 5
Eigenvector for known λ Same A, λ = 5 4
SVD A = [[3, 1], [1, 3]] (symmetric for clean SVD) 6
Softmax x = (2, 1, -1) 4
Cross-entropy p = (1, 0, 0), q = (0.7, 0.2, 0.1) 3
Single SGD step Loss x², start at x = 4, η = 0.5 4
Attention forward 3 tokens, d = 2 7
LayerNorm x = (1, 5, 9) 5
Backprop on tiny net y = w₂σ(w₁x), one input/output 6
PCA on tiny dataset 4 points in 2D 6

For full filled-in walkthroughs of each, see resources/examples.md.

Install via CLI
npx skills add https://github.com/lyndonkl/claude --skill worked-example-walkthrough
Repository Details
star Stars 121
call_split Forks 16
navigation Branch main
article Path SKILL.md
More from Creator