worked-example-walkthrough

name: worked-example-walkthrough description: Produces step-by-step computational walkthroughs of vector and matrix operations as a sequence of numbered "frames", showing the explicit state at each step. The text-equivalent of a 3Blue1Brown animation — each frame shows what changed and why, so the learner can re-trace the operation by hand. Use when the learner needs to see a computation unfold (eigenvalue computation, attention with 3 tokens, gradient descent step, SVD on a 2×2, layer norm on a 3-vector, softmax of a small input), when an explanation has been given but the learner needs to ground it in a worked example, or when introducing an operation that's intimidating in symbol form but trivial in pencil-and-paper form.

A walkthrough is a sequence of numbered frames showing the state of a computation at each step. Each frame answers three questions: what's the current state?, what operation produced it?, and what should the learner notice?

This is the closest pure text gets to a 3Blue1Brown animation. The learner can re-trace the operation by hand, and the explicit intermediate states catch confusions that a final-result-only treatment hides.

Quick example (Eigenvalues of a 2×2 matrix):

Compute the eigenvalues of A = [[4, 1], [2, 3]].

Frame 1. State: A = [[4, 1], [2, 3]]. We want λ such that det(A − λI) = 0.

Frame 2. Form A − λI:
A - λI = [[4-λ,  1 ],
         [ 2,  3-λ]]
Frame 3. Compute the determinant: det = (4 − λ)(3 − λ) − 1·2 = (12 − 7λ + λ²) − 2 = λ² − 7λ + 10

Frame 4. Solve λ² − 7λ + 10 = 0: Factor: (λ − 5)(λ − 2) = 0. So λ = 5 or λ = 2.

Frame 5. Eigenvalues: 5 and 2. Quick sanity check: trace(A) = 4 + 3 = 7 = sum of eigenvalues ✓; det(A) = 12 − 2 = 10 = product of eigenvalues ✓.

What to notice: The characteristic polynomial is just det(A − λI) expanded; finding eigenvalues is just root-finding. No magic.

Five frames, ~80 words. The learner can redo this on paper in under a minute.

Workflow

Copy this checklist and track your progress:

Walkthrough Progress:
- [ ] Step 1: Pick the smallest possible example that exercises the concept
- [ ] Step 2: Write the goal — what we'll compute and why
- [ ] Step 3: Plan the frames (3-7 typically)
- [ ] Step 4: Write each frame: state, operation, notice
- [ ] Step 5: End with a sanity check or invitation to verify
- [ ] Step 6: Optional — invite the learner to redo with a variant

Step 1: Pick the smallest possible example that exercises the concept

The example must be small. 2×2 matrices, 3-vector inputs, 3-token sequences. Bigger examples obscure the structure with arithmetic.

The example must also exercise the concept — not just trivially demonstrate it. A diagonal matrix has trivial eigenvectors; pick a 2×2 that's not diagonal. A scalar attention is trivial; pick at least 3 tokens.

For a catalog of recommended example sizes per concept, see resources/examples.md.

Step 2: Write the goal — what we'll compute and why

Single sentence. "Compute the eigenvalues of A = [[4, 1], [2, 3]]." or "Apply attention to a 3-token sequence with random Q, K, V vectors." The goal frames everything that follows.

Step 3: Plan the frames (3-7 typically)

Each frame is one operation. Sketch the frames before writing them out — this catches the "skipped step" problem before it hits the page.

A frame budget that works for most operations:

3 frames: trivial chain (one transformation)
5 frames: typical single-concept walkthrough (eigenvalues, softmax, single SGD step)
7 frames: compound walkthrough (attention with all sub-steps shown)

If you find yourself wanting >7 frames, either the example is too big (shrink) or the operation has multiple sub-operations that each deserve their own walkthrough.

Step 4: Write each frame: state, operation, notice

Each frame has three parts (see Frame Anatomy):

State: the current values, shown explicitly.
Operation: what we just did to get here (or what we're about to do).
Notice: one sentence pointing out something the learner might miss.

The "notice" is what distinguishes a walkthrough from a worked solution. A worked solution shows the steps; a walkthrough also says what to look at.

Step 5: End with a sanity check or invitation to verify

Every walkthrough ends with one of:

A sanity check: "trace ✓, determinant ✓".
A consistency check: "the result has the expected shape".
An invitation: "try the same operation on A = [[1, 2], [3, 4]]; the answer should be λ ≈ −0.37 and λ ≈ 5.37."

The sanity check is what tells the learner the result is right. It also doubles as a reusable verification trick they can apply to similar problems.

Step 6 (optional): Invite the learner to redo with a variant

If the learner has time and the example is short, invite them to redo with a small change:

"Same matrix, different starting vector: redo with v = (0, 1) instead of (1, 0)."
"Same attention computation, but with one token's K replaced by zero: predict the change to the output before computing."

Variants check whether the picture transferred, not just the arithmetic.

Frame Anatomy

Each frame has three parts. Keep them visually distinct.

**Frame N.** [Operation: short verb phrase, what we're doing.]
[State: explicit values, in a code block if needed.]
[Notice: one sentence — what to look at.]

State block

Use a code block for matrices, vectors, and equations. Show actual numbers; resist the urge to leave things symbolic. The point is concreteness.

v = (3, 4)
|v| = √(3² + 4²) = √25 = 5

Operation phrase

Short verb phrase: "Compute…", "Apply…", "Substitute…", "Solve…". One line.

Notice sentence

One sentence pointing at the most important feature of this frame.

"The cross terms cancelled — that's because A is symmetric."
"Notice the second eigenvalue is negative — A reflects along that direction."
"The softmax peaked on the first row — token 1's query strongly matched key 1."

If a frame doesn't earn a notice sentence, it might not need to be its own frame. Consider merging.

Choosing the Right Example

The example you choose makes or breaks the walkthrough. Heuristics for choosing:

Use specific small numbers, not symbols

A walkthrough with a, b, c, d is a derivation, not a walkthrough. Pick numbers like 2, 3, 1, −1 — small enough to compute by eye, varied enough to expose pattern.

Avoid the simplest possible case

Identity matrix, zero vector, all-equal scores — these are too trivial; they don't exercise the operation. The walkthrough learner needs to see what happens when the operation is non-trivially active.

Pick examples with verifiable properties

Symmetric matrices have real eigenvalues — easy to spot a bug. Stochastic matrices have a stationary distribution — easy to verify. Pick examples with these checkable properties so Step 5's sanity check is meaningful.

When in doubt, use the smallest non-trivial size

Matrices: 2×2 first; 3×3 only if 2×2 doesn't exercise the concept.
Vectors: 2D or 3D.
Sequences: 3 tokens minimum (so attention is interesting), 4 if you need parity.
Networks: 1-layer, 2-input, 1-output minimum (for backprop demos).

For a recommended example per concept, see resources/examples.md.

Common Patterns

Pattern A: Single computation walkthrough

Used for one-shot operations: eigenvalue compute, single SGD step, single attention forward pass, softmax of a vector. Length: 3-5 frames. Closing: sanity check.

Pattern B: Iterative walkthrough

Used for processes that loop: gradient descent over multiple steps, power iteration finding eigenvectors, diffusion sampling. Length: 5-7 frames showing 2-3 iterations explicitly, then "and so on…". Closing: convergence comment + invitation to predict the limit.

Pattern C: Comparative walkthrough

Used to show the contrast between two operations: matrix-vector mul as row-dot vs as column-combination; layer norm vs batch norm; SGD vs Adam on the same gradient. Length: parallel frames in two columns or two passes. Closing: bridge sentence on what makes them equivalent or different.

For one filled walkthrough per pattern, see resources/examples.md.

Guardrails

Show actual numbers, not symbols. A walkthrough with Av is a derivation; a walkthrough with [5, 11] is a walkthrough. The learner needs to see the values.
Don't skip arithmetic the learner might be uncertain about. "(4−λ)(3−λ) − 2 = λ² − 7λ + 10" is fine; "(4−λ)(3−λ) − 2 = (λ−5)(λ−2)" skips the expansion. Give them the polynomial first, then factor.
Each frame should advance the state. A frame whose state is identical to the previous one (just rephrased) is wasted.
Use only one operation per frame. Two operations in one frame is two frames.
Don't bury the punchline in arithmetic. End with the result clearly stated and the sanity check explicit.
For long computations, consider Bash. If the arithmetic is genuinely long (e.g., a 4×4 eigenvalue problem), use Bash to numpy the intermediate states — but still show the operations as frames. The Bash output is the verification, not the walkthrough.

Quick Reference

Operation	Recommended example	Frame count
Matrix-vector mul	A = [[1, 2], [3, 4]], v = (5, 6)	3
Eigenvalues	A = [[4, 1], [2, 3]]	5
Eigenvector for known λ	Same A, λ = 5	4
SVD	A = [[3, 1], [1, 3]] (symmetric for clean SVD)	6
Softmax	x = (2, 1, -1)	4
Cross-entropy	p = (1, 0, 0), q = (0.7, 0.2, 0.1)	3
Single SGD step	Loss x², start at x = 4, η = 0.5	4
Attention forward	3 tokens, d = 2	7
LayerNorm	x = (1, 5, 9)	5
Backprop on tiny net	y = w₂σ(w₁x), one input/output	6
PCA on tiny dataset	4 points in 2D	6

For full filled-in walkthroughs of each, see resources/examples.md.