algorithms-and-data-structures - SKILL.md Agent Skill

name: algorithms-and-data-structures description: Choose the right algorithm and data structure for a problem, analyze its complexity, and recognize classic problem patterns. Use when picking a structure (array vs hash vs tree vs graph), analyzing whether code will scale, designing a non-trivial algorithm, or recognizing that a problem reduces to a known one. Distills CLRS, Sedgewick & Wayne (Algorithms 4e), Skiena (Algorithm Design Manual), and Knuth TAoCP Vol 3 (Sorting & Searching).

Algorithms & Data Structures

The two questions to ask of any algorithm: does it terminate, and at what cost? The two questions to ask of any data structure: what does it make fast, and what does it make slow?

Choosing a data structure (decision rules)

If you need...	Use	Cost
Indexed access by position, growable	dynamic array / `Vec` / `ArrayList`	O(1) access, O(1)* push, O(n) insert/delete mid
Lookup by key, no order	hash table	O(1)* avg; O(n) worst, no ordering
Lookup by key, sorted iteration, range queries	balanced BST (red-black, AVL) / sorted map	O(log n) ops
Min or max, repeatedly	binary heap	O(log n) push/pop, O(1) peek
FIFO	queue (linked list / ring buffer)	O(1) ends
LIFO	stack	O(1)
Membership-only, fast	hash set	O(1)*
String/prefix search	trie or suffix array	O(m) for length-m key
Connectivity / dynamic equivalence	union-find (DSU)	nearly O(α(n)) ≈ O(1)
Range sum/min updates	Fenwick tree (BIT) or segment tree	O(log n)
Approximate membership in tiny space	Bloom filter	O(k) hashes, false positives only

The default: array + hash map solve more problems than anything else. Reach for trees only when you need ordered iteration or range queries.

Choosing an algorithmic strategy

When you're stuck on a hard problem, run down this list before inventing something:

Brute force / generate-and-test. Start here to validate correctness. Often gives you a baseline test for the optimized version.
Sorting. Sort first, then sweep. Many problems collapse from O(n²) to O(n log n) once sorted (e.g. closest pair, interval merging, k-th element).
Hashing. Replace search with lookup. Two-sum, anagram grouping, dedup, caching memoized calls.
Two pointers / sliding window. For problems on sorted arrays or contiguous subranges.
Divide and conquer. Mergesort, quicksort, binary search, FFT, closest pair. Recurrence: T(n) = aT(n/b) + f(n), solve via Master Theorem.
Greedy. When local optima provably extend to global (matroid structure, exchange argument). Activity selection, Huffman, Dijkstra, MST.
Dynamic programming. When the problem has overlapping subproblems and optimal substructure. Define the state precisely; write the recurrence; memoize or fill bottom-up.
Graph search. BFS for shortest unweighted, Dijkstra for non-negative weighted, Bellman-Ford for negative weights, Floyd-Warshall for all-pairs, A* when you have an admissible heuristic.
Reduction. Reduce your problem to a known one (max flow, 2-SAT, shortest path, MST, sorting). Most "weird" interview problems are flow or DP in disguise.
Randomization. When deterministic is hard, randomized often gives expected-good behavior with simple code (randomized quicksort, treaps, skip lists, Karger's min-cut).

Complexity analysis (rules of thumb)

Anything depending only on log n is essentially free up to billions.
O(n) is the baseline; O(n log n) is "I sorted."
O(n²) breaks somewhere around n = 10⁴–10⁵; O(n³) at 10³.
Exponential / O(2ⁿ) breaks around n = 25–40 — bound the search space (bitmask DP, branch-and-bound, meet-in-the-middle) or accept approximation.
Average ≠ worst. Hash tables, quicksort, randomized BSTs are O(log n) average and O(n) worst — assume worst for adversarial inputs (e.g. user-controlled keys).
Amortized counts. Vector push is amortized O(1); union-find is amortized O(α(n)). Fine for batch totals; not fine for hard real-time.

Recognizing classic problems

Many "hard" problems are an old friend in disguise:

Find duplicates / pairs that sum to k → hash set.
k-th smallest / largest → quickselect or heap.
Schedule non-overlapping intervals → sort by end-time, greedy.
Edit distance / longest common subsequence / knapsack / coin change → DP.
Shortest path through a grid with obstacles → BFS.
Min cost to connect all nodes → MST (Kruskal/Prim).
Topologically order tasks → Kahn's algorithm or DFS post-order.
Detect cycle in a directed graph → DFS with three colors, or Kahn's running out of nodes.
Strongly connected components → Tarjan or Kosaraju.
Match items left/right → bipartite matching (Hopcroft-Karp / Hungarian).
Find anagrams or duplicates of strings up to permutation → sort each string, group by sorted form, or use frequency-tuple hash.

Triggers (when to load this skill)

Designing or reviewing an algorithm.
Code is too slow; want to know if a better complexity exists.
Choosing a data structure for a workload.
Reading a paper or textbook chapter and want context.
Estimating whether n = 10⁶ will fit in time / memory.

Anti-heuristics

Optimizing constants before complexity. Get to the right Big-O first; only then chase cache locality.
Reaching for fancy structures (segment tree, splay) when a sorted array + binary search will do.
Ignoring worst-case in adversarial settings (web inputs, contest, security).
Confusing "average" complexity with what you'll actually observe.
Choosing recursion when iteration is clearer; choosing iteration when recursion mirrors the structure.
Believing benchmarks of yours that don't include warm-up and statistical variance.

See lessons.md for the long-form lessons distilled from each book.