schema-design - SKILL.md Agent Skill

name: schema-design description: Design a relational schema — normalize to 3NF then denormalize with justification, choose the right Postgres index type per data shape, enforce constraints at the DB. Use when modeling a new domain, when queries are slow, or before a migration. Not for diagnosing slow queries (use query-optimization) or shipping the change without downtime (use migration-strategy). license: MIT

Schema Design

Purpose

Model the domain so the database — not application code — enforces integrity, and so reads are fast by design via the right index type for each access pattern.

Universal — normalization (3NF then justified denormalization), constraint-at-the-DB discipline, and index-type-by-data-shape apply to any relational DB; index type names and DDL differ.

Procedure

Normalize to 3NF first
- One fact in one place; no transitive dependencies
- Denormalize ONLY with a measured read-pattern justification (documented)
Enforce integrity at the database
- NOT NULL, UNIQUE, CHECK, foreign keys with explicit ON DELETE
- The DB is the last line of defense — app validation (see data-validation) is the first, not the only
Choose index type by data shape
- B-tree — equality + range (default; most columns)
- GIN — JSONB containment, array membership, full-text search
- GiST — spatial / nearest-neighbor / range overlap
- BRIN — very large append-only, naturally-ordered tables (time-series) — tiny index, range pruning
- Hash — equality-only (rarely worth it over B-tree)
Index the access patterns, not the columns
- Composite index column order = most-selective / equality columns first, range last
- Covering index (INCLUDE) for index-only scans on hot queries
- Always index foreign-key columns — Postgres does NOT auto-index FK child columns; missing index → slow cascade deletes and exclusive locks on the parent update (silent perf cliff)
- Partial indexes (WHERE status = 'active') shrink the index and speed the dominant predicate — common 10x win on tables with a small active subset
- Don't over-index writes — each index is write overhead
Design keys deliberately
- Natural vs surrogate (UUID v7 / bigint) — UUID v7 for distributed-friendly time-ordered IDs
- Avoid random UUID v4 as a clustered/primary key on huge tables (index fragmentation)
- UUID v7 generation: Postgres 18+ has native uuidv7(); on earlier versions generate it app-side (no core/uuid-ossp function exists — uuid-ossp only provides v1/v3/v4/v5)

5b. Soft-delete vs hard-delete — decide deliberately

Hard delete (DELETE): smaller tables, simpler queries, irreversible. Default unless a real requirement says otherwise
Soft delete (deleted_at timestamp): audit / undo / cross-reference survival. Cost: every query must filter WHERE deleted_at IS NULL (a single forgotten filter leaks deleted data — automate via RLS or a view), and unique constraints need partial indexes to allow re-creation after deletion
Archive table: move-on-delete to a sibling table. Keeps the hot table small and the history intact — a middle ground

Validate (validation loop)
- For each planned hot query, run EXPLAIN ANALYZE; if Seq Scan on a large table or bad row estimates → adjust index and re-run (hand off to query-optimization)

Anti-patterns

❌ Anti-pattern	✅ Correct
Integrity enforced only in app code	DB constraints (NOT NULL / FK / CHECK / UNIQUE)
B-tree index on a JSONB containment query	GIN index
Indexing every column "just in case"	Index measured access patterns; each index costs writes
Premature denormalization	3NF first; denormalize with a documented read justification
Random UUID v4 PK on a 100M-row table	UUID v7 / bigint for time-ordered locality
FK column without an index (slow cascade / lock storm)	Index every FK child column
Index covering both active and archived rows	Partial index `WHERE status = 'active'`
Soft-delete with one missing `WHERE deleted_at IS NULL` filter (data leak)	Enforce via view or RLS, not application-by-application filtering

Severity tiers

Tier	Examples	Action SLA
Critical	Missing FK allowing orphaned rows in a money/ownership table; no unique constraint on a natural key causing duplicates	Fix immediately
Major	Wrong index type on a hot query (Seq Scan); integrity enforced only in app	Fix this sprint
Minor	Over-indexing; suboptimal composite column order	Schedule within 2 sprints

Completion Criteria

Schema in 3NF (denormalizations documented with read justification)
All integrity rules enforced by DB constraints
Index type matches each hot query's data shape (verified via EXPLAIN)
No orphan-allowing relationships (FKs with explicit ON DELETE)

Output

Schema: Prisma schema (or DDL) with constraints + indexes
Index rationale doc: which index, which query, which type and why
Commit format: feat(schema): model <domain> / perf(schema): add GIN index for <jsonb query>

Implementation

TypeScript + Prisma + Postgres (default)

Prisma schema: @@index, @@unique, @relation(onDelete:), @db.* types
Index types Prisma can't express directly (GIN/GiST/BRIN) → raw SQL in a migration
UUID v7: Postgres 18+ @default(dbgenerated("uuidv7()")); pre-18 generate app-side (e.g., uuidv7 npm package) — uuid_generate_v7() does NOT exist in core or uuid-ossp
Verify: EXPLAIN ANALYZE in psql / Supabase dashboard

Other stacks

Python: SQLAlchemy / Alembic — same Postgres index types via Index(..., postgresql_using='gin')
Go: GORM or sqlc — index types in migration SQL
Universal: 3NF, constraint-at-DB, and index-type-by-shape are relational-DB principles; MySQL has fewer index types (no BRIN/GiST) — note that difference

Related skills

query-optimization — indexes are validated against actual query plans there
migration-strategy — schema changes ship via safe migrations
transaction-management — constraints + isolation enforce integrity together

Reference

Key insight encoded: Match index type to data shape — B-tree for equality/range, GIN for jsonb/array/full-text, BRIN for large append-only ordered tables, GiST for spatial. Enforce integrity at the DB, not just the app.