schema-design

star 0

Design a relational schema — normalize to 3NF then denormalize with justification, choose the right Postgres index type per data shape, enforce constraints at the DB. Use when modeling a new domain, when queries are slow, or before a migration. Not for diagnosing slow queries (use query-optimization) or shipping the change without downtime (use migration-strategy).

JayKim88 By JayKim88 schedule Updated 6/8/2026

name: schema-design description: Design a relational schema — normalize to 3NF then denormalize with justification, choose the right Postgres index type per data shape, enforce constraints at the DB. Use when modeling a new domain, when queries are slow, or before a migration. Not for diagnosing slow queries (use query-optimization) or shipping the change without downtime (use migration-strategy). license: MIT

Schema Design

Purpose

Model the domain so the database — not application code — enforces integrity, and so reads are fast by design via the right index type for each access pattern.

Universal — normalization (3NF then justified denormalization), constraint-at-the-DB discipline, and index-type-by-data-shape apply to any relational DB; index type names and DDL differ.

Procedure

  1. Normalize to 3NF first

    • One fact in one place; no transitive dependencies
    • Denormalize ONLY with a measured read-pattern justification (documented)
  2. Enforce integrity at the database

    • NOT NULL, UNIQUE, CHECK, foreign keys with explicit ON DELETE
    • The DB is the last line of defense — app validation (see data-validation) is the first, not the only
  3. Choose index type by data shape

    • B-tree — equality + range (default; most columns)
    • GIN — JSONB containment, array membership, full-text search
    • GiST — spatial / nearest-neighbor / range overlap
    • BRIN — very large append-only, naturally-ordered tables (time-series) — tiny index, range pruning
    • Hash — equality-only (rarely worth it over B-tree)
  4. Index the access patterns, not the columns

    • Composite index column order = most-selective / equality columns first, range last
    • Covering index (INCLUDE) for index-only scans on hot queries
    • Always index foreign-key columns — Postgres does NOT auto-index FK child columns; missing index → slow cascade deletes and exclusive locks on the parent update (silent perf cliff)
    • Partial indexes (WHERE status = 'active') shrink the index and speed the dominant predicate — common 10x win on tables with a small active subset
    • Don't over-index writes — each index is write overhead
  5. Design keys deliberately

    • Natural vs surrogate (UUID v7 / bigint) — UUID v7 for distributed-friendly time-ordered IDs
    • Avoid random UUID v4 as a clustered/primary key on huge tables (index fragmentation)
    • UUID v7 generation: Postgres 18+ has native uuidv7(); on earlier versions generate it app-side (no core/uuid-ossp function exists — uuid-ossp only provides v1/v3/v4/v5)

5b. Soft-delete vs hard-delete — decide deliberately

  • Hard delete (DELETE): smaller tables, simpler queries, irreversible. Default unless a real requirement says otherwise
  • Soft delete (deleted_at timestamp): audit / undo / cross-reference survival. Cost: every query must filter WHERE deleted_at IS NULL (a single forgotten filter leaks deleted data — automate via RLS or a view), and unique constraints need partial indexes to allow re-creation after deletion
  • Archive table: move-on-delete to a sibling table. Keeps the hot table small and the history intact — a middle ground
  1. Validate (validation loop)
    • For each planned hot query, run EXPLAIN ANALYZE; if Seq Scan on a large table or bad row estimates → adjust index and re-run (hand off to query-optimization)

Anti-patterns

❌ Anti-pattern ✅ Correct
Integrity enforced only in app code DB constraints (NOT NULL / FK / CHECK / UNIQUE)
B-tree index on a JSONB containment query GIN index
Indexing every column "just in case" Index measured access patterns; each index costs writes
Premature denormalization 3NF first; denormalize with a documented read justification
Random UUID v4 PK on a 100M-row table UUID v7 / bigint for time-ordered locality
FK column without an index (slow cascade / lock storm) Index every FK child column
Index covering both active and archived rows Partial index WHERE status = 'active'
Soft-delete with one missing WHERE deleted_at IS NULL filter (data leak) Enforce via view or RLS, not application-by-application filtering

Severity tiers

Tier Examples Action SLA
Critical Missing FK allowing orphaned rows in a money/ownership table; no unique constraint on a natural key causing duplicates Fix immediately
Major Wrong index type on a hot query (Seq Scan); integrity enforced only in app Fix this sprint
Minor Over-indexing; suboptimal composite column order Schedule within 2 sprints

Completion Criteria

  • Schema in 3NF (denormalizations documented with read justification)
  • All integrity rules enforced by DB constraints
  • Index type matches each hot query's data shape (verified via EXPLAIN)
  • No orphan-allowing relationships (FKs with explicit ON DELETE)

Output

  • Schema: Prisma schema (or DDL) with constraints + indexes
  • Index rationale doc: which index, which query, which type and why
  • Commit format: feat(schema): model <domain> / perf(schema): add GIN index for <jsonb query>

Implementation

TypeScript + Prisma + Postgres (default)

  • Prisma schema: @@index, @@unique, @relation(onDelete:), @db.* types
  • Index types Prisma can't express directly (GIN/GiST/BRIN) → raw SQL in a migration
  • UUID v7: Postgres 18+ @default(dbgenerated("uuidv7()")); pre-18 generate app-side (e.g., uuidv7 npm package) — uuid_generate_v7() does NOT exist in core or uuid-ossp
  • Verify: EXPLAIN ANALYZE in psql / Supabase dashboard

Other stacks

  • Python: SQLAlchemy / Alembic — same Postgres index types via Index(..., postgresql_using='gin')
  • Go: GORM or sqlc — index types in migration SQL
  • Universal: 3NF, constraint-at-DB, and index-type-by-shape are relational-DB principles; MySQL has fewer index types (no BRIN/GiST) — note that difference

Related skills

  • query-optimization — indexes are validated against actual query plans there
  • migration-strategy — schema changes ship via safe migrations
  • transaction-management — constraints + isolation enforce integrity together

Reference

  • Key insight encoded: Match index type to data shape — B-tree for equality/range, GIN for jsonb/array/full-text, BRIN for large append-only ordered tables, GiST for spatial. Enforce integrity at the DB, not just the app.
Install via CLI
npx skills add https://github.com/JayKim88/claude-ai-engineering --skill schema-design
Repository Details
star Stars 0
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator