database-schema-designer

star 0

Design robust, scalable, production-ready database schemas for SQL and NoSQL databases. Use this skill whenever the user mentions designing a schema, creating tables, modeling data, database architecture, normalization, indexing, migrations, or anything about structuring a database — even if they don't use the word "schema". Trigger on phrases like "I need a database for...", "how should I store...", "what tables do I need...", "help me model...", "design my data model", or any request to think through how to persist data. Covers normalization (1NF–3NF), indexing strategies, constraint design, foreign key patterns, soft deletes, audit trails, multi-tenancy, migration patterns, and performance optimization.

MGriot By MGriot schedule Updated 3/14/2026

name: database-schema-designer description: Design robust, scalable, production-ready database schemas for SQL and NoSQL databases. Use this skill whenever the user mentions designing a schema, creating tables, modeling data, database architecture, normalization, indexing, migrations, or anything about structuring a database — even if they don't use the word "schema". Trigger on phrases like "I need a database for...", "how should I store...", "what tables do I need...", "help me model...", "design my data model", or any request to think through how to persist data. Covers normalization (1NF–3NF), indexing strategies, constraint design, foreign key patterns, soft deletes, audit trails, multi-tenancy, migration patterns, and performance optimization. license: MIT

Database Schema Designer

Design production-ready database schemas with best practices built-in.


Quick Start

Just describe your data model:

design a schema for an e-commerce platform with users, products, orders

What to include in your request:

  • Entities and their key relationships
  • Scale hints (high-traffic, millions of records, multi-tenant)
  • Special requirements (soft deletes, audit trail, multi-tenancy)
  • Database preference (SQL/NoSQL, MySQL/PostgreSQL) — defaults to PostgreSQL if unspecified

Process Overview

Requirements → ANALYZE (entities, access patterns, SQL vs NoSQL)
            → DESIGN  (normalize to 3NF, define keys, constraints, types)
            → OPTIMIZE (indexes, denormalize hot paths, add timestamps)
            → MIGRATE  (reversible scripts, backward-compatible, zero-downtime)
            → Production-Ready Schema

Quick Reference

Task Approach Key Consideration
New schema Normalize to 3NF first Model the domain, not the UI
SQL vs NoSQL Access patterns decide Read/write ratio + consistency needs
Primary keys BIGINT or UUID UUID / ULID for distributed systems
Foreign keys Always constrain Define ON DELETE strategy explicitly
Indexes FKs + WHERE/ORDER BY cols Column order in composite indexes matters
Migrations Always reversible Backward-compatible first
Soft deletes deleted_at TIMESTAMP Filter WHERE deleted_at IS NULL
Audit trail created_by, updated_by FK to users table
Multi-tenancy tenant_id on every table Index it; enforce in app layer too

Commands

Command Action
design schema for {domain} Full schema generation
normalize {table} Apply normalization rules
add indexes for {query pattern} Generate index strategy
migration for {change} Create reversible migration
review schema Audit existing schema for issues

Core Principles

Principle Why How
Model the Domain UI changes, domain doesn't Names reflect business concepts
Data Integrity First Corruption is costly Constraints at the database level
Optimize for Access Pattern Can't optimize for everything OLTP: normalized; OLAP: denormalized
Plan for Scale Retrofitting hurts Index strategy + partitioning plan upfront
Soft-Delete by Default Hard deletes lose history deleted_at + partial index on NULL

Anti-Patterns

Avoid Why Instead
VARCHAR(255) everywhere Hides intent, wastes storage Size fields appropriately
FLOAT for money Rounding errors DECIMAL(10,2)
Missing FK constraints Orphaned/corrupt data Always define foreign keys
No indexes on FKs Slow JOINs Index every FK
Dates as strings Can't compare or sort DATE, TIMESTAMP types
SELECT * Fetches unnecessary data Explicit column lists
Non-reversible migrations Can't rollback Always write DOWN migration
NOT NULL without default Breaks existing rows Add nullable → backfill → constrain
is_deleted BOOLEAN Can't tell when or by whom deleted_at TIMESTAMP NULL
Multiple boolean status cols Mutually exclusive states blow up Single status ENUM or lookup table
EAV (Entity-Attribute-Value) Kills query performance and type safety Dedicated columns or JSONB for truly dynamic attrs
tenant_id-less tables Impossible to isolate tenant data later Add tenant_id from day one in SaaS

Common Patterns

Soft Deletes

-- Add to any table that should support soft delete
deleted_at  TIMESTAMP NULL DEFAULT NULL,
deleted_by  BIGINT    NULL REFERENCES users(id)

-- PostgreSQL: partial index keeps queries fast
CREATE INDEX idx_orders_active ON orders(customer_id)
  WHERE deleted_at IS NULL;

-- All queries must filter:
SELECT * FROM orders WHERE deleted_at IS NULL;

Audit Trail

-- Minimal audit columns on every table
created_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
created_by  BIGINT    NOT NULL REFERENCES users(id),
updated_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_by  BIGINT    NULL     REFERENCES users(id)

-- For full change history, use a separate audit table:
CREATE TABLE orders_audit (
  id         BIGINT PRIMARY KEY,
  order_id   BIGINT NOT NULL,
  action     VARCHAR(10) NOT NULL,   -- INSERT / UPDATE / DELETE
  changed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  changed_by BIGINT NOT NULL REFERENCES users(id),
  snapshot   JSONB  NOT NULL         -- row state at time of change
);

Multi-Tenancy (Row-Level Isolation)

-- Every tenant-scoped table carries tenant_id
CREATE TABLE projects (
  id         BIGINT PRIMARY KEY,
  tenant_id  BIGINT NOT NULL REFERENCES tenants(id),
  name       VARCHAR(255) NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  INDEX idx_projects_tenant (tenant_id)
);

-- Application layer always scopes queries:
SELECT * FROM projects WHERE tenant_id = ? AND deleted_at IS NULL;

-- PostgreSQL: enforce at DB level with Row-Level Security
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON projects
  USING (tenant_id = current_setting('app.current_tenant')::BIGINT);

Status / State Machine

-- Use ENUM or a lookup table — never multiple boolean columns
status VARCHAR(20) NOT NULL DEFAULT 'pending'
  CHECK (status IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled'))

-- PostgreSQL native ENUM (rename is painful — prefer VARCHAR + CHECK)
CREATE TYPE order_status AS ENUM ('pending','processing','shipped','delivered','cancelled');

Primary Key Choice

-- Auto-increment (simple, opaque, performant)
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY

-- UUID (distributed, globally unique, URL-safe)
id UUID PRIMARY KEY DEFAULT gen_random_uuid()   -- PostgreSQL 13+

-- ULID/KSUID: sortable + unique — best of both worlds (use via app or extension)

Verification Checklist

After designing a schema, confirm:

  • Every table has a primary key
  • All FK relationships have constraints + ON DELETE strategy
  • Indexes on every FK column
  • Indexes on frequently queried WHERE/ORDER BY columns
  • DECIMAL for monetary values (not FLOAT)
  • NOT NULL on required fields; UNIQUE where needed
  • CHECK constraints for validation rules
  • created_at / updated_at timestamps on every table
  • Soft delete strategy defined (deleted_at or hard delete with reason)
  • Audit trail considered (who created/changed each record)
  • tenant_id on all tenant-scoped tables (SaaS apps)
  • Migrations are reversible (UP + DOWN scripts)
  • Tested on staging with production-like data

Deep Dive: Normalization (SQL)
Form Rule Violation Example
1NF Atomic values, no repeating groups product_ids = '1,2,3'
2NF 1NF + no partial dependencies customer_name in order_items
3NF 2NF + no transitive dependencies country derived from postal_code
-- 1NF violation → fix with a junction table
-- BAD
CREATE TABLE orders (id INT PRIMARY KEY, product_ids VARCHAR(255));
-- GOOD
CREATE TABLE order_items (
  id INT PRIMARY KEY,
  order_id INT REFERENCES orders(id),
  product_id INT REFERENCES products(id)
);

-- 3NF violation → fix with a lookup table
-- BAD
CREATE TABLE customers (id INT PRIMARY KEY, postal_code VARCHAR(10), country VARCHAR(50));
-- GOOD
CREATE TABLE postal_codes (code VARCHAR(10) PRIMARY KEY, country VARCHAR(50));

When to Denormalize

Scenario Strategy
Read-heavy reporting Pre-calculated aggregates / total_amount cache
Expensive JOINs Derived columns updated by trigger
Analytics dashboards Materialized views
Deep Dive: Data Types

Strings

Type Use Case
CHAR(n) Fixed-length codes (country_code CHAR(2))
VARCHAR(n) Variable text (email VARCHAR(255))
TEXT Long content (articles, descriptions)

Numbers

Type Use Case
TINYINT / SMALLINT Status codes, small quantities
INT / BIGINT IDs, counts
DECIMAL(p,s) Money — always
FLOAT / DOUBLE Scientific data only

Dates — always store in UTC

created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
-- PostgreSQL
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()

PostgreSQL Extras

-- JSONB for dynamic/sparse attributes (indexed, fast)
metadata JSONB DEFAULT '{}'
CREATE INDEX idx_meta ON table USING GIN(metadata);

-- Arrays
tags TEXT[] DEFAULT '{}'

-- Ranges
valid_during TSTZRANGE
Deep Dive: Indexing Strategy

When to Index

Always Reason
Foreign keys Speed up JOINs
WHERE clause columns Speed up filtering
ORDER BY columns Speed up sorting
Unique constraints Enforced uniqueness
-- Composite: put most selective column first, match query patterns
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);
-- Uses: WHERE customer_id = ?  /  WHERE customer_id = ? AND status = ?
-- Misses: WHERE status = ?  (status alone)

-- Partial (PostgreSQL) — smaller, faster for filtered queries
CREATE INDEX idx_orders_pending ON orders(created_at)
  WHERE status = 'pending';

Index Types

Type Best For
B-Tree (default) Ranges, equality, sorting
Hash Exact equality only
Full-text Text search (MATCH AGAINST)
GIN / GiST Arrays, JSONB, geospatial (PostgreSQL)

Pitfalls

  • Over-indexing slows writes — only index what's queried
  • Wrong column order in composite indexes → unused index
  • Missing FK indexes → slow JOINs (MySQL won't create them automatically)
Deep Dive: Constraints
-- Foreign key strategies
FOREIGN KEY (customer_id) REFERENCES customers(id)
  ON DELETE CASCADE     -- delete children with parent (order_items)
  ON DELETE RESTRICT    -- prevent deletion if referenced (safest default)
  ON DELETE SET NULL    -- optional relationships

-- Check constraints
price   DECIMAL(10,2) CHECK (price >= 0)
discount INT           CHECK (discount BETWEEN 0 AND 100)

-- Composite unique
UNIQUE (tenant_id, slug)   -- slug unique per tenant, not globally
Deep Dive: Relationship Patterns
-- One-to-Many
CREATE TABLE orders (
  id          BIGINT PRIMARY KEY,
  customer_id BIGINT NOT NULL REFERENCES customers(id) ON DELETE RESTRICT
);

-- Many-to-Many (junction table)
CREATE TABLE enrollments (
  student_id  BIGINT REFERENCES students(id) ON DELETE CASCADE,
  course_id   BIGINT REFERENCES courses(id)  ON DELETE CASCADE,
  enrolled_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (student_id, course_id)
);

-- Self-referencing (hierarchy)
CREATE TABLE employees (
  id         BIGINT PRIMARY KEY,
  name       VARCHAR(100) NOT NULL,
  manager_id BIGINT REFERENCES employees(id)
);

-- Polymorphic (prefer separate FKs for integrity)
CREATE TABLE comments (
  id       BIGINT PRIMARY KEY,
  content  TEXT NOT NULL,
  post_id  BIGINT REFERENCES posts(id),
  photo_id BIGINT REFERENCES photos(id),
  CHECK (
    (post_id IS NOT NULL AND photo_id IS NULL) OR
    (post_id IS NULL     AND photo_id IS NOT NULL)
  )
);
Deep Dive: NoSQL Design (MongoDB)

Embed vs Reference

Factor Embed Reference
Accessed together? Yes → embed No → reference
Relationship size 1:few 1:many
Update frequency Rarely Frequently
Document size concern Small Near 16 MB
// Embedded (order + its items read together)
{ "_id": "order_123", "items": [{ "product_id": "p1", "qty": 2 }], "total": 59.98 }

// Referenced (customer loaded separately)
{ "_id": "order_123", "customer_id": "cust_456", "total": 59.98 }
// MongoDB indexes
db.users.createIndex({ email: 1 }, { unique: true });
db.orders.createIndex({ customer_id: 1, created_at: -1 });
db.articles.createIndex({ title: "text", body: "text" });
db.stores.createIndex({ location: "2dsphere" });
Deep Dive: Migrations

Rules

Practice Why
Always reversible Need to rollback
Backward compatible Zero-downtime deploys
Schema change before data change Separate concerns
Test on staging first Catch issues early

Adding a Column (Zero-Downtime)

-- Step 1: Add nullable (deploy immediately — no lock on large tables)
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- Step 2: Deploy app code that writes to new column
-- Step 3: Backfill
UPDATE users SET phone = '' WHERE phone IS NULL;
-- Step 4: Add constraint
ALTER TABLE users ALTER COLUMN phone SET NOT NULL;

Migration Template

-- Migration: 20250314120000_add_phone_to_users.sql

-- UP
BEGIN;
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
CREATE INDEX idx_users_phone ON users(phone);
COMMIT;

-- DOWN
BEGIN;
DROP INDEX idx_users_phone;
ALTER TABLE users DROP COLUMN phone;
COMMIT;
Deep Dive: Performance Optimization
-- Analyze query plan
EXPLAIN ANALYZE SELECT * FROM orders
WHERE customer_id = 123 AND status = 'pending';
EXPLAIN output Meaning
Seq Scan Full table scan — add an index
Index Scan Good
Bitmap Index Scan Good for low-selectivity
rows= very high Many rows examined — refine index

N+1 Query Problem

# BAD: 1 query per order
for order in orders:
    customer = db.query("SELECT * FROM customers WHERE id = ?", order.customer_id)

# GOOD: single JOIN
results = db.query("""
    SELECT o.*, c.name AS customer_name
    FROM orders o
    JOIN customers c ON o.customer_id = c.id
    WHERE o.deleted_at IS NULL
""")

Optimization Toolkit

Technique When
Add indexes Slow WHERE/ORDER BY
Partial indexes Queries always filter on a value
Denormalize Expensive multi-table JOINs
Materialized views Heavy analytics queries
Pagination (LIMIT/OFFSET or keyset) Large result sets
Read replicas Read-heavy load
Partitioning Tables > 100M rows

Extension Points

For advanced patterns not covered here, consider adding reference files:

  • Database-specific: PostgreSQL vs MySQL vs SQLite variations
  • Advanced patterns: Time-series, event sourcing, CQRS
  • ORM integration: Prisma, TypeORM, SQLAlchemy
  • Monitoring: Slow query logs, pg_stat_statements, query performance dashboards
Install via CLI
npx skills add https://github.com/MGriot/.gemini --skill database-schema-designer
Repository Details
star Stars 0
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator