name: ecto-expert description: Ecto database patterns — schemas, changesets, query composition, Multi transactions, migrations, ParameterizedType, UUIDv7, and performance tuning targets: elixir: "1.18+" ecto: "3.12+" ecto_sql: "3.12+" otp: "27+"
Ecto Expert
When to Use This Skill
- Designing or modifying schemas, embedded schemas, or
ParameterizedTypedefinitions - Writing or composing queries, multi-table joins, dynamic where clauses
- Building changesets: validation, association casting, custom validators
- Multi-step transactional operations via
Ecto.Multi - Writing migrations (especially zero-downtime patterns) or backfill scripts
- Performance tuning: preloading,
insert_all, streaming, connection pool sizing - Skip this skill when working on test factories or DB sandbox setup (use
elixir-testing-expert), GraphQL resolvers (useabsinthe-expert), or Commanded projections at the architectural level (usecommanded-expert— Ecto patterns inside projections still apply here).
Mental Model
- Schema declares shape; changeset declares change. Schemas are pure data definitions. Changesets carry the diff between current and desired state PLUS validation results. Every write goes through a changeset; every changeset is the audit trail of "what was attempted and why it (didn't) work."
- Queries are composable values, not strings.
from(u in User)returns a struct. You can pipe it through filter functions, hand it to other modules, store it in a list. Composition over construction. - The Repo is the boundary, the only side-effecting layer. Schemas don't talk to databases; queries don't either. Only
Repo.all,Repo.insert,Repo.transactionetc. touch the DB. Keep Repo calls in contexts (seephoenix-expert), never in schemas or changesets. - Migrations are forward-only in spirit. Even reversible migrations should treat rollback as a recovery tool, not a development workflow. Plan changes that work in both directions during deploy.
Decision Tree: Query Approach
What data operation?
├── Fetch by primary key? → Repo.get / Repo.get!
├── Fetch by unique field? → Repo.get_by / Repo.get_by!
├── Filtered list? → Composable query functions |> Repo.all()
├── Complex joins / aggregates? → from() + join + select
├── Existence check (don't need the row)? → Repo.exists?(query)
├── Count? → Repo.aggregate(query, :count)
├── Multiple related writes (atomic)? → Ecto.Multi
├── Upsert? → Repo.insert(changeset, on_conflict: ...)
├── Bulk insert (no changesets, no callbacks)? → Repo.insert_all
└── Streaming large results? → Repo.stream (inside transaction)
Decision Tree: Changeset Strategy
What kind of validation?
├── Standard field validation? → cast + validate_required + validate_format/_length/_inclusion
├── Cross-field validation? → validate_change/3 or a custom function in the pipeline
├── Database constraint enforcement? → unique_constraint / foreign_key_constraint / check_constraint (AFTER the cast)
├── Association management? → cast_assoc / put_assoc
├── Different rules per action? → separate changeset functions
│ ├── create_changeset/2 — strict, all required fields
│ └── update_changeset/2 — partial updates allowed
├── No schema (form-only data)? → Schemaless changeset ({types_map, %{}})
├── Embedded data (struct inside parent)? → embedded_schema + cast_embed
└── Field type isn't fixed at compile time (multi-tenant, plugin)? → Ecto.ParameterizedType
Decision Tree: Preload Strategy
How should this association load?
├── Always need it on this query? → preload in the query (single SQL with join, or 2 queries)
├── Conditional / optional? → Repo.preload after fetch
├── Listing rows that all need the assoc (N+1 risk)? → preload in query (always)
├── Nested associations? → preload([:posts, comments: :author])
├── Need to filter the preloaded rows? → preload(query: from p in Post, where: p.published)
└── GraphQL/Dataloader context? → Dataloader (see absinthe-expert) — never inline preloads in resolvers
Decision Tree: ID Strategy
What kind of primary key?
├── Default integer auto-increment? → Default (no @primary_key needed)
├── Distributed system, want to generate IDs in the app? → :binary_id (UUID v4)
├── Need IDs that sort by creation time? → UUIDv7 (Postgres 17+ via uuidv7() OR ecto_uuidv7 lib)
├── Composite key? → @primary_key false + multiple field/3 with primary_key: true
├── External ID from another system? → use that as a non-primary unique field, keep your own surrogate key
└── No primary key (read-only views)? → @primary_key false
Core Patterns
Schema
defmodule MyApp.Accounts.User do
use Ecto.Schema
import Ecto.Changeset
alias MyApp.Accounts.Organization
@primary_key {:id, :binary_id, autogenerate: true}
@foreign_key_type :binary_id
schema "users" do
field :email, :string
field :name, :string
field :role, Ecto.Enum, values: [:admin, :member, :viewer]
field :password_hash, :string, redact: true
# Virtual fields — not persisted, available on the struct
field :password, :string, virtual: true, redact: true
# Associations
belongs_to :organization, Organization
has_many :posts, MyApp.Content.Post
many_to_many :teams, MyApp.Accounts.Team, join_through: "users_teams"
timestamps()
end
def changeset(user, attrs) do
user
|> cast(attrs, [:email, :name, :role, :password])
|> validate_required([:email, :name])
|> validate_format(:email, ~r/^[^\s@]+@[^\s@]+\.[^\s@]+$/)
|> validate_length(:password, min: 12)
|> unique_constraint(:email)
|> hash_password()
end
defp hash_password(%{valid?: true, changes: %{password: pw}} = cs) do
put_change(cs, :password_hash, Bcrypt.hash_pwd_salt(pw))
end
defp hash_password(cs), do: cs
end
Rule: Changesets must be pure — never call Repo.* from inside one. Validation that needs DB lookup belongs in the context layer, before or after the changeset.
Composable query functions
defmodule MyApp.Accounts.UserQuery do
import Ecto.Query
def base, do: from(u in User, as: :user)
def active(query), do: where(query, [user: u], u.active == true)
def by_role(query, role), do: where(query, [user: u], u.role == ^role)
def by_org(query, org_id), do: where(query, [user: u], u.organization_id == ^org_id)
def with_posts(query), do: preload(query, [:posts])
def ordered(query), do: order_by(query, [user: u], desc: u.inserted_at)
def limit_to(query, n), do: limit(query, ^n)
end
# Compose as a pipeline in the context
UserQuery.base()
|> UserQuery.active()
|> UserQuery.by_role(:admin)
|> UserQuery.ordered()
|> Repo.all()
Rule: Use named bindings (as: :user) so query composition stays readable as joins are added. Without them, [u, p, c] positional bindings get fragile.
Ecto.Multi for atomic multi-step writes
Ecto.Multi.new()
|> Ecto.Multi.insert(:user, User.changeset(%User{}, user_attrs))
|> Ecto.Multi.run(:profile, fn repo, %{user: user} ->
# Arbitrary logic with prior step results in scope
repo.insert(Profile.changeset(%Profile{user_id: user.id}, profile_attrs))
end)
|> Ecto.Multi.update(:org, fn %{user: user} ->
Organization.increment_members_changeset(org, user)
end)
|> Repo.transaction()
|> case do
{:ok, %{user: user, profile: profile}} -> {:ok, user}
{:error, failed_op, changeset, _changes_so_far} -> {:error, {failed_op, changeset}}
end
Rule: Reach for Multi whenever 2+ writes need to commit together OR the second write depends on the first's primary key. The error shape is {:error, failed_op, failed_value, changes_so_far} — failed_value is usually a changeset (the common case shown above), but a Multi.run/3 step can return {:error, anything}, so the third element isn't guaranteed to be a changeset. Pattern-match accordingly.
Ecto.ParameterizedType for runtime-shaped fields
defmodule MyApp.PrefixedID do
use Ecto.ParameterizedType
def type(_params), do: :string
def init(opts), do: Keyword.fetch!(opts, :prefix)
def cast(value, prefix) when is_binary(value) do
if String.starts_with?(value, prefix), do: {:ok, value}, else: :error
end
def load(value, _loader, _prefix), do: {:ok, value}
def dump(value, _dumper, _prefix), do: {:ok, value}
end
# Use in a schema with per-field config
schema "invoices" do
field :invoice_id, MyApp.PrefixedID, prefix: "inv_"
field :customer_id, MyApp.PrefixedID, prefix: "cust_"
end
Rule: Ecto.ParameterizedType lets a single type module behave differently per use site (multi-tenant prefixes, configurable formats, plugin-shaped fields). Reach for it when one custom type would otherwise become five near-identical copies.
Migrations (zero-downtime defaults)
defmodule MyApp.Repo.Migrations.AddRoleToUsers do
use Ecto.Migration
def change do
# Step 1: add nullable column
alter table(:users) do
add :role, :string
end
# Step 2: backfill in a separate migration (or after deploy), then add constraint
# alter table(:users), do: modify(:role, :string, null: false)
# Indexes — concurrent so reads aren't blocked
create index(:users, [:role], concurrently: true)
end
end
# When using `concurrently: true` you must disable the migration transaction:
defmodule MyApp.Repo.Migrations.AddIndexConcurrent do
use Ecto.Migration
@disable_ddl_transaction true
@disable_migration_lock true
def change do
create index(:users, [:email], concurrently: true)
end
end
Safe migration rules:
- Add columns nullable first; backfill in a separate deploy; then
null: falsein a third - Indexes go in their own migration with
concurrently: true+@disable_ddl_transaction - Never rename columns in a single deploy — add new, dual-write/read, drop old
Repo.checkout/2 for explicit connection pinning
# Hold one connection across many operations — avoids pool churn for batch work
Repo.checkout(fn ->
for batch <- Stream.chunk_every(huge_data, 1000) do
Repo.insert_all(Item, batch)
end
end)
Rule: Repo.checkout/2 reserves a connection from the pool for the duration of the function. Use it for batch operations that would otherwise check out and return the same connection many times. Don't hold the checkout across slow external calls — the pool is shared.
Performance patterns
# Bulk insert — bypasses changesets, no validation, no callbacks
Repo.insert_all(User, users_list, on_conflict: :nothing)
# Stream large datasets — must be inside a transaction
Repo.transaction(fn ->
User
|> where([u], u.active == true)
|> Repo.stream(max_rows: 500)
|> Stream.each(&process/1)
|> Stream.run()
end)
# Preload to avoid N+1
# BAD: users |> Enum.map(& &1.posts) # N+1 if not preloaded
# GOOD: from(u in User, preload: [:posts]) |> Repo.all()
Anti-patterns
Don't: call Repo.* from inside a changeset
# BAD
def changeset(user, attrs) do
user
|> cast(attrs, [:email])
|> validate_email_unique() # calls Repo.exists?(...)
end
defp validate_email_unique(cs) do
if Repo.exists?(from u in User, where: u.email == ^get_field(cs, :email)) do
add_error(cs, :email, "already taken")
else
cs
end
end
Why it bites: Changesets get built in many contexts — tests, form previews, dry-runs — where DB access isn't available or wanted. Race condition between the check and the insert means you'll get duplicates anyway. Coupling validation to the DB makes changesets non-deterministic.
Instead: Use unique_constraint(:email) on the changeset and let the database enforce uniqueness via a UNIQUE index. Ecto translates the constraint violation into a changeset error.
Don't: use Repo.get! when missing data is a normal outcome
# BAD
def show(conn, %{"id" => id}) do
user = Repo.get!(User, id) # raises Ecto.NoResultsError
render(conn, :show, user: user)
end
Why it bites: The bang variant is for "missing data is a bug, crash so I see it." A user looking up an ID that doesn't exist is a normal outcome — should render a 404, not crash. The exception turns into a 500 unless you specifically rescue.
Instead: Use Repo.get/2 (returns nil) and pattern-match, or build a fetch_user/1 context function that returns {:ok, _} | {:error, :not_found}.
Don't: write to the database without a changeset
# BAD
def update_email(user_id, new_email) do
Repo.update_all(from(u in User, where: u.id == ^user_id), set: [email: new_email])
end
Why it bites: update_all bypasses changesets entirely. No validation, no constraint translation, no updated_at bump (unless you set it manually), no triggers downstream that depend on changeset behavior. Audit trails miss the change. Bad data slips in.
Instead: Build a changeset, validate, and Repo.update. update_all is for legitimate bulk operations (mark all as expired, increment a counter) where validation is provably unnecessary.
Don't: preload after the fact in a list
# BAD — N+1
users = Repo.all(User)
Enum.map(users, fn user ->
user_with_posts = Repo.preload(user, :posts) # one query per user
# ...
end)
Why it bites: Repo.preload on a single struct fires its own SELECT. Doing it inside Enum.map means N queries for N users — exactly the N+1 you were trying to avoid by using preload at all.
Instead:
# GOOD — single batched query for the association
users = Repo.all(from u in User, preload: [:posts])
If preload is conditional (only sometimes needed), preload the list once after fetching: Repo.preload(users, :posts) (note: passing the list, not iterating).
Don't: bake compile-time secrets into schema field options
# BAD
schema "events" do
field :payload, :map, default: System.get_env("DEFAULT_PAYLOAD") # compiled in
end
Why it bites: Schema definitions are evaluated at compile time. System.get_env here reads env vars during the build, freezing whatever was set then. The deployed app uses the build-time value, not the runtime one. (See phoenix-expert runtime config patterns.)
Instead: Use a function default (default: &my_function/0) only if the function is pure. For runtime config, set the field in the changeset or context layer where runtime values are accessible.
Common Gotchas
cast/3silently drops fields not in the allowed list — passingemailwhen only[:name]is allowed produces a changeset whereemailis ignored, no warning. Adding a field requires updating both the schema AND everycastcall that should accept it.validate_requiredandunique_constraintrun at different times —validate_requiredruns at validation;unique_constraintonly fires AFTER the DB rejects the insert. A changeset can bevalid?: trueand still fail to insert. Always handle the{:error, changeset}fromRepo.insert/2.has_manycascade requires a migration-level constraint —has_manyin the schema does NOT cascade deletes. You needreferences(:users, on_delete: :delete_all)in the migration, OR:on_deleteoption inbelongs_to(Ecto-level, slower), OR explicit deletion in aMulti.:binary_id(UUIDv4) is not sortable — UUID v4 is random. If you need IDs that sort by creation time (cursor pagination, time-series queries), use UUIDv7 (Postgres 17+ has built-inuuidv7(), or useecto_uuidv7for app-level generation).Repo.preloaddoesn't reload — callingRepo.preload(user, :posts)when posts are already loaded is a no-op. To force fresh:Repo.preload(user, :posts, force: true).Repo.transaction(fn -> ... end)differs fromRepo.transaction(multi)— the function form expects you to return{:error, _}or callRepo.rollback/1to abort. Multi handles all of this automatically. Prefer Multi for clarity.embedded_schemahas noidby default — the embedded struct doesn't get a primary key unless you declare one. Useful for value objects (address, money), confusing if you expected a normal schema.insert_all/3returns{count, returning}not{:ok, _}— without:returning, the second element isnil({count, nil}). Withreturning: [:id, :email], it's a list of maps ({count, [%{id: ..., email: ...}, ...]}). Treating it as a normal Repo write breaks; always destructure the tuple.- Sandbox
:automode requiresasync: false—:autouses a single shared transaction across the suite, which precludes parallel runs. Forasync: truetests, switch to:manualmode and callSandbox.checkout/1in each test's setup. (Seeelixir-testing-expertfor the pattern.) - Changesets retain the original struct, not the merged data —
cs.datais the input struct,cs.changesis the diff,Ecto.Changeset.apply_changes/1materializes the merged result. Readingcs.data.emailafter a cast won't show the new email.
Quick Reference
Common Repo ops:
Repo.get(User, id) # nil if missing
Repo.get!(User, id) # raises if missing
Repo.get_by(User, email: e)
Repo.all(query)
Repo.one(query) # nil or struct (raises if multiple)
Repo.aggregate(query, :count)
Repo.exists?(query)
Repo.insert(changeset) # {:ok, struct} | {:error, changeset}
Repo.insert!(changeset) # raises on error
Repo.update(changeset)
Repo.delete(struct_or_changeset)
Repo.insert_all(schema, list, on_conflict: :nothing)
Repo.preload(struct_or_list, :assoc)
Query macros (require import Ecto.Query):
from u in User
from u in User, where: u.active == true
from u in User, join: p in assoc(u, :posts), preload: [posts: p]
from u in User, select: %{id: u.id, name: u.name}
from u in User, group_by: u.role, select: {u.role, count(u.id)}
Changeset essentials:
cast(struct, attrs, allowed_fields)
validate_required([:field])
validate_length(:field, min: 1, max: 100)
validate_format(:email, ~r/.../)
validate_inclusion(:role, [:admin, :member])
validate_change(:field, fn _, val -> [...] end)
unique_constraint(:email)
foreign_key_constraint(:org_id)
cast_assoc(:profile, with: &Profile.changeset/2)
put_change(cs, :field, value)
put_assoc(cs, :tags, [%Tag{...}])
When to Load Deeper References
- Writing window functions, CTEs, lateral joins, dynamic queries, or complex aggregations? → Read
references/queries-advanced.md - Planning a zero-downtime migration, large data backfill, or cross-table refactor? → Read
references/migrations.md