builder-smoke-test

name: builder-smoke-test
description: Smoke test the Agent Builder feature branch end-to-end against a hermetic project scaffolded by the skill (linked to the current worktree). Covers workspace reconciliation, stored agents/skills CRUD, ownership, visibility, stars, registry/library Copy flow, picker allowlists, model policy, RBAC role gating, role impersonation UI, builder defaults, infrastructure diagnostics, channels, and Studio + Agent Builder UI. Trigger when validating the agent-builder feature branch, PRs that touch packages/server, packages/playground, packages/playground-ui agent-builder routes, or builder EE code paths.

Builder Smoke Test

End-to-end smoke testing of the Agent Builder feature set against a hermetic project the skill scaffolds at ~/mastra-builder-smoke-tests/builder-smoke (configurable). The project links to the current worktree via pnpm link: overrides, so changes to packages under packages/, stores/, auth/, channels/, observability/, browser/, and client-sdks/ take effect on the next mastra dev restart.

This skill is for branch QA — it complements the release-time mastra-smoke-test. It exercises the Builder EE surface (stored entities, RBAC, registry, infra, channels) using a minimal, predictable project rather than the kitchen-sink examples/agent.

⚠️ Mandatory Test Checklist

Use task_write to track progress. Run ALL sections unless --test or --scope narrows the run.

Do not skip sections unless you hit an actual blocker. "Seemed complex" or "I'll come back to it" are not valid reasons. Attempt every step — only stop when you literally cannot proceed. Report what you tried and what blocked you.

#	Section	Reference	When required
1	Setup	`references/setup.md`	Always
2	Workspace	`references/workspace.md`	`--test workspace` or full
3	Reconciliation	`references/reconciliation.md`	`--test reconciliation` or full (steps 1 + 5 only; steps 2/3/4/6 are out of smoke-test scope, see below)
4	Defaults	`references/defaults.md`	`--test defaults` or full
5	Model Policy	`references/model-policy.md`	`--test model-policy` or full
6	Skills	`references/skills.md`	`--test skills` or full
7	Registry	`references/registry.md`	`--test registry` or full
8	Agents	`references/agents.md`	`--test agents` or full
9	Picker Allowlists	`references/picker-allowlist.md`	`--test pickers` or full
10	Favorites	`references/favorites.md`	`--test favorites` or full (formerly `stars`)
11	Permissions / RBAC	`references/permissions.md`	`--test permissions` or full
12	Infrastructure	`references/infrastructure.md`	`--test infrastructure` or full
13	Channels	`references/channels.md`	`--test channels` or full
14	UI	`references/ui.md`	`--test ui` or full
15	Auth	`references/auth.md`	`--test auth` or `--auth on`

Execution flow

Confirm the project directory. Before scaffolding, ask the user where they want $PROJECT_DIR to live. Offer the default (~/mastra-builder-smoke-tests/builder-smoke) as a suggestion. Skip the question if they already passed --dir or have $BUILDER_SMOKE_TEST_DIR exported. See references/setup.md step 0.
Read the reference file for each section you're about to run.
Under --auth on, extract the session cookie before running any other section. The WorkOS cookie is httpOnly, so curl cannot mint it and document.cookie cannot read it. The scaffold ships a debug route at GET /smoke-test/cookie gated by SMOKE_TEST_COOKIE_LEAK=1. Follow the "Extracting the session cookie for curl (auth on)" section below before touching any auth-on endpoint. Do not pivot to UI-only testing because curl is "blocked" — the cookie route is the unblock path.
Seed non-owner data after the server has booted at least once. A fresh scaffold has no skills authored by anyone other than the test user, which makes non-owner / Library Copy / non-owner visibility / non-admin favorites flows untestable. Run bash .claude/skills/builder-smoke-test/scripts/seed-multi-user.sh (or with --dir $PROJECT_DIR) before sections 6 (Skills), 7 (Registry), and 10 (Favorites). The script is idempotent and bypasses RBAC by writing directly to libsql, so it works regardless of --auth mode or current role. Do not mark non-owner steps as "blocked" without running this first.
Execute the steps: use curl for API checks (with -H "Cookie: $COOKIE" under --auth on) and whichever browser tool the harness has wired up (Stagehand, Chrome MCP, etc.) for UI checks.
Record results in the summary table.
Mark the section complete with task_write before moving to the next.

Partial testing (`--test`)

If --test is provided:

Always run Setup.
Run only the specified section(s).
Skip everything else.

Example: --test skills,registry,agents → Setup + Skills + Registry + Agents.

Scope shortcuts (`--scope`)

--scope runs a curated group of related sections. Setup is always implied.

Scope	Includes
`rbac`	permissions, auth
`skills`	skills, registry, defaults
`agents`	agents, pickers, defaults, model-policy
`infra`	infrastructure, channels, reconciliation
`ui`	ui
`quick`	workspace, skills, agents, favorites, ui (skips long-running sections)

--scope and --test can be combined; the union is run.
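
The union rule can be sketched as a small shell helper. This is an illustration only, not the skill's actual argument parser: the function names `expand_scope` and `select_sections` are hypothetical, and the scope contents are copied from the table above.

```shell
# Hypothetical helper: expand a --scope name into the sections listed in the scope table.
expand_scope() {
  case "$1" in
    rbac)   echo "permissions auth" ;;
    skills) echo "skills registry defaults" ;;
    agents) echo "agents pickers defaults model-policy" ;;
    infra)  echo "infrastructure channels reconciliation" ;;
    ui)     echo "ui" ;;
    quick)  echo "workspace skills agents favorites ui" ;;
    *)      echo "unknown scope: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: union of --scope ($1) and a comma-separated --test list ($2).
# Setup is always included; duplicates are dropped, first occurrence wins.
select_sections() {
  local scope="$1" tests="$2" selected="setup" name from_scope=""
  if [ -n "$scope" ]; then
    from_scope="$(expand_scope "$scope")" || return 1
  fi
  for name in $from_scope $(echo "$tests" | tr ',' ' '); do
    case " $selected " in
      *" $name "*) ;;                       # already selected
      *) selected="$selected $name" ;;
    esac
  done
  echo "$selected"
}
```

For example, `select_sections skills agents` would select setup, skills, registry, defaults, and agents. The output is in first-seen order, so it still has to be sorted into the canonical section order before running.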

Usage

# Full smoke (interactive)
/builder-smoke-test

# Specific sections
/builder-smoke-test --test workspace,skills
/builder-smoke-test --test agents,favorites
/builder-smoke-test --test reconciliation
/builder-smoke-test --test ui

# Scope shortcuts
/builder-smoke-test --scope rbac
/builder-smoke-test --scope skills
/builder-smoke-test --scope quick

# Force auth on / off (otherwise auto-detected from WORKOS_* env vars)
/builder-smoke-test --auth on
/builder-smoke-test --auth off

# Run auth-on as a non-admin role (must match the logged-in user's actual role)
/builder-smoke-test --auth on --role viewer
/builder-smoke-test --auth on --role member

# Skip the browser pass (API-only run)
/builder-smoke-test --skip-browser

Parameters

Parameter	Description	Default
`--test`	Comma-separated section names (see table above).	(all sections)
`--scope`	Named group of sections (`rbac`, `skills`, `agents`, `infra`, `ui`, `quick`). Combinable with `--test`.	(none)
`--auth`	`on`, `off`, or `auto`. `auto` enables the Auth section iff `WORKOS_CLIENT_ID` + `WORKOS_API_KEY` are set.	`auto`
`--role`	Expected role of the logged-in user under `--auth on`: `owner`, `admin`, `member`, or `viewer`. Setup asserts the live `/api/auth/me` roles match; on mismatch the run stops and the user is told to either change their WorkOS role or re-run with the correct `--role`. Ignored under `--auth off`.	`admin`
`--clean`	Delete test entities (smoke-test workspaces / agents / skills) at the end of each section.	`false`
`--skip-browser`	Run only API/`curl` checks. UI section is skipped.	`false`
`--dir`	Project directory the skill scaffolds into. Forwarded to `scripts/scaffold.sh`. Also reads `$BUILDER_SMOKE_TEST_DIR` from the environment when the flag is omitted.	`~/mastra-builder-smoke-tests/builder-smoke`
`--reuse`	If the project already exists at `$PROJECT_DIR` and has `node_modules/@mastra/core`, skip `pnpm install`. Forwarded to `scripts/scaffold.sh`.	`false`
`--openai-key`	OPENAI_API_KEY value to write into the scaffolded `.env`. If omitted, the scaffold script falls back to `$OPENAI_API_KEY` in the shell, then to an interactive prompt.	(shell or prompt)
`--workos-api-key` `--workos-client-id` `--workos-organization-id`	All three are required together to scaffold an auth-on project. Writes `AUTH_PROVIDER=workos` plus the three keys plus `WORKOS_REDIRECT_URI=http://localhost:4111/api/auth/callback` into `.env`.	(auth off)

If --auth is auto and no WorkOS env vars are present, the Auth section is auto-skipped and reported as ⏭️ Skipped (no WORKOS_* env vars).
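
A minimal sketch of the `auto` rule, assuming it depends only on the two variables named in the parameter table. The function name `resolve_auth_flag` is hypothetical, and an output of `off` here stands for "skip the Auth section".

```shell
# Hypothetical helper: resolve the --auth value ($1: on, off, or auto) to on/off.
resolve_auth_flag() {
  case "${1:-auto}" in
    on|off) echo "$1" ;;
    auto)
      # auto turns auth on only when both WorkOS variables are non-empty.
      if [ -n "${WORKOS_CLIENT_ID:-}" ] && [ -n "${WORKOS_API_KEY:-}" ]; then
        echo on
      else
        echo off
      fi ;;
    *) echo "bad --auth value: $1" >&2; return 2 ;;
  esac
}
```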

Canonical order

When running multiple sections, execute them in the order shown in the section table (1 → 15). The order is intentional:

Setup must run first — preflight + readiness probe gate every later section.
Workspace / Reconciliation / Defaults / Model Policy establish that the server's view of the project matches what the rest of the run assumes. Run them before any CRUD pass.
Skills → Registry → Agents → Pickers → Favorites is a build-up: agents reference skills, pickers depend on the entities created above.
Permissions / Infrastructure / Channels / UI are read-mostly inspections that benefit from existing entities.
Auth runs last because it requires restarting mastra dev with a different .env.

If --test or --scope narrows the run, keep the relative order — just skip the sections that fall outside the selection.
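
One way to keep the relative order is to filter the canonical list by the selection rather than sorting the selection itself. The sketch below assumes the section names are the `--test` values from the section table; `CANONICAL_ORDER` and `order_sections` are hypothetical names.

```shell
# Section names in the order of the section table (1 to 15).
CANONICAL_ORDER="setup workspace reconciliation defaults model-policy skills registry agents pickers favorites permissions infrastructure channels ui auth"

# Hypothetical helper: print the selected sections ($@, any order) in canonical order.
order_sections() {
  local selected=" $* " name ordered=""
  for name in $CANONICAL_ORDER; do
    case "$selected" in
      *" $name "*) ordered="${ordered:+$ordered }$name" ;;
    esac
  done
  echo "$ordered"
}
```

Names that are not in the canonical list are silently dropped by this sketch, so validate the selection before ordering it.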

Required vs optional reference tiers

References fall into three tiers; an agent should treat them accordingly:

Required (every run): setup.md. Any failure here blocks the rest of the run.
Standard (default tiers for full, quick, scope shortcuts): workspace.md, skills.md, agents.md, favorites.md, ui.md (core), auth.md when --auth on.
Extended (only when explicitly selected via --test/--scope or the matching code surface changed): reconciliation.md, defaults.md, model-policy.md, registry.md, picker-allowlist.md, permissions.md, infrastructure.md, channels.md, ui.md extended tier.

When skipping an extended section, mark it ⏭️ Skipped (not in scope) in the result table — don't silently omit it.

Cleanup

The scaffold is a self-contained throwaway directory at $PROJECT_DIR. All fixture state (workspaces, agents, skills, libsql DB, .mastra/workspace files) lives inside it. The smoke test never writes to anything outside $PROJECT_DIR (other than the dev server it runs).

At the end of every run:

Stop the dev server (kill $(lsof -i :4111 -sTCP:LISTEN -t) or foreground Ctrl-C).
Choose how to dispose of fixture state:
- Reuse: leave $PROJECT_DIR in place. The next run can pass --reuse (or --skip-scaffold to preflight) and pick up where this one left off. Fastest for iterating.
- Reset: rm -rf "$PROJECT_DIR" (or re-run scripts/scaffold.sh without --reuse). Cheapest way to get back to a known-clean state. Don't bother per-entity DELETE — the directory IS the state.
If a section bailed mid-flight (assertion failure, network error), record the partial state in the report's Issues section so the next run knows what to expect.

Per-entity DELETE calls are only needed when a specific section explicitly tests DELETE behavior (those sections include the DELETE step inline). Otherwise the throwaway-directory model handles cleanup.

Never leave the dev server running on :4111 after the report is filed — it blocks future runs.

Prerequisites

Working tree on the agent-builder feature branch (or any branch you want to QA).
pnpm (10.x) and node on $PATH. The scaffold uses pnpm install --ignore-workspace inside the project dir so the repo-level workspace doesn't interfere.
An OPENAI_API_KEY. Supply via --openai-key, export OPENAI_API_KEY in the shell, or let the scaffold prompt for it.
(Optional) WorkOS credentials for --auth on runs: --workos-api-key, --workos-client-id, --workos-organization-id.
Whichever browser MCP/tool the harness has access to. If none is available, run with --skip-browser and report UI as ⏭️ Skipped (no browser tool).

Project layout (scaffolded for you)

$PROJECT_DIR/                                    ← see "Project dir resolution" below
├── package.json                                 ← pnpm overrides → link:<worktree>/packages/*
├── tsconfig.json
├── .env                                         ← OPENAI_API_KEY (+ AUTH_PROVIDER + WORKOS_* on auth-on)
└── src/mastra/
    ├── index.ts                                 ← single Mastra instance, reads exported bindings from auth.ts
    ├── auth.ts                                  ← top-level switch(process.env.AUTH_PROVIDER); no-op when unset
    ├── agents/index.ts                          ← weather-agent (gpt-4o-mini)
    ├── tools/index.ts                           ← weather-info tool
    └── workflows/index.ts                       ← greet-workflow

The .env is the only thing that flips auth on/off — the same src/mastra/index.ts runs in both modes. Re-run scripts/scaffold.sh with or without --workos-* to switch.

Project dir resolution

$PROJECT_DIR is determined by every script (scaffold, preflight, wait-for-server) using this order:

--dir <path> flag
BUILDER_SMOKE_TEST_DIR env var (e.g. export BUILDER_SMOKE_TEST_DIR=~/code/builder-smoke)
~/mastra-builder-smoke-tests/builder-smoke (default)

For a long-lived setup, exporting BUILDER_SMOKE_TEST_DIR once in your shell rc is the lowest-friction option — every script picks it up automatically.
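
The resolution order above can be sketched as follows. This is an illustration of the documented precedence, not a copy of what the scripts do internally; `resolve_project_dir` is a hypothetical name.

```shell
# Hypothetical helper: $1 is the --dir value, or empty when the flag was not passed.
resolve_project_dir() {
  if [ -n "${1:-}" ]; then
    echo "$1"                                              # 1. --dir flag
  elif [ -n "${BUILDER_SMOKE_TEST_DIR:-}" ]; then
    echo "$BUILDER_SMOKE_TEST_DIR"                         # 2. env var
  else
    echo "$HOME/mastra-builder-smoke-tests/builder-smoke"  # 3. default
  fi
}
```

Usage: `PROJECT_DIR="$(resolve_project_dir "$DIR_FLAG")"`, where `DIR_FLAG` is whatever variable holds the parsed `--dir` value.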

Running scripts (cwd matters)

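All scripts under .claude/skills/builder-smoke-test/scripts/ resolve the worktree root from their own location, so the calling directory does not change their behavior. The invocation paths in this document are relative to the repo root, so run them from there or substitute an absolute path.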

Script	Run from	Notes
`scaffold.sh`	anywhere	Creates / refreshes `$PROJECT_DIR`. Forwards `--openai-key`, `--workos-*`, `--reuse`, `--dir`.
`preflight.sh`	anywhere	Calls `scaffold.sh` then asserts the resulting `.env` matches `--expect off|on`.
`wait-for-server.sh`	anywhere	Hits `http://localhost:4111/api/agents`. cwd doesn't matter.
`seed-multi-user.sh`	anywhere	Inserts two skills owned by `user_seed_other` (1 public + 1 private) into the scaffold's libsql DB so non-owner / Library Copy flows can be tested without a second WorkOS account. Server must have booted at least once first. Idempotent.

Invoke them as bash .claude/skills/builder-smoke-test/scripts/<name>.sh. Don't cd into scripts/ first — relative path resolution will break.

pnpm mastra:dev must be run from $PROJECT_DIR (where the scaffolded package.json is).

How `mastra dev` reads env (important)

mastra dev loads $PROJECT_DIR/.env via dotenv and unconditionally overwrites process.env with whatever's there (packages/cli/src/commands/dev/dev.ts ~line 384). Practical consequences:

.env is the source of truth for the running server. Inline overrides like AUTH_PROVIDER= pnpm mastra:dev are silently clobbered.
Shell-only vars survive only if .env has no entry for the same key. Re-running scripts/scaffold.sh always overwrites .env, so to toggle modes, re-scaffold.
The auth mode the server actually runs in is determined by .env alone. A globally exported AUTH_PROVIDER=workos in your shell does NOT enable WorkOS auth in the server if .env doesn't have it — but it WILL leak into anything else this process runs, which is its own kind of confusing. Preflight flags this case.
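
To predict which value the server will see for a given key, the precedence described above can be approximated in shell. This is a rough sketch: `effective_env_value` is a hypothetical helper, and it only handles plain `KEY=value` lines, not quoting or multi-line values.

```shell
# Hypothetical helper: $1 is a variable name, $2 is the path to the project's .env.
# Prints the value the server is expected to see under the "overwrite from .env" behavior.
effective_env_value() {
  local key="$1" env_file="$2" line
  line="$({ grep -E "^${key}=" "$env_file" 2>/dev/null || true; } | tail -n 1)"
  if [ -n "$line" ]; then
    echo "${line#*=}"            # .env has an entry: it wins over the shell
  else
    printenv "$key" || true      # no .env entry: the exported shell value survives
  fi
}
```

For example, `effective_env_value AUTH_PROVIDER "$PROJECT_DIR/.env"` shows whether a globally exported `AUTH_PROVIDER` would actually reach the server.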

Auth modes

Two states matter:

auth off — AUTH_PROVIDER is absent (or blank) in $PROJECT_DIR/.env. No WorkOS, no RBAC, no FGA. This is the state for the auth-off run.
auth on — AUTH_PROVIDER=workos plus WORKOS_API_KEY, WORKOS_CLIENT_ID, WORKOS_ORGANIZATION_ID all present in $PROJECT_DIR/.env. WorkOS authentication + role-based access + per-resource FGA all engage. This is the state for the auth-on runs. FGA is wired through the WorkOS auth provider — it can't be disabled independently.

To switch modes, re-run the scaffold with or without the --workos-* flags; that's faster and safer than hand-editing .env.
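
The two states can be checked directly from the `.env` file. The sketch below is an assumption about how such a check could look, not the logic of `preflight.sh`; `env_value` and `detect_auth_mode` are hypothetical names, and any `AUTH_PROVIDER` other than blank or `workos` is treated as an error because the document does not define it.

```shell
# Hypothetical helper: print the value of the last KEY=... line in a .env file, or nothing.
env_value() {
  { grep -E "^$1=" "$2" 2>/dev/null || true; } | tail -n 1 | cut -d= -f2-
}

# Hypothetical helper: print "on" or "off" for the .env at $1; fail on an incomplete auth-on setup.
detect_auth_mode() {
  local env_file="$1" provider key
  provider="$(env_value AUTH_PROVIDER "$env_file")"
  case "$provider" in
    "") echo off; return 0 ;;              # absent or blank means auth off
    workos) ;;                             # keep going: all three keys are required
    *) echo "unexpected AUTH_PROVIDER: $provider" >&2; return 1 ;;
  esac
  for key in WORKOS_API_KEY WORKOS_CLIENT_ID WORKOS_ORGANIZATION_ID; do
    if [ -z "$(env_value "$key" "$env_file")" ]; then
      echo "auth on expected but $key is absent or blank" >&2
      return 1
    fi
  done
  echo on
}
```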

Detection: run preflight before each section

# Scaffold (or refresh) the project and assert the auth-off baseline:
bash .claude/skills/builder-smoke-test/scripts/preflight.sh --expect off \
  --openai-key "$OPENAI_API_KEY"

# Scaffold an auth-on project (re-runs scaffold with WorkOS keys, asserts auth on):
bash .claude/skills/builder-smoke-test/scripts/preflight.sh --expect on \
  --openai-key "$OPENAI_API_KEY" \
  --workos-api-key "$WORKOS_API_KEY" \
  --workos-client-id "$WORKOS_CLIENT_ID" \
  --workos-organization-id "$WORKOS_ORGANIZATION_ID"

Preflight chains scaffold.sh followed by validation checks (project exists with node_modules/@mastra/core, $PROJECT_DIR/.env has OPENAI_API_KEY, optional WorkOS keys present when --expect on, and auth mode matches --expect). Each failure prints a stable error code; the error-code table further down (after "Resolving missing env vars") tells the agent what to do for each one.

Resolving missing env vars

If scaffold.sh or preflight.sh reports a missing OPENAI_API_KEY or WORKOS_* var, the agent must not silently source any rc file. Instead, work down this list and stop at the first one that resolves:

Check whether the var is already in the process env you can see (echo "${OPENAI_API_KEY:-<unset>}"). If yes, re-run scaffold with --openai-key "$OPENAI_API_KEY" (and equivalent for WorkOS).
Check whether the var is in $PROJECT_DIR/.env from a prior run (grep -E "^(OPENAI_API_KEY|WORKOS_)" "$PROJECT_DIR/.env" 2>/dev/null). If yes, you can pass --reuse to the next scaffold call.
If neither, look for rc files that exist on disk. Common candidates: ~/.zshrc, ~/.bashrc, ~/.zshenv, ~/.profile, ~/.env.global, and any project-local .env you find. Use ls -1 (or test -f) to confirm before listing — don't fabricate paths.
Ask the user in one message: "Can you paste the value(s), or give me permission to source one of these files?" Include the list of files that actually exist.
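
The "confirm before listing" step can be sketched as a loop over the common candidates named above. `list_rc_candidates` is a hypothetical helper; it only reports files that exist and never sources them.

```shell
# Hypothetical helper: print the rc-file candidates that actually exist on disk.
list_rc_candidates() {
  local f
  for f in "$HOME/.zshrc" "$HOME/.bashrc" "$HOME/.zshenv" "$HOME/.profile" "$HOME/.env.global"; do
    if [ -f "$f" ]; then
      echo "$f"
    fi
  done
  return 0
}
```

The output is the list to include in the message to the user. Project-local .env files are not covered by this sketch and would need a separate check.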

Only after the user explicitly approves a specific file, source it in a subshell and rerun preflight with the inherited env. Pattern:

# auth off
zsh -c 'source <approved-file> && bash .claude/skills/builder-smoke-test/scripts/preflight.sh --expect off --reuse'

# auth on (preflight auto-picks WORKOS_API_KEY / WORKOS_CLIENT_ID / WORKOS_ORGANIZATION_ID from the sourced env)
zsh -c 'source <approved-file> && bash .claude/skills/builder-smoke-test/scripts/preflight.sh --expect on --reuse'

Use bash -c instead of zsh -c if the approved file is a bashrc.

Never write the secret value back into any rc file, never export it into the user's interactive shell, and never echo it back in chat in full. Refer to it as <your-openai-key> once you've used it.

Error code	What it means	What the agent should do
`project-dir-missing`	`$PROJECT_DIR` is unset or the directory does not exist (scaffold did not run, or was given a bad `--dir`).	Re-run preflight without `--skip-scaffold`, or pass an existing `--dir <path>` that scaffold has already populated.
`scaffold-failed`	`scripts/scaffold.sh` returned non-zero.	Re-run scaffold without `--reuse` to force a fresh install. Inspect the printed `pnpm install` output for the real error.
`project-deps-missing`	`$PROJECT_DIR/node_modules/@mastra/core` missing after scaffold.	Re-run scaffold without `--reuse` to force a fresh install. If that still fails, delete `$PROJECT_DIR` and re-run.
`openai-key-missing-in-project-env`	`$PROJECT_DIR/.env` has no usable `OPENAI_API_KEY`.	Follow the "Resolving missing env vars" section above. Re-run preflight with `--openai-key <value>` once you have it.
`workos-keys-missing-in-project-env`	`--expect on` but one or more of `WORKOS_API_KEY` / `WORKOS_CLIENT_ID` / `WORKOS_ORGANIZATION_ID` is absent or blank in `.env`.	Follow the "Resolving missing env vars" section above. Re-run preflight with all three `--workos-*` flags.
`mode-mismatch`	`--expect` disagrees with the auth mode detected from `$PROJECT_DIR/.env`.	Re-run the scaffold with (auth on) or without (auth off) `--workos-*` flags. The scaffold is idempotent for the parts that don't change.
`bad-expect-value`	`--expect` got something other than `off` or `on`.	Fix the invocation. (Parser also rejects flag-like values at parse time with exit 2.)

.env policy: the scaffold owns $PROJECT_DIR/.env. Re-running scaffold overwrites it. Do not hand-edit the scaffolded .env; instead, re-run scaffold with different flags. (The skill never edits .env files outside $PROJECT_DIR.)

Extracting the session cookie for curl (auth on)

The WorkOS session cookie is httpOnly, so document.cookie and Stagehand's extract cannot read it from a normal page. To hit authenticated endpoints from curl after a browser SSO login, the scaffold exposes a tiny debug route gated by an env var:

1. Add SMOKE_TEST_COOKIE_LEAK=1 to $PROJECT_DIR/.env (single line append; the scaffold leaves this var alone on re-run as long as the file already exists).
2. Restart mastra dev so the new env is picked up.
3. Sign in once in the Stagehand browser (stagehand_navigate to http://localhost:4111, complete WorkOS SSO).
4. From the same browser tab, navigate to http://localhost:4111/smoke-test/cookie and use stagehand_extract to read the page body. The page is a single text/plain line containing the request's Cookie header verbatim (e.g. wos_session=…).
5. Export it once: export COOKIE='<the-string-from-step-4>'. From here on, every authenticated curl is curl -H "Cookie: $COOKIE" "$BASE/…".

The route is only registered when SMOKE_TEST_COOKIE_LEAK=1 and is intentionally insecure — never enable it in a real project. The WORKOS_COOKIE_PASSWORD written by the scaffold is derived from $PROJECT_DIR, so the cookie value stays valid across mastra dev restarts within the same scaffold; you only need to repeat step 4 if you re-scaffold to a new directory.

/smoke-test/cookie returns 404? Always an env-ordering issue. The apiRoutes list is built once when mastra dev boots from process.env.SMOKE_TEST_COOKIE_LEAK. The flag has to be in .env before the boot — adding it after start has no effect until you restart. If you see a 404, run grep SMOKE_TEST_COOKIE_LEAK "$PROJECT_DIR/.env", then stop and restart mastra dev. Don't pivot to "UI only" because of this.
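
A small guard can rule out the missing-flag case before restarting. This is a sketch: `cookie_route_enabled` is a hypothetical helper, and it only inspects the file, so it cannot tell whether the running server booted before or after the flag was added.

```shell
# Hypothetical helper: succeed only if the .env at $1 contains the flag set to exactly 1.
cookie_route_enabled() {
  grep -qE '^SMOKE_TEST_COOKIE_LEAK=1[[:space:]]*$' "$1" 2>/dev/null
}

# Example use (not executed here):
#   cookie_route_enabled "$PROJECT_DIR/.env" || echo "flag missing: append it, then restart mastra dev"
#   curl -H "Cookie: $COOKIE" "$BASE/auth/me"
```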

Seeding non-owner skills (Library Copy / non-owner flows)

A fresh scaffold has zero skills, and everything created through the API is owned by either the auth-off "no caller" (no authorId) or the currently signed-in user under auth-on. To exercise flows that require a skill owned by someone else (Library Copy, non-owner read-only view, private-skill visibility from a non-owner) without provisioning a second WorkOS account, run the seed script after the server has booted at least once:

# Start the server once so libsql initializes the skills tables.
cd "$PROJECT_DIR"
pnpm mastra:dev                # leave running, then in another shell:

bash .claude/skills/builder-smoke-test/scripts/seed-multi-user.sh
# → seeds smoke-seed-public-skill  (visibility=public,  status=published)
#         smoke-seed-private-skill (visibility=private, status=published)
#   both owned by authorId='user_seed_other'

The script writes directly to $PROJECT_DIR/src/mastra/public/mastra.db via the sqlite3 CLI (no Node deps). It's idempotent — re-running replaces the seeded rows. Use the seeded skills wherever a reference file asks for "a skill owned by another user"; clean them up with DELETE curls against /api/stored/skills/:id or by re-scaffolding.

Starting the dev server

If the server is not running on :4111, the Setup section starts it. The convenience helpers live under scripts/:

# Scaffold + preflight (writes .env, installs deps, detects auth mode)
bash .claude/skills/builder-smoke-test/scripts/preflight.sh --expect off

# Start the server from the scaffolded project
cd "$PROJECT_DIR"              # default: ~/mastra-builder-smoke-tests/builder-smoke
pnpm mastra:dev

# Poll /api/agents until 200 (60s budget). Detects mastra dev's port-bump.
bash .claude/skills/builder-smoke-test/scripts/wait-for-server.sh

wait-for-server.sh probes /api/agents — not / — because the SPA shell can return 200 before the API mounts. If it reports the server is up on :4112+ instead of :4111, mastra dev fell through to the next port; stop, free :4111, and restart. Continuing on a non-default port silently breaks every curl in every reference.
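
For orientation, a readiness probe of this kind can be sketched as a polling loop. This is not the contents of wait-for-server.sh: `wait_for_api` is a hypothetical function, it does not detect the port bump, and the budget counts one-second sleeps rather than exact wall-clock time.

```shell
# Hypothetical helper: poll $1 until it answers 200, for roughly $2 seconds (default 60).
wait_for_api() {
  local url="$1" budget="${2:-60}" elapsed=0 status
  while [ "$elapsed" -lt "$budget" ]; do
    # -w '%{http_code}' prints only the status code; a failed connection never yields 200.
    status="$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$url" 2>/dev/null || true)"
    if [ "$status" = "200" ]; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}
```

Usage: `wait_for_api http://localhost:4111/api/agents 60 || echo "server did not become ready"`.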

API base URL

Every reference assumes $BASE is exported. Set it once at the start of the run:

export BASE=http://localhost:4111/api

All curl examples in the references use $BASE and won't work in a shell that hasn't exported it.
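
To fail fast instead of sending requests to a malformed URL, a thin wrapper can refuse to build a URL when `$BASE` is missing. `api_url` is a hypothetical helper and is not used by the reference files.

```shell
# Hypothetical helper: join $BASE with a path such as /stored/skills, or fail if BASE is unset.
api_url() {
  if [ -z "${BASE:-}" ]; then
    echo "BASE is not set; export BASE=http://localhost:4111/api first" >&2
    return 1
  fi
  echo "${BASE%/}$1"   # strip one trailing slash from BASE to avoid a double slash
}
```

Usage: `curl -s "$(api_url /stored/skills)"`.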

Quick reference: key endpoints

This table lists the surfaces an agent will hit and where to look for the authoritative request/response shape. Don't copy curl blocks from here — run the per-section commands in references/<section>.md.

Surface	Endpoint
Builder settings	`GET /editor/builder/settings`
Builder infra	`GET /editor/builder/infrastructure`
Registries (list)	`GET /editor/builder/registries`
Registry search	`GET /editor/builder/registries/:registryId/search?q=…`
Registry popular	`GET /editor/builder/registries/:registryId/popular`
Registry preview	`GET /editor/builder/registries/:registryId/preview?owner=…&repo=…&path=…`
Registry install	`POST /editor/builder/registries/:registryId/install`
Workspace CRUD	`GET/POST/PATCH/DELETE /stored/workspaces[/:id]`
Agent CRUD	`GET/POST/PATCH/DELETE /stored/agents[/:id]`
Agent favorite	`PUT / DELETE /stored/agents/:id/favorite`
Agent avatar	`PATCH /stored/agents/:id` with `metadata.avatarUrl` (owner-only)
Skill CRUD	`GET/POST/PATCH/DELETE /stored/skills[/:id]`
Skill publish	`POST /stored/skills/:id/publish`
Skill favorite	`PUT / DELETE /stored/skills/:id/favorite`
Auth me	`GET /auth/me` (returns logged-in user + roles + permissions)
Auth refresh	`POST /auth/refresh`

Builder Studio routes

Feature	Route
Agent Builder shell	`/agent-builder`
Agents (default view)	`/agent-builder`
Agent detail (view)	`/agent-builder/agents/:id/view` (bare `:id` redirects to `/view`)
Agent detail (edit)	`/agent-builder/agents/:id/edit`
Skills	`/agent-builder/skills`
Library (public skills)	`/agent-builder/library`
Skill detail	`/agent-builder/skills/:id/edit` (owner) or `/agent-builder/skills/:id/view` (non-owner)
Workspaces	`/agent-builder/workspaces`
Infrastructure	`/agent-builder/infrastructure` (readable by every default role — see `infrastructure.md`)

Mobile renders a bottom-bar with the same primary entries.

Browser smoke

Use whichever browser tool the harness has wired up (Stagehand, Chrome MCP, etc.). Don't assume a specific provider — discover what's available, then drive the same checklist in references/ui.md.

The scaffolded project registers StagehandBrowser (matching examples/agent-builder). If BROWSERBASE_* keys aren't set in the shell, Stagehand falls back to local Playwright; that's fine for smoke. If neither Stagehand nor a local browser is reachable, mark UI as ⏭️ Skipped (no browser provider).

Result reporting

After testing, provide:

## Builder Smoke Test Results

**Date**: <date>
**Branch**: <branch>
**Commit**: <short sha>
**Server**: scaffolded project @ localhost:4111 (`$PROJECT_DIR`)
**Auth**: on / off / auto-skipped

| #   | Section            | Status   | Notes                           |
| --- | ------------------ | -------- | ------------------------------- |
| 1   | Setup              | ✅/❌    |                                 |
| 2   | Workspace          | ✅/❌    |                                 |
| 3   | Reconciliation     | ✅/❌/⏭️ |                                 |
| 4   | Defaults           | ✅/❌    |                                 |
| 5   | Model Policy       | ✅/❌    |                                 |
| 6   | Skills             | ✅/❌    |                                 |
| 7   | Registry           | ✅/❌    |                                 |
| 8   | Agents             | ✅/❌    |                                 |
| 9   | Pickers            | ✅/❌    |                                 |
| 10  | Favorites          | ✅/❌    |                                 |
| 11  | Permissions / RBAC | ✅/❌    |                                 |
| 12  | Infrastructure     | ✅/❌    |                                 |
| 13  | Channels           | ✅/❌    |                                 |
| 14  | UI                 | ✅/❌/⏭️ |                                 |
| 15  | Auth               | ✅/❌/⏭️ | (skipped if no WORKOS\_\* vars) |

**Product issues**: (list any — server/UI behaved unexpectedly. For each: HTTP method + path or UI route, expected vs actual, one-sentence guess at the cause. Do not pre-decide "known bug" — log what the server actually did. Say "none" if empty.)
**Skill issues**: (list any — the skill itself was wrong, unclear, stale, or unreachable. For each: which file + step (e.g. `references/skills.md` step F2), and what was wrong. Doc drift, not product bugs. Say "none" if empty.)

**Verify before filing.** Before adding anything to either list, re-confirm against the live response in this run, not memory of an earlier call:

- For any **shape mismatch / missing field / wrong key name** claim, paste the actual JSON fragment (or the relevant keys) directly under the bullet so the claim is reproducible. If the skill says `features.agent.skills` and the response has `features.agent.skills`, that is not a skill issue — names that look similar in passing (`featSkills`, `agent.features.skill`, etc.) are easy to misread.
- For any **endpoint inconsistency** claim (e.g. "endpoint A returns X but B returns Y"), re-curl both endpoints fresh in the same run rather than reusing a stale response from earlier in the section.
- For any **RBAC / authz** claim (403 where you expected 200, or vice versa), check `references/permissions.md` for the matrix _and_ check the "Design decisions" list in this file. Several roles intentionally share `*:read`, which means infra/list/get endpoints look "ungated" but are working as intended. Also confirm the cookie you sent belongs to the role you think it does (`curl -H "Cookie: $COOKIE" "$BASE/auth/me" | jq '.role // .roles'`).
- For any **missing endpoint** claim (e.g. "agent avatar 404"), confirm the contract first — several flows are client-composed on top of generic CRUD (avatar = `PATCH metadata.avatarUrl`; Library Copy = `POST /stored/skills` with `metadata.origin`). The "Design decisions (don't file as bugs)" section enumerates the common ones.
- If a claim can't be reproduced on a fresh request, drop it.

**Regressions**: (list any behavioral changes from a previous run)
**Warnings**: (e.g., dev-server crash on `/auth/refresh` polling, OPENAI_API_KEY required at startup)
**Skipped sections**: (list with reason)

Known rough edges

The branch has accumulated minor papercuts. Note these in your report only if you hit them; don't fail the run on them:

Don't rm $PROJECT_DIR/mastra.db by hand while the server is up — stop the server first, then delete.
Dev server can crash on hot-reload from /auth/refresh polling. Restart and continue.
OPENAI_API_KEY is required at startup — server won't boot without it, even if you only test non-LLM surfaces.
mastra dev overwrites process.env from .env at boot, so inline env overrides on the command line don't reach the server. Re-run scaffold to change .env.
The scaffold links against the current worktree's packages via link: overrides. If you switch worktrees, re-run scaffold so the symlinks point at the right tree.

Design decisions (don't file as bugs)

These have come up across multiple runs and are intentional. If you observe one, note it in your report as "expected behavior" — do not open a product issue.

GET /auth/me without a cookie returns 200 with a null-ish body. The route is mounted as a public route (createPublicRoute); the contract is "return the current user or null", not "401 if missing". A 401 here would break the public app shell.
/editor/builder/infrastructure is readable by every default role (admin / member / viewer). The handler gates on infrastructure:read and every default role has *:read, which matches by resource-wildcard. The page only exposes deployment-shape data (provider names, registered flags, configured/unconfigured booleans) — no secrets.
Flipping a skill's visibility from private to public does not auto-publish unless the skill has a registered skillPath. Visibility and publication are independent fields by design. A plain-create skill flipped public stays at activeVersionId: null until a real POST /publish runs against a source path.
Zod schema validation runs before the permission middleware on /stored/* writes. A malformed body from a viewer returns a 400, not a 403. This is standard request lifecycle; the response surface doesn't leak resource state.
The role-impersonation picker only lists roles different from the current one. Logged in as admin, you'll see Member and Viewer and nothing else — there is no Admin self-item. This is intentional (admin is the baseline; you're already there).
Impersonation is UI-only. The API still answers per the real logged-in role. A curl while impersonating viewer will still return the admin's response.
Favorites sidebar entry links to /agent-builder/favorite (singular). The plural /favorites is not a registered route and renders the React Router 404. Use the sidebar link or the singular URL when scripting.
Avatar upload uses agent PATCH with metadata.avatarUrl, not a dedicated /avatar endpoint. See references/agents.md.
Copy is client-side. There is no POST /stored/skills/:id/copy. The UI fetches the source skill and POSTs a new row to /stored/skills with metadata.origin = "library-copy". See references/registry.md.

Out of smoke-test scope

Some flows are documented in references/ but are not driven by the smoke-test agent because they require server-lifecycle gymnastics that don't fit a single run:

Reconciliation steps 2/3/4/6 (references/reconciliation.md) require editing $PROJECT_DIR/src/mastra/index.ts (changing basePath / workspaceId / config), restarting mastra dev multiple times, and observing drift detection or orphan archival across restarts. The smoke-test agent runs only Step 1 (fresh-startup persistence) and Step 5 (non-builder workspaces untouched). Run the rest by hand when changing reconciliation code.
Real role-swap testing (logging in as multiple WorkOS users with different roles in the same run) is out of scope. The agent verifies whichever role the live --role user actually has, and additionally exercises the UI-only role impersonation flow under --role admin (see references/ui.md).

References

references/setup.md — server health, builder settings sanity, baseline counts, builder workspace existence
references/workspace.md — workspace CRUD via API
references/reconciliation.md — config-driven workspace lifecycle (fresh, idempotent, drift, archival, backfill)
references/defaults.md — builder defaults applied at agent create (memory, workspace, browser, model)
references/model-policy.md — allowed list, default model, dropdown filtering, rejection
references/skills.md — skill CRUD, visibility, publish, filesystem writes, files array
references/registry.md — skills.sh browse/install, library Copy flow, origin badges, gating
references/agents.md — stored agent CRUD, skill attachment, model swap, delete-from-edit, avatar upload
references/picker-allowlist.md — tools/agents/workflows pickers respect allowlists
references/favorites.md — favorite/unfavorite agents and skills, idempotency (formerly stars.md)
references/permissions.md — viewer/member/admin/owner gating, role expectation matrix, UI impersonation, auth-off bypass
references/infrastructure.md — /editor/builder/infrastructure payload + UI
references/channels.md — Slack provider visibility, connectChannel tool
references/ui.md — browser checklist across Builder routes
references/auth.md — WorkOS on/off, 401 behavior, authorId, mode-toggle via .env
scripts/scaffold.sh — scaffold or refresh the hermetic project at $PROJECT_DIR
scripts/preflight.sh — wraps scaffold.sh + mode expectation (--expect off|on)
scripts/wait-for-server.sh — poll :4111 until healthy