name: builder-smoke-test
description: Smoke test the Agent Builder feature branch end-to-end against a hermetic project scaffolded by the skill (linked to the current worktree). Covers workspace reconciliation, stored agents/skills CRUD, ownership, visibility, stars, registry/library Copy flow, picker allowlists, model policy, RBAC role gating, role impersonation UI, builder defaults, infrastructure diagnostics, channels, and Studio + Agent Builder UI. Trigger when validating the agent-builder feature branch, PRs that touch packages/server, packages/playground, packages/playground-ui agent-builder routes, or builder EE code paths.
Builder Smoke Test
End-to-end smoke testing of the Agent Builder feature set against a hermetic project the skill scaffolds at ~/mastra-builder-smoke-tests/builder-smoke (configurable). The project links to the current worktree via pnpm link: overrides, so changes to packages under packages/, stores/, auth/, channels/, observability/, browser/, and client-sdks/ take effect on the next mastra dev restart.
This skill is for branch QA — it complements the release-time mastra-smoke-test. It exercises the Builder EE surface (stored entities, RBAC, registry, infra, channels) using a minimal, predictable project rather than the kitchen-sink examples/agent.
⚠️ Mandatory Test Checklist
Use task_write to track progress. Run ALL sections unless --test or --scope narrows the run.
Do not skip sections unless you hit an actual blocker. "Seemed complex" or "I'll come back to it" are not valid reasons. Attempt every step — only stop when you literally cannot proceed. Report what you tried and what blocked you.
| # | Section | Reference | When required |
|---|---|---|---|
| 1 | Setup | references/setup.md | Always |
| 2 | Workspace | references/workspace.md | --test workspace or full |
| 3 | Reconciliation | references/reconciliation.md | Steps 1 + 5 only; steps 2/3/4/6 are out of smoke-test scope (see below) |
| 4 | Defaults | references/defaults.md | --test defaults or full |
| 5 | Model Policy | references/model-policy.md | --test model-policy or full |
| 6 | Skills | references/skills.md | --test skills or full |
| 7 | Registry | references/registry.md | --test registry or full |
| 8 | Agents | references/agents.md | --test agents or full |
| 9 | Picker Allowlists | references/picker-allowlist.md | --test pickers or full |
| 10 | Favorites | references/favorites.md | --test favorites or full (formerly stars) |
| 11 | Permissions / RBAC | references/permissions.md | --test permissions or full |
| 12 | Infrastructure | references/infrastructure.md | --test infrastructure or full |
| 13 | Channels | references/channels.md | --test channels or full |
| 14 | UI | references/ui.md | --test ui or full |
| 15 | Auth | references/auth.md | --test auth or --auth on |
Execution flow
- Confirm the project directory. Before scaffolding, ask the user where they want $PROJECT_DIR to live. Offer the default (~/mastra-builder-smoke-tests/builder-smoke) as a suggestion. Skip the question if they already passed --dir or have $BUILDER_SMOKE_TEST_DIR exported. See references/setup.md step 0.
- Read the reference file for each section you're about to run.
- Under --auth on, extract the session cookie before running any other section. The WorkOS cookie is httpOnly, so curl cannot mint it and document.cookie cannot read it. The scaffold ships a debug route at GET /smoke-test/cookie gated by SMOKE_TEST_COOKIE_LEAK=1. Follow the "Extracting the session cookie for curl (auth on)" section below before touching any auth-on endpoint. Do not pivot to UI-only testing because curl is "blocked"; the cookie route is the unblock path.
- Seed non-owner data after the server has booted at least once. A fresh scaffold has no skills authored by anyone other than the test user, which makes the non-owner, Library Copy, non-owner visibility, and non-admin favorites flows untestable. Run bash .claude/skills/builder-smoke-test/scripts/seed-multi-user.sh (or with --dir $PROJECT_DIR) before sections 6 (Skills), 7 (Registry), and 10 (Favorites). The script is idempotent and bypasses RBAC by writing directly to libsql, so it works regardless of --auth mode or current role. Do not mark non-owner steps as "blocked" without running this first.
- Execute the steps: use curl for API checks (with -H "Cookie: $COOKIE" under --auth on) and whichever browser tool the harness has wired up (Stagehand, Chrome MCP, etc.) for UI checks.
- Record results in the summary table.
- Mark the section complete with task_write before moving to the next.
Partial testing (--test)
If --test is provided:
- Always run Setup.
- Run only the specified section(s).
- Skip everything else.
Example: --test skills,registry,agents → Setup + Skills + Registry + Agents.
Scope shortcuts (--scope)
--scope runs a curated group of related sections. Setup is always implied.
| Scope | Includes |
|---|---|
| rbac | permissions, auth |
| skills | skills, registry, defaults |
| agents | agents, pickers, defaults, model-policy |
| infra | infrastructure, channels, reconciliation |
| ui | ui |
| quick | workspace, skills, agents, favorites, ui (skips long-running) |
--scope and --test can be combined; the union is run.
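The union rule can be sketched as a small shell helper. This is a hypothetical illustration, not part of the skill's scripts/ directory: the function names are made up, and the scope contents and section order are copied from the tables above.

```bash
# Hypothetical helper: resolves --test plus --scope into an ordered section list.

# Canonical section order, taken from the section table (1 → 15).
CANONICAL="setup workspace reconciliation defaults model-policy skills registry agents pickers favorites permissions infrastructure channels ui auth"

# expand_scope <scope>: prints the member sections of one --scope name.
expand_scope() {
  case "${1:-}" in
    "")     ;;  # no scope given: contributes nothing
    rbac)   echo "permissions auth" ;;
    skills) echo "skills registry defaults" ;;
    agents) echo "agents pickers defaults model-policy" ;;
    infra)  echo "infrastructure channels reconciliation" ;;
    ui)     echo "ui" ;;
    quick)  echo "workspace skills agents favorites ui" ;;
    *)      echo "unknown scope: $1" >&2; return 1 ;;
  esac
}

# resolve_sections <--test value> <--scope value>
# Prints the union in canonical order. Setup is always included.
# Names that are not in CANONICAL are silently dropped in this sketch.
resolve_sections() {
  local tests="${1:-}" scope="${2:-}" expanded wanted section out=""
  if [ -z "$tests" ] && [ -z "$scope" ]; then
    echo "$CANONICAL"  # nothing narrowed the run: full smoke
    return 0
  fi
  expanded="$(expand_scope "$scope")" || return 1
  wanted=" setup ${tests//,/ } $expanded "
  for section in $CANONICAL; do
    case "$wanted" in
      *" $section "*) out="$out $section" ;;
    esac
  done
  echo "${out# }"
}
```

For example, resolve_sections "agents" "rbac" yields setup, agents, permissions, auth in that order, since the output follows the section table rather than the order of the flags.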
Usage
# Full smoke (interactive)
/builder-smoke-test
# Specific sections
/builder-smoke-test --test workspace,skills
/builder-smoke-test --test agents,favorites
/builder-smoke-test --test reconciliation
/builder-smoke-test --test ui
# Scope shortcuts
/builder-smoke-test --scope rbac
/builder-smoke-test --scope skills
/builder-smoke-test --scope quick
# Force auth on / off (otherwise auto-detected from WORKOS_* env vars)
/builder-smoke-test --auth on
/builder-smoke-test --auth off
# Run auth-on as a non-admin role (must match the logged-in user's actual role)
/builder-smoke-test --auth on --role viewer
/builder-smoke-test --auth on --role member
# Skip the browser pass (API-only run)
/builder-smoke-test --skip-browser
Parameters
| Parameter | Description | Default |
|---|---|---|
| --test | Comma-separated section names (see table above). | (all sections) |
| --scope | Named group of sections (rbac, skills, agents, infra, ui, quick). Combinable with --test. | (none) |
| --auth | on, off, or auto. auto enables the Auth section iff WORKOS_CLIENT_ID + WORKOS_API_KEY are set. | auto |
| --role | Expected role of the logged-in user under --auth on: owner, admin, member, or viewer. Setup asserts the live /api/auth/me roles match; on mismatch the run stops and the user is told to either change their WorkOS role or re-run with the correct --role. Ignored under --auth off. | admin |
| --clean | Delete test entities (smoke-test workspaces / agents / skills) at the end of each section. | false |
| --skip-browser | Run only API/curl checks. UI section is skipped. | false |
| --dir | Project directory the skill scaffolds into. Forwarded to scripts/scaffold.sh. Also reads $BUILDER_SMOKE_TEST_DIR from the environment when the flag is omitted. | ~/mastra-builder-smoke-tests/builder-smoke |
| --reuse | If the project already exists at $PROJECT_DIR and has node_modules/@mastra/core, skip pnpm install. Forwarded to scripts/scaffold.sh. | false |
| --openai-key | OPENAI_API_KEY value to write into the scaffolded .env. If omitted, the scaffold script falls back to $OPENAI_API_KEY in the shell, then to an interactive prompt. | (shell or prompt) |
| --workos-api-key, --workos-client-id, --workos-organization-id | All three are required together to scaffold an auth-on project. Writes AUTH_PROVIDER=workos plus the three keys plus WORKOS_REDIRECT_URI=http://localhost:4111/api/auth/callback into .env. | (auth off) |
If --auth auto and no WorkOS env vars are present, the Auth section is auto-skipped and reported as ⏭️ Skipped (no WORKOS_* env vars).
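A minimal sketch of the --auth decision, assuming the rule stated in the parameter table (auto needs both WORKOS_CLIENT_ID and WORKOS_API_KEY). The function name is hypothetical; the skill itself may make this decision differently.

```bash
# auth_section_enabled <on|off|auto>
# Returns 0 when the Auth section should run, 1 when it should be skipped,
# and 2 for an invalid value.
auth_section_enabled() {
  case "${1:-auto}" in
    on)  return 0 ;;
    off) return 1 ;;
    auto)
      # auto: run Auth only when both WorkOS vars are non-empty.
      [ -n "${WORKOS_CLIENT_ID:-}" ] && [ -n "${WORKOS_API_KEY:-}" ]
      ;;
    *)
      echo "--auth must be on, off, or auto" >&2
      return 2
      ;;
  esac
}
```

A caller could use it as: auth_section_enabled auto || echo "⏭️ Skipped (no WORKOS_* env vars)".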
Canonical order
When running multiple sections, execute them in the order shown in the section table (1 → 15). The order is intentional:
- Setup must run first — preflight + readiness probe gate every later section.
- Workspace / Reconciliation / Defaults / Model Policy establish that the server's view of the project matches what the rest of the run assumes. Run them before any CRUD pass.
- Skills → Registry → Agents → Pickers → Favorites is a build-up: agents reference skills, pickers depend on the entities created above.
- Permissions / Infrastructure / Channels / UI are read-mostly inspections that benefit from existing entities.
- Auth runs last because it requires restarting mastra dev with a different .env.
If --test or --scope narrows the run, keep the relative order — just
skip the sections that fall outside the selection.
Required vs optional reference tiers
References fall into three tiers; an agent should treat them accordingly:
- Required (every run): setup.md. Any failure here blocks the rest of the run.
- Standard (default tiers for full, quick, scope shortcuts): workspace.md, skills.md, agents.md, favorites.md, ui.md (core), auth.md when --auth on.
- Extended (only when explicitly selected via --test / --scope or the matching code surface changed): reconciliation.md, defaults.md, model-policy.md, registry.md, picker-allowlist.md, permissions.md, infrastructure.md, channels.md, ui.md extended tier.
When skipping an extended section, mark it ⏭️ Skipped (not in scope)
in the result table — don't silently omit it.
Cleanup
The scaffold is a self-contained throwaway directory at $PROJECT_DIR. All
fixture state (workspaces, agents, skills, libsql DB, .mastra/workspace
files) lives inside it. The smoke test never writes to anything outside
$PROJECT_DIR (other than the dev server it runs).
At the end of every run:
- Stop the dev server (kill $(lsof -i :4111 -sTCP:LISTEN -t) or foreground Ctrl-C).
- Choose how to dispose of fixture state:
  - Reuse: leave $PROJECT_DIR in place. The next run can pass --reuse (or --skip-scaffold to preflight) and pick up where this one left off. Fastest for iterating.
  - Reset: rm -rf "$PROJECT_DIR" (or re-run scripts/scaffold.sh without --reuse). Cheapest way to get back to a known-clean state. Don't bother with per-entity DELETE calls; the directory IS the state.
- If a section bailed mid-flight (assertion failure, network error), record the partial state in the report's Issues section so the next run knows what to expect.
Per-entity DELETE calls are only needed when a specific section explicitly tests DELETE behavior (those sections include the DELETE step inline). Otherwise the throwaway-directory model handles cleanup.
Never leave the dev server running on :4111 after the report is filed —
it blocks future runs.
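The reuse/reset choice could be wrapped as below. This is a sketch only: dispose_fixture is a hypothetical name, and the guard against empty or home-directory paths is an added safety assumption, not something the skill's scripts are known to do.

```bash
# dispose_fixture <reuse|reset> <project-dir>
# Stop the dev server first; this helper only handles the directory.
dispose_fixture() {
  local mode="${1:-}" dir="${2:-}"
  case "$mode" in
    reuse)
      echo "keeping $dir for the next run"
      ;;
    reset)
      # Refuse obviously dangerous targets before rm -rf.
      if [ -z "$dir" ] || [ "$dir" = "/" ] || [ "$dir" = "$HOME" ]; then
        echo "refusing to delete '$dir'" >&2
        return 1
      fi
      rm -rf -- "$dir"
      echo "removed $dir"
      ;;
    *)
      echo "usage: dispose_fixture <reuse|reset> <project-dir>" >&2
      return 2
      ;;
  esac
}
```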
Prerequisites
- Working tree on the agent-builder feature branch (or any branch you want to QA).
- pnpm (10.x) and node on $PATH. The scaffold uses pnpm install --ignore-workspace inside the project dir so the repo-level workspace doesn't interfere.
- An OPENAI_API_KEY. Supply via --openai-key, export OPENAI_API_KEY in the shell, or let the scaffold prompt for it.
- (Optional) WorkOS credentials for --auth on runs: --workos-api-key, --workos-client-id, --workos-organization-id.
- Whichever browser MCP/tool the harness has access to. If none is available, run with --skip-browser and report UI as ⏭️ Skipped (no browser tool).
Project layout (scaffolded for you)
$PROJECT_DIR/ ← see "Project dir resolution" below
├── package.json ← pnpm overrides → link:<worktree>/packages/*
├── tsconfig.json
├── .env ← OPENAI_API_KEY (+ AUTH_PROVIDER + WORKOS_* on auth-on)
└── src/mastra/
├── index.ts ← single Mastra instance, reads exported bindings from auth.ts
├── auth.ts ← top-level switch(process.env.AUTH_PROVIDER); no-op when unset
├── agents/index.ts ← weather-agent (gpt-4o-mini)
├── tools/index.ts ← weather-info tool
└── workflows/index.ts ← greet-workflow
The .env is the only thing that flips auth on/off — the same src/mastra/index.ts runs in both modes. Re-run scripts/scaffold.sh with or without --workos-* to switch.
Project dir resolution
$PROJECT_DIR is determined by every script (scaffold, preflight, wait-for-server) using this order:
1. --dir <path> flag
2. BUILDER_SMOKE_TEST_DIR env var (e.g. export BUILDER_SMOKE_TEST_DIR=~/code/builder-smoke)
3. ~/mastra-builder-smoke-tests/builder-smoke (default)
For a long-lived setup, exporting BUILDER_SMOKE_TEST_DIR once in your shell rc is the lowest-friction option — every script picks it up automatically.
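The precedence above can be sketched as follows. The function name is hypothetical and the real scripts may parse their arguments differently; only the order (flag, then env var, then default) comes from this document.

```bash
# resolve_project_dir [args...]
# Prints the project directory: --dir <path> wins, then
# $BUILDER_SMOKE_TEST_DIR, then the documented default.
resolve_project_dir() {
  local dir_flag=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --dir)
        dir_flag="${2:-}"
        shift 2 || break  # --dir given without a value: stop parsing
        ;;
      *)
        shift
        ;;
    esac
  done

  if [ -n "$dir_flag" ]; then
    echo "$dir_flag"
  elif [ -n "${BUILDER_SMOKE_TEST_DIR:-}" ]; then
    echo "$BUILDER_SMOKE_TEST_DIR"
  else
    echo "$HOME/mastra-builder-smoke-tests/builder-smoke"
  fi
}
```

Usage: PROJECT_DIR="$(resolve_project_dir "$@")" at the top of a wrapper script.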
Running scripts (cwd matters)
All scripts under .claude/skills/builder-smoke-test/scripts/ resolve the worktree root from their own location. They can be invoked from any directory; by convention, run them from the repo root.
| Script | Run from | Notes |
|---|---|---|
| scaffold.sh | anywhere | Creates / refreshes $PROJECT_DIR. Forwards --openai-key, --workos-*, --reuse, --dir. |
| preflight.sh | anywhere | Calls scaffold.sh then asserts the resulting .env matches --expect (off or on). |
| wait-for-server.sh | anywhere | Hits http://localhost:4111/api/agents. cwd doesn't matter. |
| seed-multi-user.sh | anywhere | Inserts two skills owned by user_seed_other (1 public + 1 private) into the scaffold's libsql DB so non-owner / Library Copy flows can be tested without a second WorkOS account. Server must have booted at least once first. Idempotent. |
Invoke them as bash .claude/skills/builder-smoke-test/scripts/<name>.sh. Don't cd into scripts/ first — relative path resolution will break.
pnpm mastra:dev must be run from $PROJECT_DIR (where the scaffolded package.json is).
How mastra dev reads env (important)
mastra dev loads $PROJECT_DIR/.env via dotenv and unconditionally overwrites process.env with whatever's there (packages/cli/src/commands/dev/dev.ts ~line 384). Practical consequences:
- .env is the source of truth for the running server. Inline overrides like AUTH_PROVIDER= pnpm mastra:dev are silently clobbered.
- Shell-only vars survive only if .env has no entry for the same key. Re-running scripts/scaffold.sh always overwrites .env, so to toggle modes, re-scaffold.
- The auth mode the server actually runs in is determined by .env alone. A globally exported AUTH_PROVIDER=workos in your shell does NOT enable WorkOS auth in the server if .env doesn't have it, but it WILL leak into anything else this process runs, which is its own kind of confusing. Preflight flags this case.
Auth modes
Two states matter:
- auth off: AUTH_PROVIDER is absent (or blank) in $PROJECT_DIR/.env. No WorkOS, no RBAC, no FGA. This is the state for the auth-off run.
- auth on: AUTH_PROVIDER=workos plus WORKOS_API_KEY, WORKOS_CLIENT_ID, WORKOS_ORGANIZATION_ID all present in $PROJECT_DIR/.env. WorkOS authentication + role-based access + per-resource FGA all engage. This is the state for the auth-on runs. FGA is wired through the WorkOS auth provider, so it can't be disabled independently.
To switch modes, re-run the scaffold with or without the --workos-* flags; that's faster and safer than hand-editing .env.
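To check by hand which of the two states a given .env is in, something like the following would work. It is a sketch under stated assumptions: preflight.sh remains the real check, the function names are hypothetical, and reading the last matching line when a key appears twice is an assumption about how duplicates should be treated.

```bash
# env_value <env-file> <KEY>: prints the value of KEY, or nothing if absent.
# If the key appears more than once, the last line is used (assumption).
env_value() {
  sed -n "s/^$2=//p" "$1" 2>/dev/null | tail -n 1
}

# detect_auth_mode <env-file>
# Prints "off", "on", "incomplete" (workos without all three keys),
# or "unknown" (some other AUTH_PROVIDER value).
detect_auth_mode() {
  local env_file="$1" provider key
  provider="$(env_value "$env_file" AUTH_PROVIDER)"
  if [ -z "$provider" ]; then
    echo off
    return 0
  fi
  if [ "$provider" != "workos" ]; then
    echo unknown
    return 1
  fi
  for key in WORKOS_API_KEY WORKOS_CLIENT_ID WORKOS_ORGANIZATION_ID; do
    if [ -z "$(env_value "$env_file" "$key")" ]; then
      echo incomplete
      return 1
    fi
  done
  echo on
}
```

Usage: detect_auth_mode "$PROJECT_DIR/.env". An "incomplete" result corresponds to the case preflight reports as workos-keys-missing-in-project-env.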
Detection: run preflight before each section
# Scaffold (or refresh) the project and assert the auth-off baseline:
bash .claude/skills/builder-smoke-test/scripts/preflight.sh --expect off \
--openai-key "$OPENAI_API_KEY"
# Scaffold an auth-on project (re-runs scaffold with WorkOS keys, asserts auth on):
bash .claude/skills/builder-smoke-test/scripts/preflight.sh --expect on \
--openai-key "$OPENAI_API_KEY" \
--workos-api-key "$WORKOS_API_KEY" \
--workos-client-id "$WORKOS_CLIENT_ID" \
--workos-organization-id "$WORKOS_ORGANIZATION_ID"
Preflight chains scaffold.sh followed by validation checks (project exists with node_modules/@mastra/core, $PROJECT_DIR/.env has OPENAI_API_KEY, optional WorkOS keys present when --expect on, and auth mode matches --expect). Each failure prints a stable error code; the error-code table below tells the agent what to do.
Resolving missing env vars
If scaffold.sh or preflight.sh reports a missing OPENAI_API_KEY or WORKOS_* var, the agent must not silently source any rc file. Instead, work down this list and stop at the first one that resolves:
1. Check whether the var is already in the process env you can see (echo "${OPENAI_API_KEY:-<unset>}"). If yes, re-run scaffold with --openai-key "$OPENAI_API_KEY" (and equivalent for WorkOS).
2. Check whether the var is in $PROJECT_DIR/.env from a prior run (grep -E "^(OPENAI_API_KEY|WORKOS_)" "$PROJECT_DIR/.env" 2>/dev/null). If yes, you can pass --reuse to the next scaffold call.
3. If neither, look for rc files that exist on disk. Common candidates: ~/.zshrc, ~/.bashrc, ~/.zshenv, ~/.profile, ~/.env.global, and any project-local .env you find. Use ls -1 (or test -f) to confirm before listing; don't fabricate paths.
4. Ask the user in one message: "Can you paste the value(s), or give me permission to source one of these files?" Include the list of files that actually exist.
5. Only after the user explicitly approves a specific file, source it in a subshell and rerun preflight with the inherited env. Pattern:

       # auth off
       zsh -c 'source <approved-file> && bash .claude/skills/builder-smoke-test/scripts/preflight.sh --expect off --reuse'
       # auth on (preflight auto-picks WORKOS_API_KEY / WORKOS_CLIENT_ID / WORKOS_ORGANIZATION_ID from the sourced env)
       zsh -c 'source <approved-file> && bash .claude/skills/builder-smoke-test/scripts/preflight.sh --expect on --reuse'

   Use bash -c instead of zsh -c if the approved file is a bashrc.

Never write the secret value back into any rc file, never export it into the user's interactive shell, and never echo it back in chat in full. Refer to it as <your-openai-key> once you've used it.
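For the rc-file step, a short helper keeps the list honest by printing only candidates that actually exist. The function name is hypothetical; the candidate paths are the ones named in the list above, and project-local .env files are left out of this sketch.

```bash
# existing_rc_files: prints, one per line, the candidate rc files that exist.
# It only tests for existence; it never reads or sources them.
existing_rc_files() {
  local f
  for f in \
    "$HOME/.zshrc" \
    "$HOME/.bashrc" \
    "$HOME/.zshenv" \
    "$HOME/.profile" \
    "$HOME/.env.global"
  do
    if [ -f "$f" ]; then
      echo "$f"
    fi
  done
}
```

The output can be pasted directly into the single question to the user, so the offer never includes a path that was not confirmed on disk.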
| Error code | What it means | What the agent should do |
|---|---|---|
| project-dir-missing | $PROJECT_DIR is unset or the directory does not exist (scaffold did not run, or was given a bad --dir). | Re-run preflight without --skip-scaffold, or pass an existing --dir <path> that scaffold has already populated. |
| scaffold-failed | scripts/scaffold.sh returned non-zero. | Re-run scaffold with --no-reuse to force a fresh install. Inspect the printed pnpm install output for the real error. |
| project-deps-missing | $PROJECT_DIR/node_modules/@mastra/core missing after scaffold. | Re-run scaffold without --reuse to force a fresh install. If that still fails, delete $PROJECT_DIR and re-run. |
| openai-key-missing-in-project-env | $PROJECT_DIR/.env has no usable OPENAI_API_KEY. | Follow the "Resolving missing env vars" section above. Re-run preflight with --openai-key <value> once you have it. |
| workos-keys-missing-in-project-env | --expect on but one or more of WORKOS_API_KEY / WORKOS_CLIENT_ID / WORKOS_ORGANIZATION_ID is absent or blank in .env. | Follow the "Resolving missing env vars" section above. Re-run preflight with all three --workos-* flags. |
| mode-mismatch | --expect disagrees with the auth mode detected from $PROJECT_DIR/.env. | Re-run the scaffold with (auth on) or without (auth off) --workos-* flags. The scaffold is idempotent for the parts that don't change. |
| bad-expect-value | --expect got something other than off or on. | Fix the invocation. (Parser also rejects flag-like values at parse time with exit 2.) |
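One way to act on these codes from a wrapper is a plain case statement. This is an illustrative sketch: next_action is a hypothetical function, it takes the code as an argument because the exact format of preflight's output is not specified here, and the messages paraphrase the table.

```bash
# next_action <error-code>: prints a one-line hint for a preflight error code.
next_action() {
  case "${1:-}" in
    project-dir-missing)
      echo "re-run preflight without --skip-scaffold, or pass an existing --dir" ;;
    scaffold-failed)
      echo "force a fresh install and read the pnpm install output" ;;
    project-deps-missing)
      echo "re-run scaffold without --reuse; if it still fails, delete the project dir" ;;
    openai-key-missing-in-project-env)
      echo "resolve OPENAI_API_KEY, then re-run preflight with --openai-key" ;;
    workos-keys-missing-in-project-env)
      echo "resolve the WorkOS vars, then re-run preflight with all three --workos-* flags" ;;
    mode-mismatch)
      echo "re-run scaffold with or without the --workos-* flags" ;;
    bad-expect-value)
      echo "fix the --expect value (off or on)" ;;
    *)
      echo "unrecognized error code: ${1:-<empty>}" >&2
      return 1 ;;
  esac
}
```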
.env policy: the scaffold owns $PROJECT_DIR/.env. Re-running scaffold overwrites it. Do not hand-edit the scaffolded .env; instead, re-run scaffold with different flags. (The skill never edits .env files outside $PROJECT_DIR.)
Extracting the session cookie for curl (auth on)
The WorkOS session cookie is httpOnly, so document.cookie and Stagehand's
extract cannot read it from a normal page. To hit authenticated endpoints
from curl after a browser SSO login, the scaffold exposes a tiny debug
route gated by an env var:
1. Add SMOKE_TEST_COOKIE_LEAK=1 to $PROJECT_DIR/.env (single line append; the scaffold leaves this var alone on re-run as long as the file already exists).
2. Restart mastra dev so the new env is picked up.
3. Sign in once in the Stagehand browser (stagehand_navigate to http://localhost:4111, complete WorkOS SSO).
4. From the same browser tab, navigate to http://localhost:4111/smoke-test/cookie and use stagehand_extract to read the page body. The page is a single text/plain line containing the request's Cookie header verbatim (e.g. wos_session=…).
5. Export it once: export COOKIE='<the-string-from-step-4>'. From here on, every authenticated curl is curl -H "Cookie: $COOKIE" "$BASE/…".
The route is only registered when SMOKE_TEST_COOKIE_LEAK=1 and is intentionally insecure — never enable it in a real project. The WORKOS_COOKIE_PASSWORD written by the scaffold is derived from $PROJECT_DIR, so the cookie value stays valid across mastra dev restarts within the same scaffold; you only need to repeat step 4 if you re-scaffold to a new directory.
/smoke-test/cookie returns 404? Always an env-ordering issue. The apiRoutes list is built once when mastra dev boots from process.env.SMOKE_TEST_COOKIE_LEAK. The flag has to be in .env before the boot; adding it after start has no effect until you restart. If you see a 404, run grep SMOKE_TEST_COOKIE_LEAK "$PROJECT_DIR/.env", then stop and restart mastra dev. Don't pivot to "UI only" because of this.
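The "single line append" in step 1 is easy to get wrong on a re-run, so an idempotent version may help. This is a sketch with a hypothetical function name; it only touches the file it is given and still requires the mastra dev restart described above.

```bash
# enable_cookie_leak <env-file>
# Appends SMOKE_TEST_COOKIE_LEAK=1 once; does nothing if it is already there.
enable_cookie_leak() {
  local env_file="$1"
  if [ ! -f "$env_file" ]; then
    echo "no .env at $env_file" >&2
    return 1
  fi
  if grep -q '^SMOKE_TEST_COOKIE_LEAK=1$' "$env_file"; then
    return 0
  fi
  # If the file does not end with a newline, add one so the new
  # variable does not get glued onto the previous line.
  if [ -n "$(tail -c 1 "$env_file")" ]; then
    printf '\n' >> "$env_file"
  fi
  printf 'SMOKE_TEST_COOKIE_LEAK=1\n' >> "$env_file"
}
```

Usage: enable_cookie_leak "$PROJECT_DIR/.env", then stop and restart mastra dev before requesting /smoke-test/cookie.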
Seeding non-owner skills (Library Copy / non-owner flows)
A fresh scaffold has zero skills, and everything created through the API
is owned by either the auth-off "no caller" (no authorId) or the
currently signed-in user under auth-on. To exercise flows that require a
skill owned by someone else (Library Copy, non-owner read-only view,
private-skill visibility from a non-owner) without provisioning a second
WorkOS account, run the seed script after the server has booted at least
once:
# Start the server once so libsql initializes the skills tables.
cd $PROJECT_DIR
pnpm mastra:dev # leave running, then in another shell:
bash .claude/skills/builder-smoke-test/scripts/seed-multi-user.sh
# → seeds smoke-seed-public-skill (visibility=public, status=published)
# smoke-seed-private-skill (visibility=private, status=published)
# both owned by authorId='user_seed_other'
The script writes directly to $PROJECT_DIR/src/mastra/public/mastra.db
via the sqlite3 CLI (no Node deps). It's idempotent — re-running
replaces the seeded rows. Use the seeded skills wherever a reference
file asks for "a skill owned by another user"; clean them up with
DELETE curls against /api/stored/skills/:id or by re-scaffolding.
Starting the dev server
If the server is not running on :4111, the Setup section starts it. The convenience helpers live under scripts/:
# Scaffold + preflight (writes .env, installs deps, detects auth mode)
bash .claude/skills/builder-smoke-test/scripts/preflight.sh --expect off
# Start the server from the scaffolded project
cd "$PROJECT_DIR"   # default: ~/mastra-builder-smoke-tests/builder-smoke
pnpm mastra:dev
# Poll /api/agents until 200 (60s budget). Detects mastra dev's port-bump.
bash .claude/skills/builder-smoke-test/scripts/wait-for-server.sh
wait-for-server.sh probes /api/agents — not / — because the SPA shell can return 200 before the API mounts. If it reports the server is up on :4112+ instead of :4111, mastra dev fell through to the next port; stop, free :4111, and restart. Continuing on a non-default port silently breaks every curl in every reference.
API base URL
Every reference assumes $BASE is exported. Set it once at the start of the run:
export BASE=http://localhost:4111/api
All curl examples in the references use $BASE and won't work in a shell that hasn't exported it.
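A small wrapper can fail fast when $BASE is missing and attach the session cookie when one is set. Both function names are hypothetical and are not used by the reference files; treat this as a convenience sketch, not the skill's own tooling.

```bash
# smoke_url <path>: joins $BASE and an API path, or fails if BASE is unset.
smoke_url() {
  if [ -z "${BASE:-}" ]; then
    echo "BASE is not set; run: export BASE=http://localhost:4111/api" >&2
    return 1
  fi
  # Trim a trailing slash on BASE and a leading slash on the path
  # so the join never produces "//".
  printf '%s/%s\n' "${BASE%/}" "${1#/}"
}

# smoke_curl <path> [extra curl args...]
# Sends the Cookie header only when $COOKIE is set (auth-on runs).
smoke_curl() {
  local path="$1" url
  shift
  url="$(smoke_url "$path")" || return 1
  if [ -n "${COOKIE:-}" ]; then
    curl -sS -H "Cookie: $COOKIE" "$@" "$url"
  else
    curl -sS "$@" "$url"
  fi
}
```

For example, smoke_curl /stored/skills lists stored skills, and smoke_curl /stored/skills/<id> -X DELETE issues a delete with the same cookie handling.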
Quick reference: key endpoints
This table lists the surfaces an agent will hit and where to look for the
authoritative request/response shape. Don't copy curl blocks from here —
run the per-section commands in references/<section>.md.
| Surface | Endpoint |
|---|---|
| Builder settings | GET /editor/builder/settings |
| Builder infra | GET /editor/builder/infrastructure |
| Registries (list) | GET /editor/builder/registries |
| Registry search | GET /editor/builder/registries/:registryId/search?q=… |
| Registry popular | GET /editor/builder/registries/:registryId/popular |
| Registry preview | GET /editor/builder/registries/:registryId/preview?owner=…&repo=…&path=… |
| Registry install | POST /editor/builder/registries/:registryId/install |
| Workspace CRUD | GET/POST/PATCH/DELETE /stored/workspaces[/:id] |
| Agent CRUD | GET/POST/PATCH/DELETE /stored/agents[/:id] |
| Agent favorite | PUT / DELETE /stored/agents/:id/favorite |
| Agent avatar | PATCH /stored/agents/:id with metadata.avatarUrl (owner-only) |
| Skill CRUD | GET/POST/PATCH/DELETE /stored/skills[/:id] |
| Skill publish | POST /stored/skills/:id/publish |
| Skill favorite | PUT / DELETE /stored/skills/:id/favorite |
| Auth me | GET /auth/me (returns logged-in user + roles + permissions) |
| Auth refresh | POST /auth/refresh |
Builder Studio routes
| Feature | Route |
|---|---|
| Agent Builder shell | /agent-builder |
| Agents (default view) | /agent-builder |
| Agent detail (view) | /agent-builder/agents/:id/view (bare :id redirects to /view) |
| Agent detail (edit) | /agent-builder/agents/:id/edit |
| Skills | /agent-builder/skills |
| Library (public skills) | /agent-builder/library |
| Skill detail | /agent-builder/skills/:id/edit (owner) or /agent-builder/skills/:id/view (non-owner) |
| Workspaces | /agent-builder/workspaces |
| Infrastructure | /agent-builder/infrastructure (readable by every default role — see infrastructure.md) |
Mobile renders a bottom-bar with the same primary entries.
Browser smoke
Use whichever browser tool the harness has wired up (Stagehand, Chrome MCP, etc.). Don't assume a specific provider — discover what's available, then drive the same checklist in references/ui.md.
The scaffolded project registers StagehandBrowser (matching examples/agent-builder). If BROWSERBASE_* keys aren't set in the shell, Stagehand falls back to local Playwright; that's fine for smoke. If neither Stagehand nor a local browser is reachable, mark UI as ⏭️ Skipped (no browser provider).
Result reporting
After testing, provide:
## Builder Smoke Test Results
**Date**: <date>
**Branch**: <branch>
**Commit**: <short sha>
**Server**: scaffolded project @ localhost:4111 (`$PROJECT_DIR`)
**Auth**: on / off / auto-skipped
| # | Section | Status | Notes |
| --- | ------------------ | -------- | ------------------------------- |
| 1 | Setup | ✅/❌ | |
| 2 | Workspace | ✅/❌ | |
| 3 | Reconciliation | ✅/❌/⏭️ | |
| 4 | Defaults | ✅/❌ | |
| 5 | Model Policy | ✅/❌ | |
| 6 | Skills | ✅/❌ | |
| 7 | Registry | ✅/❌ | |
| 8 | Agents | ✅/❌ | |
| 9 | Pickers | ✅/❌ | |
| 10 | Favorites | ✅/❌ | |
| 11 | Permissions / RBAC | ✅/❌ | |
| 12 | Infrastructure | ✅/❌ | |
| 13 | Channels | ✅/❌ | |
| 14 | UI | ✅/❌/⏭️ | |
| 15 | Auth | ✅/❌/⏭️ | (skipped if no WORKOS_* vars) |
**Product issues**: (list any — server/UI behaved unexpectedly. For each: HTTP method + path or UI route, expected vs actual, one-sentence guess at the cause. Do not pre-decide "known bug" — log what the server actually did. Say "none" if empty.)
**Skill issues**: (list any — the skill itself was wrong, unclear, stale, or unreachable. For each: which file + step (e.g. `references/skills.md` step F2), and what was wrong. Doc drift, not product bugs. Say "none" if empty.)
**Verify before filing.** Before adding anything to either list, re-confirm against the live response in this run, not memory of an earlier call:
- For any **shape mismatch / missing field / wrong key name** claim, paste the actual JSON fragment (or the relevant keys) directly under the bullet so the claim is reproducible. If the skill says `features.agent.skills` and the response has `features.agent.skills`, that is not a skill issue — names that look similar in passing (`featSkills`, `agent.features.skill`, etc.) are easy to misread.
- For any **endpoint inconsistency** claim (e.g. "endpoint A returns X but B returns Y"), re-curl both endpoints fresh in the same run rather than reusing a stale response from earlier in the section.
- For any **RBAC / authz** claim (403 where you expected 200, or vice versa), check `references/permissions.md` for the matrix _and_ check the "Design decisions" list in this file. Several roles intentionally share `*:read`, which means infra/list/get endpoints look "ungated" but are working as intended. Also confirm the cookie you sent belongs to the role you think it does (`curl -s -H "Cookie: $COOKIE" "$BASE/auth/me" | jq '.role // .roles'`).
- For any **missing endpoint** claim (e.g. "agent avatar 404"), confirm the contract first — several flows are client-composed on top of generic CRUD (avatar = `PATCH metadata.avatarUrl`; Library Copy = `POST /stored/skills` with `metadata.origin`). The "Design decisions (don't file as bugs)" section enumerates the common ones.
- If a claim can't be reproduced on a fresh request, drop it.
**Regressions**: (list any behavioral changes from a previous run)
**Warnings**: (e.g., dev-server crash on `/auth/refresh` polling, OPENAI_API_KEY required at startup)
**Skipped sections**: (list with reason)
Known rough edges
The branch has accumulated minor papercuts. Note these in your report only if you hit them; don't fail the run on them:
- Don't rm $PROJECT_DIR/mastra.db by hand while the server is up; stop the server first, then delete.
- Dev server can crash on hot-reload from /auth/refresh polling. Restart and continue.
- OPENAI_API_KEY is required at startup: the server won't boot without it, even if you only test non-LLM surfaces.
- mastra dev overwrites process.env from .env at boot, so inline env overrides on the command line don't reach the server. Re-run scaffold to change .env.
- The scaffold links against the current worktree's packages via link: overrides. If you switch worktrees, re-run scaffold so the symlinks point at the right tree.
Design decisions (don't file as bugs)
These have come up across multiple runs and are intentional. If you observe one, note it in your report as "expected behavior" — do not open a product issue.
- GET /auth/me without a cookie returns 200 with a null-ish body. The route is mounted as a public route (createPublicRoute); the contract is "return the current user or null", not "401 if missing". A 401 here would break the public app shell.
- /editor/builder/infrastructure is readable by every default role (admin / member / viewer). The handler gates on infrastructure:read and every default role has *:read, which matches by resource-wildcard. The page only exposes deployment-shape data (provider names, registered flags, configured/unconfigured booleans), no secrets.
- Flipping a skill's visibility from private to public does not auto-publish unless the skill has a registered skillPath. Visibility and publication are independent fields by design. A plain-create skill flipped public stays at activeVersionId: null until a real POST /publish runs against a source path.
- Zod schema validation runs before the permission middleware on /stored/* writes. A malformed body from a viewer returns a 400, not a 403. This is standard request lifecycle; the response surface doesn't leak resource state.
- The role-impersonation picker only lists roles different from the current one. Logged in as admin, you'll see Member and Viewer and nothing else; there is no Admin self-item. This is intentional (admin is the baseline; you're already there).
- Impersonation is UI-only. The API still answers per the real logged-in role. A curl while impersonating viewer will still return the admin's response.
- The Favorites sidebar entry links to /agent-builder/favorite (singular). The plural /favorites is not a registered route and renders the React Router 404. Use the sidebar link or the singular URL when scripting.
- Avatar upload uses agent PATCH with metadata.avatarUrl, not a dedicated /avatar endpoint. See references/agents.md.
- Copy is client-side. There is no POST /stored/skills/:id/copy. The UI fetches the source skill and POSTs a new row to /stored/skills with metadata.origin = "library-copy". See references/registry.md.
Out of smoke-test scope
Some flows are documented in references/ but are not driven by the smoke-test agent because they require server-lifecycle gymnastics that don't fit a single run:
- Reconciliation steps 2/3/4/6 (references/reconciliation.md) require editing $PROJECT_DIR/src/mastra/index.ts (changing basePath / workspaceId / config), restarting mastra dev multiple times, and observing drift detection or orphan archival across restarts. The smoke-test agent runs only Step 1 (fresh-startup persistence) and Step 5 (non-builder workspaces untouched). Run the rest by hand when changing reconciliation code.
- Real role-swap testing (logging in as multiple WorkOS users with different roles in the same run) is out of scope. The agent verifies whichever role the live --role user actually has, and additionally exercises the UI-only role impersonation flow under --role admin (see references/ui.md).
References
- references/setup.md: server health, builder settings sanity, baseline counts, builder workspace existence
- references/workspace.md: workspace CRUD via API
- references/reconciliation.md: config-driven workspace lifecycle (fresh, idempotent, drift, archival, backfill)
- references/defaults.md: builder defaults applied at agent create (memory, workspace, browser, model)
- references/model-policy.md: allowed list, default model, dropdown filtering, rejection
- references/skills.md: skill CRUD, visibility, publish, filesystem writes, files array
- references/registry.md: skills.sh browse/install, library Copy flow, origin badges, gating
- references/agents.md: stored agent CRUD, skill attachment, model swap, delete-from-edit, avatar upload
- references/picker-allowlist.md: tools/agents/workflows pickers respect allowlists
- references/favorites.md: favorite/unfavorite agents and skills, idempotency (formerly stars.md)
- references/permissions.md: viewer/member/admin/owner gating, role expectation matrix, UI impersonation, auth-off bypass
- references/infrastructure.md: /editor/builder/infrastructure payload + UI
- references/channels.md: Slack provider visibility, connectChannel tool
- references/ui.md: browser checklist across Builder routes
- references/auth.md: WorkOS on/off, 401 behavior, authorId, mode-toggle via .env
- scripts/scaffold.sh: scaffold or refresh the hermetic project at $PROJECT_DIR
- scripts/preflight.sh: wraps scaffold.sh + mode expectation (--expect off or on)
- scripts/wait-for-server.sh: poll :4111 until healthy