name: playwright-e2e-tests description: Write and maintain Playwright end-to-end tests for the Onyx application. Use when creating new E2E tests, debugging test failures, adding test coverage, or when the user mentions Playwright, E2E tests, or browser testing.
Playwright E2E Tests
Spec-authoring rules live in web/tests/e2e/README.md — the Page Object Model, locator
priority, and auto-retrying matchers. Read it before adding or changing a spec. This skill covers
the surrounding workflow: layout, environment, running tests, auth, and Onyx-specific utilities.
Project Layout
- Tests:
web/tests/e2e/— organized by feature (auth/,admin/,chat/,assistants/,connectors/,mcp/) - Config:
web/playwright.config.ts - Utilities:
web/tests/e2e/utils/ - Constants:
web/tests/e2e/constants.ts - Global setup:
web/tests/e2e/global-setup.ts - Output:
web/output/playwright/
Imports
Always use absolute imports with the @tests/e2e/ prefix — never relative paths (../, ../../). The alias is defined in web/tsconfig.json and resolves to web/tests/.
import { loginAs } from "@tests/e2e/utils/auth";
import { OnyxApiClient } from "@tests/e2e/utils/onyxApiClient";
import { TEST_ADMIN_CREDENTIALS } from "@tests/e2e/constants";
All new files should be .ts, not .js.
Environment
Run Playwright against your local dev servers so the tests exercise your working-tree changes:
start ods web dev (frontend, :3000) and ods backend api (backend, :8080), then use the
config default BASE_URL=http://localhost:3000 — no override needed. In dev, next dev proxies
/api/* to the backend itself (the route handler web/src/app/api/[...path]/route.ts), so the UI
and /api both live on :3000; global setup needs the backend up too, since it registers/logs in
users over /api/auth/*.
Readiness check: curl -s -o /dev/null -w '%{http_code}' http://localhost:3000/api/auth/type → 200.
Other bootstrap notes:
- The devcontainer image bakes in Playwright's OS deps + Node, but not the Chromium browser binary. If a run fails with a missing-browser error, install it once:
bunx playwright install chromium. - Run commands from the
web/directory;global-setup.tswrites auth-state files (admin_auth.json, …) to the cwd. It's idempotent, so setup is fast. web/.vscode/.envmay be absent —dotenvskips it silently (fine for most tests). One exception:tests/e2e/mcp/mcp_oauth_flow.spec.tsthrows at import withoutMCP_OAUTH_*vars, which also breaks a whole-suite--list. Scope runs to specific files/dirs to avoid it.
Running Tests
cd web # from the repo root
# Run a specific test file
bunx playwright test tests/e2e/chat/default_assistant.spec.ts
# Narrow to a single test by title (fast smoke check)
bunx playwright test tests/e2e/chat/welcome_page.spec.ts \
-g "chat input is visible and focusable" --project admin
Test Projects
| Project | Description | Parallelism |
|---|---|---|
admin |
Standard tests (excludes @exclusive) |
Parallel |
exclusive |
Serial, slower tests (tagged @exclusive) |
1 worker |
All tests use admin_auth.json storage state by default (pre-authenticated admin session).
Authentication
Global setup (global-setup.ts) runs automatically before all tests and handles:
- Server readiness check (polls health endpoint, 60s timeout)
- Provisioning test users: admin, admin2, and a pool of worker users (
worker0@example.comthroughworker7@example.com) (idempotent) - API login + saving storage states:
admin_auth.json,admin2_auth.json, andworker{N}_auth.jsonfor each worker user - Setting display name to
"worker"for each worker user - Promoting admin2 to admin role
- Ensuring a public LLM provider exists
Both test projects set storageState: "admin_auth.json", so every test starts pre-authenticated as admin with no login code needed.
When a test needs a different user, use API-based login — never drive the login UI:
import { loginAs } from "@tests/e2e/utils/auth";
await page.context().clearCookies();
await loginAs(page, "admin2");
// Log in as the worker-specific user (preferred for test isolation):
import { loginAsWorkerUser } from "@tests/e2e/utils/auth";
await page.context().clearCookies();
await loginAsWorkerUser(page, testInfo.workerIndex);
Test Structure
Tests start pre-authenticated as admin — navigate and test directly:
import { test, expect } from "@playwright/test";
test.describe("Feature Name", () => {
test("should describe expected behavior clearly", async ({ page }) => {
await page.goto("/app");
await page.waitForLoadState("networkidle");
// Already authenticated as admin — go straight to testing
});
});
User isolation — tests that modify visible app state (creating assistants, sending chat messages, pinning items) should run as a worker-specific user and clean up resources in afterAll. Global setup provisions a pool of worker users (worker0@example.com through worker7@example.com). loginAsWorkerUser maps testInfo.workerIndex to a pool slot via modulo, so retry workers (which get incrementing indices beyond the pool size) safely reuse existing users. This ensures parallel workers never share user state, keeps usernames deterministic for screenshots, and avoids cross-contamination:
import { test } from "@playwright/test";
import { loginAsWorkerUser } from "@tests/e2e/utils/auth";
test.beforeEach(async ({ page }, testInfo) => {
await page.context().clearCookies();
await loginAsWorkerUser(page, testInfo.workerIndex);
});
If the test requires admin privileges and modifies visible state, use "admin2" instead — it's a pre-provisioned admin account that keeps the primary "admin" clean for other parallel tests. Switch to "admin" only for privileged setup (creating providers, configuring tools), then back to the worker user for the actual test. See chat/default_assistant.spec.ts for a full example.
loginAsRandomUser exists for the rare case where the test requires a brand-new user (e.g. onboarding flows). Avoid it elsewhere — it produces non-deterministic usernames that complicate screenshots.
API resource setup — only when tests need to create backend resources (image gen configs, web search providers, MCP servers). Use beforeAll/afterAll with OnyxApiClient to create and clean up. See chat/default_assistant.spec.ts or mcp/mcp_oauth_flow.spec.ts for examples. This is uncommon (~4 of 37 test files).
Key Utilities
OnyxApiClient (@tests/e2e/utils/onyxApiClient)
Backend API client for test setup/teardown. Key methods:
- Connectors:
createFileConnector(),deleteCCPair(),pauseConnector() - LLM Providers:
ensurePublicProvider(),createRestrictedProvider(),setProviderAsDefault() - Assistants:
createAssistant(),deleteAssistant(),findAssistantByName() - User Groups:
createUserGroup(),deleteUserGroup(),setUserRole() - Tools:
createWebSearchProvider(),createImageGenerationConfig() - Chat:
createChatSession(),deleteChatSession()
chatActions (@tests/e2e/utils/chatActions)
sendMessage(page, message)— sends a message and waits for AI responsestartNewChat(page)— clicks new-chat button and waits for introverifyDefaultAssistantIsChosen(page)— checks Onyx logo is visibleverifyAssistantIsChosen(page, name)— checks assistant name displayswitchModel(page, modelName)— switches LLM model via popover
visualRegression (@tests/e2e/utils/visualRegression)
expectScreenshot(page, { name, mask?, hide?, fullPage? })expectElementScreenshot(locator, { name, mask?, hide? })- Controlled by
VISUAL_REGRESSION=trueenv var
theme (@tests/e2e/utils/theme)
THEMES—["light", "dark"] as constarray for iterating over both themessetThemeBeforeNavigation(page, theme)— setsnext-themestheme vialocalStoragebefore navigation
When tests need light/dark screenshots, loop over THEMES at the test.describe level and call setThemeBeforeNavigation in beforeEach before any page.goto(). Include the theme in screenshot names. See admin/admin_pages.spec.ts or chat/chat_message_rendering.spec.ts for examples:
import { THEMES, setThemeBeforeNavigation } from "@tests/e2e/utils/theme";
for (const theme of THEMES) {
test.describe(`Feature (${theme} mode)`, () => {
test.beforeEach(async ({ page }) => {
await setThemeBeforeNavigation(page, theme);
});
test("renders correctly", async ({ page }) => {
await page.goto("/app");
await expectScreenshot(page, { name: `feature-${theme}` });
});
});
}
tools (@tests/e2e/utils/tools)
TOOL_IDS— centralizeddata-testidselectors for tool optionsopenActionManagement(page)— opens the tool management popover
Waiting Strategy
// Wait for load state after navigation
await page.goto("/app");
await page.waitForLoadState("networkidle");
// Wait for specific element
await page.getByTestId("chat-intro").waitFor({ state: "visible", timeout: 10000 });
// Wait for URL change
await page.waitForFunction(() => window.location.href.includes("chatId="), null, { timeout: 10000 });
// Wait for network response
await page.waitForResponse(resp => resp.url().includes("/api/chat") && resp.status() === 200);
Best Practices
- Descriptive test names — clearly state expected behavior:
"should display greeting message when opening new chat" - API-first setup — use
OnyxApiClientfor backend state; reserve UI interactions for the behavior under test - User isolation — tests that modify visible app state (sidebar, chat history) should run as the worker-specific user via
loginAsWorkerUser(page, testInfo.workerIndex)(not admin) and clean up resources inafterAll. Each parallel worker gets its own user, preventing cross-contamination. ReserveloginAsRandomUserfor flows that require a brand-new user (e.g. onboarding) - DRY helpers — extract reusable logic into
utils/with JSDoc comments - No hardcoded waits — use the auto-retrying matchers (
web/tests/e2e/README.md),waitFor, orwaitForLoadState; neverwaitForTimeout - Parallel-safe — no shared mutable state between tests. Prefer static, human-readable names (e.g.
"E2E-CMD Chat 1") and clean up resources by ID inafterAll. This keeps screenshots deterministic and avoids needing to mask/hide dynamic text. Only fall back to timestamps (\test-${Date.now()}``) when resources cannot be reliably cleaned up or when name collisions across parallel workers would cause functional failures - Error context — catch and re-throw with useful debug info (page text, URL, etc.)
- Tag slow tests — mark serial/slow tests with
@exclusivein the test title - Visual regression — use
expectScreenshot()for UI consistency checks - Minimal comments — only comment to clarify non-obvious intent; never restate what the next line of code does