name: empirica-autonomous-testing
description: Run the experiment autonomously, capture screenshots across participant/admin views, and perform smoke checks after implementation changes.
Use this when asked to test the experiment end-to-end and show visual progress.
Runtime command policy for this repo:
- Do not use `npm run build`/`npm test` to validate the Empirica app.
- Only install dependencies with `npm install` in `client/` and `server/`.
- Start and run the app with `empirica` from repository root.
- If a full integration build check is needed, run `empirica bundle` from repository root.
Before running tests, review relevant Empirica references:
- docs: https://docs.empirica.ly/
- framework repo: https://github.com/empiricaly/empirica
- docs repo: https://github.com/empiricaly/docsv2
- project reference map: `schema/empirica-reference.json` if present
Generalizable runtime checks for any Empirica experiment:
- Verify lifecycle progression is correct (intro/exit async behavior, stage/round synchronization).
- Verify server-to-client propagation for async server updates (especially timer/API-driven updates).
- If async updates appear only after refresh, check for missing `Empirica.flush()` in async paths.
- Verify env-dependent features (API keys, debug flags) are loaded in the actual server process.
- Treat an E2E run as incomplete until every simulated participant reaches the final exit state.
- Require interaction depth: each simulated participant should perform realistic task actions (not just page loads/check-ins).
- Include condition/treatment stress checks: exercise branch-specific mechanics and edge interactions for each condition under test.
Workflow
Start local experiment:
- `empirica` in repository root.
- use admin credentials from `.empirica/empirica.toml` to access `/admin`.
- if `.empirica/empirica.toml` still has `CHANGE_ME_SRTOKEN` or `CHANGE_ME_PASSWORD`, stop and configure them first during implementation.
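The placeholder check in this step can be scripted so it runs before the server is started. The following is a minimal sketch, assuming the config is read as plain text and the two placeholder strings appear literally in it; `find_placeholders` and `check_config` are hypothetical helper names, not part of Empirica.

```python
from pathlib import Path

# Placeholder strings named in the step above.
PLACEHOLDERS = ("CHANGE_ME_SRTOKEN", "CHANGE_ME_PASSWORD")


def find_placeholders(config_text: str) -> list[str]:
    """Return the placeholder strings still present in the config text."""
    return [p for p in PLACEHOLDERS if p in config_text]


def check_config(path: str = ".empirica/empirica.toml") -> list[str]:
    """Read the config file and report placeholders that were never replaced.

    An empty list means no known placeholder is left in the file.
    """
    return find_placeholders(Path(path).read_text(encoding="utf-8"))
```

A non-empty result from `check_config()` would be the signal to stop and configure credentials before continuing.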
Run screenshot capture script:
- `node .github/skills/empirica-autonomous-testing/scripts/capture-screenshots.mjs --base http://localhost:3000 --out .github/skills/empirica-autonomous-testing/artifacts --paths /,/admin`
- If Node/Playwright is unavailable, capture screenshots manually from browser windows.
- Always surface screenshots to researcher during the run (not only at the end).
Run a simulated session:
- Prefer reusable automation script first:
  - `node .github/skills/empirica-autonomous-testing/scripts/run-treatment-e2e.mjs --base http://localhost:3000 --treatment "None" --player-count 5 --admin-user admin --admin-pass <password> --out .github/skills/empirica-autonomous-testing/artifacts/none-e2e --require-exit 1`
- In admin panel automation, create/start a batch with intended treatment randomization and assignment mode.
- Ask researcher whether to test all treatments or a named subset.
- Verify expected game/treatment allocation before participants join.
- For each treatment under test, launch the correct number of participants based on that treatment's `playerCount`.
- Launch participant URLs and move through consent, intro steps, stage(s), and exit for each active game.
- A treatment simulation is only considered complete if all simulated participants reach final exit.
- Stress-test task interactions (not just check-in messages): domain-relevant decisions/communications plus condition-specific mechanics.
- Fail the run if participant completion criteria are not met (do not silently continue to export/validation).
- Capture screenshots during progression (intro, in-stage, waiting/result, exit, admin).
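The completion rule for a simulated session (every participant must reach final exit, otherwise the run fails) can be expressed as a small gate that runs before export. This is a sketch under stated assumptions: `TreatmentRun`, `incomplete_runs`, and `assert_complete` are hypothetical names, and the counts are assumed to be collected by whatever automation drove the session.

```python
from dataclasses import dataclass


@dataclass
class TreatmentRun:
    treatment: str
    required: int  # participants expected, from the treatment's playerCount
    exited: int    # participants observed at the final exit state


def incomplete_runs(runs: list[TreatmentRun]) -> list[str]:
    """Return one message per treatment whose participants did not all exit."""
    return [
        f"{r.treatment}: {r.exited}/{r.required} reached final exit"
        for r in runs
        if r.exited < r.required
    ]


def assert_complete(runs: list[TreatmentRun]) -> None:
    """Raise instead of silently continuing to export/validation."""
    problems = incomplete_runs(runs)
    if problems:
        raise RuntimeError("E2E run incomplete: " + "; ".join(problems))
```

Calling `assert_complete(...)` as the last action of the session step keeps a partial run from reaching the export and schema checks.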
Export simulation data:
- `empirica export` in repository root.
- Move the generated zip into `.github/skills/empirica-autonomous-testing/artifacts/exports`.
- Confirm zip contents include experiment tables (`game.csv`, `player.csv`, `round.csv`, `stage.csv`, etc.) when data exists.
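Confirming the zip contents can be done without unpacking the archive. The sketch below uses only the Python standard library. It assumes the four table names listed above and matches entries by base name, since the internal folder layout of the export is not specified here. `missing_tables` is a hypothetical helper name.

```python
import zipfile
from pathlib import PurePosixPath

# Table names taken from the step above; extend as the experiment requires.
EXPECTED_TABLES = ("game.csv", "player.csv", "round.csv", "stage.csv")


def missing_tables(zip_path: str, expected=EXPECTED_TABLES) -> list[str]:
    """Return expected table names that do not appear in the export zip.

    Entries are matched by base name, so a table nested in a folder
    inside the archive still counts as present.
    """
    with zipfile.ZipFile(zip_path) as archive:
        present = {PurePosixPath(name).name for name in archive.namelist()}
    return [table for table in expected if table not in present]
```

An empty list means every expected table was found. A non-empty list is worth reporting as an export error only when the run actually produced data.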
Run schema checks against implementation and exported data:
- `node .github/skills/empirica-data-schema-validation/scripts/validate-schema.mjs --server server/src/callbacks.js --client client/src --expect .github/skills/empirica-data-schema-validation/expected-schema.example.json`
- `python3 .github/skills/empirica-data-schema-validation/scripts/compare-export-schema.py --export-zip <path-to-export-zip> --expect .github/skills/empirica-data-schema-validation/expected-schema.example.json`
Report outputs:
- screenshot file paths
- any page-load, runtime, or export errors observed
- admin panel actions taken (batch config, assignment mode, treatment setup)
- simulation status by treatment (participants launched, games started, games ended, stages completed)
- intro/exit verification:
  - consent shown and completed
  - intro step order observed
  - exit survey/debrief/submission code behavior observed
- participant completion counts (required vs completed exit)
- schema comparison result (expected vs code vs exported data)
- async propagation check result (live update vs requires refresh)
- Empirica references consulted
- explicit blockers if screenshots could not be produced
Notes
- For multiplayer checks, open multiple participant URLs with different `participantKey` values.
- Reuse scripts under `scripts/lib`: `admin.mjs` for stable Empirica admin actions (newBatchButton, assignment buttons, treatment select, create/start).
- Capture at least: landing/player page, active stage page, admin page.
- If Playwright is missing, install it with `npm install --no-save playwright`.
- If `node`/`npm` are unavailable, continue with manual browser simulation and Python-based export checks.
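For the multiplayer note above, participant URLs with distinct keys can be generated up front and then opened manually or by automation. This is a minimal sketch that assumes the key is passed as a `participantKey` query parameter on the base URL. The `participant_urls` helper and the `e2e` key prefix are hypothetical.

```python
from urllib.parse import urlencode


def participant_urls(base: str, count: int, prefix: str = "e2e") -> list[str]:
    """Build one URL per simulated participant, each with a distinct key."""
    root = base.rstrip("/")
    urls = []
    for i in range(1, count + 1):
        # Assumption: the key is read from the participantKey query parameter.
        query = urlencode({"participantKey": f"{prefix}-{i}"})
        urls.append(f"{root}/?{query}")
    return urls
```

For a treatment with a `playerCount` of 5, `participant_urls("http://localhost:3000", 5)` yields five URLs whose keys run from `e2e-1` to `e2e-5`.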