regression-suite - SKILL.md Agent Skill

name: regression-suite description: Design and manage the regression test suite for Label Suite releases.

Regression Suite

Plan and manage the regression test suite to ensure new changes don't break existing functionality.

Usage

/regression-suite plan          # Plan regression strategy for next release
/regression-suite run           # Generate regression run checklist
/regression-suite impact <file> # Analyze impact of changes to specific file

Regression Suite Structure

Tier 1: Smoke Tests (5 minutes, run on every commit)

Test	Tool	Purpose
FastAPI app starts	pytest	Basic health check
GET /health returns 200	pytest	API reachability
DB connection active	pytest	Database connectivity
Redis connection active	pytest	Celery broker
Frontend builds	Vite	Bundle integrity

Tier 2: Critical Path Tests (15 minutes, run on every PR)

Test	Tool	AC
Annotator login and JWT issued	pytest	Auth
POST /submissions accepted	pytest	AC-F01
Celery scoring task completes	pytest	AC-F02
GET /submissions/{id} returns score	pytest	AC-F03
GET /leaderboard returns rankings	pytest	Leaderboard
answer NOT in any response	pytest security	AC-S01 (P0)
Annotator cannot access /admin/*	pytest	AC-S05

Tier 3: Full Regression (45 minutes, run before release)

All Tier 1 + Tier 2 tests, plus:

Test	Tool
All P1 AC verification	pytest + Playwright
Rate limiting enforcement	pytest
Config-driven form rendering	Playwright
Leaderboard sort order correctness	pytest
Scoring for all supported metric types	pytest
Celery task failure + retry	pytest
Admin CRUD operations	pytest

Impact Analysis

When a file changes, determine the minimum regression tier needed:

Changed File	Minimum Tier	Reason
`services/scoring.py`	Tier 3	Core scoring logic
`tasks/scoring.py`	Tier 3	Scoring Celery task
`schemas/dataset.py`	Tier 3	Risk of leakage
`routers/submissions.py`	Tier 2	Submission flow
`routers/leaderboard.py`	Tier 2	Leaderboard
`components/AnnotationWorkspace.tsx`	Tier 2	Core annotator UI
`utils/*.py`	Tier 1	Utilities
`README.md`, `docs/`	Tier 1	Docs only

CI/CD Integration

# .github/workflows/regression.yml
name: Regression Tests

on:
  push:
    branches: [main]
  pull_request:

jobs:
  smoke:
    name: Smoke Tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run smoke tests
        run: pytest tests/ -m smoke --tb=short

  critical-path:
    name: Critical Path Tests
    runs-on: ubuntu-latest
    needs: smoke
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_PASSWORD: test
      redis:
        image: redis:7
    steps:
      - uses: actions/checkout@v4
      - name: Run critical path tests
        run: pytest tests/ -m "critical or security" --tb=short

  full-regression:
    name: Full Regression (pre-release)
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - name: Run full regression
        run: |
          pytest tests/ --tb=short
          pnpm playwright test

Regression Run Checklist

## Release Regression Checklist

**Release**: vX.Y.Z
**Date**: YYYY-MM-DD
**Runner**: senior-qa

### Pre-Run
- [ ] All CI checks green on release branch
- [ ] Staging environment deployed and healthy
- [ ] Test data seeded in staging

### Tier 1 (Smoke)
- [ ] App starts in Docker Compose
- [ ] GET /health → 200

### Tier 2 (Critical Path)
- [ ] POST /submissions flow works end-to-end
- [ ] Celery scoring completes within 30s
- [ ] `answer` NOT in API response (security)
- [ ] Annotator cannot access admin endpoints

### Tier 3 (Full Regression)
- [ ] All P1 ACs verified
- [ ] All metric types score correctly
- [ ] Playwright: annotator annotation flow
- [ ] Playwright: leaderboard display

### Sign-Off
- [ ] QA: [name] — PASS / FAIL
- [ ] Tech Lead: [name] — Approved for release

Flaky Test Management

Action	When
Mark as `@pytest.mark.flaky(reruns=3)`	Intermittent, understood cause
File issue and skip (`@pytest.mark.skip`)	Blocking CI, investigating
Delete test	Obsolete or duplicate
Fix root cause	Preferred action