testing-standards - SKILL.md Agent Skill

name: testing-standards description: Testing patterns, TDD approach, mocking, coverage standards, and test architecture for full-stack apps. Python (pytest) and TypeScript (vitest/jest) focused. MANDATORY: write tests BEFORE implementation code (red-green-refactor). Never declare done with 0% coverage on new features. triggers: [test, pytest, jest, vitest, coverage, tdd, mock, unittest, e2e, integration-test, unit-test, test-suite, test-file, assertion, describe, it, expect, assert, before-each, after-each, setup, teardown, fixture, spy, stub, snapshot, cypress, playwright, testing-library, react-testing-library] auto_load: code-builder

Testing Standards

⚠️ MANDATORY TDD ENFORCEMENT (CODE-001 Gene)

For EVERY new feature, you MUST follow red-green-refactor:

1. Write the test FIRST (it will fail — "red")
2. Write the minimum code to make it pass ("green")
3. Refactor for clarity ("refactor")
4. Repeat until feature is complete

Why this matters (research):

"TDD prompting alone increased regressions (9.94%)" when models skip the test-first step (arxiv:2603.17973v1, Mar 2026)
Without TDD, agents default to "do everything in one go" (Reddit/C Claude Code, Jul 2025)
Test-first catches edge cases BEFORE they become bugs

What "no excuse" means:

New feature = minimum 1 test covering the happy path
Bug fix = test that reproduces the bug before fixing
If you write implementation code before tests, you're doing it wrong

Verification: pytest tests/ / vitest must pass before you declare done.

Core Principles

Test behavior, not implementation — tests should break when behavior changes, not when code is refactored
One assertion concept per test — each test verifies one thing
Fast feedback — unit tests run in milliseconds, integration in seconds, e2e in minutes
Deterministic — same test, same result, every time (no flaky tests)

1. Test Pyramid

         ╱ e2e ╲          ← Few: critical user journeys (Cypress, Playwright)
        ╱ integration ╲    ← Some: API routes, DB queries, service boundaries
       ╱   unit tests   ╲   ← Many: functions, utilities, hooks, components

Ratios

Layer	Count	Speed	What to test
Unit	70%+	< 1ms each	Pure functions, utils, hooks, validators, models
Integration	20%	< 1s each	API routes, DB queries, service orchestration
E2E	< 10%	< 30s each	Critical user flows, auth, payments, core features

2. Python Testing (pytest)

Test Structure

import pytest
from datetime import datetime

# ✅ CORRECT: Clean arrange-act-assert
def test_calculate_order_total_with_multiple_items():
    # Arrange
    items = [{"price": 10.00, "qty": 2}, {"price": 5.00, "qty": 3}]

    # Act
    total = calculate_total(items)

    # Assert
    assert total == 35.00

Fixtures

@pytest.fixture
def sample_user(db_session):
    """Create a user for tests. Cleaned up automatically by transaction rollback."""
    user = User(email="test@example.com", password_hash="hashed_pw")
    db_session.add(user)
    db_session.commit()
    return user

@pytest.fixture
def client():
    """FastAPI test client with clean DB per test."""
    app.dependency_overrides[get_db] = lambda: test_db_session
    with TestClient(app) as c:
        yield c
    app.dependency_overrides.clear()

# ✅ Use: test gets a FRESH user each time
def test_get_user_profile(client, sample_user):
    response = client.get(f"/users/{sample_user.id}")
    assert response.status_code == 200
    assert response.json()["email"] == "test@example.com"

Parametrize

@pytest.mark.parametrize("email,expected", [
    ("user@example.com", True),
    ("not-an-email", False),
    ("", False),
    ("user+tag@example.com", True),
])
def test_email_validation(email, expected):
    assert is_valid_email(email) == expected

Mocking

# ✅ Use monkeypatch for simple cases, pytest-mock for complex
def test_send_welcome_email(monkeypatch):
    sent = []
    monkeypatch.setattr("app.email.send_email", lambda to, msg: sent.append((to, msg)))
    # ... test runs without actually sending email
    assert len(sent) == 1

# ✅ Async mocking with pytest-asyncio
@pytest.mark.asyncio
async def test_async_service(mocker):
    mock_result = {"id": "123"}
    mocker.patch("app.api.fetch_data", return_value=mock_result)
    result = await my_service.get_data()
    assert result["id"] == "123"

Async Tests

import pytest

@pytest.mark.asyncio
async def test_async_function():
    result = await async_operation()
    assert result is not None

3. TypeScript Testing (vitest)

Test Structure

import { describe, it, expect, vi, beforeEach } from 'vitest';

describe('calculateTotal', () => {
  beforeEach(() => {
    vi.clearAllMocks();
  });

  it('calculates total with discount', () => {
    const items = [{ price: 100, quantity: 2 }];
    const result = calculateTotal(items, { discount: 0.1 });
    expect(result).toBe(180); // 200 - 10%
  });

  it('throws on negative quantity', () => {
    expect(() => calculateTotal([{ price: 10, quantity: -1 }])).toThrow('invalid quantity');
  });
});

Mocking

// ✅ Module mock
vi.mock('../lib/email', () => ({
  sendEmail: vi.fn(),
}));

// ✅ Function mock
const mockFetch = vi.fn();
vi.stubGlobal('fetch', mockFetch);

// ✅ Partial mock
const mockDb = {
  findUser: vi.fn().mockResolvedValue({ id: '1', name: 'Test' }),
  saveUser: vi.fn().mockResolvedValue(true),
};

React Testing Library

import { render, screen, fireEvent, waitFor } from '@testing-library/react';
import userEvent from '@testing-library/user-event';

describe('LoginForm', () => {
  it('submits form with valid data', async () => {
    const onSubmit = vi.fn();
    render(<LoginForm onSubmit={onSubmit} />);

    await userEvent.type(screen.getByLabelText('Email'), 'user@test.com');
    await userEvent.type(screen.getByLabelText('Password'), 'password123');
    await userEvent.click(screen.getByRole('button', { name: 'Login' }));

    await waitFor(() => {
      expect(onSubmit).toHaveBeenCalledWith({
        email: 'user@test.com',
        password: 'password123',
      });
    });
  });

  it('shows validation errors for empty fields', async () => {
    render(<LoginForm onSubmit={vi.fn()} />);
    await userEvent.click(screen.getByRole('button', { name: 'Login' }));

    expect(screen.getByText('Email is required')).toBeInTheDocument();
    expect(screen.getByText('Password is required')).toBeInTheDocument();
  });
});

4. What to Test (and What NOT To)

Test These

Pure business logic (always first priority)
Input validation (edge cases: empty, too long, invalid format)
Error handling (what happens when DB is down, API returns 500)
Auth boundaries (unauthenticated gets 401, unauthorized gets 403)
State transitions (draft → published → archived)
Data transformations (calculations, formatting, parsing)

Don't Test These

Framework internals (Express routing, React reconciliation, SQLAlchemy sessions)
Third-party library behavior (axios, Prisma, date-fns — trust their tests)
CSS/styling (snapshot tests for visual things are brittle)
Implementation details (internal helper names, private methods)
Configuration that doesn't change (env var loading)

5. Integration Test Patterns

API Integration

def test_create_order_flow(client, auth_headers, sample_products):
    # 1. Create order
    response = client.post("/orders", json={
        "items": [{"product_id": sample_products[0].id, "qty": 2}]
    }, headers=auth_headers)
    assert response.status_code == 201
    order_id = response.json()["id"]

    # 2. Verify order exists
    response = client.get(f"/orders/{order_id}", headers=auth_headers)
    assert response.status_code == 200
    assert response.json()["status"] == "pending"

    # 3. Verify stock decreased
    response = client.get(f"/products/{sample_products[0].id}", headers=auth_headers)
    assert response.json()["stock"] == sample_products[0].stock - 2

Database Integration

def test_user_can_only_see_own_orders(client, auth_headers_1, auth_headers_2):
    """Users cannot access other users' orders."""
    # Alice creates an order
    alice_resp = client.post("/orders", json={"items": [...]}, headers=auth_headers_1)
    alice_order_id = alice_resp.json()["id"]

    # Bob tries to see it — should fail
    bob_resp = client.get(f"/orders/{alice_order_id}", headers=auth_headers_2)
    assert bob_resp.status_code == 404  # Not 403 — don't reveal order exists

6. E2E Test Patterns

import { test, expect } from '@playwright/test';

test('user can complete purchase flow', async ({ page }) => {
  // Navigate
  await page.goto('/products');

  // Add item to cart
  await page.click('[data-testid="add-to-cart-1"]');

  // Go to checkout
  await page.click('[data-testid="checkout"]');
  await page.fill('[name="email"]', 'user@test.com');
  await page.fill('[name="password"]', 'password123');
  await page.click('[data-testid="login"]');

  // Complete payment
  await page.fill('[name="card"]', '4242424242424242');
  await page.fill('[name="expiry"]', '12/28');
  await page.fill('[name="cvc"]', '123');
  await page.click('[data-testid="pay"]');

  // Verify success
  await expect(page.locator('[data-testid="order-confirmation"]')).toBeVisible();
});

7. Timer/Flake Anti-Pattern

// ❌ BAD: Fixed wait — flaky under load
await new Promise(r => setTimeout(r, 2000));

// ✅ GOOD: Wait for actual condition
await page.waitForSelector('[data-testid="result"]', { timeout: 10000 });

// ✅ GOOD: Polling (Python)
from tenacity import retry, stop_after_attempt, wait_fixed

@retry(stop=stop_after_attempt(3), wait=wait_fixed(1))
def wait_for_condition():
    result = check_condition()
    assert result is not None, "Condition not met yet"

8. Verification Checklist

Tests follow arrange-act-assert pattern
Pure business logic has full coverage (happy + edge + error paths)
TDD enforced: test written BEFORE implementation code (red-green-refactor)
Mocked external services (email, payments, third-party APIs)
No fixed timers / sleeps in tests
Tests are deterministic (same result every run)
Database tests use transactions/rollback (clean state per test)
Auth tests cover: unauthenticated (401), unauthorized (403), valid
E2E tests cover critical user journeys only
Coverage >= 70% on business logic (not aiming for 100% overall)
Tests run in CI before any deploy