python-package-builder

name: python-package-builder
description: Build a Python package shipped via PyPI/uv following the foundation's 8-layer cadence. Encodes pyproject.toml + ruff + mypy strict + pytest + uv lockfile + commit-per-layer + Phase docs (PLAN + CODE_COMPLETE) + verify-before-done + comprehension-gate. Use when scaffolding a new Python project from a spec OR adding a new Phase to an existing one.
tools-needed: file read/write, bash for uv invocations

Python Package Builder

Foundation: This skill inherits the cadence from skills/code-projects/code-project-foundation/SKILL.md. All exit-criteria split rules, plan + checkpoint contracts, and verify-before-done + comprehension-gate integrations defined there apply here.

Purpose

Bootstrap a Python package, or implement a new Phase against an existing one, using the foundation cadence. Produces the same shape every time: src/<package>/ layout, tests/ mirror, docs/PHASE_<N>_PLAN.md + docs/PHASE_<N>_CODE_COMPLETE.md, pyproject.toml (PEP 621 metadata plus ruff, mypy strict, and pytest config), a GitHub Actions CI matrix, and ADRs in docs/adr/.

When To Use

Scaffolding a new Python package from a spec (the spec lives in the planning repo, the source repo lives at ~/repos/github.com/<your-username>/<project>/).
Implementing the next Phase of an existing Python project (extending an already-scaffolded repo).
Adding a new layer to a Phase already in progress.

When NOT To Use

Non-Python projects - use the equivalent language-specific builder (ts-package-builder, go-module-builder, etc.), or a hand-written equivalent until one exists.
One-off scripts / Jupyter notebooks - the cadence overhead exceeds the gain.
Patches on already-shipped projects - use direct edits + verify-before-done.

The Build Loop

For each layer (A through H per the foundation):

Step 1: Read the plan

Read docs/PHASE_<N>_PLAN.md Layer X section: files to create/modify, inline -> verify: lines, tests required, risks.

Step 2: TDD-paired implementation

Per step: write tests first (the verifier names the test file + assertions), implement, run the test file in isolation (uv run pytest <test-file> -x), run mypy FULL-REPO (uv run mypy), run ruff on touched files (uv run ruff check <touched-file>); fix failures before the next step.

Full-repo mypy is non-negotiable. Scoped mypy is unreliable under strict mode because cross-file strictness needs the whole import graph, and the pre-commit hook runs full-repo anyway. In engram Phase 6, two layer commits bounced at the hook after their layers had passed scoped mypy.

Step 3: Layer-end full validation

Before committing the layer (per Foundation Pattern 2 - format BEFORE manual checks so the pre-commit hook is a no-op):

uv run ruff format
uv run ruff check --fix
uv run mypy
uv run pytest -q

All must be green. Coverage check optional per-layer; required at Phase exit.
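The four commands above can be chained so the sequence stops at the first failure. The following is a minimal sketch, not part of the foundation itself: the names LAYER_END_CHECKS and run_layer_end_checks are hypothetical, and the injectable runner exists only so the ordering can be tested without invoking uv.

```python
from __future__ import annotations

import subprocess
import sys
from collections.abc import Callable, Sequence

# Format first, then lint, type-check, and test, in the order listed above.
LAYER_END_CHECKS: tuple[tuple[str, ...], ...] = (
    ("uv", "run", "ruff", "format"),
    ("uv", "run", "ruff", "check", "--fix"),
    ("uv", "run", "mypy"),
    ("uv", "run", "pytest", "-q"),
)


def _run(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def run_layer_end_checks(runner: Callable[[Sequence[str]], int] = _run) -> int:
    """Return 0 if every check passes, otherwise the first non-zero exit code."""
    for cmd in LAYER_END_CHECKS:
        code = runner(cmd)
        if code != 0:
            # Stop immediately: later checks are meaningless until this one is green.
            print(f"FAILED: {' '.join(cmd)} (exit {code})", file=sys.stderr)
            return code
    return 0
```

Calling sys.exit(run_layer_end_checks()) from a small script in the repo root would give a single pass/fail signal before the layer commit.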

Step 4: Commit the layer

Commit message structure:

feat(<scope>): Layer X - <one-sentence summary>

Phase N Layer X (Steps M1-M2):

* <bullet describing source change 1>
* <bullet describing source change 2>
* <bullet describing test change>

Full suite: <count> passed (was <baseline>); ruff + mypy clean.

Sign + DCO (git commit -S -s) per global CLAUDE.md Git Commits. If the layer commit is a substantive checkpoint, session-wrap --checkpoint per global Session Management.
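If the message is generated rather than typed, the structure above can be rendered from a few fields. This is an illustrative sketch only; the LayerCommit dataclass and its field names are assumptions and do not come from the foundation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LayerCommit:
    """Hypothetical container for the fields of a layer commit message."""

    scope: str
    phase: int
    layer: str
    summary: str
    first_step: int
    last_step: int
    changes: tuple[str, ...]
    passed: int
    baseline: int

    def render(self) -> str:
        """Render the subject line, the step range, the bullets, and the suite summary."""
        bullets = "\n".join(f"* {change}" for change in self.changes)
        return (
            f"feat({self.scope}): Layer {self.layer} - {self.summary}\n\n"
            f"Phase {self.phase} Layer {self.layer} "
            f"(Steps {self.first_step}-{self.last_step}):\n\n"
            f"{bullets}\n\n"
            f"Full suite: {self.passed} passed (was {self.baseline}); ruff + mypy clean.\n"
        )
```

The rendered text could be written to a file and passed to git commit -S -s -F <file>.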

Default Project Skeleton

~/repos/github.com/<your-username>/<project>/
├── pyproject.toml                          # PEP 621 + ruff + mypy + pytest config
├── uv.lock                                 # locked deps via uv
├── .pre-commit-config.yaml                 # ruff + mypy on changed files
├── .github/workflows/ci.yml                # 3.11+3.12 x macOS+Ubuntu
├── LICENSE                                 # Apache-2.0 (or MIT, per project)
├── README.md                               # quickstart <5 min + local install + CI status
├── CHANGELOG.md                            # Keep a Changelog
├── CONTRIBUTING.md                         # how to contribute
├── .gitignore                              # Python + IDE + .venv
├── docs/
│   ├── adr/                                # ADRs numbered 001+
│   ├── PHASE_1_PLAN.md                     # written by deep-plan
│   ├── PHASE_1_CODE_COMPLETE.md            # written by verify-before-done at Phase end
│   ├── PHASE_<N>_PLAN.md                   # for each subsequent phase
│   └── <SETUP_GUIDE>.md                    # operator-facing guide(s)
├── src/<package>/
│   ├── __init__.py
│   ├── errors.py                           # custom exception hierarchy
│   ├── models/                             # Pydantic boundary models
│   ├── utils/                              # stateless helpers
│   ├── <subsystems>/                       # one dir per major subsystem
│   ├── cli/
│   │   ├── __init__.py                     # typer app + register pattern
│   │   └── <command>.py                    # each subcommand its own module
│   └── diagnostics/                        # doctor + check_codes
├── tests/
│   ├── conftest.py
│   ├── fixtures/                           # synthetic corpus generators
│   ├── unit/                               # mirror of src/<package>/
│   ├── integration/                        # cross-subsystem tests
│   └── properties/                         # hypothesis tests
└── bench/
    └── <metric>.py                         # NFR benchmarks (if applicable)

pyproject.toml Defaults

[project]
name = "<package-name>"
version = "0.1.0"
requires-python = ">=3.11"
description = "..."
license = { text = "Apache-2.0" }
authors = [{ name = "<Your Name>", email = "<you@example.com>" }]

[project.scripts]
<cli-name> = "<package>.cli:app"

[tool.ruff]
line-length = 100
target-version = "py311"

[tool.ruff.lint]
select = ["ALL"]
ignore = [
    "D203", "D213",  # docstring conventions
    "COM812", "ISC001",  # ruff format conflict
    "FBT001", "FBT002",  # boolean positional args sometimes appropriate
    "TRY003",  # long messages in exceptions
]

[tool.ruff.lint.per-file-ignores]
"tests/**" = ["S101", "PLR2004", "ANN"]

[tool.mypy]
strict = true
python_version = "3.11"

[tool.pytest.ini_options]
testpaths = ["tests"]
asyncio_mode = "auto"
addopts = "--strict-markers --strict-config"

CI Matrix Defaults

.github/workflows/ci.yml exercises Python 3.11 + 3.12 across macOS + Ubuntu (4 cells):

strategy:
  matrix:
    python-version: ["3.11", "3.12"]
    os: [ubuntu-latest, macos-latest]

The 8 Layers, Realized in Python

For a typical Python package, the layers tend to look like this. Adapt per project.

Layer A - Models + errors + constants

Pydantic models in src/<package>/models/ with model_config = ConfigDict(extra="forbid").
Exception hierarchy in src/<package>/errors.py rooted in <Package>Error with error_code: str class attribute.
Constant codes / enums in dedicated modules (e.g., diagnostics/check_codes.py) so tests can iterate.

Layer B - Stateless utilities

Pure functions: file naming, fingerprinting, atomic write, run_command wrapper, type-safe subprocess invocation.
Each utility in its own module; tests mirror the structure exactly.
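As one example of a Layer B utility, an atomic write can be sketched with a temporary file in the target directory followed by os.replace. The function name atomic_write_text and its signature are assumptions for illustration, not the foundation's API.

```python
from __future__ import annotations

import contextlib
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, data: str, encoding: str = "utf-8") -> None:
    """Write data so readers see either the old file or the new one, never a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live in the same directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)
    except BaseException:
        # On any failure, remove the temp file so no debris is left next to the target.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

Because the function takes its inputs as arguments and touches only the given path, its unit test can run entirely inside a temporary directory.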

Layer C - Core engine / state machine

The package's central abstraction: storage layer, sync coordinator, request router, etc.
Use enum.StrEnum for explicit states; encode allowed transitions in a dict at module level.
All async functions wrap blocking calls via asyncio.to_thread; test with pytest-asyncio.
Foundation Pattern 1 applies: any sync method that internally calls into asyncio must have a "no running loop" test.

Layer D - Cross-cutting safety / probes

Locks, validation gates, identity checks. Each runs once at startup AND on a per-cycle basis where appropriate.
Integrate with the diagnostics layer so the same logic surfaces both at startup-FAIL and under <package> doctor.

Layer E - Diagnostics / doctor

A run_diagnostics(config) function returning a structured report.
One _check_* function per check code from Layer A constants module.
Doctor handles non-applicable preconditions gracefully (Foundation Pattern 3).

Layer F - CLI + entry-point wiring

typer app with register(app) pattern: each command in its own module attaches via register().
The root cli/__init__.py imports each command module and calls register(app).
Startup ordering for long-running commands (serve / daemon) follows: load config → run probes → acquire lock → open resources → build server → run loop → drain on shutdown.
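The startup ordering can be sketched with contextlib.ExitStack, which releases resources in reverse order of acquisition once the loop exits. Everything here is a stand-in: the log list replaces real work, and the function names are hypothetical. The typer register(app) wiring is not shown.

```python
from __future__ import annotations

from collections.abc import Iterator
from contextlib import ExitStack, contextmanager


@contextmanager
def acquire_lock(log: list[str]) -> Iterator[None]:
    """Stand-in for a process lock; released when the stack unwinds."""
    log.append("acquire lock")
    try:
        yield
    finally:
        log.append("release lock")


@contextmanager
def open_resources(log: list[str]) -> Iterator[None]:
    """Stand-in for opening storage, sockets, and similar resources."""
    log.append("open resources")
    try:
        yield
    finally:
        log.append("close resources")


def serve(log: list[str]) -> None:
    """Follow the documented order; each append marks where the real step would run."""
    log.append("load config")
    log.append("run probes")
    with ExitStack() as stack:
        stack.enter_context(acquire_lock(log))
        stack.enter_context(open_resources(log))
        log.append("build server")
        try:
            log.append("run loop")
        finally:
            # Drain runs even if the loop raises, before resources are closed.
            log.append("drain")
```

Recording the sequence into a list makes the ordering itself assertable, which is one way to keep a later refactor from moving the lock acquisition ahead of the probes.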

MUST: wire integration callsites BEFORE Layer G tests run. Whenever Layer C/D/E introduces a new module that participates in a running flow (capture gate, routing dispatcher, push queue, doctor probe, MCP tool handler, CLI subcommand handler), wire it into the actual serve / daemon / CLI startup path inside Layer F itself. Do not defer the wiring to "after Layer G", do not wire it inside test fixtures only, do not register the subcommand module without also re-exporting it from cli/__init__.py's registration list.

Confirmed at 3 occurrences in engram (Phase 3 wiring fix ef47801; Phase 4 initial deferral; Phase 4 public-release follow-up 633a05d → f9df9eb): every time the wiring was deferred past Layer F, Layer G integration tests passed against handler-level logic but the running binary shipped with the new behavior disconnected. The hermetic CLI smoke (Phase Exit Discipline step 5) catches this if you run it — but the right place to prevent it is making Layer F responsible for the integration callsite, not just the handler module.

Test rule: if a Layer G integration test passes but the equivalent hermetic CLI smoke (spawn the installed binary against a tmp_path) would fail because the new behavior is unreachable from the user-facing entry point, your Layer F is incomplete.

Layer G - Integration tests

tests/integration/ covers cross-subsystem flows.
tests/properties/ covers hypothesis property tests for invariants.
Hermetic: each test owns its own tmp_path; no network unless explicitly mocked.

Layer H - Docs

ADR for any architectural decision touched by the Phase.
docs/PHASE_<N>_CODE_COMPLETE.md lists exit criteria with evidence per code-side criterion.
Setup guide (e.g., docs/<SETUP>.md) for any operator-facing flow.
README "Status" section + Roadmap table updated.
CHANGELOG [Unreleased] grouped under Added / Changed / Security.

Phase Exit Discipline

At the end of each Phase, BEFORE marking complete:

1. Run verify-before-done - produces the explicit verification checklist.
2. Run comprehension-gate Step 5 if Phase added >200 LOC of source code.
3. Run NFR benchmarks (if the project has them).
4. Run full coverage gate: uv run pytest --cov=src --cov-fail-under=80.
5. Run hermetic CLI smoke against the installed binary (the load-bearing addition; see Foundation Pattern 5 "Test the binary, not just the suite"). For each subcommand introduced or modified by the Phase: spawn the actual binary via subprocess against a tmp_path workspace + assert observable state (filesystem layout, doctor output rows, JSON-RPC tool responses, stderr classification). The test suite exercises handlers; the CLI smoke exercises wiring between handlers and the user-facing binary - they catch disjoint bug classes. Without this step, a Phase can ship with passing tests + clean lint + green coverage AND broken CLI commands. Reference: every prior engram Phase has surfaced 3 wiring bugs at this gate that the unit tests missed (Phase 2 init/clone-vault/docs; Phase 3 doctor/serve/import). Skipping this step means voluntarily accepting that recurring failure mode.
5.5. Assert python -m <pkg> parity with the console script. Both must produce the same --version output and resolve the same subcommand surface. The spawn-dance pattern (Phase 5 daemon mode) does python -m engram daemon start ... from inside the proxy; without a top-level src/<pkg>/__main__.py that re-exports the typer/click app, that command fails silently with No module named <pkg>.__main__. engram Phase 5 shipped this regression and caught it only post-merge, when the user reported "Failed to connect" on their MCP client. Required check in the CI smoke matrix: install the wheel in a clean venv, then assert engram --version and python -m engram --version produce byte-identical output. The fix is mechanical (one __main__.py file) but the absence is invisible without this check.
6. Author docs/PHASE_<N>_CODE_COMPLETE.md with the code-side / operational split.
7. Update workspace files in the planning repo:
- <planning-repo>/workspace/<project>/MANIFEST.md - add the new Phase row + update the feature table (a feature's state becomes passing only after running its Verify command this session).
- <planning-repo>/workspace/<project>/PENDING_TASKS.md - operator action items.
- <planning-repo>/workspace/<project>/PHASE_<N>_RETROSPECTIVE.md - via code-project-retrospective skill.
8. Emit the Phase's goal-ready gate string. Compose the deterministic gates from steps 4-5.5 into one paste-ready /goal condition so the harness's independent evaluator owns the stop, e.g.: /goal "uv run pytest --cov=src --cov-fail-under=80 exits 0; hermetic CLI smoke for <changed subcommands> passes against the installed binary in a clean venv; <cli> --version and python -m <pkg> --version are byte-identical". Only mechanically checkable clauses. The checklist (step 1) still runs - the goal string is the stop condition, not the verification.
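The hermetic smoke and the python -m parity check both reduce to spawning the real command in a throwaway workspace and comparing raw output. A minimal sketch follows; run_cli and version_parity are hypothetical helper names, and the argv lists are supplied by the caller because the actual console-script name depends on the project.

```python
from __future__ import annotations

import subprocess
from collections.abc import Sequence
from pathlib import Path


def run_cli(
    argv: Sequence[str], workspace: Path, timeout: float = 60.0
) -> subprocess.CompletedProcess[bytes]:
    """Spawn the command with the workspace as cwd and capture stdout/stderr as bytes."""
    return subprocess.run(
        list(argv), cwd=workspace, capture_output=True, check=False, timeout=timeout
    )


def version_parity(
    console_script: Sequence[str], module_form: Sequence[str], workspace: Path
) -> bool:
    """True only if both entry points exit 0 and print byte-identical --version output."""
    a = run_cli([*console_script, "--version"], workspace)
    b = run_cli([*module_form, "--version"], workspace)
    return a.returncode == 0 and b.returncode == 0 and a.stdout == b.stdout
```

In a pytest smoke test this might be called as version_parity(["<cli>"], [sys.executable, "-m", "<pkg>"], tmp_path) after installing the wheel into a clean venv. A package with no __main__.py makes the second command exit non-zero, so the check returns False rather than passing by accident.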

Common Anti-Patterns This Skill Avoids

Big-bang Phase commits. A single 5000-line "Phase N done" commit is unreviewable. The 8-layer cadence with one commit per layer keeps each commit reviewable in <30 minutes.
WIP>1. Two features active at once - or an "also refactor while I'm in here" rider - doubles the integration surface and muddies the layer commits. One active feature in the MANIFEST feature table at a time; finish or park it before activating the next.
Tests-after. Layers that ship implementation first then "we'll add tests later" never get the tests. The verifier-per-step pattern blocks this.
Implicit operational criteria. Every operational criterion must be explicit in the exit-criteria list; "we'll know it works when we use it" is not a criterion.
Single-phase plans for multi-phase scope. If the spec covers multiple Phases, write one PHASE_N_PLAN.md per Phase; do not collapse them.
Spec-as-implementation. The spec lives in the planning repo; the implementation lives in the source repo; PLAN docs bridge them. The spec is not the implementation.
Planning vocabulary in committed code, tests, or user-facing docs. "Phase N", "Layer X", "Amendment Y", "critique B3", "per the plan" belong in the planning repo (workspace/<project>/) and PR descriptions - NOT in src/, tests/, docs/, README.md, or CLAUDE.md of the source repo (engram once shipped 37 files leaking this vocabulary; cleanup commit a6b0d68). Before each layer commit, grep the staged diff for Phase \d, Layer [A-H], Amendment, critique, closes [A-H]\d and rewrite to describe the BEHAVIOR or RATIONALE instead. If a comment genuinely needs a historical decision, link the ADR by number, not the plan doc.
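The grep over the staged diff can be sketched as a small function that scans added lines for the patterns listed above. The function name find_planning_vocab is hypothetical, and the input is assumed to be unified-diff text such as the output of git diff --cached.

```python
from __future__ import annotations

import re

# The patterns named in the rule above, combined into one alternation.
PLANNING_VOCAB = re.compile(r"Phase \d|Layer [A-H]|Amendment|critique|closes [A-H]\d")


def find_planning_vocab(diff_text: str) -> list[tuple[int, str]]:
    """Return (line number within the diff, line) for added lines that leak planning terms."""
    hits: list[tuple[int, str]] = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only added lines matter; "+++ b/<file>" is a file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        if PLANNING_VOCAB.search(line):
            hits.append((number, line))
    return hits
```

A non-empty result before a layer commit is the cue to rewrite the flagged lines in terms of behavior or rationale. The patterns are deliberately broad, so an occasional false positive (for example, a legitimate use of the word "critique") is expected and can be reviewed by hand.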