name: python-package-builder
description: Build a Python package shipped via PyPI/uv following the foundation's 8-layer cadence. Encodes pyproject.toml + ruff + mypy strict + pytest + uv lockfile + commit-per-layer + Phase docs (PLAN + CODE_COMPLETE) + verify-before-done + comprehension-gate. Use when scaffolding a new Python project from a spec OR adding a new Phase to an existing one.
tools-needed: file read/write, bash for uv invocations
Python Package Builder
Foundation: This skill inherits the cadence from skills/code-projects/code-project-foundation/SKILL.md. All exit-criteria split rules, plan + checkpoint contracts, and verify-before-done + comprehension-gate integrations defined there apply here.
Purpose
Bootstrap a Python package or implement a new Phase against an existing one using the foundation cadence. Produces the same shape every time: src/<package>/ layout, tests/ mirror, docs/PHASE_<N>_PLAN.md + docs/PHASE_<N>_CODE_COMPLETE.md, pyproject.toml PEP 621 + ruff + mypy strict + pytest config, GitHub Actions CI matrix, ADRs in docs/adr/.
When To Use
- Scaffolding a new Python package from a spec (the spec lives in the planning repo, the source repo lives at ~/repos/github.com/<your-username>/<project>/).
- Implementing the next Phase of an existing Python project (extending an already-scaffolded repo).
- Adding a new layer to a Phase already in progress.
When NOT To Use
- Non-Python projects - use the equivalent language-specific builder (ts-package-builder, go-module-builder, etc.) or hand-written equivalent until one exists.
- One-off scripts / Jupyter notebooks - the cadence overhead exceeds the gain.
- Patches on already-shipped projects - use direct edits + verify-before-done.
The Build Loop
For each layer (A through H per the foundation):
Step 1: Read the plan
Read docs/PHASE_<N>_PLAN.md Layer X section: files to create/modify, inline -> verify: lines, tests required, risks.
Step 2: TDD-paired implementation
Per step: write tests first (the verifier names the test file + assertions), implement, run the test file in isolation (uv run pytest <test-file> -x), run mypy FULL-REPO (uv run mypy), run ruff on touched files (uv run ruff check <touched-file>); fix failures before the next step.
Full-repo mypy is non-negotiable: scoped mypy is unreliable under strict mode (cross-file strictness needs the whole graph) and the pre-commit hook runs full-repo anyway (engram Phase 6: two commit bounces from scoped-green layers).
Step 3: Layer-end full validation
Before committing the layer (per Foundation Pattern 2 - format BEFORE manual checks so the pre-commit hook is a no-op):
uv run ruff format
uv run ruff check --fix
uv run mypy
uv run pytest -q
All must be green. Coverage check optional per-layer; required at Phase exit.
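The four commands above can be chained by a small script that stops at the first failure. This is a minimal sketch rather than part of the skill: the name `run_layer_gate` and the injectable `runner` parameter are assumptions made for illustration (the injection exists only so the ordering can be tested without invoking uv).

```python
import subprocess
from collections.abc import Callable, Sequence

# The layer-end commands, in the order listed above.
LAYER_GATE: tuple[tuple[str, ...], ...] = (
    ("uv", "run", "ruff", "format"),
    ("uv", "run", "ruff", "check", "--fix"),
    ("uv", "run", "mypy"),
    ("uv", "run", "pytest", "-q"),
)


def _default_runner(cmd: Sequence[str]) -> int:
    """Run one command in the current directory and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def run_layer_gate(
    runner: Callable[[Sequence[str]], int] = _default_runner,
) -> list[str]:
    """Run each gate command in order and return the ones that passed.

    Raises on the first non-zero exit code, so a red step is never
    masked by a later green one.
    """
    passed: list[str] = []
    for cmd in LAYER_GATE:
        code = runner(cmd)
        if code != 0:
            raise RuntimeError(f"{' '.join(cmd)} exited with {code}")
        passed.append(" ".join(cmd))
    return passed
```

Calling `run_layer_gate()` from the repo root runs the real tools; a `RuntimeError` names the first command that went red.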
Step 4: Commit the layer
Commit message structure:
feat(<scope>): Layer X - <one-sentence summary>
Phase N Layer X (Steps M1-M2):
* <bullet describing source change 1>
* <bullet describing source change 2>
* <bullet describing test change>
Full suite: <count> passed (was <baseline>); ruff + mypy clean.
Sign + DCO (git commit -S -s) per global CLAUDE.md Git Commits. If the layer commit is a substantive checkpoint, session-wrap --checkpoint per global Session Management.
Default Project Skeleton
~/repos/github.com/<your-username>/<project>/
├── pyproject.toml # PEP 621 + ruff + mypy + pytest config
├── uv.lock # locked deps via uv
├── .pre-commit-config.yaml # ruff + mypy on changed files
├── .github/workflows/ci.yml # 3.11+3.12 x macOS+Ubuntu
├── LICENSE # Apache-2.0 (or MIT, per project)
├── README.md # quickstart <5 min + local install + CI status
├── CHANGELOG.md # Keep a Changelog
├── CONTRIBUTING.md # how to contribute
├── .gitignore # Python + IDE + .venv
├── docs/
│ ├── adr/ # ADRs numbered 001+
│ ├── PHASE_1_PLAN.md # written by deep-plan
│ ├── PHASE_1_CODE_COMPLETE.md # written by verify-before-done at Phase end
│ ├── PHASE_<N>_PLAN.md # for each subsequent phase
│ └── <SETUP_GUIDE>.md # operator-facing guide(s)
├── src/<package>/
│ ├── __init__.py
│ ├── errors.py # custom exception hierarchy
│ ├── models/ # Pydantic boundary models
│ ├── utils/ # stateless helpers
│ ├── <subsystems>/ # one dir per major subsystem
│ ├── cli/
│ │ ├── __init__.py # typer app + register pattern
│ │ └── <command>.py # each subcommand its own module
│ └── diagnostics/ # doctor + check_codes
├── tests/
│ ├── conftest.py
│ ├── fixtures/ # synthetic corpus generators
│ ├── unit/ # mirror of src/<package>/
│ ├── integration/ # cross-subsystem tests
│ └── properties/ # hypothesis tests
└── bench/
    └── <metric>.py # NFR benchmarks (if applicable)
pyproject.toml Defaults
[project]
name = "<package-name>"
version = "0.1.0"
requires-python = ">=3.11"
description = "..."
license = { text = "Apache-2.0" }
authors = [{ name = "<Your Name>", email = "<you@example.com>" }]
[project.scripts]
<cli-name> = "<package>.cli:app"
[tool.ruff]
line-length = 100
target-version = "py311"
[tool.ruff.lint]
select = ["ALL"]
ignore = [
"D203", "D213", # docstring conventions
"COM812", "ISC001", # ruff format conflict
"FBT001", "FBT002", # boolean positional args sometimes appropriate
"TRY003", # long messages in exceptions
]
[tool.ruff.lint.per-file-ignores]
"tests/**" = ["S101", "PLR2004", "ANN"]
[tool.mypy]
strict = true
python_version = "3.11"
[tool.pytest.ini_options]
testpaths = ["tests"]
asyncio_mode = "auto"
addopts = "--strict-markers --strict-config"
CI Matrix Defaults
.github/workflows/ci.yml exercises Python 3.11 + 3.12 across macOS + Ubuntu (4 cells):
strategy:
  matrix:
    python-version: ["3.11", "3.12"]
    os: [ubuntu-latest, macos-latest]
The 8 Layers, Realized in Python
For a typical Python package, the layers tend to look like this. Adapt per project.
Layer A - Models + errors + constants
- Pydantic models in src/<package>/models/ with model_config = ConfigDict(extra="forbid").
- Exception hierarchy in src/<package>/errors.py rooted in <Package>Error with error_code: str class attribute.
- Constant codes / enums in dedicated modules (e.g., diagnostics/check_codes.py) so tests can iterate.
Layer B - Stateless utilities
- Pure functions: file naming, fingerprinting, atomic write, run_command wrapper, type-safe subprocess invocation.
- Each utility in its own module; tests mirror the structure exactly.
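As one example of a stateless utility from the list above, an atomic write can be sketched with a temp file plus `os.replace`. The function name `atomic_write_text` and its signature are assumptions for illustration, not the skill's prescribed helper.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, data: str, encoding: str = "utf-8") -> None:
    """Write data so readers see either the old content or the new content.

    The temp file is created next to the destination because os.replace
    is only atomic when source and target are on the same filesystem.
    """
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=f".{path.name}.", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding=encoding) as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file on any failure, then re-raise unchanged.
        Path(tmp_name).unlink(missing_ok=True)
        raise
```

A mirrored test under tests/unit/ would write twice to the same target and assert that only the final content and no stray temp file remain.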
Layer C - Core engine / state machine
- The package's central abstraction: storage layer, sync coordinator, request router, etc.
- Use enum.StrEnum for explicit states; encode allowed transitions in a dict at module level.
- All async functions wrap blocking calls via asyncio.to_thread; test with pytest-asyncio.
- Foundation Pattern 1 applies: any sync method that internally calls into asyncio must have a "no running loop" test.
Layer D - Cross-cutting safety / probes
- Locks, validation gates, identity checks. Each runs once at startup AND on a per-cycle basis where appropriate.
- Integrate with the diagnostics layer so the same logic surfaces both at startup-FAIL and under <package> doctor.
Layer E - Diagnostics / doctor
- A run_diagnostics(config) function returning a structured report.
- One _check_* function per check code from Layer A constants module.
- Doctor handles non-applicable preconditions gracefully (Foundation Pattern 3).
Layer F - CLI + entry-point wiring
- typer app with register(app) pattern: each command in its own module attaches via register().
- The root cli/__init__.py imports each command module and calls register(app).
- Startup ordering for long-running commands (serve / daemon) follows: load config → run probes → acquire lock → open resources → build server → run loop → drain on shutdown.
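The startup ordering in the last bullet can be sketched with `contextlib.ExitStack`, which tears stages down in reverse order even when the run loop raises. This is only an illustration of the ordering: `serve`, `_stage`, and the log strings are hypothetical, and real stages would acquire a lock, open resources, and build a server instead of appending to a list.

```python
from collections.abc import Callable, Iterator
from contextlib import ExitStack, contextmanager


@contextmanager
def _stage(name: str, log: list[str]) -> Iterator[None]:
    """Placeholder stage that records when it is entered and exited."""
    log.append(f"enter:{name}")
    try:
        yield
    finally:
        log.append(f"exit:{name}")


def serve(run_loop: Callable[[], None], log: list[str]) -> None:
    """Walk the startup sequence, then unwind it in reverse on shutdown."""
    log.append("load-config")
    log.append("run-probes")  # probes run before any lock is taken
    with ExitStack() as stack:
        stack.enter_context(_stage("lock", log))
        stack.enter_context(_stage("resources", log))
        stack.enter_context(_stage("server", log))
        run_loop()
    # Leaving the block drains the server first and releases the lock last.
```

Registering each stage on the stack in startup order gives the reverse shutdown order for free, so the lock is released only after the server has drained and resources are closed.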
MUST: wire integration callsites BEFORE Layer G tests run. Whenever Layer C/D/E introduces a new module that participates in a running flow (capture gate, routing dispatcher, push queue, doctor probe, MCP tool handler, CLI subcommand handler), wire it into the actual serve / daemon / CLI startup path inside Layer F itself. Do not defer the wiring to "after Layer G", do not wire it inside test fixtures only, do not register the subcommand module without also re-exporting it from cli/__init__.py's registration list.
Confirmed at 3 occurrences in engram (Phase 3 wiring fix ef47801; Phase 4 initial deferral; Phase 4 public-release follow-up 633a05d → f9df9eb): every time the wiring was deferred past Layer F, Layer G integration tests passed against handler-level logic but the running binary shipped with the new behavior disconnected. The hermetic CLI smoke (Phase Exit Discipline step 5) catches this if you run it — but the right place to prevent it is making Layer F responsible for the integration callsite, not just the handler module.
Test rule: if a Layer G integration test passes but the equivalent hermetic CLI smoke (spawn the installed binary against a tmp_path) would fail because the new behavior is unreachable from the user-facing entry point, your Layer F is incomplete.
Layer G - Integration tests
- tests/integration/ covers cross-subsystem flows.
- tests/properties/ covers hypothesis property tests for invariants.
- Hermetic: each test owns its own tmp_path; no network unless explicitly mocked.
Layer H - Docs
- ADR for any architectural decision touched by the Phase.
- docs/PHASE_<N>_CODE_COMPLETE.md lists exit criteria with evidence per code-side criterion.
- Setup guide (e.g., docs/<SETUP>.md) for any operator-facing flow.
- README "Status" section + Roadmap table updated.
- CHANGELOG [Unreleased] grouped under Added / Changed / Security.
Phase Exit Discipline
At the end of each Phase, BEFORE marking complete:
1. Run verify-before-done - produces the explicit verification checklist.
2. Run comprehension-gate Step 5 if Phase added >200 LOC of source code.
3. Run NFR benchmarks (if the project has them).
4. Run full coverage gate: uv run pytest --cov=src --cov-fail-under=80.
5. Run hermetic CLI smoke against the installed binary (the load-bearing addition; see Foundation Pattern 5 "Test the binary, not just the suite"). For each subcommand introduced or modified by the Phase: spawn the actual binary via subprocess against a tmp_path workspace + assert observable state (filesystem layout, doctor output rows, JSON-RPC tool responses, stderr classification). The test suite exercises handlers; the CLI smoke exercises wiring between handlers and the user-facing binary - they catch disjoint bug classes. Without this step, a Phase can ship with passing tests + clean lint + green coverage AND broken CLI commands. Reference: every prior engram Phase has surfaced 3 wiring bugs at this gate that the unit tests missed (Phase 2 init/clone-vault/docs; Phase 3 doctor/serve/import). Skipping is voluntarily accepting that recurring failure mode.
5.5. Assert python -m <pkg> parity with the console script. Both must produce the same --version output and resolve the same subcommand surface. The spawn-dance pattern (Phase 5 daemon mode) does python -m engram daemon start ... from inside the proxy; without a top-level src/<pkg>/__main__.py that re-exports the typer/click app, that command silently fails with No module named <pkg>.__main__. engram Phase 5 shipped this regression and caught it only post-merge, when the user reported "Failed to connect" on their MCP client. Required check in the CI smoke matrix: install the wheel in a clean venv, then assert engram --version and python -m engram --version produce byte-identical output. The fix is mechanical (one __main__.py file) but the absence is invisible without this check.
6. Author docs/PHASE_<N>_CODE_COMPLETE.md with the code-side / operational split.
7. Update workspace files in the planning repo:
   - <planning-repo>/workspace/<project>/MANIFEST.md - add the new Phase row + update the feature table (a feature's state becomes passing only after running its Verify command this session).
   - <planning-repo>/workspace/<project>/PENDING_TASKS.md - operator action items.
   - <planning-repo>/workspace/<project>/PHASE_<N>_RETROSPECTIVE.md - via code-project-retrospective skill.
8. Emit the Phase's goal-ready gate string. Compose the deterministic gates from steps 4-5.5 into one paste-ready /goal condition so the harness's independent evaluator owns the stop, e.g.: /goal "uv run pytest --cov=src --cov-fail-under=80 exits 0; hermetic CLI smoke for <changed subcommands> passes against the installed binary in a clean venv; <cli> --version and python -m <pkg> --version are byte-identical". Only mechanically checkable clauses. The checklist (step 1) still runs - the goal string is the stop condition, not the verification.
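The hermetic CLI smoke and the `python -m` parity check described above could be built on two small helpers like these. The helper names `run_cli` and `assert_version_parity` are assumptions for illustration; only `subprocess.run` and its keyword arguments are real API.

```python
import subprocess
import sys
from collections.abc import Sequence
from pathlib import Path


def run_cli(
    argv: Sequence[str], cwd: Path, timeout: float = 30.0
) -> subprocess.CompletedProcess[bytes]:
    """Spawn a real process inside an isolated working directory."""
    return subprocess.run(
        list(argv), cwd=cwd, capture_output=True, timeout=timeout, check=False
    )


def assert_version_parity(
    console_script: Sequence[str], module_form: Sequence[str], cwd: Path
) -> bytes:
    """Assert both entry points print byte-identical --version output."""
    first = run_cli([*console_script, "--version"], cwd)
    second = run_cli([*module_form, "--version"], cwd)
    assert first.returncode == 0, first.stderr.decode(errors="replace")
    assert second.returncode == 0, second.stderr.decode(errors="replace")
    assert first.stdout == second.stdout, (first.stdout, second.stdout)
    return first.stdout
```

Inside a pytest test this would be called as `assert_version_parity(["engram"], [sys.executable, "-m", "engram"], tmp_path)` after installing the wheel into the environment under test. A missing `__main__.py` then shows up as a non-zero return code from the module form, with the interpreter's error message in the assertion.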
Common Anti-Patterns This Skill Avoids
- Big-bang Phase commits. A single 5000-line "Phase N done" commit is unreviewable. The 8-layer cadence with one commit per layer keeps each commit reviewable in <30 minutes.
- WIP>1. Two features active at once - or an "also refactor while I'm in here" rider - doubles the integration surface and muddies the layer commits. One active feature in the MANIFEST feature table at a time; finish or park it before activating the next.
- Tests-after. Layers that ship implementation first then "we'll add tests later" never get the tests. The verifier-per-step pattern blocks this.
- Implicit operational criteria. Every operational criterion must be explicit in the exit-criteria list; "we'll know it works when we use it" is not a criterion.
- Single-phase plans for multi-phase scope. If the spec covers multiple Phases, write one PHASE_N_PLAN.md per Phase; do not collapse them.
- Spec-as-implementation. The spec lives in the planning repo; the implementation lives in the source repo; PLAN docs bridge them. The spec is not the implementation.
- Planning vocabulary in committed code, tests, or user-facing docs. "Phase N", "Layer X", "Amendment Y", "critique B3", "per the plan" belong in the planning repo (workspace/<project>/) and PR descriptions - NOT in src/, tests/, docs/, README.md, or CLAUDE.md of the source repo (engram once shipped 37 files leaking this vocabulary; cleanup commit a6b0d68). Before each layer commit, grep the staged diff for Phase \d, Layer [A-H], Amendment, critique, closes [A-H]\d and rewrite to describe the BEHAVIOR or RATIONALE instead. If a comment genuinely needs a historical decision, link the ADR by number, not the plan doc.
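The staged-diff check in the last bullet can be sketched as a small scanner over diff text. The patterns are the ones listed above; the function name `find_planning_vocab` and the choice to inspect only added lines are assumptions for illustration.

```python
import re
from collections.abc import Iterable

# Patterns taken from the bullet above, combined into one alternation.
PLANNING_VOCAB = re.compile(
    r"Phase \d|Layer [A-H]|Amendment|critique|closes [A-H]\d"
)


def find_planning_vocab(diff_lines: Iterable[str]) -> list[tuple[int, str]]:
    """Return (diff line number, text) for added lines that leak vocabulary."""
    hits: list[tuple[int, str]] = []
    for number, line in enumerate(diff_lines, start=1):
        # Only added lines matter; "+++ b/file" is a header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        if PLANNING_VOCAB.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

Feeding it the output of `git diff --cached` before each layer commit yields the lines to rewrite; an empty list means the staged change is clean.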
See Also
- code-project-foundation/SKILL.md - the parent foundation (language-agnostic).
- code-project-retrospective/SKILL.md - what to do AFTER each Phase ships.
- ~/.claude/skills/verify-before-done/ - global verification skill.
- ~/.claude/skills/comprehension-gate/ - global comprehension skill.
- superpowers:deep-plan - global plan-authoring skill.