ship - SKILL.md Agent Skill

name: ship
description: Run the full ship flow - verify quality, ensure test coverage, update artifacts, smoke test, push, create PR, and merge when CI is green. Trigger when user says "ship", "ship it", "fix and ship", or asks to push and merge a branch.
user_invocable: true
metadata:
  internal: true

Run the full ship flow: verify quality, ensure test coverage, update artifacts, smoke test, then push, create PR, and merge when CI is green.

This skill implements the complete "Shipping" definition and Pre-PR Checklist from AGENTS.md. When the user says "ship" or "fix and ship", execute ALL phases below — not just the push/merge steps.

Arguments

$ARGUMENTS - Optional: description of what is being shipped (used for PR title/body context and to scope the quality checks)

Instructions

Phase 1: Pre-flight

Confirm we're NOT on main or master
If HEAD is detached and the current task has local changes ready to ship, create a branch first instead of stopping
Confirm whether uncommitted changes belong to the task being shipped
If the worktree is dirty only because of the current task, keep going: validate, commit, and ship those changes
If unrelated uncommitted changes exist, stop and tell the user
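
The pre-flight checks above could be sketched roughly as follows. This is a minimal illustration, not part of the skill itself: classify_branch is a hypothetical helper name, and the git commands in the trailing comments are one possible way to gather its input.

```shell
# Hypothetical helper: classify a branch name for the pre-flight check.
# Prints "protected" for main/master, "detached" for an empty name
# (no symbolic ref, i.e. detached HEAD), and "ok" for anything else.
classify_branch() {
  case "$1" in
    main|master) echo "protected" ;;
    "")          echo "detached" ;;
    *)           echo "ok" ;;
  esac
}

# Possible usage inside a repository (not executed here):
#   branch="$(git symbolic-ref --quiet --short HEAD || true)"
#   classify_branch "$branch"
#   git status --porcelain   # non-empty output means the worktree is dirty
```

Deciding whether dirty files belong to the current task still needs judgment; the sketch only covers the mechanical branch check.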

Phase 2: Test Coverage

Review the changes on this branch (use git diff origin/main...HEAD and git log origin/main..HEAD) and ensure comprehensive test coverage:

Identify all changed code paths — every new/modified function, module, builtin, tool
Verify existing tests cover the changes — run just test and check for failures (never cargo test --all-features as a single invocation; see AGENTS.md)
Write missing tests for any uncovered code paths:
- Positive tests: happy path, valid inputs, expected state transitions
- Negative tests: invalid inputs, error conditions, boundary cases, permission failures, missing resources
- Security tests: if change touches parser, interpreter, VFS, network, git, or user input — add tests per specs/security-testing.md
- Compatibility tests: if change affects Bash behavior parity — add differential tests comparing against real Bash
Run all tests to confirm green: just test
If any test fails, fix the code or test until green
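
As a rough aid for the security-test step above, changed paths can be filtered by keyword. The sketch below is an assumption-heavy heuristic: the function names are hypothetical, the keywords simply mirror the areas named in the list, and plain substring matching will produce false positives (for example, any path containing "git").

```shell
# Hypothetical heuristic: does a changed path look security-sensitive?
# Keywords mirror the areas listed above; real repository paths may differ.
touches_sensitive_area() {
  case "$1" in
    *parser*|*interpreter*|*vfs*|*network*|*git*) return 0 ;;
    *) return 1 ;;
  esac
}

# Read changed paths from stdin and print those that may need security tests.
sensitive_paths() {
  while IFS= read -r path; do
    if touches_sensitive_area "$path"; then
      echo "$path"
    fi
  done
}

# Possible usage (not executed here):
#   git diff --name-only origin/main...HEAD | sensitive_paths
```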

Phase 3: Artifact Updates

Review the changes and update project artifacts where applicable. Skip items that aren't affected.

Specs (specs/): if the change adds/modifies behavior covered by a spec, update the relevant spec file to stay in sync
Threat model (specs/threat-model.md): if the change introduces new attack surfaces, external inputs, authentication/authorization changes, or data handling — add or update threat entries using the TM-<CATEGORY>-<NNN> format and add // THREAT[TM-XXX-NNN] code comments at mitigation points
AGENTS.md: if the change adds new specs, commands, or modifies development workflows — update the relevant section
Limitations (specs/limitations.md): if a limitation was added or lifted, update the table; if builtins changed, run just regen-builtins and commit the JSON
Documentation (crates/bashkit/docs/): if the change affects public APIs, tools, or features — update the relevant guide markdown files
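
A threat ID in the TM-<CATEGORY>-<NNN> format can be checked mechanically before it goes into a code comment. The sketch below assumes that CATEGORY is one or more uppercase letters and that NNN is exactly three digits; the source does not spell this out, so adjust the pattern if specs/threat-model.md says otherwise. The function name is_threat_id is hypothetical.

```shell
# Hypothetical validator for threat IDs.
# Assumption: CATEGORY is uppercase letters, NNN is exactly three digits.
is_threat_id() {
  printf '%s\n' "$1" | grep -Eq '^TM-[A-Z]+-[0-9]{3}$'
}

# Possible usage (the ID below is made up for illustration):
#   is_threat_id "TM-INJ-001" && echo "well-formed"
```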

Phase 3b: Code Simplification

Review all changed code for opportunities to simplify:

Identify duplication — look for repeated patterns that could share a helper or be consolidated
Reduce complexity — simplify nested logic, long match arms, deeply indented blocks
Remove dead code — unused functions, unreachable branches, commented-out code
Check naming — ensure functions, variables, and types have clear, descriptive names
Verify no over-engineering — remove unnecessary abstractions, feature flags, or indirection that don't serve the current change

If simplification changes are made, loop back to Phase 2 to verify tests still pass.

Phase 3c: Security Review

Analyze all changed code for security vulnerabilities:

Input validation — check that user-supplied data (script input, file paths, environment variables, command arguments) is validated before use
Injection risks — look for command injection, path traversal, environment variable injection, or shell metacharacter issues
Sandbox escapes — if changes touch VFS, builtins, or process execution, verify they cannot escape the sandbox (see specs/threat-model.md)
Resource exhaustion — check for unbounded loops, unbounded allocations, or missing limits on user-controlled sizes
Error handling — ensure errors don't leak internal state, file paths, or sensitive information
Unsafe code — review any unsafe blocks for soundness; prefer safe alternatives

If security issues are found, fix them, add regression tests, and update specs/threat-model.md if a new threat category is identified.

Phase 3d: Design Quality Review

Review all changed code for shortcuts, lazy abstractions, and premature compromises. This is a greenfield project — correctness and clean design matter more than compatibility or speed of delivery. Take the time to find better abstractions.

No shortcut abstractions — reject copy-paste patterns disguised as "good enough". If two things look similar, determine whether they are actually the same concept. If yes, unify properly. If no, keep them separate with clear names — don't force a bad shared interface.
No lazy wrappers — every abstraction must earn its place. A wrapper that just forwards calls adds indirection without value. An enum variant that exists "just in case" is dead weight. If a layer doesn't add meaning, remove it.
Right abstraction level — check that traits, types, and module boundaries model the actual domain, not implementation accidents. A StringOrList enum is a parser leak; a Pattern type is a domain concept. Prefer the latter.
No stringly-typed interfaces — look for magic strings, string matching on variant names, ad-hoc parsing of structured data. Replace with enums, newtypes, or proper typed APIs.
No premature generics — a function generic over three trait bounds used in one call site is harder to read than a concrete function. Generalize only when there are (or will immediately be) multiple real callers.
No compatibility shims — this is greenfield. If an interface is wrong, change it. Don't add adapters, conversion layers, or deprecated alternatives. Fix call sites instead.
Error types are first-class — check that error enums are specific and actionable, not catch-all Other(String) buckets. Each variant should guide the caller's recovery logic.
Module boundaries enforce invariants — if a pub field or function lets outside code break a module's assumptions, tighten visibility. Constructors and accessors exist to protect invariants, not to be "nice".

If design issues are found, refactor, update tests (loop back to Phase 2), and update specs if the change alters documented behavior.

Phase 4: Smoke Testing

Smoke test impacted functionality to verify it works end-to-end:

CLI changes: run just run with relevant commands, verify output
Builtin/interpreter changes: run example scripts via just run-script <file> to verify behavior
Tool changes: if LLM tool interface changed, run a quick tool invocation test
Python bindings: if Python code changed, run ruff check crates/bashkit-python && ruff format --check crates/bashkit-python

If smoke testing reveals issues, fix them and loop back to Phase 2 (tests must still pass).

Phase 5: Quality Gates

git fetch origin main && git rebase origin/main

If the rebase fails with conflicts, abort it (git rebase --abort) and tell the user to resolve the conflicts manually

just pre-pr

If it fails, run just fmt to auto-fix, then retry once
If still failing, stop and report
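
The gate logic above (rebase or abort, then run the gate with a single retry after an auto-fix) could be sketched like this. The function names are hypothetical; the commands they wrap are the ones named in this phase.

```shell
# Rebase onto origin/main; on conflicts, abort so the worktree is left clean.
rebase_onto_main() {
  git fetch origin main || return 1
  if ! git rebase origin/main; then
    git rebase --abort
    return 1
  fi
}

# Run a gate command; if it fails, run a fixer and retry the gate exactly once.
# $1 = name of the gate command, $2 = name of the fixer command.
run_gate_with_one_retry() {
  gate="$1"
  fixer="$2"
  if "$gate"; then
    return 0
  fi
  "$fixer" || return 1
  "$gate"
}

# Possible usage (not executed here):
#   pre_pr() { just pre-pr; }
#   auto_fix() { just fmt; }
#   rebase_onto_main && run_gate_with_one_retry pre_pr auto_fix
```

A non-zero return from either function corresponds to the "stop and report" outcome.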

Phase 6: Push and PR

git push -u origin <current-branch>

Check for existing PR:

gh pr view --json url 2>/dev/null

If no PR exists, create one:

Title: conventional commit style from the branch commits
Body: summary of What, Why, How, and what tests were added/verified
Use gh pr create

If a PR already exists, update it if needed and report its URL.
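
The check-then-create step can be folded into one helper, sketched below. ensure_pr is a hypothetical name; the title and body are supplied by the caller, and the sketch relies on gh pr view exiting non-zero when the branch has no PR and on gh pr create printing the new PR's URL.

```shell
# Print the URL of the PR for the current branch, creating one if none exists.
# $1 = PR title, $2 = PR body (both supplied by the caller).
ensure_pr() {
  title="$1"
  body="$2"
  # gh pr view fails when no PR exists for the current branch.
  if url="$(gh pr view --json url --jq .url 2>/dev/null)"; then
    echo "$url"
    return 0
  fi
  gh pr create --title "$title" --body "$body"
}

# Possible usage (not executed here; title and body are placeholders):
#   ensure_pr "fix(parser): handle empty input" "What / Why / How / Tests"
```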

Phase 7: Wait for CI and Merge

Check CI status with gh pr checks (poll every 30s, up to 15 minutes)
If CI is green, merge with gh pr merge --squash --auto
If CI fails, report the failing checks and stop
NEVER merge when CI is red
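
Polling every 30 seconds for up to 15 minutes works out to 30 attempts. A minimal sketch of that loop follows. It assumes the check command exits 0 when all checks pass, 8 while checks are still pending (the exit code recent gh releases document for gh pr checks; verify against your installed version), and any other code when a check has failed. The function name and the 124 timeout code are illustrative choices.

```shell
# Poll a check command until it is green, red, or the attempt budget runs out.
# $1 = max attempts, $2 = seconds between attempts, rest = command to run.
# Assumption: exit 0 = green, exit 8 = pending, anything else = red.
wait_for_checks() {
  max_attempts="$1"
  interval="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    status=0
    "$@" || status=$?
    case "$status" in
      0) return 0 ;;            # green: safe to merge
      8) ;;                     # pending: keep polling
      *) return "$status" ;;    # red: stop immediately, never merge
    esac
    # Sleep between attempts, but not after the last one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  return 124                    # timed out while still pending
}

# Possible usage (not executed here): 30 attempts x 30 s = 15 minutes.
#   wait_for_checks 30 30 gh pr checks && gh pr merge --squash --auto
```

Returning early on a red result keeps the loop from waiting out the full 15 minutes when CI has already failed.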

Phase 8: Post-merge

After successful merge:

Report the merged PR URL
Done

Rules

Phases 2-4 (tests, artifacts, simplification, security review, design quality review, smoke testing) are the quality core. Do NOT skip them.
The $ARGUMENTS context helps scope which tests, specs, and smoke tests are relevant.
For "fix and ship" requests: implement the fix first, then run /ship to validate and merge.
Never close a half-done issue. If the PR only covers a subset of the issue's tasks/checkboxes, use Part of #N instead of Closes #N or Fixes #N. Only use closing keywords when every task in the issue is complete. Premature closure hides remaining work.
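
The closing-keyword rule can be expressed as a small decision helper. The sketch below uses hypothetical function names and assumes the issue tracks its tasks as markdown checkboxes ("- [ ]" for open items); the issue number in the usage comment is a placeholder.

```shell
# Count unchecked markdown checkboxes ("- [ ]") in an issue body.
count_unchecked() {
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
  printf '%s\n' "$1" | grep -c -- '- \[ \]' || true
}

# Choose the PR reference line: a closing keyword only when no tasks remain.
# $1 = issue number, $2 = number of tasks still open after this PR.
issue_reference() {
  if [ "$2" -eq 0 ]; then
    echo "Closes #$1"
  else
    echo "Part of #$1"
  fi
}

# Possible usage (issue number is a placeholder):
#   issue_reference 42 "$(count_unchecked "$issue_body")"
```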