docstrings

name: docstrings description: Author, check, and critically review AAanalysis docstrings against the house style of the mature classes (CPP, dPULearn, AAclust). Bundles a deterministic structural checker (scripts/check_docstrings.py — numpydoc shape, named Returns, per-method Examples include, cross-reference integrity, citation-defined, stub-skipping) and a doc-vs-signature drift detector (scripts/doc_signature_drift.py — documented default/type vs the real signature). For deep reviews it fans out per-subpackage reviewers, cross-checks each docstring against the implementation + sibling docstrings + the CONTEXT.md glossary for clarity and consistency, classifies fixes as Necessary / Nice-to-have / Can-be, optionally refines the plan with grill-with-docs, then refactors staged or in one pass. Use when writing or reviewing the docstring of any public class/method/function under aaanalysis/, when a new class must match conventions, or when the user asks to audit, grade, lint, or critically review docstring / API-doc quality or find doc-vs-code drift.

AAanalysis docstrings — author, check, review

Two layers. The structural layer (the checker) proves a docstring has the right shape; the critical layer judges whether it is correct, clear, and consistent with the code. The mature classes CPP, dPULearn, AAclust are the gold standard — every public symbol should read as if written by the same hand.

Authoritative rules: the published Docstring Style Guide (docs/source/index/docstring_guide.rst) is the single source of truth for templates, baselines, citations, versioning, examples, and the rule → checker-code table. This skill enforces it; when in doubt, the guide wins. Detailed pointers: REFERENCE.md.

Quick start

SK=.claude/skills/docstrings/scripts
# 1. Structural shape (defects fail; advisory + stubs don't):
python $SK/check_docstrings.py aaanalysis        # or a single file; --fix for safe edits
# 2. Doc-vs-signature drift (high-signal, the Necessary tier):
python $SK/doc_signature_drift.py aaanalysis
# 3. Example notebooks: every parameter covered + outputs SAVED (else blank on RTD):
python $SK/check_example_notebooks.py examples/   # or a single .ipynb

check_docstrings.py flags the structural codes (summary form, Parameters in __init__, named Returns, per-method Examples include, See Also bullets, Notes *-bullets, CITATION-UNDEFINED, XREF-UNRESOLVED, …). Output splits Defects (exit ≠ 0) from Advisory (CLASS-NO-CITATION, RAISES-UNDOCUMENTED — never fail) and skips UNDER CONSTRUCTION stubs. 0 defects = the convention holds for every real symbol.
doc_signature_drift.py flags DEFAULT-DRIFT and TYPE-DRIFT (documented default/scalar-type ≠ signature), plus noisier PARAM-UNDOCUMENTED / PARAM-EXTRA. It suppresses intentional patterns (signature default None / ut.X constants / non-literal factories / ambiguous unions), so the drift hits are almost always real.
check_example_notebooks.py enforces the example-notebook standard (docstring_guide.rst → Notebook content & structure / Examples & verification): flags EMPTY-OUTPUT (a code cell with no saved output — renders blank on Read the Docs since docs show stored outputs) and UNCOVERED-PARAM (a public parameter the notebook never mentions). Recurring miss ([[feedback-example-notebook-standard]]): every example must introduce every parameter (one cell per group) and be committed with executed aa.display_df outputs. NotebookEdit / a fresh Write clear outputs, so after any cell edit re-run jupyter nbconvert --to notebook --execute --inplace examples/<sub>/<name>.ipynb and confirm outputs before committing.
The docs build is the final gate neither script sees: cd docs && make html must succeed (RST render errors, broken .. include::, stale autosummary stubs surface only there).

Authoring a new class / method

Copy the matching template from the guide / REFERENCE.md:

Class → noun-phrase summary <Full Name> (**ACRONYM**) class ... after a blank first line, ending in a [Key]_ citation only if that reference genuinely describes the class (rule 7); .. versionadded::; then Attributes only. Parameters / Notes / See Also / Examples go in __init__.
Method → verb-phrase summary (no →/+); sections ordered Parameters → Returns → Raises → Notes → Warnings → See Also → Examples; named Returns; end with Examples → .. include:: examples/<name>.rst.
Plot pair → Plotting class for :class: ...; reciprocal See Also.

House-style rules at a glance

Class summary = noun phrase + **ACRONYM** (not a verb); [Key]_ only when one genuinely applies (rule 7).
Parameters live in __init__; the class docstring carries at most Attributes.
Every public method ends with one Examples .. include:: examples/<name>.rst.
Returns is named and matches the returned variable.
Recurring params (df_seq, labels, n_jobs, random_state) reuse the canonical baseline sentence; each docstring is self-contained — document every parameter as its own entry, never lump them into a, b, c : See :meth:other``.
See Also = * :role:Target: gloss. bullets. Every :class:/:meth:/:func: target must resolve (XREF-UNRESOLVED).
Verify every citation — never invent one. Valid only if Key is defined in references.rst and the work describes this symbol. Citations are the exception (core algorithm classes); utilities/loaders/preprocessors carry none — don't reflexively add [Breimann25]_. When unsure, omit.
Header is Warnings (not Warns); no →/+ in summaries.
Every integrated external tool/method is cited + explained in one sentence. A method wrapping/running an external tool (DSSP, Chainsaw, Merizo, AFragmenter, MEME/FIMO, cd-hit, mmseqs2, logomaker, SHAP, …) names it with a [Key]_ citation (its paper in references.rst, not a bare repo URL) and a one-sentence "what it does".
Summary + description. A one-line summary is followed by a short plain-language description (what it does in simple words, the cited tool [Key]_ if any, key :role: cross-refs) — for classes and non-trivial methods/functions. The checker's SUMMARY-ONLY (advisory) flags summary-only docstrings. Write the description (and notebook concept cells) as natural prose — do not prefix paragraphs with a bold rhetorical meta-label (**Mental model.**, **When to use.**, **What it returns.**); state the content directly. (Bold for genuine emphasis or a structural Notes-bullet label is fine.)
Expand abbreviations on first use. The first time an abbreviation/acronym appears in a docstring, spell it out with the short form in parentheses — e.g. "Command Line Interface (CLI)", "Position Weight Matrix (PWM)" — then use the short form. Each docstring is self-contained (re-introduce per docstring); the bold (**ACRONYM**) in a class summary is the class-level form. Glossary terms may use :term: instead; universally standard forms (DNA, 3D, ID, CPU, PDB) need no expansion.

Critical review (audit → tier → refactor)

Baseline. Enumerate public symbols (__init__.__all__). Run both scripts.
Fan-out. One reviewer agent per subpackage (parallel), each reading the docstrings against the signatures, the implementation, a sibling docstring, and the CONTEXT.md glossary.
Verify before asserting. Every specific claim (wrong default/shape, a "requires X", a code bug) is a candidate until checked against source — never report a fix on an unconfirmed false positive.
Tabulate + classify into the three tiers (below).
Refine (optional). grill-with-docs per doc-group to settle how to fix.
Refactor staged or in one pass; finish when check_docstrings.py = 0 defects, drift has no unexplained DEFAULT-DRIFT/TYPE-DRIFT, and make html builds.

Fix tiers: 🔴 Necessary — wrong / misleading / a code bug (drift, copy-paste, contradictions, false claims, return shape errors). 🟡 Nice-to-have — incomplete but not wrong (terse params, missing Notes/See-Also, vague prose). ⚪ Can-be — cosmetic wording. Stubs are exempt.

Two review dimensions (rubric for fan-out agents — rate ✅/🟡/🟠/⚪, note specific & actionable or "—"): (a) per-symbol correctness (mostly the scripts); (b) consistency & clarity — package consistency (terminology/depth vs siblings

the CONTEXT.md glossary), ease of understanding (followable without the source), and faithfulness to the implementation (skim the primary methods). Map a prose-vs-impl mismatch → Necessary; an inconsistency/unclear explanation → Nice-to-have; wording → Can-be. Guardrails: never suggest adding a citation to a utility, nor an Examples include where one exists.

AAanalysis docstrings — author, check, review

Quick start

Authoring a new class / method

House-style rules at a glance

Critical review (audit → tier → refactor)

See also