name: kg-sync version: 2.0.0 description: > Keep Knowledge Graph synchronized with live repo state. Scan code (AST), scan experiments (DuckDB), propagate belief changes, detect staleness, generate downstream artifacts, export to sci-llm-writer. last_updated: 2026-03-19 activation: manual invocation: /kg-sync revision_notes: > v2.0.0: Skills 2.0 format. Added 6-layer architecture awareness. Materialization protocol links PRD decisions to KG domain entries. v1.0.0: Initial 7-step workflow. metadata: category: knowledge tags: [knowledge-graph, synchronization, duckdb, staleness, latex] relations: compose_with: - prd-update - knowledge-reviewer depend_on: - knowledge-reviewer similar_to: [] belong_to: []
kg-sync Skill
A reproducible Skill for keeping the Knowledge Graph synchronized with the live repo state and exporting a snapshot to sci-llm-writer for manuscript generation.
6-Layer Architecture Context
This skill operates at Layer 3 (Evidence) and generates Layer 5 artifacts.
See docs/planning/prd-kg-openspec-architecture.md for the full architecture.
When to Use
- After a milestone: new GPU run completed, new model adapter added, new flow implemented
- When
scripts/check_projection_staleness.pyreports stale downstream files - When preparing to write Introduction/Discussion in sci-llm-writer
- Invocation:
/kg-syncor/kg-sync --export-to sci-llm-writer
Generic Reusability
This Skill is designed for any repo following the pattern:
knowledge-graph/
code-structure/ ← from AST (any Python repo)
experiments/ ← from experiment tracker (MLflow, W&B, etc.)
manuscript/ ← from human narrative + projections.yaml
Future repos (foundation-PLR, any research repo) can use the same Skill with their own scanner + experiment tracker.
7-Step Workflow
1. SCAN-CODE → ast.parse() scan of src/ → code-structure/*.yaml
2. SCAN-EXP → DuckDB query over mlruns/ → experiments/*.yaml
2b. PROPAGATE → Check _network.yaml propagation: for modified nodes → requires_review flags
3. STAMP → Write _generated_at.yaml (git commit hash + timestamp)
4. STALENESS → check_projection_staleness.py → stale file report
5. GENERATE → Render stale projections via Jinja2 templates/*.j2 → .tex files
6. VALIDATE → (a) Schema check (b) Idempotency (c) pdflatex compile
7. EXPORT → If --export-to: atomic copy to sci-llm-writer kg-snapshot/
Step-by-Step Reference
Step 1: SCAN-CODE
uv run python scripts/scan_code_structure.py \
--source-dir src/minivess/ \
--output-dir knowledge-graph/code-structure/
What it does:
ast.parse()all .py files in src/minivess/ (CLAUDE.md Rule #16 — no regex)- Extracts: @flow decorated functions, ModelAdapter subclasses, Hydra config nodes, test markers
- Writes:
flows.yaml,adapters.yaml,config-schema.yaml,test-coverage.yaml - FALLBACK: if ast.parse() fails on a file → log warning, skip file, continue
Step 2: SCAN-EXP
uv run python scripts/scan_experiments.py \
--mlruns-dir mlruns/ \
--output-dir knowledge-graph/experiments/
What it does:
- DuckDB analytics query over mlruns/ (already in src/minivess/observability/analytics.py)
- Extracts FINISHED runs only — FAILED/KILLED runs are excluded
- Writes: one YAML per experiment name +
current_best.yaml - MISSING RESULTS: if SAM3/VesselFM absent → insert
{status: "PENDING", reason: "GPU runs not complete"} - HARD ERROR: run_id in existing YAML absent from mlruns/ → error, not silent fallback
Step 2b: PROPAGATE
- Parse
knowledge-graph/_network.yaml propagation:section - For each source node with git changes (compare to last
_generated_at.yamlcommit hash):- Find target nodes in propagation section
- For type=hard/soft: add
requires_review: true+review_reason: "parent changed: <id>" - For type=signal: log only, no YAML change
- Run
scripts/review_prd_integrity.pyto validate _network.yaml consistency - Report propagation tree to user before continuing
Step 3: STAMP
Writes knowledge-graph/code-structure/_generated_at.yaml and knowledge-graph/experiments/_generated_at.yaml:
generated_at: "2026-03-15T10:00:00Z"
git_commit: "<sha>"
git_dirty: false
generator: scripts/scan_code_structure.py
Step 4: STALENESS
uv run python scripts/check_projection_staleness.py \
--projections knowledge-graph/manuscript/projections.yaml
Reports which downstream .tex files are stale (input mtime > output mtime OR output missing).
Step 5: GENERATE
For each stale projection:
uv run python -c "
import jinja2
# Load template + KG data
# Render .tex file
# Write header: %% AUTO-GENERATED from run_id: <id>
"
Key invariants:
- AUTO-GENERATED .tex files NEVER hand-edited (pre-commit hook checks header)
- HUMAN-AUTHORED sections marked
%% HUMAN-AUTHORED: do not overwrite - All numbers from MLflow via DuckDB — never manually typed
- No timestamps/UUIDs in generated .tex — only run_ids (for idempotency)
Step 6: VALIDATE
# (a) Schema check
uv run python scripts/review_prd_integrity.py
# (b) Idempotency check
uv run python scripts/check_projection_staleness.py # must report "all up to date"
# (c) Compile check
cd docs/manuscript/latent-methods-results/
pdflatex latent-methods-results.tex 2>&1 | tail -5
# Must exit 0 with no errors
If ANY check fails → BLOCK export, report specific failure.
Step 7: EXPORT (optional — only with --export-to flag)
# Atomic write: tmp → mv (never partial state in sci-llm-writer)
python3 -c "
import shutil, pathlib, tempfile, yaml
snapshot = {
'generated_at': '...',
'git_commit': '...',
'status': 'complete' if not pending else 'incomplete',
'pending_experiments': [...], # from current_best.yaml
'decisions': [...], # from navigator.yaml
'claims': [...], # from manuscript/claims.yaml
}
tmp = pathlib.Path(tempfile.mktemp())
tmp.write_text(yaml.dump(snapshot))
dest = pathlib.Path('../sci-llm-writer/manuscripts/vasculature-mlops/kg-snapshot/kg-snapshot.yaml')
dest.parent.mkdir(parents=True, exist_ok=True)
shutil.move(str(tmp), str(dest))
"
Export gates (all must pass):
- pdflatex compiles cleanly
- Schema check passes (review_prd_integrity.py)
- Idempotency: running Step 5 twice = identical output
- No orphan run_ids (run_id in snapshot missing from mlruns/ → ERROR)
Key Files
| File | Purpose |
|---|---|
scripts/scan_code_structure.py |
Step 1 — AST scanner |
scripts/scan_experiments.py |
Step 2 — MLflow/DuckDB extractor |
scripts/check_projection_staleness.py |
Step 4 — Staleness detector |
knowledge-graph/templates/*.j2 |
Step 5 — Jinja2 templates |
knowledge-graph/manuscript/projections.yaml |
Step 4 input — dependency map |
knowledge-graph/code-structure/_generated_at.yaml |
Step 3 output — staleness stamp |
knowledge-graph/experiments/_generated_at.yaml |
Step 3 output — staleness stamp |
Invariants
- No timestamps/UUIDs in generated .tex (idempotency)
- All numbers from MLflow via DuckDB — never typed manually
- HUMAN-AUTHORED sections never overwritten
- Export is atomic (tmp + mv — no partial state)
- Orphan run_id = hard ERROR (not silent skip)
- PENDING experiments =
{status: "PENDING"}header in kg-snapshot.yaml
This Skill is Generic
To adapt for another repo:
- Replace
src/minivess/with your source directory in Step 1 - Replace
mlruns/with your experiment tracker path in Step 2 - Replace Jinja2 templates with your downstream output format
- Update
projections.yamlwith your dependency map Everything else (stamp, staleness, validate, export) works unchanged.