---
name: refactor
description: "Execute recursive, dependency-aware context modularization and traversal for any code or document corpus. Use when you need to atomize context, classify and interconnect concepts, infer constraints and dependencies, create bridge files/symlinks/hyperedges, and produce a refined directed acyclic dependency model with progressive confidence gating."
---
Refactor
Purpose
This skill provides a standalone best-practice pipeline for turning an undifferentiated context surface into a compact modular graph. It is designed for context-engineering workflows, repo rewrites, architecture consolidation, and knowledge-base triage where semantic structure and dependency correctness matter more than cosmetic refactoring.
The process is recursive. It keeps extracting structure, decomposing modules, creating bridges between distant but constrained dependencies, and re-running until a quality threshold is reached.
Core operating model
Use this skill in two modes:
- `analysis`: map and score the current state of the context.
- `orchestration`: map, refactor, and emit the final index.
In both modes, the same primitives apply:
- Atom-of-thought abstraction: break text into stable atomic semantic units.
- Thoughtbox state: persist cycle-local decisions, open questions, constraints, and unresolved risk.
- InfraNodus-style graph analytics: build weighted term/coherence graphs, detect communities, and route cross-connections through bridge objects.
- Hyperedge bridging: preserve weak and long-range dependency semantics through explicit bridge files and references.
- Recursive DAG convergence: ensure the final structure is acyclic or minimized into explicit cycle capsules.
When to use this skill
- When context is diffuse, highly duplicated, or weakly modularized.
- When dependencies are implicit and need explicit graph formalization.
- When you need a canonical modular map with bridges and recursion confidence.
- When you need a reusable orchestration framework that has minimal dependencies and can run standalone.
Trigger terms
Use when the request contains any of:
- recursive refactor
- context graph
- module decomposition
- dependency DAG
- bridge file
- concept extraction
- acyclic index
- "95%" or high-confidence optimization pass
Entry requirements
- The input corpus path (files/directories).
- Optional output directory.
- Optional constraints and hard exclusions.
- Confidence target (default `0.95`).
Standard process
- Collect context units.
- Extract atoms, tags, and concept signatures.
- Build an undirected semantic graph and directional dependency graph.
- Detect communities and recursively decompose unstable modules.
- Detect cycles and create cycle capsules when necessary.
- Create bridges for distant yet meaningful edges.
- Emit module-level files, symlinks/references, and one main index.
- Run a final Markov-style confidence pass and stop only when criteria are met or max recursion depth is reached.
Directory layout produced
The orchestrated output defaults to:
- `index.md`: main entry point and formal traversal order.
- `state.json`: run state, thresholds, and convergence summary.
- `modules/<module-id>/README.md`: module boundaries and local interfaces.
- `modules/<module-id>/atoms.json`: canonical atoms for that module.
- `modules/<module-id>/files.md`: linked source members.
- `bridges/<a>__<b>.md`: cross-module bridge files.
- `bridges/symlinks/<module-id>/`: optional symlink views.
- `graph.json`: full graph dump and computed metrics.
- `atlas.jsonl`: serialized thoughtbox states per recursion epoch.
If a repo style requires a different layout, mirror these artifacts with equivalent semantic roles.
Standalone script integration
The bundled script is intended to be copied/adapted into any workspace:
/Users/mikhail/Projects/Context-Engineering/Distil/refactor/scripts/rlm_orchestrator.py
Use this as the default execution engine. It has zero mandatory third-party dependencies.
Execution procedure
1) Atomization
For every file/segment:
- Normalize Unicode spacing, comments, and line breaks.
- Tokenize and extract n-grams in controlled windows.
- Strip stop terms, noisy tokens, and identifiers that do not carry semantic weight.
- Keep the top-weighted atoms by TF-IDF-like scoring.
- Emit atom vectors and tags.
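The atomization steps can be sketched with only the Python standard library. This is a minimal illustration under stated assumptions, not the bundled script's code: the stop-term list, the 3-character token floor, the 1-to-2 n-gram window, and the smoothed IDF formula are all hypothetical choices, and comment stripping is omitted.

```python
import math
import re
import unicodedata
from collections import Counter

# Hypothetical stop-term list; extend with domain stopwords when extraction is noisy.
STOP_TERMS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}
TOKEN_RE = re.compile(r"[a-z][a-z0-9_]{2,}")  # drop very short, low-signal tokens


def normalize(text: str) -> str:
    """Normalize Unicode (NFKC) and collapse every whitespace run to one space."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.split()).lower()


def ngrams(tokens: list[str], max_n: int = 2) -> list[str]:
    """Emit contiguous n-grams for n = 1..max_n within a sliding window."""
    out = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i : i + n]))
    return out


def atomize(docs: dict[str, str], top_k: int = 10) -> dict[str, dict[str, float]]:
    """Return the top_k TF-IDF-weighted atoms for each document id (hypothetical helper)."""
    counts = {}
    for doc_id, text in docs.items():
        tokens = [t for t in TOKEN_RE.findall(normalize(text)) if t not in STOP_TERMS]
        counts[doc_id] = Counter(ngrams(tokens))
    n_docs = len(docs)
    df = Counter(term for c in counts.values() for term in c)  # document frequency
    atoms = {}
    for doc_id, c in counts.items():
        total = sum(c.values()) or 1
        scored = {
            term: (freq / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, freq in c.items()
        }
        # Deterministic tie-break: higher score first, then lexical order.
        ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        atoms[doc_id] = dict(ranked)
    return atoms
```

Ties are broken lexically so repeated runs on the same input produce the same atoms, which matches the deterministic-tie-breaker policy.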
2) Thoughtbox pass
Maintain a persistent per-pass state object:
- `observations`: discovered constraints, naming conventions, domain-specific terms.
- `assumptions`: inferred links without high confidence.
- `decisions`: hard commitments and rationale.
- `risks`: cycle candidates, overloaded modules, low-confidence merges.
- `open_questions`: unresolved semantic ambiguity requiring explicit arbitration.
The thoughtbox object is written to disk on every recursion iteration, so the process stays auditable.
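One way to persist the thoughtbox is to append one JSON object per iteration to `atlas.jsonl`, which the directory layout describes as serialized thoughtbox states per recursion epoch. The `Thoughtbox` class, its `epoch` field, and the `persist`/`load_atlas` helpers are hypothetical names used for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Thoughtbox:
    """Per-pass state (hypothetical shape); field names mirror the list above."""
    epoch: int
    observations: list[str] = field(default_factory=list)
    assumptions: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    risks: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)


def persist(box: Thoughtbox, atlas_path: Path) -> None:
    """Append one JSON line per recursion iteration so every epoch stays auditable."""
    atlas_path.parent.mkdir(parents=True, exist_ok=True)
    with atlas_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(box), sort_keys=True) + "\n")


def load_atlas(atlas_path: Path) -> list[Thoughtbox]:
    """Replay the audit trail from disk, one Thoughtbox per non-empty line."""
    with atlas_path.open(encoding="utf-8") as fh:
        return [Thoughtbox(**json.loads(line)) for line in fh if line.strip()]
```

Appending rather than rewriting keeps earlier epochs intact, so a later pass can be diffed against the states that led to it.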
3) Graph build
Build two edge sets:
- semantic edges from similarity of atom vectors.
- dependency edges from explicit references/imports/mentions in text.
Store edge strengths as floating-point scores in [0, 1].
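A sketch of the two edge sets, assuming atoms are sparse `{term: weight}` vectors (as produced by a TF-IDF pass) and that a dependency is any whole-word mention of another unit's file stem. Cosine similarity of non-negative vectors already lies in [0, 1]; the `saturation` scale for dependency strength is an assumption, and real import/reference parsing would be language-specific.

```python
import math
import re


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of sparse non-negative vectors; the result lies in [0, 1]."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def semantic_edges(atoms: dict[str, dict[str, float]], min_similarity: float = 0.05):
    """Undirected edges (a, b, score), a < b, for every pair at or above the threshold."""
    ids = sorted(atoms)
    edges = []
    for i, a in enumerate(ids):
        for b in ids[i + 1 :]:
            s = cosine(atoms[a], atoms[b])
            if s >= min_similarity:
                edges.append((a, b, round(s, 4)))
    return edges


def dependency_edges(texts: dict[str, str], saturation: int = 3):
    """Directed edges (src, dst, score) from whole-word mentions of dst's file stem.

    Strength saturates at 1.0 after `saturation` mentions (a hypothetical scale).
    """
    edges = []
    for src, text in sorted(texts.items()):
        for dst in sorted(texts):
            if dst == src:
                continue
            stem = dst.rsplit("/", 1)[-1].split(".", 1)[0]
            hits = len(re.findall(rf"\b{re.escape(stem)}\b", text))
            if hits:
                edges.append((src, dst, min(1.0, hits / saturation)))
    return edges
```

The `min_similarity` default mirrors the `--min-similarity 0.05` flag shown in the invocation example.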
4) Recursive decomposition
Use these rules recursively for each module:
- compute modularity signal from edge density and interface breadth.
- split on weak edge boundaries when cohesion is below threshold.
- rerun atomization inside sub-modules as needed for local precision.
- stop splitting when any of the following holds:
  - cohesion is stable at or above the threshold,
  - module size is already minimal,
  - max recursion depth is hit.
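The decomposition rules can be sketched as follows. Cohesion is assumed here to be weighted edge density (sum of internal edge weights divided by the number of possible pairs), and each split removes only the weakest internal edges; the `0.5` threshold, `min_size`, and the function names are hypothetical, while `max_depth=4` mirrors the default `--max-depth`. The interface-breadth term and local re-atomization are omitted.

```python
def cohesion(members, edges):
    """Weighted density: internal edge weight over the number of possible pairs."""
    n = len(members)
    if n < 2:
        return 1.0
    inside = sum(w for a, b, w in edges if a in members and b in members)
    return inside / (n * (n - 1) / 2)


def components(members, edges):
    """Connected components of the subgraph induced by `members`, in deterministic order."""
    adj = {m: set() for m in members}
    for a, b, _ in edges:
        if a in adj and b in adj:
            adj[a].add(b)
            adj[b].add(a)
    seen, comps = set(), []
    for start in sorted(members):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps


def decompose(members, edges, threshold=0.5, min_size=2, depth=0, max_depth=4):
    """Recursively split weakly cohesive modules along their weakest internal edges."""
    members = frozenset(members)
    if (cohesion(members, edges) >= threshold or len(members) <= min_size
            or depth >= max_depth):
        return [members]
    internal = [e for e in edges if e[0] in members and e[1] in members]
    if internal:
        weakest = min(w for _, _, w in internal)
        internal = [e for e in internal if e[2] > weakest]  # cut the weak boundary
    parts = components(members, internal)
    if len(parts) == 1:
        return [members]  # cannot improve structure: stop instead of over-splitting
    result = []
    for part in parts:
        result.extend(decompose(part, edges, threshold, min_size, depth + 1, max_depth))
    return result
```

For two tightly connected triangles joined by one weak edge, the weak edge is cut and each triangle is kept as a module because its cohesion clears the threshold.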
5) DAG synthesis
- Detect cycles in directed dependency graph.
- Collapse each SCC into a cycle capsule when needed.
- Sort capsules/modules topologically.
- Re-link edges through capsules and bridges so traversal is formally acyclic at the macro level.
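A sketch of DAG synthesis: Tarjan's algorithm finds the strongly connected components, every multi-node SCC becomes a capsule, and Kahn's algorithm with a lexical heap yields a deterministic topological order of the condensed graph. An edge `(a, b)` is read here as "a precedes b in traversal"; the `capsule:` naming scheme and function names are assumptions.

```python
import heapq


def tarjan_scc(nodes, edges):
    """Strongly connected components via Tarjan's algorithm (recursive; fine for small graphs)."""
    adj = {n: [] for n in nodes}
    for src, dst in edges:
        adj[src].append(dst)
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in sorted(adj[v]):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC: pop it off the stack
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(frozenset(comp))

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return sccs


def macro_dag(nodes, edges):
    """Collapse multi-node SCCs into cycle capsules; return (topological order, node labels)."""
    label = {}
    for comp in tarjan_scc(nodes, edges):
        name = "capsule:" + "+".join(sorted(comp)) if len(comp) > 1 else next(iter(comp))
        for n in comp:
            label[n] = name
    macro_edges = {(label[s], label[d]) for s, d in edges if label[s] != label[d]}
    indegree = {name: 0 for name in set(label.values())}
    succ = {name: [] for name in indegree}
    for s, d in macro_edges:
        succ[s].append(d)
        indegree[d] += 1
    ready = [n for n, deg in indegree.items() if deg == 0]
    heapq.heapify(ready)  # lexical tie-breaker keeps the order deterministic
    order = []
    while ready:
        n = heapq.heappop(ready)
        order.append(n)
        for d in succ[n]:
            indegree[d] -= 1
            if indegree[d] == 0:
                heapq.heappush(ready, d)
    return order, label
```

Because every cycle is absorbed into a capsule, the condensed graph is acyclic by construction and Kahn's algorithm always emits every macro node.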
6) Bridge and hyperedge construction
For distanced but important edges:
- emit bridge markdown documents with explicit rationale, source path, and target path.
- emit references into both modules.
- emit symlinks when filesystem permits, else canonical reference files.
Do not convert every weak edge into a bridge. Reserve bridges for semantically dense relations that are:
- frequently co-activated,
- constrained by dependency,
- or repeated across traversal epochs.
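The bridge policy can be expressed as a filter. In this sketch a cross-module candidate must clear a strength floor and show at least one of the three supporting signals listed above. All thresholds and the `Candidate` fields are hypothetical; the `bridges/<a>__<b>.md` path pattern follows the documented layout, and the cap mirrors the `max_bridges` limit referenced in the test suite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical cross-module edge summary."""
    source_module: str
    target_module: str
    strength: float        # edge score in [0, 1]
    co_activations: int    # times both ends were active in the same traversal step
    has_dependency: bool   # an explicit dependency edge backs this relation
    epochs_seen: int       # number of epochs in which the edge persisted


def select_bridges(cands, min_strength=0.3, min_co_activation=3, min_epochs=2, max_bridges=50):
    """Keep only dense cross-module relations; return (bridge_file, candidate) pairs."""
    kept = [
        c for c in cands
        if c.source_module != c.target_module
        and c.strength >= min_strength
        and (c.co_activations >= min_co_activation
             or c.has_dependency
             or c.epochs_seen >= min_epochs)
    ]
    # Strongest first, deterministic tie-break on the module pair; cap to avoid bridge explosion.
    kept.sort(key=lambda c: (-c.strength, c.source_module, c.target_module))
    return [(f"bridges/{c.source_module}__{c.target_module}.md", c) for c in kept[:max_bridges]]
```

Raising `min_strength` or lowering `max_bridges` is the lever described under failure handling when bridge count explodes.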
7) Confidence and finalization
Run a Markov-chain traversal in which probability mass moves along the dependency direction, and converge on a confidence measure across nodes and modules.
Stop when:
- `global_confidence >=` configured target (default `0.95`), or `max_iterations` reached,
- and no high-impact structural changes happened in the last two epochs.
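The traversal can be approximated by power iteration of a damped random walk (PageRank-style) along weighted dependency edges, with the stop rule above written as a predicate. The damping factor, tolerance, and helper names are assumptions. This document does not specify how the stationary distribution maps to `global_confidence`, so the sketch returns the distribution and its last residual and leaves that mapping open.

```python
def stationary(nodes, edges, damping=0.85, iters=100, tol=1e-9):
    """Power iteration of a damped random walk along directed, weighted dependency edges."""
    nodes = sorted(nodes)
    out = {n: [] for n in nodes}
    for src, dst, w in edges:
        out[src].append((dst, w))
    n = len(nodes)
    p = {v: 1.0 / n for v in nodes}
    delta = 1.0
    for _ in range(iters):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            total = sum(w for _, w in out[v])
            if total == 0:  # dangling node: spread its mass uniformly
                for u in nodes:
                    nxt[u] += damping * p[v] / n
            else:
                for dst, w in out[v]:
                    nxt[dst] += damping * p[v] * w / total
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)  # L1 residual
        p = nxt
        if delta < tol:
            break
    return p, delta


def should_stop(global_confidence, iteration, max_iterations, structural_changes, target=0.95):
    """Stop rule as listed above; structural_changes holds high-impact change counts per epoch."""
    reached = global_confidence >= target or iteration >= max_iterations
    quiet = len(structural_changes) >= 2 and not any(structural_changes[-2:])
    return reached and quiet
```

Each iteration preserves total probability mass, so the returned distribution always sums to 1.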
8) Index formalization
Emit index.md with:
- module order (topological),
- bridge registry,
- rationale per module,
- unresolved risk log,
- action list for the next maintenance cycle.
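A minimal renderer for these sections. The heading text, link format, and checkbox style are illustrative assumptions rather than the script's actual output, and `render_index` is a hypothetical name; the per-module rationale is folded into the module-order lines.

```python
def render_index(order, rationale, bridges, risks, actions):
    """Render index.md: topological module order with rationale, bridges, risks, next actions."""
    lines = ["# Index", "", "## Module order (topological)"]
    lines += [f"{i}. `{m}`: {rationale.get(m, 'no rationale recorded')}"
              for i, m in enumerate(order, 1)]
    lines += ["", "## Bridge registry"]
    lines += [f"- [{src} -> {dst}]({path})" for src, dst, path in bridges] or ["- none"]
    lines += ["", "## Unresolved risks"]
    lines += [f"- {r}" for r in risks] or ["- none"]
    lines += ["", "## Next maintenance cycle"]
    lines += [f"- [ ] {a}" for a in actions] or ["- none"]
    return "\n".join(lines) + "\n"
```

Missing rationale is rendered explicitly rather than silently skipped, which keeps gaps visible to the next maintenance cycle.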
Minimal dependency policy
- Never hard-require external services.
- Keep all algorithms local and deterministic.
- Prefer deterministic tie-breakers.
- Preserve idempotence where possible.
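Two small helpers illustrate the policy: a content-derived module ID (the same member set always yields the same identifier, which supports idempotent reruns) and a ranking with a lexical tie-breaker. The `m-` prefix, the 10-character SHA-256 truncation, and both function names are hypothetical.

```python
import hashlib


def module_id(members):
    """Content-derived ID: the same member set yields the same ID regardless of input order."""
    digest = hashlib.sha256("\n".join(sorted(members)).encode("utf-8")).hexdigest()
    return "m-" + digest[:10]


def rank(scored):
    """Order (key, score) pairs by descending score, breaking ties lexically by key."""
    return sorted(scored, key=lambda kv: (-kv[1], kv[0]))
```

Deriving IDs from content instead of from discovery order means a rerun over unchanged inputs rewrites the same module directories.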
Script invocation
Default:
```bash
python3 scripts/rlm_orchestrator.py \
  --input /path/to/context \
  --output .rlm-out \
  --max-depth 4 \
  --min-similarity 0.05 \
  --target-confidence 0.95 \
  --include-dot-git false
```
For an audit-only pass:
```bash
python3 scripts/rlm_orchestrator.py --input . --output .rlm-out --mode analysis
```
Important:
- Use `analysis` mode before any destructive file moves.
- Keep source files read-only during analysis.
- Materialize symlink-based bridges only during the orchestration pass.
Quality gates
Abort and revise if any of these are true:
- Core modules exceed 200 files without strong internal cohesion.
- More than 20% of edges become unidirectional contradictions.
- Multiple bridges point to stale paths.
- Cycles remain unresolved in macro graph.
- Confidence is under target by more than 5% with no structural trend.
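The gates can be checked mechanically once a run summary exists. This sketch assumes a hypothetical `RunSummary` shape and several readings of the prose: "strong internal cohesion" is taken as cohesion of at least 0.5, "multiple" stale bridges as more than one, and "more than 5%" as an absolute margin of 0.05 below target. Adjust these to match the actual run state.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Hypothetical per-run summary used to evaluate the quality gates."""
    module_sizes: dict[str, int]
    module_cohesion: dict[str, float]
    contradictory_edge_ratio: float  # share of edges flagged as unidirectional contradictions
    stale_bridges: int
    macro_cycles: int
    confidence: float
    target: float
    improving: bool                  # whether recent epochs show a structural trend


def failed_gates(s, max_files=200, min_cohesion=0.5, max_contradictions=0.20, slack=0.05):
    """Return the names of every violated gate; an empty list means the run may proceed."""
    failed = []
    if any(size > max_files and s.module_cohesion.get(m, 0.0) < min_cohesion
           for m, size in s.module_sizes.items()):
        failed.append("oversized-incoherent-module")
    if s.contradictory_edge_ratio > max_contradictions:
        failed.append("edge-contradictions")
    if s.stale_bridges > 1:
        failed.append("stale-bridges")
    if s.macro_cycles > 0:
        failed.append("unresolved-macro-cycles")
    if s.confidence < s.target - slack and not s.improving:
        failed.append("confidence-stalled")
    return failed
```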
Failure handling
- If semantic extraction is noisy, tighten token filters and rerun with domain stopwords.
- If cycles proliferate, add explicit interface contracts first, then rerun.
- If bridge count explodes, raise bridge threshold and keep only high-persistence edges.
- If symlink creation fails, emit reference stubs with the same semantics.
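A sketch of the symlink fallback: try a relative symlink and, if the OS refuses (for example, Windows without symlink privileges, or a name that already exists), write a small Markdown stub that points at the same target. The `.ref.md` stub suffix and the `link_or_stub` name are assumptions.

```python
import os
from pathlib import Path


def link_or_stub(target: Path, link: Path) -> str:
    """Create a relative symlink; if the filesystem refuses, write a reference stub instead."""
    link.parent.mkdir(parents=True, exist_ok=True)
    rel = os.path.relpath(target, link.parent)
    try:
        os.symlink(rel, link)
        return "symlink"
    except (OSError, NotImplementedError):
        # Same semantics as the symlink: a small file naming the canonical target.
        stub = link.with_name(link.name + ".ref.md")
        stub.write_text(f"Reference: [{target.name}]({rel})\n", encoding="utf-8")
        return "stub"
```

Using a relative link keeps the output directory relocatable, and the stub stores the same relative path so both forms resolve identically.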
Safe fallback behavior
- If recursive decomposition cannot improve structure, stop and produce a strict audit report instead of over-splitting.
- If path mapping is ambiguous, preserve the original source path and expose an explicit `manual_decision_required` flag.
Reference map
- `/Users/mikhail/Projects/Context-Engineering/Distil/refactor/scripts/rlm_orchestrator.py` is the implementation core.
- `/Users/mikhail/Projects/Context-Engineering/Distil/refactor/references/process-model.md` explains the conceptual mapping.
- `/Users/mikhail/Projects/Context-Engineering/Distil/refactor/references/ulcb1-tree-search.md` explains recursive stopping criteria.
Non-goals
- It does not claim language/semantic perfection.
- It does not auto-run formatters, compilers, or tests.
- It does not guarantee zero context loss; instead, every split records explicit rationale and provenance.
Final requirement
The skill run is complete when the agent can answer:
- What are the modules?
- Why are they grouped?
- How do they depend?
- Where are the bridges and why?
- What part still requires a human decision?
Only then should the pipeline emit its final index and confidence summary.
Schema-aware control model (v2)
- `graph.json` carries a strict contract: `schema_version`, `run`, `modules`, `bridges`, `cycles`, `graph`, `topological_order`, `epochs`, `confidence`, `target_confidence`, `selected_epoch`, `status`, `thoughtbox`.
- `run` is a compact machine control surface: thresholds, recursion limits, and bridge policy.
- `epochs` is the closed-loop audit trail for recursive refinement (with epoch state and transition status).
- Each module includes:
  - `module_id`, `label`, `members`, `members_count`
  - `cohesion`, `atom_density`
  - `outgoing_dependencies`, `incoming_dependencies`
  - `ontology` with `core_terms`, `properties`, `cardinality`
  - `bridge_refs`
- Each bridge includes:
  - `bridge_id`, `source_module`, `target_module`, `strength`
  - `notes`, `justification`, `bridge_file`
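A validator sketch for this contract, assuming `graph.json` has already been parsed with `json.load`. The key lists come from the contract above; the expected types (for example, that `notes` is a string and `cohesion` is numeric) are assumptions, and the authoritative definitions live in the ontology schema reference.

```python
NUMBER = (int, float)
REQUIRED_TOP = ("schema_version", "run", "modules", "bridges", "cycles", "graph",
                "topological_order", "epochs", "confidence", "target_confidence",
                "selected_epoch", "status", "thoughtbox")
# Assumed value types for each required module and bridge key.
MODULE_KEYS = {"module_id": str, "label": str, "members": list, "members_count": int,
               "cohesion": NUMBER, "atom_density": NUMBER,
               "outgoing_dependencies": list, "incoming_dependencies": list,
               "ontology": dict, "bridge_refs": list}
BRIDGE_KEYS = {"bridge_id": str, "source_module": str, "target_module": str,
               "strength": NUMBER, "notes": str, "justification": str, "bridge_file": str}


def _type_name(typ):
    return "number" if typ is NUMBER else typ.__name__


def _check(record, spec, where, errors):
    """Append one error per key that is missing or has an unexpected type."""
    for key, typ in spec.items():
        if not isinstance(record.get(key), typ):
            errors.append(f"{where}.{key}: expected {_type_name(typ)}")


def schema_errors(doc):
    """Collect contract violations in a parsed graph.json; an empty list means it passes."""
    errors = [f"missing top-level key: {k}" for k in REQUIRED_TOP if k not in doc]
    known = set()
    for i, mod in enumerate(doc.get("modules") or []):
        _check(mod, MODULE_KEYS, f"modules[{i}]", errors)
        ontology = mod.get("ontology") if isinstance(mod.get("ontology"), dict) else {}
        errors += [f"modules[{i}].ontology missing {k}"
                   for k in ("core_terms", "properties", "cardinality") if k not in ontology]
        known.add(mod.get("module_id"))
    for i, br in enumerate(doc.get("bridges") or []):
        _check(br, BRIDGE_KEYS, f"bridges[{i}]", errors)
    errors += [f"unknown module in topological_order: {m}"
               for m in doc.get("topological_order") or [] if m not in known]
    return errors
```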
E2E red-team test suite expectations
- Run `analysis` mode first, then `orchestration`.
- Verify outputs exist: `graph.json`, `index.md`, `state.json`, `atlas.jsonl`, `modules/*`, `bridges/*`.
- Verify recursive control adaptation:
  - graph rebuild occurs across epochs (`epochs[].min_similarity` should show adjustment when the target is not met),
  - at least one epoch improves module signature or module counts before convergence,
  - the final `selected_epoch` is selected by confidence progression and structural stability.
- Verify ontology and relational integrity:
  - each module ontology includes `core_terms[].member_coverage` and `properties[].supporting_members`,
  - each bridge includes `support` and `bridge_file` coherence.
- Verify failure-safety invariants:
  - no stale bridge references in `analysis` mode,
  - no path explosion beyond `max_bridges`,
  - deterministic output for repeated runs with the same inputs.
- Verify control stability:
  - `selected_epoch >= 1`,
  - `confidence` should monotonically improve or stabilize across the final state,
  - no stale duplicate bridges.
- Verify schema checks:
  - `modules` entries satisfy required keys and types,
  - `bridges` entries include a pathable `bridge_file`,
  - `topological_order` only contains known module identifiers.
- Verify adversarial resilience:
  - empty input path => `no-source` status,
  - cyclic dependency inputs produce cycle capsule entries,
  - unsupported files are safely ignored,
  - a high-threshold run still emits analyzable structure without crashing.
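Several of the control-stability and failure-safety expectations can be asserted directly on a parsed `graph.json`. This sketch assumes each `epochs[]` entry carries its own `confidence` value and treats two bridges with the same source and target module as duplicates; the function name and messages are hypothetical. Determinism can be checked separately by running the script twice on the same input and comparing the outputs byte for byte.

```python
def stability_violations(doc, max_bridges, eps=1e-9):
    """Flag control-stability and failure-safety violations in a parsed graph.json."""
    problems = []
    if not isinstance(doc.get("selected_epoch"), int) or doc["selected_epoch"] < 1:
        problems.append("selected_epoch must be an integer >= 1")
    # Assumption: each epoch record carries its own "confidence" value.
    series = [e["confidence"] for e in doc.get("epochs", []) if "confidence" in e]
    if any(later < earlier - eps for earlier, later in zip(series, series[1:])):
        problems.append("confidence regressed between epochs")
    bridges = doc.get("bridges", [])
    pairs = [(b.get("source_module"), b.get("target_module")) for b in bridges]
    if len(set(pairs)) != len(pairs):
        problems.append("duplicate bridges")
    if len(bridges) > max_bridges:
        problems.append(f"bridge count {len(bridges)} exceeds max_bridges={max_bridges}")
    return problems
```

Flat confidence between epochs is accepted, matching the "improve or stabilize" wording; only a drop larger than `eps` is flagged.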
Reference map
- `/Users/mikhail/Projects/Context-Engineering/Distil/refactor/references/ontology-schema.md` for schema details.
- `/Users/mikhail/Projects/Context-Engineering/Distil/refactor/references/process-model.md` for the conceptual model.
- `/Users/mikhail/Projects/Context-Engineering/Distil/refactor/references/ulcb1-tree-search.md` for recursive control and stopping conditions.