name: rlm-development description: Guide for developing and contributing to the pyrlm-runtime project. Use when modifying core modules, adding features, fixing bugs, or understanding the architecture of the Recursive Language Model runtime. license: MIT metadata: author: apenab version: "1.0"
RLM Runtime Development Guide
Project Overview
pyrlm-runtime is a minimal runtime for Recursive Language Models (RLMs) based on the MIT CSAIL paper. Instead of feeding full context into an LLM, RLMs treat large context as environment state in a Python REPL. The LLM writes code to inspect, chunk, and recursively query sub-LLMs on smaller pieces.
Architecture
src/pyrlm_runtime/
├── rlm.py # Core RLM loop (entry point: RLM.run())
├── context.py # Immutable Context dataclass with inspection helpers
├── env.py # Sandboxed PythonREPL for code execution
├── policy.py # Execution limits (steps, tokens, subcalls, recursion)
├── cache.py # File-based subcall memoization (.rlm_cache/)
├── trace.py # Execution recording (TraceStep, Trace)
├── router.py # SmartRouter: auto baseline vs RLM selection
├── prompts.py # System prompts (BASE, SUBCALL, RECURSIVE, LLAMA, QWEN)
└── adapters/
├── base.py # ModelAdapter protocol, Usage, ModelResponse
├── openai_compat.py # OpenAI-compatible API wrapper
├── generic_chat.py # HTTP-based generic chat adapter
└── fake.py # Deterministic adapter for testing
Key Design Patterns
1. Protocol-Based Adapters
All model integrations implement the ModelAdapter protocol:
class ModelAdapter(Protocol):
def complete(
self,
messages: list[dict[str, str]],
*,
max_tokens: int = 512,
temperature: float = 0.0,
) -> ModelResponse: ...
When adding a new adapter, implement this protocol. Return ModelResponse(text=..., usage=Usage(...)).
2. Frozen Dataclasses for Immutability
Context, Usage, ModelResponse, and ExecResult are frozen dataclasses. This ensures determinism and thread safety. Never make these mutable.
3. Sandboxed REPL
PythonREPL restricts execution to safe builtins and allowed modules (re, math, json, textwrap). When modifying the REPL:
- Never add
os,sys,subprocess, or network modules to allowed imports - Keep the stdout limit (default 4000 chars) to prevent memory issues
- The
_RegexProxywrapsreto auto-coerce Context/list/tuple inputs to strings
4. The RLM Loop (rlm.py)
The core execution loop in RLM.run():
- Initialize REPL with context as
P(string) andctx(Context object) - Inject helper functions:
peek(),tail(),lenP(),subcall(),ask(), etc. - Loop: call root LLM -> parse response -> execute code or finalize
- LLM outputs Python code (executed in REPL) or
FINAL:/FINAL_VAR:to end - REPL stdout/errors feed back to the LLM on next iteration
5. Policy Enforcement
Policy tracks and limits: steps, subcalls, recursion depth, total tokens, subcall tokens. Each has a corresponding exception (MaxStepsExceeded, etc.). Always respect policy checks when adding new execution paths.
Coding Conventions
Style
- Formatter:
ruff format(line-length=100, double quotes, space indent) - Linter:
ruff check - Type checker:
ty - Python: 3.12+ (use
X | Yunion syntax, notOptional[X])
Commits
- Use conventional commits via commitizen:
feat:,fix:,docs:,refactor:,test:,chore: - Run
uv run ruff format . && uv run ruff check .before committing
Testing
- Framework:
pytest(config inpyproject.toml) - Run:
uv run pytest - Use
FakeAdapterfromadapters/fake.pyfor deterministic tests (no API calls needed) - Test files:
tests/test_*.pymatching source modules
Dependencies
- Only external dependency:
httpx>=0.27 - Dev deps:
pytest,ruff,ty,commitizen - Keep dependencies minimal by design
How to Make Common Changes
Adding a new REPL helper function
- Define the function inside
RLM.run()(it has access tocontext,repl,policy,trace) - Register it with
repl.set("function_name", function) - Document it in the system prompt (
prompts.py) so the LLM knows it exists - Add tests in
tests/test_rlm_loop.pyusingFakeAdapter
Adding a new model adapter
- Create
src/pyrlm_runtime/adapters/your_adapter.py - Implement the
ModelAdapterprotocol (just thecompletemethod) - Return
ModelResponse(text=..., usage=Usage(prompt_tokens=..., completion_tokens=..., total_tokens=...)) - Export from
__init__.pyif it should be part of the public API
Modifying the execution loop
- Read
rlm.pycarefully - the loop has guards, fallbacks, and auto-finalization - Key flags:
repl_executed,subcall_made,invalid_responses,repl_errors - New execution paths must call
policy.check_step()/policy.check_subcall() - Record steps via
trace.add(TraceStep(...))for observability - Test with
FakeAdapterscripted responses to simulate multi-step conversations
Adding a new system prompt variant
- Add the prompt string to
prompts.py - Follow existing patterns (available helpers, output format rules, examples)
- The LLM must know about
FINAL:/FINAL_VAR:syntax to terminate - Pass via
RLM(system_prompt=YOUR_PROMPT)
Modifying Context
Contextis frozen - add new class methods or instance methods, don't change existing signatures- If adding document-level operations, handle both
context_type="string"and"document_list" - Update
metadata()if new fields are needed by prompts - Test in
tests/test_context.py
Public API (exported from __init__.py)
Core: RLM, Context, PythonREPL, ExecResult
Config: Policy, PolicyError, MaxStepsExceeded, MaxSubcallsExceeded, MaxRecursionExceeded, MaxTokensExceeded
Adapters: ModelAdapter, ModelResponse, Usage, FileCache
Routing: SmartRouter, RouterConfig, RouterResult, ExecutionProfile, TraceFormatter
Tracing: Trace, TraceStep
Environment Variables
| Variable | Purpose | Example |
|---|---|---|
LLM_BASE_URL |
API endpoint | http://localhost:11434/v1 |
LLM_MODEL |
Root model name | qwen2.5-coder:14b-instruct |
LLM_SUBCALL_MODEL |
Subcall model (can be smaller) | qwen2.5-coder:7b-instruct |
LLM_API_KEY / OPENAI_API_KEY |
API authentication | |
LLM_LOG_LEVEL |
Logging verbosity | INFO, DEBUG |
PARALLEL_SUBCALLS |
Enable parallel subcalls | 1 |
SHOW_TRAJECTORY |
Show execution trajectory | 1 |
Running Commands
# Run all tests
uv run pytest
# Format code
uv run ruff format .
# Lint
uv run ruff check .
# Type check
uv run ty check
# Build package
uv build
# Run an example
uv run python examples/minimal.py