name: lang1-monkey-interpreter description: >- Authors and extends the lang1 Monkey-style interpreter (Python lexer, Pratt parser, tree-walking evaluator with closures, arrays, builtins, and classes). Use when editing or adding features in cursor-generated/lang1, when the user mentions Monkey/REPL/tokens/parser/evaluator/AST for this project, or when implementing language semantics matching this codebase.
lang1 Monkey-style interpreter
This project is a small interpreter in Python for a Monkey-inspired language: a REPL and file runner with integers, strings, booleans, let, fn, calls, if/else, return, arrays and indexing, comparison and arithmetic operators, classes with new, this, methods, and instance fields. The implementation follows the classic book-style split: tokens → lexer → AST → parser → evaluator, with no bytecode VM—execution is recursive AST interpretation.
Read reference.md for precedence tables, object model details, class/this semantics, builtins, and extension checklists beyond the day-to-day workflow below.
When this skill applies
- Any change under
lang1/that touches language behavior or interpreter internals. - Adding syntax, operators, statements, runtime values, or builtins.
- Debugging parse errors (
parser.errors), runtime errors (ErrorObj), or scoping (closures,this, class envs). - Keeping new code consistent with existing patterns (dataclass AST nodes,
MonkeyObjectsubclasses,Environment).
Project layout and responsibilities
| File | Responsibility |
|---|---|
tokens.py |
TokenType enum, KEYWORDS map, Token (type + literal string). |
lexer.py |
Lexer: single-pass character scanning, next_token(). |
ast_nodes.py |
AST as @dataclass types under Expression / Statement / Node. |
parser.py |
Parser: two-token lookahead, Pratt-style expression parsing, statement parsing. |
evaluator.py |
monkey_eval, runtime objects (*Obj), Environment, builtins. |
main.py |
run(), REPL, run_file(); glues lexer → parser → evaluator. |
examples.monkey |
Example programs (factorial, higher-order functions, classes). |
Run from the lang1 directory:
python main.py # REPL
python main.py examples.monkey # execute file
There is no requirements.txt; the code uses the Python standard library only (enum, dataclasses, typing).
Surface syntax (what programs look like)
Reserved words: let, fn, true, false, if, else, return, class, new, this.
- Bindings:
let name = expr;(trailing semicolon often accepted when present). - Functions:
fn(a, b) { stmts }— body is a brace block; last expression value can propagate from blocks used asifbranches. - Calls:
f(arg1, arg2). - Conditionals:
if (cond) { ... } else { ... }— parentheses around condition are parsed as grouped expression. - Arrays:
[1, 2, 3], indexarr[i]. - Operators:
+ - * / %, comparisons== != < > <= >=, unary!and-, logical truthiness per evaluator (not short-circuit&&/||in the grammar—useif). - Classes:
class Dog {
fn init(name, age) { this.name = name; this.age = age; }
fn bark() { puts("woof"); }
}
let d = new Dog("Rex", 5);
d.bark();
d.name;
Instance construction runs init if defined; fields are dynamic properties on the instance (this.x = ... in init, or external assignment obj.field = expr via AssignStatement).
Architectural rules (do not violate casually)
Single source of truth for keywords
Identifiers are classified in the lexer viaKEYWORDSintokens.py. Adding a reserved word means adding both aTokenTypeand aKEYWORDSentry.AST shape
Nodes are plain dataclasses with optional fields defaulting toNoneor empty lists. Prefer new node types over overloading existing ones with sentinel meanings unless the language truly stays uniform.Parser errors are non-fatal lists
Parser.errorscollects strings;parse_programstill returns aProgramwhen possible.main.runprints errors and skips evaluation—mirror this if you add alternate entrypoints.Evaluation returns
MonkeyObjectmonkey_evalalways returns someMonkeyObject(oftenNULL). UseErrorObjfor user-visible failures; useReturnValueonly to implementreturnunwinding.Environment chain
Lexical scoping:Environment(outer=...)for function bodies and method bodies. Builtins are not stored in every env—they are resolved in_eval_identifierafter env lookup fails.Closures
FunctionObjstores the definingenv. Applying a function createsextended_env = Environment(outer=fn.env)and binds parameters there.
Lexer conventions (lexer.py)
- State:
source,pos, current charch._read_charadvances;_peek_chardoes not consume. - Whitespace: skipped between tokens via
_skip_whitespace(space, tab, newlines). - Identifiers: start with letter or
_, continue with alphanumeric/_; look upKEYWORDSto distinguish keywords fromIDENT. - Numbers: contiguous digits only (no floats in this language).
- Strings: double quotes
"...";_read_stringreturns unescaped content between quotes (no escape sequences in the lexer as written—be explicit if you add them). - Two-char operators:
==,!=,<=,>=use_peek_charafter the first character. - Unknown char: emit
ILLEGALwith that character as literal and advance.
When adding a token: implement the branch in next_token, add TokenType, and if it is alphabetic, add to KEYWORDS only if it should be reserved.
AST conventions (ast_nodes.py)
- Base types:
Node→Expression/Statement. Everything is a dataclass for straightforward inspection and debugging. - Naming: AST types use suffixes
*Literal,*Expression,*Statement,BlockStatement,Program. - Class-related nodes:
ClassStatement,MethodDef,ThisExpression,MemberExpression,NewExpression,AssignStatement(forobj.prop = expr).
Keep fields typed with Optional[...] where the parser may leave them unset on error paths.
Parser conventions (parser.py)
- Initialization: Reads two tokens (
cur_token,peek_token) so both are valid before parsing. - Statements:
_parse_statementswitches onLET,RETURN,CLASS; otherwise expression statement (possibly rewritten toAssignStatement). - Expressions: Pratt parsing with
_parse_expression(precedence),_parse_prefix, infix/postfix via precedence comparison withpeek_token. _expect_peek(t): Ifpeek_tokenist, advances and returnsTrue; else records error and returnsFalse. Many statement parsers returnNoneon failure—do not swallow errors silently without appending toerrors.- Member / call / index: After parsing left-hand expression, higher-precedence operators attach
CallExpression,IndexExpression, orMemberExpressiondepending onpeek_token.
Critical: The while loop in _parse_expression explicitly allows only certain peek_token types for infix continuation. New infix operators must be added to that set and to PRECEDENCES.
Precedence names (LOWEST … MEMBER) are defined at module top; see reference.md for the full table.
Evaluator conventions (evaluator.py)
Runtime types
- Subclass
MonkeyObjectfor every runtime value. Prefer__slots__on concrete classes for clarity and small footprint. - Singletons:
TRUE,FALSE,NULL—use identity where the code comparesobj is TRUEetc. - String form:
__repr__of objects is what the REPL prints when a line evaluates to that value.
Core dispatch
monkey_eval is a large isinstance chain. When adding a node type:
- Add the AST class in
ast_nodes.py. - Add a branch in
monkey_eval(and helper functions if the case is large). - Ensure errors from subexpressions propagate (
ErrorObjshort-circuit).
Control flow
- Program vs block:
_eval_programunwrapsReturnValueto the bare value at top level;_eval_blockpropagatesReturnValueandErrorObjwithout unwrapping return at block end inside functions. if:IfExpressionevaluates condition with_is_truthy, then consequence or alternative block.
Functions and calls
- Literals
fn(...) { ... }produceFunctionObjwith currentenv. _apply_function: ForFunctionObj, extend env with parameters, eval body; if result isReturnValue, return.value. ForBuiltinObj, call the Pythonfn.
Arrays and index
- Literals build
ArrayObj; index requiresIntegerObjindex in range—otherwiseNULL(no out-of-range error).
Classes (high level)
ClassStatementbuildsClassObjwith method tablename → (params, body)andenv= defining environment.new ClassName(args)constructsInstanceObj, optionally runsinitwiththisbound.MemberExpressionreads instance properties or returns a bound method asFunctionObjwiththisin env.- See reference.md for precise
thisandinitbehavior.
Builtins
Register in BUILTINS dict mapping name → BuiltinObj(callable). Implement arity and type checks inside the Python function; return ErrorObj on misuse.
Style and quality bar for new code
- Match existing imports: top-level
from tokens import ...,from ast_nodes import ...with explicit lists where the file already does. - Naming: snake_case functions, PascalCase classes, short private helpers
_eval_*. - No gratuitous abstractions: avoid introducing a visitor framework or IR unless the user asks; the codebase is intentionally direct.
- Comments: Only where non-obvious (e.g. parser quirks). Do not add banner comments or restate the language spec in every file.
- Tests: There is no automated test runner in-repo; if you add tests, prefer a small
pytestorunittestmodule without changing the defaultmain.pybehavior unless requested.
End-to-end workflow for a new language feature
- Specify behavior on paper (syntax, evaluation order, errors).
- Tokens:
TokenType, lexer branch,KEYWORDSif reserved. - AST: new dataclasses; wire
parserimports. - Parser: statement and/or prefix/infix parsing; update precedence and infix whitelist.
- Evaluator:
monkey_eval+ helpers; extendMonkeyObjectif new runtime type. - Example: add a line or function to
examples.monkeydemonstrating the feature. - Manual run:
python main.py examples.monkeyand REPL spot checks.
Pitfalls specific to this codebase
- Semicolons: Often optional; parser looks at
peek_tokenforSEMICOLONbut does not require it everywhere. Do not assume C-style mandatory semicolons. ifas expression: Branches are blocks; the value of anifexpression is the value of the evaluated block (last statement’s value in expression-oriented blocks).- Division by zero: Integer
/and%returnErrorObj("division by zero"). - Method lookup vs field: Instance
propertiesshadow nothing; method binding uses class methods when property not set—see evaluator. - Parser loop whitelist: Forgetting to add a new infix token to
_parse_expression’swhileloop causes the new operator to be ignored (parse ends early).
Additional documentation
- For deeper detail: reference.md (precedence matrix, runtime tables, class semantics, builtins).
- Original notes and intent:
PROMPT.mdin the project root (not authoritative for implementation details).