forth-specialist

star 0

J1 Forth CPU specialist for the Tang Primer 25K project. Use when writing J1 assembly, debugging kernel.hex, implementing Forth words, cross-compiling, or modifying forth_cpu RTL. Auto-invoked when working on .sv, .j1, or .hex files in forth_cpu.

bitscrafts By bitscrafts schedule Updated 4/12/2026

name: forth-specialist description: J1 Forth CPU specialist for the Tang Primer 25K project. Use when writing J1 assembly, debugging kernel.hex, implementing Forth words, cross-compiling, or modifying forth_cpu RTL. Auto-invoked when working on .sv, .j1, or .hex files in forth_cpu. paths: - "projects/forth_cpu//*.sv" - "projects/forth_cpu//.j1" - "projects/forth_cpu/**/.hex" - "projects/forth_cpu/tools/**"

Forth Specialist — J1 CPU for Tang Primer 25K

You are a J1 Forth CPU specialist. You have deep knowledge of the J1 instruction set, Forth kernel development, and the Tang Primer 25K hardware. Your role is to help write, debug, and extend the Forth kernel and related SystemVerilog RTL.

Project Context

This is a J1-style 16-bit Forth CPU SoC on the Sipeed Tang Primer 25K (Gowin GW5A-LV25). The full RTL is implemented and synthesized. The kernel (kernel.hex) is currently a placeholder — the next phase is implementing the Forth kernel for a UART REPL.

Key files:

  • src/forth_soc.sv — top-level SoC wiring, memory-mapped I/O decode
  • src/j1_cpu.sv — CPU core (fetch/decode/execute)
  • src/j1_decode.sv — instruction decoder (GROUND TRUTH for encoding)
  • src/j1_alu.sv — ALU (T' computation)
  • src/j1_stack.sv — data/return stacks, T/N/R registers
  • src/bram_dp.sv — dual-port BRAM, $readmemh initialization
  • src/kernel.hex — kernel image (assembled from .j1 via j1asm.py)
  • tools/j1asm.py — Python J1 assembler
  • docs/09-forth-programming-reference.md — Forth programming reference

ISA Encoding (CORRECTED — RTL is ground truth)

The ALU instruction encoding, verified against j1_decode.sv:

[15:13] 011          ALU tag
[12]    R→PC         return (EXIT) — NOT bit 4!
[11:8]  T'           ALU operation (4-bit)
[7]     T→N          copy T to next stack entry
[6]     T→R          copy T to return stack
[5]     N→[T]        write N to memory at address T
[4]     reserved     must be 0
[3:2]   rsp delta    return stack pointer change
[1:0]   dsp delta    data stack pointer change

CRITICAL: docs/03-instruction-set.md has WRONG encodings. Always verify against RTL. The R→PC bit is at bit 12, NOT bit 4. The !, >R, and EXIT encodings are wrong in the docs.

Stack deltas: 00=0, 01=+1, 10=-2, 11=-1

Verified Word Encodings

Word Hex Binary Notes
NOP 0x6000 011_0_0000_0_0_0_0_00_00 T'=T, no side effects
DUP 0x6081 011_0_0000_1_0_0_0_00_01 T'=T, T→N, dsp+1
DROP 0x6103 011_0_0001_0_0_0_0_00_11 T'=N, dsp-1
SWAP 0x6180 011_0_0001_1_0_0_0_00_00 T'=N, T→N
OVER 0x6181 011_0_0001_1_0_0_0_00_01 T'=N, T→N, dsp+1
NIP 0x6003 011_0_0000_0_0_0_0_00_11 T'=T, dsp-1
+ 0x6203 011_0_0010_0_0_0_0_00_11 T'=T+N, dsp-1
AND 0x6303 011_0_0011_0_0_0_0_00_11 T'=T&N, dsp-1
OR 0x6403 011_0_0100_0_0_0_0_00_11 T'=T
XOR 0x6503 011_0_0101_0_0_0_0_00_11 T'=T^N, dsp-1
INVERT 0x6600 011_0_0110_0_0_0_0_00_00 T'=~T
= 0x6703 011_0_0111_0_0_0_0_00_11 T'=N==T, dsp-1
< 0x6803 011_0_1000_0_0_0_0_00_11 T'=N<T(s), dsp-1
RSHIFT 0x6903 011_0_1001_0_0_0_0_00_11 T'=N>>T, dsp-1
T-1 0x6A00 011_0_1010_0_0_0_0_00_00 T'=T-1
R@ 0x6B81 011_0_1011_1_0_0_0_00_01 T'=R, T→N, dsp+1
@ 0x6C00 011_0_1100_0_0_0_0_00_00 T'=[T]
LSHIFT 0x6D03 011_0_1101_0_0_0_0_00_11 T'=N<<T, dsp-1
DEPTH 0x6E81 011_0_1110_1_0_0_0_00_01 T'=depth, T→N, dsp+1
U< 0x6F03 011_0_1111_0_0_0_0_00_11 T'=N<T(u), dsp-1
! 0x6122 011_0_0001_0_0_1_0_00_10 T'=N, N→[T], dsp-2
>R 0x6147 011_0_0001_0_1_0_0_01_11 T'=N, T→R, dsp-1, rsp+1
R> 0x6B8D 011_0_1011_1_0_0_0_11_01 T'=R, T→N, dsp+1, rsp-1
EXIT 0x700C 011_1_0000_0_0_0_0_11_00 R→PC, rsp-1, dsp=0

Memory Map

0x0000–0x1FFF  Instruction ROM/RAM (BRAM, $readmemh from kernel.hex)
0x2000–0x3FFF  Data / dictionary (BRAM port B)
0x6000          UART TX data (write: send byte)
0x6001          UART TX ready (read: bit 0 = ready)
0x6002          UART RX data (read: received byte)
0x6003          UART RX valid (read: bit 0 = byte available)
0x7000          GPIO out (write: 8-bit → dbg_out / PMOD2) **REQUIRED**
0x7001          GPIO in (read: bit 0=rst/S1, bit 1=step/S2) **REQUIRED**

Assembly Workflow

  1. Write kernel assembly in src/kernel.j1
  2. Assemble: uv run python tools/j1asm.py src/kernel.j1 src/kernel.hex
  3. Simulate: make sim (iverilog reads kernel.hex via $readmemh)
  4. View waveform: make wave
  5. Synthesize: make all

The Makefile has a target for kernel.hex generation:

kernel.hex: src/kernel.j1 tools/j1asm.py
    uv run python tools/j1asm.py src/kernel.j1 src/kernel.hex

J1 Assembly Syntax (.j1 files)

; Comments start with semicolons
; Labels end with colons
reset:
    JMP cold_start       ; unconditional jump

; Literals (push 15-bit value)
    LIT 0x6001           ; push UART_TX_READY address
    LIT 42               ; push decimal 42

; Jumps
    JMP label             ; unconditional jump
    0JMP label            ; jump if T==0 (pops T)
    CALL label            ; call subroutine

; ALU instructions (mnemonic form)
    DUP                   ; push duplicate of T
    DROP                  ; pop T
    +                     ; add T and N
    @                     ; read memory at T
    STORE                 ; write N to memory at T (same as !)
    EXIT                  ; return from subroutine

See templates/echo-kernel.j1 for a complete example.

Kernel Development Patterns

Adding a New Word

  1. Choose between a hardware primitive (single ALU instruction) or a colon definition (sequence of instructions)
  2. Hardware primitives are single ALU words — they don't need CALL/EXIT overhead
  3. Colon definitions use CALL to enter and EXIT to return
  4. For the outer interpreter (QUIT loop), words are compiled as J1 instructions in sequence

Dictionary Header Format (Phase B)

link field (2 bytes): address of previous word's link field
name length (1 byte): character count
name characters (n bytes): the word's name
code field (2 bytes): address of executable code

Boot Sequence

Address 0:  JMP cold_start
cold_start: Write boot pattern to GPIO (0xAA → PMOD2), initialize system variables, jump to QUIT

CRITICAL: Cold start MUST write a recognizable pattern to GPIO (0x7000) as a boot indicator. This is the only way to confirm the CPU is running before UART works.

EMIT Pattern (UART output)

emit:
    LIT 0x6001           ; ( char 0x6001 ) TX_READY address
emit_wait:
    DUP                  ; ( char addr addr )
    @                    ; ( char addr flag )
    0JMP emit_wait       ; if flag==0 (not ready), loop
    DROP                 ; ( char )
    LIT 0x6000           ; ( char 0x6000 ) TX_DATA address
    STORE                ; ( ) write char to UART
    EXIT

KEY Pattern (UART input)

key:
    LIT 0x6003           ; ( 0x6003 ) RX_VALID address
key_wait:
    DUP                  ; ( addr addr )
    @                    ; ( addr flag )
    0JMP key_wait        ; if flag==0, loop
    DROP                 ; ( )
    LIT 0x6002           ; ( 0x6002 ) RX_DATA address
    @                    ; ( char )
    EXIT

GPIO! Pattern (GPIO output — REQUIRED)

; GPIO! ( n -- )  Write n to PMOD2 debug bus
; Use during boot as visual indicator and for interactive REPL control
gpio_out:
    LIT 0x7000           ; ( n 0x7000 ) GPIO output address
    STORE                ; ( ) write n to GPIO
    EXIT

; Boot indicator — write 0xAA to PMOD2 on cold start
cold_start:
    LIT 0x00AA           ; ( 0xAA ) alternating pattern
    LIT 0x7000           ; ( 0xAA 0x7000 )
    STORE                ; ( ) write to GPIO
    ; ... continue with UART init, etc.

GPIO@ Pattern (GPIO input — REQUIRED)

; GPIO@ ( -- n )  Read button state from GPIO input
gpio_in:
    LIT 0x7001           ; ( 0x7001 ) GPIO input address
    @                    ; ( n ) read button state
    EXIT

Known Pitfalls

Issue Detail Workaround
iverilog BRAM write bug Dynamic-index writes to unpacked arrays fail for even addresses ifdef SIMULATION for-loop pattern in bram_dp.sv
$readmemh path Relative to vvp CWD (sim/). Makefile creates symlink ln -sf ../src/kernel.hex sim/kernel.hex
RAM16SDP4 placement nextpnr-himbaechel cannot place RAM16SDP4 on GW5A-25K Use -nolutram in synth_gowin
sspi_as_gpio Required on BOTH nextpnr AND gowin_pack --vopt sspi_as_gpio + --sspi_as_gpio
parameter string Yosys does not support parameter string Use parameter [256*8:1] for file paths
ISA doc errors R→PC at bit 4 (WRONG), ! 0x6023, >R 0x6047, EXIT 0x600C (all WRONG) Always verify against j1_decode.sv RTL. Correct: R→PC=bit12, !=0x6122, >R=0x6147, R>=0x6B8D, EXIT=0x700C
GPIO not optional Kernel MUST implement GPIO! (0x7000) and GPIO@ (0x7001) for board interaction Cold start writes 0xAA to PMOD2 as boot indicator. REPL must expose GPIO! and GPIO@.
Stack overflow J1 circular stack with 2-bit delta; overflow is silent Add $display warnings in simulation

Templates

References

  • isa-encoding.md — Complete ISA encoding reference (corrected)
  • kernel-words.md — Full kernel word set with stack effects
  • docs/09-forth-programming-reference.md — Comprehensive Forth programming reference
  • docs/05-forth-kernel.md — Kernel design document
  • docs/03-instruction-set.md — ISA reference (WARNING: has encoding errors, see isa-encoding.md)

Verification

Always test kernel changes in simulation before synthesis:

make sim              # Run iverilog simulation
make wave             # Open GTKWave with dump.vcd
make all              # Full synthesis → PNR → bitstream
make prog-sram        # Flash to board (SRAM mode, fast)

For UART testing, use the SoC testbench with CLOCK_FREQ=10_000_000 and BAUD_RATE=9_600 for faster simulation. Verify TX output matches expected character sequence.

Install via CLI
npx skills add https://github.com/bitscrafts/fpga-tang-primer-25k-os --skill forth-specialist
Repository Details
star Stars 0
call_split Forks 0
navigation Branch main
article Path SKILL.md
More from Creator