mlx-bitnet-gf3 - SKILL.md Agent Skill

name: mlx-bitnet-gf3
description: 'BitNet 1.58-bit uses ternary weights {-1, 0, +1}, which correspond one-to-one to GF(3) trits.'

mlx-bitnet-gf3 Skill

1.58-bit LLMs on Apple Silicon with GF(3) color integration

Trit: +1 (PLUS - generative)
Color: #38E6AF
Source: exo-explore/mlx-bitnet + Gay.jl

Overview

BitNet 1.58-bit uses ternary weights {-1, 0, +1}, which correspond one-to-one to the three GF(3) trits (with -1 standing for the residue 2):

BitNet Weight   GF(3) Trit   Role
────────────────────────────────────
    -1          MINUS        Inhibitory / Validator
     0          ERGODIC      Neutral / Coordinator
    +1          PLUS         Excitatory / Generator
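In GF(3) the element written -1 is the residue 2, so the table above is just reduction mod 3. A minimal stdlib sketch (the helper names `to_gf3` and `from_gf3` are illustrative, not part of mlx-bitnet or Gay.jl):

```python
def to_gf3(w: int) -> int:
    """Map a BitNet weight in {-1, 0, +1} to its GF(3) residue in {0, 1, 2}."""
    if w not in (-1, 0, 1):
        raise ValueError(f"not a ternary weight: {w}")
    return w % 3  # Python's % is non-negative for a positive modulus: -1 % 3 == 2


def from_gf3(r: int) -> int:
    """Map a GF(3) residue back to the balanced representative in {-1, 0, +1}."""
    r %= 3
    return r - 3 if r == 2 else r


weights = [-1, 0, 1, 1, -1, -1]
residues = [to_gf3(w) for w in weights]
print(residues)  # [2, 0, 1, 1, 2, 2]

# Summing the integer weights and then reducing mod 3 agrees with summing in GF(3)
assert sum(weights) % 3 == sum(residues) % 3
assert [from_gf3(r) for r in residues] == weights
```

The correspondence is between value sets: BitNet's matrix multiplies still use ordinary integer and floating-point arithmetic, not arithmetic mod 3.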

This skill bridges:

MLX BitNet: 1.58-bit inference on Apple Silicon
Gay.jl: Deterministic color generation with GF(3) conservation
QAT: Quantization-Aware Training for ternary networks

Installation

# Clone mlx-bitnet
git clone https://github.com/exo-explore/mlx-bitnet
cd mlx-bitnet
pip install -r requirements.txt

# Convert weights
python convert.py

# Verify
python test_interop.py

Models

Model	Source	Parameters
`1bitLLM/bitnet_b1_58-large`	HuggingFace	~700M
`NousResearch/OLMo-Bitnet-1B`	HuggingFace	1B

GF(3) Weight Visualization

Color each weight by its trit value:

import mlx.core as mx
from gay_mcp import color_at, seed

def visualize_layer_gf3(weights: mx.array, gay_seed: int = 1069):
    """Color ternary weights using Gay.jl deterministic colors."""
    seed(gay_seed)
    
    trit_colors = {
        -1: color_at(gay_seed, 1),  # MINUS color
         0: color_at(gay_seed, 2),  # ERGODIC color
        +1: color_at(gay_seed, 3),  # PLUS color
    }
    
    # Count trit distribution
    counts = {
        -1: int((weights == -1).sum()),
         0: int((weights == 0).sum()),
        +1: int((weights == 1).sum()),
    }
    
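    # GF(3) check: reduce the integer trit sum mod 3. A residue of 0 is
    # necessary for a balanced layer (sum == 0) but does not imply it.
    trit_sum = counts[+1] - counts[-1]  # zero weights contribute nothing
    gf3_residue = trit_sum % 3  # Python's % is non-negative for modulus 3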
    
    return {
        "colors": trit_colors,
        "counts": counts,
        "trit_sum": trit_sum,
        "gf3_residue": gf3_residue,
        "conserved": gf3_residue == 0
    }
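The counting and residue logic can be tried without MLX or `gay_mcp`. A minimal sketch over a plain nested list, assuming the layer is already ternary (the function name `gf3_summary` is hypothetical):

```python
from collections import Counter


def gf3_summary(rows: list[list[int]]) -> dict:
    """Count trits in a ternary weight matrix and check the GF(3) residue."""
    counts = Counter(w for row in rows for w in row)
    if set(counts) - {-1, 0, 1}:
        raise ValueError("weights must already be ternary")
    trit_sum = counts[1] - counts[-1]  # zeros contribute nothing
    residue = trit_sum % 3
    return {
        "counts": {t: counts[t] for t in (-1, 0, 1)},
        "trit_sum": trit_sum,
        "gf3_residue": residue,
        "conserved": residue == 0,
    }


print(gf3_summary([[-1, 0, 1], [1, 1, 0]]))
# {'counts': {-1: 1, 0: 2, 1: 3}, 'trit_sum': 2, 'gf3_residue': 2, 'conserved': False}
```

A layer with `trit_sum == 0` is balanced in the ordinary sense; `conserved` only asks for `trit_sum ≡ 0 (mod 3)`, which is a weaker condition (a sum of 3 also passes).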

Quantization-Aware Training (QAT)

Forward Pass with Ternary Quantization

import mlx.core as mx
import mlx.nn as nn

def quantize_ternary(w: mx.array) -> tuple[mx.array, mx.array]:
    """Quantize weights to {-1, 0, +1} with a per-tensor absmean scale (mean |w|)."""
    scale = mx.abs(w).mean()
    w_normalized = w / (scale + 1e-8)
    w_ternary = mx.clip(mx.round(w_normalized), -1, 1)
    return w_ternary, scale

class BitLinear(nn.Module):
    """1.58-bit linear layer with QAT."""
    
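    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Latent weights in full precision (the trainable parameter)
        self.w_latent = mx.random.normal((out_features, in_features)) * 0.02
    
    def __call__(self, x: mx.array) -> mx.array:
        # Quantize to ternary during the forward pass
        w_ternary, scale = quantize_ternary(self.w_latent)
        # Straight-through estimator: the forward pass sees w_ternary * scale,
        # while gradients reach w_latent as if quantization were the identity
        # (mx.round is piecewise constant, so its own gradient is zero)
        w_q = self.w_latent + mx.stop_gradient(w_ternary * scale - self.w_latent)
        
        # A ternary matmul can be done with additions and subtractions only,
        # followed by a single multiply by scale
        return x @ w_q.T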
    
    def gf3_stats(self) -> dict:
        """Get GF(3) trit distribution."""
        w_ternary, _ = quantize_ternary(self.w_latent)
        return {
            "minus": int((w_ternary == -1).sum()),
            "ergodic": int((w_ternary == 0).sum()),
            "plus": int((w_ternary == 1).sum()),
        }
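The absmean quantization and the "additions only" remark can be checked in plain Python. A sketch assuming a single per-tensor scale mean |w| (as in `quantize_ternary`) and floating-point activations; `quantize_ternary_py` and `ternary_matvec` are illustrative names, not MLX kernels:

```python
def quantize_ternary_py(w: list[list[float]], eps: float = 1e-8):
    """Absmean ternary quantization: round(w / mean|w|) clipped to [-1, 1]."""
    flat = [abs(v) for row in w for v in row]
    scale = sum(flat) / len(flat)
    q = [[max(-1, min(1, round(v / (scale + eps)))) for v in row] for row in w]
    return q, scale


def ternary_matvec(q: list[list[int]], scale: float, x: list[float]) -> list[float]:
    """y = scale * (q @ x) using only additions/subtractions inside each row."""
    out = []
    for row in q:
        acc = 0.0
        for t, xi in zip(row, x):
            if t == 1:
                acc += xi
            elif t == -1:
                acc -= xi
            # t == 0: the input is skipped entirely
        out.append(acc * scale)  # one multiply per output, not per weight
    return out


w = [[0.9, -0.05, -1.2], [0.3, 0.6, -0.4]]
q, scale = quantize_ternary_py(w)
print(q)  # [[1, 0, -1], [1, 1, -1]]

x = [1.0, 2.0, 3.0]
y = ternary_matvec(q, scale, x)

# Same result as an ordinary dense matvec with the dequantized weights
dense = [sum(t * scale * xi for t, xi in zip(row, x)) for row in q]
assert all(abs(a - b) < 1e-9 for a, b in zip(y, dense))
```

Zeros are skipped outright, which is where ternary layers save work beyond avoiding multiplies.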

Straight-Through Estimator (STE)

def ste_ternary(w_latent: mx.array) -> mx.array:
    """STE: forward uses the scaled ternary weights, backward uses the latent ones."""
    w_ternary, scale = quantize_ternary(w_latent)
    
    # Forward value: w_ternary * scale. Gradient: identity w.r.t. w_latent,
    # because the quantization residual is wrapped in stop_gradient.
    return w_latent + mx.stop_gradient(w_ternary * scale - w_latent)
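The effect of the straight-through estimator shows up on a single scalar weight, with autodiff replaced by hand-written derivatives. A toy sketch assuming unit scale and a squared-error loss (`q` and `train` are illustrative names):

```python
def q(w: float) -> int:
    """Ternary quantizer with unit scale: clip(round(w), -1, 1)."""
    return max(-1, min(1, round(w)))


def train(w: float, target: int, lr: float = 0.1, steps: int = 10, ste: bool = True) -> float:
    """Minimize (q(w) - target)^2 by SGD on the latent weight w."""
    for _ in range(steps):
        grad_q = 2 * (q(w) - target)  # dL/dq
        # The true dq/dw is 0 almost everywhere (round is piecewise constant);
        # the STE pretends dq/dw == 1 so the gradient reaches the latent weight.
        dq_dw = 1.0 if ste else 0.0
        w -= lr * grad_q * dq_dw
    return w


w_ste = train(0.2, target=1, ste=True)
w_plain = train(0.2, target=1, ste=False)
print(q(w_ste), q(w_plain))  # 1 0
```

With the STE the latent weight climbs from 0.2 past the 0.5 rounding threshold and the trit flips to +1; without it the latent weight never moves.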

Color-Coded Layer Analysis

def analyze_model_gf3(model, seed: int = 1069):
    """Analyze entire BitNet model through GF(3) lens."""
    from gay_mcp import palette
    
    colors = palette(3, seed=seed)
    layer_stats = []
    
    for name, layer in model.named_modules():
        if hasattr(layer, 'gf3_stats'):
            stats = layer.gf3_stats()
            total = sum(stats.values())
            
            layer_stats.append({
                "name": name,
                "minus_pct": stats["minus"] / total * 100,
                "ergodic_pct": stats["ergodic"] / total * 100,
                "plus_pct": stats["plus"] / total * 100,
                "gf3_sum": -stats["minus"] + stats["plus"],
                "conserved": (-stats["minus"] + stats["plus"]) % 3 == 0
            })
    
    return {
        "colors": {
            "minus": colors[0]["hex"],
            "ergodic": colors[1]["hex"],
            "plus": colors[2]["hex"],
        },
        "layers": layer_stats
    }

Inference with Color Tracing

def generate_with_color_trace(model, tokenizer, prompt: str, max_tokens: int = 100):
    """Generate text greedily, attaching a Gay.jl color (and its trit) to each step."""
    import mlx.core as mx
    from gay_mcp import next_color
    
    tokens = tokenizer.encode(prompt)
    trace = []
    
    for i in range(max_tokens):
        # Forward pass over the whole prefix (no KV cache, so each step
        # recomputes every earlier position)
        logits = model(mx.array([tokens]))
        
        # Greedy decoding: take the most likely next token of batch item 0
        next_token = mx.argmax(logits[0, -1, :]).item()
        tokens.append(next_token)
        
        # Color this generation step
        step_color = next_color()
        trace.append({
            "step": i,
            "token": tokenizer.decode([next_token]),
            "color": step_color["hex"],
            "trit": step_color["trit"]
        })
        
        if next_token == tokenizer.eos_token_id:
            break
    
    # Verify GF(3) conservation across trace
    trit_sum = sum(t["trit"] for t in trace)
    
    return {
        "output": tokenizer.decode(tokens),
        "trace": trace,
        "trit_sum": trit_sum,
        "gf3_conserved": trit_sum % 3 == 0
    }

World → World' Morphism Integration

BitNet weight updates during training are World → World' morphisms:

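def training_step_as_morphism(model, batch, loss_fn, lr: float = 1e-4):
    """Each training step is a World → World' transition.

    loss_fn(model, batch) must return a scalar mx.array.
    """
    import hashlib
    import mlx.core as mx
    import mlx.nn as nn
    from mlx.utils import tree_map
    from gay_mcp import color_at
    
    def latent_snapshot():
        # model.update() assigns new arrays rather than mutating old ones,
        # so holding references is enough to keep the earlier state
        return {
            name: layer.w_latent
            for name, layer in model.named_modules()
            if hasattr(layer, 'w_latent')
        }
    
    # World state before update
    world_before = latent_snapshot()
    
    # Forward + backward: MLX computes gradients functionally (no .backward())
    loss, grads = nn.value_and_grad(model, loss_fn)(model, batch)
    
    # Plain SGD update on every trainable parameter
    model.update(tree_map(lambda p, g: p - lr * g, model.trainable_parameters(), grads))
    mx.eval(model.parameters())
    
    # World state after update
    world_after = latent_snapshot()
    
    # WEV (World Extractable Value), read here as the loss improvement on this batch
    wev = float(loss) - float(loss_fn(model, batch))
    
    # Color the morphism by seed; a SHA-256 digest gives the same index in
    # every process, whereas hash() on str is salted per interpreter run
    digest = hashlib.sha256(str(batch).encode()).digest()
    morphism_color = color_at(1069, int.from_bytes(digest[:8], "big") % 1000)
    
    return {
        "world_before": world_before,
        "world_after": world_after,
        "loss": float(loss),
        "wev": wev,
        "morphism_color": morphism_color
    }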
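A World → World' step moves the latent weights, but the ternary world only changes when a latent weight crosses a rounding threshold of the absmean quantizer. A stdlib sketch that counts trit flips between two latent snapshots (`ternarize` and `trit_flips` are hypothetical helpers, assuming one per-tensor scale):

```python
def ternarize(w: list[float], eps: float = 1e-8) -> list[int]:
    """Absmean ternary quantization of a flat weight list."""
    scale = sum(abs(v) for v in w) / len(w)
    return [max(-1, min(1, round(v / (scale + eps)))) for v in w]


def trit_flips(before: list[float], after: list[float]) -> int:
    """Number of positions whose trit differs after quantizing both snapshots."""
    return sum(a != b for a, b in zip(ternarize(before), ternarize(after)))


before = [0.40, -0.10, 0.90, -0.60]
small_step = [0.30, -0.10, 0.90, -0.60]  # latent change, no trit change
large_step = [0.20, -0.10, 0.90, -0.60]  # first weight drops below 0.5 * scale

print(ternarize(before))                 # [1, 0, 1, -1]
print(trit_flips(before, small_step))    # 0
print(trit_flips(before, large_step))    # 1
```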

GF(3) Balanced Triads

mlx-bitnet-gf3 (+1) ⊗ worlding (-1) ⊗ geb (0) = 0 ✓
mlx-bitnet-gf3 (+1) ⊗ unworlding-involution (+1) ⊗ ? (+1) = 0   (the missing -2 is ≡ +1 mod 3)
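The missing trit in a triad is whatever brings the sum to 0 mod 3, reported in balanced form. A minimal sketch (`balancing_trit` is a hypothetical helper):

```python
def balancing_trit(a: int, b: int) -> int:
    """Trit c in {-1, 0, +1} with a + b + c ≡ 0 (mod 3)."""
    r = (-(a + b)) % 3             # residue in {0, 1, 2}
    return r - 3 if r == 2 else r  # back to the balanced representative


assert balancing_trit(+1, -1) == 0   # PLUS with MINUS needs an ERGODIC partner
assert balancing_trit(+1, +1) == +1  # -2 ≡ +1 (mod 3)
```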

Skill Neighborhood

Skill	Trit	Relationship
mlx-apple-silicon	0	MLX runtime
mlx-jax-splitmix	0	Deterministic RNG
discrete-backprop	-1	Gradient-free ternary
forward-forward-learning	+1	Local learning
gay-mcp	+1	Color generation

Commands

# Run inference with GF(3) trace
just bitnet-infer "Hello world" --trace-gf3

# Analyze layer trit distribution
just bitnet-analyze model.safetensors

# Train with QAT
just bitnet-train --dataset data.jsonl --qat

# Visualize weights as colors
just bitnet-viz layer_0.weights.png


Verified Outputs (Thread T-019ba0f4)

BitNet 1.58-bit Running on Apple Silicon

Model: mlx-community/bitnet-b1.58-2B-4T-4bit
Architecture: LlamaModel, 30 layers
Speed: 96.1 tokens/sec
Memory: 0.76 GB
Weights: ternary {-1, 0, +1} = GF(3) trits

GF(3) Color Trace (seed 1069)

============================================================
BitNet 1.58-bit + GF(3) Color Trace (seed 1069)
============================================================

  [-] #6404C3 A
  [-] #EB6AF7 world
  [-] #7F1747 morphism,
  [+] #4A4744 often
  [o] #3AC4D6 denoted
  [-] #6D2BEE as
  [o] #6BCECC φ,
  [-] #3B194C is
  [+] #9DA895 a
  [o] #1C45D5 concept
  [o] #4ED072 used
  [o] #1F3EA5 in

GF(3) trit sum: -3 mod 3 = 0 ✓ CONSERVED
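The conserved sum can be re-derived from the printed trace: five MINUS, two PLUS and five ERGODIC steps give -5 + 2 = -3 ≡ 0 (mod 3). A small parser for that log format, assuming each line starts with `[-]`, `[o]` or `[+]`:

```python
TRACE = """\
[-] #6404C3 A
[-] #EB6AF7 world
[-] #7F1747 morphism,
[+] #4A4744 often
[o] #3AC4D6 denoted
[-] #6D2BEE as
[o] #6BCECC φ,
[-] #3B194C is
[+] #9DA895 a
[o] #1C45D5 concept
[o] #4ED072 used
[o] #1F3EA5 in
"""

SIGN = {"-": -1, "o": 0, "+": 1}


def trace_trit_sum(trace: str) -> int:
    """Sum the trit markers of a color trace."""
    total = 0
    for line in trace.strip().splitlines():
        marker = line.strip()[1]  # the character between the brackets
        total += SIGN[marker]
    return total


s = trace_trit_sum(TRACE)
print(s, s % 3)  # -3 0
```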

Model Q&A Responses

Q: In one sentence: GF(3) is
A: GF(3) is a finite field with three elements, often denoted as 
   GF(3^2) or F_3^2, which is used in various areas of mathematics 
   and computer science.

Q: Ternary neural network weights mean
A: In the context of a ternary neural network, the weights refer to 
   the parameters that are adjusted during the training process...

Q: World morphism φ: W → W' is
A: A world morphism, often denoted as φ, is a concept used in modal 
   logic and model theory to describe how a structure (or a world) 
   can be transformed in...

Quick Run Commands

# One-liner: run BitNet 1.58-bit generation with mlx_lm (plain generation, no GF(3) trace)
uvx --with mlx --with mlx-lm --with huggingface_hub python << 'EOF'
from mlx_lm import load, generate

model, tokenizer = load('mlx-community/bitnet-b1.58-2B-4T-4bit')
prompt = "GF(3) means"
messages = [{"role": "user", "content": prompt}]
formatted = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=formatted, max_tokens=50, verbose=True)
EOF

Complete Integration Summary

Component	Value
Model	`mlx-community/bitnet-b1.58-2B-4T-4bit`
Architecture	LlamaModel, 30 layers
Speed	96 tokens/sec
Memory	0.76 GB
Bits/weight	≈ 1.58 (log₂ 3 ≈ 1.585)
Weight values	{-1, 0, +1} = GF(3) trits
Seed	1069 (deterministic)
GF(3) Conservation	✓ Verified for the seed-1069 color trace (sum -3, and -3 mod 3 = 0)
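log₂(3) ≈ 1.585 is the information content of one trit. Storage has to round up somewhere: five trits fit in one byte because 3⁵ = 243 ≤ 256, giving 8/5 = 1.6 bits per weight. A sketch of that base-3 packing (an illustrative scheme, not necessarily how mlx-bitnet or the 4-bit MLX checkpoint stores weights; `pack5` and `unpack5` are hypothetical names):

```python
import math


def pack5(trits: list[int]) -> int:
    """Pack 5 trits in {-1, 0, +1} into one byte as a base-3 number."""
    if len(trits) != 5:
        raise ValueError("expected exactly 5 trits")
    value = 0
    for t in reversed(trits):       # trits[0] ends up as the least significant digit
        value = value * 3 + (t + 1)  # shift {-1, 0, +1} to digits {0, 1, 2}
    return value                     # at most 3**5 - 1 = 242


def unpack5(byte: int) -> list[int]:
    """Inverse of pack5."""
    trits = []
    for _ in range(5):
        byte, digit = divmod(byte, 3)
        trits.append(digit - 1)
    return trits


print(round(math.log2(3), 4))  # 1.585
w = [1, -1, 0, 0, 1]
b = pack5(w)
assert 0 <= b < 256 and unpack5(b) == w
```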

World Morphism Bridge

φ  : World  → World'   (geb + world-hopping + anoma-intents)
φ⁻¹: World' → World    (unworlding-involution + worlding)

BitNet training step = GF(3)-conserving World morphism
Each weight update: trit_before → trit_after with Σ ≡ 0 (mod 3)

Loaded Skills (GF(3) Balanced)

Skill	Trit	Color	Purpose
world-hopping	0	`#A590DA`	Kripke/Badiou accessibility
worlding	-1	`#633ECE`	`world_` persistent state
world-extractable-value	0	`#98DE8E`	WEV = PoA - 1
geb	+1	`#57DBC0`	Categorical morphism semantics
unworlding-involution	+1	`#84A1DF`	ι∘ι = id (inverse)
mlx-bitnet-gf3	+1	`#38E6AF`	1.58-bit = GF(3) trit

Trit: +1 (PLUS - generative)
Key Insight: 1.58-bit weights are GF(3) trits; a three-valued weight carries log₂(3) ≈ 1.58 bits
Thread: T-019ba0f4-31a2-77bd-b442-79a0944f3caa