name: mlx-bitnet-gf3
description: 'BitNet 1.58-bit uses ternary weights {-1, 0, +1}, identical to GF(3) trits'
mlx-bitnet-gf3 Skill
1.58-bit LLMs on Apple Silicon with GF(3) color integration
Trit: +1 (PLUS - generative)
Color: #38E6AF
Source: exo-explore/mlx-bitnet + Gay.jl
Overview
BitNet 1.58-bit uses ternary weights {-1, 0, +1} — identical to GF(3) trits:
BitNet Weight   GF(3) Trit   Role
────────────────────────────────────
     -1         MINUS        Inhibitory / Validator
      0         ERGODIC      Neutral / Coordinator
     +1         PLUS         Excitatory / Generator
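The correspondence in the table can be made concrete with a few lines of plain Python. This is an illustrative sketch, not code from mlx-bitnet or Gay.jl; the helper names `to_residue`, `to_trit`, `gf3_add` and `gf3_mul` are hypothetical. It maps balanced trits to the residues {0, 1, 2} and back, and shows where GF(3) arithmetic agrees with ordinary integer arithmetic on weights (multiplication) and where it does not (addition wraps around).

```python
# Balanced ternary weights {-1, 0, +1} viewed as elements of GF(3) = Z/3Z.
TRIT_NAMES = {-1: "MINUS", 0: "ERGODIC", +1: "PLUS"}

def to_residue(trit: int) -> int:
    """Map a balanced trit to its residue in {0, 1, 2} (so -1 becomes 2)."""
    return trit % 3

def to_trit(residue: int) -> int:
    """Map any integer back to the balanced representative in {-1, 0, +1}."""
    r = residue % 3
    return r - 3 if r == 2 else r

def gf3_add(a: int, b: int) -> int:
    return to_trit(to_residue(a) + to_residue(b))

def gf3_mul(a: int, b: int) -> int:
    return to_trit(to_residue(a) * to_residue(b))

# PLUS + PLUS wraps around to MINUS in GF(3), unlike integer addition.
print(TRIT_NAMES[gf3_add(+1, +1)])
# MINUS * MINUS is PLUS, the same as integer multiplication.
print(TRIT_NAMES[gf3_mul(-1, -1)])
```

Multiplying two trits gives the same result in GF(3) and in the integers, which is why the product of a ternary weight and a sign behaves identically in both views. Sums differ: a matrix multiply accumulates ordinary integers, and only the residue of that sum mod 3 is a GF(3) quantity.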
This skill bridges:
- MLX BitNet: 1.58-bit inference on Apple Silicon
- Gay.jl: Deterministic color generation with GF(3) conservation
- QAT: Quantization-Aware Training for ternary networks
Installation
# Clone mlx-bitnet
git clone https://github.com/exo-explore/mlx-bitnet
cd mlx-bitnet
pip install -r requirements.txt
# Convert weights
python convert.py
# Verify
python test_interop.py
Models
| Model | Source | Size |
|---|---|---|
| 1bitLLM/bitnet_b1_58-large | HuggingFace | ~3B |
| NousResearch/OLMo-Bitnet-1B | HuggingFace | 1B |
GF(3) Weight Visualization
Color each weight by its trit value:
import mlx.core as mx
from gay_mcp import color_at, seed

def visualize_layer_gf3(weights: mx.array, gay_seed: int = 1069):
    """Color ternary weights using Gay.jl deterministic colors."""
    seed(gay_seed)
    trit_colors = {
        -1: color_at(gay_seed, 1),  # MINUS color
        0: color_at(gay_seed, 2),   # ERGODIC color
        +1: color_at(gay_seed, 3),  # PLUS color
    }

    # Count trit distribution (.item() converts a scalar mx.array to a Python int)
    counts = {
        -1: (weights == -1).sum().item(),
        0: (weights == 0).sum().item(),
        +1: (weights == 1).sum().item(),
    }

    # GF(3) conservation check: the signed trit sum must be divisible by 3.
    # A balanced layer (equal MINUS and PLUS counts) has a sum of exactly 0.
    trit_sum = counts[+1] - counts[-1]
    gf3_residue = trit_sum % 3

    return {
        "colors": trit_colors,
        "counts": counts,
        "trit_sum": trit_sum,
        "gf3_residue": gf3_residue,
        "conserved": gf3_residue == 0,
    }
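To see what the `conserved` flag returned by `visualize_layer_gf3` does and does not tell you, here is a dependency-free sketch of the same bookkeeping on plain Python lists. The helper `trit_summary` is hypothetical and stands in for the MLX version; it assumes the weights are already ternary.

```python
def trit_summary(trits: list[int]) -> dict:
    """Count trits and compute the signed sum and its residue mod 3."""
    counts = {t: trits.count(t) for t in (-1, 0, 1)}
    trit_sum = counts[1] - counts[-1]
    residue = trit_sum % 3  # Python's % always returns a value in {0, 1, 2}
    return {
        "counts": counts,
        "trit_sum": trit_sum,
        "gf3_residue": residue,
        "conserved": residue == 0,
    }

balanced = [-1, 0, 1, 1, -1, 0]   # equal MINUS and PLUS counts, sum 0
skewed = [1, 1, 1, 0]             # no MINUS at all, but sum 3 is divisible by 3
off_by_two = [1, 1, 0, -1, 1]     # sum 2, residue 2

for name, layer in [("balanced", balanced), ("skewed", skewed), ("off_by_two", off_by_two)]:
    print(name, trit_summary(layer))
```

The `skewed` case is the one to keep in mind: a residue of 0 means the signed sum is a multiple of 3, which is a weaker condition than having equal numbers of MINUS and PLUS weights.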
Quantization-Aware Training (QAT)
Forward Pass with Ternary Quantization
import mlx.core as mx
import mlx.nn as nn

def quantize_ternary(w: mx.array) -> tuple[mx.array, mx.array]:
    """Quantize weights to {-1, 0, +1} with a per-tensor absmean scale."""
    scale = mx.abs(w).mean()
    w_normalized = w / (scale + 1e-8)
    w_ternary = mx.clip(mx.round(w_normalized), -1, 1)
    return w_ternary, scale

class BitLinear(nn.Module):
    """1.58-bit linear layer with QAT."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Latent weights in full precision (for training)
        self.w_latent = mx.random.normal((out_features, in_features)) * 0.02
        # Placeholder: the scale is recomputed from w_latent on every forward pass
        self.scale = mx.ones((1,))

    def __call__(self, x: mx.array) -> mx.array:
        # Quantize to ternary during forward.
        # round() has zero gradient almost everywhere, so for training the
        # latent weights should go through a straight-through estimator
        # (see ste_ternary).
        w_ternary, scale = quantize_ternary(self.w_latent)
        # Matrix multiply with ternary weights
        # Note: ternary matmul can be optimized to additions only!
        return x @ (w_ternary.T * scale)

    def gf3_stats(self) -> dict:
        """Get GF(3) trit distribution."""
        w_ternary, _ = quantize_ternary(self.w_latent)
        return {
            "minus": (w_ternary == -1).sum().item(),
            "ergodic": (w_ternary == 0).sum().item(),
            "plus": (w_ternary == 1).sum().item(),
        }
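A worked example helps check the quantizer by hand and shows what "additions only" means for the matrix multiply. The sketch below mirrors `quantize_ternary` on plain Python lists; `quantize_ternary_list` and `ternary_dot` are hypothetical names, and the input values are made up for illustration.

```python
def quantize_ternary_list(w: list[float]) -> tuple[list[int], float]:
    """Absmean quantization: divide by mean |w|, round, clip to [-1, 1]."""
    scale = sum(abs(v) for v in w) / len(w)
    trits = [max(-1, min(1, round(v / (scale + 1e-8)))) for v in w]
    return trits, scale

def ternary_dot(trits: list[int], x: list[float]) -> float:
    """Dot product with ternary weights using only additions and subtractions."""
    acc = 0.0
    for t, xi in zip(trits, x):
        if t == 1:
            acc += xi      # PLUS: add the activation
        elif t == -1:
            acc -= xi      # MINUS: subtract the activation
        # ERGODIC (0): skip the activation entirely
    return acc

w = [0.9, -0.05, -1.4, 0.3]
x = [2.0, 5.0, 1.0, -3.0]

trits, scale = quantize_ternary_list(w)
print(trits)                          # [1, 0, -1, 0]
print(scale * ternary_dot(trits, x))  # one multiply by the scale at the end
```

Here the mean absolute value is 2.65 / 4 = 0.6625, so the normalized weights are roughly 1.36, -0.08, -2.11 and 0.45, which round and clip to 1, 0, -1 and 0. The only multiplication left is the final rescaling of the accumulated sum.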
Straight-Through Estimator (STE)
def ste_ternary(w_latent: mx.array) -> mx.array:
    """STE: forward uses quantized, backward uses latent."""
    w_ternary, _ = quantize_ternary(w_latent)
    # Forward value equals w_ternary (the scale is not applied here);
    # the gradient with respect to w_latent is the identity.
    return mx.stop_gradient(w_ternary - w_latent) + w_latent
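Why the straight-through estimator matters can be shown with a one-weight toy problem in plain Python. This is a minimal sketch under simplifying assumptions: a single scalar weight, no scale factor, a squared-error loss, and hand-written gradients instead of MLX autodiff. The names `quantize` and `train` are hypothetical.

```python
def quantize(w: float) -> int:
    """Round to the nearest integer and clip to {-1, 0, +1}."""
    return max(-1, min(1, round(w)))

def train(w: float, x: float, y: float, lr: float, steps: int, use_ste: bool) -> float:
    """Minimize L = (quantize(w) * x - y)^2 with respect to the latent weight w."""
    for _ in range(steps):
        q = quantize(w)
        grad_q = 2.0 * (q * x - y) * x  # dL/dq
        # The true dq/dw is zero almost everywhere, so without STE nothing moves.
        # STE pretends dq/dw = 1 and passes dL/dq straight through to w.
        grad_w = grad_q if use_ste else 0.0
        w -= lr * grad_w
    return w

# The target needs a weight of +1, but the latent weight starts at 0.2 (trit 0).
w_ste = train(0.2, x=1.0, y=1.0, lr=0.1, steps=5, use_ste=True)
w_plain = train(0.2, x=1.0, y=1.0, lr=0.1, steps=5, use_ste=False)

print(quantize(w_ste))    # 1: the latent weight crossed the rounding threshold
print(quantize(w_plain))  # 0: the latent weight never changed
```

With STE the latent weight moves 0.2 → 0.4 → 0.6, at which point it quantizes to +1 and the loss gradient vanishes. Without it the weight stays at its initial value for every step.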
Color-Coded Layer Analysis
def analyze_model_gf3(model, seed: int = 1069):
    """Analyze entire BitNet model through GF(3) lens."""
    from gay_mcp import palette

    colors = palette(3, seed=seed)
    layer_stats = []
    for name, layer in model.named_modules():
        if hasattr(layer, 'gf3_stats'):
            stats = layer.gf3_stats()
            total = sum(stats.values())
            gf3_sum = stats["plus"] - stats["minus"]
            layer_stats.append({
                "name": name,
                "minus_pct": stats["minus"] / total * 100,
                "ergodic_pct": stats["ergodic"] / total * 100,
                "plus_pct": stats["plus"] / total * 100,
                "gf3_sum": gf3_sum,
                "conserved": gf3_sum % 3 == 0,
            })
    return {
        "colors": {
            "minus": colors[0]["hex"],
            "ergodic": colors[1]["hex"],
            "plus": colors[2]["hex"],
        },
        "layers": layer_stats,
    }
Inference with Color Tracing
def generate_with_color_trace(model, tokenizer, prompt: str, max_tokens: int = 100):
    """Generate text while tracing GF(3) activations."""
    import mlx.core as mx
    from gay_mcp import next_color

    tokens = tokenizer.encode(prompt)
    trace = []
    for i in range(max_tokens):
        # Forward pass
        logits = model(mx.array([tokens]))
        # Greedy decoding: pick the most likely next token
        next_token = mx.argmax(logits[:, -1, :], axis=-1).item()
        tokens.append(next_token)
        # Color this generation step
        step_color = next_color()
        trace.append({
            "step": i,
            "token": tokenizer.decode([next_token]),
            "color": step_color["hex"],
            "trit": step_color["trit"],
        })
        if next_token == tokenizer.eos_token_id:
            break

    # Verify GF(3) conservation across trace
    trit_sum = sum(t["trit"] for t in trace)
    return {
        "output": tokenizer.decode(tokens),
        "trace": trace,
        "trit_sum": trit_sum,
        "gf3_conserved": trit_sum % 3 == 0,
    }
World → World' Morphism Integration
BitNet weight updates during training are World → World' morphisms:
def training_step_as_morphism(model, batch, lr: float = 1e-4):
    """Each training step is a World → World' transition."""
    import zlib
    import mlx.core as mx
    import mlx.nn as nn
    from mlx.utils import tree_map
    from gay_mcp import color_at

    def snapshot():
        # Updates rebind parameters to new arrays, so holding references
        # to the current arrays preserves this state.
        return {
            name: layer.w_latent
            for name, layer in model.named_modules()
            if hasattr(layer, 'w_latent')
        }

    # World state before update
    world_before = snapshot()

    # Forward + backward. MLX has no loss.backward(); gradients come from a
    # function transform. compute_loss(model, batch) is user-supplied and
    # must return a scalar mx.array.
    loss, grads = nn.value_and_grad(model, compute_loss)(model, batch)

    # Plain SGD step on all trainable parameters
    model.update(tree_map(lambda p, g: p - lr * g,
                          model.trainable_parameters(), grads))
    mx.eval(model.parameters())

    # World state after update
    world_after = snapshot()

    # WEV (World Extractable Value) proxy: the pre-update loss
    # (lower loss = more value extracted)
    wev = loss.item()

    # Color the morphism by a batch-derived index. zlib.crc32 is stable
    # across runs, unlike the built-in hash() on strings.
    morphism_color = color_at(1069, zlib.crc32(str(batch).encode()) % 1000)

    return {
        "world_before": world_before,
        "world_after": world_after,
        "loss": loss.item(),
        "wev": wev,
        "morphism_color": morphism_color,
    }
GF(3) Balanced Triads
mlx-bitnet-gf3 (+1) ⊗ worlding (-1) ⊗ geb (0) = 0 ✓
mlx-bitnet-gf3 (+1) ⊗ unworlding-involution (+1) ⊗ ? (-2 ≡ +1 mod 3) = 0
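The balance condition on a triad is just "the three trits sum to 0 mod 3", so it can be checked mechanically, and the trit needed to complete an open triad can be computed. The sketch below is illustrative; `is_balanced` and `completing_trit` are hypothetical helpers, and the trit values are the ones written in the two triads above.

```python
def to_trit(n: int) -> int:
    """Reduce any integer to its balanced representative in {-1, 0, +1}."""
    r = n % 3
    return r - 3 if r == 2 else r

def is_balanced(*trits: int) -> bool:
    """A triad is balanced when its trits sum to 0 mod 3."""
    return sum(trits) % 3 == 0

def completing_trit(a: int, b: int) -> int:
    """Return the trit c such that a + b + c is 0 mod 3."""
    return to_trit(-(a + b))

# First triad: mlx-bitnet-gf3 (+1), worlding (-1), geb (0)
print(is_balanced(+1, -1, 0))

# Second triad: two +1 skills need a third trit of -2, which is +1 mod 3
print(completing_trit(+1, +1))
```

Since -2 is not itself a trit, the open slot in the second triad can only be filled by a skill whose trit is +1.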
Skill Neighborhood
| Skill | Trit | Relationship |
|---|---|---|
| mlx-apple-silicon | 0 | MLX runtime |
| mlx-jax-splitmix | 0 | Deterministic RNG |
| discrete-backprop | -1 | Gradient-free ternary |
| forward-forward-learning | +1 | Local learning |
| gay-mcp | +1 | Color generation |
Commands
# Run inference with GF(3) trace
just bitnet-infer "Hello world" --trace-gf3
# Analyze layer trit distribution
just bitnet-analyze model.safetensors
# Train with QAT
just bitnet-train --dataset data.jsonl --qat
# Visualize weights as colors
just bitnet-viz layer_0.weights.png
References
- BitNet: Scaling 1-bit Transformers
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
- exo-explore/mlx-bitnet
- 1bitLLM on HuggingFace
- Gay.jl GF(3) Documentation
Verified Outputs (Thread T-019ba0f4)
BitNet 1.58-bit Running on Apple Silicon
Model: mlx-community/bitnet-b1.58-2B-4T-4bit
Architecture: LlamaModel, 30 layers
Speed: 96.1 tokens/sec
Memory: 0.76 GB
Weights: ternary {-1, 0, +1} = GF(3) trits
GF(3) Color Trace (seed 1069)
============================================================
BitNet 1.58-bit + GF(3) Color Trace (seed 1069)
============================================================
[-] #6404C3 A
[-] #EB6AF7 world
[-] #7F1747 morphism,
[+] #4A4744 often
[o] #3AC4D6 denoted
[-] #6D2BEE as
[o] #6BCECC φ,
[-] #3B194C is
[+] #9DA895 a
[o] #1C45D5 concept
[o] #4ED072 used
[o] #1F3EA5 in
GF(3) trit sum: -3 mod 3 = 0 ✓ CONSERVED
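The reported sum can be re-derived from the trace itself. The sketch below copies the twelve trace lines verbatim and assumes the markers mean `[-]` = -1, `[o]` = 0 and `[+]` = +1; the parser `parse_trace` is a hypothetical helper, not part of the tooling that produced the trace.

```python
TRACE = """\
[-] #6404C3 A
[-] #EB6AF7 world
[-] #7F1747 morphism,
[+] #4A4744 often
[o] #3AC4D6 denoted
[-] #6D2BEE as
[o] #6BCECC φ,
[-] #3B194C is
[+] #9DA895 a
[o] #1C45D5 concept
[o] #4ED072 used
[o] #1F3EA5 in
"""

SYMBOL_TO_TRIT = {"-": -1, "o": 0, "+": 1}

def parse_trace(text: str) -> list[tuple[int, str, str]]:
    """Turn '[marker] #hex token' lines into (trit, color, token) tuples."""
    rows = []
    for line in text.splitlines():
        marker, color, token = line.split(maxsplit=2)
        rows.append((SYMBOL_TO_TRIT[marker[1]], color, token))
    return rows

rows = parse_trace(TRACE)
trit_sum = sum(trit for trit, _, _ in rows)
print(trit_sum, trit_sum % 3)  # -3 0
```

Five MINUS steps and two PLUS steps give -5 + 2 = -3, which is divisible by 3, matching the line above.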
Model Q&A Responses
Q: In one sentence: GF(3) is
A: GF(3) is a finite field with three elements, often denoted as
GF(3^2) or F_3^2, which is used in various areas of mathematics
and computer science.
Q: Ternary neural network weights mean
A: In the context of a ternary neural network, the weights refer to
the parameters that are adjusted during the training process...
Q: World morphism φ: W → W' is
A: A world morphism, often denoted as φ, is a concept used in modal
logic and model theory to describe how a structure (or a world)
can be transformed in...
Quick Run Commands
# One-liner: Run BitNet with GF(3) trace
uvx --with mlx --with mlx-lm --with huggingface_hub python << 'EOF'
from mlx_lm import load, generate
model, tokenizer = load('mlx-community/bitnet-b1.58-2B-4T-4bit')
prompt = "GF(3) means"
messages = [{"role": "user", "content": prompt}]
formatted = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=formatted, max_tokens=50, verbose=True)
EOF
Complete Integration Summary
| Component | Value |
|---|---|
| Model | mlx-community/bitnet-b1.58-2B-4T-4bit |
| Architecture | LlamaModel, 30 layers |
| Speed | 96 tokens/sec |
| Memory | 0.76 GB |
| Bits/weight | 1.58 ≈ log₂(3) |
| Weight values | {-1, 0, +1} = GF(3) trits |
| Seed | 1069 (deterministic) |
| GF(3) Conservation | ✓ Verified (sum mod 3 = 0) |
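The bits-per-weight figure in the table is the information content of one three-valued symbol. The sketch below computes it and, as an illustration only, shows one well-known way to get close to that bound in practice: five trits fit in one byte because 3^5 = 243 ≤ 256. The helpers `pack5` and `unpack5` are hypothetical and are not a description of how mlx-bitnet or the model above stores its weights.

```python
import math

bits_per_trit = math.log2(3)
print(round(bits_per_trit, 3))  # 1.585

def pack5(trits: list[int]) -> int:
    """Pack five balanced trits into one integer in [0, 242] (base-3 digits)."""
    value = 0
    for t in reversed(trits):
        value = value * 3 + (t + 1)  # shift {-1, 0, +1} to digits {0, 1, 2}
    return value

def unpack5(byte: int) -> list[int]:
    """Inverse of pack5: recover the five balanced trits."""
    trits = []
    for _ in range(5):
        trits.append(byte % 3 - 1)
        byte //= 3
    return trits

example = [1, -1, 0, 0, 1]
packed = pack5(example)
print(packed, unpack5(packed))
```

Packing five trits per byte costs 8 / 5 = 1.6 bits per weight, slightly above the theoretical 1.585.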
World Morphism Bridge
φ : World → World' (geb + world-hopping + anoma-intents)
φ⁻¹: World' → World (unworlding-involution + worlding)
BitNet training step = GF(3)-conserving World morphism
Each weight update: trit_before → trit_after with Σ ≡ 0 (mod 3)
Loaded Skills (GF(3) Balanced)
| Skill | Trit | Color | Purpose |
|---|---|---|---|
| world-hopping | 0 | #A590DA | Kripke/Badiou accessibility |
| worlding | -1 | #633ECE | world_ persistent state |
| world-extractable-value | 0 | #98DE8E | WEV = PoA - 1 |
| geb | +1 | #57DBC0 | Categorical morphism semantics |
| unworlding-involution | +1 | #84A1DF | ι∘ι = id (inverse) |
| mlx-bitnet-gf3 | +1 | #38E6AF | 1.58-bit = GF(3) trit |
Trit: +1 (PLUS - generative)
Key Insight: 1.58-bit weights ARE GF(3) trits — log₂(3) ≈ 1.58
Thread: T-019ba0f4-31a2-77bd-b442-79a0944f3caa