nousnet-distributed-training


Integrate the Psyche/nousnet distributed training protocol into Experience mesh for federated SLM training across untrusted LAN/internet peers. Apache 2.0, Rust-primary (77.8%). Key mechanisms: DisTrO optimizer (DCT-TopK sparse gradient compression), QUIC+gossip P2P via iroh, witness quorum integrity (bloom-filter proofs, 2/3 majority), Solana compute-point rewards. Use when planning federated training across the Experience mesh (e.g. 2x6GB GPU peers sharing training load for a Magicborn SLM). Foundation tier — this is a strategic platform capability, not a one-off script.

By B2Gdevs. Updated 6/8/2026

name: nousnet-distributed-training
description: >-
  Integrate the Psyche/nousnet distributed training protocol into Experience
  mesh for federated SLM training across untrusted LAN/internet peers.
  Apache 2.0, Rust-primary (77.8%). Key mechanisms: DisTrO optimizer
  (DCT-TopK sparse gradient compression), QUIC+gossip P2P via iroh, witness
  quorum integrity (bloom-filter proofs, 2/3 majority), Solana compute-point
  rewards. Use when planning federated training across the Experience mesh
  (e.g. 2x6GB GPU peers sharing training load for a Magicborn SLM).
  Foundation tier: this is a strategic platform capability, not a one-off
  script.
status: stable
lane: [slm, training, mesh, infra]
type: platform-integration
trigger: >-
  "nousnet", "psyche distributed training", "federated training", "DisTrO",
  "mesh training", "p2p training", "gradient compression",
  "experience training"
source-ingest: .planning/ingest/2026-06-07-article-github-com-psychefoundation-nousnet.md
port-score: strategic (Apache 2.0, active port candidate)
argument-hint: --mode centralized|decentralized --peers

nousnet-distributed-training

Federated transformer training across Experience mesh peers. Based on PsycheFoundation/nousnet (Apache 2.0, Rust-primary).

Adoption type: wrap + integrate — Apache 2.0 allows direct use. Do NOT fork unless adding mesh-specific extensions (e.g. Experience economy token integration). Consume the published crate or build from source.

Architecture (Psyche/nousnet)

   Operator Peer A (6GB GPU)          Operator Peer B (6GB GPU)
   ┌─────────────────────────┐         ┌─────────────────────────┐
   │  Data shard (disjoint)  │         │  Data shard (disjoint)  │
   │  Forward + backward     │         │  Forward + backward     │
   │  DisTrO optimizer       │         │  DisTrO optimizer       │
   │  → sparse DCT result    │ ← iroh → │  → sparse DCT result   │
   └─────────────────────────┘   QUIC  └─────────────────────────┘
              │                                   │
              └──────────── Coordinator ──────────┘
                         (TCP or Solana)
                     Witness quorum (2/3 bloom-filter)

DisTrO optimizer (the key bandwidth savings)

Wire format per training round:

// crates/shared/modeling/src/distro.rs
struct SerializedDistroResult {
    sparse_idx: Vec<u32>,   // TopK DCT component indices
    sparse_val: Vec<f32>,   // TopK DCT component amplitudes
    xshape: Vec<u16>,       // original tensor shape
    totalk: u32,            // total K selected
}

struct TransmittableDistroResult {
    step: u64,              // training step this result belongs to
    trainer_nonce: [u8; 32],
    batch_id: u64,          // batch the gradients were computed on
    distro_results: Vec<SerializedDistroResult>, // one entry per tensor
}

Bandwidth reduction: only the TopK DCT components are transmitted, not the full gradients.

Integrity: each result carries a SHA-256 hash over (step, batch_id, sparse tensors).
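A minimal sketch of the TopK step described above, assuming the DCT coefficients of one tensor are already available as a flat slice. The names `SparseResult`, `top_k_sparse`, `densify` and `payload_bytes` are hypothetical and only mirror the wire fields listed earlier; they are not the Psyche API, and the byte count ignores any serialization framing.

```rust
/// Hypothetical mirror of the wire fields listed above (not the Psyche type).
#[derive(Debug, Clone, PartialEq)]
pub struct SparseResult {
    pub sparse_idx: Vec<u32>, // indices of the kept coefficients
    pub sparse_val: Vec<f32>, // amplitudes of the kept coefficients
    pub xshape: Vec<u16>,     // original tensor shape
    pub totalk: u32,          // number of coefficients kept
}

/// Keep the `k` largest-magnitude coefficients of a flattened tensor.
pub fn top_k_sparse(coeffs: &[f32], xshape: &[u16], k: usize) -> SparseResult {
    let k = k.min(coeffs.len());
    // Order indices by descending magnitude; ties go to the lower index.
    let mut order: Vec<usize> = (0..coeffs.len()).collect();
    order.sort_by(|&a, &b| coeffs[b].abs().total_cmp(&coeffs[a].abs()).then(a.cmp(&b)));
    let mut kept: Vec<usize> = order.into_iter().take(k).collect();
    kept.sort_unstable(); // emit indices in ascending order
    SparseResult {
        sparse_idx: kept.iter().map(|&i| i as u32).collect(),
        sparse_val: kept.iter().map(|&i| coeffs[i]).collect(),
        xshape: xshape.to_vec(),
        totalk: k as u32,
    }
}

/// Rebuild a dense vector, with zeros where coefficients were dropped.
pub fn densify(r: &SparseResult) -> Vec<f32> {
    let len: usize = r.xshape.iter().map(|&d| d as usize).product();
    let mut out = vec![0.0f32; len];
    for (&i, &v) in r.sparse_idx.iter().zip(&r.sparse_val) {
        out[i as usize] = v;
    }
    out
}

/// Raw field size in bytes: 4 per index, 4 per value, 2 per shape entry, 4 for totalk.
pub fn payload_bytes(r: &SparseResult) -> usize {
    r.sparse_idx.len() * 4 + r.sparse_val.len() * 4 + r.xshape.len() * 2 + 4
}
```

Each kept coefficient costs 8 bytes (index plus value) against 4 bytes for a dense f32, so under these assumptions the sparse form only pays off when K is well below half the tensor size.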

Coordinator states

Uninitialized → WaitingForMembers → Warmup → RoundTrain → RoundWitness
     → Cooldown → Finished | Paused
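
The state chain above can be sketched as a small enum. This is an illustration, not the coordinator's actual code: `RunState` and `next` are hypothetical names, the `more_rounds` flag (looping from RoundWitness back to RoundTrain) is an assumption that the diagram does not show, and transitions into or out of Paused are left unmodeled because the source does not specify them.

```rust
/// Coordinator states, in the order listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RunState {
    Uninitialized,
    WaitingForMembers,
    Warmup,
    RoundTrain,
    RoundWitness,
    Cooldown,
    Finished,
    Paused,
}

impl RunState {
    /// Next state on the happy path, or `None` when no transition is modeled.
    /// ASSUMPTION: `more_rounds` loops RoundWitness back to RoundTrain;
    /// the source diagram only shows the linear chain.
    pub fn next(self, more_rounds: bool) -> Option<RunState> {
        use RunState::*;
        match self {
            Uninitialized => Some(WaitingForMembers),
            WaitingForMembers => Some(Warmup),
            Warmup => Some(RoundTrain),
            RoundTrain => Some(RoundWitness),
            RoundWitness => Some(if more_rounds { RoundTrain } else { Cooldown }),
            Cooldown => Some(Finished),
            // Finished is terminal; Paused transitions are not specified.
            Finished | Paused => None,
        }
    }
}
```

Encoding the transitions in one `match` means that adding a state produces a compile error until its transition is decided.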

Dual backend:

  • TCP centralized (architectures/centralized/) — simpler; trust required
  • Solana on-chain (architectures/decentralized/) — trustless; rewards via token

Witness quorum (integrity without trust)

  • Bloom filter: 1024-bit (16x u64), 1% false-positive rate
  • Threshold: 2/3 of elected witnesses must submit valid proofs before state advances
  • Max 256 clients, max 32 witnesses per round
  • Client states: Healthy | Dropped | Withdrawn | Ejected
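
The two numeric rules above can be sketched with the standard library alone. Everything named here is hypothetical: `Bloom1024`, `quorum_met`, the double-hashing scheme, and the choice of 7 hash functions are assumptions for illustration, and Psyche's real hashing is not shown in this document. By the standard bloom-filter sizing formula, 1024 bits at a 1% false-positive rate corresponds to roughly 100 inserted entries.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const WORDS: usize = 16; // 16 x u64 = 1024 bits, as stated above
const BITS: u64 = 64 * WORDS as u64;
const NUM_HASHES: u64 = 7; // ASSUMPTION: near-optimal count for a 1% target

/// A 1024-bit bloom filter backed by 16 u64 words.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct Bloom1024 {
    words: [u64; WORDS],
}

fn seeded_hash<T: Hash>(item: &T, seed: u64) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    item.hash(&mut h);
    h.finish()
}

impl Bloom1024 {
    /// Bit positions for an item, derived by double hashing (an assumption).
    fn positions<T: Hash>(item: &T) -> Vec<u64> {
        let h1 = seeded_hash(item, 0);
        let h2 = seeded_hash(item, 1) | 1; // odd stride
        (0..NUM_HASHES)
            .map(|i| h1.wrapping_add(i.wrapping_mul(h2)) % BITS)
            .collect()
    }

    pub fn insert<T: Hash>(&mut self, item: &T) {
        for p in Self::positions(item) {
            self.words[(p / 64) as usize] |= 1u64 << (p % 64);
        }
    }

    /// True if the item may have been inserted; false means definitely not.
    pub fn contains<T: Hash>(&self, item: &T) -> bool {
        Self::positions(item)
            .iter()
            .all(|&p| self.words[(p / 64) as usize] & (1u64 << (p % 64)) != 0)
    }
}

/// 2/3 threshold using integer arithmetic; treats exactly 2/3 as sufficient
/// (an assumption, since the source does not say whether the bound is strict).
pub fn quorum_met(valid_proofs: usize, elected_witnesses: usize) -> bool {
    elected_witnesses > 0 && valid_proofs * 3 >= elected_witnesses * 2
}
```

With the maximum of 32 witnesses, this threshold needs 22 valid proofs; 21 is not enough.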

Experience mesh integration plan

| Psyche component | Experience equivalent | Integration |
|------------------|-----------------------|-------------|
| Coordinator (TCP) | Experience control plane (127.0.0.1:47800) | Wire coordinator into control plane HTTP API |
| iroh QUIC P2P | Experience mesh P2P transport (libp2p future) | Use iroh as transport for training traffic |
| Solana treasury | Experience economy (XP compute points) | Map Psyche compute_point rewards → XP economy earn |
| Witness quorum | No equivalent | Adopt directly; ensures gradient integrity |
| DisTrO optimizer | No equivalent | Adopt directly; bandwidth reduction is critical on LAN |

Workflow — adding a training peer

# On each peer (has Experience mesh installed):
xp mesh training join \
  --coordinator <coordinator-url> \
  --data-shard <path-to-shard> \
  --gpu-memory 6G

# Monitor:
xp mesh training status
xp mesh training metrics  # bandwidth, round progress, witness quorum state

Limitations (from RE)

  • Max 256 clients, max 32 witnesses — sufficient for LAN / small internet mesh
  • Solana backend requires Solana account + rent; TCP backend is simpler for local
  • Windows: the Solana sandbox is weaker there (no user namespaces); use TCP mode
  • DisTrO DCT basis matrices pregenerated at startup; first-round latency is higher
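
To illustrate the last limitation, here is a sketch of precomputing a DCT basis once and reusing it every round. It assumes an orthonormal DCT-II over a 1-D chunk of length n; the function names are hypothetical, and the chunking and transform variant Psyche actually uses are not stated in this document.

```rust
use std::f64::consts::PI;

/// Orthonormal DCT-II basis as a row-major n x n matrix (row k, column i).
/// Building it costs n*n cosine evaluations, which is the startup cost.
pub fn dct_basis(n: usize) -> Vec<f32> {
    let mut m = vec![0.0f32; n * n];
    for k in 0..n {
        let scale = if k == 0 {
            (1.0 / n as f64).sqrt()
        } else {
            (2.0 / n as f64).sqrt()
        };
        for i in 0..n {
            let angle = PI * (2.0 * i as f64 + 1.0) * k as f64 / (2.0 * n as f64);
            m[k * n + i] = (scale * angle.cos()) as f32;
        }
    }
    m
}

/// Forward transform: coeffs = B * x.
pub fn dct_forward(basis: &[f32], x: &[f32]) -> Vec<f32> {
    let n = x.len();
    assert_eq!(basis.len(), n * n, "basis must be n x n");
    (0..n)
        .map(|k| (0..n).map(|i| basis[k * n + i] * x[i]).sum::<f32>())
        .collect()
}

/// Inverse transform: x = B^T * coeffs (valid because B is orthonormal).
pub fn dct_inverse(basis: &[f32], coeffs: &[f32]) -> Vec<f32> {
    let n = coeffs.len();
    assert_eq!(basis.len(), n * n, "basis must be n x n");
    (0..n)
        .map(|i| (0..n).map(|k| basis[k * n + i] * coeffs[k]).sum::<f32>())
        .collect()
}
```

In this sketch the matrix is built once per chunk size and then only read, so later rounds pay for the matrix-vector products but not for the cosines.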

Anti-patterns

  • Do NOT run both coordinator backends simultaneously — choose TCP (local/trusted) OR Solana (internet/trustless)
  • Do NOT exceed 256 clients without forking — the Solana program has this hard cap
  • Do NOT use Python wrappers — Psyche's Rust core is the right integration point; Python client examples are reference only
  • Do NOT skip witness quorum in decentralized mode: integrity guarantees collapse without it. TCP mode is acceptable for trusted-peer LAN training
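
The rules above can be checked before a run starts. The sketch below is hypothetical throughout: `RunPlan`, `validate`, and the `peers_trusted` flag are illustration names, and only the two caps (256 clients, 32 witnesses) and the two mode names come from this document. Requiring that witnesses not outnumber clients is an added assumption.

```rust
pub const MAX_CLIENTS: usize = 256;
pub const MAX_WITNESSES: usize = 32;

/// Exactly one backend per run; the enum makes "both at once" unrepresentable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Centralized,   // TCP coordinator, trusted peers
    Decentralized, // Solana coordinator, untrusted peers
}

/// Parse the `--mode` value named in the argument hint.
pub fn parse_backend(mode: &str) -> Option<Backend> {
    match mode {
        "centralized" => Some(Backend::Centralized),
        "decentralized" => Some(Backend::Decentralized),
        _ => None,
    }
}

/// Hypothetical description of a planned training run.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RunPlan {
    pub backend: Backend,
    pub clients: usize,
    pub witnesses: usize,
    pub peers_trusted: bool,
}

pub fn validate(plan: &RunPlan) -> Result<(), String> {
    if plan.clients == 0 || plan.clients > MAX_CLIENTS {
        return Err(format!("clients must be 1..={MAX_CLIENTS}, got {}", plan.clients));
    }
    if plan.witnesses > MAX_WITNESSES {
        return Err(format!("witnesses must be <= {MAX_WITNESSES}, got {}", plan.witnesses));
    }
    // ASSUMPTION: witnesses are elected from the client set.
    if plan.witnesses > plan.clients {
        return Err("witnesses cannot outnumber clients".to_string());
    }
    match plan.backend {
        Backend::Centralized if !plan.peers_trusted => {
            Err("TCP mode requires trusted peers; use decentralized mode".to_string())
        }
        Backend::Decentralized if plan.witnesses == 0 => {
            Err("decentralized mode needs a witness quorum".to_string())
        }
        _ => Ok(()),
    }
}
```

For the two-peer LAN case from the description, a plan with `Backend::Centralized`, two clients, and trusted peers passes; the same plan with `peers_trusted: false` is rejected.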

Provenance

  • Source: PsycheFoundation/nousnet (Apache 2.0) — .planning/ingest/2026-06-07-article-github-com-psychefoundation-nousnet.md
  • Tier: foundation / strategic
  • Adoption type: wrap + integrate; Apache 2.0; direct use allowed
  • Relationship to existing work: nousnet ingest already in memory as project_nousnet_remote_slm_training_plugin.md
  • Built: 2026-06-07
Install via CLI
npx skills add https://github.com/B2Gdevs/get-anything-done-monorepo --skill nousnet-distributed-training
Repository Details
Stars: 1
Forks: 0
Branch: main
Path: SKILL.md