kscale-kinfer - SKILL.md Agent Skill

name: kscale-kinfer
description: 'User asks about deploying RL policies to real robots'

K-Scale kinfer Skill

"The K-Scale model export and inference tool"

Trigger Conditions

User asks about deploying RL policies to real robots
User asks about ONNX model inference or Rust ML runtimes
User asks about policy execution on embedded systems
User asks about real-time neural network inference

Overview

kinfer is K-Scale's model inference engine for deploying trained policies:

Model Loading: ONNX format support via ort (ONNX Runtime)
Real-time Execution: Rust implementation for low latency
Logging: NDJSON telemetry for debugging
Integration: Connects to KOS firmware

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│  kinfer Inference Pipeline                                               │
│                                                                          │
│  ┌──────────────┐      load      ┌──────────────┐                       │
│  │  ONNX Model  │───────────────▶│   Runtime    │                       │
│  │  (.onnx)     │                │  (ort-sys)   │                       │
│  └──────────────┘                └──────┬───────┘                       │
│                                         │                                │
│  ┌──────────────┐      step      ┌──────┴───────┐      output           │
│  │ Observation  │───────────────▶│   Inference  │───────────────▶Action │
│  │  (sensors)   │                │    Engine    │                       │
│  └──────────────┘                └──────────────┘                       │
│                                         │                                │
│                                         ▼                                │
│                                  ┌──────────────┐                       │
│                                  │   Logger     │                       │
│                                  │  (NDJSON)    │                       │
│                                  └──────────────┘                       │
└─────────────────────────────────────────────────────────────────────────┘
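
The load, step, and log flow in the diagram can be sketched in Python with a stand-in for the ONNX session. This is a minimal illustration under stated assumptions, not kinfer's implementation: StubRuntime, InferenceEngine, and the log field names are hypothetical, and the "model" is a fixed scaling rather than a loaded .onnx file.

```python
import io
import json
import time


class StubRuntime:
    """Hypothetical stand-in for an ONNX Runtime session (no real model is loaded)."""

    def __init__(self, gain):
        self.gain = gain

    def run(self, observation):
        # Fake "policy": scale every observation value by a constant gain.
        return [self.gain * x for x in observation]


class InferenceEngine:
    """Hypothetical engine that mirrors the diagram: observation in, action out, log on the side."""

    def __init__(self, runtime, log_stream):
        self.runtime = runtime
        self.log_stream = log_stream
        self.step_id = 0

    def step(self, observation):
        start = time.perf_counter()
        action = self.runtime.run(observation)
        latency_s = time.perf_counter() - start
        # One JSON object per line (NDJSON).
        record = {"step": self.step_id, "obs": observation,
                  "action": action, "latency_s": latency_s}
        self.log_stream.write(json.dumps(record) + "\n")
        self.step_id += 1
        return action


log = io.StringIO()  # in-memory log so the sketch leaves no files behind
engine = InferenceEngine(StubRuntime(gain=2.0), log)
action = engine.step([0.5, -1.0])
```

Here the log record is written inline for brevity; moving the write onto a separate thread keeps disk I/O out of the control loop.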

Key Features

1. Single Tokio Runtime

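use lazy_static::lazy_static;
use tokio::runtime::Runtime;

// Single process-wide Tokio runtime for async execution (paired with GIL management in the Python bindings)
lazy_static! {
    static ref RUNTIME: Runtime = Runtime::new().expect("failed to build Tokio runtime");
}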
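
For readers coming from the Python side, the same "one shared runtime" idea can be approximated with a single asyncio event loop running in a background thread. This is only an analogy to the Tokio pattern above, not how kinfer works internally; get_loop and double are hypothetical names.

```python
import asyncio
import threading

_loop = None
_lock = threading.Lock()


def get_loop():
    """Return the process-wide event loop, starting it on first use."""
    global _loop
    with _lock:
        if _loop is None:
            _loop = asyncio.new_event_loop()
            # The loop runs forever in a daemon thread and serves all callers.
            threading.Thread(target=_loop.run_forever, daemon=True).start()
        return _loop


async def double(x):
    await asyncio.sleep(0)  # stand-in for non-blocking I/O
    return 2 * x


# Submit work from a synchronous caller and block until the result is ready.
future = asyncio.run_coroutine_threadsafe(double(21), get_loop())
result = future.result(timeout=5)
```

Creating the loop once avoids paying runtime start-up cost on every call, which is the same motivation as the lazily initialized static above.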

2. Pre-fetch Inputs

// Minimize latency by preparing inputs ahead of time
fn step_and_take_action(&mut self, observation: &[f32]) -> Vec<f32> {
    // Pre-fetch next input while processing current
    ...
}
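
One way to read "pre-fetch" is to overlap the next sensor read with the current inference. The sketch below shows that overlap with a one-worker thread pool; read_sensors, run_policy, and control_loop are hypothetical stand-ins, and the actual kinfer step function may be organized differently.

```python
from concurrent.futures import ThreadPoolExecutor


def read_sensors(tick):
    # Hypothetical sensor read; returns a deterministic observation per tick.
    return [float(tick), float(tick) + 0.5]


def run_policy(observation):
    # Hypothetical policy: double each observation value.
    return [2 * x for x in observation]


def control_loop(num_steps):
    actions = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_sensors, 0)  # fetch the first input
        for tick in range(num_steps):
            observation = pending.result()  # wait for the pre-fetched input
            # Start fetching the next input before running inference on this one.
            # (The final iteration issues one extra read that is never consumed.)
            pending = pool.submit(read_sensors, tick + 1)
            actions.append(run_policy(observation))
    return actions


actions = control_loop(3)
```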

3. NDJSON Logging

// Async logging thread for telemetry
struct Logger {
    file: File,
    tx: Sender<LogEntry>,
}
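
The sender-plus-file layout above corresponds to a common pattern: the control loop pushes entries onto a queue and a worker thread writes them out as NDJSON, one JSON object per line. A minimal Python sketch of that pattern follows; NdjsonLogger and the entry fields are hypothetical and are not taken from kinfer.

```python
import json
import os
import queue
import tempfile
import threading


class NdjsonLogger:
    """Hypothetical logger: a worker thread drains a queue into an NDJSON file."""

    _STOP = object()  # sentinel that tells the worker to exit

    def __init__(self, path):
        self._queue = queue.Queue()
        self._file = open(path, "w", encoding="utf-8")
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()

    def log(self, entry):
        # Called from the control loop; only enqueues, so it does not wait on disk I/O.
        self._queue.put(entry)

    def _worker(self):
        while True:
            entry = self._queue.get()
            if entry is self._STOP:
                break
            self._file.write(json.dumps(entry) + "\n")

    def close(self):
        # Flush everything queued so far, then stop the worker and close the file.
        self._queue.put(self._STOP)
        self._thread.join()
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "telemetry.ndjson")
logger = NdjsonLogger(path)
logger.log({"step": 0, "action": [0.1, 0.2]})
logger.log({"step": 1, "action": [0.3, 0.4]})
logger.close()

# Reading the log back: each line parses independently as JSON.
with open(path, encoding="utf-8") as f:
    records = [json.loads(line) for line in f]
```

Because every line is a complete JSON object, a partially written log from a crashed run can still be parsed up to the last complete line.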

Language & Stack

Primary: Rust (performance-critical)
ML Runtime: ONNX Runtime (ort, ort-sys)
Async: Tokio for non-blocking I/O
Bindings: Python via PyO3

GF(3) Trit Assignment

Trit: -1 (MINUS)
Role: Verification/Validation (inference must be correct)
Color: #6E5FE4
URI: skill://kscale-kinfer#6E5FE4

Balanced Triads

kscale-kinfer (-1) ⊗ kscale-ksim (0) ⊗ onnx-export (+1) = 0 ✓
kscale-kinfer (-1) ⊗ rust-ml (0) ⊗ policy-training (+1) = 0 ✓
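
The two triads can be checked mechanically. The sketch assumes that "balanced" means the three trits sum to 0 modulo 3; that reading, and the is_balanced helper, are assumptions rather than something the skill defines.

```python
# Trit values as listed in the triads above.
TRITS = {
    "kscale-kinfer": -1,
    "kscale-ksim": 0,
    "onnx-export": 1,
    "rust-ml": 0,
    "policy-training": 1,
}


def is_balanced(*skills):
    # Assumed rule: a triad is balanced when its trits sum to 0 modulo 3.
    return sum(TRITS[name] for name in skills) % 3 == 0


triad_a = is_balanced("kscale-kinfer", "kscale-ksim", "onnx-export")
triad_b = is_balanced("kscale-kinfer", "rust-ml", "policy-training")
```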

Key Contributors

Contributor	Focus Areas
b-vm	Step function, command names
codekansas	Performance, refactoring
WT-MM	Logging, env variables
alik-git	NDJSON logging, plotting
nfreq	Tokio runtime, GIL management

Example Usage

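import kinfer

# Load the exported policy
model = kinfer.load_model("walking_policy.onnx")

# Get an observation from the sensors (get_sensor_data is a placeholder)
obs = get_sensor_data()

# Run one inference step
action = model.step(obs)

# Apply the action to the actuators (apply_action is a placeholder)
apply_action(action)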

Rust API

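use kinfer::InferenceEngine;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the exported policy; `?` propagates a load failure to the caller.
    let mut engine = InferenceEngine::load("policy.onnx")?;

    // Control loop: get_observation and send_to_actuators are placeholders
    // for the robot's sensor and actuator I/O.
    loop {
        let obs = get_observation();
        let action = engine.step_and_take_action(&obs);
        send_to_actuators(&action);
    }
}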

ACSet Schema

@present SchKinfer(FreeSchema) begin
    # Objects
    Model::Ob           # ONNX model
    Tensor::Ob          # Input/output tensors
    Runtime::Ob         # ONNX Runtime session
    LogEntry::Ob        # Telemetry records
    
    # Morphisms (inference pipeline)
    load::Hom(Model, Runtime)             # Model → Runtime loading
    input::Hom(Tensor, Runtime)           # Observation → Runtime
    output::Hom(Runtime, Tensor)          # Runtime → Action
    step::Hom(Tensor, Tensor)             # obs → action (composition)
    
    # Morphisms (logging)
    log::Hom(Runtime, LogEntry)           # Runtime → Telemetry
    
    # Attributes
    Shape::AttrType
    Dtype::AttrType
    Latency::AttrType
    
    shape::Attr(Tensor, Shape)
    dtype::Attr(Tensor, Dtype)
    latency::Attr(Runtime, Latency)
    
    # Key constraint: deterministic inference
    # step = output ∘ input (functorial)
    # Same input → same output (reproducibility)
end
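
The constraint noted in the schema comments (step = output ∘ input, and the same input gives the same output) can be illustrated with plain functions. This is a toy sketch: bind_input and read_output are hypothetical stand-ins for the input and output morphisms, and the fixed scaling replaces a real model.

```python
def make_step(input_fn, output_fn):
    # step = output ∘ input: feed the observation in, then read the action out.
    def step(observation):
        return output_fn(input_fn(observation))
    return step


def bind_input(observation):
    # Hypothetical "input" morphism: Tensor -> runtime state holding the bound tensor.
    return {"bound": tuple(observation)}


def read_output(runtime_state):
    # Hypothetical "output" morphism: runtime state -> action tensor (a fixed scaling).
    return [0.5 * x for x in runtime_state["bound"]]


step = make_step(bind_input, read_output)
first = step([2.0, 4.0])
second = step([2.0, 4.0])  # same observation again, to check reproducibility
```

Running the same observation twice and comparing the actions is a cheap regression test for the reproducibility property.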

References

kscalelabs/kinfer - Main repository (17 stars)
kscalelabs/kinfer-sim - Simulation visualization
ONNX Runtime - Inference backend