rlat-build-on-kaggle

name: rlat-build-on-kaggle description: >- Build large rlat knowledge models (.rlat files) on Kaggle's free T4 GPU when the local machine has no GPU or encoding would take too long on CPU. Walks the user through Kaggle account + CLI setup, builds a kernel script that installs rlat[build,ann], sparse-clones (or uploads) source, encodes the corpus on T4, writes a remote-mode .rlat back, and pulls the artefact home. Trigger when the user mentions Kaggle, "free GPU", "build remotely", "no GPU here", a corpus too large to encode locally (~10K+ passages on CPU), or wants to rebuild several .rlat files in one batch. Not for: small corpora that finish in minutes on CPU; users who already have a CUDA GPU (just `pip install rlat[build]` and run `rlat build` locally). allowed-tools: Bash, Read, Write, Edit, Glob, Grep

rlat-build-on-kaggle — encode large corpora on Kaggle's free T4

You are helping the user build a .rlat knowledge model on Kaggle's free T4 GPU. The flow is: set up the kaggle CLI → write a kernel script that installs rlat + builds the corpus on /kaggle/working → push → poll → pull the .rlat back. The fast path takes ~20-40 minutes wall time for a corpus of ~50K passages.

When to use this skill (and when not to)

Local situation	Use this skill?
No GPU; corpus has >10K passages	Yes — encoding 10K passages on CPU runs ~30-60 min; T4 finishes in 2-5 min
Several `.rlat` to rebuild as a batch	Yes — one Kaggle session can build all of them
You want a remote-mode `.rlat` (consumers fetch source from a URL) and the source already lives on GitHub	Yes — Kaggle has fast access to GitHub + the GPU
You have a CUDA GPU on your machine	No — `pip install rlat[build]` then `rlat build … --runtime torch` is faster than the round-trip to Kaggle
Corpus has <2K passages	No — finishes in <2 min on CPU; the Kaggle round-trip is more overhead than the encode
You need to keep the corpus private and Kaggle internet egress is unacceptable	No — Kaggle kernels need internet on for `pip install`; sensitive corpora should be encoded locally or on a private GPU

A T4 encodes gte-modernbert-base at roughly 1500-2500 passages/sec at batch size 64. Locally with no GPU and ONNX-CPU runtime, the same encoder runs at roughly 80-150 passages/sec depending on the CPU. The crossover where Kaggle wins is usually around 5-10K passages once you account for the ~5-min push + queue + setup overhead.

Step 1 — One-time setup (≈10 minutes)

The user needs three things before the first push: a Kaggle account, the kaggle Python CLI, and an API token. Walk them through whichever pieces they don't already have.

Kaggle account

Sign up at kaggle.com — free. Verify the phone number under Settings → Phone verification — GPU minutes are locked behind phone verification.

CLI install

pip install kaggle
kaggle --version  # confirm install

API token

Visit kaggle.com/settings/account.
Click Create New Token under API. Browser downloads kaggle.json.
Move it to the location the CLI expects:
- Linux/macOS: mkdir -p ~/.kaggle && mv ~/Downloads/kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
- Windows: move %USERPROFILE%\Downloads\kaggle.json %USERPROFILE%\.kaggle\kaggle.json

Test:

kaggle kernels list -m  # lists your own kernels (should not error)

Windows: set `PYTHONUTF8=1` (one-time, very important)

The Kaggle Windows CLI decodes server responses with CP1252. Any non-ASCII character in the response (em-dashes in your kernel title, Unicode in server messages) crashes the response parser, and the crash silently eats a successful push — the kernel exists server-side but the slug is never recorded client-side. Symptoms: 'charmap' codec can't decode byte ... traceback, then 403 Permission denied on every subsequent status / output call.

Fix it durably:

setx PYTHONUTF8 1   # cmd.exe

Then open a new terminal. From this point on, every Python process on Windows runs in UTF-8 mode and the kaggle CLI works.

If you can't setx (locked-down corp machine), prefix every kaggle CLI call with PYTHONUTF8=1:

PYTHONUTF8=1 kaggle kernels push -p kaggle/job --accelerator NvidiaTeslaT4
PYTHONUTF8=1 kaggle kernels status <username>/<slug>
PYTHONUTF8=1 kaggle kernels output <username>/<slug> -p ./outputs/

Step 2 — Pick the source-storage pattern

Two patterns. Pick whichever matches where the source corpus lives.

Pattern A — Source on GitHub (most common for public docs)

The kernel sparse-clones the relevant subdir at a pinned commit, builds a remote-mode .rlat whose manifest.json records the URL pattern https://raw.githubusercontent.com/<repo>/<sha>/<scope>/..., and consumers fetch source on demand with SHA verification. The .rlat itself stays small (just bands + ANN index + manifest).

This is the right pattern when:

The corpus already lives in a public GitHub repo
You want the .rlat to stay small for shipping (HF Hub, package, etc.)
You're OK with consumers fetching source over the network at query time

Pattern B — Local source uploaded as a Kaggle dataset

If the source isn't on GitHub (private codebase, scraped corpus, internal docs), package it as a tarball, push to Kaggle as a private dataset, then mount it in the kernel and build a bundled-mode .rlat. The bundled mode embeds source bytes inside the .rlat; the consumer needs nothing else.

# Local: package + push as dataset
tar czf my_corpus.tar.gz -C /path/to/corpus .
mkdir kaggle/data
mv my_corpus.tar.gz kaggle/data/
cat > kaggle/data/dataset-metadata.json <<EOF
{
  "title": "My corpus for rlat build",
  "id": "<your-username>/my-corpus-source",
  "licenses": [{"name": "CC0-1.0"}]
}
EOF
kaggle datasets create -p kaggle/data    # first time
# subsequent updates: kaggle datasets version -p kaggle/data -m "update"

Then in the kernel script (Step 3), mount the dataset by adding it to dataset_sources in the kernel metadata, glob-discover the unpacked files at /kaggle/input/ (Kaggle auto-extracts .tar.gz), and run rlat build against the discovered path with --store-mode bundled.

The rest of this skill walks Pattern A. Pattern B differs only in the source-prep block; everything from "push the kernel" onwards is identical.

Step 3 — Write the kernel script

Create a project-relative working dir (not mktemp — the Kaggle CLI has path issues with temp dirs on Windows), then drop two files: build_corpus.py (the work) and kernel-metadata.json (Kaggle config).

mkdir -p kaggle/rlat-build

Pre-built templates ship with this skill — copy them straight from scripts/ and edit the CONFIG block:

# single-corpus build
cp .claude/skills/rlat-build-on-kaggle/scripts/build_corpus.py        kaggle/rlat-build/
cp .claude/skills/rlat-build-on-kaggle/scripts/kernel-metadata.json   kaggle/rlat-build/

# multi-corpus batch (resilient checkpointing — see "Multi-corpus batches" below)
cp .claude/skills/rlat-build-on-kaggle/scripts/build_corpora_batch.py kaggle/rlat-build/

Then edit the CONFIG block at the top of the script and the id / title fields in the metadata JSON. Both files are self-contained and have inline comments explaining each knob.

The full single-corpus template is reproduced below so you can read it without leaving this skill — but if you're going to actually push, copy the file from scripts/ rather than re-typing.

`kaggle/rlat-build/build_corpus.py`

Adapt the variables in the CONFIG block to your corpus.

"""Build an rlat knowledge model on a Kaggle T4 in remote-mode.

Writes /kaggle/working/<name>.rlat as soon as the build finishes; the
file ships out at COMPLETE / ERROR.
"""
from __future__ import annotations
import json, shutil, subprocess, sys, time, traceback
from pathlib import Path

# ── Config: adapt to your corpus ────────────────────────────────────
CONFIG = {
    "name": "my-docs",                          # output: <name>.rlat
    "github": "owner/repo",                     # public GitHub repo
    "branch": "main",                           # branch to clone
    "scope": "docs",                            # subdir inside the repo to index
}
# ────────────────────────────────────────────────────────────────────

WORK = Path("/kaggle/working")
WORK.mkdir(exist_ok=True, parents=True)
SRC = WORK / "_src"
SRC.mkdir(exist_ok=True, parents=True)


def shell(cmd, log=None):
    print(f"$ {' '.join(cmd)}", flush=True)
    if log is None:
        return subprocess.run(cmd, check=True)
    with open(log, "ab") as f:
        f.write(f"\n$ {' '.join(cmd)}\n".encode())
        f.flush()
        proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, check=False)
        f.write(proc.stdout)
    sys.stdout.write(proc.stdout.decode("utf-8", errors="replace"))
    sys.stdout.flush()
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return proc


def main() -> int:
    # 1. Install rlat with the GPU build extras AND the ANN extras.
    #    `[ann]` pulls faiss-cpu, which rlat needs for HNSW above ~5000
    #    passages. Forgetting it lets the encode finish then crashes with
    #    "RuntimeError: faiss is not installed".
    shell([sys.executable, "-m", "pip", "install", "--quiet",
           "rlat[build,ann]"])
    shell(["rlat", "install-encoder"])

    # 2. Sanity-print: rlat version, encoder + pinned revision, CUDA presence.
    #    Failing these early saves a 30-minute encode against a broken env.
    shell([sys.executable, "-c",
           "import resonance_lattice as rl; print('rlat', rl.__version__);"
           "from resonance_lattice.install.encoder import MODEL_ID, PINNED_REVISION;"
           "print('encoder', MODEL_ID, '@', PINNED_REVISION);"
           "import torch; print('cuda:', torch.cuda.is_available());"
           "print('device:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU')"])

    # 3. Sparse-clone the scope subdir at branch HEAD. Records the SHA so
    #    the remote-mode .rlat manifest can pin to it.
    target = SRC / CONFIG["github"].replace("/", "_")
    if target.exists():
        shutil.rmtree(target)
    target.mkdir(parents=True, exist_ok=True)
    shell(["git", "init", "-q", "-b", CONFIG["branch"], str(target)])
    shell(["git", "-C", str(target), "remote", "add", "origin",
           f"https://github.com/{CONFIG['github']}.git"])
    shell(["git", "-C", str(target), "config", "core.sparseCheckout", "true"])
    (target / ".git/info/sparse-checkout").write_text(f"{CONFIG['scope']}/*\n")
    shell(["git", "-C", str(target), "fetch", "--depth", "1", "origin", CONFIG["branch"]])
    shell(["git", "-C", str(target), "checkout", "FETCH_HEAD"])
    sha = subprocess.check_output(
        ["git", "-C", str(target), "rev-parse", "FETCH_HEAD"], text=True).strip()
    print(f"pinned to {CONFIG['github']}@{sha}", flush=True)

    scope_dir = target / CONFIG["scope"]
    if not scope_dir.exists():
        raise RuntimeError(f"scope path {CONFIG['scope']!r} not found in clone")

    # 4. Build the .rlat in remote mode. The manifest records this URL
    #    pattern; consumers fetch source from raw.githubusercontent.com
    #    and SHA-verify it. --runtime torch picks CUDA on T4 (~10-30x
    #    faster than ONNX-CPU). --batch-size 64 doubles throughput at
    #    768d vs the default 32 without OOM on T4 (~13 GB RAM).
    out = WORK / f"{CONFIG['name']}.rlat"
    url_base = (f"https://raw.githubusercontent.com/{CONFIG['github']}"
                f"/{sha}/{CONFIG['scope']}/")
    log = WORK / f"{CONFIG['name']}.build.log"
    t0 = time.time()
    shell(["rlat", "build", str(scope_dir),
           "-o", str(out),
           "--store-mode", "remote",
           "--remote-url-base", url_base,
           "--runtime", "torch",
           "--batch-size", "64"], log=log)

    # 5. Write a result summary so the slug status panel + JSON record
    #    both report success cleanly.
    result = {
        "status": "ok",
        "rlat_path": str(out),
        "size_bytes": out.stat().st_size,
        "github": CONFIG["github"],
        "scope": CONFIG["scope"],
        "commit_sha": sha,
        "remote_url_base": url_base,
        "wall_seconds": round(time.time() - t0, 1),
    }
    (WORK / "build_results.json").write_text(json.dumps(result, indent=2))
    print(f"DONE  size={result['size_bytes']:,} bytes  wall={result['wall_seconds']}s")
    return 0


if __name__ == "__main__":
    try:
        sys.exit(main())
    except Exception as e:
        # Persist failure record so build_results.json is downloadable
        (WORK / "build_results.json").write_text(json.dumps(
            {"status": "failed", "error": f"{type(e).__name__}: {e}"}, indent=2))
        traceback.print_exc()
        sys.exit(1)

`kaggle/rlat-build/kernel-metadata.json`

{
  "id": "<your-username>/rlat-build-<corpus-name>",
  "title": "rlat build <corpus-name>",
  "code_file": "build_corpus.py",
  "language": "python",
  "kernel_type": "script",
  "is_private": "true",
  "enable_gpu": "true",
  "enable_tpu": "false",
  "enable_internet": "true",
  "dataset_sources": [],
  "competition_sources": [],
  "kernel_sources": [],
  "model_sources": []
}

The slug must equal slugify(title). Kaggle's Save endpoint enforces id == slugify(title) for existing kernels. If they don't match, re-pushes return 409 Conflict. Easiest fix: make both sides simple, identical, all-lowercase-and-hyphens (e.g. rlat-build-my-docs).

Boolean fields are strings ("true" / "false"), not native JSON booleans. The CLI rejects native booleans silently.

Step 4 — Push the kernel

# Linux/macOS
kaggle kernels push -p kaggle/rlat-build --accelerator NvidiaTeslaT4

# Windows (no setx PYTHONUTF8 done yet)
PYTHONUTF8=1 kaggle kernels push -p kaggle/rlat-build --accelerator NvidiaTeslaT4

Always pass --accelerator NvidiaTeslaT4. The enable_gpu metadata flag alone may assign a P100, which is sm_60 — incompatible with contemporary PyTorch (sm_70+ minimum). CUDA ops fail silently on P100, the encode falls back to CPU at roughly 1/30th speed, and the kernel times out before finishing.

The push prints a kernel URL like https://www.kaggle.com/code/<username>/<slug>. Save it — that's where you'll watch progress.

The "Your kernel title does not resolve to the specified id" line is a warning, not an error. It fires when id and slugify(title) disagree. On first push, Kaggle creates the kernel under the title slug, so subsequent status calls against your id field return 403. Make id and slugify(title) match from the start to avoid this.

Step 5 — Poll for completion

kaggle kernels status <username>/<slug>

Status flow: QUEUED → RUNNING → COMPLETE (or ERROR).

The CLI doesn't stream progress. The web UI does — open https://www.kaggle.com/code/<username>/<slug> and watch the "Log Message" panel. While RUNNING:

The web UI shows "Output 0 B" until the run terminates. This is not a failure indicator — Kaggle only ships /kaggle/working/ to the output bucket at COMPLETE/ERROR. Files are being written; you just can't see them until the run ends.
kaggle kernels output … mid-run returns the (often empty) auto-generated .log and no files from /kaggle/working/.

If RUNNING looks stuck for >10 min with no progress in the web UI's log panel, the encode is genuinely hung. Cancel by re-pushing the same slug — there's no kaggle kernels stop command, but a re-push supersedes the running version.

If QUEUED for >5 min, GPU resources are contended; either wait (usually clears in 10-30 min) or push at off-peak hours.

Step 6 — Pull the .rlat back

kaggle kernels output <username>/<slug> -p ./outputs/ -o

This pulls everything in /kaggle/working/ — the .rlat, the .build.log, and build_results.json. To target only the .rlat:

kaggle kernels output <username>/<slug> -p ./outputs/ \
   --file-pattern "<name>.rlat"

Large file download quirk (>~300 MB): the CLI silently buffers the whole file in memory before writing to disk. Output appears stuck at 0 bytes for 3-5 minutes, then writes in one burst. Use a long timeout (15+ min) and don't kill the command if the file size hasn't moved — poll file size as the liveness signal, not stdout.

Step 7 — Verify the .rlat locally

# Magic bytes: real ZIP archive starts with PK\x03\x04
python -c "print(open('outputs/my-docs.rlat', 'rb').read(4))"
# Expected: b'PK\x03\x04'

# Open it and inspect
rlat profile outputs/my-docs.rlat

# First query (downloads source from raw.githubusercontent.com on demand)
rlat search outputs/my-docs.rlat "your test query" --top-k 3

If profile reports passage_count, band info, and the build's commit_sha, the build worked. The first remote-mode query is slow (network fetches per hit); subsequent queries against the same passages are fast (the source bytes are cached locally).

Multi-corpus batches (resilient pattern)

If you're rebuilding several .rlat files in one Kaggle session, structure the script around a checkpoint file in /kaggle/working/ so a late failure doesn't lose earlier successes:

RESULTS = WORK / "build_results.json"
results = json.loads(RESULTS.read_text()) if RESULTS.exists() else {}

for corpus in CORPORA:
    name = corpus["name"]
    out = WORK / f"{name}.rlat"
    # Skip if result-record AND output file are present
    if results.get(name, {}).get("status") == "ok" and out.exists():
        print(f"[{name}] already done, skipping")
        continue
    try:
        results[name] = build_one(corpus)
    except Exception as e:
        results[name] = {"status": "failed", "error": f"{type(e).__name__}: {e}"}
        traceback.print_exc()
    # Persist after EVERY corpus — this is the resilience anchor
    RESULTS.write_text(json.dumps(results, indent=2))

The in-kernel filesystem persists between iterations of one run. Each .rlat that hits /kaggle/working/ will ship in the output bucket even if a later corpus crashes.

For genuinely cross-push resume (e.g. a partial run on Monday + finish on Tuesday), attach the kernel's previous outputs as a dataset_sources entry — Kaggle's "version this kernel" doesn't preserve /kaggle/working/ across versions.

Common gotchas (in failure-frequency order)

Symptom	Cause	Fix
`'charmap' codec can't decode byte ...` traceback on push	Windows CP1252 decode of server response	`setx PYTHONUTF8 1` once; new terminal; re-push. The server-side push almost certainly succeeded — re-push is idempotent.
`subprocess-exited-with-error` / "Getting requirements to build wheel did not run successfully" when `pip install`-ing local source (Pattern B)	`/kaggle/input` is read-only; setuptools writes `.egg-info` into the source tree during the metadata hook	`shutil.copytree` the source to `/kaggle/working` first, then `pip install` from the copy. Also drop `pip --quiet` — it hides the real setuptools traceback. Include any file pyproject references (`license = {file=...}`, README) in the dataset.
`409 Conflict for url: …KernelsApiService/SaveKernel` on re-push	`id` ≠ `slugify(title)` after first push registered the title slug	Make `id` and `slugify(title)` match (lowercase + hyphens), then re-push
`status` returns 403 right after push	Charmap crash silently ate the success record	`setx PYTHONUTF8 1`, re-push
Encode finishes, `.rlat` build crashes with `RuntimeError: faiss is not installed`	`pip install rlat[build]` doesn't include `[ann]`; corpora >5000 passages need it	Use `pip install rlat[build,ann]`
Kernel runs "forever" then runs out of compute	P100 assigned (sm_60); CUDA falls back to CPU silently	Always pass `--accelerator NvidiaTeslaT4` on push
Local `kaggle kernels output` says "stuck at 0 bytes" but no error	Large file (>300 MB) — CLI buffers before writing	Wait 3-5 min; do not kill
Mid-run web UI shows "Output 0 B"	Kaggle ships outputs only at terminal state	Normal — check the log panel for live progress instead
`ModuleNotFoundError: torch` after install completes	rlat[build] base install conflict, or pip didn't pick up the extras	`pip install --upgrade --force-reinstall rlat[build,ann]`
Stderr lines appear before stdout from earlier timestamps	Stdout/stderr flush at different times in Kaggle log	Trust the timestamp column, not visual order

Resource limits to budget against

Limit	Value
GPU quota	30 hours / week (free tier)
Max execution time	12 hours / kernel session
Disk in `/kaggle/working/`	~73 GB
Output size shipped to user	20 GB max
RAM (T4)	~13 GB

A single gte-modernbert-base encode at batch 64 uses ~3 GB GPU memory. A 100K-passage corpus encodes in ~2-5 minutes. The bottleneck for big batches is usually the source clone (network) rather than the encode.

Wrapping up

After the run:

The .rlat is in outputs/<name>.rlat. Move it wherever your project expects (./<name>.rlat, data/<name>.rlat, etc.).
If you used remote mode, make sure the consumer machine has internet access at query time — passages are streamed from raw.githubusercontent.com. Cache lives at ~/.cache/rlat/sources/<sha>/ and warms on first use.
Smoke-test with rlat search <name>.rlat "anything" locally before shipping the artefact further.

For deeper background on what rlat build does, see docs/user/CLI.md (the build section). For the storage-mode trade-offs (why remote? why bundled?), see docs/user/STORAGE_MODES.md.