name: finetune-model description: Track 2 quickstart wrapper for structured fine-tune prompts using a registry model, a registry dataset, Rockie GPU jobs, and inference-loader deployment of the trained artifact.
finetune-model
Run the Quickstart Track 2 fine-tune flow for a selectable registry model and dataset, then deploy the trained artifact through Rockie's inference loader. This skill composes existing platform APIs; it must not create a separate training service or run training locally.
When to invoke
Structured Quickstart Track 2 prompts:
Quickstart fine-tune request: track: finetune model: <registry model slug> registry_dataset_id: <registry dataset id> compute_target: rockie_gpu source: quickstart-pickerEquivalent structured lab prompts that explicitly ask to fine-tune a selected registry model on a selected registry dataset.
Do not invoke this skill for open-ended model selection, dataset creation, private tenant data ingestion, or custom training research. Route those to the appropriate planning or data workflow first.
Required v1 inputs
model: the registry model slug or id from the picker or user input.registry_dataset_id: the registry dataset id from the picker or user input.
Both fields are required for v1. If either is missing, stop and ask for that field. Do not infer a dataset from free text and do not treat a private data reference as selectable.
Explicit refusal: private_data_ref is not supported in v1. Refuse v1 requests
that provide private_data_ref, even if the prompt says it has been filtered,
until the per-tenant data API from issue #1298 exposes a validated training
handle.
Runtime auth
All Rockie control-plane calls require both headers:
-H "X-Tenant-Token: $ROCKIELAB_TENANT_TOKEN" \
-H "X-Tenant-Id: $ROCKIELAB_TENANT_ID"
ROCKIELAB_TENANT_TOKEN is the control-plane auth token. ROCKIELAB_TENANT_ID
is tenant scope/display identity only. Never use ROCKIELAB_TENANT_ID as
X-Tenant-Token.
Preflight before GPU spend
Before any job submission, budget approval, or GPU spend:
Preflight the model:
GET $ROCKIELAB_API_URL/api/registry/models/{slug}The model must be selectable for fine-tuning and must be a v1-supported
<7Btext-generation base model.Preflight the dataset:
POST $ROCKIELAB_API_URL/api/registry/datasets/resolve-for-trackThe request must include
track: "finetune"andregistry_dataset_id: "<id>". The resolved dataset must be selectable for Track 2.
If either preflight fails, stop before spend and surface the platform response.
Do not perform launcher-style credit, balance, spend-cap, membership, or local capacity checks. The existing budget-term-sheet gate and platform HTTP 402 path own spend approval and insufficient-credit handling. If the Rockie platform returns HTTP 402 at submit or load time, surface the platform 402 response through the shared redaction/sanitization boundary.
Local execution is forbidden
This skill must never run model training or model download work locally. Do not import or run torch, triton, CUDA libraries, transformers, peft, datasets, accelerate, bitsandbytes, vLLM, Docker training containers, or any local GPU/training stack on the orchestrator or skill host. Do not download model weights or datasets to the local workspace. Do not start local training, local serving, direct SSH jobs, supplier CLIs, or direct provider APIs.
All GPU training must go through the Rockie job platform using
skills/experiment/runtime/submit.py. All serving must go through the Rockie
inference-loader APIs.
Fine-tune job contract
Generate a single-GPU LoRA/SFT bash script for v1-supported <7B models. The
script runs inside the Rockie job pod and may install or import training
libraries there, not locally.
The script must:
use the preflighted model and resolved
registry_dataset_idsave trained outputs under
/workspace/results/fine-tuned-model/archive the trained output as
/workspace/results/fine-tuned-model.tar.gzprint structured progress lines exactly as:
FINETUNE_PROGRESS {"step":12,"loss":0.42}print
JOB_DONEonly after the archive is written
Use the /experiment helper, not raw curl, for job submission:
python skills/experiment/runtime/submit.py \
--gpu-type A100_80GB \
--gpu-count 1 \
--script-file ./finetune-track2.sh \
--timeout 14400 \
--term-sheet-json ./approved-term-sheet.json \
--region <approved-region> \
--tier <approved-tier> \
--origin-skill finetune-model \
--monitoring-profile-id experiment.ml_baseline.v1
The submit helper must remain in the path so the shared budget-term-sheet gate,
dashboard metadata, and platform HTTP 402 handling remain authoritative. When
dashboard metadata is available, pass origin_skill: "finetune-model" and a
training monitoring profile.
skills/experiment/runtime/submit.py tails job logs. Changed
job.last_log_line values beginning FINETUNE_PROGRESS with a JSON suffix
are the progress contract. Valid JSON fields step and loss are appended to
the dashboard as training.step and training.loss metrics, and a concise
progress line is printed to stdout. Malformed progress lines are ignored safely.
Artifact handoff and deployment
After the job reaches DONE, fetch artifacts:
GET $ROCKIELAB_API_URL/api/jobs/{job_id}/artifacts
Select results/fine-tuned-model.tar.gz. Fail closed if that artifact is absent.
Deploy the trained artifact through the inference loader:
POST $ROCKIELAB_API_URL/api/inference/loads
The load body must use the first-class job artifact URL:
{
"url": "rockie-job-artifact://<job_id>/results/fine-tuned-model.tar.gz",
"compute_target": "rockie_gpu",
"keep_running": true,
"dashboard": {
"origin_skill": "finetune-model",
"run_name": "Quickstart fine-tune deploy: <model>",
"monitoring_profile_id": "inference.batch_baseline.v1"
}
}
Do not use temporary signed HTTPS artifact URLs for the trained model handoff. Do not deploy by downloading the artifact locally.
Poll endpoint readiness with:
GET $ROCKIELAB_API_URL/api/inference/loads/{id}/endpoint/wait
Use GET /api/inference/loads/{id} and GET /api/inference/loads/{id}/logs
only for status and failure inspection.
Required final lab output
The final response must include all of:
- endpoint URL from the loader endpoint response; use this value directly as
ENDPOINT_URLbecause it already includes the modality path - bearer-token handling; if the loader returns a bearer token, show it as the
value to use in
Authorization: Bearer ...; if the platform withholds or rotates it, explain where the lab/runtime obtains the token without inventing one - copy-paste curl using the endpoint URL and bearer auth header
job_id- artifact reference:
rockie-job-artifact://<job_id>/results/fine-tuned-model.tar.gz load_id
Example curl:
curl "$ENDPOINT_URL" \
-H "Authorization: Bearer $ROCKIE_INFERENCE_BEARER_TOKEN" \
-H "Content-Type: application/json" \
-d '{"model":"fine-tuned-model","messages":[{"role":"user","content":"hello"}]}'