colab-unsloth-finetune


Orchestrate autonomous LLM fine-tuning on Google Colab's free GPU tier using Unsloth, driven from VS Code (or Codex Desktop App) with sequential command execution and self-correcting error recovery.

By sylvanus4, updated 6/6/2026


Colab Unsloth Finetune — Autonomous LLM Fine-Tuning via Google Colab + Unsloth



Trigger

Use when the user asks to "fine-tune on Colab", "Colab Unsloth training", "free GPU fine-tuning", "autonomous Colab training", "overnight Colab fine-tune", "Colab으로 학습" (train with Colab), "Unsloth 파인튜닝" (Unsloth fine-tuning), "무료 GPU 학습" (free GPU training), "코랩 파인튜닝" (Colab fine-tuning), "자율 파인튜닝" (autonomous fine-tuning), "colab-unsloth-finetune", or wants to run LLM fine-tuning on Google Colab using Unsloth without paying for a GPU.

Do NOT use for RunPod/HF Jobs GPU training (use runpod-pods or hf-model-trainer). Do NOT use for local GPU training without Colab (use hf-model-trainer with local setup). Do NOT use for model evaluation only (use hf-evaluation). Do NOT use for inference endpoint deployment (use hf-endpoints).


Prerequisites

| Requirement | How to Verify |
| --- | --- |
| Google account with Drive access | Browser login at drive.google.com |
| VS Code + Google Colab extension | `code --list-extensions \| grep -i colab`, or install `googlecolab.google-colab` |
| Chrome with Colab extension (optional) | Check chrome://extensions for "Open in Colab" |
| Dataset in JSONL format | `wc -l <dataset>.jsonl` and `head -1 <dataset>.jsonl \| python3 -m json.tool` |
| Sufficient Google Drive space | Dataset size + ~2x for model checkpoints |
| HuggingFace token (for gated models) | `hf whoami` or check the `HF_TOKEN` env var |

Workflow — 10-Phase Pipeline

Phase 1: Prepare Dataset

  1. Validate the JSONL dataset locally:
    python3 -c "
    import json, os
    path = '<dataset>.jsonl'
    with open(path, encoding='utf-8') as f:
        lines = [line for line in f if line.strip()]  # ignore blank lines
    print(f'Total samples: {len(lines)}')
    sample = json.loads(lines[0])
    print(f'Keys: {list(sample.keys())}')
    # Size of the file on disk, in MB
    print(f'File size: {os.path.getsize(path) / 1024 / 1024:.1f} MB')
    "
    
  2. Ensure the dataset follows one of Unsloth's supported formats:
    • ShareGPT: {"conversations": [{"from": "human", "value": "..."}, {"from": "gpt", "value": "..."}]}
    • Alpaca: {"instruction": "...", "input": "...", "output": "..."}
    • Raw text: {"text": "..."}
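The three formats above can be told apart by their keys. The following is a minimal sketch of such a check, using only the standard library. The function names `detect_format` and `count_formats` are hypothetical helpers for illustration and are not part of Unsloth.

```python
import json


def detect_format(record):
    """Classify one parsed JSONL record as 'sharegpt', 'alpaca', 'text', or 'unknown'."""
    if not isinstance(record, dict):
        return "unknown"
    convs = record.get("conversations")
    if isinstance(convs, list) and convs and all(
        isinstance(turn, dict) and "from" in turn and "value" in turn
        for turn in convs
    ):
        return "sharegpt"
    if "instruction" in record and "output" in record:
        return "alpaca"
    if isinstance(record.get("text"), str):
        return "text"
    return "unknown"


def count_formats(lines):
    """Tally detected formats over an iterable of JSONL lines, skipping blank lines."""
    counts = {}
    for line in lines:
        if not line.strip():
            continue
        fmt = detect_format(json.loads(line))
        counts[fmt] = counts.get(fmt, 0) + 1
    return counts
```

Running `count_formats(open("<dataset>.jsonl", encoding="utf-8"))` before uploading should ideally report a single format. A mix of formats or any `unknown` entries suggests the dataset needs cleaning first.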

Phase 2: Upload Dataset to Google Drive

  1. Upload via gws-drive skill or gws drive upload:
    gws drive upload <dataset>.jsonl --folder "Colab_Training"
    
  2. Or upload manually via drive.google.com
  3. Note the Drive path for later: /content/drive/MyDrive/Colab_Training/<dataset>.jsonl

Phase 3: Create Google Colab Notebook

  1. Go to colab.research.google.com → New Notebook
  2. Runtime → Change runtime type → GPU (T4 is free tier, sufficient for ≤8B models with 4-bit LoRA)
  3. Name the notebook descriptively: unsloth-finetune-YYYY-MM-DD.ipynb
  4. Save to Google Drive

Phase 4: Connect VS Code to Colab (Optional — for Codex/Agent-Driven Execution)

  1. In VS Code, install the Google Colab extension (googlecolab.google-colab)
  2. Copy the Colab notebook URL
  3. Paste the URL into VS Code / Codex Desktop App
  4. This enables the coding agent to execute cells sequentially in the Colab runtime

Alternative: Use the Chrome "Open in Colab" extension for direct browser-based execution.

Phase 5: Mount Google Drive

Execute in the first Colab cell:

from google.colab import drive
drive.mount('/content/drive')

Authorize via the OAuth popup.

Phase 6: Download Dataset from Drive

import shutil
shutil.copy(
    '/content/drive/MyDrive/Colab_Training/<dataset>.jsonl',
    '/content/<dataset>.jsonl'
)

Verify:

!wc -l /content/<dataset>.jsonl
!head -1 /content/<dataset>.jsonl
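Line counts can match even when a copy is corrupted, so a checksum comparison is a slightly stronger check. This is a sketch using only the standard library. The helper names `sha256_of` and `copy_is_intact` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_intact(src, dst):
    """Return True if dst exists and has the same size and SHA-256 as src."""
    src, dst = Path(src), Path(dst)
    if not dst.exists() or src.stat().st_size != dst.stat().st_size:
        return False
    return sha256_of(src) == sha256_of(dst)
```

In the notebook this would be called with the Drive path as `src` and the `/content/` path as `dst`.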

Phase 7: Install Unsloth

%%capture
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

Verify installation:

import unsloth
print(f"Unsloth version: {unsloth.__version__}")
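Because `%%capture` hides pip output, a failed install can go unnoticed until a later import breaks. A small preflight check, sketched below, lists packages that cannot be found without importing them. The helper name `missing_packages` and the package list in the usage note are assumptions, not part of Unsloth.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, `missing_packages(["unsloth", "trl", "datasets", "peft"])` should return an empty list when the install succeeded. Any name it returns is a candidate for re-running the install cell without `%%capture`.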

Phase 8: Configure and Run Training

from unsloth import FastLanguageModel
import torch

# 1. Load model with 4-bit quantization
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="<model_id>",  # e.g. "unsloth/Qwen3-4B-bnb-4bit"
    max_seq_length=2048,
    dtype=None,               # auto-detect
    load_in_4bit=True,
)

# 2. Apply LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                     "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

# 3. Prepare dataset
from datasets import load_dataset
dataset = load_dataset("json", data_files="/content/<dataset>.jsonl", split="train")

# 4. Format dataset (adapt to your format)
from unsloth.chat_templates import get_chat_template
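tokenizer = get_chat_template(
    tokenizer,
    chat_template="chatml",
    # Map ShareGPT keys ("from"/"value", "human"/"gpt") onto role/content
    mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},
)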

def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(
        convo, tokenize=False, add_generation_prompt=False
    ) for convo in convos]
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched=True)

# 5. Train
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    dataset_num_proc=2,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=1,
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="/content/outputs",
        report_to="none",
    ),
)

gpu_stats = torch.cuda.get_device_properties(0)
print(f"GPU: {gpu_stats.name}, VRAM: {gpu_stats.total_memory / 1024**3:.1f} GB")

trainer_stats = trainer.train()
print(f"Training completed in {trainer_stats.metrics['train_runtime']:.0f}s")
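With the settings above, each optimizer step consumes `per_device_train_batch_size * gradient_accumulation_steps` = 2 * 4 = 8 samples. The sketch below estimates how many optimizer steps a run will take, which helps judge whether it fits within a free-tier session. The function name `estimate_steps` is hypothetical, and the result is an approximation because the exact rounding of a partial final batch depends on the Transformers version.

```python
import math


def estimate_steps(num_samples, per_device_batch=2, grad_accum=4, epochs=1, num_gpus=1):
    """Return (effective batch size, approximate total optimizer steps)."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return effective_batch, steps_per_epoch * epochs
```

For a dataset of 1,000 samples, `estimate_steps(1000)` gives an effective batch of 8 and about 125 steps, so `warmup_steps=5` covers roughly the first 4% of training.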

Phase 9: Save and Export Model

# Save to Drive
model.save_pretrained("/content/drive/MyDrive/Colab_Training/finetuned-model")
tokenizer.save_pretrained("/content/drive/MyDrive/Colab_Training/finetuned-model")

# Optional: Push to HuggingFace Hub
model.push_to_hub("<your-hf-username>/<model-name>", token="<HF_TOKEN>")
tokenizer.push_to_hub("<your-hf-username>/<model-name>", token="<HF_TOKEN>")

# Optional: Export as GGUF for local inference
model.save_pretrained_gguf(
    "/content/drive/MyDrive/Colab_Training/finetuned-model-gguf",
    tokenizer,
    quantization_method="q4_k_m",
)
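Before the session ends, it is worth confirming that the adapter files actually reached Drive. The sketch below assumes the usual PEFT layout, where `save_pretrained` on a LoRA model writes `adapter_config.json` plus `adapter_model.safetensors` (or `adapter_model.bin` in older setups). The helper name `adapter_saved` is hypothetical.

```python
from pathlib import Path


def adapter_saved(model_dir):
    """Return True if model_dir holds a LoRA adapter config and weights file."""
    directory = Path(model_dir)
    has_config = (directory / "adapter_config.json").is_file()
    has_weights = any(
        (directory / name).is_file()
        for name in ("adapter_model.safetensors", "adapter_model.bin")
    )
    return has_config and has_weights
```

Calling `adapter_saved("/content/drive/MyDrive/Colab_Training/finetuned-model")` after the save cell gives a simple pass/fail signal.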

Phase 10: Verify Results

# Quick inference test
FastLanguageModel.for_inference(model)
inputs = tokenizer(["<test_prompt>"], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.batch_decode(outputs))
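Since training used the ChatML template, a test prompt will usually behave better when wrapped the same way instead of being passed as raw text. The sketch below builds a ChatML prompt by hand to show the layout. The helper name `chatml_prompt` is hypothetical, and in practice `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is the more robust route because it follows the tokenizer's own template.

```python
def chatml_prompt(user_message, system_message=None):
    """Build a ChatML prompt that ends with an open assistant turn."""
    parts = []
    if system_message:
        parts.append(f"<|im_start|>system\n{system_message}<|im_end|>\n")
    parts.append(f"<|im_start|>user\n{user_message}<|im_end|>\n")
    # Leave the assistant turn open so the model generates the reply
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

The returned string can replace `"<test_prompt>"` in the tokenizer call above.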

Error Recovery Patterns

| Error | Cause | Fix |
| --- | --- | --- |
| CUDA out of memory | Batch size too large or model too big for free T4 (15GB) | Reduce `per_device_train_batch_size` to 1, enable `gradient_checkpointing`, use 4-bit quantization |
| Drive mount timeout | OAuth session expired | Re-run `drive.mount()` with `force_remount=True` |
| Module not found: unsloth | Installation failed silently | Run `!pip install unsloth` again, check for version conflicts with `!pip list \| grep -i unsloth` |
| RuntimeError: Expected CUDA | GPU runtime not selected | Runtime → Change runtime type → T4 GPU |
| Connection to runtime lost | Colab session timeout (free tier: ~90min idle, ~12h max) | Reconnect, resume from last checkpoint in `output_dir` |
| Dataset loading error | JSONL format mismatch | Validate with `python3 -m json.tool < first_line.json`, check encoding (UTF-8) |
| `tokenizer.apply_chat_template` error | Wrong chat template for model | Check model card for correct template; try "qwen-2.5" or "chatml" |
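Resuming after a lost connection requires locating the most recent checkpoint. The sketch below assumes the Hugging Face Trainer convention of `checkpoint-<step>` subdirectories inside `output_dir`. The helper name `latest_checkpoint` is hypothetical, and Transformers ships a comparable utility, `transformers.trainer_utils.get_last_checkpoint`.

```python
import re
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the path of the highest-numbered checkpoint-<step> directory, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = re.fullmatch(r"checkpoint-(\d+)", child.name)
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return str(best_path) if best_path else None
```

The result can be passed as `trainer.train(resume_from_checkpoint=latest_checkpoint(output_dir))`, where `None` starts a fresh run. Two caveats apply: checkpoints are only written at the Trainer's save interval, and anything under `/content` disappears when the runtime is recycled, so resuming across sessions needs an `output_dir` on the mounted Drive.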

Free Tier Constraints

| Constraint | Limit | Workaround |
| --- | --- | --- |
| GPU type | T4 (15GB VRAM) | Use 4-bit quantization + LoRA (fits ≤8B models) |
| Session duration | ~12h continuous, ~90min idle timeout | Use checkpointing, resume from saved state |
| Storage | ~78GB disk, 12GB RAM | Stream large datasets; free cached GPU memory with `torch.cuda.empty_cache()` |
| Daily GPU quota | ~12h/day (varies) | Start training early, use Colab Pro for more quota |
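The "≤8B with 4-bit" guideline follows from simple arithmetic on weight storage. The sketch below computes the memory for model weights alone. The function name `weight_memory_gib` is hypothetical, and the estimate deliberately ignores activations, LoRA parameters, optimizer state, and quantization overhead, all of which add to the real footprint.

```python
def weight_memory_gib(num_params, bits_per_param=4):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3
```

An 8B-parameter model needs roughly 3.7 GiB of weights at 4 bits but about 14.9 GiB at 16 bits, which leaves almost no headroom on a 15GB T4.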

Autonomous Agent Execution Pattern

When driven by a coding agent (Codex, Cursor, Claude):

  1. Cell-by-cell execution: The agent pastes and runs each code block sequentially
  2. Output inspection: After each cell, the agent reads stdout/stderr for errors
  3. Self-correction: On error, the agent diagnoses the issue and applies the fix from the Error Recovery table
  4. Progress tracking: Training logs (loss, learning_rate, epoch) are printed every logging_steps
  5. Completion verification: Agent confirms training finished by checking for saved model files in Drive

This pattern enables overnight autonomous fine-tuning — start the process before sleep, wake up to a trained model.
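The self-correction step can be sketched as a lookup from cell output to a suggested fix. The fixes are taken from the Error Recovery table, while the matching substrings and the names `RECOVERY_RULES` and `suggest_fix` are assumptions chosen for illustration. A real agent would likely need broader matching.

```python
# (substring to look for, fix from the Error Recovery table)
RECOVERY_RULES = [
    ("CUDA out of memory",
     "Reduce per_device_train_batch_size to 1, enable gradient_checkpointing, "
     "use 4-bit quantization"),
    ("No module named",
     "Run !pip install unsloth again and check for version conflicts"),
    ("Expected CUDA",
     "Runtime -> Change runtime type -> T4 GPU"),
    ("apply_chat_template",
     "Check the model card for the correct chat template"),
]


def suggest_fix(cell_output):
    """Return the first matching fix for a cell's output, or None if nothing matches."""
    lowered = cell_output.lower()
    for needle, fix in RECOVERY_RULES:
        if needle.lower() in lowered:
            return fix
    return None
```

Returning `None` for unrecognized output lets the agent tell normal training logs apart from known failures and escalate anything it cannot classify.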


Composition with Existing Skills

| Phase | Composable Skill | Purpose |
| --- | --- | --- |
| Dataset prep | hf-datasets | Create/validate HF-format datasets |
| Dataset upload | gws-drive | Upload to Google Drive via CLI |
| Model search | hf-models | Find the right base model on Hub |
| Post-training eval | hf-evaluation | Run benchmarks on the fine-tuned model |
| Model upload | hf-cli | Push to HuggingFace Hub |
| GGUF conversion | hf-model-trainer | Reference for GGUF export patterns |
| Result distribution | x-to-slack | Post training results to Slack |

Install via CLI

npx skills add https://github.com/sylvanus4/github-to-notion-sync --skill colab-unsloth-finetune