name: colab-unsloth-finetune
description: >-
  Orchestrate autonomous LLM fine-tuning on Google Colab's free GPU tier using Unsloth, driven from VS Code (or Codex Desktop App) with sequential command execution and self-correcting error recovery.
Colab Unsloth Finetune — Autonomous LLM Fine-Tuning via Google Colab + Unsloth
Orchestrate autonomous LLM fine-tuning on Google Colab's free GPU tier using Unsloth, driven from VS Code (or Codex Desktop App) with sequential command execution and self-correcting error recovery.
Trigger
Use when the user asks to "fine-tune on Colab", "Colab Unsloth training", "free GPU fine-tuning", "autonomous Colab training", "overnight Colab fine-tune", "Colab으로 학습", "Unsloth 파인튜닝", "무료 GPU 학습", "코랩 파인튜닝", "자율 파인튜닝", "colab-unsloth-finetune", or wants to run LLM fine-tuning on Google Colab using Unsloth without paying for GPU.
Do NOT use for RunPod/HF Jobs GPU training (use runpod-pods or hf-model-trainer).
Do NOT use for local GPU training without Colab (use hf-model-trainer with local setup).
Do NOT use for model evaluation only (use hf-evaluation).
Do NOT use for inference endpoint deployment (use hf-endpoints).
Prerequisites
| Requirement | How to Verify |
|---|---|
| Google account with Drive access | Browser login at drive.google.com |
| VS Code + Google Colab extension | code --list-extensions \| grep -i colab, or install googlecolab.google-colab |
| Chrome with Colab extension (optional) | Check chrome://extensions for "Open in Colab" |
| Dataset in JSONL format | wc -l <dataset>.jsonl, then head -1 <dataset>.jsonl \| python3 -m json.tool |
| Sufficient Google Drive space | Dataset size + ~2x for model checkpoints |
| HuggingFace token (for gated models) | hf auth whoami or check HF_TOKEN env var |
Workflow — 10-Phase Pipeline
Phase 1: Prepare Dataset
- Validate the JSONL dataset locally:
python3 -c "
import json, os
path = '<dataset>.jsonl'
# Read non-empty lines; each one should be a standalone JSON object
with open(path, encoding='utf-8') as f:
    lines = [line for line in f if line.strip()]
print(f'Total samples: {len(lines)}')
sample = json.loads(lines[0])
print(f'Keys: {list(sample.keys())}')
# os.path.getsize reports the on-disk size (sys.getsizeof would measure the Python string object)
print(f'File size: {os.path.getsize(path) / 1024 / 1024:.1f} MB')
"
- Ensure the dataset follows one of Unsloth's supported formats:
  - ShareGPT: {"conversations": [{"from": "human", "value": "..."}, {"from": "gpt", "value": "..."}]}
  - Alpaca: {"instruction": "...", "input": "...", "output": "..."}
  - Raw text: {"text": "..."}
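The format check above can also be automated. The following is a minimal sketch, assuming the three record shapes listed in this phase; the helper names `detect_format` and `summarize_formats` are hypothetical and not part of Unsloth. It treats the Alpaca `input` field as optional, which is an assumption.

```python
import json

def detect_format(record: dict) -> str:
    """Classify one JSONL record as 'sharegpt', 'alpaca', 'text', or 'unknown'."""
    convs = record.get("conversations")
    if isinstance(convs, list) and convs and all(
        isinstance(turn, dict) and "from" in turn and "value" in turn
        for turn in convs
    ):
        return "sharegpt"
    # Assumption: "input" may be absent in Alpaca-style records
    if "instruction" in record and "output" in record:
        return "alpaca"
    if isinstance(record.get("text"), str):
        return "text"
    return "unknown"

def summarize_formats(jsonl_lines) -> dict:
    """Count detected formats over an iterable of JSONL lines, skipping blanks."""
    counts = {}
    for line in jsonl_lines:
        if not line.strip():
            continue
        fmt = detect_format(json.loads(line))
        counts[fmt] = counts.get(fmt, 0) + 1
    return counts
```

A dataset that reports more than one format, or any `unknown` records, probably needs cleaning before Phase 8, since the formatting function there expects a single shape.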
Phase 2: Upload Dataset to Google Drive
- Upload via the gws-drive skill or gws drive upload:
gws drive upload <dataset>.jsonl --folder "Colab_Training"
- Or upload manually via drive.google.com
- Note the Drive path for later:
/content/drive/MyDrive/Colab_Training/<dataset>.jsonl
Phase 3: Create Google Colab Notebook
- Go to colab.research.google.com → New Notebook
- Runtime → Change runtime type → GPU (T4 is free tier, sufficient for ≤8B models with 4-bit LoRA)
- Name the notebook descriptively: unsloth-finetune-YYYY-MM-DD.ipynb
- Save to Google Drive
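The claim that a T4 is sufficient for models up to 8B parameters with 4-bit LoRA can be sanity-checked with back-of-envelope arithmetic. The sketch below counts quantized weights only, at `bits / 8` bytes per parameter; it ignores quantization metadata, activations, LoRA adapters, and optimizer state, so real usage will be higher. The function name `weight_gib` is hypothetical.

```python
def weight_gib(params_billion: float, bits: int = 4) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits / 8
    return total_bytes / 2**30

# Rough weight footprint for a few model sizes at 4-bit
for size in (4, 8, 13):
    print(f"{size}B params at 4-bit: about {weight_gib(size):.1f} GiB of weights")
```

Under these assumptions an 8B model needs roughly 3.7 GiB for weights, which leaves headroom on a 15 GB card for everything the estimate leaves out.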
Phase 4: Connect VS Code to Colab (Optional — for Codex/Agent-Driven Execution)
- In VS Code, install the Google Colab extension (googlecolab.google-colab)
- Copy the Colab notebook URL
- Paste the URL into VS Code / Codex Desktop App
- This enables the coding agent to execute cells sequentially in the Colab runtime
Alternative: Use the Chrome "Open in Colab" extension for direct browser-based execution.
Phase 5: Mount Google Drive
Execute in the first Colab cell:
from google.colab import drive
drive.mount('/content/drive')
Authorize via the OAuth popup.
Phase 6: Download Dataset from Drive
import shutil
shutil.copy(
    '/content/drive/MyDrive/Colab_Training/<dataset>.jsonl',
    '/content/<dataset>.jsonl'
)
Verify:
!wc -l /content/<dataset>.jsonl
!head -1 /content/<dataset>.jsonl
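Line counts catch gross truncation, but a checksum comparison is a stricter check that the copy from Drive completed. This is a minimal sketch using only the standard library; `sha256_of` and `copy_verified` are hypothetical helper names, not part of Colab or Unsloth.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20) -> str:
    """Hash a file in chunks so large datasets do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_verified(src, dst) -> Path:
    """Copy src to dst and raise if the two files do not hash identically."""
    shutil.copy(src, dst)
    if sha256_of(src) != sha256_of(dst):
        raise IOError(f"Checksum mismatch after copying {src} to {dst}")
    return Path(dst)
```

In the notebook this could replace the plain `shutil.copy` call, with the same source and destination paths.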
Phase 7: Install Unsloth
%%capture
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
Verify installation:
import unsloth
print(f"Unsloth version: {unsloth.__version__}")
Phase 8: Configure and Run Training
from unsloth import FastLanguageModel
import torch
# 1. Load model with 4-bit quantization
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="<model_id>",  # e.g. "unsloth/Qwen3-4B-bnb-4bit"
    max_seq_length=2048,
    dtype=None,  # auto-detect
    load_in_4bit=True,
)
# 2. Apply LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)
# 3. Prepare dataset
from datasets import load_dataset
dataset = load_dataset("json", data_files="/content/<dataset>.jsonl", split="train")
# 4. Format dataset (adapt to your format)
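from unsloth.chat_templates import get_chat_template
# ShareGPT records use "from"/"value" keys, so map them onto the
# role/content fields that the chat template expects
tokenizer = get_chat_template(
    tokenizer,
    chat_template="chatml",
    mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},
)
def formatting_prompts_func(examples):
    # Render each conversation into a single training string
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(
        convo, tokenize=False, add_generation_prompt=False
    ) for convo in convos]
    return {"text": texts}
dataset = dataset.map(formatting_prompts_func, batched=True)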
# 5. Train
from trl import SFTTrainer
from transformers import TrainingArguments
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    dataset_num_proc=2,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=1,
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="/content/outputs",
        report_to="none",
    ),
)
gpu_stats = torch.cuda.get_device_properties(0)
# The device-properties attribute is total_memory (in bytes)
print(f"GPU: {gpu_stats.name}, VRAM: {gpu_stats.total_memory / 1024**3:.1f} GB")
trainer_stats = trainer.train()
print(f"Training completed in {trainer_stats.metrics['train_runtime']:.0f}s")
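With `per_device_train_batch_size=2` and `gradient_accumulation_steps=4`, each optimizer step sees 8 samples on a single GPU. The sketch below estimates how many optimizer steps a run will take, which helps judge whether it fits inside a free-tier session. The function name `training_steps` is hypothetical, and the result is an estimate: the exact count reported by the trainer may differ slightly depending on how the library version handles the final partial batch.

```python
import math

def training_steps(num_samples, per_device_batch=2, grad_accum=4,
                   epochs=1, num_gpus=1):
    """Return (effective batch size, approximate total optimizer steps)."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return effective_batch, steps_per_epoch * epochs

# Example: 1,000 samples with the settings used above
batch, steps = training_steps(1000)
print(f"Effective batch size: {batch}, estimated optimizer steps: {steps}")
```

Since `logging_steps=1`, timing the first few logged steps and multiplying by the estimated step count gives a rough total runtime.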
Phase 9: Save and Export Model
# Save to Drive
model.save_pretrained("/content/drive/MyDrive/Colab_Training/finetuned-model")
tokenizer.save_pretrained("/content/drive/MyDrive/Colab_Training/finetuned-model")
# Optional: Push to HuggingFace Hub
model.push_to_hub("<your-hf-username>/<model-name>", token="<HF_TOKEN>")
tokenizer.push_to_hub("<your-hf-username>/<model-name>", token="<HF_TOKEN>")
# Optional: Export as GGUF for local inference
model.save_pretrained_gguf(
    "/content/drive/MyDrive/Colab_Training/finetuned-model-gguf",
    tokenizer,
    quantization_method="q4_k_m",
)
Phase 10: Verify Results
# Quick inference test
FastLanguageModel.for_inference(model)
inputs = tokenizer(["<test_prompt>"], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.batch_decode(outputs))
Error Recovery Patterns
| Error | Cause | Fix |
|---|---|---|
| CUDA out of memory | Batch size too large or model too big for free T4 (15GB) | Reduce per_device_train_batch_size to 1, enable gradient_checkpointing, use 4-bit quantization |
| Drive mount timeout | OAuth session expired | Re-run drive.mount() with force_remount=True |
| Module not found: unsloth | Installation failed silently | Run !pip install unsloth again, check for version conflicts with !pip list \| grep -i unsloth |
| RuntimeError: Expected CUDA | GPU runtime not selected | Runtime → Change runtime type → T4 GPU |
| Connection to runtime lost | Colab session timeout (free tier: ~90min idle, ~12h max) | Reconnect, resume from last checkpoint in output_dir |
| Dataset loading error | JSONL format mismatch | Validate with python3 -m json.tool < first_line.json, check encoding (UTF-8) |
| tokenizer.apply_chat_template error | Wrong chat template for model | Check model card for correct template; try "qwen-2.5" or "chatml" |
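For agent-driven runs, the table above can be encoded as a small lookup that maps cell output to a suggested fix. This is a minimal sketch: the regular expressions are illustrative guesses at how each error appears in stderr, the fix strings paraphrase the table, and `RECOVERY_RULES` and `suggest_fix` are hypothetical names.

```python
import re

# (pattern, suggested fix) pairs paraphrasing the Error Recovery table.
# The patterns are assumptions about the error text, not an exhaustive list.
RECOVERY_RULES = [
    (re.compile(r"CUDA out of memory", re.I),
     "Reduce per_device_train_batch_size to 1 and keep 4-bit quantization enabled"),
    (re.compile(r"No module named '?unsloth", re.I),
     "Re-run the Unsloth install cell and check for version conflicts"),
    (re.compile(r"Expected CUDA", re.I),
     "Switch the runtime type to T4 GPU"),
    (re.compile(r"mount.*timed? ?out", re.I),
     "Re-run drive.mount() with force_remount=True"),
    (re.compile(r"apply_chat_template", re.I),
     "Check the model card for the correct chat template"),
]

def suggest_fix(stderr_text: str):
    """Return the first matching fix for the given cell output, or None."""
    for pattern, fix in RECOVERY_RULES:
        if pattern.search(stderr_text):
            return fix
    return None
```

Returning `None` for unrecognized output lets the agent fall back to manual diagnosis instead of applying an unrelated fix.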
Free Tier Constraints
| Constraint | Limit | Workaround |
|---|---|---|
| GPU type | T4 (15GB VRAM) | Use 4-bit quantization + LoRA (fits ≤8B models) |
| Session duration | ~12h continuous, ~90min idle timeout | Use checkpointing, resume from saved state |
| Storage | ~78GB disk, 12GB RAM | Stream large datasets; torch.cuda.empty_cache() releases cached GPU memory only, not disk or system RAM |
| Daily GPU quota | ~12h/day (varies) | Start training early, use Colab Pro for more quota |
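The "use checkpointing, resume from saved state" workaround depends on locating the newest checkpoint after a reconnect. The sketch below assumes the `checkpoint-<step>` directory naming that the Hugging Face trainer uses inside `output_dir`; the helper name `latest_checkpoint` is hypothetical. Note that `/content/outputs` sits on the runtime's local disk, which does not survive a runtime reset, so resuming would presumably require pointing `output_dir` at a folder on the mounted Drive.

```python
import re
from pathlib import Path

def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> subdirectory with the highest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best = None  # (step, path) of the newest checkpoint seen so far
    for child in root.iterdir():
        match = re.fullmatch(r"checkpoint-(\d+)", child.name)
        if child.is_dir() and match:
            step = int(match.group(1))
            if best is None or step > best[0]:
                best = (step, child)
    return best[1] if best else None
```

If a checkpoint is found, passing its path as `trainer.train(resume_from_checkpoint=str(path))` continues from that step; otherwise a plain `trainer.train()` starts fresh. The `transformers.trainer_utils.get_last_checkpoint` helper offers similar lookup behavior if importing it is preferable.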
Autonomous Agent Execution Pattern
When driven by a coding agent (Codex, Cursor, Claude):
- Cell-by-cell execution: The agent pastes and runs each code block sequentially
- Output inspection: After each cell, the agent reads stdout/stderr for errors
- Self-correction: On error, the agent diagnoses the issue and applies the fix from the Error Recovery table
- Progress tracking: Training logs (loss, learning_rate, epoch) are printed every logging_steps
- Completion verification: Agent confirms training finished by checking for saved model files in Drive
This pattern enables overnight autonomous fine-tuning — start the process before sleep, wake up to a trained model.
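The completion check in the list above can be a simple file test on the Drive folder written in Phase 9. This sketch assumes the adapter was saved with `save_pretrained`, which for LoRA adapters typically writes `adapter_config.json` plus a weights file; the file names and the helper `adapter_saved` are assumptions for illustration, so adjust them if the saved folder looks different.

```python
from pathlib import Path

# Assumed file names for a saved LoRA adapter
ADAPTER_CONFIG = "adapter_config.json"
ADAPTER_WEIGHTS = ("adapter_model.safetensors", "adapter_model.bin")

def adapter_saved(model_dir) -> bool:
    """True if the folder holds a non-empty adapter config and weights file."""
    root = Path(model_dir)
    config = root / ADAPTER_CONFIG
    if not config.is_file() or config.stat().st_size == 0:
        return False
    return any(
        (root / name).is_file() and (root / name).stat().st_size > 0
        for name in ADAPTER_WEIGHTS
    )
```

An agent could call this on `/content/drive/MyDrive/Colab_Training/finetuned-model` after the save cell and treat `False` as a signal to re-run Phase 9.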
Composition with Existing Skills
| Phase | Composable Skill | Purpose |
|---|---|---|
| Dataset prep | hf-datasets | Create/validate HF-format datasets |
| Dataset upload | gws-drive | Upload to Google Drive via CLI |
| Model search | hf-models | Find the right base model on Hub |
| Post-training eval | hf-evaluation | Run benchmarks on the fine-tuned model |
| Model upload | hf-cli | Push to HuggingFace Hub |
| GGUF conversion | hf-model-trainer | Reference for GGUF export patterns |
| Result distribution | x-to-slack | Post training results to Slack |