name: model-fine-tuning
description: Fine-tune LLMs and ML models (LoRA, QLoRA, PEFT, Hugging Face). Dataset prep, training, evaluation, deployment
domain: core
tags:
- fine
- infrastructure
- machine-learning
- memory
- model
- self-improvement
- tuning
Overview
Fine-tune pre-trained models for specific tasks. Covers LoRA/QLoRA for efficient training, dataset preparation, evaluation, and deployment of custom models.
Capabilities
- Fine-tune LLMs with LoRA and QLoRA (low-rank adaptation)
- Prepare datasets in instruction/chat format
- Use Hugging Face Transformers + PEFT for training
- Evaluate fine-tuned models with benchmarks
- Merge LoRA adapters back into base models
- Deploy fine-tuned models via vLLM or Ollama
When to Use
- Need a model specialized for a specific domain (legal, medical, code)
- Want better performance on specific tasks than general-purpose models deliver
- Have domain-specific data that the model can learn from
- Need a smaller model that holds its quality on a narrow task
- Building a product that needs a custom AI model
When NOT to Use
- Task is outside your authorization scope
- You need to implement controls (use implementing-* skills)
- Task is about analysis, not action (use analyzing-* skills)
- You don't have access to target systems
- Task requires compliance expertise (consult professionals)
- Task is about defense, not offense (use defensive skills)
Pseudo Code
# Example workflow for this skill.
# process() and validate_output() are placeholders for task-specific logic.
def execute(input_data):
    # Step 1: Validate input
    if not input_data:
        raise ValueError("Input data is required")
    # Step 2: Process core logic
    result = process(input_data)
    # Step 3: Validate output
    validate_output(result)
    return result
Dataset Preparation (Hugging Face Format)
from datasets import Dataset

# Instruction format
instruction_data = [
    {"instruction": "Summarize this text", "input": "Long article...", "output": "Summary..."},
    {"instruction": "Translate to French", "input": "Hello world", "output": "Bonjour le monde"},
]

# Chat format (for chat models)
chat_data = [
    {"messages": [
        {"role": "system", "content": "You are a legal assistant."},
        {"role": "user", "content": "What is a contract?"},
        {"role": "assistant", "content": "A contract is a legally binding agreement..."},
    ]},
]

# Build the dataset from one format; the chat format is used here
dataset = Dataset.from_list(chat_data)
dataset.push_to_hub("username/my-dataset")  # requires a Hugging Face login
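If existing data is in the instruction format but the target is a chat model, the records can be reshaped before building the dataset. The following is a minimal sketch: `instruction_to_chat` and its `system_prompt` parameter are hypothetical names, and joining the instruction and input with a blank line is an assumed convention, not a requirement of any library.

```python
def instruction_to_chat(record, system_prompt=None):
    """Convert one instruction-format record into a chat-format record."""
    # Fold the optional "input" field into the user turn.
    user_text = record["instruction"]
    if record.get("input"):
        user_text = f"{user_text}\n\n{record['input']}"

    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    messages.append({"role": "assistant", "content": record["output"]})
    return {"messages": messages}


example = {
    "instruction": "Translate to French",
    "input": "Hello world",
    "output": "Bonjour le monde",
}
converted = instruction_to_chat(example, system_prompt="You are a translator.")
print(converted)
```

Applying the function to every record gives a list that can be passed to `Dataset.from_list`.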
LoRA Fine-Tuning (Hugging Face + PEFT)
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Configure LoRA
lora_config = LoraConfig(
    r=16,            # Rank (8-64)
    lora_alpha=32,   # Alpha scaling
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only adapter weights should be trainable

# Training
training_args = TrainingArguments(
    output_dir="./output",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    warmup_steps=100,
    logging_steps=10,
    save_steps=500,
    bf16=True,
    optim="paged_adamw_8bit"  # requires bitsandbytes
)

# `dataset` is the Dataset built in the preparation step.
# Argument names vary by TRL version: newer releases take `processing_class`
# instead of `tokenizer` and set the sequence length on SFTConfig.
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    max_seq_length=2048
)
trainer.train()
trainer.save_model("./final-model")
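With the settings above, each optimizer step consumes 4 x 4 = 16 examples per device, so on a small dataset `warmup_steps=100` can cover a large share of training. The sketch below works through the arithmetic. `training_schedule` is a hypothetical helper, the 1,000-example dataset size is an assumption for illustration, and the Trainer's own rounding of steps per epoch may differ slightly.

```python
import math


def training_schedule(num_examples, per_device_batch_size, grad_accum_steps,
                      num_epochs, num_devices=1):
    """Estimate effective batch size and optimizer step counts."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * num_epochs
    return effective_batch, steps_per_epoch, total_steps


# Values from the TrainingArguments above; 1,000 examples is an assumed dataset size.
effective_batch, steps_per_epoch, total_steps = training_schedule(
    num_examples=1000, per_device_batch_size=4, grad_accum_steps=4, num_epochs=3
)
warmup_fraction = 100 / total_steps

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps: {steps_per_epoch} per epoch, {total_steps} total")
print(f"warmup covers {warmup_fraction:.0%} of training")
```

If warmup takes up most of the run, lowering `warmup_steps` or switching to a ratio-based warmup is worth considering.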
QLoRA (4-bit Quantized Training)
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True
)

# Load model in 4-bit
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto"
)
# PEFT helper that readies a quantized model for adapter training
model = prepare_model_for_kbit_training(model)
# Then apply LoRA as above; this uses ~6GB VRAM instead of ~16GB
Merge LoRA Adapter
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base + adapter (load the base unquantized, not in 4-bit, before merging)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
model = PeftModel.from_pretrained(base_model, "./final-model")

# Merge
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged-model")
tokenizer.save_pretrained("./merged-model")

# Push to Hub
merged_model.push_to_hub("username/my-fine-tuned-model")
tokenizer.push_to_hub("username/my-fine-tuned-model")
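Merging folds the low-rank update into the frozen weight, so the merged model should give the same outputs as the base model plus adapter. The toy example below illustrates the idea with plain Python lists, assuming the standard LoRA scaling of `lora_alpha / r`. The matrices are made-up values, and this is a sketch of the math, not PEFT's actual implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    """Multiply every entry of a matrix by a scalar."""
    return [[x * s for x in row] for row in a]


r, lora_alpha = 2, 4
scaling = lora_alpha / r

W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [3.0, 0.0, 1.0]]  # frozen base weight (3x3)
B = [[0.5, 0.0], [0.0, 1.0], [1.0, 0.5]]                  # LoRA B matrix (3 x r)
A = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]                    # LoRA A matrix (r x 3)
x = [[1.0], [2.0], [3.0]]                                 # input column vector

# Adapter kept separate: W x + scaling * B (A x)
unmerged = add(matmul(W, x), scale(matmul(B, matmul(A, x)), scaling))

# Adapter merged: (W + scaling * B A) x
W_merged = add(W, scale(matmul(B, A), scaling))
merged = matmul(W_merged, x)

print(unmerged, merged)
```

Because the merged weight has the same shape as the original, the merged model needs no PEFT dependency at inference time.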
Evaluate Model
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./merged-model",
    tasks=["mmlu", "hellaswag", "arc_challenge"],
    batch_size=8
)
print(results["results"])
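Benchmark scores are most useful when compared against the base model, since fine-tuning on a narrow domain can lower general-benchmark accuracy. The sketch below compares two result dictionaries. `compare_runs` is a hypothetical helper, the `"acc,none"` metric key is an assumption about the harness output format that may differ by version, and the scores are placeholders, not real measurements.

```python
def compare_runs(base_results, tuned_results, metric="acc,none"):
    """Return per-task metric deltas (tuned minus base) for shared tasks."""
    deltas = {}
    for task, tuned_metrics in tuned_results.items():
        base_metrics = base_results.get(task, {})
        if metric in tuned_metrics and metric in base_metrics:
            deltas[task] = tuned_metrics[metric] - base_metrics[metric]
    return deltas


# Placeholder numbers for illustration only, not real benchmark scores.
base = {"hellaswag": {"acc,none": 0.50}, "arc_challenge": {"acc,none": 0.40}}
tuned = {"hellaswag": {"acc,none": 0.75}, "arc_challenge": {"acc,none": 0.25}}

deltas = compare_runs(base, tuned)
regressions = [task for task, delta in deltas.items() if delta < 0]

for task, delta in deltas.items():
    print(f"{task}: {delta:+.3f}")
print("regressed tasks:", regressions)
```

In practice, the two inputs would be the `results["results"]` dictionaries from evaluating the base and the merged model with the same task list.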
Deploy with Ollama
# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./merged-model
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM "You are a specialized legal assistant."
EOF
# Build and run
ollama create my-legal-model -f Modelfile
ollama run my-legal-model
Common Patterns
- Use structured input/output schemas for reliable automation
- Add retry logic with exponential backoff for external calls
- Validate inputs before processing to fail fast
- Log execution steps for debugging and auditing
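For external calls such as Hub uploads, retry with exponential backoff can be sketched as follows. `retry_with_backoff` and `flaky_upload` are hypothetical names, and the simulated failure stands in for a real network error. The `sleep` parameter is injectable so the example runs without waiting.

```python
import time


def retry_with_backoff(fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(), doubling the wait after each failure; re-raise on the last attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


calls = {"count": 0}


def flaky_upload():
    """Simulated external call that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays = []  # record the requested delays instead of actually sleeping
result = retry_with_backoff(flaky_upload, sleep=delays.append)
print(result, delays)
```

A production version would usually catch only the exception types known to be transient and add jitter to the delay.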
Training Data Quality Checklist
- 100-1,000 examples at minimum for LoRA, depending on task complexity
- Consistent format across all examples
- No duplicate or near-duplicate entries
- Balanced representation of target tasks
- Clean, well-formatted outputs
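Parts of this checklist can be automated before training. The sketch below checks instruction-format records for missing fields, empty outputs, and duplicates. `check_records` and `normalize` are hypothetical helpers. The duplicate check only catches records that match after lowercasing and whitespace cleanup, so real near-duplicate detection would need fuzzy matching on top.

```python
import re

REQUIRED_KEYS = {"instruction", "input", "output"}


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def check_records(records):
    """Return indices of records that fail basic format and duplicate checks."""
    issues = {"missing_keys": [], "empty_output": [], "duplicates": []}
    seen = {}
    for i, rec in enumerate(records):
        if not REQUIRED_KEYS.issubset(rec):
            issues["missing_keys"].append(i)
            continue
        if not rec["output"].strip():
            issues["empty_output"].append(i)
        key = (normalize(rec["instruction"]), normalize(rec["input"]))
        if key in seen:
            issues["duplicates"].append((seen[key], i))
        else:
            seen[key] = i
    return issues


records = [
    {"instruction": "Translate to French", "input": "Hello world", "output": "Bonjour le monde"},
    {"instruction": "translate to  french", "input": "hello world", "output": "Bonjour le monde"},
    {"instruction": "Summarize this text", "input": "Long article...", "output": ""},
    {"instruction": "Summarize this text", "output": "Summary..."},
]
report = check_records(records)
print(report)
```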
LoRA Hyperparameter Guide
r=8: Simple tasks, minimal data
r=16: Standard choice, good balance
r=32: Complex tasks, more data
r=64: Maximum capacity, risk of overfitting
learning_rate: 1e-4 to 3e-4 (lower for larger models)
epochs: 1-3 (more risks overfitting on small datasets)
batch_size: As large as VRAM allows
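To see what the rank choice means in trainable parameters: each adapted linear layer adds `r * (in_features + out_features)` weights. The sketch below applies that formula to the four attention projections targeted earlier. The layer shapes and the layer count of 32 are assumptions based on the published Llama 3.1 8B configuration, and `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(r, layer_shapes, num_layers):
    """Trainable LoRA weights: r * (d_in + d_out) per adapted matrix, per layer."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in layer_shapes)
    return per_layer * num_layers


# Assumed (in_features, out_features) for q_proj, k_proj, v_proj, o_proj.
ATTN_SHAPES = [(4096, 4096), (4096, 1024), (4096, 1024), (4096, 4096)]

counts = {r: lora_param_count(r, ATTN_SHAPES, num_layers=32) for r in (8, 16, 32, 64)}
for r, count in counts.items():
    print(f"r={r}: {count / 1e6:.1f}M trainable parameters")
```

The count scales linearly with `r`, which is why higher ranks need more data to avoid overfitting.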
VRAM Requirements (QLoRA 4-bit)
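Approximate figures for short sequences and small batches; usage grows with sequence length, batch size, and LoRA rank.
7B: ~6GB VRAM → RTX 3060 12GB works
13B: ~10GB VRAM → RTX 3080 10GB is borderline; 12GB or more leaves headroom
70B: ~36GB VRAM → A100 40GB or 2x RTX 4090, with little headroom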
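A rough lower bound for these figures comes from the weights alone: parameter count times bits per parameter. The sketch below computes that bound. `weight_memory_gb` is a hypothetical helper, and it ignores quantization constants, adapter weights, optimizer state, and activations, which is why the totals listed above are higher.

```python
def weight_memory_gb(num_params_billion, bits_per_param):
    """Memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params_billion * bits_per_param / 8


estimates_4bit = {size: weight_memory_gb(size, 4) for size in (7, 13, 70)}
estimates_bf16 = {size: weight_memory_gb(size, 16) for size in (7, 13, 70)}

for size in (7, 13, 70):
    print(f"{size}B: {estimates_4bit[size]:.1f} GB in 4-bit, "
          f"{estimates_bf16[size]:.1f} GB in bf16")
```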
How to Use
- Invoke the skill when relevant domain keywords appear in the request
- Provide required inputs as specified in the skill definition
- Review the output for correctness before delivering to the user
- Combine with related skills for complex multi-step workflows
Verification
After completing this skill, confirm:
- Output meets the defined quality and completeness requirements
- All prerequisites are verified and documented
- Error handling covers edge cases
- Results are accurate and actionable
Process
- Analyze the task requirements
- Apply domain expertise
- Verify output quality
Anti-Rationalization
| Rationalization | Reality |
|---|---|
| "I will add monitoring later" | Without monitoring, you cannot detect failures. Add it from day one. |
| "One model is enough" | Different tasks need different models. Route intelligently. |
| "Premature optimization" | Infrastructure decisions are hard to change later. Design for scale early. |