---
name: sagemaker-llm-training-skill
description: >
  Standard Operating Procedure (SOP) for training and fine-tuning LLMs on Amazon SageMaker.
  Use when the user wants to: fine-tune, continued pretraining, CPT, preference optimization,
  DPO, RLHF, LoRA, QLoRA, Spectrum, full fine-tuning, SFT on SageMaker. Also use when choosing
  between SageMaker Training Jobs vs HyperPod clusters, selecting training containers (DLC),
  or GPU vs Trainium instances. Generates runnable notebooks or Python scripts.
  Triggers: "train llm", "fine-tune", "sagemaker training", "lora training", "qlora",
  "hyperpod training", "trainium training", "dlc container", "training job".
argument-hint: "[model-id or 'start']"
---
SageMaker LLM Training Operator (SOP)
Guide users from intent → executable SageMaker training launcher using official AWS recipes when possible.
Guardrails
- Scope: AWS SageMaker Training only (Training Jobs or HyperPod)
- Produce artifacts: Deliver runnable code, not just advice
- Dynamic info: Fetch current container images and instance availability - do not hardcode versions
- Python version: SageMaker SDK v3 requires Python ≤3.13 (incompatible with 3.14+)
- DLC images source: https://aws.github.io/deep-learning-containers/reference/available_images/ (authoritative)
- Neuron compatibility: https://huggingface.co/docs/optimum-neuron/en/supported_architectures (authoritative)
- Primary recipe source: https://github.com/aws-samples/amazon-sagemaker-generativeai/tree/feature/gpro-rlvr-recipes/0_model_customization_recipes
Wizard Mode (CRITICAL)
Ask ONE question at a time. Wait for response before proceeding.
Use AskUserQuestion for choices. Use plain text for open-ended questions.
Wizard Steps
Step 1: Model
Ask: "Which model do you want to train?"
- Examples: `meta-llama/Llama-3.1-8B-Instruct`, `Qwen/Qwen2.5-7B-Instruct`
- Accept: Hugging Face model ID or S3 path
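The answer can be triaged with a small check like the one below before moving on. This is a minimal sketch. `classify_model_ref` is a hypothetical helper, not one of the skill's scripts. It assumes Hub IDs take the `org/name` form and S3 paths start with `s3://`.

```python
import re

# Hypothetical helper: classify the Step 1 answer as a Hugging Face Hub ID
# ("org/name") or an S3 URI ("s3://bucket/prefix"). Single-segment legacy Hub
# names (no org) are deliberately rejected by this sketch.
HF_ID = re.compile(r"^[A-Za-z0-9][\w.-]*/[\w.-]+$")
S3_URI = re.compile(r"^s3://([a-z0-9][a-z0-9.-]{1,61}[a-z0-9])(/.*)?$")


def classify_model_ref(ref: str) -> str:
    """Return 'hf' or 's3'; raise ValueError for anything else."""
    ref = ref.strip()
    if S3_URI.match(ref):
        return "s3"
    if HF_ID.match(ref):
        return "hf"
    raise ValueError(f"Not a Hugging Face ID or S3 URI: {ref!r}")


if __name__ == "__main__":
    for ref in ["meta-llama/Llama-3.1-8B-Instruct", "s3://my-bucket/models/qwen/"]:
        print(ref, "->", classify_model_ref(ref))
```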
→ Wait for response.
Step 2: Region
Ask: "Which AWS region should we use for training?"
- Suggest: `us-east-1` or `us-west-2` (typically the broadest GPU/Trainium availability)
- If the user says "default" or doesn't specify, use `us-east-1`
→ Wait for response.
Step 3: Training Objective
Use AskUserQuestion:
- Instruction SFT - Fine-tune on instruction-response pairs
- Continued Pretraining (CPT) - Extend training on domain corpus
- Preference Optimization (DPO) - Align with human preferences
→ Wait for response.
Step 4: Code Readiness
Use AskUserQuestion:
- No, I need a recipe - Use AWS sample recipes (recommended)
- Yes, I have my own code - Bring existing training script
→ If No: Go to Step 5A (Recipe path)
→ If Yes: Go to Step 5B (Custom code path)
Step 5A: Recipe Selection (No existing code)
IMPORTANT: Check what recipes actually exist before offering technique choices.
Fetch available recipes from the primary recipe repo:
https://github.com/aws-samples/amazon-sagemaker-generativeai/tree/feature/gpro-rlvr-recipes/0_model_customization_recipes
Match model family to available recipes:
- Look for folder matching model family (llama, qwen, gemma, phi, deepseek, etc.)
- Note which techniques are available (QLoRA, Spectrum, Full)
Present ONLY available techniques using AskUserQuestion:
- Only show techniques that exist in the repo for this model family
- Example: Gemma has QLoRA and Full, but NOT Spectrum or LoRA
| Model Family | Available Techniques |
|---|---|
| Llama | QLoRA, Spectrum, Full |
| Qwen | QLoRA, Spectrum, Full |
| Gemma | QLoRA, Full |
| Phi | QLoRA, Spectrum, Full |
| DeepSeek | QLoRA, Spectrum, Full |
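The table above is a snapshot, so the repo listing can be re-checked at runtime. The sketch below calls the public GitHub REST "contents" endpoint for the recipe folder on the branch named in the URL above. It assumes there is one sub-folder per model family, which is not confirmed by this SOP. You still need to open a matching folder to see which techniques (QLoRA, Spectrum, Full) it contains. `fetch_listing` and `family_folders` are hypothetical names.

```python
import json
import urllib.request

# Assumption: the recipe repo keeps one directory per model family directly
# under 0_model_customization_recipes. Unauthenticated GitHub API calls are
# rate-limited.
CONTENTS_URL = (
    "https://api.github.com/repos/aws-samples/amazon-sagemaker-generativeai/"
    "contents/0_model_customization_recipes?ref=feature/gpro-rlvr-recipes"
)


def fetch_listing(url: str = CONTENTS_URL) -> list:
    """Download the JSON directory listing (one dict per file or folder)."""
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def family_folders(listing: list, family: str) -> list:
    """Return directory names containing the model family name (case-insensitive)."""
    family = family.lower()
    return sorted(
        item["name"]
        for item in listing
        if item.get("type") == "dir" and family in item["name"].lower()
    )
```

In use, call `family_folders(fetch_listing(), "gemma")`. An empty result means no recipe exists for that family, so the custom-code path (Step 5B) is the fallback.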
→ Continue to Step 6.
Step 5B: Custom Code Path (Has existing code)
- Ask: "Path to your training script and requirements.txt?"
- Inspect requirements: `python scripts/inspect_requirements.py <path>`
- Determine accelerator from dependencies per references/container-selection.md
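To show what "determine accelerator from dependencies" can mean in practice, here is a hypothetical heuristic. It is not the logic of `scripts/inspect_requirements.py`. It assumes Neuron packages (`torch-neuronx`, `optimum-neuron`, `neuronx-distributed`, `neuronx-cc`) imply Trainium and that CUDA-only packages such as `bitsandbytes` or `flash-attn` imply GPU. Extras-style installs such as `optimum[neuronx]` are not detected.

```python
import re

# Hypothetical package sets; references/container-selection.md is authoritative.
NEURON_PKGS = {"torch-neuronx", "neuronx-distributed", "optimum-neuron", "neuronx-cc"}
GPU_ONLY_PKGS = {"bitsandbytes", "flash-attn", "xformers"}


def package_names(requirements_text: str) -> set:
    """Extract normalized package names from requirements.txt content."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blanks, comments, and pip options like -r / --extra-index-url
        name = re.split(r"[\s\[<>=!~;@]", line, maxsplit=1)[0]
        names.add(name.lower().replace("_", "-"))
    return names


def guess_accelerator(requirements_text: str) -> str:
    """Return 'trainium', 'gpu', or 'conflict' (Neuron and CUDA-only packages mixed)."""
    names = package_names(requirements_text)
    if names & NEURON_PKGS:
        return "conflict" if names & GPU_ONLY_PKGS else "trainium"
    return "gpu"
```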
→ Continue to Step 6.
Step 6: Infrastructure
Use AskUserQuestion:
- SageMaker Training Jobs - Transient, pay-per-use, simpler setup
- SageMaker HyperPod - Persistent cluster, Slurm/EKS orchestration
→ Wait for response.
Step 7: Accelerator
Trainium support is limited. Only these exact architectures are supported for training on Neuron:
| Supported | NOT Supported (variants and related architectures) |
|---|---|
| `llama` | `llama_vl`, `mllama`, etc. |
| `qwen3` | `qwen3_vl`, `qwen2`, `qwen2_5`, etc. |
| `granite` | `granite_vl`, etc. |
IMPORTANT: Variants are different architectures! Check the exact `model_type` value:
- `qwen3` → Supported
- `qwen3_vl` → NOT supported (Vision-Language variant)
- `llama` → Supported
- `mllama` → NOT supported (Multimodal Llama)
Source: https://huggingface.co/docs/optimum-neuron/en/supported_architectures
Decision logic:
Fetch the model's architecture using WebFetch:
- URL: `https://huggingface.co/<model-id>/raw/main/config.json`
- Prompt: "What is the model_type value in this config?"
- Example: `https://huggingface.co/aisingapore/Apertus-SEA-LION-v4-8B-IT/raw/main/config.json`
Then decide:
- If `model_type` is `llama`, `qwen3`, or `granite` → Offer both GPU and Trainium
- Otherwise → Use GPU only (do NOT offer Trainium)
Note: Model names don't always indicate architecture. Always check config.json.
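The same decision logic can be scripted outside WebFetch. This is a minimal sketch using only the standard library; `fetch_config` and `accelerator_options` are hypothetical names. Gated repos such as `meta-llama/*` usually reject unauthenticated downloads. For that reason the sketch assumes an `HF_TOKEN` environment variable and sends it as a bearer token when it is set.

```python
import json
import os
import urllib.request

# Exact model_type matches only: qwen3_vl, mllama, etc. must NOT pass.
TRAINIUM_TRAINING_TYPES = {"llama", "qwen3", "granite"}


def fetch_config(model_id: str) -> dict:
    """Download config.json from the Hugging Face Hub (token used if HF_TOKEN is set)."""
    url = f"https://huggingface.co/{model_id}/raw/main/config.json"
    headers = {}
    token = os.environ.get("HF_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def accelerator_options(config: dict) -> list:
    """Offer Trainium only when model_type is an exact supported match."""
    if config.get("model_type") in TRAINIUM_TRAINING_TYPES:
        return ["gpu", "trainium"]
    return ["gpu"]
```

In use: `accelerator_options(fetch_config("Qwen/Qwen2.5-7B-Instruct"))`. The lookup relies on the exact `model_type` string and never on the model name, which is the point of the note above.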
If architecture IS supported - Use AskUserQuestion:
- GPU (NVIDIA) - Broad support, QLoRA compatible
- Trainium - Cost-effective for Llama/Qwen3/Granite (no 4-bit quantization)
If architecture is NOT supported - Inform user:
"This model architecture (`<model_type>`) is not supported on Trainium for training. Only Llama, Qwen3, and Granite are currently supported. Using GPU instead."
→ If Trainium + QLoRA selected: Error. QLoRA relies on 4-bit quantization, which is not supported on Trainium, so ask the user to pick LoRA or Full instead. → Wait for response.
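A launcher can enforce this rule before generating anything. The guard below is a hypothetical sketch. Its technique names mirror the wizard choices. The Trainium set assumes only the two Trainium templates exist (LoRA and full SFT), so Spectrum on Trainium is rejected as well.

```python
# Hypothetical guard; technique strings mirror the wizard choices.
TRAINIUM_TECHNIQUES = {"lora", "full"}  # matches trainium/lora_neuron.py and trainium/sft_neuron.py


def check_combo(technique: str, accelerator: str) -> None:
    """Raise ValueError for technique/accelerator pairs this SOP does not support."""
    technique, accelerator = technique.lower(), accelerator.lower()
    if accelerator != "trainium":
        return
    if technique == "qlora":
        raise ValueError("QLoRA needs 4-bit quantization, which Trainium does not support; use LoRA or Full.")
    if technique not in TRAINIUM_TECHNIQUES:
        raise ValueError(f"No Trainium template for technique {technique!r}.")
```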
Step 8: Dataset
Ask: "Where is your training dataset?"
- Accept: Hugging Face Hub dataset name (e.g., `databricks/databricks-dolly-15k`)
- Accept: S3 URI with JSONL/Parquet files
- Validate format per references/data-contract.md
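Here is a quick JSONL pre-check to run before the full validation. It is a sketch under an assumption: the record shapes are common TRL conventions (`messages` or `prompt`/`completion` for SFT, `prompt`/`chosen`/`rejected` for DPO, `text` for CPT). These keys are not taken from references/data-contract.md, which remains authoritative. `validate_jsonl` is a hypothetical name.

```python
import json

# Assumed accepted key sets per objective (any one set must be present).
REQUIRED_KEYS = {
    "sft": [{"messages"}, {"prompt", "completion"}],
    "dpo": [{"prompt", "chosen", "rejected"}],
    "cpt": [{"text"}],
}


def validate_jsonl(lines, objective: str, max_errors: int = 5) -> list:
    """Return up to max_errors problems found in JSONL lines for the given objective."""
    accepted = REQUIRED_KEYS[objective]
    errors = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {n}: invalid JSON ({exc.msg})")
        else:
            if not isinstance(record, dict) or not any(keys.issubset(record) for keys in accepted):
                errors.append(f"line {n}: expected keys {[sorted(k) for k in accepted]}")
        if len(errors) >= max_errors:
            break
    return errors
```

With a local file, `validate_jsonl(open("train.jsonl"), "sft")` reads the file line by line. For S3 data, download a sample first.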
→ Wait for response.
Step 9: Context Length
Use AskUserQuestion:
- 4k tokens - Standard, lower memory
- 8k tokens - Balanced
- 16k+ tokens - Long context (higher memory)
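One cheap sanity check compares the chosen context length with the limit declared in the model's `config.json`. This is a hypothetical sketch. It assumes the config exposes `max_position_embeddings`, which most decoder-only configs do. It treats "16k+" as 16,384 tokens and ignores RoPE-scaling extensions.

```python
# Hypothetical mapping from the wizard choices to a sequence length in tokens.
CONTEXT_CHOICES = {"4k": 4096, "8k": 8192, "16k": 16384}


def check_context(choice: str, config: dict) -> int:
    """Return the sequence length, or raise if it exceeds the model's declared limit."""
    seq_len = CONTEXT_CHOICES[choice]
    limit = config.get("max_position_embeddings")
    if limit is not None and seq_len > limit:
        raise ValueError(f"{seq_len} tokens exceeds max_position_embeddings={limit}")
    return seq_len
```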
→ Wait for response.
Step 10: Instance Sizing
Based on collected info, recommend instances per references/instance-sizing.md.
Get recommendations:
`python scripts/fetch_instance_info.py --model-size <B> --technique <tech> --accelerator <gpu|trainium>`
Check quota availability for recommended instances:
`aws service-quotas list-service-quotas --service-code sagemaker --region <region> --query "Quotas[?contains(QuotaName, 'training') && contains(QuotaName, '<instance-type>')].{Name:QuotaName, Value:Value}"`
Present recommendations with quota status:
- Show primary and alternative instances
- Indicate which have quota > 0
- If primary has no quota, suggest requesting increase or using alternative
If no quota in selected region, check other regions:
`for region in us-west-2 eu-west-1; do aws service-quotas list-service-quotas --service-code sagemaker --region $region ...; done`
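The JSON printed by the quota commands above can be turned into a per-region shortlist. This sketch rests on two assumptions. First, quota names follow the pattern `<instance> for training job usage` and `<instance> for spot training job usage`; verify this against your account's output. Second, each region's output is the list of `{"Name", "Value"}` objects produced by the `--query` shown. Setting the `spot` flag checks the Spot quota used in Step 11.

```python
import json


def has_training_quota(quotas: list, instance: str, spot: bool = False) -> bool:
    """True if a training-job quota (On-Demand or Spot) for `instance` is above zero."""
    for q in quotas:
        name = q["Name"]
        if (
            name.startswith(instance + " ")  # avoid ml.p4d matching ml.p4de
            and "training" in name
            and ("spot" in name) == spot
            and q["Value"] > 0
        ):
            return True
    return False


def usable_by_region(raw_by_region: dict, candidates: list, spot: bool = False) -> dict:
    """Map region -> candidate instance types (in preference order) that have quota."""
    result = {}
    for region, raw in raw_by_region.items():
        quotas = json.loads(raw)  # CLI output captured as a JSON string
        result[region] = [i for i in candidates if has_training_quota(quotas, i, spot)]
    return result
```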
→ Wait for response.
Step 11: Spot Instances (Cost Optimization)
Use AskUserQuestion:
- Yes, use Spot instances - 50-70% cost savings, may be interrupted
- No, use On-Demand - Higher cost, no interruptions
→ If Spot: Enable checkpointing to S3 so an interrupted job can resume, and set a maximum wait time at least as long as the maximum runtime → Wait for response.
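Spot training touches three fields of the SageMaker CreateTrainingJob API: `EnableManagedSpotTraining`, `StoppingCondition` (`MaxRuntimeInSeconds`, `MaxWaitTimeInSeconds`) and `CheckpointConfig` (`S3Uri`, `LocalPath`). The sketch below builds them as a plain dict; `spot_settings` is a hypothetical name. SDK launchers such as ModelTrainer expose the same settings under their own parameter names. The training script must still write checkpoints to, and resume from, the local path.

```python
# Sketch of the CreateTrainingJob fields used by Managed Spot Training.
def spot_settings(checkpoint_s3_uri: str, max_run_s: int, max_wait_s: int) -> dict:
    if not checkpoint_s3_uri.startswith("s3://"):
        raise ValueError("checkpoint location must be an s3:// URI")
    if max_wait_s < max_run_s:
        raise ValueError("MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds")
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_run_s,
            "MaxWaitTimeInSeconds": max_wait_s,  # waiting for Spot capacity + run time
        },
        # SageMaker syncs LocalPath to S3Uri and restores it when the job restarts.
        "CheckpointConfig": {"S3Uri": checkpoint_s3_uri, "LocalPath": "/opt/ml/checkpoints"},
    }
```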
Step 12: S3 Output Bucket
Ask: "Where should we save the trained model? (S3 bucket or path)"
Options for discovery:
- Ask user for S3 bucket/path
- If user says "default": Use the `sagemaker-{region}-{account_id}` bucket
- Optionally discover existing SageMaker buckets: `aws s3 ls | grep sagemaker`
- If multiple buckets found, confirm with user
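The "default" answer expands to a bucket name built from region and account ID. This sketch builds the output URI and splits user-supplied S3 paths. The `llm-training` prefix is a hypothetical placeholder. In practice the account ID would come from `aws sts get-caller-identity`.

```python
# Hypothetical helpers for the Step 12 answer.
def default_output_uri(region: str, account_id: str, job_prefix: str = "llm-training") -> str:
    """Build s3://sagemaker-{region}-{account_id}/<prefix>/."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return f"s3://sagemaker-{region}-{account_id}/{job_prefix.strip('/')}/"


def split_s3_uri(uri: str) -> tuple:
    """Split 's3://bucket/key/prefix' into ('bucket', 'key/prefix')."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key
```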
→ Wait for response.
Step 13: Execution Role
Ask: "Which IAM role should SageMaker use? (ARN or 'discover')"
Options for discovery:
- Ask user for role ARN directly
- If user says "discover" or "find", list SageMaker execution roles: `aws iam list-roles --query "Roles[?contains(RoleName, 'SageMaker') || contains(RoleName, 'sagemaker')]"`
- If multiple roles found, present options and confirm
- Verify the role has the required permissions (S3, ECR access) and a trust policy that allows `sagemaker.amazonaws.com` to assume it
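Two parts of this step can be checked offline: the shape of the role ARN and the trust policy. Pass in the `AssumeRolePolicyDocument` returned by `aws iam get-role`. This is a hypothetical sketch. It only recognizes an explicit `sts:AssumeRole` grant to `sagemaker.amazonaws.com` and does not evaluate wildcards or the permission policies.

```python
import json
import re

# arn:<partition>:iam::<12-digit account>:role/<optional path/>name
ROLE_ARN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")


def is_role_arn(value: str) -> bool:
    return bool(ROLE_ARN.match(value.strip()))


def _as_list(value):
    return [value] if isinstance(value, str) else list(value or [])


def trusts_sagemaker(trust_policy) -> bool:
    """True if the trust policy (dict or JSON string) lets SageMaker assume the role."""
    doc = json.loads(trust_policy) if isinstance(trust_policy, str) else trust_policy
    statements = doc.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        principal = stmt.get("Principal", {})
        if stmt.get("Effect") != "Allow" or not isinstance(principal, dict):
            continue
        if ("sagemaker.amazonaws.com" in _as_list(principal.get("Service"))
                and "sts:AssumeRole" in _as_list(stmt.get("Action"))):
            return True
    return False
```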
→ Wait for response.
Step 14: Output Format
Use AskUserQuestion:
- Jupyter Notebook - For SageMaker Studio
- Python Script - Standalone launcher, CI/CD friendly
→ Wait for response.
Step 15: Generate Artifacts
Fetch current container images from the official AWS DLC page:
- WebFetch URL: https://aws.github.io/deep-learning-containers/reference/available_images/
- Prompt: `Find the latest PyTorch training <gpu|neuron> container image URI for SageMaker in <region>`
- Fallback (may return stale versions): `python scripts/fetch_dlc_images.py --framework pytorch --region <region>`
Select the appropriate template from `templates/`:
GPU templates:
- `lora_peft.py` - LoRA with PEFT
- `qlora_peft.py` - QLoRA (4-bit)
- `sft_trl.py` - Full SFT
- `dpo_trl.py` - DPO
- `cpt_hf.py` - Continued pretraining
Trainium templates:
- `trainium/lora_neuron.py` - LoRA for Trainium (eager attention, no device_map)
- `trainium/sft_neuron.py` - Full SFT for Trainium
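Template choice follows directly from the wizard answers. This lookup is a hypothetical sketch. It assumes `dpo_trl.py` and `cpt_hf.py` are selected by objective alone on GPU. It also assumes any combination without a local template, such as Spectrum, falls back to the AWS recipe.

```python
# Hypothetical mapping from (objective, technique, accelerator) to template paths.
GPU_SFT = {"lora": "lora_peft.py", "qlora": "qlora_peft.py", "full": "sft_trl.py"}
GPU_BY_OBJECTIVE = {"dpo": "dpo_trl.py", "cpt": "cpt_hf.py"}
TRAINIUM_SFT = {"lora": "trainium/lora_neuron.py", "full": "trainium/sft_neuron.py"}


def select_template(objective: str, technique: str, accelerator: str) -> str:
    objective, technique, accelerator = objective.lower(), technique.lower(), accelerator.lower()
    if accelerator == "trainium":
        if objective == "sft" and technique in TRAINIUM_SFT:
            return "templates/" + TRAINIUM_SFT[technique]
    elif objective in GPU_BY_OBJECTIVE:
        return "templates/" + GPU_BY_OBJECTIVE[objective]
    elif objective == "sft" and technique in GPU_SFT:
        return "templates/" + GPU_SFT[technique]
    raise LookupError(f"No local template for {objective}/{technique}/{accelerator}; use the AWS recipe.")
```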
Launcher:
launch_training_job.py- SDK v3 ModelTrainer launcher
HyperPod:
hyperpod/- HyperPod configs
Generate launcher using references/output-artifacts.md
Run references/checklist.md before delivering
Python Compatibility: SageMaker SDK v3 requires Python ≤3.13 (incompatible with 3.14+)
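Two items from this step can be automated as pre-delivery checks, shown in the hypothetical sketch below. The first is the interpreter version (≤3.13 per the note above). The second is whether the chosen DLC image matches the training region and accelerator. It assumes the standard PyTorch DLC repositories `pytorch-training` (GPU) and `pytorch-training-neuronx` (Trainium). Adjust `expected` if you deliberately choose another DLC family, such as a Hugging Face DLC.

```python
import re
import sys

IMAGE_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com(?:\.cn)?/"
    r"(?P<repo>[a-z0-9._/-]+):(?P<tag>[\w.-]+)$"
)


def python_ok(version_info=sys.version_info) -> bool:
    """SageMaker SDK v3 needs Python <= 3.13 (per this SOP)."""
    return tuple(version_info[:2]) <= (3, 13)


def check_image(uri: str, region: str, accelerator: str) -> list:
    """Return a list of problems with the selected training image (empty if none)."""
    m = IMAGE_URI.match(uri)
    if not m:
        return [f"unrecognized ECR image URI: {uri}"]
    problems = []
    if m["region"] != region:
        problems.append(f"image region {m['region']} != training region {region}")
    expected = "pytorch-training-neuronx" if accelerator == "trainium" else "pytorch-training"
    if m["repo"] != expected:
        problems.append(f"repository {m['repo']} != expected {expected}")
    return problems
```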
Reference Files
| File | Use When |
|---|---|
| references/best-practices.md | Default recommendations, technique selection, hyperparameters |
| references/recipe-sources.md | Finding AWS sample recipes |
| references/instance-sizing.md | Selecting instance types |
| references/container-selection.md | Choosing container images |
| references/neuron-validation.md | Verifying Trainium compatibility |
| references/data-contract.md | Dataset format requirements |
| references/output-artifacts.md | Generating final artifacts |
| references/checklist.md | Pre-delivery verification |
Scripts
| Script | Purpose |
|---|---|
| scripts/fetch_dlc_images.py | Get current container images |
| scripts/fetch_instance_info.py | Get instance specs and recommendations |
| scripts/inspect_requirements.py | Analyze user's requirements.txt |
| scripts/validate_dataset.py | Validate dataset format |
Templates
| Template | Training Method | Accelerator |
|---|---|---|
| templates/lora_peft.py | LoRA with PEFT | GPU |
| templates/qlora_peft.py | QLoRA (4-bit) | GPU |
| templates/sft_trl.py | Full SFT with TRL | GPU |
| templates/dpo_trl.py | DPO preference training | GPU |
| templates/cpt_hf.py | Continued pretraining | GPU |
| templates/trainium/lora_neuron.py | LoRA for Neuron SDK | Trainium |
| templates/trainium/sft_neuron.py | Full SFT for Neuron SDK | Trainium |
| templates/launch_training_job.py | SDK v3 ModelTrainer launcher | Both |
| templates/hyperpod/ | HyperPod recipes | Both |
Note: Trainium templates use `attn_implementation="eager"` and omit `device_map="auto"` for Neuron compatibility.
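The note above can be captured in a small helper that adjusts model-loading keyword arguments per accelerator. This is a hypothetical sketch, not code taken from the templates. The returned dict is meant to be passed as `**kwargs` to a Transformers `from_pretrained` call. GPU settings are passed through unchanged.

```python
# Hypothetical helper: Trainium loads use eager attention and no device_map="auto".
def model_load_kwargs(accelerator: str, base=None) -> dict:
    kwargs = dict(base or {})  # copy so the caller's dict is not mutated
    if accelerator == "trainium":
        kwargs.pop("device_map", None)  # device_map="auto" is omitted for Neuron
        kwargs["attn_implementation"] = "eager"
    return kwargs
```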