name: sagemaker-ai
description: >
  Comprehensive SOP for Amazon SageMaker AI: SDK v3 code generation, serverless model customization (SFT, DPO, RLVR, RLAIF), LLM inference endpoints, model training (classical ML + LLM fine-tuning), HyperPod clusters (training + inference), Model Monitor, and AutoML with AutoGluon. Generates correct, working code every time. Triggers: "sagemaker", "training job", "deploy model", "endpoint", "llm inference", "vllm", "model customization", "djl", "fine-tune", "qlora", "lora", "rlvr", "rlaif", "dpo", "sft", "reward function", "hyperpod", "model monitor", "data quality", "bias", "autogluon", "jumpstart", "deploy on hyperpod", "training script", "inference script".
argument-hint: "[topic: train|deploy|hyperpod|monitor|autogluon] [framework: xgboost|sklearn|pytorch|llm]"
allowed-tools: Read, Write, Bash, Glob, Grep
SageMaker AI — Standard Operating Procedure
You are now operating as a SageMaker AI expert. Your job is to generate correct, working SageMaker Python SDK v3 code — never v2 estimators, never wrong imports, never wrong parameter names.
Read the relevant reference file for the user's task before generating any code.
1. Guardrails (Always-On)
These rules apply to EVERY piece of SageMaker code. Check each one before output.
| Rule | CORRECT | WRONG |
|---|---|---|
| Model class import | `from sagemaker.core.resources import Model` | `from sagemaker.model import Model` |
| Session import | `from sagemaker.core.helper.session_helper import Session, get_execution_role` | `from sagemaker import get_execution_role` |
| Deployment (LLMs) | Core API: `Model.create` + `EndpointConfig.create` + `Endpoint.create` | `ModelBuilder` with DJL/vLLM containers |
| Deployment (simple) | `ModelBuilder` + `SchemaBuilder` | V2 `Model.deploy()` |
| JumpStart deploy | `ModelBuilder(model="<model-id>")` | `JumpStartModel` (removed in V3) |
| Processing imports | `from sagemaker.core.processing import ...` | `from sagemaker.processing import ...` |
| Transformer import | `from sagemaker.core.transformer import Transformer` | `from sagemaker.transformer import Transformer` |
| Pipeline imports | `sagemaker.mlops.workflow.*` + `sagemaker.core.workflow.*` | `sagemaker.workflow.*` (V2, removed) |
| Launcher class | `ModelTrainer` | `XGBEstimator`, `PyTorch(`, `SKLearn(` |
| Output path param | `output_data_config=OutputDataConfig(s3_output_path=...)` | `output_path=` |
| Logs param | `logs=True` | `logs='All'` |
| Entry script param | `entry_script="train.py"` | `entry_point="train.py"` |
| Python version | ≤ 3.13 | 3.14+ (SDK v3 incompatible) |
| DLC images source | https://aws.github.io/deep-learning-containers/reference/available_images/ | HuggingFace docs (outdated) |
| HyperPod CLI | `pip install sagemaker-hyperpod` → `hyp` commands | Manual kubectl only |
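The WRONG column lends itself to a mechanical check. The sketch below shows one possible way to scan generated code for those V2 patterns before output. The function name `find_v2_patterns`, the regular expressions, and the hint strings are illustrative assumptions, not part of the SageMaker SDK, and a line-based regex scan will miss aliased or multi-line imports.

```python
import re

# Each entry pairs a regex for a WRONG pattern from the guardrails table
# with a hint pointing at the CORRECT form. All names here are hypothetical.
V2_PATTERNS = [
    (r"^\s*from\s+sagemaker\.model\s+import\b", "import Model from sagemaker.core.resources"),
    (r"^\s*from\s+sagemaker\s+import\s+.*\bget_execution_role\b",
     "import get_execution_role from sagemaker.core.helper.session_helper"),
    (r"^\s*from\s+sagemaker\.processing\s+import\b", "use sagemaker.core.processing"),
    (r"^\s*from\s+sagemaker\.transformer\s+import\b", "use sagemaker.core.transformer"),
    (r"^\s*(from|import)\s+sagemaker\.workflow\b",
     "use sagemaker.mlops.workflow.* / sagemaker.core.workflow.*"),
    (r"\bJumpStartModel\b", "use ModelBuilder(model=...)"),
    (r"\b(XGBEstimator|PyTorch|SKLearn)\s*\(", "use ModelTrainer"),
    (r"\bentry_point\s*=", "use entry_script="),
    # \b keeps this from matching the correct s3_output_path= parameter.
    (r"\boutput_path\s*=", "use output_data_config=OutputDataConfig(s3_output_path=...)"),
    (r"\blogs\s*=\s*['\"]", "logs must be a boolean"),
]


def find_v2_patterns(source: str) -> list[tuple[int, str]]:
    """Return (line_number, hint) for every WRONG pattern found in source."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, hint in V2_PATTERNS:
            if re.search(pattern, line):
                findings.append((lineno, hint))
    return findings


if __name__ == "__main__":
    snippet = "from sagemaker.model import Model\nest = Launcher(entry_point='train.py')\n"
    for lineno, hint in find_v2_patterns(snippet):
        print(f"line {lineno}: {hint}")
```

An empty result only means none of the listed patterns matched; it does not prove the code is valid SDK v3.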
Container Decision Tree
- Standard text LLM (Llama, Mistral, Qwen) → DJL LMI with vLLM backend
- Multimodal/Vision model (Idefics3, LLaVA, Qwen-VL) → DJL LMI with vLLM backend
- Simple HF pipeline model (classification, NER) → HuggingFace Inference DLC
- Custom model with custom handler → HuggingFace Inference DLC + custom inference.py
- Classical ML (XGBoost, SKLearn, PyTorch) → Framework-specific DLC
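The decision tree above can be expressed as a plain lookup. This is a minimal sketch: the `Workload` enum and both function names are hypothetical, and classifying a given model into one of the five categories is still left to the caller.

```python
from enum import Enum


class Workload(Enum):
    """One member per branch of the container decision tree."""
    TEXT_LLM = "standard text LLM"
    MULTIMODAL = "multimodal/vision model"
    HF_PIPELINE = "simple HF pipeline model"
    CUSTOM_HANDLER = "custom model with custom handler"
    CLASSICAL_ML = "classical ML"


# Container choice for each branch, copied from the list above.
CONTAINER_FOR = {
    Workload.TEXT_LLM: "DJL LMI with vLLM backend",
    Workload.MULTIMODAL: "DJL LMI with vLLM backend",
    Workload.HF_PIPELINE: "HuggingFace Inference DLC",
    Workload.CUSTOM_HANDLER: "HuggingFace Inference DLC + custom inference.py",
    Workload.CLASSICAL_ML: "Framework-specific DLC",
}


def choose_container(workload: Workload) -> str:
    """Return the container family for a workload category."""
    return CONTAINER_FOR[workload]


def requires_core_api(workload: Workload) -> bool:
    """DJL/vLLM containers are deployed with the Core API, not ModelBuilder,
    per the 'Deployment (LLMs)' guardrail."""
    return choose_container(workload).startswith("DJL LMI")


if __name__ == "__main__":
    for w in Workload:
        print(f"{w.value}: {choose_container(w)} (Core API: {requires_core_api(w)})")
```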
CUDA Compatibility
| CUDA | Works On | Fails On |
|---|---|---|
| cu124 | g5, g6, p5 | — |
| cu128 | g5, g6, p5 | — |
| cu129 | g6, p5 | g5 (driver mismatch → CannotStartContainerError) |
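The compatibility table can be encoded as data and checked before an image is chosen. The sketch below assumes instance types follow the usual `ml.<family>.<size>` form; the helper names are hypothetical, and families outside the table return `None` because the table says nothing about them.

```python
from typing import Optional

# CUDA tag -> instance families the table lists under "Works On".
CUDA_WORKS_ON = {
    "cu124": {"g5", "g6", "p5"},
    "cu128": {"g5", "g6", "p5"},
    "cu129": {"g6", "p5"},  # g5 fails with a driver mismatch
}
# Families the table covers at all.
KNOWN_FAMILIES = {"g5", "g6", "p5"}


def instance_family(instance_type: str) -> str:
    """Extract the family, e.g. 'ml.g5.12xlarge' -> 'g5'."""
    parts = instance_type.split(".")
    return parts[1] if parts[0] == "ml" and len(parts) > 1 else parts[0]


def is_cuda_compatible(cuda_tag: str, instance_type: str) -> Optional[bool]:
    """True/False per the table; None when the family is not covered by it."""
    if cuda_tag not in CUDA_WORKS_ON:
        raise ValueError(f"CUDA tag not in table: {cuda_tag}")
    family = instance_family(instance_type)
    if family not in KNOWN_FAMILIES:
        return None
    return family in CUDA_WORKS_ON[cuda_tag]


if __name__ == "__main__":
    print(is_cuda_compatible("cu129", "ml.g5.2xlarge"))
    print(is_cuda_compatible("cu128", "ml.g5.2xlarge"))
```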
2. Intent Detection + Routing
| User says | Route to |
|---|---|
| "deploy LLM" / "vLLM" / "DJL" / "LLM endpoint" | → references/inference-endpoints.md |
| "deploy model" / "real-time endpoint" / "inference" | → references/inference-endpoints.md |
| "JumpStart" / "pretrained model" | → references/inference-endpoints.md (ModelBuilder section) |
| "training script" / "train.py" / "XGBoost" / "SKLearn" | → references/training-jobs.md + templates/ |
| "model customization" / "serverless fine-tuning" / "RLVR" / "RLAIF" / "reward function" | → references/model-customization.md |
| "fine-tune" / "QLoRA" / "LoRA" (with custom scripts) | → references/training-jobs.md (LLM section) |
| "fine-tune" / "SFT" / "DPO" (ambiguous) | → Check if model is on serverless supported list first → references/model-customization.md; otherwise references/training-jobs.md |
| "training recipes" / "Spectrum" / "full fine-tuning" | → references/training-jobs.md (AWS recipes) |
| "Trainium" / "Neuron" / "trn1" | → references/training-jobs.md (Trainium section) |
| "HyperPod" / "cluster" / "Training Operator" | → references/hyperpod.md |
| "deploy on HyperPod" / "HyperPod inference" / "hyp create" | → references/hyperpod-inference.md |
| "model monitor" / "data quality" / "drift" / "bias" | → references/model-monitor.md |
| "AutoGluon" / "AutoML" / "time series" / "tabular" | → references/automl-autogluon.md |
| "pipeline" / "workflow" / "DAG" | → references/automl-autogluon.md (Pipelines section) |
| "SDK v3" / "imports" / "image_uris" | → references/sdk-v3-reference.md |
If ambiguous, ask ONE clarifying question.
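As a rough illustration of the routing table, the sketch below matches trigger phrases by case-insensitive substring and returns the first hit. The `ROUTES` ordering and the `route` function are assumptions made for this example; more specific phrases are listed first so that "deploy on HyperPod" wins over plain "HyperPod". Requests such as a bare "fine-tune" return `None`, which corresponds to the check-the-supported-list or ask-one-question cases and is not handled here.

```python
from typing import Optional

# (trigger phrases, reference file), most specific routes first.
ROUTES = [
    (("deploy on hyperpod", "hyperpod inference", "hyp create"),
     "references/hyperpod-inference.md"),
    (("hyperpod", "cluster", "training operator"), "references/hyperpod.md"),
    (("model customization", "serverless fine-tuning", "rlvr", "rlaif", "reward function"),
     "references/model-customization.md"),
    (("qlora", "lora", "training script", "train.py", "xgboost", "sklearn",
      "training recipes", "trainium", "neuron", "trn1"),
     "references/training-jobs.md"),
    (("deploy llm", "vllm", "djl", "llm endpoint", "deploy model",
      "real-time endpoint", "jumpstart"),
     "references/inference-endpoints.md"),
    (("model monitor", "data quality", "drift", "bias"), "references/model-monitor.md"),
    (("autogluon", "automl", "time series", "tabular", "pipeline", "workflow"),
     "references/automl-autogluon.md"),
    (("sdk v3", "imports", "image_uris"), "references/sdk-v3-reference.md"),
]


def route(request: str) -> Optional[str]:
    """Return the reference file for a request, or None when nothing matches."""
    text = request.lower()
    for phrases, target in ROUTES:
        if any(phrase in text for phrase in phrases):
            return target
    return None  # ambiguous: ask ONE clarifying question


if __name__ == "__main__":
    print(route("Deploy Llama with vLLM on a real-time endpoint"))
```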
3. Quick Reference
Reference Repositories
- Inference hosting examples: https://github.com/aws-samples/sagemaker-genai-hosting-examples
- Model customization recipes: https://github.com/aws-samples/amazon-sagemaker-generativeai/tree/main/0_model_customization_recipes
DLC Images
Always check: https://aws.github.io/deep-learning-containers/reference/available_images/
Key SDK v3 Imports
```python
# Session + role
from sagemaker.core.helper.session_helper import Session, get_execution_role

# Training
from sagemaker.train import ModelTrainer
from sagemaker.core.training.configs import Compute, SourceCode, OutputDataConfig

# LLM Deployment (Core API)
from sagemaker.core.resources import Model, EndpointConfig, Endpoint
from sagemaker.core.shapes.shapes import ContainerDefinition, ProductionVariant

# Simple Deployment (ModelBuilder)
from sagemaker.serve import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder

# Image URIs
from sagemaker.core import image_uris

# Model Monitor
from sagemaker.core.model_monitor import DefaultModelMonitor, ModelQualityMonitor, DataCaptureConfig

# Model Customization (Serverless)
from sagemaker.modules.train.sft_trainer import SFTTrainer
from sagemaker.modules.train.dpo_trainer import DPOTrainer
from sagemaker.modules.train.rlvr_trainer import RLVRTrainer
from sagemaker.modules.train.rlaif_trainer import RLAIFTrainer
from sagemaker.assets import DataSet
from sagemaker.train.evaluate import BenchMarkEvaluator, LLMAsJudgeEvaluator

# HyperPod
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint
from sagemaker.hyperpod.inference.hp_custom_endpoint import HPCustomEndpoint
```
4. Reference Files
Consult these for exact details — do not guess or recall from training data.
| File | Use When |
|---|---|
| `references/sdk-v3-reference.md` | Need exact import paths, class signatures, deployment patterns |
| `references/inference-endpoints.md` | Deploying models to SageMaker real-time endpoints (LLM or classical) |
| `references/model-customization.md` | Serverless model customization: SFT, DPO, RLVR, RLAIF trainers, reward functions, evaluation, deployment |
| `references/training-jobs.md` | Training models with custom scripts: classical ML, QLoRA/LoRA, Trainium, recipes |
| `references/hyperpod.md` | Setting up HyperPod clusters, Training Operator, Task Governance |
| `references/hyperpod-inference.md` | Deploying inference on HyperPod (JumpStart, custom, CLI/SDK/kubectl) |
| `references/model-monitor.md` | Model monitoring: Data Quality, Model Quality, Bias, Explainability |
| `references/automl-autogluon.md` | AutoGluon AutoML: tabular, time series, multimodal, Pipelines |
Templates for code generation:
| Template | Use When |
|---|---|
| `templates/train_xgboost.py` | XGBoost training script |
| `templates/train_sklearn.py` | SKLearn training script |
| `templates/train_pytorch.py` | PyTorch training script |
| `templates/train_qlora.py` | QLoRA LLM fine-tuning script |
| `templates/inference.py` | Multi-framework inference hooks |
5. Pre-Delivery Checklist
Run this mentally before EVERY code delivery:
- No V2 imports (`sagemaker.model`, `sagemaker.processing`, `sagemaker.transformer`, `sagemaker.workflow`)
- Model saved to `/opt/ml/model/` (not `./model/`)
- `OutputDataConfig(s3_output_path=...)` (not `output_path=`)
- `.train(wait=True, logs=True)`: `logs` is boolean
- All numeric `argparse` args have `type=int` or `type=float`
- Channel names match between launcher and training script
- No hardcoded credentials, account IDs, or absolute local paths
- `entry_script=` in SourceCode (not `entry_point=`)
- LLM deployment uses Core API (not ModelBuilder with DJL/vLLM)
- CUDA version matches instance type (cu128 for g5, cu129 for g6/p5)
- `container_startup_health_check_timeout_in_seconds=900` for LLM endpoints
- `dependencies={"auto": False}` when using ModelBuilder
- Serverless model customization: model is on the supported list (see `references/model-customization.md`)
- Model customization trainers use `sagemaker.modules.train.*` imports (not `sagemaker.train.*`)
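Several checklist items concern the training script itself: typed numeric arguments, the model output directory, and channel names. The skeleton below is a hedged sketch of how a script might satisfy them. The hyperparameter names (`--epochs`, `--learning-rate`) and the `train` channel are hypothetical; `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` are the environment variables SageMaker training containers conventionally set, with the `/opt/ml/...` paths as fallbacks.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided paths."""
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as strings on the command line, so numeric
    # ones need an explicit type (checklist: type=int or type=float).
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--learning-rate", type=float, default=0.1)
    # Save the model under /opt/ml/model/, not ./model/.
    parser.add_argument("--model-dir",
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    # The channel name ("train") must match the one used by the launcher.
    parser.add_argument("--train",
                        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"epochs={args.epochs} lr={args.learning_rate} "
          f"train={args.train} model_dir={args.model_dir}")
```

Passing `argv` explicitly, as in `parse_args(["--epochs", "3"])`, makes the parser easy to exercise locally without a training job.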