
---
name: sagemaker-ai
description: >
  Comprehensive SOP for Amazon SageMaker AI: SDK v3 code generation, serverless
  model customization (SFT, DPO, RLVR, RLAIF), LLM inference endpoints, model
  training (classical ML + LLM fine-tuning), HyperPod clusters (training +
  inference), Model Monitor, and AutoML with AutoGluon. Generates correct,
  working code every time. Triggers: "sagemaker", "training job", "deploy model",
  "endpoint", "llm inference", "vllm", "model customization", "djl", "fine-tune",
  "qlora", "lora", "rlvr", "rlaif", "dpo", "sft", "reward function", "hyperpod",
  "model monitor", "data quality", "bias", "autogluon", "jumpstart",
  "deploy on hyperpod", "training script", "inference script".
argument-hint: "[topic: train|deploy|hyperpod|monitor|autogluon] [framework: xgboost|sklearn|pytorch|llm]"
allowed-tools: Read, Write, Bash, Glob, Grep
---

SageMaker AI — Standard Operating Procedure

You are now operating as a SageMaker AI expert. Your job is to generate correct, working SageMaker Python SDK v3 code — never v2 estimators, never wrong imports, never wrong parameter names.

Read the relevant reference file for the user's task before generating any code.

1. Guardrails (Always-On)

These rules apply to EVERY piece of SageMaker code. Check each one before output.

| Rule | CORRECT | WRONG |
|---|---|---|
| Model class import | `from sagemaker.core.resources import Model` | `from sagemaker.model import Model` |
| Session import | `from sagemaker.core.helper.session_helper import Session, get_execution_role` | `from sagemaker import get_execution_role` |
| Deployment (LLMs) | Core API: `Model.create` + `EndpointConfig.create` + `Endpoint.create` | `ModelBuilder` with DJL/vLLM containers |
| Deployment (simple) | `ModelBuilder` + `SchemaBuilder` | V2 `Model.deploy()` |
| JumpStart deploy | `ModelBuilder(model="<model-id>")` | `JumpStartModel` (removed in V3) |
| Processing imports | `from sagemaker.core.processing import ...` | `from sagemaker.processing import ...` |
| Transformer import | `from sagemaker.core.transformer import Transformer` | `from sagemaker.transformer import Transformer` |
| Pipeline imports | `sagemaker.mlops.workflow.*` + `sagemaker.core.workflow.*` | `sagemaker.workflow.*` (V2, removed) |
| Launcher class | `ModelTrainer` | `XGBEstimator`, `PyTorch(`, `SKLearn(` |
| Output path param | `output_data_config=OutputDataConfig(s3_output_path=...)` | `output_path=` |
| Logs param | `logs=True` | `logs='All'` |
| Entry script param | `entry_script="train.py"` | `entry_point="train.py"` |
| Python version | ≤ 3.13 | 3.14+ (SDK v3 incompatible) |
| DLC images source | `https://aws.github.io/deep-learning-containers/reference/available_images/` | HuggingFace docs (outdated) |
| HyperPod CLI | `pip install sagemaker-hyperpod` → `hyp` commands | Manual kubectl only |
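The table can also be applied mechanically before output. Below is a minimal sketch of such a lint pass, assuming the generated code is available as a string and that naive per-line regular expressions are good enough for a first screen. `V2_PATTERNS` and `find_guardrail_violations` are hypothetical names (not part of any SageMaker SDK), and only the import and parameter rows of the table are encoded.

```python
import re

# Hypothetical lint table: (regex for a WRONG pattern, CORRECT replacement).
# Only the import and parameter rows of the guardrail table are encoded here.
V2_PATTERNS: list[tuple[str, str]] = [
    (r"from\s+sagemaker\.model\s+import\s+Model\b",
     "from sagemaker.core.resources import Model"),
    (r"from\s+sagemaker\s+import\s+get_execution_role\b",
     "from sagemaker.core.helper.session_helper import Session, get_execution_role"),
    (r"\bJumpStartModel\b", 'ModelBuilder(model="<model-id>")'),
    # Matches sagemaker.processing etc. but not sagemaker.core.processing.
    (r"\bsagemaker\.(processing|transformer|workflow)\b",
     "sagemaker.core.* / sagemaker.mlops.workflow.* equivalents"),
    (r"\b(XGBEstimator|PyTorch|SKLearn)\(", "ModelTrainer"),
    # \b keeps s3_output_path= (the correct form) from matching.
    (r"\boutput_path\s*=", "output_data_config=OutputDataConfig(s3_output_path=...)"),
    (r"\blogs\s*=\s*['\"]All['\"]", "logs=True"),
    (r"\bentry_point\s*=", 'entry_script="train.py"'),
]


def find_guardrail_violations(code: str) -> list[tuple[int, str, str]]:
    """Return (line number, matched text, suggested fix) for every hit."""
    violations = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        for pattern, fix in V2_PATTERNS:
            match = re.search(pattern, line)
            if match:
                violations.append((lineno, match.group(0), fix))
    return violations
```

A hit means "rewrite using the CORRECT column", not a guaranteed bug: a regex cannot tell a V2 `PyTorch(` estimator from an unrelated callable with the same name, and calls split across lines can slip through.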

Container Decision Tree

- Standard text LLM (Llama, Mistral, Qwen) → DJL LMI with vLLM backend
- Multimodal/Vision model (Idefics3, LLaVA, Qwen-VL) → DJL LMI with vLLM backend
- Simple HF pipeline model (classification, NER) → HuggingFace Inference DLC
- Custom model with custom handler → HuggingFace Inference DLC + custom inference.py
- Classical ML (XGBoost, SKLearn, PyTorch) → Framework-specific DLC
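The decision tree above is a pure lookup, so it can be sketched as a small function. `ModelKind` and `choose_container` are hypothetical names; the categories and returned container families are copied from the tree, and classifying a concrete model into a `ModelKind` is left to the caller.

```python
from enum import Enum


class ModelKind(Enum):
    """Hypothetical categories mirroring the container decision tree."""
    TEXT_LLM = "text_llm"              # Llama, Mistral, Qwen
    MULTIMODAL = "multimodal"          # Idefics3, LLaVA, Qwen-VL
    HF_PIPELINE = "hf_pipeline"        # classification, NER
    CUSTOM_HANDLER = "custom_handler"  # custom model with custom handler
    CLASSICAL = "classical"            # XGBoost, SKLearn, PyTorch


def choose_container(kind: ModelKind) -> str:
    """Return the container family the decision tree prescribes for `kind`."""
    if kind in (ModelKind.TEXT_LLM, ModelKind.MULTIMODAL):
        return "DJL LMI with vLLM backend"
    if kind is ModelKind.HF_PIPELINE:
        return "HuggingFace Inference DLC"
    if kind is ModelKind.CUSTOM_HANDLER:
        return "HuggingFace Inference DLC + custom inference.py"
    if kind is ModelKind.CLASSICAL:
        return "Framework-specific DLC"
    raise ValueError(f"unknown model kind: {kind!r}")
```

Both LLM branches land on DJL LMI, which the guardrails route to the Core API (`Model.create` + `EndpointConfig.create` + `Endpoint.create`) rather than `ModelBuilder`.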

CUDA Compatibility

| CUDA | Works On | Fails On |
|---|---|---|
| cu124 | g5, g6, p5 | none |
| cu128 | g5, g6, p5 | none |
| cu129 | g6, p5 | g5 (driver mismatch → CannotStartContainerError) |
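A pre-deployment check against the table above might look like the sketch below. It assumes the CUDA version appears as a `cuNNN` token somewhere in the container image tag and that instance types follow the `ml.<family>.<size>` naming; the image URIs used in the tests are placeholders, not real DLC tags. All function names are hypothetical.

```python
import re
from typing import Optional

# Instance families each CUDA build starts on, per the CUDA Compatibility table.
# cu129 omits g5 because it fails there with CannotStartContainerError.
CUDA_SUPPORTED_FAMILIES: dict[str, frozenset[str]] = {
    "cu124": frozenset({"g5", "g6", "p5"}),
    "cu128": frozenset({"g5", "g6", "p5"}),
    "cu129": frozenset({"g6", "p5"}),
}


def instance_family(instance_type: str) -> str:
    """Extract the family from an instance type, e.g. 'ml.g5.2xlarge' -> 'g5'."""
    parts = instance_type.split(".")
    if len(parts) != 3 or parts[0] != "ml":
        raise ValueError(f"expected 'ml.<family>.<size>', got {instance_type!r}")
    return parts[1]


def cuda_tag(image_uri: str) -> Optional[str]:
    """Find a 'cuNNN' token in the image URI (assumed tag convention)."""
    match = re.search(r"\b(cu\d{3})\b", image_uri)
    return match.group(1) if match else None


def is_cuda_compatible(image_uri: str, instance_type: str) -> bool:
    """True if the image's CUDA build is listed as working on the instance family."""
    tag = cuda_tag(image_uri)
    if tag not in CUDA_SUPPORTED_FAMILIES:
        raise ValueError(f"no compatibility data for CUDA tag {tag!r}")
    return instance_family(instance_type) in CUDA_SUPPORTED_FAMILIES[tag]
```

Raising on an unknown tag, rather than returning `False`, keeps a new CUDA build from silently passing or failing before someone has tested it.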

2. Intent Detection + Routing

| User says | Route to |
|---|---|
| "deploy LLM" / "vLLM" / "DJL" / "LLM endpoint" | `references/inference-endpoints.md` |
| "deploy model" / "real-time endpoint" / "inference" | `references/inference-endpoints.md` |
| "JumpStart" / "pretrained model" | `references/inference-endpoints.md` (ModelBuilder section) |
| "training script" / "train.py" / "XGBoost" / "SKLearn" | `references/training-jobs.md` + `templates/` |
| "model customization" / "serverless fine-tuning" / "RLVR" / "RLAIF" / "reward function" | `references/model-customization.md` |
| "fine-tune" / "QLoRA" / "LoRA" (with custom scripts) | `references/training-jobs.md` (LLM section) |
| "fine-tune" / "SFT" / "DPO" (ambiguous) | First check whether the model is on the serverless supported list: if it is, `references/model-customization.md`; otherwise `references/training-jobs.md` |
| "training recipes" / "Spectrum" / "full fine-tuning" | `references/training-jobs.md` (AWS recipes) |
| "Trainium" / "Neuron" / "trn1" | `references/training-jobs.md` (Trainium section) |
| "HyperPod" / "cluster" / "Training Operator" | `references/hyperpod.md` |
| "deploy on HyperPod" / "HyperPod inference" / "hyp create" | `references/hyperpod-inference.md` |
| "model monitor" / "data quality" / "drift" / "bias" | `references/model-monitor.md` |
| "AutoGluon" / "AutoML" / "time series" / "tabular" | `references/automl-autogluon.md` |
| "pipeline" / "workflow" / "DAG" | `references/automl-autogluon.md` (Pipelines section) |
| "SDK v3" / "imports" / "image_uris" | `references/sdk-v3-reference.md` |

If ambiguous, ask ONE clarifying question.
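As an illustration of how the routing table and the one-question rule fit together, here is a deliberately naive keyword router. `ROUTES`, `FINE_TUNE_WORDS` and `route` are hypothetical; the keywords and targets come from the table, ordered so that more specific phrases win ("deploy on hyperpod" before plain "hyperpod", "lora" before the generic fine-tune check). A `None` result means "ask ONE clarifying question"; a real agent would rely on understanding the request rather than string matching.

```python
import re
from typing import Optional

# (keywords, reference file) pairs from the routing table; first match wins.
ROUTES: list[tuple[tuple[str, ...], str]] = [
    (("deploy on hyperpod", "hyperpod inference", "hyp create"),
     "references/hyperpod-inference.md"),
    (("hyperpod", "cluster", "training operator"), "references/hyperpod.md"),
    (("model customization", "serverless fine-tuning", "rlvr", "rlaif",
      "reward function"), "references/model-customization.md"),
    (("qlora", "lora", "training script", "train.py", "xgboost", "sklearn",
      "training recipes", "spectrum", "full fine-tuning", "trainium",
      "neuron", "trn1"), "references/training-jobs.md"),
    (("deploy llm", "vllm", "djl", "llm endpoint", "deploy model",
      "real-time endpoint", "inference", "jumpstart", "pretrained model"),
     "references/inference-endpoints.md"),
    (("model monitor", "data quality", "drift", "bias"),
     "references/model-monitor.md"),
    (("autogluon", "automl", "time series", "tabular", "pipeline",
      "workflow", "dag"), "references/automl-autogluon.md"),
    (("sdk v3", "imports", "image_uris"), "references/sdk-v3-reference.md"),
]

FINE_TUNE_WORDS = ("fine-tune", "fine-tuned", "fine-tuning", "sft", "dpo")


def _mentions(text: str, keyword: str) -> bool:
    """Whole-word match, so that "dpo" does not fire inside "endpoint"."""
    return re.search(rf"\b{re.escape(keyword)}\b", text) is not None


def route(request: str, on_serverless_list: Optional[bool] = None) -> Optional[str]:
    """Return a reference file, or None when a clarifying question is needed."""
    text = request.lower()
    for keywords, target in ROUTES:
        if any(_mentions(text, keyword) for keyword in keywords):
            return target
    if any(_mentions(text, word) for word in FINE_TUNE_WORDS):
        # Ambiguous SFT/DPO request: the serverless supported list decides.
        if on_serverless_list is None:
            return None
        return ("references/model-customization.md" if on_serverless_list
                else "references/training-jobs.md")
    return None
```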

3. Quick Reference

Reference Repositories

- Inference hosting examples: https://github.com/aws-samples/sagemaker-genai-hosting-examples
- Model customization recipes: https://github.com/aws-samples/amazon-sagemaker-generativeai/tree/main/0_model_customization_recipes

DLC Images

Always check: https://aws.github.io/deep-learning-containers/reference/available_images/

Key SDK v3 Imports

```python
# Session + role
from sagemaker.core.helper.session_helper import Session, get_execution_role

# Training
from sagemaker.train import ModelTrainer
from sagemaker.core.training.configs import Compute, SourceCode, OutputDataConfig

# LLM Deployment (Core API)
from sagemaker.core.resources import Model, EndpointConfig, Endpoint
from sagemaker.core.shapes.shapes import ContainerDefinition, ProductionVariant

# Simple Deployment (ModelBuilder)
from sagemaker.serve import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder

# Image URIs
from sagemaker.core import image_uris

# Model Monitor
from sagemaker.core.model_monitor import DefaultModelMonitor, ModelQualityMonitor, DataCaptureConfig

# Model Customization (Serverless)
from sagemaker.modules.train.sft_trainer import SFTTrainer
from sagemaker.modules.train.dpo_trainer import DPOTrainer
from sagemaker.modules.train.rlvr_trainer import RLVRTrainer
from sagemaker.modules.train.rlaif_trainer import RLAIFTrainer
from sagemaker.assets import DataSet
from sagemaker.train.evaluate import BenchMarkEvaluator, LLMAsJudgeEvaluator

# HyperPod
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint
from sagemaker.hyperpod.inference.hp_custom_endpoint import HPCustomEndpoint
```
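Before relying on the import block above, it can help to confirm that the environment actually has a v3 SDK and a supported Python. This is a minimal probe using only the standard library (`importlib.util.find_spec`, `importlib.metadata.version`), so it runs even where `sagemaker` is not installed. The module list is taken from the import block; the function names are hypothetical.

```python
import importlib.util
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# A few modules the v3 import block depends on.
V3_MODULES = (
    "sagemaker.core.helper.session_helper",
    "sagemaker.core.resources",
    "sagemaker.train",
    "sagemaker.serve",
)


def python_version_ok(version_info: Optional[tuple] = None) -> bool:
    """Guardrail: SDK v3 does not support Python 3.14+."""
    major_minor = tuple((version_info or sys.version_info)[:2])
    return major_minor <= (3, 13)


def sagemaker_major_version() -> Optional[int]:
    """Major version of the installed `sagemaker` distribution, or None."""
    try:
        return int(version("sagemaker").split(".")[0])
    except PackageNotFoundError:
        return None


def module_available(name: str) -> bool:
    """True if `name` resolves; find_spec imports parents, so guard errors."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def missing_v3_modules() -> list[str]:
    """Names from V3_MODULES that cannot be resolved in this environment."""
    return [name for name in V3_MODULES if not module_available(name)]
```

A non-empty `missing_v3_modules()` or a major version other than 3 suggests a v2 SDK is installed, in which case the v3 imports will fail regardless of how the code is written.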

4. Reference Files

Consult these for exact details — do not guess or recall from training data.

| File | Use When |
|---|---|
| `references/sdk-v3-reference.md` | Need exact import paths, class signatures, deployment patterns |
| `references/inference-endpoints.md` | Deploying models to SageMaker real-time endpoints (LLM or classical) |
| `references/model-customization.md` | Serverless model customization: SFT, DPO, RLVR, RLAIF trainers, reward functions, evaluation, deployment |
| `references/training-jobs.md` | Training models with custom scripts: classical ML, QLoRA/LoRA, Trainium, recipes |
| `references/hyperpod.md` | Setting up HyperPod clusters, Training Operator, Task Governance |
| `references/hyperpod-inference.md` | Deploying inference on HyperPod (JumpStart, custom, CLI/SDK/kubectl) |
| `references/model-monitor.md` | Model monitoring: Data Quality, Model Quality, Bias, Explainability |
| `references/automl-autogluon.md` | AutoGluon AutoML: tabular, time series, multimodal, Pipelines |

Templates for code generation:

| Template | Use When |
|---|---|
| `templates/train_xgboost.py` | XGBoost training script |
| `templates/train_sklearn.py` | SKLearn training script |
| `templates/train_pytorch.py` | PyTorch training script |
| `templates/train_qlora.py` | QLoRA LLM fine-tuning script |
| `templates/inference.py` | Multi-framework inference hooks |
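For orientation, the sketch below shows the shape of a hooks-based inference script. It assumes the conventional SageMaker inference toolkit hook names (`model_fn`, `input_fn`, `predict_fn`, `output_fn`) and a hypothetical `model.json` artifact holding a linear model's weights and bias; the actual `templates/inference.py` may be organized differently. Only the standard library is used so the hooks can be exercised locally.

```python
import json
import os


def model_fn(model_dir: str) -> dict:
    """Load the artifact the training job saved (served from model_dir)."""
    # Hypothetical artifact: {"weights": [...], "bias": float}
    with open(os.path.join(model_dir, "model.json")) as f:
        return json.load(f)


def input_fn(request_body, request_content_type: str) -> list:
    """Deserialize the request; only JSON is supported in this sketch."""
    if request_content_type != "application/json":
        raise ValueError(f"unsupported content type: {request_content_type}")
    return json.loads(request_body)["instances"]


def predict_fn(input_data: list, model: dict) -> list:
    """Apply the linear model, w . x + b, to each input row."""
    weights, bias = model["weights"], model["bias"]
    return [sum(w * x for w, x in zip(weights, row)) + bias for row in input_data]


def output_fn(prediction: list, accept: str) -> str:
    """Serialize predictions as JSON."""
    if accept not in ("application/json", "*/*"):
        raise ValueError(f"unsupported accept type: {accept}")
    return json.dumps({"predictions": prediction})
```

Splitting deserialization, prediction and serialization into separate hooks lets each one be swapped (for example, CSV input) without touching the model-loading code.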

5. Pre-Delivery Checklist

Run this mentally before EVERY code delivery:

- No V2 imports (`sagemaker.model`, `sagemaker.processing`, `sagemaker.transformer`, `sagemaker.workflow`)
- Model saved to `/opt/ml/model/` (not `./model/`)
- `OutputDataConfig(s3_output_path=...)` (not `output_path=`)
- `.train(wait=True, logs=True)`: `logs` is a boolean
- All numeric argparse args have `type=int` or `type=float`
- Channel names match between launcher and training script
- No hardcoded credentials, account IDs, or absolute local paths
- `entry_script=` in `SourceCode` (not `entry_point=`)
- LLM deployment uses the Core API (not `ModelBuilder` with DJL/vLLM)
- CUDA version matches the instance type: cu124 and cu128 work on g5, g6 and p5; cu129 works on g6/p5 but fails on g5
- `container_startup_health_check_timeout_in_seconds=900` for LLM endpoints
- `dependencies={"auto": False}` when using `ModelBuilder`
- Serverless model customization: model is on the supported list (see `references/model-customization.md`)
- Model customization trainers (`SFTTrainer`, `DPOTrainer`, `RLVRTrainer`, `RLAIFTrainer`) use `sagemaker.modules.train.*` imports (not `sagemaker.train.*`)
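Three checklist items (numeric argparse types, matching channel names, saving to `/opt/ml/model/`) meet in the training script's entry point. This skeleton is a sketch under stated assumptions: it assumes the container exports `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` (the SageMaker convention for the model directory and for a channel named `train`) and that hyperparameters reach the script as command-line strings. `parse_args`, `save_model` and `main` are hypothetical helpers, and the training logic itself is omitted.

```python
import argparse
import json
import os


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as strings; without type= they would stay strings.
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--learning-rate", type=float, default=0.1)
    # Channel "train": must match the channel name defined in the launcher.
    parser.add_argument("--train-dir",
                        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    # Artifacts must land in /opt/ml/model so SageMaker uploads them to S3.
    parser.add_argument("--model-dir",
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    return parser.parse_args(argv)


def save_model(model: dict, model_dir: str) -> str:
    """Write a JSON artifact into model_dir and return its path."""
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, "model.json")
    with open(path, "w") as f:
        json.dump(model, f)
    return path


def main(argv=None) -> str:
    args = parse_args(argv)
    # Real training would read data from args.train_dir here; this sketch
    # only records the hyperparameters so the artifact location can be checked.
    model = {"epochs": args.epochs, "learning_rate": args.learning_rate}
    return save_model(model, args.model_dir)
```

In the actual script, finish with `if __name__ == "__main__": main()` so the launcher's `entry_script` runs it directly.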