sagemaker-ai

Comprehensive SOP for Amazon SageMaker AI — SDK v3 code generation, serverless model customization (SFT, DPO, RLVR, RLAIF), LLM inference endpoints, model training (classical ML + LLM fine-tuning), HyperPod clusters (training + inference), Model Monitor, and AutoML with AutoGluon. Generates correct, working code every time. Triggers: "sagemaker", "training job", "deploy model", "endpoint", "llm inference", "vllm", "model customization", "djl", "fine-tune", "qlora", "lora", "rlvr", "rlaif", "dpo", "sft", "reward function", "hyperpod", "model monitor", "data quality", "bias", "autogluon", "jumpstart", "deploy on hyperpod", "training script", "inference script".

By dgallitelli. Updated 4/2/2026

name: sagemaker-ai
argument-hint: "[topic: train|deploy|hyperpod|monitor|autogluon] [framework: xgboost|sklearn|pytorch|llm]"
allowed-tools: Read, Write, Bash, Glob, Grep

SageMaker AI — Standard Operating Procedure

You are now operating as a SageMaker AI expert. Your job is to generate correct, working SageMaker Python SDK v3 code — never v2 estimators, never wrong imports, never wrong parameter names.

Read the relevant reference file for the user's task before generating any code.


1. Guardrails (Always-On)

These rules apply to EVERY piece of SageMaker code. Check each one before output.

| Rule | CORRECT | WRONG |
| --- | --- | --- |
| Model class import | from sagemaker.core.resources import Model | from sagemaker.model import Model |
| Session import | from sagemaker.core.helper.session_helper import Session, get_execution_role | from sagemaker import get_execution_role |
| Deployment (LLMs) | Core API: Model.create + EndpointConfig.create + Endpoint.create | ModelBuilder with DJL/vLLM containers |
| Deployment (simple) | ModelBuilder + SchemaBuilder | V2 Model.deploy() |
| JumpStart deploy | ModelBuilder(model="<model-id>") | JumpStartModel (removed in V3) |
| Processing imports | from sagemaker.core.processing import ... | from sagemaker.processing import ... |
| Transformer import | from sagemaker.core.transformer import Transformer | from sagemaker.transformer import Transformer |
| Pipeline imports | sagemaker.mlops.workflow.* + sagemaker.core.workflow.* | sagemaker.workflow.* (V2, removed) |
| Launcher class | ModelTrainer | XGBEstimator, PyTorch(, SKLearn( |
| Output path param | output_data_config=OutputDataConfig(s3_output_path=...) | output_path= |
| Logs param | logs=True | logs='All' |
| Entry script param | entry_script="train.py" | entry_point="train.py" |
| Python version | ≤ 3.13 | 3.14+ (SDK v3 incompatible) |
| DLC images source | https://aws.github.io/deep-learning-containers/reference/available_images/ | HuggingFace docs (outdated) |
| HyperPod CLI | pip install sagemaker-hyperpod, then hyp commands | Manual kubectl only |
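
The WRONG column above lends itself to a mechanical check before code is delivered. The following is a minimal sketch, assuming a simple line-by-line regex scan is good enough; the names `V2_PATTERNS` and `find_v2_patterns` are hypothetical helpers and not part of the SageMaker SDK, and the pattern list covers only the rules that can be recognised from a single line of text.

```python
import re

# (regex, suggested replacement) pairs derived from the WRONG / CORRECT
# columns of the guardrails table. Hypothetical helper, not an SDK API.
V2_PATTERNS = [
    (r"^\s*from\s+sagemaker\.model\s+import\b",
     "from sagemaker.core.resources import Model"),
    (r"^\s*from\s+sagemaker\s+import\s+.*\bget_execution_role\b",
     "from sagemaker.core.helper.session_helper import Session, get_execution_role"),
    (r"^\s*from\s+sagemaker\.processing\s+import\b",
     "from sagemaker.core.processing import ..."),
    (r"^\s*from\s+sagemaker\.transformer\s+import\b",
     "from sagemaker.core.transformer import Transformer"),
    (r"^\s*(from|import)\s+sagemaker\.workflow\b",
     "sagemaker.mlops.workflow.* + sagemaker.core.workflow.*"),
    (r"\bJumpStartModel\b", 'ModelBuilder(model="<model-id>")'),
    (r"\bentry_point\s*=", "entry_script="),
    # \b keeps this from matching the correct s3_output_path= parameter.
    (r"\boutput_path\s*=", "output_data_config=OutputDataConfig(s3_output_path=...)"),
    (r"\blogs\s*=\s*['\"]", "logs=True"),
]


def find_v2_patterns(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, offending line, suggested fix) for each hit."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, fix in V2_PATTERNS:
            if re.search(pattern, line):
                findings.append((lineno, line.strip(), fix))
    return findings
```

A scan like this only catches textual patterns; it does not replace reading the relevant reference file, and it will miss aliased imports or parameters built dynamically.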

Container Decision Tree

  1. Standard text LLM (Llama, Mistral, Qwen) → DJL LMI with vLLM backend
  2. Multimodal/Vision model (Idefics3, LLaVA, Qwen-VL) → DJL LMI with vLLM backend
  3. Simple HF pipeline model (classification, NER) → HuggingFace Inference DLC
  4. Custom model with custom handler → HuggingFace Inference DLC + custom inference.py
  5. Classical ML (XGBoost, SKLearn, PyTorch) → Framework-specific DLC
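
The decision tree above is a direct lookup from model category to container family, which can be sketched as follows. This is an illustration only: `ModelKind`, `CONTAINER_FOR_KIND`, and `choose_container` are hypothetical names, and the returned strings are the labels from the list, not image URIs.

```python
from enum import Enum


class ModelKind(Enum):
    """Model categories from the decision tree (hypothetical enum)."""
    TEXT_LLM = "text_llm"              # Llama, Mistral, Qwen
    MULTIMODAL = "multimodal"          # Idefics3, LLaVA, Qwen-VL
    HF_PIPELINE = "hf_pipeline"        # classification, NER
    CUSTOM_HANDLER = "custom_handler"  # custom model with its own handler
    CLASSICAL_ML = "classical_ml"      # XGBoost, SKLearn, PyTorch


# One entry per branch of the decision tree.
CONTAINER_FOR_KIND = {
    ModelKind.TEXT_LLM: "DJL LMI with vLLM backend",
    ModelKind.MULTIMODAL: "DJL LMI with vLLM backend",
    ModelKind.HF_PIPELINE: "HuggingFace Inference DLC",
    ModelKind.CUSTOM_HANDLER: "HuggingFace Inference DLC + custom inference.py",
    ModelKind.CLASSICAL_ML: "Framework-specific DLC",
}


def choose_container(kind: ModelKind) -> str:
    """Return the container family recommended for a model category."""
    return CONTAINER_FOR_KIND[kind]
```

The actual image URI still has to be looked up on the DLC available-images page for the chosen family.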

CUDA Compatibility

| CUDA | Works On | Fails On |
| --- | --- | --- |
| cu124 | g5, g6, p5 | |
| cu128 | g5, g6, p5 | |
| cu129 | g6, p5 | g5 (driver mismatch → CannotStartContainerError) |
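
The compatibility table can be encoded as data and checked before an image is picked. The sketch below assumes instance types follow the usual `ml.<family>.<size>` form; `CUDA_SUPPORT`, `instance_family`, and `cuda_is_listed_compatible` are hypothetical names. Instance families that the table does not mention are treated as unverified rather than as known failures.

```python
# CUDA tag -> instance families listed under "Works On" in the table.
CUDA_SUPPORT = {
    "cu124": {"g5", "g6", "p5"},
    "cu128": {"g5", "g6", "p5"},
    "cu129": {"g6", "p5"},  # g5 fails: driver mismatch -> CannotStartContainerError
}


def instance_family(instance_type: str) -> str:
    """Extract the family from an instance type, e.g. 'ml.g5.12xlarge' -> 'g5'."""
    parts = instance_type.split(".")
    if len(parts) != 3 or parts[0] != "ml":
        raise ValueError(f"unexpected instance type format: {instance_type!r}")
    return parts[1]


def cuda_is_listed_compatible(cuda_tag: str, instance_type: str) -> bool:
    """True only if the table lists this family as working for the CUDA tag.

    False means either a known failure (cu129 on g5) or a family the table
    does not cover; in the second case, verify manually.
    """
    if cuda_tag not in CUDA_SUPPORT:
        raise ValueError(f"CUDA tag not covered by the table: {cuda_tag!r}")
    return instance_family(instance_type) in CUDA_SUPPORT[cuda_tag]
```

For example, `cuda_is_listed_compatible("cu129", "ml.g5.2xlarge")` returns `False`, which matches the driver-mismatch row.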

2. Intent Detection + Routing

| User says | Route to |
| --- | --- |
| "deploy LLM" / "vLLM" / "DJL" / "LLM endpoint" | references/inference-endpoints.md |
| "deploy model" / "real-time endpoint" / "inference" | references/inference-endpoints.md |
| "JumpStart" / "pretrained model" | references/inference-endpoints.md (ModelBuilder section) |
| "training script" / "train.py" / "XGBoost" / "SKLearn" | references/training-jobs.md + templates/ |
| "model customization" / "serverless fine-tuning" / "RLVR" / "RLAIF" / "reward function" | references/model-customization.md |
| "fine-tune" / "QLoRA" / "LoRA" (with custom scripts) | references/training-jobs.md (LLM section) |
| "fine-tune" / "SFT" / "DPO" (ambiguous) | Check first whether the model is on the serverless supported list. If it is, references/model-customization.md; otherwise references/training-jobs.md |
| "training recipes" / "Spectrum" / "full fine-tuning" | references/training-jobs.md (AWS recipes) |
| "Trainium" / "Neuron" / "trn1" | references/training-jobs.md (Trainium section) |
| "HyperPod" / "cluster" / "Training Operator" | references/hyperpod.md |
| "deploy on HyperPod" / "HyperPod inference" / "hyp create" | references/hyperpod-inference.md |
| "model monitor" / "data quality" / "drift" / "bias" | references/model-monitor.md |
| "AutoGluon" / "AutoML" / "time series" / "tabular" | references/automl-autogluon.md |
| "pipeline" / "workflow" / "DAG" | references/automl-autogluon.md (Pipelines section) |
| "SDK v3" / "imports" / "image_uris" | references/sdk-v3-reference.md |

If ambiguous, ask ONE clarifying question.
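
The routing table can be read as an ordered keyword lookup in which the most specific phrases are tried first. The sketch below is one possible reading, not the skill's actual dispatch logic: `ROUTES` and `route` are hypothetical, matching is naive case-insensitive substring search, and the HyperPod inference row is placed ahead of the generic "inference" and "HyperPod" rows so that it wins.

```python
from typing import Optional

# (keywords, route) pairs; order matters, first match wins.
ROUTES = [
    (("deploy on hyperpod", "hyperpod inference", "hyp create"),
     "references/hyperpod-inference.md"),
    (("hyperpod", "cluster", "training operator"),
     "references/hyperpod.md"),
    (("jumpstart", "pretrained model"),
     "references/inference-endpoints.md (ModelBuilder section)"),
    (("deploy llm", "vllm", "djl", "llm endpoint",
      "deploy model", "real-time endpoint", "inference"),
     "references/inference-endpoints.md"),
    (("training script", "train.py", "xgboost", "sklearn"),
     "references/training-jobs.md + templates/"),
    (("model customization", "serverless fine-tuning", "rlvr", "rlaif",
      "reward function"),
     "references/model-customization.md"),
    (("qlora", "lora"),
     "references/training-jobs.md (LLM section)"),
    (("training recipes", "spectrum", "full fine-tuning"),
     "references/training-jobs.md (AWS recipes)"),
    (("trainium", "neuron", "trn1"),
     "references/training-jobs.md (Trainium section)"),
    # Ambiguous row: the caller must check the serverless supported list.
    (("fine-tune", "sft", "dpo"),
     "references/model-customization.md if the model is on the serverless "
     "supported list, otherwise references/training-jobs.md"),
    (("model monitor", "data quality", "drift", "bias"),
     "references/model-monitor.md"),
    (("pipeline", "workflow", "dag"),
     "references/automl-autogluon.md (Pipelines section)"),
    (("autogluon", "automl", "time series", "tabular"),
     "references/automl-autogluon.md"),
    (("sdk v3", "imports", "image_uris"),
     "references/sdk-v3-reference.md"),
]


def route(request: str) -> Optional[str]:
    """Return the first matching route, or None if nothing matches."""
    text = request.lower()
    for keywords, target in ROUTES:
        if any(keyword in text for keyword in keywords):
            return target
    return None  # no match: ask ONE clarifying question
```

Substring matching is deliberately crude; short keywords such as "dag" or "sft" can match inside unrelated words, so a real router would likely need word-boundary checks.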


3. Quick Reference

Reference Repositories

  • Inference hosting examples: https://github.com/aws-samples/sagemaker-genai-hosting-examples
  • Model customization recipes: https://github.com/aws-samples/amazon-sagemaker-generativeai/tree/main/0_model_customization_recipes

DLC Images

Always check: https://aws.github.io/deep-learning-containers/reference/available_images/

Key SDK v3 Imports

# Session + role
from sagemaker.core.helper.session_helper import Session, get_execution_role

# Training
from sagemaker.train import ModelTrainer
from sagemaker.core.training.configs import Compute, SourceCode, OutputDataConfig

# LLM Deployment (Core API)
from sagemaker.core.resources import Model, EndpointConfig, Endpoint
from sagemaker.core.shapes.shapes import ContainerDefinition, ProductionVariant

# Simple Deployment (ModelBuilder)
from sagemaker.serve import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder

# Image URIs
from sagemaker.core import image_uris

# Model Monitor
from sagemaker.core.model_monitor import DefaultModelMonitor, ModelQualityMonitor, DataCaptureConfig

# Model Customization (Serverless)
from sagemaker.modules.train.sft_trainer import SFTTrainer
from sagemaker.modules.train.dpo_trainer import DPOTrainer
from sagemaker.modules.train.rlvr_trainer import RLVRTrainer
from sagemaker.modules.train.rlaif_trainer import RLAIFTrainer
from sagemaker.assets import DataSet
from sagemaker.train.evaluate import BenchMarkEvaluator, LLMAsJudgeEvaluator

# HyperPod
from sagemaker.hyperpod.inference.hp_jumpstart_endpoint import HPJumpStartEndpoint
from sagemaker.hyperpod.inference.hp_custom_endpoint import HPCustomEndpoint

4. Reference Files

Consult these for exact details — do not guess or recall from training data.

| File | Use When |
| --- | --- |
| references/sdk-v3-reference.md | Need exact import paths, class signatures, deployment patterns |
| references/inference-endpoints.md | Deploying models to SageMaker real-time endpoints (LLM or classical) |
| references/model-customization.md | Serverless model customization: SFT, DPO, RLVR, RLAIF trainers, reward functions, evaluation, deployment |
| references/training-jobs.md | Training models with custom scripts: classical ML, QLoRA/LoRA, Trainium, recipes |
| references/hyperpod.md | Setting up HyperPod clusters, Training Operator, Task Governance |
| references/hyperpod-inference.md | Deploying inference on HyperPod (JumpStart, custom, CLI/SDK/kubectl) |
| references/model-monitor.md | Model monitoring: Data Quality, Model Quality, Bias, Explainability |
| references/automl-autogluon.md | AutoGluon AutoML: tabular, time series, multimodal, Pipelines |

Templates for code generation:

| Template | Use When |
| --- | --- |
| templates/train_xgboost.py | XGBoost training script |
| templates/train_sklearn.py | SKLearn training script |
| templates/train_pytorch.py | PyTorch training script |
| templates/train_qlora.py | QLoRA LLM fine-tuning script |
| templates/inference.py | Multi-framework inference hooks |

5. Pre-Delivery Checklist

Run this mentally before EVERY code delivery:

  • No V2 imports (sagemaker.model, sagemaker.processing, sagemaker.transformer, sagemaker.workflow)
  • Model saved to /opt/ml/model/ (not ./model/)
  • OutputDataConfig(s3_output_path=...) (not output_path=)
  • .train(wait=True, logs=True): logs is a boolean, not the string 'All'
  • All numeric argparse args have type=int or type=float
  • Channel names match between launcher and training script
  • No hardcoded credentials, account IDs, or absolute local paths
  • entry_script= in SourceCode (not entry_point=)
  • LLM deployment uses Core API (not ModelBuilder with DJL/vLLM)
  • CUDA version matches instance type (cu128 for g5, cu129 for g6/p5)
  • container_startup_health_check_timeout_in_seconds=900 for LLM endpoints
  • dependencies={"auto": False} when using ModelBuilder
  • Serverless model customization: model is on the supported list (see references/model-customization.md)
  • Model customization trainers use sagemaker.modules.train.* imports (not sagemaker.train.*)
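
Several checklist items concern the training script itself: typed numeric arguments, the model directory, and channel names. The following is a minimal sketch of the argument-parsing part of such a script, not a copy of the files in templates/. It assumes the standard SageMaker training environment variables `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN`; the hyperparameter names and the `model_artifact_path` helper are illustrative only, and the channel is assumed to be called `train` on the launcher side as well.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided paths.

    Hyperparameters arrive as command-line strings, so every numeric
    argument needs an explicit type= or it stays a str.
    """
    parser = argparse.ArgumentParser()
    # Illustrative hyperparameters; names are assumptions for this sketch.
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--learning-rate", type=float, default=0.1)
    # Model artifacts must land in /opt/ml/model/ (not ./model/).
    parser.add_argument(
        "--model-dir", type=str,
        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    )
    # The channel name ("train") must match the one used by the launcher.
    parser.add_argument(
        "--train", type=str,
        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
    )
    return parser.parse_args(argv)


def model_artifact_path(args, filename: str) -> str:
    """Build the path under the model directory where an artifact is saved."""
    return os.path.join(args.model_dir, filename)
```

Calling `parse_args(["--epochs", "3"])` yields `args.epochs == 3` as an `int`; without `type=int` the same call would produce the string `"3"`, which is the failure mode the checklist item guards against.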

Install via CLI

npx skills add https://github.com/dgallitelli/claude-code-skill-for-sagemaker-ai --skill sagemaker-ai