name: azure-ml-llm-trainer description: Train or fine-tune LLMs on Azure ML managed compute with TRL trainers. Uses direct trainer loops (SFT, DPO, RL) without relying on serverless APIs or Hugging Face infrastructure. license: See repository root
Azure ML LLM Trainer
This skill provides direct training on Azure ML managed compute using TRL trainers—an alternative to Azure AI Foundry's serverless fine-tuning APIs.
Four fine-tuning options in Azure AI Foundry:
- Serverless API (Foundry models) — Use
create_finetuning_job()for Phi, Mistral; no compute setup needed - OpenAI API (OpenAI models) — Use OpenAI SDK with Azure endpoint for GPT-4o, GPT-4 Turbo
- Managed Compute (Portal UI) — Web UI–driven fine-tuning with automatic compute provisioning; limited SDK
- Direct Training (This Skill) — Run TRL trainers on your own Azure ML compute for full control and transparency
Use this skill when:
- You need full control over training loops and hyperparameters
- You want to use TRL (Transformer Reinforcement Learning) methods directly
- You prefer running on your own compute resources (no vendor lock-in)
- You want to experiment with advanced training techniques (LoRA, gradient checkpointing, etc.)
Template Files
These are templates in examples/ directory. Generate new files in your project based on these templates:
examples/submit_sft_job.py— Template for submitting SFT training jobsexamples/src/train_sft.py— Template for SFT trainer entry point (TRL SFTTrainer)examples/submit_dpo_job.py— Template for DPO training job submissionexamples/src/train_dpo.py— Template for DPO trainer entry point (TRL DPOTrainer)examples/submit_rl_job.py— Template for RL/PPO training job submissionexamples/src/train_rl.py— Template for RL trainer entry point (TRL PPOTrainer)examples/environment/conda.yml— Template for runtime dependencies (transformers, trl, datasets, torch)
Do NOT reference these files directly. Copy and adapt them for your project structure.
Quick start
- az login then set AZURE_SUBSCRIPTION_ID, AZURE_RESOURCE_GROUP, AZUREML_WORKSPACE_NAME.
- Create training files in your project:
- Copy
examples/submit_sft_job.pyto your project assubmit_training.py - Copy
examples/src/train_sft.pyto your project assrc/train_sft.py - Copy
examples/environment/conda.ymlto your project asenvironment/conda.yml
- Copy
- Upload a JSONL dataset to a workspace datastore (workspaceblobstore or your own). SFT dataset format: Each line must be valid JSON with a
"messages"field containing chat-completion format:{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}. The trainer uses this field directly. - Ensure a compute target exists (GPU recommended, for example gpu-cluster).
- Submit:
python submit_training.py --compute <compute-name> --data-path <azureml://.../dataset.jsonl> --model-name azureml://registries/azureml/models/Phi-3-mini-4k-instruct/versions/1. - Monitor in Azure ML studio; trained weights land in the job output folder.
DPO quick start
- Dataset format (JSONL): Each line must contain
"chosen"and"rejected"fields with chat-completion format messages:{"chosen": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}], "rejected": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}. - Hyperparameters:
beta(default 0.1) controls KL penalty,l2_multiplier(default 0.1) for regularization. - Submit:
python sample/submit_dpo_job.py --compute <compute-name> --data-path <azureml://.../dpo.jsonl> --model-name azureml://registries/azureml/models/Phi-3-mini-4k-instruct/versions/1 --beta 0.1 --l2_multiplier 0.1.
RL (PPO-style) quick start
- Dataset format (JSONL): Each line must contain a
"prompt"field (string) and optional"reward"(float) for explicit reward signals. If reward is missing, length-based reward shaping is used as fallback:{"prompt": "user message", "reward": 0.5}. - Submit:
python sample/submit_rl_job.py --compute <compute-name> --data-path <azureml://.../rl.jsonl> --model-name azureml://registries/azureml/models/Phi-3-mini-4k-instruct/versions/1.
Notes
- Why direct training? Serverless APIs abstract away training details; direct training gives you full control over trainer config, callbacks, checkpointing, and custom loss functions.
- Model source: Use fine-tuning-enabled base models from Azure AI Foundry model catalog (e.g.,
azureml://registries/azureml/models/Phi-3-mini-4k-instruct/versions/1). Avoid Hugging Face downloads. - Hyperparameters:
- SFT:
batch_size,learning_rate(default 2e-5),n_epochs(default 1),seed - DPO: Add
beta(KL penalty, default 0.1),l2_multiplier(regularization, default 0.1) - RL/PPO:
ppo_epochs,learning_rate, reward shaping via custom logic
- SFT:
- Data: Must be in Azure ML datastores as JSONL; referenced via
azureml://URIs. Keep datasets in Azure; do not rely on external sources. - Artifacts: Trained models saved to job output folder; register as Azure AI Foundry model for deployment or further fine-tuning.
When to Use This vs Other Fine-Tuning Methods
| Criterion | Direct Training (This Skill) | Serverless API | Managed Compute | OpenAI API |
|---|---|---|---|---|
| Control | Full (trainer config, callbacks) | Limited | UI-based | Limited |
| Cost Model | Per compute hour | Per training tokens | Per training tokens | Per training tokens |
| Setup | Requires compute cluster | Automatic | Automatic | N/A (Azure OpenAI) |
| Supported Methods | SFT, DPO, RL/PPO (TRL) | SFT (mostly) | SFT | SFT, DPO, RL with graders |
| SDK/Programmatic | Yes (full MLClient) | Yes (Python) | Minimal (mostly UI) | Yes (OpenAI SDK) |
| Best for | Experimentation, research, custom loss | Production quick-start | Production (non-devs) | Production OpenAI models |