aoai-model-migration - SKILL.md Agent Skill

name: aoai-model-migration
description: >
  Migrate Azure OpenAI applications from GPT-4o/GPT-4o-mini to newer models (GPT-4.1, GPT-5, GPT-5.1 through GPT-5.4, o-series). Covers API changes, client configuration, parameter adaptation, prompt adjustments, and authentication.
  USE FOR: migrate model, switch model, upgrade model, GPT-4o replacement, AzureOpenAI to OpenAI client, v1 API, max_completion_tokens, reasoning_effort, developer role, system role, parameter adaptation, client factory, model classification.
  DO NOT USE FOR: retirement dates or lifecycle planning (use aoai-model-lifecycle), evaluation or A/B testing (use aoai-migration-evaluation).

Azure OpenAI Model Migration Skill

⚠️ Retirement dates and model availability change frequently. Always verify against the official Azure OpenAI Model Retirements page.

Purpose

Guide developers through migrating Azure OpenAI applications from GPT-4o / GPT-4o-mini to newer model families (GPT-4.1, GPT-5, and GPT-5.1 through GPT-5.4) and between o-series reasoning models (o1 → o3, o3-mini → o4-mini). This skill covers API surface changes, client configuration, parameter adaptation, and prompt adjustments.

When to Use

Migrating from GPT-4o or GPT-4o-mini to any newer Azure OpenAI model
Migrating o-series models (o1 → o3, o3-mini → o4-mini)
Adapting code to the new v1 API (/openai/v1/) used by GPT-4.1+ and GPT-5+
Adapting parameters and system prompts for reasoning models (GPT-5, GPT-5.1 through GPT-5.4, o-series)
Choosing the right replacement model for a given workload

Migration Paths

GPT Series

Source Model	Target Model	Type	Best For
GPT-4o / GPT-4.1	GPT-5.4-mini	Reasoning	Recommended — comparable quality at lower cost/latency (tier-down strategy)
GPT-4o-mini / GPT-4.1-mini	GPT-5.4-nano	Reasoning	Recommended — comparable quality at a fraction of the cost
GPT-4o	GPT-5.1	Reasoning	Official auto-migration target (Standard deployments, completed March 2026)
GPT-4o	GPT-5.4	Reasoning	Best overall quality (Mar 2026), longest runway
GPT-4o-mini	GPT-4.1-mini	Standard	Official auto-migration target (Standard deployments)

💡 Tier-down strategy: Newer-generation smaller models match or exceed older-generation larger ones with better latency and lower cost. Target GPT-5.4-mini instead of GPT-4.1/GPT-5, and GPT-5.4-nano instead of GPT-4.1-mini; both have a longer runway (Sep 2027) and a better quality-to-cost tradeoff.

📝 Note: GPT-4o Standard deployments were auto-upgraded to GPT-5.1, and GPT-4o was retired on 2026-03-31. The GPT-4.1 family was deprecated on 2026-04-14 (no new customers).

o-Series (Reasoning Models)

Source Model	Target Model	Type	Best For
o1	o3	Reasoning	Successor reasoning model
o3-mini	o4-mini	Reasoning	Faster, cheaper reasoning
o1-pro	o3-pro	Reasoning	Pro-tier reasoning

How to Choose

Priority	GPT-4o replacement	GPT-4o-mini replacement
Best quality/latency tradeoff	GPT-5.4-mini	GPT-5.4-nano
Best overall quality	GPT-5.4	GPT-5.4-mini
Best reasoning / agentic	GPT-5.4	GPT-5.4-mini
Lowest cost	GPT-5.4-nano	GPT-5.4-nano
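The table above can be encoded as a small lookup so the choice lives in one place rather than in scattered conditionals. This is a minimal sketch: `REPLACEMENTS`, `choose_replacement`, and the short priority keys are hypothetical names introduced here for illustration, and the values are transcribed from the table.

```python
# Hypothetical lookup transcribed from the "How to Choose" table.
# Keys are shorthand for the table's priority rows (an assumption of this sketch).
REPLACEMENTS = {
    "tradeoff":  {"gpt-4o": "gpt-5.4-mini", "gpt-4o-mini": "gpt-5.4-nano"},
    "quality":   {"gpt-4o": "gpt-5.4",      "gpt-4o-mini": "gpt-5.4-mini"},
    "reasoning": {"gpt-4o": "gpt-5.4",      "gpt-4o-mini": "gpt-5.4-mini"},
    "cost":      {"gpt-4o": "gpt-5.4-nano", "gpt-4o-mini": "gpt-5.4-nano"},
}


def choose_replacement(source_model: str, priority: str) -> str:
    """Return the suggested target model for a source model and priority."""
    try:
        return REPLACEMENTS[priority.lower()][source_model.lower()]
    except KeyError:
        raise ValueError(
            f"No replacement listed for {source_model!r} with priority {priority!r}"
        ) from None


print(choose_replacement("gpt-4o", "tradeoff"))  # gpt-5.4-mini
```

Treat the result as a starting point only; availability and retirement dates should still be checked against the official pages.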

Key API Changes

1. Client Configuration

GPT-4.1+ and GPT-5+ use the v1 API, which requires the OpenAI client instead of AzureOpenAI.

Before (GPT-4o — versioned API):

from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_ad_token_provider=token_provider,
    api_version="2024-12-01-preview",
    azure_endpoint=AZURE_OPENAI_ENDPOINT
)

After (GPT-4.1 / GPT-5 — v1 API):

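from openai import OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)

client = OpenAI(
    # token_provider() returns a token string that expires, so a long-lived client
    # needs to be recreated periodically. Recent openai SDK versions may accept the
    # provider itself (api_key=token_provider); check your installed version.
    api_key=token_provider(),
    # rstrip("/") avoids a double slash when the endpoint already ends with "/"
    base_url=f"{AZURE_OPENAI_ENDPOINT.rstrip('/')}/openai/v1/"
)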
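A common source of 404 errors after this switch is a malformed base URL (a doubled slash, or `/openai/v1` appended twice). The helper below is a minimal sketch of normalizing the endpoint before passing it to the client; `v1_base_url` is a hypothetical name and the example hostname is a placeholder.

```python
def v1_base_url(endpoint: str) -> str:
    """Build the v1 base URL from an Azure OpenAI resource endpoint.

    Tolerates a trailing slash and an endpoint that already includes /openai/v1.
    """
    base = endpoint.rstrip("/")
    if base.endswith("/openai/v1"):
        return base + "/"
    return base + "/openai/v1/"


# Placeholder resource name, for illustration only.
print(v1_base_url("https://example.openai.azure.com/"))
# https://example.openai.azure.com/openai/v1/
```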

2. Model Family Classification

Use these sets to determine which API and parameters a model requires:

# Models using the new v1 API (OpenAI client with /openai/v1/ endpoint)
V1_MODELS = {
    "gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano",
    "gpt-5", "gpt-5.1", "gpt-5.2", "gpt-5-mini", "gpt-5-nano",
    "gpt-5-pro", "gpt-5-codex", "gpt-5.1-codex", "gpt-5.1-codex-mini",
    "gpt-5.2-codex", "gpt-5.3-codex",
    "gpt-5.4", "gpt-5.4-pro", "gpt-5.4-mini", "gpt-5.4-nano",
    "codex-mini",
}

# Reasoning models (no temperature/top_p, use max_completion_tokens, developer role)
REASONING_MODELS = {
    "gpt-5", "gpt-5.1", "gpt-5.2", "gpt-5-mini", "gpt-5-nano",
    "gpt-5-pro", "gpt-5.3-codex", "gpt-5.2-codex",
    "gpt-5.4", "gpt-5.4-pro", "gpt-5.4-mini", "gpt-5.4-nano",
}

# o-series reasoning models (also no temperature/top_p, use max_completion_tokens)
# Note: o-series use the classic AzureOpenAI client, NOT the v1 API
O_SERIES_MODELS = {
    "o1", "o1-pro", "o3-mini", "o3", "o3-pro", "o3-deep-research", "o4-mini",
}
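To show how the three sets combine, the sketch below derives the client type, token parameter, sampling support, and system role for a model name. It is illustrative only and not the implementation in src/config.py: `classify` is a hypothetical helper, the sets are abbreviated copies of the ones above, and the name normalization (strip and lowercase) is an assumption.

```python
# Abbreviated copies of the sets above, so the sketch runs on its own.
V1_MODELS = {"gpt-4.1", "gpt-4.1-mini", "gpt-5", "gpt-5.1", "gpt-5.4", "gpt-5.4-mini"}
REASONING_MODELS = {"gpt-5", "gpt-5.1", "gpt-5.4", "gpt-5.4-mini"}
O_SERIES_MODELS = {"o1", "o3", "o3-mini", "o4-mini"}


def classify(model_name: str) -> dict:
    """Summarize what a model needs, based on set membership."""
    name = model_name.strip().lower()
    is_v1 = name in V1_MODELS
    is_o_series = name in O_SERIES_MODELS
    is_reasoning = name in REASONING_MODELS or is_o_series
    return {
        "client": "OpenAI" if is_v1 else "AzureOpenAI",
        "token_param": "max_completion_tokens" if (is_v1 or is_o_series) else "max_tokens",
        "supports_sampling": not is_reasoning,  # temperature / top_p
        "system_role": "developer" if is_reasoning else "system",
    }


for model in ("gpt-4o", "gpt-4.1", "gpt-5.1", "o3"):
    print(model, classify(model))
```

Models that appear in none of the sets (for example gpt-4o) fall through to the versioned-API defaults, which matches the "Before" configuration in section 1.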

3. Parameter Adaptation

Parameter	GPT-4o	GPT-4.1	GPT-5 / GPT-5.x	o-series (o1, o3, o4-mini)
`max_tokens`	Supported	Use `max_completion_tokens`	Use `max_completion_tokens`	Use `max_completion_tokens`
`temperature`	Supported	Supported	Not supported (remove it)	Not supported (remove it)
`top_p`	Supported	Supported	Not supported (remove it)	Not supported (remove it)
`reasoning_effort`	N/A	N/A	See below	Supported
System role	`"system"`	`"system"`	`"developer"`	`"developer"`

Parameter adaptation pattern:

def adapt_params(model_name: str, params: dict) -> dict:
    """Adapt parameters for the target model without mutating the input."""
    adapted = params.copy()

    # max_tokens → max_completion_tokens for v1 models and o-series
    # (o-series are not in V1_MODELS but also require max_completion_tokens)
    needs_completion_tokens = model_name in V1_MODELS or model_name in O_SERIES_MODELS
    if needs_completion_tokens and "max_tokens" in adapted:
        adapted["max_completion_tokens"] = adapted.pop("max_tokens")

    # Reasoning models don't support temperature/top_p
    if model_name in REASONING_MODELS or model_name in O_SERIES_MODELS:
        adapted.pop("temperature", None)
        adapted.pop("top_p", None)

    return adapted

4. Reasoning Effort

Model	Type	`reasoning_effort` levels	Default
GPT-4.1 / 4.1-mini / 4.1-nano	Standard	N/A (no reasoning)	—
GPT-5 / 5-mini / 5-nano	Reasoning	`minimal`, `low`, `medium`, `high`	`medium`
GPT-5.1	Reasoning	`none`, `low`, `medium`, `high`	`none`
GPT-5.2 / 5.3-codex / 5.4 / 5.4-pro	Reasoning	`none`, `low`, `medium`, `high`	`none`
GPT-5.4-mini / 5.4-nano	Reasoning	`none`, `low`, `medium`, `high`	`none`
o-series (o1, o3, o4-mini)	Reasoning	`low`, `medium`, `high`	`medium`

Important: reasoning_effort="none" is only supported from GPT-5.1 onwards (GPT-5.1, GPT-5.2, and the later models listed in the table above). For GPT-5, GPT-5-mini, and GPT-5-nano the minimum is "minimal", which still incurs reasoning tokens and added latency.

5. System Role for Reasoning Models

GPT-5.x and o-series models use "developer" instead of "system" for the system message role:

# GPT-4o / GPT-4.1
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": query},
]

# GPT-5.x / o-series
messages = [
    {"role": "developer", "content": "You are a helpful assistant."},
    {"role": "user", "content": query},
]

Automatic role adaptation pattern:

def uses_developer_role(model_name: str) -> bool:
    return model_name in REASONING_MODELS or model_name in O_SERIES_MODELS

# In your calling code:
if uses_developer_role(model_name):
    messages = [
        {**m, "role": "developer"} if m.get("role") == "system" else m
        for m in messages
    ]

6. Client Factory Pattern

Use a factory function to create the right client for any model:

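from openai import AzureOpenAI, OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

# Entra ID token provider, same as in section 1
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default",
)

def create_client(model_name: str, endpoint: str, api_key: str | None = None) -> AzureOpenAI | OpenAI:
    """Create the appropriate client for a given model."""
    if model_name in V1_MODELS:
        # v1 API: OpenAI client pointed at the /openai/v1/ endpoint
        base_url = endpoint.rstrip("/") + "/openai/v1/"
        return OpenAI(base_url=base_url, api_key=api_key or token_provider())
    # Versioned API: classic AzureOpenAI client
    if api_key:
        return AzureOpenAI(azure_endpoint=endpoint, api_key=api_key, api_version="2024-12-01-preview")
    return AzureOpenAI(
        azure_endpoint=endpoint,
        azure_ad_token_provider=token_provider,
        api_version="2024-12-01-preview",
    )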

Error Handling

call_model() provides actionable error messages for common failures:

from src.clients import call_model, create_client

# Assumes the repo's create_client() reads the endpoint from environment config
client = create_client("gpt-5.1")
messages = [{"role": "user", "content": "Hello"}]  # placeholder input
try:
    response = call_model(client, "gpt-5.1", messages)
except RuntimeError as e:
    # Raises descriptive errors:
    # - "Deployment 'gpt-5.1' not found. Check your deployment name..."
    # - "Authentication failed. Run 'az login' for Entra ID auth..."
    print(f"Migration issue: {e}")

Repository Resources

This repo provides reusable modules under src/:

src/config.py — Model family helpers (is_v1(), is_reasoning(), is_o_series(), uses_developer_role()), environment config
src/clients.py — Client factory (create_client()), parameter-adapting call_model() with automatic role adaptation
src/evaluate/ — Full evaluation framework for comparing models (see aoai-migration-evaluation skill)

Deep-dive documentation (always check these for the latest dates and guidance):

docs/retirement-timeline.md — authoritative retirement dates and planning matrix
docs/migration-paths.md — detailed migration paths with decision trees
docs/api-changes-by-model.md — comprehensive API changes reference
docs/evaluation-guide.md — evaluation methodology and setup
samples/rag_pipeline/ — working end-to-end migration example

💡 Tip: This skill provides quick guidance for common migration tasks. For the latest model dates, detailed walkthroughs, and working code samples, always check the repo documentation above — it is updated more frequently than this skill.

Steps for a Migration

Identify your target model using the migration paths table above — consider the tier-down strategy (GPT-5.4-mini/nano) for best cost-quality tradeoff.
Update client initialization — switch from AzureOpenAI to OpenAI for v1 models.
Adapt parameters — replace max_tokens with max_completion_tokens, remove temperature/top_p for reasoning models.
Update system message role — use "developer" for GPT-5.x and o-series models.
Set reasoning_effort if using a reasoning model — GPT-5.4-mini supports "none" for zero reasoning overhead; start with "low" for cost-sensitive workloads.
Run evaluations to validate the new model matches or exceeds the old model's quality (see aoai-migration-evaluation skill).
Deploy progressively — canary rollout for high-traffic workloads.
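Steps 3 to 5 above can be sketched as one function that turns legacy GPT-4o request arguments into arguments suitable for the target model. This is an illustrative sketch, not the repo's call_model(): `build_chat_request` is a hypothetical name, the model sets are abbreviated, and dropping `reasoning_effort` for non-reasoning models is an assumption based on the parameter table.

```python
# Abbreviated copies of the classification sets, so the sketch runs on its own.
V1_MODELS = {"gpt-4.1", "gpt-5", "gpt-5.1", "gpt-5.4", "gpt-5.4-mini", "gpt-5.4-nano"}
REASONING_MODELS = {"gpt-5", "gpt-5.1", "gpt-5.4", "gpt-5.4-mini", "gpt-5.4-nano"}
O_SERIES_MODELS = {"o1", "o3", "o3-mini", "o4-mini"}


def build_chat_request(model_name: str, messages: list[dict], **params) -> dict:
    """Return keyword arguments for a chat completions call, adapted to the model."""
    kwargs = dict(params)
    is_reasoning = model_name in REASONING_MODELS or model_name in O_SERIES_MODELS

    # Step 3: rename max_tokens where max_completion_tokens is required.
    if "max_tokens" in kwargs and (model_name in V1_MODELS or model_name in O_SERIES_MODELS):
        kwargs["max_completion_tokens"] = kwargs.pop("max_tokens")

    if is_reasoning:
        # Step 3: sampling parameters are not supported by reasoning models.
        kwargs.pop("temperature", None)
        kwargs.pop("top_p", None)
        # Step 4: "system" becomes "developer"; build new dicts, leave the input untouched.
        messages = [
            {**m, "role": "developer"} if m.get("role") == "system" else m
            for m in messages
        ]
    else:
        # Step 5 only applies to reasoning models.
        kwargs.pop("reasoning_effort", None)

    return {"model": model_name, "messages": messages, **kwargs}


legacy_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize this ticket."},
]
request = build_chat_request(
    "gpt-5.4-mini", legacy_messages,
    max_tokens=256, temperature=0.2, reasoning_effort="low",
)
print(request)
```

The returned dictionary can be unpacked into `client.chat.completions.create(**request)` once the client from step 2 exists.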

Validate After Migration

After updating your code, verify output quality hasn't regressed:

from src.evaluate.core import MigrationEvaluator

evaluator = MigrationEvaluator(
    source_model="gpt-4o",
    target_model="gpt-5.1",
    test_cases="data/golden_rag.jsonl",  # 54 pre-built test cases in data/
    metrics=["coherence", "relevance", "groundedness"],
)
report = evaluator.run()
report.print_report()

See the aoai-migration-evaluation skill for full evaluation guidance, including custom evaluators and PII redaction for production data.

Must Not

Hard-code model names deep in application code. Use config/environment variables.
Use temperature or top_p with reasoning models (GPT-5.x, o-series) — they are not supported.
Assume GPT-4.1 family is still available for new deployments — it was deprecated April 14, 2026.
Use max_tokens with v1 API models or o-series models; use max_completion_tokens instead.
Skip evaluation before deploying a new model in production.
Assume reasoning_effort="none" works on GPT-5, GPT-5-mini, or GPT-5-nano; only GPT-5.1 and later support it.
Use AzureOpenAI client with v1 models — use OpenAI client with base_url pointing to /openai/v1/.
Use "system" role with reasoning models — use "developer" role instead.
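For the first rule, one possible approach is to resolve the deployment name and endpoint from the environment at startup, so switching models becomes a configuration change. This is a minimal sketch: `load_model_config` and the variable name `AZURE_OPENAI_DEPLOYMENT` are assumptions made for illustration, and your project may already define its own names.

```python
import os
from collections.abc import Mapping

REQUIRED_VARS = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_DEPLOYMENT")  # assumed names


def load_model_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read the endpoint and model/deployment name from environment variables."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "endpoint": env["AZURE_OPENAI_ENDPOINT"].rstrip("/"),
        "model": env["AZURE_OPENAI_DEPLOYMENT"],
    }


# Passing a plain dict keeps the example independent of the real environment.
config = load_model_config({
    "AZURE_OPENAI_ENDPOINT": "https://example.openai.azure.com/",
    "AZURE_OPENAI_DEPLOYMENT": "gpt-5.4-mini",
})
print(config["model"])  # gpt-5.4-mini
```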

Structured Outputs & Responses API

Structured Outputs

If your application uses response_format for JSON output, it works across model generations:

Feature	GPT-4o	GPT-4.1	GPT-5+
`{ "type": "json_object" }`	Supported	Supported	Supported
`{ "type": "json_schema", ... }`	Supported (2024-08-06+)	Supported	Supported
Strict mode	Supported	Supported	Supported

Test your JSON schemas against the new model — different models may interpret schema constraints differently.
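As a starting point for that testing, the sketch below builds a strict `json_schema` response format and checks a reply against the schema's top-level constraints. The ticket schema, `TICKET_SCHEMA`, and `check_output` are hypothetical and exist only for illustration; the check covers required keys, extra keys, and one enum, not full JSON Schema validation.

```python
import json

# Hypothetical schema. Strict mode expects every property to be listed in
# "required" and "additionalProperties" to be false.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": ["billing", "technical", "other"]},
        "summary": {"type": "string"},
    },
    "required": ["category", "summary"],
    "additionalProperties": False,
}

# Value to pass as response_format in a chat completions request.
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "ticket", "strict": True, "schema": TICKET_SCHEMA},
}


def check_output(raw: str) -> dict:
    """Parse a model reply and verify the top-level schema constraints."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = set(TICKET_SCHEMA["required"]) - set(data)
    extra = set(data) - set(TICKET_SCHEMA["properties"])
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    allowed = TICKET_SCHEMA["properties"]["category"]["enum"]
    if data["category"] not in allowed:
        raise ValueError(f"category must be one of {allowed}")
    return data


sample_reply = '{"category": "billing", "summary": "Customer was charged twice."}'
print(check_output(sample_reply)["category"])  # billing
```

Running the same prompts and schema against both the old and new deployments, then passing each reply through a check like this, gives a quick signal before a full evaluation run.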

Responses API

Azure OpenAI now supports the Responses API alongside Chat Completions. It offers built-in tool use, file search, and web search. Existing Chat Completions code continues to work. See the Responses API docs.

References

Azure OpenAI Model Retirements — authoritative retirement dates
Azure OpenAI Models Overview — model capabilities & availability
GPT-5 vs GPT-4.1: Choosing the Right Model
Responses API — new API surface
Azure OpenAI SDKs — all supported languages
What's New in Azure OpenAI — latest changes