aoai-model-migration

Migrate Azure OpenAI applications from GPT-4o/GPT-4o-mini to newer models (GPT-4.1, GPT-5, GPT-5.1 through GPT-5.4, o-series). Covers API changes, client configuration, parameter adaptation, prompt adjustments, and authentication. USE FOR: migrate model, switch model, upgrade model, GPT-4o replacement, AzureOpenAI to OpenAI client, v1 API, max_completion_tokens, reasoning_effort, developer role, system role, parameter adaptation, client factory, model classification. DO NOT USE FOR: retirement dates or lifecycle planning (use aoai-model-lifecycle), evaluation or A/B testing (use aoai-migration-evaluation).

By aiappsgbb. Updated 4/14/2026

Azure OpenAI Model Migration Skill

⚠️ Retirement dates and model availability change frequently. Always verify against the official Azure OpenAI Model Retirements page.

Purpose

Guide developers through migrating Azure OpenAI applications from GPT-4o / GPT-4o-mini to newer model families (GPT-4.1, GPT-5, GPT-5.1 through GPT-5.4), and from older o-series reasoning models to their successors (o1 → o3, o3-mini → o4-mini). This skill covers API surface changes, client configuration, parameter adaptation, and prompt adjustments.

When to Use

  • Migrating from GPT-4o or GPT-4o-mini to any newer Azure OpenAI model
  • Migrating o-series models (o1 → o3, o3-mini → o4-mini)
  • Adapting code to the new v1 API (/openai/v1/) used by GPT-4.1+ and GPT-5+
  • Adapting parameters and system prompts for reasoning models (GPT-5, GPT-5.1 through GPT-5.4, o-series)
  • Choosing the right replacement model for a given workload

Migration Paths

GPT Series

| Source Model | Target Model | Type | Best For |
| --- | --- | --- | --- |
| GPT-4o / GPT-4.1 | GPT-5.4-mini | Reasoning | Recommended: comparable quality at lower cost/latency (tier-down strategy) |
| GPT-4o-mini / GPT-4.1-mini | GPT-5.4-nano | Reasoning | Recommended: comparable quality at a fraction of the cost |
| GPT-4o | GPT-5.1 | Reasoning | Official auto-migration target (Standard deployments, completed March 2026) |
| GPT-4o | GPT-5.4 | Reasoning | Best overall quality (Mar 2026), longest runway |
| GPT-4o-mini | GPT-4.1-mini | Standard | Official auto-migration target (Standard deployments) |

💡 Tier-down strategy: Newer-generation smaller models match or exceed older-generation larger ones with better latency and lower cost. Target GPT-5.4-mini instead of GPT-4.1/GPT-5, and GPT-5.4-nano instead of GPT-4.1-mini — longer runway (Sep 2027), better quality-to-cost tradeoff.

📝 Note: GPT-4o Standard deployments were auto-upgraded to GPT-5.1 and retired on 2026-03-31. GPT-4.1 family was deprecated on 2026-04-14 (no new customers).

o-Series (Reasoning Models)

| Source Model | Target Model | Type | Best For |
| --- | --- | --- | --- |
| o1 | o3 | Reasoning | Successor reasoning model |
| o3-mini | o4-mini | Reasoning | Faster, cheaper reasoning |
| o1-pro | o3-pro | Reasoning | Pro-tier reasoning |

How to Choose

| Priority | GPT-4o replacement | GPT-4o-mini replacement |
| --- | --- | --- |
| Best quality/latency tradeoff | GPT-5.4-mini | GPT-5.4-nano |
| Best overall quality | GPT-5.4 | GPT-5.4-mini |
| Best reasoning / agentic | GPT-5.4 | GPT-5.4-mini |
| Lowest cost | GPT-5.4-nano | GPT-5.4-nano |
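The table above can be kept in code as a small lookup so the choice lives in one place. This is a minimal sketch: the priority keys (`tradeoff`, `quality`, `reasoning`, `cost`) and the function name `choose_target` are hypothetical labels invented for this example, not part of the repo.

```python
# Hypothetical lookup transcribed from the "How to Choose" table.
# The priority keys are illustrative names, not an official vocabulary.
RECOMMENDED_TARGETS = {
    "gpt-4o": {
        "tradeoff": "gpt-5.4-mini",
        "quality": "gpt-5.4",
        "reasoning": "gpt-5.4",
        "cost": "gpt-5.4-nano",
    },
    "gpt-4o-mini": {
        "tradeoff": "gpt-5.4-nano",
        "quality": "gpt-5.4-mini",
        "reasoning": "gpt-5.4-mini",
        "cost": "gpt-5.4-nano",
    },
}


def choose_target(source_model: str, priority: str = "tradeoff") -> str:
    """Return the suggested replacement for a source model and priority."""
    try:
        return RECOMMENDED_TARGETS[source_model.strip().lower()][priority]
    except KeyError as exc:
        raise ValueError(
            f"No recommendation for {source_model!r} with priority {priority!r}"
        ) from exc


if __name__ == "__main__":
    print(choose_target("GPT-4o"))               # gpt-5.4-mini
    print(choose_target("gpt-4o-mini", "cost"))  # gpt-5.4-nano
```

Raising `ValueError` for unknown inputs keeps a typo in configuration from silently selecting the wrong model.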

Key API Changes

1. Client Configuration

GPT-4.1+ and GPT-5+ use the v1 API, which requires the OpenAI client instead of AzureOpenAI.

Before (GPT-4o — versioned API):

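import os

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# Resource endpoint; reading it from this environment variable is an assumption
AZURE_OPENAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]

# Entra ID token provider; the client calls it whenever it needs a fresh token
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_ad_token_provider=token_provider,
    api_version="2024-12-01-preview",
    azure_endpoint=AZURE_OPENAI_ENDPOINT
)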

After (GPT-4.1 / GPT-5 — v1 API):

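import os

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import OpenAI

# Resource endpoint; reading it from this environment variable is an assumption
AZURE_OPENAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)

client = OpenAI(
    # token_provider() returns a single token string that expires, so long-running
    # processes need to recreate the client. Recent openai SDK releases are documented
    # to accept the callable itself (api_key=token_provider); verify for your version.
    api_key=token_provider(),
    # rstrip avoids a double slash when the endpoint already ends with "/"
    base_url=f"{AZURE_OPENAI_ENDPOINT.rstrip('/')}/openai/v1/"
)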

2. Model Family Classification

Use these sets to determine which API and parameters a model requires:

# Models using the new v1 API (OpenAI client with /openai/v1/ endpoint)
V1_MODELS = {
    "gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano",
    "gpt-5", "gpt-5.1", "gpt-5.2", "gpt-5-mini", "gpt-5-nano",
    "gpt-5-pro", "gpt-5-codex", "gpt-5.1-codex", "gpt-5.1-codex-mini",
    "gpt-5.2-codex", "gpt-5.3-codex",
    "gpt-5.4", "gpt-5.4-pro", "gpt-5.4-mini", "gpt-5.4-nano",
    "codex-mini",
}

# Reasoning models (no temperature/top_p, use max_completion_tokens, developer role)
REASONING_MODELS = {
    "gpt-5", "gpt-5.1", "gpt-5.2", "gpt-5-mini", "gpt-5-nano",
    "gpt-5-pro", "gpt-5.3-codex", "gpt-5.2-codex",
    "gpt-5.4", "gpt-5.4-pro", "gpt-5.4-mini", "gpt-5.4-nano",
}

# o-series reasoning models (also no temperature/top_p, use max_completion_tokens)
# Note: o-series use the classic AzureOpenAI client, NOT the v1 API
O_SERIES_MODELS = {
    "o1", "o1-pro", "o3-mini", "o3", "o3-pro", "o3-deep-research", "o4-mini",
}
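A single classification helper can turn those sets into the decisions the rest of this skill depends on. The sketch below is illustrative only: `ModelProfile` and `classify_model` are hypothetical names, the sets are abbreviated copies so the block runs on its own, and it assumes the deployment name matches the model name, which is not guaranteed in Azure.

```python
from dataclasses import dataclass

# Abbreviated copies of the sets above so this sketch is self-contained.
V1_MODELS = {"gpt-4.1", "gpt-4.1-mini", "gpt-5", "gpt-5.1", "gpt-5.4", "gpt-5.4-mini"}
REASONING_MODELS = {"gpt-5", "gpt-5.1", "gpt-5.4", "gpt-5.4-mini"}
O_SERIES_MODELS = {"o1", "o3", "o3-mini", "o4-mini"}


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical summary of what a model needs from calling code."""
    uses_v1_api: bool
    is_reasoning: bool
    token_param: str
    system_role: str


def classify_model(model_name: str) -> ModelProfile:
    # Normalize so "GPT-5.1" and " gpt-5.1 " resolve to the same entry.
    name = model_name.strip().lower()
    reasoning = name in REASONING_MODELS or name in O_SERIES_MODELS
    needs_completion_tokens = name in V1_MODELS or name in O_SERIES_MODELS
    return ModelProfile(
        uses_v1_api=name in V1_MODELS,
        is_reasoning=reasoning,
        token_param="max_completion_tokens" if needs_completion_tokens else "max_tokens",
        system_role="developer" if reasoning else "system",
    )


if __name__ == "__main__":
    for model in ("gpt-4o", "gpt-4.1", "gpt-5.1", "o3"):
        print(model, classify_model(model))
```

Models missing from every set fall through to the GPT-4o behavior, so new model names should be added to the sets before they are used.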

3. Parameter Adaptation

| Parameter | GPT-4o | GPT-4.1 | GPT-5 / GPT-5.x | o-series (o1, o3, o4-mini) |
| --- | --- | --- | --- | --- |
| max_tokens | Supported | Use max_completion_tokens | Use max_completion_tokens | Use max_completion_tokens |
| temperature | Supported | Supported | Not supported (remove it) | Not supported (remove it) |
| top_p | Supported | Supported | Not supported (remove it) | Not supported (remove it) |
| reasoning_effort | N/A | N/A | See below | Supported |
| System role | "system" | "system" | "developer" | "developer" |

Parameter adaptation pattern:

def adapt_params(model_name: str, params: dict) -> dict:
    """Adapt parameters for the target model."""
    adapted = params.copy()

    # max_tokens → max_completion_tokens for v1 models and o-series models
    # (o-series are not in V1_MODELS but also require max_completion_tokens)
    needs_completion_tokens = model_name in V1_MODELS or model_name in O_SERIES_MODELS
    if needs_completion_tokens and "max_tokens" in adapted:
        adapted["max_completion_tokens"] = adapted.pop("max_tokens")

    # Reasoning models don't support temperature/top_p
    if model_name in REASONING_MODELS or model_name in O_SERIES_MODELS:
        adapted.pop("temperature", None)
        adapted.pop("top_p", None)

    return adapted
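During a migration it helps to see which parameters were changed, not only the final result. The variant below is a sketch under stated assumptions: `adapt_with_report` is a hypothetical helper, the sets are abbreviated copies, and it renames `max_tokens` for o-series models as well, following the parameter table.

```python
# Abbreviated copies of the classification sets so this sketch runs on its own.
V1_MODELS = {"gpt-4.1", "gpt-4.1-mini", "gpt-5", "gpt-5.1", "gpt-5.4-mini"}
REASONING_MODELS = {"gpt-5", "gpt-5.1", "gpt-5.4-mini"}
O_SERIES_MODELS = {"o1", "o3", "o4-mini"}


def adapt_with_report(model_name: str, params: dict) -> tuple:
    """Return (adapted_params, changes) without mutating the input dict."""
    adapted = dict(params)
    changes = []

    needs_completion_tokens = model_name in V1_MODELS or model_name in O_SERIES_MODELS
    if needs_completion_tokens and "max_tokens" in adapted:
        adapted["max_completion_tokens"] = adapted.pop("max_tokens")
        changes.append("renamed max_tokens to max_completion_tokens")

    if model_name in REASONING_MODELS or model_name in O_SERIES_MODELS:
        for key in ("temperature", "top_p"):
            if key in adapted:
                del adapted[key]
                changes.append(f"removed {key}")

    return adapted, changes


if __name__ == "__main__":
    original = {"max_tokens": 256, "temperature": 0.2}
    for model in ("gpt-4o", "gpt-4.1", "gpt-5.1", "o3"):
        print(model, adapt_with_report(model, original))
```

Logging the `changes` list at startup makes silently dropped sampling parameters visible to whoever tuned them for the old model.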

4. Reasoning Effort

| Model | Type | reasoning_effort levels | Default |
| --- | --- | --- | --- |
| GPT-4.1 / 4.1-mini / 4.1-nano | Standard | N/A (no reasoning) | N/A |
| GPT-5 / 5-mini / 5-nano | Reasoning | minimal, low, medium, high | medium |
| GPT-5.1 | Reasoning | none, low, medium, high | none |
| GPT-5.2 / 5.3-codex / 5.4 / 5.4-pro | Reasoning | none, low, medium, high | none |
| GPT-5.4-mini / 5.4-nano | Reasoning | none, low, medium, high | none |
| o-series (o1, o3, o4-mini) | Reasoning | low, medium, high | medium |

Important: reasoning_effort="none" is supported only from GPT-5.1 onwards (GPT-5.1, GPT-5.2, and the GPT-5.3-codex and GPT-5.4 models in the table above). For GPT-5, GPT-5-mini, and GPT-5-nano the lowest level is "minimal", which still incurs reasoning tokens and added latency.
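One way to keep a requested effort level valid across model families is to round it up to the nearest level the target supports. This is a sketch, not the repo's behavior: `resolve_reasoning_effort` and `EFFORT_TABLE` are hypothetical names, the table covers only some of the models listed above, and rounding up (rather than raising an error) is a design choice made for this example.

```python
from typing import Optional

# Effort levels from lowest to highest.
ORDER = ("none", "minimal", "low", "medium", "high")

GPT5_BASE = ("minimal", "low", "medium", "high")
GPT51_PLUS = ("none", "low", "medium", "high")
O_SERIES = ("low", "medium", "high")

# model -> (supported levels, default), transcribed from the table above.
EFFORT_TABLE = {
    "gpt-5": (GPT5_BASE, "medium"),
    "gpt-5-mini": (GPT5_BASE, "medium"),
    "gpt-5-nano": (GPT5_BASE, "medium"),
    "gpt-5.1": (GPT51_PLUS, "none"),
    "gpt-5.2": (GPT51_PLUS, "none"),
    "gpt-5.4": (GPT51_PLUS, "none"),
    "gpt-5.4-mini": (GPT51_PLUS, "none"),
    "gpt-5.4-nano": (GPT51_PLUS, "none"),
    "o1": (O_SERIES, "medium"),
    "o3": (O_SERIES, "medium"),
    "o4-mini": (O_SERIES, "medium"),
}


def resolve_reasoning_effort(model_name: str, requested: Optional[str] = None) -> Optional[str]:
    """Return a supported effort level, or None when the parameter should be omitted."""
    entry = EFFORT_TABLE.get(model_name)
    if entry is None:
        return None  # non-reasoning or unlisted model: do not send reasoning_effort
    levels, default = entry
    if requested is None:
        return default
    if requested not in ORDER:
        raise ValueError(f"Unknown reasoning_effort: {requested!r}")
    # Round up to the first supported level at or above the request.
    for level in ORDER[ORDER.index(requested):]:
        if level in levels:
            return level
    return levels[-1]  # not reached in practice: every family supports "high"


if __name__ == "__main__":
    print(resolve_reasoning_effort("gpt-5", "none"))    # minimal
    print(resolve_reasoning_effort("gpt-5.1", "none"))  # none
    print(resolve_reasoning_effort("o3", "none"))       # low
```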

5. System Role for Reasoning Models

GPT-5.x and o-series models use "developer" instead of "system" for the system message role:

# GPT-4o / GPT-4.1
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": query},
]

# GPT-5.x / o-series
messages = [
    {"role": "developer", "content": "You are a helpful assistant."},
    {"role": "user", "content": query},
]

Automatic role adaptation pattern:

def uses_developer_role(model_name: str) -> bool:
    return model_name in REASONING_MODELS or model_name in O_SERIES_MODELS

# In your calling code:
if uses_developer_role(model_name):
    messages = [
        {**m, "role": "developer"} if m.get("role") == "system" else m
        for m in messages
    ]

6. Client Factory Pattern

Use a factory function to create the right client for any model:

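from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI, OpenAI

# Entra ID token provider shared by both client types
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default",
)

def create_client(model_name: str, endpoint: str, api_key: str | None = None) -> AzureOpenAI | OpenAI:
    """Create the appropriate client for a given model."""
    if model_name in V1_MODELS:
        # v1 API: plain OpenAI client pointed at the /openai/v1/ endpoint
        base_url = endpoint.rstrip("/") + "/openai/v1/"
        return OpenAI(base_url=base_url, api_key=api_key or token_provider())
    # Versioned API (GPT-4o, o-series): classic AzureOpenAI client
    if api_key:
        # Honor an explicit key instead of silently ignoring it
        return AzureOpenAI(azure_endpoint=endpoint, api_key=api_key, api_version="2024-12-01-preview")
    return AzureOpenAI(
        azure_endpoint=endpoint,
        azure_ad_token_provider=token_provider,
        api_version="2024-12-01-preview",
    )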

Error Handling

call_model() (in src/clients.py) raises RuntimeError with an actionable message for common failures such as a missing deployment or failed authentication:

from src.clients import call_model, create_client

client = create_client("gpt-5.1")
try:
    response = call_model(client, "gpt-5.1", messages)
except RuntimeError as e:
    # Raises descriptive errors:
    # - "Deployment 'gpt-5.1' not found. Check your deployment name..."
    # - "Authentication failed. Run 'az login' for Entra ID auth..."
    print(f"Migration issue: {e}")
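The idea of turning low-level failures into actionable messages can be sketched independently of the repo. The helper below is hypothetical (`explain_failure` is not the repo's `call_model()`), the wording of each hint is invented for this example, and it assumes the caller has already extracted the HTTP status code from the failed request.

```python
def explain_failure(status_code: int, deployment: str) -> str:
    """Map an HTTP status code to a migration-oriented hint (illustrative wording)."""
    hints = {
        400: (
            f"Request rejected for '{deployment}'. Check for parameters the target "
            "model does not accept, such as max_tokens, temperature, or top_p."
        ),
        401: "Authentication failed. Check the API key or refresh the Entra ID login.",
        403: "Access denied. Check the role assignment on the Azure OpenAI resource.",
        404: (
            f"Deployment '{deployment}' was not found. Check the deployment name "
            "and that the endpoint matches the API style the client uses."
        ),
        429: "Rate limit or quota reached. Retry with backoff or raise the quota.",
    }
    return hints.get(status_code, f"Unexpected HTTP {status_code} from '{deployment}'.")


if __name__ == "__main__":
    print(explain_failure(404, "gpt-5.1"))
    print(explain_failure(400, "gpt-5.1"))
```

The 400 hint matters most during a migration, since parameters that worked with GPT-4o are a common cause of rejected requests on reasoning models.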

Repository Resources

This repo provides reusable modules under src/:

  • src/config.py — Model family helpers (is_v1(), is_reasoning(), is_o_series(), uses_developer_role()), environment config
  • src/clients.py — Client factory (create_client()), parameter-adapting call_model() with automatic role adaptation
  • src/evaluate/ — Full evaluation framework for comparing models (see aoai-migration-evaluation skill)

Deep-dive documentation (always check these for the latest dates and guidance):

💡 Tip: This skill provides quick guidance for common migration tasks. For the latest model dates, detailed walkthroughs, and working code samples, always check the repo documentation above — it is updated more frequently than this skill.

Steps for a Migration

  1. Identify your target model using the migration paths table above — consider the tier-down strategy (GPT-5.4-mini/nano) for best cost-quality tradeoff.
  2. Update client initialization — switch from AzureOpenAI to OpenAI for v1 models.
  3. Adapt parameters — replace max_tokens with max_completion_tokens, remove temperature/top_p for reasoning models.
  4. Update system message role — use "developer" for GPT-5.x and o-series models.
  5. Set reasoning_effort if using a reasoning model — GPT-5.4-mini supports "none" for zero reasoning overhead; start with "low" for cost-sensitive workloads.
  6. Run evaluations to validate the new model matches or exceeds the old model's quality (see aoai-migration-evaluation skill).
  7. Deploy progressively — canary rollout for high-traffic workloads.
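For the progressive rollout in step 7, one common approach is deterministic percentage routing: hash a stable request or user identifier into a bucket and send only the lowest buckets to the new model. This is a minimal sketch; `pick_model` and its parameters are hypothetical, and a production rollout would normally sit behind a feature-flag or traffic-management service.

```python
import hashlib


def pick_model(request_id: str, source_model: str, target_model: str, canary_percent: int) -> str:
    """Route a stable share of traffic to the target model.

    The same request_id always lands in the same bucket, so a given user
    sees consistent behavior while the percentage is held constant.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return target_model if bucket < canary_percent else source_model


if __name__ == "__main__":
    routed = [pick_model(f"user-{i}", "gpt-4o", "gpt-5.4-mini", 10) for i in range(1000)]
    share = routed.count("gpt-5.4-mini") / len(routed)
    print(f"share routed to target: {share:.1%}")
```

Raising `canary_percent` only adds buckets, so users already on the new model stay on it as the rollout widens.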

Validate After Migration

After updating your code, verify output quality hasn't regressed:

from src.evaluate.core import MigrationEvaluator

evaluator = MigrationEvaluator(
    source_model="gpt-4o",
    target_model="gpt-5.1",
    test_cases="data/golden_rag.jsonl",  # 54 pre-built test cases in data/
    metrics=["coherence", "relevance", "groundedness"],
)
report = evaluator.run()
report.print_report()

See the aoai-migration-evaluation skill for full evaluation guidance, including custom evaluators and PII redaction for production data.
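Whatever tool produces the scores, the final gate is usually a per-metric comparison against a tolerance. The sketch below is independent of `MigrationEvaluator`: `find_regressions` is a hypothetical helper, and the score values and tolerance are made-up inputs used only to show the mechanics.

```python
def find_regressions(source_scores: dict, target_scores: dict, tolerance: float = 0.0) -> dict:
    """Return {metric: drop} for metrics where the target falls behind the
    source by more than the tolerance. Metrics missing from either side are skipped."""
    regressions = {}
    for metric, source_value in source_scores.items():
        if metric not in target_scores:
            continue
        drop = source_value - target_scores[metric]
        if drop > tolerance:
            regressions[metric] = drop
    return regressions


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements of any model.
    source = {"coherence": 4.0, "relevance": 4.5, "groundedness": 4.0}
    target = {"coherence": 4.25, "relevance": 4.0, "groundedness": 3.75}
    print(find_regressions(source, target, tolerance=0.25))  # {'relevance': 0.5}
```

A non-empty result is a reasonable signal to hold the rollout and inspect the failing test cases before switching traffic.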

Must Not

  • Hard-code model names deep in application code. Use config/environment variables.
  • Use temperature or top_p with reasoning models (GPT-5.x, o-series) — they are not supported.
  • Assume GPT-4.1 family is still available for new deployments — it was deprecated April 14, 2026.
  • Use max_tokens with v1 API models — use max_completion_tokens instead.
  • Skip evaluation before deploying a new model in production.
  • Assume reasoning_effort="none" works on GPT-5/GPT-5-mini — only GPT-5.1+ supports it.
  • Use AzureOpenAI client with v1 models — use OpenAI client with base_url pointing to /openai/v1/.
  • Use "system" role with reasoning models — use "developer" role instead.
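Several of the rules above can be checked mechanically before a request is sent. The following is a sketch of such a pre-flight check, not an official validator: `find_violations` is a hypothetical name, the sets are abbreviated copies of the classification sets, and it also flags `max_tokens` on o-series models, following the parameter table.

```python
# Abbreviated copies of the classification sets so this sketch runs on its own.
V1_MODELS = {"gpt-4.1", "gpt-4.1-mini", "gpt-5", "gpt-5-mini", "gpt-5.1", "gpt-5.4-mini"}
REASONING_MODELS = {"gpt-5", "gpt-5-mini", "gpt-5.1", "gpt-5.4-mini"}
O_SERIES_MODELS = {"o1", "o3", "o4-mini"}
# Models whose lowest effort level is "minimal" rather than "none".
NO_NONE_EFFORT = {"gpt-5", "gpt-5-mini", "gpt-5-nano"}


def find_violations(model_name: str, request: dict) -> list:
    """Return human-readable problems found in a chat request for this model."""
    problems = []
    reasoning = model_name in REASONING_MODELS or model_name in O_SERIES_MODELS

    if reasoning:
        for key in ("temperature", "top_p"):
            if key in request:
                problems.append(f"{key} is not supported by {model_name}")
        if any(m.get("role") == "system" for m in request.get("messages", [])):
            problems.append('use the "developer" role instead of "system"')

    if (model_name in V1_MODELS or model_name in O_SERIES_MODELS) and "max_tokens" in request:
        problems.append("use max_completion_tokens instead of max_tokens")

    if model_name in NO_NONE_EFFORT and request.get("reasoning_effort") == "none":
        problems.append('reasoning_effort="none" requires GPT-5.1 or later')

    return problems


if __name__ == "__main__":
    request = {
        "messages": [{"role": "system", "content": "You are a helpful assistant."}],
        "temperature": 0.2,
        "max_tokens": 100,
        "reasoning_effort": "none",
    }
    for problem in find_violations("gpt-5", request):
        print("-", problem)
```

Running a check like this in unit tests catches leftover GPT-4o parameters before they reach the service.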

Structured Outputs & Responses API

Structured Outputs

If your application uses response_format for JSON output, it works across model generations:

| Feature | GPT-4o | GPT-4.1 | GPT-5+ |
| --- | --- | --- | --- |
| { "type": "json_object" } | Supported | Supported | Supported |
| { "type": "json_schema", ... } | Supported (2024-08-06+) | Supported | Supported |
| Strict mode | Supported | Supported | Supported |

Test your JSON schemas against the new model — different models may interpret schema constraints differently.
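As an illustration of what to test, here is a `json_schema` response format with strict mode, plus a small pre-check for two strict-mode requirements: every object sets `additionalProperties` to false, and every property is listed in `required`. The schema contents and the name `strict_mode_issues` are hypothetical, and the check covers only these two constraints, not the full set of strict-mode rules.

```python
# Example payload for the response_format parameter (schema contents are made up).
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "ticket_summary",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "priority": {"type": "string", "enum": ["low", "medium", "high"]},
                "tags": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["title", "priority", "tags"],
            "additionalProperties": False,
        },
    },
}


def strict_mode_issues(schema: dict, path: str = "$") -> list:
    """Recursively report objects that break the two strict-mode rules checked here."""
    issues = []
    if schema.get("type") == "object":
        properties = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            issues.append(f"{path}: additionalProperties must be false")
        missing = sorted(set(properties) - set(schema.get("required", [])))
        if missing:
            issues.append(f"{path}: properties not in required: {missing}")
        for name, subschema in properties.items():
            issues.extend(strict_mode_issues(subschema, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        issues.extend(strict_mode_issues(schema["items"], f"{path}[]"))
    return issues


if __name__ == "__main__":
    print(strict_mode_issues(response_format["json_schema"]["schema"]))  # []
```

A local check like this does not replace running the schema against the target deployment, since each model may still fill the fields differently.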

Responses API

Azure OpenAI now supports the Responses API alongside Chat Completions. It offers built-in tool use, file search, and web search. Existing Chat Completions code continues to work. See the Responses API docs.

Install via CLI
npx skills add https://github.com/aiappsgbb/AOAI-models-migration --skill aoai-model-migration