name: xr-createShadow
description: Use when creating, updating, pausing, resuming, or deleting an AzureOpenAI model-pool shadow deployment, especially when the user gives prod and shadow engine definitions and wants the minimum additional inputs needed to keep everything else aligned with prod, reuse or publish the correct shadow model definition, build the shadow payload, and verify the shadow reaches a useful live state.
compatibility: Requires Azure CLI login with model-pool Shadow API access and local repo context under /home/xiaoranli/repo.
xr-createShadow
Use this skill to turn a prod engine definition plus a shadow engine definition into a model-pool shadow deployment with the minimum extra operator input.
Default behavior:
- infer control-side fields from repo templates first,
- ask only for the missing deltas,
- write request payloads under AI-gen/,
- and execute the model-definition plus shadow-create flow instead of stopping at a draft.
Source of truth
- /home/xiaoranli/repo/shadowDeployments.md
- /home/xiaoranli/repo/AMLRunbook/AMLDocumentation/DRI Handbook files/TeamDRIs/MPCM/ShadowTestTSG.md
- /home/xiaoranli/repo/Mumford.wiki/Mumford/Observability/How-to-setup-Shadow-Model-Deployment.md
- /home/xiaoranli/repo/vienna/src/azureml-api/deploy/templates/ev2/model-pool/AzureOpenAI/<region>/
- /home/xiaoranli/repo/vienna/src/azureml-api/deploy/templates/ev2/model-pool/ModelDefinitionV2/Request/
- /home/xiaoranli/repo/adm-engine-configs/scripts/python/engine/publish_model_definition.py
- /home/xiaoranli/repo/xr-request.sh
Required user inputs
Use these normalized field names when asking the user for inputs. The first two are the usual minimum initial input set:
- prod_engine_definition: Prefer one of:
  - checked-in file path,
  - full engine-definition JSON,
  - or a stable name plus version that can be resolved from local context.
- shadow_engine_definition: Prefer one of:
  - checked-in file path,
  - full engine-definition JSON,
  - or a stable name plus version that can be resolved from local context.
Then infer or collect the execution-critical fields in this order:
- region: Prefer explicit input. If missing, infer it only when the prod route or template context is unambiguous.
- one production target selector: Prefer deployment_group_name when the user wants to pin a known DG. Otherwise accept one of:
  - model_pool_name,
  - control_model_definition_id,
  - or a concrete region-specific file under AzureOpenAI/<region>/.
- allotment_id: Prefer the full value such as /stamps/AOAI/allotmentgroups/AOAI-3P/allotments/AOAI-3P.Default.
- sku: This must be the exact shadow deployment SKU.
- traffic_percentage: If the user does not care, default to 10 only when deployment_group_name is explicit. If deployment_group_name is omitted and planner resolves the active DG, expect the service to normalize the percentage to max(1, floor(100 / deploymentCount)) for the selected DG.
Collect these next only if the repo cannot infer them, the user wants non-default behavior, or the live create path needs them:
- shadow_test_id
- traffic_group
- instance_count
- auto_pause_after_days
- shadow_model_definition_id
- shadow_model_definition_version
- header_override_notes when the shadow engine needs different request headers from prod
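The traffic_percentage rule above can be sketched as a small helper. This is only an illustration of the arithmetic described in this section; the function name and arguments are hypothetical, and deployment_count stands for the number of deployments in the DG that planner selects, which the caller would have to look up separately.

```python
from typing import Optional


def expected_traffic_percentage(
    requested: Optional[int],
    deployment_group_name: Optional[str],
    deployment_count: Optional[int] = None,
) -> int:
    """Estimate the traffic percentage the shadow is expected to end up with."""
    if deployment_group_name:
        # Explicit DG: keep the user's value, or fall back to the default of 10.
        return requested if requested is not None else 10
    if deployment_count is None or deployment_count < 1:
        raise ValueError("deployment_count is needed when no DG is pinned")
    # No DG pinned: the service is expected to normalize the percentage to
    # max(1, floor(100 / deploymentCount)) for the selected DG.
    return max(1, 100 // deployment_count)
```

For example, with no pinned DG and three deployments in the selected DG, a requested value of 10 would be expected to come back as 33.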
Critical nuance
Two engine definitions are a strong primary input set, but they are not always sufficient by themselves.
If the target shadow model definition does not already exist, the skill also needs one of these:
- a reusable request template from ModelDefinitionV2/Request/,
- a shadow engine-definition JSON that already carries the snapshot or model-feed data needed to build a model definition,
- or an explicit ShadowModelDefinitionId plus ShadowModelDefinitionVersion that already exists.
Do not pretend that two display-name-only engine definitions are enough to publish a model definition.
Also keep these distinctions straight:
- engine definition id/name is not the same thing as model definition id.
- ED onboarding may create assets that let you publish a model definition, but it does not guarantee the correct shadow model definition for this test is the prod MD id.
- Before publishing a new version under the prod MD id, first check whether a shadow model definition already exists in model-pool under the shadow ED naming pattern.
Resolution rules
- ControlModelDefinitionId: Prefer reading it from the closest matching file under AzureOpenAI/<region>/. If the prod engine definition clearly maps to a unique prod model definition in local context, use that as a cross-check, not as the sole source of truth.
- DeploymentGroupName: Prefer explicit user input. If missing, resolve it from the production template, Fleet Scheduler context, active model-pool data, or model-pool name. If an explicit DeploymentGroupName is rejected by InvalidShadowTestDeploymentGroupName, retry without DeploymentGroupName and TrafficGroup while keeping InstanceCount; planner can resolve the active DG from ControlModelDefinitionId.
- ShadowModelDefinitionId: First check whether a model definition already exists in model-pool for the shadow engine-definition naming pattern. If it does, prefer that existing shadow MD id and version. If only the engine changes and the GPU footprint stays compatible and no separate shadow MD exists, reusing the prod model-definition id with a new version is valid. If the model changes or the engine change alters GPU footprint, prefer a new model-definition id and keep the existing naming style, including GPU count when the nearby templates do that.
- Sku: If the user says the goal is engine-only comparison, prefer the prod SKU. Only diverge when the shadow engine's GPU footprint forces it.
- TrafficGroup: API-optional, but prefer explicit input when the deployment group has several traffic groups.
- InstanceCount: API-optional. If omitted, the service uses the model pool's instancePerDeployment value.
- OverrideShadowTrafficPercentage: Mutable after creation. It can be updated later while the shadow is Active. If DeploymentGroupName is omitted, planner may rewrite the requested percentage to max(1, floor(100 / deploymentCount)) of the selected largest active DG.
- ShadowTestId: Must be unique. Deleted ids remain reserved for 30 days, so do not try to reuse them immediately.
- AllotmentId: Treat this as both the quota selector and the pre-create capacity-check key. If the user gives only a friendly alias or portal label, ask for the full allotment path before executing live calls.
- AutoPauseAfterDays: Treat this as optional and example-backed. It appears in current examples and TSG responses, but the core parameter table in shadowDeployments.md does not document it as part of the primary contract.
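The no-DG fallback in the DeploymentGroupName rule amounts to dropping two selector fields and resending everything else unchanged. A minimal sketch follows; it assumes the create payload is a flat JSON object whose keys match the field names used in this section, which should be checked against shadowDeployments.md before relying on it. The helper name and the sample values are hypothetical.

```python
from typing import Any, Dict

# Fields removed for the retry; everything else is preserved as-is.
DG_SELECTOR_FIELDS = frozenset({"DeploymentGroupName", "TrafficGroup"})


def without_dg_selector(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the create payload without the DG selector fields."""
    return {k: v for k, v in payload.items() if k not in DG_SELECTOR_FIELDS}


# Usage with made-up values: InstanceCount, Sku, and the rest survive the retry.
original = {
    "ControlModelDefinitionId": "example-prod-md",
    "DeploymentGroupName": "example-dg-name",
    "TrafficGroup": "example-tg",
    "InstanceCount": 2,
    "Sku": "H100",
}
retry_payload = without_dg_selector(original)
```

Returning a copy rather than mutating the input keeps the originally requested payload available for the final report.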
Capacity precheck
Before any live shadow PUT, run a quota gate using resolved region, AllotmentId, Sku, and InstanceCount.
- Parse AllotmentId as /stamps/{stamp}/allotmentgroups/{group}/allotments/{name}.
- Map the requested SKU to the MPCM VM family at minimum as:
  - H100 -> NDH100V5
  - H200 -> NDH200V5
  - MI300 -> NDMI300XV5
  - A100 -> NDAMV4
- Query MPCM before creating the shadow:
  https://westcentralus.api.azureml.ms/model-pool-capacity-manager/v1.0/stamps/{stamp}/allotmentGroups/{group}/allotments/{name}?regions={region}&vmFamilies={vmFamily}&includeRegion=true
- If MPCM returns 404, stop. The allotment is missing, stale, or not onboarded for the requested scope.
- If available capacity is below the requested InstanceCount, stop and surface the shortfall instead of attempting the shadow create call.
- If InstanceCount is omitted, first resolve the model-pool instancePerDeployment value from deployment-group or model-pool context. If that still remains unknown, use a conservative one-instance precheck and say that the live create path may still require more capacity.
- Optionally list existing shadows on the same AllotmentId as a cross-check for competing usage.
- Treat this precheck as mandatory for live execution. Skip it only when the user explicitly wants a draft payload without API calls.
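The parsing, SKU mapping, and URL construction steps above can be sketched as follows. The helper names are hypothetical, the SKU lookup assumes the SKU arrives as one of the four short names listed above, and the MPCM response shape is not documented here, so the capacity comparison takes the available count as a plain integer supplied by the caller.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlencode

MPCM_BASE = "https://westcentralus.api.azureml.ms/model-pool-capacity-manager/v1.0"

# Minimum SKU -> MPCM VM family mapping taken from the list above.
SKU_TO_VM_FAMILY = {
    "H100": "NDH100V5",
    "H200": "NDH200V5",
    "MI300": "NDMI300XV5",
    "A100": "NDAMV4",
}

_ALLOTMENT_RE = re.compile(
    r"^/stamps/([^/]+)/allotmentgroups/([^/]+)/allotments/([^/]+)$", re.IGNORECASE
)


def parse_allotment_id(allotment_id: str) -> Tuple[str, str, str]:
    """Split a full allotment path into (stamp, group, name)."""
    match = _ALLOTMENT_RE.match(allotment_id.strip())
    if match is None:
        raise ValueError(f"not a full allotment path: {allotment_id!r}")
    return match.groups()


def mpcm_url(allotment_id: str, sku: str, region: str) -> str:
    """Build the MPCM query URL for the precheck."""
    stamp, group, name = parse_allotment_id(allotment_id)
    vm_family = SKU_TO_VM_FAMILY[sku.upper()]  # KeyError means an unmapped SKU
    query = urlencode(
        {"regions": region, "vmFamilies": vm_family, "includeRegion": "true"}
    )
    return f"{MPCM_BASE}/stamps/{stamp}/allotmentGroups/{group}/allotments/{name}?{query}"


def capacity_shortfall(available: int, instance_count: Optional[int]) -> int:
    """Return how many instances are missing; 0 means the gate passes."""
    # Conservative one-instance precheck when InstanceCount is still unknown.
    requested = instance_count if instance_count is not None else 1
    return max(0, requested - available)
```

A friendly alias or portal label fails parse_allotment_id, which matches the rule to ask for the full allotment path before any live call.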
Context gathering
Follow this order:
- Read the repo AGENTS.md.
- Read the shadow docs listed in Source of truth.
- Search AzureOpenAI/<region>/ for the closest production JSON and extract:
  - modelDefinitionId,
  - the model-pool naming pattern,
  - and any obvious traffic-group hint.
- Diff the prod and shadow engine definitions:
  - engine.engineId
  - engine.skus
  - pipeReplicaGroups
  - deploymentProperties
  - any request-header-relevant envs
- Search ModelDefinitionV2/Request/ for the closest request template if a new shadow model definition must be published.
- Query model-pool for an existing shadow model definition before publishing anything:
  - first try the shadow engine-definition name as a model-definition id,
  - then try lkgOrLatest or the explicit version if known.
  Reuse an existing active shadow MD when it already points at the intended engine version.
- If the user supplied full engine-definition JSON, check whether the shadow one already contains snapshot, engine, and SKU data before asking for more. Prefer inheriting unchanged fields from the prod side when the user's intent is engine-only testing.
- Prefer /home/xiaoranli/repo/adm-engine-configs/scripts/python/engine/publish_model_definition.py over inventing a new publisher from scratch.
- Run the MPCM capacity precheck with resolved AllotmentId, Sku, region, and InstanceCount before the shadow PUT.
- If Preparing persists, check:
  - shadow GET,
  - deployment-group shadowState,
  - and deployment-group admin oplog filtered by shadowTestId or deploymentName before blaming the payload.
- If the user wants a post-create sanity check, prefer /home/xiaoranli/repo/xr-request.sh or a user-supplied payload.
- If the user asks whether a newly created deployment or endpoint is actually receiving traffic, wants return-code distribution, or wants direct ADX links, hand off to xiaoranli-kuda's endpoint/deployment quick-check path. Treat that as the default deployment-verification workflow, with shadow as just one example.
Execution flow
- Resolve the control-side fields from repo context first.
- Diff prod and shadow engine definitions and keep all non-engine fields aligned with prod unless the shadow engine forces a change.
- Decide whether an existing shadow model definition already exists in model-pool.
- If it exists, prefer reusing that shadow MD id and version instead of republishing under the prod MD id.
- If it does not exist, build and publish the model definition.
Default behavior for engine-only tests:
- inherit prod model-definition shape,
- replace only the engine-side field or version that must change,
- preserve prod-compatible SKU unless footprint changed.
- If DeploymentGroupName is unknown or stale, prefer a create request without DeploymentGroupName and TrafficGroup, but keep InstanceCount so planner can resolve the active DG.
- Run the MPCM capacity precheck. If it fails, stop before the shadow create call and report the exact allotment, VM family, and available-vs-requested capacity.
- Write generated payloads under:
  - AI-gen/<shadow-test-id>-model-definition-request.json
  - AI-gen/<shadow-test-id>-shadow-create-request.json
- Call the model-definition PUT only when required.
- Call the shadow PUT.
- Poll the shadow GET until it reaches Active, Failed, or a clearly stable nonterminal state.
- If Active, optionally run a sanity request or guide the user to compare shadow logs against prod. For deployment-level traffic or return-code verification, prefer the xiaoranli-kuda quick-check helper using the exact endpoint/deployment pair and a start time anchored to creation time. If the next question is exact shadow ITL, TPOT, or "is any traffic coming in" on a mirrored rollout, hand off to xiaoranli-kuda and follow its Fixed Shadow ITL Workflow instead of guessing from FrontDoor or Nexus requests.
- If it remains nonterminal, report:
  - deploymentName,
  - resolved DeploymentGroupName,
  - resolved traffic percentage,
  - deployment-group shadowState,
  - and admin oplog state history.
- If Failed, lead with capacityDiagnostics, deploymentName, and the TSG branch that matches the failure class.
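The payload naming and polling steps in this flow can be sketched as below. Everything here is illustrative: get_status is a hypothetical callable that wraps the shadow GET and returns the status string, and "clearly stable nonterminal state" is approximated as the same status repeated stable_after times in a row, which is an assumption rather than a documented rule.

```python
import time
from pathlib import Path
from typing import Callable, Dict, Optional

TERMINAL_STATES = {"Active", "Failed"}


def payload_paths(shadow_test_id: str, root: str = "AI-gen") -> Dict[str, Path]:
    """Return the two generated payload paths for a shadow test id."""
    base = Path(root)
    return {
        "model_definition": base / f"{shadow_test_id}-model-definition-request.json",
        "shadow_create": base / f"{shadow_test_id}-shadow-create-request.json",
    }


def poll_shadow(
    get_status: Callable[[], str],
    interval_s: float = 30.0,
    max_polls: int = 40,
    stable_after: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until Active, Failed, or a status that has stopped changing."""
    last: Optional[str] = None
    repeats = 0
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        # Count consecutive identical nonterminal statuses.
        repeats = repeats + 1 if status == last else 1
        last = status
        if repeats >= stable_after:
            return status  # stable nonterminal: report instead of waiting longer
        sleep(interval_s)
    return last
```

Passing sleep as a parameter keeps the loop testable without real waiting; a nonterminal return value is the cue to gather deploymentName, shadowState, and oplog history for the report.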
Status and observability
When checking or debugging a shadow, always start with a compact status summary:
- shadowTestId
- status
- shadowModelDefinitionId and shadowModelDefinitionVersion
- controlModelDefinitionId
- deploymentGroupName
- deploymentName
- instanceCount
- overrideShadowTrafficPercentage
Always include the Shadow Test Status dashboard link using the resolved shadowTestId and deploymentGroupName:
https://dataexplorer.azure.com/dashboards/0f7089f3-5ba9-4128-ae02-155e4887c610?p-_startTime=3hours&p-_endTime=now&p-Region=all&p-_ShadowTestId=v-{shadowTestId}&p-_DeploymentGroupName=v-{deploymentGroupName}&p-_Status=all&p-Stamp=all&p-Allotment+Group=all&p-Allotment=all&p-ShadowModelPoolName=all&p-_operationId=all&p-_Endpoint=all&p-_Deployment=all#6144fdd7-0180-4449-ba88-ca9771186679
If deploymentName is empty, do not jump straight to container logs.
Stay on planner-facing checks first:
- shadow GET
- deployment-group capacity.shadowState
- planner/admin oplog
If deploymentName is present, derive these identifiers before changing payloads:
- endpointName: replace -dg- with -oe- in deploymentGroupName
- resourceInstanceId: get it from scheduler traces or the first matching state-transition record
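The endpointName derivation is a plain string substitution. The sketch below is deliberately conservative: it refuses names where -dg- does not appear exactly once, which is an added safety assumption rather than a documented rule, and the sample name is made up.

```python
def endpoint_name_from_dg(deployment_group_name: str) -> str:
    """Derive endpointName by replacing -dg- with -oe- in the DG name."""
    occurrences = deployment_group_name.count("-dg-")
    if occurrences != 1:
        # Ambiguous or unexpected naming: fail rather than guess.
        raise ValueError(
            f"expected exactly one '-dg-' in {deployment_group_name!r}, "
            f"found {occurrences}"
        )
    return deployment_group_name.replace("-dg-", "-oe-")


# Usage with a hypothetical deployment group name.
endpoint = endpoint_name_from_dg("example-dg-shadow01")
```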
Use ADX or kusto-local against cluster https://aiscprodkusto.westus2.kusto.windows.net, database logs, when direct log queries are needed.
Use scheduler traces to separate planner/scheduling issues from container startup issues:
SchedulerStateTransitions
| where TIMESTAMP > ago(1h)
| where message has "{deploymentName}"
| project TIMESTAMP, CurrentState, ToState, ApplicationName, message
| order by TIMESTAMP asc
| take 50
Then use container traces for engine startup and health details:
ContainerTraces
| where TIMESTAMP > ago(1h)
| where ApplicationName == "{resourceInstanceId}"
| where Level >= 3 or RawMessage has "model" or RawMessage has "engine" or RawMessage has "ready" or RawMessage has "error" or RawMessage has "loading" or RawMessage has "started"
| project TIMESTAMP, CodePackageName, RawMessage, Level
| order by TIMESTAMP asc
| take 100
If there are no container logs yet, treat that as "container has not started" rather than proof that the payload is wrong.
Failure handling
- Failed: Start with capacityDiagnostics. Distinguish capacity or user error from engine or container failure before changing the payload. Bucket the failure class before recommending a fix:
  - SCHEDULING_FAILURE when scheduler traces show capacity, quota, or unschedulable placement issues.
  - CONTAINER_CRASH or HEALTH_CHECK_TIMEOUT when deploymentName exists but the deployment never becomes healthy.
  - OOM, CUDA_ERROR, or MODEL_LOAD_FAILURE when container traces show engine startup failure signatures.
  - UNKNOWN only when neither capacityDiagnostics nor traces give a usable cause.
- Preparing or Pausing stuck for a long time: Check deployment-group shadowState, planner behavior, and the admin oplog before retrying blindly. If oplog only shows Creating and deployment-group shadowState.currentStateInstanceCount is still below goal, treat it as underlying compute still provisioning, not as payload validation failure. If deploymentName is already set, inspect scheduler and container traces before retrying the API call.
- InvalidShadowTestDeploymentGroupName: Treat this as a selector problem, not an engine problem. Retry without DeploymentGroupName and TrafficGroup while preserving ControlModelDefinitionId, InstanceCount, Sku, and AllotmentId.
- delete workflow: Pause first, wait for Paused, then delete.
- admin cleanup: Use the admin forceTerminate path only for explicit DRI or livesite cleanup, not as the default delete path.
Ask strategy
Do not ask for every payload field at once.
When the user asks what else is needed, reply with the unresolved subset from Required user inputs using those exact normalized field names.
Keep the reply to the smallest sufficient subset.
Start with:
- prod_engine_definition
- shadow_engine_definition
- region
- deployment_group_name or model_pool_name
- allotment_id
- sku
- traffic_percentage
Then mine the repo for control_model_definition_id, request-template candidates, and naming patterns.
If the user's stated goal is "only test engine definition", default to preserving prod-aligned sku, traffic_group, instance_count, and request-header behavior unless the diff proves they must change.
Before asking the user for a new shadow MD id, check whether the shadow ED name already exists as a model definition in model-pool. If the explicit deployment group is missing or rejected, prefer the no-DG fallback before asking the user for a replacement DG.
Ask for shadow_test_id, traffic_group, shadow_model_definition_id, shadow_model_definition_version, or header_override_notes only when they remain unresolved.
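The "smallest sufficient subset" idea can be sketched as a filter over the starting field list. The function and constant names are hypothetical; the field names are the normalized ones from Required user inputs, and a field counts as resolved here simply when it has a non-empty value, whether it came from the user or from repo inference.

```python
from typing import Any, Dict, List, Tuple

# Starting fields in ask order; a tuple with two names means either one suffices.
START_WITH: List[Tuple[str, ...]] = [
    ("prod_engine_definition",),
    ("shadow_engine_definition",),
    ("region",),
    ("deployment_group_name", "model_pool_name"),
    ("allotment_id",),
    ("sku",),
    ("traffic_percentage",),
]


def unresolved_inputs(known: Dict[str, Any]) -> List[str]:
    """List the starting fields that are still missing, in ask order."""
    missing = []
    for alternatives in START_WITH:
        resolved = any(known.get(name) not in (None, "") for name in alternatives)
        if not resolved:
            missing.append(" or ".join(alternatives))
    return missing


# Usage with made-up values: only the three unresolved fields are asked for.
to_ask = unresolved_inputs(
    {
        "prod_engine_definition": "prod-ed.json",
        "shadow_engine_definition": "shadow-ed.json",
        "region": "eastus",
        "model_pool_name": "example-pool",
    }
)
```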