name: multi-resource-allocation-validation description: Validate and repair proposed resource allocations by replaying them against temporary capacity. Use when actions consume several resource dimensions such as CPU, memory, GPUs, or accelerators.
Multi-Resource Allocation Validation
Use this skill before returning a batch of resource allocation actions, and after building a feasible schedule to make small objective improvements.
Core Workflow
Replay every proposed action against a temporary resource state. A placement is valid only if each required resource remains non-negative after applying all earlier placements in the same batch. Do not validate each placement only against the original observation.
Example field names vary by task, but common reminders include cpu_free, memory_free, and gpu_slots[*].free_gpu_units.
Replay Skeleton
Use replay validation as the final gate before returning actions:
temporary_state = copy_resources(original_observation)
repaired_actions = []
for action in actions:
if action is not a placement:
repaired_actions.append(action)
continue
find the work item, target machine, and target slot/resource
check compatibility
check every required resource is available
if any check fails:
action = repair_or_replace_with_defer_or_reject(action, temporary_state)
if action is still a placement:
subtract consumed resources from temporary_state
repaired_actions.append(action)
The combined action list must be feasible after all earlier actions in the same batch have consumed resources. Each work item should appear in at most one action, and deferred or rejected work should not consume resources.
Repair Order
When a placement fails validation, repair it in this order:
- Try an alternate slot or resource on the same target.
- Try an alternate active machine or target that already has compatible allocations.
- Try an alternate inactive machine or empty target.
- Defer the work if waiting is allowed and still useful.
- Reject the work only when no valid placement or defer decision is appropriate.
In shorthand: alternate slot -> alternate active machine -> alternate inactive machine -> defer -> reject.
Feasible-Solution Improvement Pass
After the action list is feasible, optional improvements should also be evaluated by weighted marginal score. An improvement is useful only if the full action list remains feasible after replay and the weighted marginal score improves.
for pass_id in deterministic_range(1 or 2):
for started_job in stable_order(started_jobs):
temporary_state = replay_actions_without(started_job)
current_score = weighted_marginal_score(started_job.current_placement)
alternatives = enumerate_feasible_placements(started_job, temporary_state)
best = min(alternatives, key=weighted_marginal_score)
if weighted_marginal_score(best) + tolerance < current_score:
move started_job to best
replay and validate the full action list
Move an item only when the alternative lowers the same weighted score used during construction. Feasibility is still mandatory; a lower score does not justify an invalid action list.
If two pending actions both fit a resource slot in isolation, the first accepted action may consume enough capacity that the second no longer fits. For GPU-style APIs, a machine-level CPU or memory field may be independent from slot-level accelerator fields, so satisfy both shared and slot-level resources.