delta-aware-multi-agent-orchestration - SKILL.md Agent Skill

name: delta-aware-multi-agent-orchestration
description: "DAOEF framework for scaling multi-agent edge systems beyond 100 agents without synergistic collapse. Three co-designed mechanisms: differential neural caching (delta-aware activation reuse), criticality-based action space pruning (O(n log n) coordination), and learned hardware affinity matching (GPU/CPU/NPU task routing). Activation: multi-agent edge orchestration, synergistic collapse, MADDPG scaling, differential caching, action space pruning, hardware affinity, edge computing latency, vision task scheduling, camera network coordination."

Delta-Aware Multi-Agent Edge Orchestration (DAOEF)

A co-designed framework that prevents synergistic collapse in multi-agent edge deployments (>100 agents) through three mechanisms: delta-aware neural caching, criticality-based action space pruning, and hardware affinity matching.

Metadata

Source: arXiv:2604.20129v1 [cs.LG, cs.DC, cs.PF, cs.SE]
Authors: Samaresh Kumar Singh, Joyjit Roy
Published: 2026-04-22
Title: A Delta-Aware Orchestration Framework for Scalable Multi-Agent Edge Computing

Core Problem: Synergistic Collapse

Scaling multi-agent reinforcement learning (e.g., MADDPG) beyond ~100 agents on edge infrastructure causes superlinear performance degradation — not merely additive slowdown, but cascading failures where multiple bottlenecks compound.

Observed failure case (Smart City, 150 cameras):

Deadline Satisfaction Rate: 78% → 34% (a drop of 44 percentage points)
Annual cost overruns: ~$180K

Three Interacting Failure Factors

Factor	Mechanism	Impact
Exponential action-space growth	K^N joint actions for N agents choosing among K nodes	+85-120 ms decision delay per step
Computational redundancy	65% of features similar across adjacent cameras	35-42% cache hit rate with naive caching
Task-agnostic scheduling	Vision tasks routed to CPUs instead of GPUs	2-5x slowdown, compounding across the fleet
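The K^N term in the first row is the size of the joint action space when each of N agents independently picks one of K nodes. Working in log space shows how quickly it grows; the 10-node cluster and the fleet sizes below are illustrative assumptions, not figures from the paper.

```python
import math


def joint_action_space(n_agents: int, n_nodes: int) -> int:
    """Exact size of the joint action space: each agent picks one of n_nodes nodes."""
    return n_nodes ** n_agents


def log10_joint_action_space(n_agents: int, n_nodes: int) -> float:
    """log10(K^N) = N * log10(K), computed without building the huge integer."""
    return n_agents * math.log10(n_nodes)


if __name__ == "__main__":
    # Hypothetical 10-node edge cluster; fleet sizes bracket the ~100-agent threshold.
    for n in (10, 100, 150, 200):
        print(f"{n:>3} agents: ~10^{log10_joint_action_space(n, 10):.0f} joint actions")
```

Each additional agent multiplies the space by K, which is why exhaustive joint search is abandoned long before 150 agents.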

These factors are synergistic: each amplifies the others. Fixing one in isolation yields sublinear gains; all three must be co-designed.

DAOEF: Three Co-Designed Mechanisms

1. Differential Neural Caching

Problem: Output-level caching misses 65% of reuse opportunities: adjacent camera frames differ only slightly, but those small differences change the final outputs, so exact output matches are rare.

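Solution: Cache intermediate layer activations instead. Compute only the input delta (the difference between the current and cached input), then forward-propagate the delta through the remaining layers rather than recomputing from scratch. This works because affine layers are linear in their input: W(x + Δ) + b = (Wx + b) + WΔ, so the update costs work proportional to the nonzero entries of Δ. At a nonlinearity, the update is applied to the cached pre-activation and the nonlinearity is then re-evaluated.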

How it works:

Store intermediate activations at a calibrated layer boundary (e.g., after ResNet block 3, before final classification head)
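On new input, compute a similarity score against the cached input (e.g., SSIM on frames or cosine similarity on normalized features; if you use a distance such as L2, flip the comparison so that a small distance counts as a match)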
If similarity > threshold θ: compute delta = current_input - cached_input, forward only the delta through remaining layers
If similarity ≤ θ: full forward pass, update cache
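The lookup-and-update loop above can be sketched for a single affine + ReLU layer standing in for "the remaining layers" after the cache boundary. This is a minimal pure-Python illustration under stated assumptions (one cached reference per stream, cosine similarity, dense weights; `DeltaCache` is a hypothetical name), not the paper's implementation. On a hit it adds only the contributions W[:, j]·Δ_j of inputs that changed to the cached pre-activation and re-applies ReLU; on a miss it runs the full layer and replaces the reference.

```python
import math
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class DeltaCache:
    """Caches the input and pre-activation of one affine+ReLU layer (hypothetical helper)."""

    def __init__(self, weights: Matrix, bias: Vector, theta: float = 0.9, eps: float = 0.0):
        self.w, self.b = weights, bias
        self.theta, self.eps = theta, eps  # eps > 0 skips tiny deltas (approximate)
        self.cached_input: Optional[Vector] = None
        self.cached_pre: Optional[Vector] = None  # W @ cached_input + b
        self.hits = self.misses = 0

    def _full_pre(self, x: Vector) -> Vector:
        return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(self.w, self.b)]

    def forward(self, x: Vector) -> Vector:
        if self.cached_input is not None and cosine_similarity(x, self.cached_input) > self.theta:
            self.hits += 1
            delta = [xj - cj for xj, cj in zip(x, self.cached_input)]
            changed = [j for j, d in enumerate(delta) if abs(d) > self.eps]
            # Linearity: W(x0 + d) + b = (W x0 + b) + W d, so only changed inputs contribute.
            pre = list(self.cached_pre)
            for i, row in enumerate(self.w):
                pre[i] += sum(row[j] * delta[j] for j in changed)
        else:
            self.misses += 1
            pre = self._full_pre(x)
            self.cached_input, self.cached_pre = list(x), pre  # refresh the reference
        return [max(0.0, z) for z in pre]  # nonlinearity re-applied after the update
```

Keeping the reference fixed on hits, rather than rolling it forward, bounds drift: any skipped small deltas (`eps > 0`) are measured against the same reference, and once the stream drifts far enough, similarity falls below θ and a full pass refreshes the cache.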

Calibrating the similarity threshold θ:

Too high → few cache hits, wasted storage
Too low → accuracy degradation from accumulated approximation error
Empirical approach: sweep θ on a validation set, plot hit rate vs accuracy loss, and select the θ with the highest hit rate whose accuracy loss stays within a 2% tolerance
Typical effective range: θ ∈ [0.85, 0.95] for cosine similarity on feature maps
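The selection rule from the sweep can be written as a small constrained maximization. The sweep points below are hypothetical placeholders for your own validation measurements, and `select_theta` is an illustrative name.

```python
from typing import Iterable, Optional, Tuple

# (theta, cache_hit_rate, accuracy_loss), one point per threshold in the sweep
SweepPoint = Tuple[float, float, float]


def select_theta(sweep: Iterable[SweepPoint], max_acc_loss: float = 0.02) -> Optional[float]:
    """Return the theta with the highest hit rate whose accuracy loss is within tolerance."""
    feasible = [p for p in sweep if p[2] <= max_acc_loss]
    if not feasible:
        return None  # no threshold is safe; disable caching or move the split layer
    # Ties on hit rate go to the higher (stricter) theta.
    return max(feasible, key=lambda p: (p[1], p[0]))[0]


if __name__ == "__main__":
    # Hypothetical validation sweep, not results from the paper.
    sweep = [
        (0.80, 0.81, 0.045),
        (0.85, 0.76, 0.019),
        (0.90, 0.70, 0.011),
        (0.95, 0.52, 0.004),
    ]
    print(select_theta(sweep))
```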

Expected improvement: 2.1x higher cache hit ratios (72% vs 35% for output-level caching) with <2% accuracy loss.

2. Criticality-Based Action Space Pruning

Problem: Full multi-agent coordination requires O(n²) pairwise interactions. At 150+ agents, the decision space becomes intractable.

Solution: Organize agents into priority tiers based on task criticality, then restrict full coordination to high-criticality agents only. Lower tiers use simplified or greedy policies.

Three-tiered priority filtering:

Tier 1 (Critical)    → Full MARL coordination with all Tier 1 peers
                       O(k²) where k << n (typically 5-15% of agents)
Tier 2 (Important)   → Coordination with Tier 1 + local greedy optimization
                       O(m log m) local grouping, where m is the Tier 2 size
Tier 3 (Best-effort) → Pure greedy / rule-based, no inter-agent coordination
                       O(1) per agent

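Complexity reduction: O(n²) → O(n log n) overall coordination cost. Optimality loss: <6% vs full coordination on standard benchmarks. The O(n log n) bound assumes the Tier 1 size k grows no faster than √(n log n); if k is held at a fixed 5-15% of n, Tier 1 coordination is still quadratic in n, but with a constant roughly 44-400x smaller than full coordination, since (0.15)² ≈ 1/44 and (0.05)² = 1/400.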
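To put numbers on the reduction, the sketch below compares unit coordination costs using the terms from the tier diagram (k² for Tier 1, m log₂ m for Tier 2, one unit per Tier 3 agent). Constants and the Tier 2 → Tier 1 interaction are ignored, and the 15/50/85 split of a 150-agent fleet is a hypothetical choice within the tier-size guidance given later.

```python
import math


def full_coordination_cost(n: int) -> float:
    """Unit cost of full pairwise coordination: n^2."""
    return float(n * n)


def tiered_coordination_cost(k: int, m: int, r: int) -> float:
    """Unit cost from the tier diagram: k^2 (Tier 1) + m log2 m (Tier 2) + r (Tier 3)."""
    tier2 = m * math.log2(m) if m > 1 else 0.0
    return float(k * k) + tier2 + float(r)


if __name__ == "__main__":
    k, m, r = 15, 50, 85  # hypothetical split of a 150-agent fleet
    full = full_coordination_cost(k + m + r)
    tiered = tiered_coordination_cost(k, m, r)
    print(f"full: {full:.0f}, tiered: {tiered:.0f}, ratio: {full / tiered:.1f}x")
```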

Assigning criticality tiers:

Tier 1: Safety-critical tasks, SLA-bounded deadlines, high-value cameras (e.g., traffic intersections, emergency corridors)
Tier 2: Quality-of-service tasks, medium-priority monitoring (e.g., pedestrian zones, parking lots)
Tier 3: Best-effort analytics, deferred processing acceptable (e.g., historical traffic analysis, periodic environment checks)

Implementation pattern:

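from enum import Enum


class Tier(Enum):
    CRITICAL = 1     # full MARL coordination
    IMPORTANT = 2    # coordinates with Tier 1 + local greedy optimization
    BEST_EFFORT = 3  # pure greedy / rule-based


# Criticality assignment (static or dynamic).
# sla_deadline is assumed to be in milliseconds; workload_metrics is unused by
# this static rule but is where a dynamic policy would read current load.
def assign_tier(agent, workload_metrics=None):
    if agent.task.is_safety_critical or agent.sla_deadline < 500:
        return Tier.CRITICAL
    elif agent.task.qos_weight > 0.7:
        return Tier.IMPORTANT
    else:
        return Tier.BEST_EFFORT


# Tier-aware coordination. The four policy callables are supplied by the caller
# (your MARL policy, greedy schedulers, and action merger).
def coordinate(agents, marl_policy, greedy_with_critical, greedy_local, merge_actions):
    critical = [a for a in agents if a.tier == Tier.CRITICAL]
    important = [a for a in agents if a.tier == Tier.IMPORTANT]
    best_effort = [a for a in agents if a.tier == Tier.BEST_EFFORT]

    # Full MARL for critical agents only: O(k^2) instead of O(n^2)
    critical_actions = marl_policy(critical)

    # Important agents condition on critical actions, then optimize locally
    important_actions = greedy_with_critical(important, critical_actions)

    # Best-effort agents: pure greedy, no inter-agent coordination
    best_effort_actions = greedy_local(best_effort)

    return merge_actions(critical_actions, important_actions, best_effort_actions)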

Dynamic tier reassignment: Re-evaluate tiers periodically (e.g., every 500 steps) based on changing workload conditions. Use hysteresis to prevent thrashing: require sustained metric change over multiple windows before moving an agent between tiers.
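Hysteresis can be as simple as requiring the same proposed tier on several consecutive re-evaluations before committing the move. In this sketch (the class name, integer tier codes, and the three-window default are assumptions), each `update` call corresponds to one periodic re-evaluation, such as every 500 steps, and `proposed_tier` comes from a rule like `assign_tier`.

```python
from collections import deque
from typing import Deque, Dict, Hashable


class TierHysteresis:
    """Commits a tier change only after `windows` consecutive identical proposals."""

    def __init__(self, windows: int = 3):
        self.windows = windows
        self.current: Dict[Hashable, int] = {}
        self.pending: Dict[Hashable, Deque[int]] = {}

    def update(self, agent_id: Hashable, proposed_tier: int) -> int:
        """Record one re-evaluation and return the tier the agent should use now."""
        if agent_id not in self.current:
            self.current[agent_id] = proposed_tier  # first observation: accept as-is
            self.pending[agent_id] = deque(maxlen=self.windows)
            return proposed_tier
        window = self.pending[agent_id]
        window.append(proposed_tier)
        sustained = len(window) == self.windows and len(set(window)) == 1
        if sustained and window[0] != self.current[agent_id]:
            self.current[agent_id] = window[0]
            window.clear()  # restart the count after a committed move
        return self.current[agent_id]
```

An agent whose metrics oscillate around a boundary keeps its current tier, because its recent proposals never agree for a full window.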

3. Learned Hardware Affinity Matching

Problem: Task-agnostic scheduling sends compute-intensive vision tasks to CPUs, causing 2-5x slowdowns. Simple heuristics (e.g., "always GPU for vision") fail for mixed workloads with heterogeneous accelerators.

Solution: Learn a hardware affinity model that maps task features to optimal accelerator type (GPU, CPU, NPU, FPGA) based on historical performance data.

Feature space for affinity model:

Task type (classification, detection, segmentation, tracking)
Input resolution and frame rate
Model architecture (ResNet-50, YOLOv8, ViT, etc.)
Batch size requirements
Deadline slack (time remaining until SLA breach)
Current accelerator utilization levels
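One way to turn the feature list above into a model input is sketched below: categorical fields are one-hot encoded and numeric fields are passed through. The category vocabularies, field names, and megapixel scaling are assumptions for illustration; add one utilization field per accelerator type you actually run (e.g., FPGA).

```python
from dataclasses import dataclass
from typing import List

TASK_TYPES = ("classification", "detection", "segmentation", "tracking")
MODEL_ARCHS = ("ResNet-50", "YOLOv8", "ViT")


@dataclass
class TaskFeatures:
    task_type: str
    width: int
    height: int
    fps: float
    model_arch: str
    batch_size: int
    deadline_slack_ms: float  # time remaining until SLA breach
    gpu_util: float           # current utilization in [0, 1]
    cpu_util: float
    npu_util: float

    def to_vector(self) -> List[float]:
        """Flatten into a numeric vector: one-hot categoricals plus raw numerics."""
        one_hot_task = [1.0 if self.task_type == t else 0.0 for t in TASK_TYPES]
        one_hot_arch = [1.0 if self.model_arch == m else 0.0 for m in MODEL_ARCHS]
        megapixels = self.width * self.height / 1e6
        return one_hot_task + one_hot_arch + [
            megapixels,
            self.fps,
            float(self.batch_size),
            self.deadline_slack_ms,
            self.gpu_util,
            self.cpu_util,
            self.npu_util,
        ]
```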

Training approach:

Collect execution traces: (task_features, hardware) → (latency, energy)
Train a lightweight classifier (e.g., gradient-boosted trees or small MLP) to predict optimal accelerator
Deploy as a pre-scheduling filter before the MARL decision layer
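The trace-to-decision loop can be prototyped before committing to gradient-boosted trees or an MLP. The sketch below is a deliberately simpler stand-in: a lookup table that picks the accelerator with the lowest mean observed latency for each (task type, model) signature. The class name, trace tuple layout, and GPU default are hypothetical. A real deployment would use the full feature set, weigh energy alongside latency, and generalize to unseen signatures, which a table cannot do.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# One execution trace: (task_type, model_arch, hardware, latency_ms)
Trace = Tuple[str, str, str, float]


class LatencyTableAffinity:
    """Chooses the accelerator with the lowest mean latency seen for each task signature."""

    def __init__(self, default_hw: str = "GPU"):
        self.default_hw = default_hw
        self.table: Dict[Tuple[str, str], Dict[str, float]] = {}

    def fit(self, traces: Iterable[Trace]) -> "LatencyTableAffinity":
        samples: Dict[Tuple[str, str], Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
        for task_type, arch, hw, latency in traces:
            samples[(task_type, arch)][hw].append(latency)
        self.table = {
            key: {hw: mean(lats) for hw, lats in per_hw.items()}
            for key, per_hw in samples.items()
        }
        return self

    def predict(self, task_type: str, arch: str) -> str:
        per_hw = self.table.get((task_type, arch))
        if not per_hw:
            return self.default_hw  # unseen signature: fall back to a fixed default
        return min(per_hw, key=per_hw.get)
```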

Expected improvement: Prevents compounding mismatch penalties. Combined with the other two mechanisms, contributes to the 1.45x multiplicative gain.

Combined Results

When all three mechanisms are deployed together (not independently):

Metric	Without DAOEF	With DAOEF	Improvement
Latency (200 agents)	735ms	280ms	62% reduction
Latency growth	Superlinear	Sublinear (up to 250 agents)	Improved scalability
Energy consumption	117.5 MWh/yr	44.7 MWh/yr	62% savings
Multiplicative gain	1.0x (baseline)	1.45x over independent	Synergy confirmed
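As a consistency check, both headline reductions follow directly from the before/after values in the table:

```latex
\frac{735 - 280}{735} = \frac{455}{735} \approx 0.619 \;(\approx 62\%),
\qquad
\frac{117.5 - 44.7}{117.5} = \frac{72.8}{117.5} \approx 0.620 \;(\approx 62\%)
```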

The 1.45x multiplicative gain is key evidence that the three mechanisms are genuinely co-designed: their combined effect exceeds the sum of individual gains.

Implementation Guide

Prerequisites

Multi-agent RL framework (e.g., PyMARL, PettingZoo, custom MADDPG)
Edge cluster with heterogeneous accelerators (GPU, CPU, NPU, FPGA)
Model serving infrastructure supporting intermediate activation access (e.g., TorchServe, Triton with custom hooks)
Telemetry pipeline for collecting execution traces

Step-by-Step Deployment

Phase 1: Instrumentation (1-2 weeks)

Add hooks to model serving pipeline to expose intermediate activations
Deploy telemetry collection for latency, energy, and cache hit metrics
Baseline: measure current performance without any DAOEF mechanism

Phase 2: Differential Caching (2-3 weeks)

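Identify the caching layer boundary (try 2-3 candidate split points)
Implement delta computation: store (input_hash, input_signature, activation, timestamp) per entry; the hash supports exact-match lookup, the signature (e.g., a downsampled frame or normalized feature vector) supports similarity matching, and the timestamp drives TTL eviction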
Calibrate similarity threshold θ on validation workload
Deploy caching, measure hit rate improvement and accuracy impact
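The stored tuple and the TTL eviction recommended under Pitfalls can be sketched as a small per-camera store. Because a hash only matches identical inputs, this sketch carries a `signature` field for similarity checks; that field, the class names, and the 45 s default (inside the 30-60 s range suggested under Pitfalls) are assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Optional


@dataclass
class CacheEntry:
    input_hash: str          # exact-match key for identical frames
    signature: List[float]   # compact input representation for similarity checks
    activation: List[float]  # cached activation at the split layer
    timestamp: float = 0.0   # set by the cache on insert


class TTLActivationCache:
    """Keeps one reference entry per camera and evicts it after ttl_s seconds."""

    def __init__(self, ttl_s: float = 45.0, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock  # injectable for testing
        self.entries: Dict[Hashable, CacheEntry] = {}

    def put(self, camera_id: Hashable, entry: CacheEntry) -> None:
        entry.timestamp = self.clock()
        self.entries[camera_id] = entry

    def get(self, camera_id: Hashable) -> Optional[CacheEntry]:
        entry = self.entries.get(camera_id)
        if entry is not None and self.clock() - entry.timestamp > self.ttl_s:
            del self.entries[camera_id]  # stale: caller must do a full forward pass
            return None
        return entry
```

A monotonic clock is used so that wall-clock adjustments on edge devices cannot make entries appear fresher or older than they are.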

Phase 3: Action Space Pruning (2-3 weeks)

Classify agents into criticality tiers (start with static assignment)
Implement tier-aware coordination in the MARL policy
A/B test: compare full coordination vs tiered coordination
Add dynamic tier reassignment with hysteresis

Phase 4: Hardware Affinity (2-4 weeks)

Collect execution traces across all accelerator types
Train affinity classifier on historical data
Deploy as pre-scheduler before the MARL decision layer
Monitor and retrain affinity model as workload patterns shift

Key Design Decisions

Caching layer boundary selection:

Earlier layer → more reuse opportunity, but larger activation size
Later layer → smaller activation, but less reuse (more task-specific)
Recommendation: split after the feature extraction backbone, before task heads
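The size side of this trade-off is easy to quantify. The sketch below uses the standard torchvision ResNet-50 stage output shapes for a 224×224 input (C × H × W) and fp32 storage; the input resolution and the 150-camera fleet are illustrative, and real camera frames are usually larger, which scales every figure up proportionally.

```python
# Output shapes (C, H, W) of torchvision ResNet-50 stages for a 224x224 input.
RESNET50_STAGES = {
    "layer2": (512, 28, 28),
    "layer3": (1024, 14, 14),
    "layer4": (2048, 7, 7),
}


def activation_bytes(shape, bytes_per_elem: int = 4) -> int:
    """Bytes for one cached activation (fp32 by default)."""
    c, h, w = shape
    return c * h * w * bytes_per_elem


def fleet_cache_mib(shape, n_cameras: int, bytes_per_elem: int = 4) -> float:
    """Memory for one cached activation per camera, in MiB."""
    return activation_bytes(shape, bytes_per_elem) * n_cameras / 2**20


if __name__ == "__main__":
    for name, shape in RESNET50_STAGES.items():
        print(f"{name}: {activation_bytes(shape) / 2**10:.0f} KiB per frame, "
              f"{fleet_cache_mib(shape, 150):.1f} MiB for 150 cameras")
```

Each deeper stage halves the fp32 footprint here, because channels double while spatial resolution quarters.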

Tier size ratios (starting point for 150-agent deployment):

Tier 1: 10-20 agents (7-13%) — critical coordination group
Tier 2: 40-60 agents (27-40%) — semi-coordinated group
Tier 3: remaining agents — independent greedy

Affinity model retraining cadence:

Retrain weekly or when workload drift is detected (e.g., KL divergence between recent and training workload distributions exceeds a threshold)
Maintain a shadow model for safe rollout testing
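A minimal drift check, assuming drift is measured on the categorical task-type mix (one of the affinity features) with Laplace smoothing so the divergence stays finite. The function names and the 0.1-nat threshold are hypothetical. The direction KL(recent ‖ training) grows large when recent traffic contains task types the model rarely saw in training.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def smoothed_distribution(samples: Sequence[str], support: List[str], alpha: float = 1.0) -> Dict[str, float]:
    """Laplace-smoothed categorical distribution over a fixed support."""
    counts = Counter(samples)
    total = len(samples) + alpha * len(support)
    return {s: (counts[s] + alpha) / total for s in support}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) in nats; smoothing keeps every q[s] > 0."""
    return sum(p[s] * math.log(p[s] / q[s]) for s in p)


def should_retrain(train_tasks: Sequence[str], recent_tasks: Sequence[str], threshold: float = 0.1) -> bool:
    """Flag drift when the recent task mix diverges from the training mix."""
    support = sorted(set(train_tasks) | set(recent_tasks))
    p = smoothed_distribution(recent_tasks, support)
    q = smoothed_distribution(train_tasks, support)
    return kl_divergence(p, q) > threshold
```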

Pitfalls

Cascade risk from tier misclassification: If too many agents are assigned to Tier 3, quality degradation may be unacceptable. Start conservative (larger Tier 1 and 2) and shrink tiers as you validate performance.
Cache staleness in high-churn workloads: If input distribution shifts rapidly (e.g., sudden weather change affecting camera feeds), cached activations become invalid. Implement TTL-based eviction (e.g., 30-60s).
Affinity model overfitting: If trained on narrow workload patterns, the model may make poor predictions for edge cases. Maintain a fallback heuristic (e.g., "vision tasks → GPU, falling back to NPU when GPU utilization > 90%").
Multiplicative gain is not guaranteed: The 1.45x result depends on all three mechanisms being correctly implemented and tuned. Deploying only one or two mechanisms yields sublinear improvements.
Threshold calibration drift: The similarity threshold θ for caching may need recalibration as models are updated or input distributions shift. Monitor cache hit rate and accuracy loss continuously.
Coordination overhead at tier boundaries: Agents at the boundary between Tier 1 and Tier 2 may experience inconsistent policies. Use smooth transitions (weighted combination) rather than hard cutoffs.
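The weighted combination suggested in the last pitfall can be sketched with a logistic ramp, assuming each agent has a scalar criticality score in [0, 1] and there is a cutoff between Tier 1 and Tier 2 (both hypothetical). Well above the cutoff the Tier 1 policy dominates, well below it the Tier 2 policy does, and in between the two actions are mixed. The logistic shape, the `width` parameter, and the assumption of continuous actions (as MADDPG produces) are illustrative choices; for discrete actions, blend action probabilities instead.

```python
import math
from typing import List


def tier1_weight(score: float, cutoff: float, width: float = 0.05) -> float:
    """Logistic weight on the Tier 1 policy: 0.5 at the cutoff, near 1 above, near 0 below."""
    return 1.0 / (1.0 + math.exp(-(score - cutoff) / width))


def blend_actions(tier1_action: List[float], tier2_action: List[float],
                  score: float, cutoff: float, width: float = 0.05) -> List[float]:
    """Convex combination of the two policies' continuous actions."""
    w = tier1_weight(score, cutoff, width)
    return [w * a1 + (1.0 - w) * a2 for a1, a2 in zip(tier1_action, tier2_action)]
```

A smaller `width` approaches the hard cutoff; a larger one spreads the transition over more agents at the cost of diluting Tier 1 coordination.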

Related Skills

knowledge-graph-ops
arxiv-to-skill-research-workflow