probabilistic-compositional-inference

name: probabilistic-compositional-inference
description: Probabilistic Compositional Inference methodology for coupled engineered systems - graph-based architecture for uncertainty-aware inverse inference
version: 1.0
created: 2026-05-29
source: arXiv:2605.27544
authors: Esmaeil Ghorbani, Jürgen Hackl (Princeton University)
tags:
  - systems-engineering
  - digital-twins
  - uncertainty-quantification
  - distributed-inference
  - message-passing
  - coupled-systems
  - infrastructure
activation_keywords:
  - coupled systems
  - subsystem inference
  - distributed estimation
  - uncertainty propagation
  - digital twins
  - message passing
  - compositional inference
  - interface messages
related_skills:
  - digital-twin-multi-agent-consensus
  - distributed-control-prototyping-framework
  - equation-free-digital-twins

Probabilistic Compositional Inference for Coupled Engineered Systems

Core Contribution

Probabilistic Compositional Inference (PCI) is a graph-based architecture for solving inverse problems in coupled engineered infrastructure systems. It enables scalable, uncertainty-aware inference across heterogeneous subsystems by representing systems as directed graphs of interacting components, each maintaining local models and estimators while coupling is handled through probabilistic interface messages.

Key Innovation: Exploits subsystem structure as an inferential resource rather than treating it as a computational obstacle, transforming coupled inverse problems from monolithic global estimation to distributed message passing.

Methodology

1. Graph-Based System Representation

Directed Graph Structure:

Nodes: Subsystems with local models $\mathcal{M}_i$ and estimators maintaining posteriors over states and parameters
Edges: Interface laws $h_{ji}$ transforming outgoing interface variables to incoming coupling effects
Interface variables: physically meaningful quantities (forces, flows, currents, torques)
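
The representation above can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the class names `Subsystem`, `Interface`, and `SystemGraph`, the field names, and the numeric values of `k` and `c` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

import numpy as np


@dataclass
class Subsystem:
    """A graph node holding a local posterior over states (hypothetical layout)."""
    name: str
    mean: np.ndarray          # posterior mean
    cov: np.ndarray           # posterior covariance
    interface_idx: List[int]  # entries of `mean` that are exposed at interfaces


@dataclass
class Interface:
    """A directed edge j -> i carrying an interface law h_ji."""
    source: str
    target: str
    law: Callable[[np.ndarray], float]  # sender interface variables -> coupling effect


@dataclass
class SystemGraph:
    nodes: Dict[str, Subsystem] = field(default_factory=dict)
    edges: List[Interface] = field(default_factory=list)

    def incoming(self, name: str) -> List[Interface]:
        """All edges whose coupling effect acts on subsystem `name`."""
        return [e for e in self.edges if e.target == name]


k, c = 5.0e4, 3.0e2  # illustrative stiffness and damping, not taken from the source

graph = SystemGraph()
graph.nodes["s1"] = Subsystem("s1", np.zeros(4), np.eye(4), interface_idx=[2, 3])
graph.nodes["s2"] = Subsystem("s2", np.zeros(4), np.eye(4), interface_idx=[0, 1])
graph.edges.append(Interface("s1", "s2", lambda z: k * z[0] + c * z[1]))
graph.edges.append(Interface("s2", "s1", lambda z: -(k * z[0] + c * z[1])))

print([e.source for e in graph.incoming("s2")])
```

Keeping the interface law on the edge rather than inside either node is what lets a mechanistic law be swapped for a learned one without touching the subsystem estimators.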

Local Model Diversity:

Mechanistic models (physics-based equations)
Data-driven surrogates (learned from data)
Hybrid models (combining mechanistic + learned)
Different estimator classes per subsystem (UKF, EKF, particle filters)

2. Probabilistic Interface Messages

Three Interface Variants:

Deterministic: Mean-only messages (uncertainty-blind)
Probabilistic: Mean + covariance (uncertainty-aware)
Learned: Data-driven interface laws when physics incomplete

Uncertainty Propagation Mechanism:

# Interface force variance from posterior covariance
var(F_b) = a^T (P_s1 + P_s2) a

# Where:
# - a: interface coefficients (stiffness, damping)
# - P_s1, P_s2: interface-state covariance submatrices
# - F_b: coupling force exchanged across interface
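
The variance expression above can be derived in a few lines for a linear spring-damper interface. The symbols $z_i$ (interface displacement and velocity of subsystem $i$) and $P_{12}$ (cross-covariance between the two interface states) are notation introduced here for the derivation; the final step assumes the cross-covariance is neglected, which is the independence approximation of the Jacobi schedule.

```latex
\begin{aligned}
F_b &= a^\top (z_1 - z_2), \qquad
a = \begin{bmatrix} k \\ c \end{bmatrix}, \quad
z_i = \begin{bmatrix} x_i \\ v_i \end{bmatrix}, \\
\operatorname{Var}(F_b) &= a^\top \operatorname{Cov}(z_1 - z_2)\, a
  = a^\top \left( P_{s1} + P_{s2} - P_{12} - P_{12}^\top \right) a, \\
\operatorname{Var}(F_b) &\approx a^\top \left( P_{s1} + P_{s2} \right) a
  \quad \text{when } P_{12} = \operatorname{Cov}(z_1, z_2) \text{ is neglected.}
\end{aligned}
```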

Key Insight: Deterministic messages recover point estimates but yield miscalibrated posteriors; probabilistic messages restore calibration by propagating uncertainty through interface laws.

3. Message Passing Architecture

Jacobi Schedule:

Each subsystem updates independently at each time step
Exchanges posterior means AND covariances of interface variables
Reconstructs coupling force with propagated uncertainty
No global augmented state or system-wide covariance matrix
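
The schedule above can be illustrated with a toy problem. The sketch below is not from the source: it assumes two scalar subsystems with linear coupling, each running its own scalar Kalman filter, and the model coefficients (`a`, `b`, `q`, `r`) are made up for the example. It contrasts mean-only messages with mean-plus-variance messages.

```python
import numpy as np


def run_jacobi(y, a=0.7, b=0.2, q=0.01, r=0.04, probabilistic=True):
    """Two scalar subsystems, each with its own Kalman filter.

    Toy model (an assumption for illustration):
        x_i[t+1] = a * x_i[t] + b * x_j[t] + w_i,   y_i[t] = x_i[t] + v_i
    The coupling term b * x_j is the interface message from neighbour j.
    """
    m = np.zeros(2)  # posterior means, one per subsystem
    P = np.ones(2)   # posterior variances, one per subsystem
    for y_t in y:
        # Jacobi: every message is built from the previous posteriors,
        # so the order in which subsystems update does not matter.
        msg_mean = b * m[::-1]
        msg_var = b**2 * P[::-1] if probabilistic else np.zeros(2)

        # Local predict: the message enters as an input with its own variance.
        m_pred = a * m + msg_mean
        P_pred = a**2 * P + q + msg_var

        # Local measurement update (scalar Kalman filter).
        K = P_pred / (P_pred + r)
        m = m_pred + K * (y_t - m_pred)
        P = (1.0 - K) * P_pred
    return m, P


# Simulate the coupled pair to obtain measurements.
rng = np.random.default_rng(0)
T = 200
x = np.zeros(2)
y = np.empty((T, 2))
for t in range(T):
    x = 0.7 * x + 0.2 * x[::-1] + rng.normal(0.0, 0.1, size=2)
    y[t] = x + rng.normal(0.0, 0.2, size=2)

_, P_det = run_jacobi(y, probabilistic=False)
_, P_prob = run_jacobi(y, probabilistic=True)
print("deterministic posterior variance:", P_det)
print("probabilistic posterior variance:", P_prob)
```

The deterministic variant reports a smaller posterior variance because it treats the neighbour's state as exactly known. No joint covariance is formed at any point; each filter only sees its own variance plus the variance carried by the incoming message.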

Computational Scaling:

Centralized UKF: $O(n^3)$ where $n$ = total system size
PCI: $O(N \cdot S_{max})$ where $N$ = subsystems, $S_{max}$ = max subsystem size
Empirical scaling: approximately linear for infrastructure networks (9-300 buses)

4. Hierarchical Composition

Multi-Level Graph Embedding:

Subsystem graphs can embed within larger system graph nodes
Same Jacobi message-passing operates at all levels
Intra-subsystem and inter-subsystem interfaces handled uniformly
Cross-scale uncertainty propagation preserved

Example Architecture:

Level 1: Power grid network (generator buses as subsystems)
  └── Level 2: Each generator = turbine system (5 subsystems)
       ├── Hydraulic NARX surrogate
       ├── PID governor (deterministic)
       ├── EKF rotational dynamics
       ├── Kalman filter generator vibration
       └── UKF runner dynamics
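
One way to picture the embedding is a recursive node type in which a node is either a leaf estimator or a container for a nested graph. The sketch below is illustrative only: the `Node` class and the second generator entry are assumptions, while the five turbine subsystems follow the example architecture above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A graph node that is either a leaf estimator or a nested subgraph."""
    name: str
    estimator: str = ""  # e.g. "UKF"; left empty for composite nodes
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List["Node"]:
        """Return all leaf subsystems below this node, depth first."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


turbine = Node("generator_1", children=[
    Node("hydraulic", "NARX surrogate"),
    Node("governor", "PID (deterministic)"),
    Node("rotation", "EKF"),
    Node("vibration", "Kalman filter"),
    Node("runner", "UKF"),
])
# generator_2 is a hypothetical flat node, added to show mixed levels.
grid = Node("power_grid", children=[turbine, Node("generator_2", "UKF")])

for leaf in grid.leaves():
    print(leaf.name, "->", leaf.estimator)
```

Because composite and leaf nodes share one type, the same message-passing routine can in principle be applied at either level by iterating over `children`.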

Implementation Steps

Step 1: System Decomposition

Identify subsystem boundaries based on:
- Physical interfaces (forces, flows, currents)
- Computational boundaries (bounded state dimension)
- Observational boundaries (sparse sensing locations)
Partition strategy:
- Generator-seeded partitioning (for power grids)
- Admittance-weighted greedy expansion
- Max subsystem size constraint: $S_{max} \approx 5$ to $15$ buses
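
The partition strategy above can be sketched as a greedy region-growing loop. This is one plausible reading, not the authors' algorithm: the function name `greedy_partition`, the rule of always taking the strongest available line, and the example network are assumptions.

```python
from typing import Dict, List, Tuple


def greedy_partition(
    admittance: Dict[Tuple[int, int], float],
    seeds: List[int],
    s_max: int,
) -> Dict[int, int]:
    """Assign each bus to a seed's subsystem by admittance-weighted greedy growth.

    `admittance` maps an undirected line (i, j) to a coupling weight.
    Returns {bus: seed}. Buses that cannot be reached under the size cap
    are left unassigned.
    """
    # Build a symmetric adjacency list.
    nbrs: Dict[int, Dict[int, float]] = {}
    for (i, j), w in admittance.items():
        nbrs.setdefault(i, {})[j] = w
        nbrs.setdefault(j, {})[i] = w

    owner = {s: s for s in seeds}
    size = {s: 1 for s in seeds}
    while True:
        # Find the strongest line joining an assigned bus to a free bus
        # whose subsystem still has room.
        best = None
        for u, s in owner.items():
            if size[s] >= s_max:
                continue
            for v, w in nbrs.get(u, {}).items():
                if v not in owner and (best is None or w > best[0]):
                    best = (w, v, s)
        if best is None:
            return owner
        _, v, s = best
        owner[v] = s
        size[s] += 1


# Hypothetical 5-bus network with generators at buses 1 and 4.
lines = {(1, 2): 10.0, (2, 3): 1.0, (3, 4): 10.0, (2, 5): 5.0}
partition = greedy_partition(lines, seeds=[1, 4], s_max=3)
print(partition)
```

In this example the weak line between buses 2 and 3 ends up as the interface between the two subsystems, which is the intent of weighting the expansion by admittance.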

Step 2: Local Model Specification

For each subsystem $i$:

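class SubsystemModel:
    """Local model plus estimator for one subsystem (interface sketch)."""

    def __init__(self, physics_type, data_availability):
        self.model_type = physics_type  # 'mechanistic', 'data-driven', or 'hybrid'
        # select_estimator and identify_interface_variables are placeholders
        # to be supplied for the system at hand.
        self.estimator_class = select_estimator(physics_type, data_availability)
        self.interface_vars = identify_interface_variables()

    def local_update(self, measurements, interface_messages):
        # 1. Predict using the local model
        # 2. Incorporate interface messages as inputs
        # 3. Update the posterior with local measurements
        # 4. Extract the interface posterior (mean + covariance)
        # Returns (posterior_states, posterior_params, interface_posterior)
        raise NotImplementedError("implemented by each concrete subsystem")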

Step 3: Interface Law Specification

Known Physics (analytical interface law):

import numpy as np

# Linear spring-damper interface
def interface_force(s1_state, s2_state, k, c):
    # Relative displacement and velocity across the interface
    delta_x = s1_state.x_interface - s2_state.x_interface
    delta_v = s1_state.v_interface - s2_state.v_interface

    # Coupling force
    F_b = k * delta_x + c * delta_v
    return F_b

def interface_uncertainty(P_s1, P_s2, k, c):
    # P_s1, P_s2: 2x2 covariances of (displacement, velocity) at the interface
    a = np.array([k, c])  # Interface coefficients
    var_F = a @ (P_s1 + P_s2) @ a
    return var_F
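
The closed-form variance can be checked numerically by sampling both interface states and pushing them through the spring-damper law. The sketch below is a sanity check under stated assumptions: the covariance matrices and the values of `k` and `c` are made up, and the two interface states are sampled independently, which is the same approximation the formula relies on.

```python
import numpy as np

rng = np.random.default_rng(42)
k, c = 5.0e4, 3.0e2  # illustrative stiffness [N/m] and damping [Ns/m]
a = np.array([k, c])

# Illustrative 2x2 posterior covariances over (displacement, velocity).
P_s1 = np.array([[4.0e-8, 1.0e-7], [1.0e-7, 9.0e-6]])
P_s2 = np.array([[1.0e-8, 0.0], [0.0, 4.0e-6]])

# Closed form: a^T (P_s1 + P_s2) a
var_closed_form = a @ (P_s1 + P_s2) @ a

# Monte Carlo: draw zero-mean samples with the given covariances via Cholesky factors.
n = 200_000
z1 = rng.standard_normal((n, 2)) @ np.linalg.cholesky(P_s1).T
z2 = rng.standard_normal((n, 2)) @ np.linalg.cholesky(P_s2).T
F_b = (z1 - z2) @ a
var_monte_carlo = F_b.var()

print("closed form:", var_closed_form)
print("Monte Carlo:", var_monte_carlo)
```

The two values should agree to within sampling error. If the interface states were correlated, the Monte Carlo value would depart from the closed form by the cross-covariance terms.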

Incomplete Physics (learned interface):

# SINDy-based interface identification
# (SINDy below stands for any sparse-regression routine; it is a placeholder,
# not a call into a specific library)
def learn_interface(interface_data):
    library = ['linear', 'cubic', 'dissipative']
    sparse_model = SINDy(interface_data, library)
    # Active terms give k_hat, c_hat, and any higher-order corrections
    dominant_coeffs = sparse_model.active_coefficients()
    return dominant_coeffs
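
For a concrete picture of the sparse regression step, the sketch below implements sequentially thresholded least squares, the regression used in the original SINDy formulation, directly in NumPy. It is not the authors' pipeline: the synthetic data, the noise level, the threshold, and the column normalization (added here so one threshold works across terms of very different magnitude) are all assumptions.

```python
import numpy as np


def stlsq(theta, target, threshold, n_iter=10):
    """Sequentially thresholded least squares on a candidate library `theta`."""
    coeffs = np.linalg.lstsq(theta, target, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coeffs) < threshold
        coeffs[small] = 0.0
        active = ~small
        if not active.any():
            break
        # Refit using only the terms that survived thresholding.
        coeffs[active] = np.linalg.lstsq(theta[:, active], target, rcond=None)[0]
    return coeffs


rng = np.random.default_rng(1)
k_true, c_true = 5.0e4, 3.0e2            # illustrative values, not from the source
dx = rng.normal(0.0, 1.0e-3, size=500)   # relative displacement samples [m]
dv = rng.normal(0.0, 1.0e-1, size=500)   # relative velocity samples [m/s]
force = k_true * dx + c_true * dv + rng.normal(0.0, 0.5, size=500)

# Candidate library: linear, cubic, and dissipative terms.
theta = np.column_stack([dx, dx**3, dv])

# Normalize columns, threshold in normalized units, then undo the scaling.
scale = np.linalg.norm(theta, axis=0)
coeffs = stlsq(theta / scale, force, threshold=10.0) / scale
k_hat, cubic_hat, c_hat = coeffs

print("k_hat:", k_hat, "cubic:", cubic_hat, "c_hat:", c_hat)
```

With this synthetic data the cubic term is pruned and the linear stiffness and damping coefficients are recovered closely; real interface data would need a threshold tuned to its noise level.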

Step 4: Message Passing Loop

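def probabilistic_compositional_inference(system_graph, measurements):
    # measurements[t][subsystem] holds the local measurements at time step t
    # Initialize subsystem posteriors
    for subsystem in system_graph.nodes:
        subsystem.initialize_prior()

    all_posteriors = []

    # Jacobi iteration over time
    for t in range(len(measurements)):
        # Phase 1: Predict step
        for subsystem in system_graph.nodes:
            subsystem.predict()

        # Phase 2: Interface message exchange.
        # Every message is built before any update runs (Jacobi schedule).
        messages = {s: {'mean': 0.0, 'variance': 0.0} for s in system_graph.nodes}
        for edge in system_graph.edges:
            sender = edge.source
            receiver = edge.target

            # Extract interface posterior from sender
            interface_mean = sender.interface_state_mean()
            interface_cov = sender.interface_state_covariance()

            # Apply interface law; propagate_uncertainty is the variance rule
            # for this edge (a^T P a for a linear law)
            coupling_effect = edge.interface_law(interface_mean)
            coupling_uncertainty = propagate_uncertainty(interface_cov, edge.coefficients)

            # Sum contributions so several incoming edges do not overwrite
            # each other (assumes independent senders)
            messages[receiver]['mean'] += coupling_effect
            messages[receiver]['variance'] += coupling_uncertainty

        # Phase 3: Update step
        for subsystem in system_graph.nodes:
            incoming_messages = messages[subsystem]
            subsystem.update(
                measurements[t][subsystem],
                interface_input=incoming_messages['mean'],
                interface_noise_var=incoming_messages['variance']
            )

        # posterior() is an assumed accessor returning the current local posterior
        all_posteriors.append({s: s.posterior() for s in system_graph.nodes})

    return all_posteriors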

Step 5: Uncertainty Calibration Verification

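def check_calibration(posteriors, ground_truth):
    # Empirical coverage vs nominal credible level
    nominal_levels = [0.68, 0.95, 0.99]
    # compute_coverage is assumed to return one empirical coverage per nominal level
    empirical_coverage = compute_coverage(posteriors, ground_truth, nominal_levels)

    # Well-calibrated: empirical ≈ nominal
    # Undercovered: empirical < nominal (typical of deterministic messages)
    # Overconservative: empirical > nominal
    calibration_metrics = {
        level: empirical - level
        for level, empirical in zip(nominal_levels, empirical_coverage)
    }
    return calibration_metrics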
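
A coverage computation of the kind used above can be sketched for scalar Gaussian posteriors. The function name `gaussian_coverage` and the synthetic data are assumptions; the source does not specify how coverage is computed, so this shows the standard central-interval definition.

```python
import numpy as np
from statistics import NormalDist


def gaussian_coverage(means, variances, truth, nominal_levels=(0.68, 0.95, 0.99)):
    """Fraction of true values inside each central Gaussian credible interval."""
    means, variances, truth = map(np.asarray, (means, variances, truth))
    z_scores = np.abs(truth - means) / np.sqrt(variances)
    coverage = {}
    for level in nominal_levels:
        # Half-width of the central interval in standard deviations,
        # for example about 1.96 at the 0.95 level.
        half_width = NormalDist().inv_cdf(0.5 + level / 2.0)
        coverage[level] = float(np.mean(z_scores <= half_width))
    return coverage


rng = np.random.default_rng(0)
truth = rng.normal(0.0, 1.0, size=20_000)  # true errors have unit variance
means = np.zeros_like(truth)

# Posterior variance matches the true error variance.
calibrated = gaussian_coverage(means, np.full_like(truth, 1.0), truth)
# Posterior variance is four times too small, as when interface variance is dropped.
overconfident = gaussian_coverage(means, np.full_like(truth, 0.25), truth)

print("calibrated:   ", calibrated)
print("overconfident:", overconfident)
```

The overconfident case falls well below the nominal levels, which is the undercoverage pattern attributed to deterministic messages.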

Validation Results

Case Study 1: 4-DOF Mass-Spring-Damper Chain

Setup:

Two subsystems connected by spring-damper interface
Sparse boundary sensing (only boundary acceleration)
Unknown parameter: $k_4$ stiffness

Results (three matched estimators):

| Metric | Centralized UKF | Deterministic Jacobi | Probabilistic Jacobi |
|---|---|---|---|
| State RMSE | 4.07e-5 | 2.02e-4 | 1.05e-4 |
| Parameter NRMSE | 2.47e-2 | 4.16e-3 | 4.65e-3 |
| 95% Coverage | 1.00 | 0.83 ⚠️ | 1.00 ✓ |
| 68% Coverage | 1.00 | 0.66 ⚠️ | 1.00 ✓ |
| Pred. NLL | -4.29e1 | 4.98e3 ⚠️ | -5.25e1 ✓ |
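
The sign flip in the predictive NLL row is easier to read with the formula in hand. The sketch below assumes the metric is the usual mean Gaussian negative log-likelihood; the source does not define it, and the residuals and variances here are made up.

```python
import numpy as np


def gaussian_nll(means, variances, observations):
    """Mean negative log-likelihood of observations under Gaussian predictions."""
    means, variances, observations = map(np.asarray, (means, variances, observations))
    squared_error = (observations - means) ** 2
    return float(np.mean(0.5 * (np.log(2.0 * np.pi * variances) + squared_error / variances)))


observations = np.array([0.01, -0.01, 0.01, -0.01])
means = np.zeros(4)

# Predicted standard deviation matches the actual error size (0.01).
nll_calibrated = gaussian_nll(means, np.full(4, 1.0e-4), observations)
# Predicted standard deviation is 100 times too small.
nll_overconfident = gaussian_nll(means, np.full(4, 1.0e-8), observations)

print("calibrated NLL:   ", nll_calibrated)
print("overconfident NLL:", nll_overconfident)
```

Small, well-matched variances give a negative NLL, while an overconfident variance makes the squared-error term dominate and drives the NLL to large positive values, the same qualitative pattern as the deterministic column.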

Learned Interface:

SINDy recovery: $\hat{k} = 5.61 \times 10^4$ N/m (12.2% error), $\hat{c} = 3.31 \times 10^2$ Ns/m (10.3% error)
Preserves calibration: 95% coverage = 1.00 ✓

Case Study 2: IEEE Power Grid Benchmarks (9-300 buses)

Partitioning:

Generator buses seed subsystems
Greedy expansion up to $S_{max}=5$ buses
Local UKF on 15-dimensional augmented state

Results:

State/parameter accuracy matches centralized UKF across all sizes
Runtime: centralized $O(n^3)$ vs distributed $O(N \cdot S_{max})$
Projected parallel runtime: sequential cost divided by the number of subsystems (an idealized estimate that ignores communication overhead)

Case Study 3: Multi-Physics Turbine + Grid

Hierarchical Composition:

5-subsystem turbine: hydraulic (NARX), governor (PID), rotation (EKF), vibration (KF), runner (UKF)
Embedded in IEEE 9-bus network (replacing generator buses)
Cross-scale disturbance propagation: load step → grid torque → turbine speed

Results:

Embedded vs standalone trajectory RMSE: $1.02 \times 10^{-3}$
Seal stiffness posterior indistinguishable between embedded and standalone runs ✓
Calibrated uncertainty survives hierarchical embedding ✓

Advantages vs Existing Methods

| Approach | Heterogeneity | Sparse Sensing | Distributed Uncertainty | Scalability | Inverse Inference |
|---|---|---|---|---|---|
| Monolithic DA | ❌ | ✓ | ❌ (global) | ❌ $O(n^3)$ | ✓ |
| Co-simulation | ✓ | ❌ | ❌ | ✓ | ❌ forward only |
| Factor graphs | ❌ homogeneous | ✓ | ✓ | ✓ | ❌ fixed interactions |
| Graph NNs | ✓ | ✓ | ❌ | ✓ | ❌ no posteriors |
| PCI | ✓ | ✓ | ✓ | ✓ $O(N \cdot S_{max})$ | ✓ |

Applications

Digital Twins for Infrastructure

Power grid state estimation + parameter identification
Water distribution networks (flow + pressure inference)
Transportation systems (traffic + control)
Industrial process plants (multi-physics coupling)

Key Benefits

Real-time operation: Linear scaling enables infrastructure-scale inference
Heterogeneity: Different physics domains coexist (hydraulic, electrical, mechanical)
Calibration: Uncertainty-aware predictions for risk assessment
Modularity: Plug-and-play subsystem models (mechanistic → learned → hybrid)
Hierarchical: Multi-scale systems compose without posterior collapse

When to Use

Use probabilistic compositional inference when:

Coupled systems with physically meaningful interfaces (forces, flows, currents)
Sparse sensing leaving most states unobserved
Distributed uncertainty across subsystem boundaries
Heterogeneous components (different physics, model classes, fidelities)
Infrastructure scale (hundreds to thousands of components)
Inverse problems (state estimation, parameter identification, coupling inference)
Digital twin requirements (uncertainty-aware, real-time, modular)

Avoid when:

Single homogeneous system (use centralized methods)
Dense sensing (global state observable)
Forward simulation only (use co-simulation)
No uncertainty requirement (use deterministic methods)

Technical Pitfalls

Interface Covariance Neglect: Deterministic messages lose calibration
- Fix: Always propagate variance through interface laws
Cross-Covariance Assumption: Jacobi assumes independent subsystem interface states
- Fix: Accept approximation or use more sophisticated schedule
Interface Law Completeness: Unknown physics requires learned laws
- Fix: SINDy, neural networks, or hybrid identification
Partitioning Strategy: Too large subsystems lose scalability
- Fix: Bound $S_{max}$, use physics-guided partitioning
Hierarchical Collapse: Embedding can degrade local posteriors
- Fix: Verify calibration at each level, use probabilistic messages

References

Ghorbani & Hackl (2026) - "Subsystem Structure as an Inferential Resource for Coupled Engineered Systems" arXiv:2605.27544
Kapteyn et al. (2021) - "Probabilistic digital twins"
Willcox (2021) - "Imperative for predictive digital twins"
Brunton et al. (2016) - SINDy for sparse dynamics identification
Julier & Uhlmann (2004) - Unscented Kalman Filter