name: data-center-ai-workload-power
description: "Data center AI workload power profiling and infrastructure planning. Methods for measuring generative AI workload power consumption at high resolution, scaling to whole-facility energy demand, and planning infrastructure for grid connection, microgrids, and on-site generation. Triggers: data center power, AI energy consumption, GPU power profiling, facility infrastructure planning, generative AI workload, H100 power measurement, MLCommons benchmark power."
Data Center AI Workload Power Profiling
Methods for measuring generative AI workload power consumption and scaling to whole-facility energy demand for infrastructure planning.
Overview
Paper: "Measurement of Generative AI Workload Power Profiles for Whole-Facility Data Center Infrastructure Planning" (arXiv: 2604.07345v1, April 2026)
Key contribution: Bridges the gap between high-resolution workload power measurements and whole-facility energy demand estimation.
Workload Power Measurement
Benchmark-Based Profiling
Use standardized benchmarks for reproducible profiling:
- MLCommons benchmarks for model training and fine-tuning
- vLLM benchmarks for inference workloads
Measurement Setup
- NVIDIA H100 GPUs (representative hardware)
- 0.1-second resolution power sampling
- Capture temporal fluctuations during training/fine-tuning/inference
Power Profile Components
- GPU power - Primary computation energy
- Memory power - Data movement energy
- Cooling overhead - Thermal management energy
- Auxiliary systems - Network, storage, management
Whole-Facility Energy Modeling
Bottom-Up Event-Driven Model
Scale workload profiles to facility level:
- Aggregate individual workload profiles
- Model temporal distribution of user requests
- Include infrastructure overhead (cooling, power distribution)
- Capture realistic fluctuations
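The bottom-up aggregation above can be sketched in a few lines. This is a minimal illustration, not the paper's model: the Poisson arrival process, the function names, and the single constant `overhead_factor` (a PUE-like multiplier) are all assumptions made for the example, and `workload_w` is taken to be one measured profile expressed as power above idle.

```python
import random
from typing import List, Sequence


def poisson_arrivals(rate_per_s: float, horizon_s: float,
                     rng: random.Random) -> List[float]:
    """Draw request start times from a homogeneous Poisson process (assumed model)."""
    times: List[float] = []
    t = rng.expovariate(rate_per_s)
    while t < horizon_s:
        times.append(t)
        t += rng.expovariate(rate_per_s)
    return times


def facility_profile(workload_w: Sequence[float],
                     arrivals_s: Sequence[float],
                     horizon_s: float,
                     dt_s: float = 0.1,
                     idle_w: float = 0.0,
                     overhead_factor: float = 1.0) -> List[float]:
    """Superimpose one workload profile at each arrival, then scale by overhead.

    workload_w      -- power above idle, one value per dt_s step
    overhead_factor -- hypothetical multiplier for cooling and power distribution
    """
    n = int(round(horizon_s / dt_s))
    it_load = [idle_w] * n
    for start in arrivals_s:
        first = int(round(start / dt_s))
        for k, p in enumerate(workload_w):
            idx = first + k
            if idx >= n:
                break  # workload runs past the simulated horizon
            it_load[idx] += p
    return [overhead_factor * p for p in it_load]
```

A facility with several workload types would call the inner loop once per type with its own profile and arrival stream. Replacing the constant overhead with a load-dependent one changes only the final line.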
Energy Profile Characteristics
- Temporal fluctuations - Driven by AI workloads and user behavior
- Peak demand estimation - Maximum power draw periods
- Average demand - Baseline energy consumption
- Demand variability - Range of power fluctuations
Infrastructure Planning Applications
Grid Connection Planning
Use energy profiles to determine:
- Required power capacity from grid
- Peak demand management strategies
- Grid stability considerations
On-Site Energy Generation
Evaluate options:
- Solar/wind capacity sizing
- Battery storage requirements
- Backup generation sizing
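As an illustration of how a facility power profile could feed a storage sizing question, the sketch below estimates the battery needed to keep grid draw at or below a chosen cap. The function name and the simplifications are assumptions for the example, not a method from the paper: the battery is lossless, starts full, and recharges using whatever headroom remains below the cap.

```python
from typing import Sequence, Tuple


def battery_for_grid_cap(power_w: Sequence[float],
                         grid_cap_w: float,
                         dt_s: float = 0.1) -> Tuple[float, float]:
    """Return (energy_j, peak_discharge_w) needed to hold grid draw <= grid_cap_w."""
    deficit_j = 0.0         # energy currently missing from a full battery
    worst_deficit_j = 0.0   # deepest discharge seen so far
    peak_discharge_w = 0.0  # largest instantaneous power the battery must supply
    for p in power_w:
        excess_w = p - grid_cap_w
        # Above the cap the battery discharges; below it, spare grid headroom
        # recharges the battery, but never beyond full (deficit floor of 0).
        deficit_j = max(0.0, deficit_j + excess_w * dt_s)
        worst_deficit_j = max(worst_deficit_j, deficit_j)
        peak_discharge_w = max(peak_discharge_w, excess_w)
    return worst_deficit_j, peak_discharge_w
```

If the average demand exceeds the cap, the required energy grows with the length of the profile, which signals that the cap is infeasible without additional generation. A real sizing study would also account for round-trip efficiency, charge-rate limits, and depth-of-discharge margins.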
Distributed Microgrids
Design resilient power infrastructure:
- Local generation capacity
- Load balancing strategies
- Failover mechanisms
Implementation Approach
Data Collection
Power measurement workflow:
1. Configure H100 GPU measurement setup
2. Run MLCommons/vLLM benchmark workloads
3. Record power at 0.1s resolution
4. Capture full workload duration
5. Export power profile data
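Steps 3 to 5 might look like the following polling loop. It is a sketch under stated assumptions: `sample_power` and `write_profile_csv` are hypothetical helpers, the CSV column names are invented for the example, and the power reading is supplied as a callable so that the loop does not depend on a particular measurement tool.

```python
import csv
import time
from typing import Callable, List, TextIO, Tuple


def sample_power(read_watts: Callable[[], float],
                 duration_s: float,
                 interval_s: float = 0.1,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep,
                 ) -> List[Tuple[float, float]]:
    """Poll read_watts() every interval_s seconds for duration_s seconds."""
    n_samples = int(round(duration_s / interval_s))
    start = clock()
    samples: List[Tuple[float, float]] = []
    for i in range(n_samples):
        # Wait for an absolute deadline so timing error does not accumulate.
        delay = (start + i * interval_s) - clock()
        if delay > 0:
            sleep(delay)
        samples.append((clock() - start, read_watts()))
    return samples


def write_profile_csv(samples: List[Tuple[float, float]], out: TextIO) -> None:
    """Export (elapsed seconds, watts) pairs as a two-column CSV."""
    writer = csv.writer(out)
    writer.writerow(["t_s", "power_w"])
    for t, p in samples:
        writer.writerow([f"{t:.3f}", f"{p:.1f}"])
```

On a host with the NVIDIA driver and the `pynvml` bindings installed, `read_watts` could wrap `pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0`, since NVML reports milliwatts, after calling `pynvml.nvmlInit()` and `pynvml.nvmlDeviceGetHandleByIndex(0)`. The driver refreshes its power reading at its own rate, so polling at 0.1 s does not by itself guarantee 0.1 s of true measurement resolution.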
Scaling Methodology
Workload Power → Facility Energy:
1. Sum individual workload profiles
2. Apply temporal user-behavior model
3. Add infrastructure overhead factors
4. Generate facility-level time series
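The four steps can be summarized in one expression. The notation here is assumed for illustration and does not come from the paper: $p_{w}(\tau)$ is the measured power profile of workload type $w$ (zero outside its run time), $t_i$ and $w_i$ are the start time and type of the $i$-th request produced by the user-behavior model, $P_{\text{idle}}$ is the idle IT load, and $\eta \ge 1$ is a single overhead factor for cooling and power distribution.

```latex
P_{\text{facility}}(t) \;=\; \eta \left[ P_{\text{idle}} + \sum_{i=1}^{N} p_{w_i}\!\left(t - t_i\right) \right]
```

Treating $\eta$ as a constant is the simplest choice. A load- or temperature-dependent overhead would replace it with a function of the bracketed IT load.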
Key Metrics
| Metric | Description | Application |
|---|---|---|
| Peak Power (W) | Maximum GPU power draw | Grid capacity planning |
| Average Power (W) | Mean power consumption | Energy cost estimation |
| Energy (J) | Total energy per workload | Operating cost analysis |
| Power Variance (W²) | Spread of power around its mean | Infrastructure stability |
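Given a uniformly sampled power trace, the four metrics in the table can be computed directly. The function and key names below are illustrative assumptions. Energy is approximated with a rectangle rule, and variance is the population variance of the samples.

```python
from statistics import fmean, pvariance
from typing import Dict, Sequence


def profile_metrics(power_w: Sequence[float], dt_s: float = 0.1) -> Dict[str, float]:
    """Summarize a power trace sampled every dt_s seconds."""
    return {
        "peak_w": max(power_w),               # maximum power draw
        "avg_w": fmean(power_w),              # mean power consumption
        "energy_j": sum(power_w) * dt_s,      # rectangle-rule integral of power over time
        "variance_w2": pvariance(power_w),    # spread of power around the mean
    }
```

Dividing `energy_j` by 3.6e6 converts joules to kilowatt-hours, the unit most energy tariffs use.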
Practical Applications
- New data center design - Size infrastructure for AI workloads
- Capacity expansion - Plan for additional GPU deployments
- Cost estimation - Predict energy costs for AI operations
- Sustainability - Evaluate renewable energy integration
Reference
- Paper: arXiv:2604.07345v1
- PDF: https://arxiv.org/pdf/2604.07345v1
- Category: eess.SY (Systems and Control)