name: adding-hardware-sku description: Use when adding a new accelerator (GPU/TPU/Trainium/Gaudi), interconnect fabric, or multi-accelerator system to the llm-calc database (calc/src/data/{accelerators,interconnects,systems}.ts). Routes to the right sub-procedure based on whether the SKU is a new chip, a new fabric, or a new product composition. Invoke whenever hardware is being added — the failure modes (sparsity-inflated TFLOPS, confused per-direction vs aggregate BW, wrong variant in a system) all produce plausible-looking perf numbers that are silently wrong.
Adding a Hardware SKU
Three things live under hardware. Pick the right sub-procedure:
- Accelerator — new chip generation (H200, MI400, TPU v7). Adds an
AcceleratorSpecwith variants and operating points. - Interconnect — new fabric (NVLink 6, NVL Switch tray, IB-XDR, EFA v4). Adds an
InterconnectSpec. - System — new product (DGX Spark, GB300 NVL72, p6e-class cloud SKU). Composes existing accelerator + interconnect.
If multiple apply (a new chip + new fabric + new system land together), do them in order: accelerator → interconnect → system. Later layers reference earlier by id.
Source priority (all hardware)
- Vendor whitepaper / architecture brief — primary truth. NVIDIA architecture whitepaper PDFs, AMD CDNA briefs, Intel Gaudi product briefs, Google TPU papers, AWS Neuron docs.
- Vendor datasheet — HBM capacity, package power, form factor.
- Independent microbenchmark paper — for
achievableoperating points only. Seecalc/src/data/sources.tsfor the registry (arxiv-2501-12084 for Hopper, arxiv-2510-27583 for MI300X, etc.). - Cloud SKU page — for
availability.cloudson systems and to disambiguate variant labels (e.g. "AWS P5e" → which exact accelerator+variant).
Aggregator/marketing sites are not acceptable primary sources for TFLOPS, HBM BW, or fabric specs. See docs/data-philosophy.md for the reasoning.
A. New Accelerator
Files: calc/src/data/accelerators.ts, calc/src/data/sources.ts.
Schema in calc/src/engine/types.ts (AcceleratorSpec / AcceleratorVariant / AcceleratorOperatingPoint).
Shape:
{
id: 'mi400', // lowercase-kebab; stable across renames
name: 'AMD MI400',
vendor: 'AMD',
family: 'CDNA Next',
variants: [
{
id: 'oam-256', // form-factor + HBM capacity
label: 'OAM 256GB',
hbmCapacityGB: 256,
operatingPoints: [
{
id: 'peak', label: 'Peak',
tflops: { fp16: ..., bf16: ..., fp8: ..., int8: ..., fp4: ... },
hbmBandwidthGBs: ...
},
// Optional — only with a citable source:
{
id: 'achievable', label: 'Achievable',
tflops: { ... },
hbmBandwidthGBs: ...,
tflopsSources: ['arxiv-...'], // keys into SOURCES
bandwidthSources: ['nvbandwidth'],
asOf: '2026-Q3',
notes: 'one-line measurement context'
}
]
}
]
}
TFLOPS table — pitfalls (most common bugs land here)
- Sparsity multipliers: NVIDIA quotes TensorCore numbers at 2:1 sparse. The schema uses dense values. If a spec sheet says "989/1979 TFLOPS BF16 (with sparsity)", record the unsparse half.
- FP4 / FP6: Blackwell-class chips quote these with sparsity assumed. Same rule.
- INT8 vs FP8: identical throughput on most modern chips; record both if the vendor lists both.
- TF32: skip — not a serving dtype.
- Boost vs base clock: most vendor TFLOPS are boost-clock peak. Record those (matches the rest of the database).
Achievable operating points
Optional but valuable. Acceptable sources:
- mamf-finder MAMF table (PyTorch torch.mm sweeps)
- Microbenchmark papers on arxiv (arxiv-2501-12084 Hopper, arxiv-2510-27583 MI300X, arxiv-2512-02189 Blackwell, etc.)
- AMD MAFs blog, NVIDIA cuBLAS perf posts
Don't fabricate achievable numbers. Either cite a source or skip the operating point — peak-only entries are fine.
If you add a new source, register it in sources.ts first, then reference its key.
B. New Interconnect
File: calc/src/data/interconnects.ts.
Schema in calc/src/engine/types.ts (InterconnectSpec).
Shape:
{
id: 'nvlink-6',
name: 'NVLink 6',
vendor: 'NVIDIA',
generation: 'Gen6 (Rubin)',
perGpuBandwidthGBs: ..., // bidirectional aggregate per chip (vendor headline)
perDirectionGBs: ..., // half of perGpuBandwidthGBs unless quoted separately
linksPerGpu: ...,
perLinkGBs: ..., // per direction
topology: 'switched', // see InterconnectTopology
scale: 'intra-node', // see InterconnectScale
maxScaleUpGpus: 8,
sources: ['nvidia-nvlink'],
notes: '...'
}
Bandwidth conventions (read types.ts:58-72)
perGpuBandwidthGBsis the bidirectional aggregate number on vendor slides ("900 GB/s NVLink 4"). Halve it for ring all-reduce math.- For point-to-point fabrics (direct NVLink, IB):
perLinkGBs × linksPerGpushould equal the per-direction aggregate. Check arithmetic before committing. - For switched (NVSwitch, NVL72): the aggregate is what each chip can pump into the switch; bisection is a separate property (
contention.bisectionFactor).
Optional: contention model
Add contention: { bisectionFactor, oversubscription?, hopCostModel, singleHopUtilization } only with data. Hand-waving guesses are worse than omitting; the engine falls back gracefully. Tier overrides (tiers) are for measured collective performance — only with a citable source.
C. New System
File: calc/src/data/systems.ts.
Schema in calc/src/engine/types.ts (MultiAcceleratorSystem).
Pure composition — pick existing accelerator id+variant, existing interconnect id, set form factor, fill aggregates.
{
id: 'dgx-spark',
name: 'NVIDIA DGX Spark',
vendor: 'NVIDIA',
generation: 'GB10',
formFactor: 'node', // see SystemFormFactor
accelerator: { id: 'gb10', variantId: 'unified-128', count: 1 },
interconnectId: '...', // InterconnectSpec.id
scaleOutInterconnectId: '...',
scaleOutNicsPerNode: 1,
aggregate: {
totalHbmGB: 128, // = accelerator.count × variant.hbmCapacityGB
fabricBidirectionalTBs: ... // = interconnect.perGpuBandwidthGBs × count / 1000
},
availability: { onPrem: true, clouds: [...] },
notes: '...'
}
Sanity checks (where systems go wrong)
aggregate.totalHbmGBmust equalcount × hbmCapacityGB. Off-by-factor-of-ten is the most common bug.aggregate.fabricBidirectionalTBsmust equalperGpuBandwidthGBs × count / 1000. Same kind of math, easy to flub.availability.cloudsshould match the cloud SKU page exactly —aws,azure,gcp,oci,coreweave,lambda,crusoeare the common ones (full list inCloudProviderin types.ts).- Confirm
accelerator.variantIdexists in the referenced accelerator. TS won't catch this — it'd require a const-keyed lookup; reviewer needs to check.
Validation (all hardware)
npm test— won't catch numeric errors (no test asserts specific TFLOPS), but ensures imports resolve.npm run check— TS catches schema mismatches.npm run dev— open the UI, select the new accelerator / system, verify the perf panel renders without NaN and numbers are in the right order of magnitude vs. siblings (e.g. a new "H300" should land between H200 and B200; if it lands at 10× B200, you have a sparsity bug).
No TDD pattern here — data-only changes don't get unit tests. Schema changes do, but those go through the model-skill TDD flow because they touch the engine.
Anti-patterns
- Recording sparsity-inflated TFLOPS without halving. The most common bug, by a wide margin.
- Confusing
perGpuBandwidthGBs(aggregate) vsperDirectionGBs(half) on interconnects. - Inventing achievable numbers without a source citation.
- Picking the wrong accelerator variant in a system (SXM vs PCIe, 80GB vs 96GB).
- Skipping
sources/asOfbecause "the data is obvious". It rots; provenance is the only defense. - Adding a tier / contention model based on intuition rather than measurement. Empty is better than wrong.