name: titan-rendering-debugging
description: Use when diagnosing a Titan renderer artifact: flicker, ghosting, fireflies, banding, light leaks, seams, boil, NaN/Inf, aliasing, smear/trails, dark voids, shadow artifacts, GI instability, denoiser/TAA issues. The frame-isolation-first debugging method: capture 60Hz, diff frames / A-B vs classical, consult literature, isolate via systematic-debugging, propose (and prove) a fix, record empirically. NOT for landing the production fix (hand to titan-rendering-3d-agent), 2D rendering, or editor panels.
Titan Rendering Debugging
The method for diagnosing a Titan renderer artifact. Stable; the empirical record lives in
docs/rendering/RENDERING_BUG_HISTORY.md and the bench/instrument detail in references/.
Required skill chain (load all three, in order, every time)
- titan-rendering-core: frame-graph, DevicePool, RenderTarget, HDR, tonemap conventions.
- titan-rendering-3d: deferred chain, Stratum GI, ReGIR, SVGF/A-SVGF, TAA, shadows.
- realtime-rendering: generic real-time rendering literature + clean-room provenance rule.
These are hard dependencies. Skipping any = ungrounded work. The provenance rule from
realtime-rendering is load-bearing for IP posture (never paste proprietary source into committed files).
The debug loop
Reproduce + characterize. Get the artifact on screen with a known bench command (see references/bench-catalog.md). State what is seen, where, and under what motion.
Frame isolation FIRST (lead technique: do this before trusting any aggregate metric). Capture a 60Hz frame sequence and:
- Diff consecutive frames to localize temporal artifacts (flicker, boil, swim, ghosting).
- A/B against classical lighting (or against the feature toggled off) to localize which pass owns the artifact.
- Test across MOTION REGIMES, not one frame — for any denoiser / temporal / sampling change,
evaluate the static hold AND the rotation (yaw) at minimum (bench: static = Phase 6
t12.5–15; yaw = Phases 7–8 t15–25; dolly = Phase 4–5 t7.5–12.5). A change can be worse in one
regime and dramatically better in another: 2026-06-05 the de-grid bilateral read as a
null/slightly-worse on a static converged frame (temporal already did the work) but was a clear
win during yaw (temporal reprojection is weakest in rotation, so the spatial pass fills the gap).
A single-regime A/B can invert the truth. During rotation, per-frame frame-diff metrics are
invalid (geometric motion dominates ~100:1) → use a frame-aligned side-by-side video
(perceive-together) as the judge, plus per-frame within-frame spatial stipple (motion-free).
Aggregate metrics (mean luma, global flicker score) are spatially blind to narrow artifacts,
and per-frame magnitude metrics are blind to motion swim/crawl (the eye grades temporal
coherence under motion, not per-frame std); both have produced repeated misdiagnoses here (see
RENDERING_BUG_HISTORY.md 2026-05-28 F2 seam; 2026-06-05 de-grid B). Per-ROI / per-pixel inspection and perceive-together beat a global number.
Consult the literature. Use realtime-rendering for the canonical algorithm and its known failure modes. Match the observed artifact to a documented failure class before hypothesizing.
Isolate via systematic-debugging. One hypothesis → gather empirical evidence (frame-diffs, VIZ modes, dumped scalars, toggle A/B) → confirm or kill → next. Do not skip to a fix.
Consult the literature again to derive candidate fixes for the confirmed cause.
Prove the fix, then record + hand off. Prove a candidate empirically in a throwaway worktree (capture 60Hz + classical A/B + bench numbers), then DISCARD the worktree. Append a RENDERING_BUG_HISTORY.md entry. The production landing is titan-rendering-3d-agent's job, not yours.
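The frame-isolation step argues that a global number hides narrow artifacts. A minimal sketch of that point, on synthetic data: it diffs two consecutive luma frames and reduces the result per tile instead of per frame. The function name, the 16-pixel tile size, and the synthetic frames are illustrative assumptions, not part of the Titan tooling.

```python
import numpy as np

def tile_max_abs_diff(prev: np.ndarray, curr: np.ndarray, tile: int = 16) -> np.ndarray:
    """Peak |curr - prev| inside each tile x tile block of two H x W luma frames."""
    diff = np.abs(curr.astype(np.float32) - prev.astype(np.float32))
    h, w = diff.shape
    th, tw = h // tile, w // tile
    # Crop to whole tiles, then reduce every tile to its maximum difference.
    blocks = diff[: th * tile, : tw * tile].reshape(th, tile, tw, tile)
    return blocks.max(axis=(1, 3))

# Synthetic pair: a flat grey frame, then the same frame with a 4-pixel flicker.
prev = np.full((128, 128), 0.5, dtype=np.float32)
curr = prev.copy()
curr[40:44, 64] += 0.8

global_mean = float(np.mean(np.abs(curr - prev)))
tiles = tile_max_abs_diff(prev, curr)
row, col = np.unravel_index(int(np.argmax(tiles)), tiles.shape)
hot_tile = (int(row), int(col))

print("global mean diff:", global_mean)  # tiny: the flicker covers 4 of 16384 pixels
print("hottest tile:", hot_tile, "peak diff:", float(tiles[hot_tile]))
```

The global mean stays near zero while the per-tile map points straight at the affected region. Under rotation the same diff is dominated by geometric motion, so this only applies to the static hold.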
Durable symptom patterns (consult before hypothesizing)
- pre-TAA tap is the denoiser-vs-estimator discriminator. --capture-target pre-taa taps the radiance_raw edge after StratumComposite, BEFORE TAA. If the artifact is ABSENT pre-TAA → TAA/denoiser owns it (e.g. 2026-06-20 static near-wall dark annulus = TAA YCoCg neighbourhood-stats corruption by a bright-core outlier; fix = depth/normal-gate the 3×3 stats, Bitterli 2020 §5). If the artifact is PRESENT pre-TAA → it is upstream (estimator / shadow / shading). Run this tap EARLY; it splits the pipeline in one shot and kills whole hypothesis branches.
- Concentric RING shadows / a dark-then-light halo as a light nears or passes THROUGH thin geometry = SDF sphere-march thin-occluder overstep, NOT reuse bias and NOT TAA. shadow_sdf_global.wgsl steps by max(d, MIN_STEP); the nearest-seed distance reports ≈voxel_size one voxel from a ~1-voxel wall, so a phase-dependent step jumps clean over it → iso-phase rings (2026-06-20, FIXED c5af38b8). Isolate: TITAN_STRATUM_SHADOW_TERM_DISABLED=1 (rings gone ⇒ shadow term), --denoiser-tier low (no SDF march ⇒ gone), TITAN_SHADOW_SDF_HIT_THRESHOLD_M sweep (rings→leak is the overstep tell). Fix = MAX_STEP clamp (clamp(max(d,MIN_STEP), MIN_STEP, vox*0.5), Nyquist / Quilez 2010) + forward-only golden-ratio-dithered march start. A dark annulus is the textbook ReSTIR biased-reuse signature (Bitterli 2020 §4), which makes reuse the seductive-but-wrong lead; the REGIR_DIRECT=0 + --stratum-spatial-reuse off A/Bs falsified it here. Run the falsifying toggle before committing to the literature's first match.
- --wall-pierce (probe) is the repro for through-wall shadow artifacts: camera held head-on 4 m from the wall, primary light swept in Z straight through the wall plane (crosses at the timeline midpoint). Decouples light-through-geometry from camera motion, which is far cleaner than the orbiting fleet.
- Through-wall LIGHT LEAK (floor lit where a wall should shadow it) = the screen→SDF FORWARD-SKIP HANDOFF. shadow_screen.wgsl covers [0,2 m] then hands start_dist_m to shadow_sdf_global.wgsl, which skipped forward by HANDOFF_EPSILON_SCALE×voxel → flank dead-zone (skip oversteps real crossings) + center no-op (dist_to_light<2 m → SDF starts at the light → 0 steps). FIX = drop the forward-skip, full-march from a receiver normal-offset origin (render_pos + n×voxel); the normal-offset is mandatory or the bare full-march self-occludes the whole floor (RAY_BIAS 0.05 m < 0.25 m voxel). Canon = Wright 2015 (R1/R3). MERGED 649c1cb3 (PR #25, perceptual gate ACCEPTED 2026-06-21; the user accepted that R1+R3 reopens the grazing-corner over-occlusion; signed-SDF canonical fix is a future milestone). The five canon-ranged march params are now tier-gated RenderQuality knobs (MAX_MARCH_STEPS / HIT_THRESHOLD_M / MIN_STEP_M / SDF_NORMAL_OFFSET_VOXELS / RAY_BIAS_M); TITAN_SHADOW_SDF_* envs override on top.
- SDF corner/grazing OVER-occlusion (false dark band; SDF-off recovers it) is an UNSIGNED-SDF DATA artifact, NOT a march-parameter bug. Titan's unsigned SDF + positive HIT_THRESHOLD false-hits grazing rays (parallel-near a wall, d>0 but <ε); Wright canon uses a SIGNED SDF with sign-crossing termination (d≤0 = inside) so grazing never false-hits. Tell: ALL march-side single-signal discriminators (distance skip, near-hit rejection, grazing normal-vs-ray) fail to separate the false corner hit from a real wall crossing; they are geometrically identical in the same cascade. That identity IS the signature that the fix is in the DATA (signed SDF / mesh-voxelized seeding), not the march. Don't chase distance/angle levers once they're shown to overlap.
- VIZ-decode hazard: some scalar taps (sdf_march_debug, etc.) store channels LINEARLY in the .ppm, not Reinhard+gamma. Applying the Rgba16Float Reinhard inversion to a linear channel gives a wrong reading (here 16× too small → a false "near-receiver" distance). Confirm the tap encoding before trusting a decode; cross-check R/255×range.
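The thin-occluder overstep and its MAX_STEP clamp can be illustrated with a toy 1D march. This is a sketch under loud assumptions, not a port of shadow_sdf_global.wgsl: the field is modelled as the true distance rounded up to whole voxels (one way a stored field could report ≈voxel_size one voxel from a wall), a hit is registered only within a hypothetical HIT half-width of the wall plane, and MIN_STEP, HIT and WALL are made-up values. Only the 0.25 m voxel and the clamp expression come from the notes above.

```python
import math

VOX = 0.25       # voxel size in metres (the 0.25 m figure quoted in the notes)
MIN_STEP = 0.01  # hypothetical minimum step
HIT = 0.07       # hypothetical hit half-width around the wall plane
WALL = 2.0       # hypothetical wall plane position along the ray, metres

def reported_distance(x: float) -> float:
    """Toy field (assumption): true distance rounded UP to a whole number of voxels."""
    return math.ceil(abs(x - WALL) / VOX) * VOX

def march_hits_wall(start: float, max_step=None, far: float = 4.0) -> bool:
    """March along +x from `start`; return True if the thin wall is detected."""
    x = start
    while x <= far:
        if abs(x - WALL) < HIT:
            return True
        step = max(reported_distance(x), MIN_STEP)
        if max_step is not None:
            # clamp(max(d, MIN_STEP), MIN_STEP, max_step)
            step = min(max(step, MIN_STEP), max_step)
        x += step
    return False

# Sweep the ray origin across one voxel of phase.
phases = [i * 0.01 for i in range(25)]
unclamped = [march_hits_wall(p) for p in phases]
clamped = [march_hits_wall(p, max_step=VOX * 0.5) for p in phases]

print("unclamped hits:", sum(unclamped), "of", len(phases))
print("clamped hits:  ", sum(clamped), "of", len(phases))
```

In this toy the unclamped march detects the wall only for some start phases, which is the phase dependence that shows up as rings, while the half-voxel clamp detects it for every phase because no step can exceed the width of the hit zone. Whether the same margin holds in the real shader depends on the actual HIT_THRESHOLD_M and field encoding.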
Debug-tool durability rules (apply to every instrument you add)
- Durable / reusable. No throwaway hacks. Build a VIZ mode, an env-tunable override, or a metric script the next session can re-run. Follow the existing TITAN_* VIZ-mode convention (see references/instrument-catalog.md).
- Never in compiled / editor / release builds without explicit permission. Gate behind the debug-dumps cargo feature OR a TITAN_* env, default-off, zero-cost and DCE'd when absent. A no-feature build must be byte-identical with the instrument absent. If you cannot keep it out of the runtime build, STOP and ask before adding it. Watch the gated-off trap: an if FLAG {…} else {…} is NOT inert when FLAG is false, because the else still runs.
- Document on exit. Every new/changed instrument gets an entry in references/instrument-catalog.md (or bench-catalog.md) before the cycle closes.
Bug-history protocol
- Before diagnosing: read docs/rendering/RENDERING_BUG_HISTORY.md; the artifact may be known.
- After proving: append an entry using the schema there. CONFIRMED = isolation-proven only; everything else goes under UNCONFIRMED with its evidence gap. Empirical findings go to the bug-history; implementation-pattern / test learnings go to docs/agent-kb/titan-rendering-3d-agent/.
Pass manifest — first stop for per-pass I/O
Before hypothesizing about what a pass reads/writes, open docs/rendering/PASS_CATALOGUE.md
(generated from crates/titan-rendering-3d/src/manifest.rs, debug-dumps-gated). It lists
confirmed inputs, outputs, and dispatch kind for all 32 passes. Regenerate with:
UPDATE_GOLDEN=1 cargo nextest run -p titan-rendering-3d --features debug-dumps export
Full manifest invariants (catalogue design rules, verified pass order, SVGF defaults):
KB docs/agent-kb/titan-rendering-3d-agent/sp-a-pass-manifest.md.
SP-D tap operator recipe (2026-06-03)
dump_stratum_stages and --dump-stages are DELETED. Use --tap / TITAN_TAP for all stage inspection.
Quick start:
# Tap diffuse temporal history + pre-firefly radiance, frames 60 through 64 (half-open window 60..65), once per id:
TITAN_TAP=diffuse_history,radiance_pre_firefly TITAN_TAP_WINDOW=60..65 TITAN_TAP_ONCE=1 \
cargo run -p titan-render-bench-probe --features debug-dumps -- --cell deferred-stratum --denoiser-tier mid
# Bench CLI equivalent (TapArgs):
cargo run -p titan-render-bench-probe --features debug-dumps -- \
--cell deferred-stratum --denoiser-tier mid \
--tap diffuse_history,radiance_pre_firefly --dump-window 60..65 --dump-once --dump-label my-run
Env vars (SP-D additions):
| env var | effect |
|---|---|
| TITAN_TAP_WINDOW=start..end | Half-open [start, end) frame window on absolute Renderer3d::frame_index. Supersedes TITAN_DUMP_AT_FRAME for range control. |
| TITAN_TAP_ONCE=1 | Fire each selected id at most once per run. Keyed by id; fires exactly once. |
| TITAN_DUMP_SUBDIR=<name> | Output subdir: titan-dumps/<name>/{basename}-frame{N}.{ext}. Absent = flat. |
Bench CLI flags (TapArgs, all 3 benches):
- --tap <ids>: comma-separated resource ids or "all"
- --dump-buffers: selects all 11 BUFFER_RESOURCES (union with --tap)
- --dump-window <s..e>: frame window on absolute frame_index
- --dump-once: single-shot per id
- --dump-label <name>: output subdir
Numpy decode by format:
| format | decode |
|---|---|
| Rgba16Float (PPM) | Reinhard+gamma-encoded; PIL.Image.open(path) gives 8-bit sRGB |
| Rgba16Float (.bin) | np.frombuffer(open(f,'rb').read(), dtype=np.float16).reshape(H,W,4).astype(np.float32) |
| R32Float / R32Uint (.bin) | np.frombuffer(..., dtype=np.float32 / np.uint32).reshape(H,W) |
| Rg16Float (.bin) | np.frombuffer(..., dtype=np.float16).reshape(H,W,2) |
| Rg8Unorm (.bin) | np.frombuffer(..., dtype=np.uint8).reshape(H,W,2) |
| Rg32Float (.bin) | np.frombuffer(..., dtype=np.float32).reshape(H,W,2) (SP-D moments) |
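The per-format decodes above can be folded into one helper. The sketch below is a convenience wrapper, not an existing Titan script: the function name and the BIN_LAYOUT table are hypothetical, and it assumes the .bin dumps are tightly packed (no row padding) with the caller supplying H and W. Half-float formats are widened to float32 for arithmetic; Rg8Unorm is returned as raw uint8, exactly as the table gives it.

```python
import numpy as np

# (dtype, channel count) per .bin format, transcribed from the decode table.
BIN_LAYOUT = {
    "Rgba16Float": (np.float16, 4),
    "R32Float": (np.float32, 1),
    "R32Uint": (np.uint32, 1),
    "Rg16Float": (np.float16, 2),
    "Rg8Unorm": (np.uint8, 2),
    "Rg32Float": (np.float32, 2),
}

def decode_tap_bin(raw: bytes, fmt: str, height: int, width: int) -> np.ndarray:
    """Decode a raw tap dump into an (H, W) or (H, W, C) array."""
    dtype, channels = BIN_LAYOUT[fmt]
    expected = height * width * channels * np.dtype(dtype).itemsize
    if len(raw) != expected:
        # A size mismatch usually means the wrong format or resolution was assumed.
        raise ValueError(f"{fmt}: expected {expected} bytes, got {len(raw)}")
    shape = (height, width) if channels == 1 else (height, width, channels)
    arr = np.frombuffer(raw, dtype=dtype).reshape(shape)
    if dtype is np.float16:
        arr = arr.astype(np.float32)  # widen half floats before doing maths
    return arr

# Round-trip check on synthetic bytes (no real dump needed).
sample = np.arange(2 * 3 * 4, dtype=np.float16).tobytes()
decoded = decode_tap_bin(sample, "Rgba16Float", height=2, width=3)
print(decoded.shape, decoded.dtype)
```

The byte-count check is the main reason to wrap the one-liners: a silent reshape of the wrong format is easy to misread as a rendering artifact.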
Hook-internal tap ids (SP-D): radiance_pre_firefly, probe_field_viz, diffuse_history,
diffuse_moments, diffuse_variance. Mid/High only for temporal ids; None at Low/Potato = silent no-op.
SVGF temporal is diffuse-only. à-trous per-iteration: already tapped under diffuse_raw (no separate ids).
--dump-window is ABSOLUTE frame_index. Probe auto-maps --capture-start/end/fps to a
matching window when --dump-window absent. Empirically: capture-start=1.0, fps=60 → frame 60.
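The window semantics are easy to get off by one, so a small sketch may help. It parses a start..end spec as the half-open range described above and maps a capture time to a frame index. Both function names are hypothetical, and the time-to-frame formula (seconds × fps, rounded) is an assumption that merely agrees with the single empirical data point quoted here (capture-start=1.0, fps=60 → frame 60); the probe's real mapping may differ.

```python
def parse_tap_window(spec: str) -> range:
    """Parse 'start..end' into the half-open frame range [start, end)."""
    start_text, end_text = spec.split("..")
    start, end = int(start_text), int(end_text)
    if end < start:
        raise ValueError(f"window end {end} precedes start {start}")
    return range(start, end)

def capture_time_to_frame(seconds: float, fps: int) -> int:
    """Assumed mapping from capture time to absolute frame index."""
    return round(seconds * fps)

window = parse_tap_window("60..65")
print(list(window))                     # frames that would be tapped
print(65 in window)                     # the end frame is excluded
print(capture_time_to_frame(1.0, 60))   # matches the empirical note
```

With TITAN_TAP_ONCE=1 each id fires on the first frame inside the window, so the window mostly matters for choosing which frame that is.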
SVGF engine defaults (verified 2026-06-02)
- atrous_passes default = 4 (NOT 5; changed at Artifact-A fix 2f19515b). Source: quality.rs:1003.
- σ_L default = 6.0 (NOT 4.0; raised at m49-l-cellflicker-fix-g). Source: quality.rs:1011.
- σ_L_specular default = 2.5 (NOT 1.5). Source: quality.rs:1012.
The bench-catalog.md table rows show the stale (5 / 4.0 / 1.5) defaults in the flag
descriptions; the SP-E note at the top of that file documents the correct values. Trust
the code (quality.rs) and the SP-E note, not the per-flag table defaults.
wgpu / naga gotchas
No installed wgpu skill. For wgpu/naga issues: known traps live in the KB (e.g. R8Unorm/R8Uint/
R16Float are NOT in the WebGPU baseline writable-storage set — use R32*; >4 storage textures
per stage needs Limits::default() not downlevel_defaults()). For anything beyond, use
WebSearch/WebFetch against the wgpu/naga docs + issue tracker and cite the source.
draw_indexed_indirect with first_instance != 0 is silently dropped unless the device enables Features::INDIRECT_FIRST_INSTANCE. Signature: in a GPU-driven path that draws each cull/draw group with a separate indirect command carrying first_instance = group base, only the first group (base 0) renders; every 2nd+ group produces zero fragments. Decisive isolation = G-buffer depth tap (--tap gbuffer_depth …): the missing group's region stays at the cleared depth (1.0) = never rasterized, vs written = drawn-but-dark. Confirm by forcing first_instance=0 on all commands → everything renders. (2026-06-20, a2b679d3.) Enabled now, but watch for it on any new indirect path / a different device. Pairs with: cull AABBs must use the real mesh extent (group.mesh.aabb()), never a unit cube scaled by the instance matrix; identity-transform meshes (extent baked in vertices) otherwise cull against a 1 m origin box.
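The depth-tap test above comes down to asking whether a screen region is still at the clear value. A minimal sketch on a synthetic depth buffer follows; the function name, the ROI rectangles, and the assumption that the tap decodes to an H×W float32 array are all illustrative, and the real regions would come from wherever each draw group is expected on screen.

```python
import numpy as np

CLEAR_DEPTH = 1.0  # cleared depth value quoted in the note above

def cleared_fraction(depth: np.ndarray, roi: tuple) -> float:
    """Fraction of pixels in roi = (y0, y1, x0, x1) still at the clear depth."""
    y0, y1, x0, x1 = roi
    region = depth[y0:y1, x0:x1]
    return float(np.mean(region == CLEAR_DEPTH))

# Synthetic depth buffer: group 0 rasterized, group 1 silently dropped.
depth = np.full((64, 128), CLEAR_DEPTH, dtype=np.float32)
group0_roi = (16, 48, 8, 56)     # hypothetical screen region of the first group
group1_roi = (16, 48, 72, 120)   # hypothetical screen region of the second group
depth[16:48, 8:56] = 0.4         # only group 0 wrote depth

for name, roi in (("group0", group0_roi), ("group1", group1_roi)):
    frac = cleared_fraction(depth, roi)
    verdict = "never rasterized" if frac == 1.0 else "drawn"
    print(name, frac, verdict)
```

A region that is fully cleared points at the draw being dropped, whereas a region with written depth but dark colour points downstream at shading or lighting.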
Anti-scope
- Landing the production fix → titan-rendering-3d-agent.
- 2D rendering / editor panels → out of scope.
- IP: never cite or paste proprietary engine source per the realtime-rendering provenance rule.