titan-rendering-debugging


Use when diagnosing a Titan renderer artifact — flicker, ghosting, fireflies, banding, light leaks, seams, boil, NaN/Inf, aliasing, smear/trails, dark voids, shadow artifacts, GI instability, denoiser/TAA issues. The frame-isolation-first debugging method: capture 60Hz, diff frames / A-B vs classical, consult literature, isolate via systematic-debugging, propose (and prove) a fix, record empirically. NOT for landing the production fix (hand to titan-rendering-3d-agent), 2D rendering, or editor panels.

By Solidor777. Updated 6/7/2026


Titan Rendering Debugging

This skill is the method for diagnosing a Titan renderer artifact. The method itself is stable; the empirical record lives in docs/rendering/RENDERING_BUG_HISTORY.md, and the bench and instrument detail lives in references/.

Required skill chain (load all three, in order, every time)

  1. titan-rendering-core — frame-graph, DevicePool, RenderTarget, HDR, tonemap conventions.
  2. titan-rendering-3d — deferred chain, Stratum GI, ReGIR, SVGF/A-SVGF, TAA, shadows.
  3. realtime-rendering — generic real-time rendering literature + clean-room provenance rule.

These are hard dependencies. Skipping any = ungrounded work. The provenance rule from realtime-rendering is load-bearing for IP posture (never paste proprietary source into committed files).

The debug loop

  1. Reproduce + characterize. Get the artifact on screen with a known bench command (see references/bench-catalog.md). State what is seen, where, and under what motion.

  2. Frame isolation FIRST (lead technique — do this before trusting any aggregate metric). Capture a 60Hz frame sequence and:

    • Diff consecutive frames to localize temporal artifacts (flicker, boil, swim, ghosting).
    • A/B against classical lighting (or against the feature toggled off) to localize which pass owns the artifact.
    • Test across MOTION REGIMES, not one frame. For any denoiser / temporal / sampling change, evaluate at minimum the static hold AND the rotation (yaw). Bench phases: static = Phase 6, t12.5–15; yaw = Phases 7–8, t15–25; dolly = Phases 4–5, t7.5–12.5. A change can be worse in one regime and dramatically better in another: on 2026-06-05 the de-grid bilateral read as null or slightly worse on a static converged frame (the temporal pass had already done the work) but was a clear win during yaw (temporal reprojection is weakest in rotation, so the spatial pass fills the gap). A single-regime A/B can invert the truth. During rotation, per-frame frame-diff metrics are invalid because geometric motion dominates (~100:1); judge with a frame-aligned side-by-side video (perceive-together) plus the per-frame, within-frame spatial stipple, which is motion-free. Aggregate metrics (mean luma, global flicker score) are spatially blind to narrow artifacts, and per-frame magnitude metrics are blind to motion swim/crawl, because the eye grades temporal coherence under motion, not per-frame std. Both have produced repeated misdiagnoses here (see RENDERING_BUG_HISTORY.md: 2026-05-28 F2 seam; 2026-06-05 de-grid B). Per-ROI / per-pixel inspection and perceive-together beat a global number.
  3. Consult the literature. Use realtime-rendering for the canonical algorithm and its known failure modes. Match the observed artifact to a documented failure class before hypothesizing.

  4. Isolate via systematic-debugging. One hypothesis → gather empirical evidence (frame-diffs, VIZ modes, dumped scalars, toggle A/B) → confirm or kill → next. Do not skip to a fix.

  5. Consult the literature again to derive candidate fixes for the confirmed cause.

  6. Prove the fix, then record + hand off. Prove a candidate empirically in a throwaway worktree (capture 60Hz + classical A/B + bench numbers), then DISCARD the worktree. Append a RENDERING_BUG_HISTORY.md entry. The production landing is titan-rendering-3d-agent's job, not yours.
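The frame-diff part of step 2 can be sketched as a per-tile metric. This is a minimal illustration, not a Titan tool: the function names, the tile size, and the plain-list luma frames are all assumptions made for the example. It shows why a whole-frame mean hides a narrow flickering seam that a per-tile (per-ROI) score exposes. It applies to a static hold only; as noted above, frame diffs are not valid during rotation.

```python
def tile_diff(prev, curr, tile=4):
    """Mean absolute luma difference per tile between two frames.

    prev/curr are equal-sized 2D lists of luma floats (hypothetical input;
    a real run would load captured frames instead).
    """
    h, w = len(prev), len(prev[0])
    tiles = {}
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            total, count = 0.0, 0
            for y in range(ty, min(ty + tile, h)):
                for x in range(tx, min(tx + tile, w)):
                    total += abs(curr[y][x] - prev[y][x])
                    count += 1
            tiles[(ty // tile, tx // tile)] = total / count
    return tiles


def global_diff(prev, curr):
    """Whole-frame mean absolute difference (the aggregate metric)."""
    h, w = len(prev), len(prev[0])
    total = sum(abs(curr[y][x] - prev[y][x]) for y in range(h) for x in range(w))
    return total / (h * w)


# Synthetic 16x16 frames: static except a 1-pixel-wide seam that flickers.
H = W = 16
frame_a = [[0.5] * W for _ in range(H)]
frame_b = [row[:] for row in frame_a]
for y in range(4):
    frame_b[y][9] = 0.9  # narrow flicker: 4 pixels out of 256

tiles = tile_diff(frame_a, frame_b)
worst_tile = max(tiles, key=tiles.get)  # (tile_row, tile_col)
print("global mean diff:", global_diff(frame_a, frame_b))
print("worst tile:", worst_tile, "diff:", tiles[worst_tile])
```

In this synthetic case the tile containing the seam scores 16 times the global mean, so a threshold tuned on the global number would pass a frame that the per-tile view flags immediately.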

Durable symptom patterns (consult before hypothesizing)

  • pre-TAA tap is the denoiser-vs-estimator discriminator. --capture-target pre-taa taps the radiance_raw edge after StratumComposite, BEFORE TAA. If the artifact is ABSENT pre-TAA → TAA/denoiser owns it (e.g. 2026-06-20 static near-wall dark annulus = TAA YCoCg neighbourhood-stats corruption by a bright-core outlier; fix = depth/normal-gate the 3×3 stats, Bitterli 2020 §5). If the artifact is PRESENT pre-TAA → it is upstream (estimator / shadow / shading). Run this tap EARLY; it splits the pipeline in one shot and kills whole hypothesis branches.

  • Concentric RING shadows / a dark-then-light halo as a light nears or passes THROUGH thin geometry = SDF sphere-march thin-occluder overstep, NOT reuse bias and NOT TAA. shadow_sdf_global.wgsl steps by max(d, MIN_STEP); the nearest-seed distance reports ≈voxel_size one voxel from a ~1-voxel wall, so a phase-dependent step jumps clean over it → iso-phase rings (2026-06-20, FIXED c5af38b8). Isolate: TITAN_STRATUM_SHADOW_TERM_DISABLED=1 (rings gone ⇒ shadow term), --denoiser-tier low (no SDF march ⇒ gone), TITAN_SHADOW_SDF_HIT_THRESHOLD_M sweep (rings→leak is the overstep tell). Fix = MAX_STEP clamp (clamp(max(d,MIN_STEP), MIN_STEP, vox*0.5), Nyquist / Quilez 2010) + forward-only golden-ratio-dithered march start. A dark annulus is the textbook ReSTIR biased-reuse signature (Bitterli 2020 §4), which makes reuse the seductive-but-wrong lead — the REGIR_DIRECT=0 + --stratum-spatial-reuse off A/Bs falsified it here. Run the falsifying toggle before committing to the literature's first match.

  • --wall-pierce (probe) is the repro for through-wall shadow artifacts: camera held head-on 4 m from the wall, primary light swept in Z straight through the wall plane (crosses at the timeline midpoint). Decouples light-through-geometry from camera motion — far cleaner than the orbiting fleet.

  • Through-wall LIGHT LEAK (floor lit where a wall should shadow it) = the screen→SDF FORWARD-SKIP HANDOFF. shadow_screen.wgsl covers [0, 2 m], then hands start_dist_m to shadow_sdf_global.wgsl, which skipped forward by HANDOFF_EPSILON_SCALE×voxel. That produced a flank dead-zone (the skip oversteps real crossings) plus a center no-op (dist_to_light < 2 m → the SDF march starts at the light → 0 steps). FIX = drop the forward-skip and full-march from a receiver normal-offset origin (render_pos + n×voxel). The normal offset is mandatory: without it the bare full-march self-occludes the whole floor (RAY_BIAS 0.05 m < 0.25 m voxel). Canon = Wright 2015 (R1/R3). MERGED 649c1cb3 (PR #25; perceptual gate ACCEPTED 2026-06-21: the user accepted that R1+R3 reopens the grazing-corner over-occlusion, and the signed-SDF canonical fix is a future milestone). The five canon-ranged march params are now tier-gated RenderQuality knobs (MAX_MARCH_STEPS / HIT_THRESHOLD_M / MIN_STEP_M / SDF_NORMAL_OFFSET_VOXELS / RAY_BIAS_M); TITAN_SHADOW_SDF_* envs override on top.

  • SDF corner/grazing OVER-occlusion (false dark band; SDF-off recovers it) is an UNSIGNED-SDF DATA artifact, NOT a march-parameter bug. Titan's unsigned SDF + positive HIT_THRESHOLD false-hits grazing rays (parallel-near a wall, d>0 but <ε); Wright canon uses a SIGNED SDF with sign-crossing termination (d≤0 = inside) so grazing never false-hits. Tell: ALL march-side single-signal discriminators (distance skip, near-hit rejection, grazing normal-vs-ray) fail to separate the false corner hit from a real wall crossing — they are geometrically identical in the same cascade. That identity IS the signature that the fix is in the DATA (signed SDF / mesh-voxelized seeding), not the march. Don't chase distance/angle levers once they're shown to overlap.

  • VIZ-decode hazard: some scalar taps (sdf_march_debug, etc.) store channels LINEARLY in the .ppm, not Reinhard+gamma. Applying the Rgba16Float Reinhard inversion to a linear channel gives a wrong reading (here 16× too small → a false "near-receiver" distance). Confirm the tap encoding before trusting a decode; cross-check R/255×range.
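The thin-occluder overstep and its MAX_STEP clamp (the RING-shadow pattern above) can be reproduced in a 1D toy model. Everything here is an assumption made for illustration: the quantized distance field (an unsigned distance stored at voxel centres and read with a nearest-voxel lookup), the wall position, and the constants. It is not the shadow_sdf_global.wgsl code. The sketch only shows the mechanism: stepping by max(d, MIN_STEP) lets some start phases jump over a thin wall, while clamping the step to half a voxel does not.

```python
import math

VOXEL = 0.25           # voxel size in metres (matches the 0.25 m voxel quoted above)
MIN_STEP = 0.01        # assumed value
HIT_THRESHOLD = 0.05   # assumed value
WALL_X = 2.165         # thin wall, 0.04 m off the centre of its voxel (assumed)
LIGHT_X = 4.0          # light sits beyond the wall (assumed)


def sampled_distance(x):
    """Nearest-voxel lookup of an unsigned distance stored at voxel centres."""
    centre = (math.floor(x / VOXEL) + 0.5) * VOXEL
    return abs(centre - WALL_X)


def march(x, light_x, max_step=None, max_iters=256):
    """Return True if the march reports an occluder before reaching the light."""
    for _ in range(max_iters):
        if x >= light_x:
            return False          # reached the light: treated as unshadowed
        d = sampled_distance(x)
        if d < HIT_THRESHOLD:
            return True           # hit: treated as shadowed
        step = max(d, MIN_STEP)
        if max_step is not None:
            # Same result as clamp(max(d, MIN_STEP), MIN_STEP, max_step).
            step = min(step, max_step)
        x += step
    return False


# Sweep the ray start across one voxel to expose the phase dependence.
starts = [0.005 + 0.01 * j for j in range(25)]
leaks_unclamped = sum(not march(x0, LIGHT_X) for x0 in starts)
leaks_clamped = sum(not march(x0, LIGHT_X, max_step=0.5 * VOXEL) for x0 in starts)
print("unclamped leaks:", leaks_unclamped, "of", len(starts))
print("clamped leaks:  ", leaks_clamped, "of", len(starts))
```

In this toy, only the start phases nearest the far edge of the starting voxel leak when unclamped, which is the 1D analogue of the iso-phase rings. With the half-voxel clamp no start phase leaks, because a step shorter than the voxel width cannot skip the wall voxel.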

Debug-tool durability rules (apply to every instrument you add)

  1. Durable / reusable. No throwaway hacks. Build a VIZ mode, an env-tunable override, or a metric script the next session can re-run. Follow the existing TITAN_* VIZ-mode convention (see references/instrument-catalog.md).
  2. Never in compiled / editor / release builds without explicit permission. Gate behind the debug-dumps cargo feature OR a TITAN_* env, default-off, zero-cost and DCE'd when absent. A no-feature build must be byte-identical with the instrument absent. If you cannot keep it out of the runtime build, STOP and ask before adding it. Watch the gated-off trap: an if FLAG {…} else {…} is NOT inert when FLAG is false — the else still runs.
  3. Document on exit. Every new/changed instrument gets an entry in references/instrument-catalog.md (or bench-catalog.md) before the cycle closes.

Bug-history protocol

  • Before diagnosing: read docs/rendering/RENDERING_BUG_HISTORY.md — the artifact may be known.
  • After proving: append an entry using the schema there. CONFIRMED = isolation-proven only; everything else goes under UNCONFIRMED with its evidence gap. Empirical findings go to the bug-history; implementation-pattern / test learnings go to docs/agent-kb/titan-rendering-3d-agent/.

Pass manifest — first stop for per-pass I/O

Before hypothesizing about what a pass reads/writes, open docs/rendering/PASS_CATALOGUE.md (generated from crates/titan-rendering-3d/src/manifest.rs, debug-dumps-gated). It lists confirmed inputs, outputs, and dispatch kind for all 32 passes. Regenerate with:

UPDATE_GOLDEN=1 cargo nextest run -p titan-rendering-3d --features debug-dumps export

Full manifest invariants (catalogue design rules, verified pass order, SVGF defaults): KB docs/agent-kb/titan-rendering-3d-agent/sp-a-pass-manifest.md.

SP-D tap operator recipe (2026-06-03)

dump_stratum_stages and --dump-stages are DELETED. Use --tap / TITAN_TAP for all stage inspection.

Quick start:

# Tap diffuse temporal history + pre-firefly radiance, frames 60-65 only, once per id:
TITAN_TAP=diffuse_history,radiance_pre_firefly TITAN_TAP_WINDOW=60..65 TITAN_TAP_ONCE=1 \
  cargo run -p titan-render-bench-probe --features debug-dumps -- --cell deferred-stratum --denoiser-tier mid

# Bench CLI equivalent (TapArgs):
cargo run -p titan-render-bench-probe --features debug-dumps -- \
  --cell deferred-stratum --denoiser-tier mid \
  --tap diffuse_history,radiance_pre_firefly --dump-window 60..65 --dump-once --dump-label my-run

Env vars (SP-D additions):

env var | effect
TITAN_TAP_WINDOW=start..end | Half-open [start, end) frame window on absolute Renderer3d::frame_index. Supersedes TITAN_DUMP_AT_FRAME for range control.
TITAN_TAP_ONCE=1 | Fire each selected id at most once per run. Keyed by id; fires exactly once.
TITAN_DUMP_SUBDIR=<name> | Output subdir: titan-dumps/<name>/{basename}-frame{N}.{ext}. Absent = flat.

Bench CLI flags (TapArgs, all 3 benches):

  • --tap <ids> — comma-separated resource ids or "all"
  • --dump-buffers — selects all 11 BUFFER_RESOURCES (union with --tap)
  • --dump-window <s..e> — frame window on absolute frame_index
  • --dump-once — single-shot per id
  • --dump-label <name> — output subdir

Numpy decode by format:

format | decode
Rgba16Float (PPM) | Reinhard+gamma-encoded; PIL.Image.open(path) gives 8-bit sRGB
Rgba16Float (.bin) | np.frombuffer(open(f,'rb').read(), dtype=np.float16).reshape(H,W,4).astype(np.float32)
R32Float / R32Uint (.bin) | np.frombuffer(..., dtype=np.float32 / np.uint32).reshape(H,W)
Rg16Float (.bin) | np.frombuffer(..., dtype=np.float16).reshape(H,W,2)
Rg8Unorm (.bin) | np.frombuffer(..., dtype=np.uint8).reshape(H,W,2)
Rg32Float (.bin) | np.frombuffer(..., dtype=np.float32).reshape(H,W,2) (SP-D moments)
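The .bin rows of the table can be folded into one helper. This is a sketch under assumptions: the function name decode_bin and the BIN_FORMATS mapping are hypothetical, the caller is assumed to know H and W, and the dump is assumed to be tightly packed with no row padding. The size check is there so that a wrong format or padded rows fail loudly instead of reshaping into garbage. Widening Rg16Float to float32 is a convenience added here; the table only specifies it for Rgba16Float.

```python
import numpy as np

# dtype and channel count per format, following the decode table above.
BIN_FORMATS = {
    "Rgba16Float": (np.float16, 4),
    "R32Float": (np.float32, 1),
    "R32Uint": (np.uint32, 1),
    "Rg16Float": (np.float16, 2),
    "Rg8Unorm": (np.uint8, 2),
    "Rg32Float": (np.float32, 2),
}


def decode_bin(path, fmt, height, width):
    """Decode a raw tap dump into an array shaped (H, W) or (H, W, C)."""
    dtype, channels = BIN_FORMATS[fmt]
    with open(path, "rb") as f:
        data = np.frombuffer(f.read(), dtype=dtype)
    expected = height * width * channels
    if data.size != expected:
        raise ValueError(
            f"{path}: got {data.size} elements, expected {expected} for "
            f"{fmt} at {width}x{height} (wrong format, size, or row padding?)"
        )
    shape = (height, width) if channels == 1 else (height, width, channels)
    out = data.reshape(shape)
    if dtype is np.float16:
        out = out.astype(np.float32)  # widen halves before doing arithmetic
    return out
```

A call would look like decode_bin(path, "Rg32Float", H, W), with the path taken from the dump directory and H, W from the capture resolution.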

Hook-internal tap ids (SP-D): radiance_pre_firefly, probe_field_viz, diffuse_history, diffuse_moments, diffuse_variance. Mid/High only for temporal ids; None at Low/Potato = silent no-op. SVGF temporal is diffuse-only. à-trous per-iteration: already tapped under diffuse_raw (no separate ids).

--dump-window takes an ABSOLUTE frame_index. The probe auto-maps --capture-start/end/fps to a matching window when --dump-window is absent. Empirically: capture-start=1.0, fps=60 → frame 60.
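The window arithmetic can be modelled in a few lines. This is a toy model, not the probe's code: capture_to_window, in_window, and TapGate are hypothetical names, and the rule frame = round(seconds × fps) is an assumption chosen only because it agrees with the one empirical data point above (1.0 s at 60 fps → frame 60). It illustrates the half-open [start, end) window and the once-per-id behaviour.

```python
def capture_to_window(capture_start_s, capture_end_s, fps):
    """Map a capture time range to a frame window (assumed: round(t * fps))."""
    return round(capture_start_s * fps), round(capture_end_s * fps)


def in_window(frame_index, window):
    """Half-open test: start is included, end is excluded."""
    start, end = window
    return start <= frame_index < end


class TapGate:
    """Toy model of a frame window combined with a once-per-id latch."""

    def __init__(self, window, once):
        self.window = window
        self.once = once
        self.fired = set()

    def should_fire(self, tap_id, frame_index):
        if not in_window(frame_index, self.window):
            return False
        if self.once and tap_id in self.fired:
            return False
        self.fired.add(tap_id)
        return True


print(capture_to_window(1.0, 1.5, 60))

gate = TapGate((60, 65), once=False)
print([f for f in range(58, 68) if gate.should_fire("diffuse_history", f)])

latched = TapGate((60, 65), once=True)
ids = ("diffuse_history", "radiance_pre_firefly")
print([(t, f) for f in range(58, 68) for t in ids if latched.should_fire(t, f)])
```

With the window 60..65, frames 60 through 64 fire and frame 65 does not; with the once latch set, each id fires on frame 60 only.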

SVGF engine defaults (verified 2026-06-02)

  • atrous_passes default = 4 (NOT 5; changed at Artifact-A fix 2f19515b). Source: quality.rs:1003.
  • σ_L default = 6.0 (NOT 4.0; raised at m49-l-cellflicker-fix-g). Source: quality.rs:1011.
  • σ_L_specular default = 2.5 (NOT 1.5). Source: quality.rs:1012.

The bench-catalog.md table rows show the stale (5 / 4.0 / 1.5) defaults in the flag descriptions; the SP-E note at the top of that file documents the correct values. Trust the code (quality.rs) and the SP-E note, not the per-flag table defaults.

wgpu / naga gotchas

No wgpu skill is installed. For wgpu/naga issues, known traps live in the KB (e.g. R8Unorm / R8Uint / R16Float are NOT in the WebGPU baseline writable-storage set, so use R32*; >4 storage textures per stage needs Limits::default(), not downlevel_defaults()). For anything beyond that, use WebSearch/WebFetch against the wgpu/naga docs and issue tracker, and cite the source.

  • draw_indexed_indirect with first_instance != 0 is silently dropped unless the device enables Features::INDIRECT_FIRST_INSTANCE. Signature: in a GPU-driven path that draws each cull/draw group with a separate indirect command carrying first_instance = group base, only the first group (base 0) renders; every 2nd+ group produces zero fragments. Decisive isolation = G-buffer depth tap (--tap gbuffer_depth …): the missing group's region stays at the cleared depth (1.0) = never rasterized, vs written = drawn-but-dark. Confirm by forcing first_instance=0 on all commands → everything renders. (2026-06-20, a2b679d3.) Enabled now, but watch for it on any new indirect path / a different device. Pairs with: cull AABBs must use the real mesh extent (group.mesh.aabb()), never a unit cube scaled by the instance matrix — identity-transform meshes (extent baked in vertices) otherwise cull against a 1 m origin box.
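The depth-tap check for a dropped indirect draw can be scripted as a region classifier. This is a sketch under assumptions: the depth tap is taken to be already decoded into rows of floats, the ROI covering the suspect draw group is supplied by hand, and the function names are hypothetical. The clear value of 1.0 comes from the note above.

```python
def cleared_fraction(depth, roi, clear_depth=1.0, eps=1e-6):
    """Fraction of ROI pixels still at the cleared depth.

    depth is a 2D list of floats; roi is (y0, y1, x0, x1), half-open.
    """
    y0, y1, x0, x1 = roi
    values = [depth[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    cleared = sum(1 for v in values if abs(v - clear_depth) <= eps)
    return cleared / len(values)


def classify_region(depth, roi):
    """Separate 'never rasterized' from 'drawn' (possibly drawn-but-dark)."""
    if cleared_fraction(depth, roi) == 1.0:
        return "never rasterized"
    return "rasterized"


# Synthetic 4x8 depth buffer: left half drawn at depth 0.4, right half untouched.
depth = [[0.4] * 4 + [1.0] * 4 for _ in range(4)]
print("group 0 region:", classify_region(depth, (0, 4, 0, 4)))
print("group 1 region:", classify_region(depth, (0, 4, 4, 8)))
```

A region that reads "never rasterized" where geometry is expected points at the draw call itself (such as the first_instance case above), while a "rasterized" region that still looks black points downstream at shading or lighting.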

Anti-scope

  • Landing the production fix → titan-rendering-3d-agent.
  • 2D rendering / editor panels → out of scope.
  • IP: never cite or paste proprietary engine source per the realtime-rendering provenance rule.
Install via CLI
npx skills add https://github.com/Solidor777/Titan --skill titan-rendering-debugging
Repository Details
Branch: main
Path: SKILL.md