name: lvsa-tuning
description: Tune LVSA for quality vs speed. Use when adjusting sparsity_scale, choosing window_size and n_first_frames, deciding when --rotate-keyframes pays off, composing LVSA with RIFLEx, or hitting a quality regression and needing to back off sparsity.
LVSA Tuning
The three primary knobs
| Knob | What it controls | When to touch it |
|---|---|---|
| reference_latent_frames | Per-query attention budget anchor | Set once per model (Wan=21, HV=33, Cog=13). Don't change at runtime. |
| sparsity_scale | Multiplier on the budget | The runtime quality/speed dial. Default 1.0; lower = sparser. |
| window_size, n_first_frames | Local-window geometry | Usually leave at defaults (12 frames / 4 frames). Only touch if you want a tighter floor. |
sparsity_scale — the headline dial
LVSA_SPARSITY_SCALE (env var) or --sparsity-scale (CLI). Scales the auto-keyframe scheduler's per-query budget.
scaled_ref = max(n_first + 1, int(reference_frames × sparsity_scale))
target_attended = min(scaled_ref, T_lat)
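The two formulas above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scheduler's actual code: the function name and argument names are hypothetical, and n_first defaults to 1 latent frame as in the default geometry.

```python
def target_attended(reference_frames: int, sparsity_scale: float,
                    t_lat: int, n_first: int = 1) -> int:
    """Per-query attention budget in latent frames (illustrative sketch)."""
    # Scale the per-model anchor, but never drop below n_first + 1 frames.
    scaled_ref = max(n_first + 1, int(reference_frames * sparsity_scale))
    # A query cannot attend to more frames than the clip contains.
    return min(scaled_ref, t_lat)


if __name__ == "__main__":
    # HunyuanVideo anchor (33) at its training length, two dial settings.
    print(target_attended(33, 1.0, t_lat=33))  # 33 (fully dense)
    print(target_attended(33, 0.5, t_lat=33))  # 16
    # Wan anchor (21) on an 81-latent-frame clip.
    print(target_attended(21, 1.0, t_lat=81))  # 21
```

Because int() truncates, a scale of 0.5 on an odd anchor such as 33 rounds down to 16 rather than up to 17.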
Empirical results on HunyuanVideo at 129 frames (training reference, single-prompt "dog"):
| sparsity_scale | Per-step | Speedup vs dense | VQeval composite | VQeval loop |
|---|---|---|---|---|
| (dense baseline) | 44.0 s | n/a | 57.6 | 32.6 |
| 0.5 (aggressive) | 18.3 s | 2.40× | 65.2 (+7.6) | 73.6 (+41.0) |
| 1.0 (default) | 22.5 s | 1.96× | 61.3 (+3.7) | 63.0 (+30.4) |
Rule of thumb
| Goal | sparsity_scale | Why |
|---|---|---|
| Match dense quality at training reference, take implementation speedup | 1.0 | At T_lat ≤ ref this collapses to kfi=1 (fully dense). Speedup comes from bypassing native attention overhead. |
| Maximum speedup at training reference | 0.5 | Engages pattern-driven sparsity even at T_lat=ref. Big loop-quality gains; ~5pt drop on dynamic_quality. |
| Aggressive extrapolation, OOM-prevention at 3×+ horizon | 0.5 | Shrinks compact-K buffer, helps fit on 80 GB. |
| Conservative quality at extrapolation | 0.75 | Reduces sparsity gradient; less speedup but keeps motion intact. |
Important nuances
- At T_lat ≤ reference, any sparsity_scale ≥ 1.0 collapses to kfi=1 (fully dense). The visible speedup is implementation efficiency only.
- sparsity_scale = 2.0 is equivalent to 1.0 at T_lat ≤ reference (both give kfi=1). The conservative knob is meaningful only at extrapolation lengths.
- sparsity_scale = 0.5 activates real pattern sparsity even at training reference: HV's budget shrinks from 33 to 16 latents at 1×, giving ~52% coverage.
- The large loop_quality gain at s=0.5 comes from --rotate-keyframes dithering the attention pattern each step. Disable rotation and the loop gain disappears.
window_size + n_first_frames
Defaults:
- window_size = 12 video frames = 3 latent frames (W=3)
- n_first_frames = 4 video frames = 1 latent frame (n_first=1)
Floor of attended frames per query: 2W+1 + n_first = 8 latent frames.
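The floor arithmetic can be checked with a short sketch. It assumes a temporal compression of 4 video frames per latent frame, which is what the defaults above imply (12 video frames = 3 latent frames); the constant and helper names are hypothetical.

```python
# Assumption: inferred from "12 video frames = 3 latent frames" in the defaults.
VIDEO_FRAMES_PER_LATENT = 4


def to_latent(video_frames: int) -> int:
    """Convert a knob expressed in video frames to latent frames."""
    return video_frames // VIDEO_FRAMES_PER_LATENT


def attended_floor(window_size: int = 12, n_first_frames: int = 4) -> int:
    """Minimum latent frames each query attends to: 2W + 1 + n_first."""
    w = to_latent(window_size)
    n_first = to_latent(n_first_frames)
    return 2 * w + 1 + n_first


if __name__ == "__main__":
    print(attended_floor())                # defaults: 8
    print(attended_floor(window_size=16))  # W=4: 10
```

Under that assumption, a window_size of 16 (W=4) raises the floor to 10 latent frames, which is still below the smallest per-model anchor listed in the knob table (13).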
When to reduce W: never, unless your reference_latent_frames is below the floor. The defaults are tuned for current models.
When to increase W (e.g. W=4):
- Motion-heavy prompts losing dynamic_quality at extension: a bigger window means more long-range mixing inside each query's attended set.
- Costs ~10% wall time per W += 1.
Should I use --rotate-keyframes?
| At length | Without rotation | With rotation |
|---|---|---|
| T_lat ≤ reference (sparsity_scale ≥ 1.0) | No effect (kfi=1 means every frame is a global anyway) | No effect |
| Slight extension (T ≈ 1.5×) | Static keyframes can introduce period artifacts | Smoother |
| Heavy extension (T ≥ 3×) | Output starts to loop / freeze | Strongly preferred — this is the mechanism that prevents the "frozen video" failure mode |
Turn --rotate-keyframes on by default whenever you're extending or running sparsity_scale below 1.0. At the training horizon with sparsity_scale ≥ 1.0 it adds nothing.
Composing with RIFLEx
RIFLEx rescales the RoPE frequencies to extrapolate beyond the training horizon. It's orthogonal to LVSA (RoPE-only, no attention compute change) and stacks cleanly:
python examples/wan_generate.py \
--model /path/to/Wan2.1-T2V-1.3B-Diffusers \
--prompt "..." \
--num-frames 321 \
--lvsa --flashinfer --rotate-keyframes --auto-keyframes \
--riflex --riflex-s 4.0
At extension lengths RIFLEx + LVSA-FI is the recommended recipe. On the SotA grid (Wan 1.3B, 5 prompts):
| Horizon | LVSA-FI alone | LVSA-FI + RIFLEx |
|---|---|---|
| 2× | 1.43× faster than Dense | ~same speed, slight quality bump |
| 4× | 2.41× faster than Dense | ~same speed, +1 VQeval |
RIFLEx adds zero measurable wall-time overhead (verified: 0.99–1.00× Dense).
Verifying engagement
After every run, the [LVSA] log line tells you exactly what the scheduler did:
[LVSA] kfi=6 global_count=14 attended_per_frame=21/81
- kfi=6: every 6th frame is a periodic global anchor (auto-derived)
- global_count=14: total global frames in the pattern (n_first + periodic)
- attended_per_frame=21/81: each query attends to 21 frames out of 81 → 74% sparsity
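To check engagement across many runs, the log line can be parsed rather than read by eye. The sketch below assumes the line always has exactly the format shown in the example above; the regex and the function name are illustrative and not part of LVSA.

```python
import re

# Pattern for the log format shown above; field names are taken from that line.
LVSA_LINE = re.compile(
    r"\[LVSA\]\s+kfi=(\d+)\s+global_count=(\d+)\s+attended_per_frame=(\d+)/(\d+)"
)


def parse_lvsa_line(line: str) -> dict:
    """Extract scheduler fields from one [LVSA] log line and derive sparsity."""
    match = LVSA_LINE.search(line)
    if match is None:
        raise ValueError(f"not an [LVSA] log line: {line!r}")
    kfi, global_count, attended, total = map(int, match.groups())
    return {
        "kfi": kfi,
        "global_count": global_count,
        "attended": attended,
        "total": total,
        # Fraction of frames each query skips.
        "sparsity": 1.0 - attended / total,
    }


info = parse_lvsa_line("[LVSA] kfi=6 global_count=14 attended_per_frame=21/81")
print(f"{info['sparsity']:.0%}")  # 74%
```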
For non-default geometry, use the inline helper in docs/tuning.md to compute the budget yourself.
Diagnostics
| Symptom | Likely root cause | Fix |
|---|---|---|
| No quality improvement vs Dense | sparsity_scale too high at training horizon | Drop to 0.5 |
| Motion quality regressed | Window too small for fast-motion prompt | Try --window-size 16 (W=4) |
| Video loops at extension | --rotate-keyframes not set | Add the flag |
| attended_per_frame=N/T shows N==T at extension | reference_latent_frames too high | Verify per-model value |
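The N==T symptom is easy to automate once N and T have been read from the attended_per_frame field. A minimal sketch, assuming you pass in the per-model anchor you intended to use; the function name is hypothetical.

```python
def dense_at_extension(attended: int, total: int,
                       reference_latent_frames: int) -> bool:
    """True when the clip is longer than the anchor yet every frame is attended."""
    extending = total > reference_latent_frames
    return extending and attended == total


if __name__ == "__main__":
    # Wan anchor is 21: an 81-frame clip attending to all 81 frames is suspicious.
    print(dense_at_extension(81, 81, reference_latent_frames=21))  # True
    # The healthy case from the example log line.
    print(dense_at_extension(21, 81, reference_latent_frames=21))  # False
```

A True result means the configured reference_latent_frames is probably larger than the intended per-model value, so the scheduler never engages sparsity.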
See lvsa-troubleshooting for the full failure-mode catalog.