name: wan-flf-video
description: Build WAN 2.2 First-Last-Frame video workflows, using native dual hi-lo (required) and WanVideoWrapper VACE approaches
globs:
  - "**/*.json"
WAN 2.2 First-Last-Frame (FLF) Video Workflows
Overview
First-Last-Frame (FLF) video generation takes a start image and an end image and generates a smooth video transition between them. The WAN 2.2 I2V (Image-to-Video) 14B model excels at this.
CRITICAL: Dual Hi-Lo Architecture (REQUIRED)
WAN 2.2 I2V uses a split-noise architecture. Unlike WAN 2.1, the 2.2 model was trained with separate HighNoise and LowNoise components that handle different denoising ranges. You MUST use both models in a two-pass KSamplerAdvanced setup. Using a single model produces low-quality, broken output.
- HighNoise model (pass 1, steps 0→N/2): Establishes structure, motion, and composition
- LowNoise model (pass 2, steps N/2→N): Refines details and ensures fidelity to input frames
- Both passes share the same conditioning from WanFirstLastFrameToVideo
- Pass 1 returns a noisy latent → Pass 2 continues from there
NEVER use a single KSampler with only one model for WAN 2.2 I2V.
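The step split described above can be sketched in Python. This is a minimal illustration, assuming an even split at the midpoint as in the bullets above; the helper name `split_steps` is hypothetical and not part of ComfyUI.

```python
def split_steps(total_steps: int) -> dict:
    """Return the KSamplerAdvanced step ranges for the HighNoise and LowNoise passes."""
    if total_steps < 2:
        raise ValueError("need at least 2 steps to split across two models")
    midpoint = total_steps // 2  # Hi pass covers 0..midpoint, Lo pass covers midpoint..total
    return {
        "hi": {"start_at_step": 0, "end_at_step": midpoint},
        "lo": {"start_at_step": midpoint, "end_at_step": total_steps},
    }


print(split_steps(4))
# {'hi': {'start_at_step': 0, 'end_at_step': 2}, 'lo': {'start_at_step': 2, 'end_at_step': 4}}
```

The two ranges are contiguous, so the Lo pass picks up exactly where the Hi pass stopped.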
Two approaches are available:
- Native Dual Hi-Lo (Default): WanFirstLastFrameToVideo + dual KSamplerAdvanced two-pass
- WanVideoWrapper: WanVideoVACEStartToEndFrame + WanVideoVACEEncode + WanVideoSampler (VACE, caching, context windows)
Models
UNET Pairs (Always load BOTH Hi and Lo)
Remix NSFW (Recommended — built-in lightning, fp16):
| Model | Loader | Notes |
|---|---|---|
| Wan2.2_Remix_NSFW_i2v_14b_high_lighting_fp16_v2.1.safetensors | UNETLoader | HighNoise, built-in lightning acceleration |
| Wan2.2_Remix_NSFW_i2v_14b_low_lighting_fp16_v2.1.safetensors | UNETLoader | LowNoise, built-in lightning acceleration |
GGUF Q8 (Alternative — needs external lightning LoRAs):
| Model | Loader | Notes |
|---|---|---|
| Wan2.2-I2V-A14B-HighNoise-Q8_0.gguf | UnetLoaderGGUF | HighNoise, quantized |
| Wan2.2-I2V-A14B-LowNoise-Q8_0.gguf | UnetLoaderGGUF | LowNoise, quantized |
Official fp8:
| Model | Loader | Notes |
|---|---|---|
| wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors | UNETLoader | HighNoise, needs lightning LoRA |
| wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors | UNETLoader | LowNoise, needs lightning LoRA |
Text Encoder
| Model | Node | Notes |
|---|---|---|
| nsfw_wan_umt5-xxl_bf16_fixed.safetensors | CLIPLoaderGGUF (type=wan) | NSFW-tuned, pair with Remix models |
| umt5_xxl_fp8_e4m3fn_scaled.safetensors | CLIPLoader (type=wan) | Standard UMT5-XXL fp8 |
CLIP Vision + VAE
| Component | Node | Model |
|---|---|---|
| CLIP Vision | CLIPVisionLoader | clip_vision_h.safetensors |
| VAE | VAELoader | wan_2.1_vae.safetensors |
ModelSamplingSD3 (REQUIRED)
WAN 2.2 uses flow matching and requires ModelSamplingSD3 applied to each UNET:
{"class_type": "ModelSamplingSD3", "inputs": {"model": ["<unet>", 0], "shift": 5}}
shift=5 for lightning/Remix models. shift=8 for standard (non-lightning) models.
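When generating workflows programmatically, the shift choice can be tied to the model family. A minimal Python sketch, assuming only the two shift values stated above; the helper name `model_sampling_node` and the `lightning` flag are hypothetical.

```python
def model_sampling_node(unet_node_id: str, lightning: bool = True) -> dict:
    """Build a ModelSamplingSD3 node for one UNET (call once for Hi, once for Lo)."""
    shift = 5 if lightning else 8  # 5 for lightning/Remix, 8 for standard models
    return {
        "class_type": "ModelSamplingSD3",
        "inputs": {"model": [unet_node_id, 0], "shift": shift},
    }


hi_shift = model_sampling_node("1")                   # Remix HighNoise UNET
lo_shift = model_sampling_node("2")                   # Remix LowNoise UNET
standard = model_sampling_node("1", lightning=False)  # official fp8 without lightning
print(hi_shift["inputs"]["shift"], standard["inputs"]["shift"])  # prints: 5 8
```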
Lightning LoRAs
Remix NSFW models have lightning baked in — no external LoRA needed.
For GGUF/fp8 models, use paired hi/lo lightning LoRAs:
- wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors → HighNoise UNET
- wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors → LowNoise UNET
LoRA Stacks (rgthree)
Each model path has two stacked loaders (Common + Specific), each supporting 4 LoRA slots:
Hi path: UNETLoader(HN) → ModelSamplingSD3(shift=5) → Hi Common Stack → Hi Lora Stack → MODEL_HI
Lo path: UNETLoader(LN) → ModelSamplingSD3(shift=5) → Lo Common Stack → Lo Lora Stack → MODEL_LO
Common stacks hold shared LoRAs (quality/style). Specific stacks hold model-variant LoRAs. Set slots to "None" when unused. Even with no LoRAs, include the stacks — they pass CLIP through for text encoding.
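The Common → Specific chaining can be sketched as a small Python builder. This is an illustration only: the helper name `lora_stack_node` is hypothetical, and the input names (`lora_01` ... `strength_04`) are taken from the complete workflow in this document.

```python
def lora_stack_node(model_link, clip_link, loras=(), title=""):
    """Build one 'Lora Loader Stack (rgthree)' node; unused slots are set to "None"."""
    if len(loras) > 4:
        raise ValueError("each stack supports at most 4 LoRA slots")
    inputs = {"model": list(model_link), "clip": list(clip_link)}
    for slot in range(1, 5):
        # Fill slots in order; pad the rest with the "None" placeholder.
        name, strength = loras[slot - 1] if slot <= len(loras) else ("None", 1)
        inputs[f"lora_{slot:02d}"] = name
        inputs[f"strength_{slot:02d}"] = strength
    return {
        "class_type": "Lora Loader Stack (rgthree)",
        "inputs": inputs,
        "_meta": {"title": title},
    }


# Hi path: ModelSamplingSD3 (node "8") -> Hi Common ("10") -> Hi Lora ("11")
hi_common = lora_stack_node(["8", 0], ["3", 0], title="Hi Common")
hi_lora = lora_stack_node(["10", 0], ["10", 1], title="Hi Lora")
```

The second stack takes both MODEL and CLIP from the first, which is why empty stacks still need to be present in the chain.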
Image Resizing (ImageResizeKJv2)
Input frames MUST be resized to the target video resolution before FLF and CLIPVisionEncode. The end frame inherits width/height from the start frame's resize to ensure matching dimensions.
{"class_type": "ImageResizeKJv2", "inputs": {
"image": ["<load_image>", 0], "width": 480, "height": 720,
"upscale_method": "nearest-exact", "keep_proportion": "crop",
"pad_color": "0, 0, 0", "crop_position": "center", "divisible_by": 2
}}
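To keep both frames at identical dimensions, the end-frame resize can link its width and height to the start-frame resize outputs. A minimal Python sketch of that pairing; the helper name `resize_pair` is hypothetical, and output indices 1 (width) and 2 (height) follow the complete workflow in this document.

```python
def resize_pair(start_image_id, end_image_id, start_resize_id, width=480, height=720):
    """Build the start and end ImageResizeKJv2 nodes with matching dimensions."""
    common = {
        "upscale_method": "nearest-exact", "keep_proportion": "crop",
        "pad_color": "0, 0, 0", "crop_position": "center", "divisible_by": 2,
    }
    start = {"class_type": "ImageResizeKJv2", "inputs": {
        "image": [start_image_id, 0], "width": width, "height": height, **common}}
    # The end frame does not hard-code a size: it reads width/height from the start resize.
    end = {"class_type": "ImageResizeKJv2", "inputs": {
        "image": [end_image_id, 0],
        "width": [start_resize_id, 1], "height": [start_resize_id, 2], **common}}
    return start, end


start_node, end_node = resize_pair("6", "7", start_resize_id="14")
print(end_node["inputs"]["width"])  # prints: ['14', 1]
```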
KSamplerAdvanced Two-Pass Settings
| Parameter | Pass 1 (Hi) | Pass 2 (Lo) |
|---|---|---|
| model | Hi LoRA stack output | Lo LoRA stack output |
| add_noise | enable | disable |
| steps | 4 | 4 |
| cfg | 1 | 1 |
| sampler_name | uni_pc | uni_pc |
| scheduler | beta | beta |
| start_at_step | 0 | 2 |
| end_at_step | 2 | 4 |
| return_with_leftover_noise | enable | disable |
| latent_image | WanFLF output[2] | Pass 1 output[0] |
Both passes share the same positive/negative conditioning from WanFirstLastFrameToVideo outputs [0] and [1].
For standard (non-lightning) models: steps=20, split at step 10, cfg=4, sampler=euler, scheduler=simple, shift=8.
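The table and the standard-model variant can be expressed as one Python builder. This is a sketch under the settings listed above; the helper name `two_pass_samplers` and its parameters are hypothetical, and the node IDs passed in are placeholders.

```python
def two_pass_samplers(model_hi, model_lo, flf_id, hi_id, seed=0, lightning=True):
    """Build the Hi and Lo KSamplerAdvanced nodes for one FLF generation."""
    if lightning:
        steps, split, cfg, sampler, scheduler = 4, 2, 1, "uni_pc", "beta"
    else:
        steps, split, cfg, sampler, scheduler = 20, 10, 4, "euler", "simple"

    # Both passes share conditioning, seed, step count, and sampler settings.
    shared = {
        "positive": [flf_id, 0], "negative": [flf_id, 1],
        "noise_seed": seed, "steps": steps, "cfg": cfg,
        "sampler_name": sampler, "scheduler": scheduler,
    }
    hi = {"class_type": "KSamplerAdvanced", "inputs": {
        **shared, "model": list(model_hi), "latent_image": [flf_id, 2],
        "add_noise": "enable", "start_at_step": 0, "end_at_step": split,
        "return_with_leftover_noise": "enable"}}
    lo = {"class_type": "KSamplerAdvanced", "inputs": {
        **shared, "model": list(model_lo), "latent_image": [hi_id, 0],
        "add_noise": "disable", "start_at_step": split, "end_at_step": steps,
        "return_with_leftover_noise": "disable"}}
    return hi, lo


hi_pass, lo_pass = two_pass_samplers(["11", 0], ["13", 0], flf_id="20", hi_id="21")
```

Only the Hi pass adds noise and returns leftover noise; the Lo pass starts from that partially denoised latent and finishes it.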
Negative Prompt (REQUIRED)
Always include a quality negative prompt:
The tones are vibrant, overexposed, static, details are unclear, subtitles, style, work, painting, image, still, overall grayish, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, distorted limbs, merged fingers, motionless image, cluttered background, three legs, many people in the background, walking backwards
Node: WanFirstLastFrameToVideo
Required Inputs:
- positive: CONDITIONING (from CLIPTextEncode)
- negative: CONDITIONING (from CLIPTextEncode with negative prompt)
- vae: VAE
- width: INT (from ImageResizeKJv2 end frame output[1])
- height: INT (from ImageResizeKJv2 end frame output[2])
- length: INT (default 81, step 4) — number of frames
- batch_size: INT (default 1)
Optional Inputs:
- clip_vision_start_image: CLIP_VISION_OUTPUT (from CLIPVisionEncode)
- clip_vision_end_image: CLIP_VISION_OUTPUT (from CLIPVisionEncode)
- start_image: IMAGE (resized start frame)
- end_image: IMAGE (resized end frame)
Outputs:
- [0] positive: CONDITIONING → feed to BOTH Hi and Lo KSamplerAdvanced
- [1] negative: CONDITIONING → feed to BOTH Hi and Lo KSamplerAdvanced
- [2] latent: LATENT → feed to Hi Pass only (Lo Pass gets Hi Pass output)
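The output routing above can be checked on an API-format workflow before queuing it. A minimal Python sketch; `check_flf_wiring` is a hypothetical helper, and it assumes the workflow contains exactly one WanFirstLastFrameToVideo node and two KSamplerAdvanced nodes.

```python
def check_flf_wiring(workflow: dict) -> list:
    """Return a list of wiring problems; an empty list means the checks passed."""
    flf_ids = [nid for nid, node in workflow.items()
               if node["class_type"] == "WanFirstLastFrameToVideo"]
    samplers = {nid: node["inputs"] for nid, node in workflow.items()
                if node["class_type"] == "KSamplerAdvanced"}
    if len(flf_ids) != 1:
        return [f"expected 1 WanFirstLastFrameToVideo node, found {len(flf_ids)}"]
    if len(samplers) != 2:
        return [f"expected 2 KSamplerAdvanced nodes, found {len(samplers)}"]

    flf = flf_ids[0]
    problems = []
    for nid, inputs in samplers.items():
        if inputs.get("positive") != [flf, 0]:
            problems.append(f"sampler {nid}: positive must be FLF output 0")
        if inputs.get("negative") != [flf, 1]:
            problems.append(f"sampler {nid}: negative must be FLF output 1")

    # Exactly one sampler (the Hi pass) may consume the FLF latent directly.
    hi = [nid for nid, inputs in samplers.items()
          if inputs.get("latent_image") == [flf, 2]]
    if len(hi) != 1:
        problems.append("exactly one sampler (Hi pass) must take FLF output 2 as latent")
    else:
        lo = next(nid for nid in samplers if nid != hi[0])
        if samplers[lo].get("latent_image") != [hi[0], 0]:
            problems.append(f"sampler {lo}: latent must come from Hi pass {hi[0]}")
    return problems


example = {
    "20": {"class_type": "WanFirstLastFrameToVideo", "inputs": {}},
    "21": {"class_type": "KSamplerAdvanced", "inputs": {
        "positive": ["20", 0], "negative": ["20", 1], "latent_image": ["20", 2]}},
    "22": {"class_type": "KSamplerAdvanced", "inputs": {
        "positive": ["20", 0], "negative": ["20", 1], "latent_image": ["21", 0]}},
}
print(check_flf_wiring(example))  # prints: []
```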
Pipeline Flow
UNETLoader (HighNoise) → ModelSamplingSD3 (shift=5) → Hi Common Stack → Hi Lora Stack → MODEL_HI
UNETLoader (LowNoise) → ModelSamplingSD3 (shift=5) → Lo Common Stack → Lo Lora Stack → MODEL_LO
CLIPLoaderGGUF (wan) → CLIP
├─ CLIPTextEncode (positive) → CONDITIONING
└─ CLIPTextEncode (negative) → CONDITIONING
CLIPVisionLoader → CLIPVisionEncode (start) + CLIPVisionEncode (end)
VAELoader → VAE
LoadImage (start) → ImageResizeKJv2 (480x720) → resized start
LoadImage (end) → ImageResizeKJv2 (match dims) → resized end
WanFirstLastFrameToVideo (positive, negative, vae, clip_vision_start, clip_vision_end,
start_image, end_image, width/height from resize)
→ modified positive [0], modified negative [1], latent [2]
KSamplerAdvanced (Hi: MODEL_HI, steps 0→2, add_noise=enable, return_leftover=enable)
→ noisy LATENT
KSamplerAdvanced (Lo: MODEL_LO, steps 2→4, add_noise=disable, return_leftover=disable)
→ final LATENT
VAEDecode → IMAGE → VHS_VideoCombine (raw output)
→ VRAM_Debug → SeedVR2VideoUpscaler (1080p) → VHS_VideoCombine (upscaled)
Complete Workflow: Native FLF (Remix NSFW + Lightning)
{
"1": { "class_type": "UNETLoader", "inputs": { "unet_name": "Wan2.2_Remix_NSFW_i2v_14b_high_lighting_fp16_v2.1.safetensors", "weight_dtype": "default" }, "_meta": { "title": "UNET HighNoise" }},
"2": { "class_type": "UNETLoader", "inputs": { "unet_name": "Wan2.2_Remix_NSFW_i2v_14b_low_lighting_fp16_v2.1.safetensors", "weight_dtype": "default" }, "_meta": { "title": "UNET LowNoise" }},
"3": { "class_type": "CLIPLoaderGGUF", "inputs": { "clip_name": "nsfw_wan_umt5-xxl_bf16_fixed.safetensors", "type": "wan" }},
"4": { "class_type": "CLIPVisionLoader", "inputs": { "clip_name": "clip_vision_h.safetensors" }},
"5": { "class_type": "VAELoader", "inputs": { "vae_name": "wan_2.1_vae.safetensors" }},
"6": { "class_type": "LoadImage", "inputs": { "image": "<start_image.png>" }, "_meta": { "title": "Start Frame" }},
"7": { "class_type": "LoadImage", "inputs": { "image": "<end_image.png>" }, "_meta": { "title": "End Frame" }},
"8": { "class_type": "ModelSamplingSD3", "inputs": { "model": ["1", 0], "shift": 5 }, "_meta": { "title": "Hi Shift" }},
"9": { "class_type": "ModelSamplingSD3", "inputs": { "model": ["2", 0], "shift": 5 }, "_meta": { "title": "Lo Shift" }},
"10": { "class_type": "Lora Loader Stack (rgthree)", "inputs": {
"model": ["8", 0], "clip": ["3", 0],
"lora_01": "None", "strength_01": 1, "lora_02": "None", "strength_02": 1,
"lora_03": "None", "strength_03": 1, "lora_04": "None", "strength_04": 1
}, "_meta": { "title": "Hi Common" }},
"11": { "class_type": "Lora Loader Stack (rgthree)", "inputs": {
"model": ["10", 0], "clip": ["10", 1],
"lora_01": "None", "strength_01": 1, "lora_02": "None", "strength_02": 1,
"lora_03": "None", "strength_03": 1, "lora_04": "None", "strength_04": 1
}, "_meta": { "title": "Hi Lora" }},
"12": { "class_type": "Lora Loader Stack (rgthree)", "inputs": {
"model": ["9", 0], "clip": ["3", 0],
"lora_01": "None", "strength_01": 1, "lora_02": "None", "strength_02": 1,
"lora_03": "None", "strength_03": 1, "lora_04": "None", "strength_04": 1
}, "_meta": { "title": "Low Common" }},
"13": { "class_type": "Lora Loader Stack (rgthree)", "inputs": {
"model": ["12", 0], "clip": ["12", 1],
"lora_01": "None", "strength_01": 1, "lora_02": "None", "strength_02": 1,
"lora_03": "None", "strength_03": 1, "lora_04": "None", "strength_04": 1
}, "_meta": { "title": "Low Lora" }},
"14": { "class_type": "ImageResizeKJv2", "inputs": {
"image": ["6", 0], "width": 480, "height": 720,
"upscale_method": "nearest-exact", "keep_proportion": "crop",
"pad_color": "0, 0, 0", "crop_position": "center", "divisible_by": 2
}, "_meta": { "title": "Resize Start" }},
"15": { "class_type": "ImageResizeKJv2", "inputs": {
"image": ["7", 0], "width": ["14", 1], "height": ["14", 2],
"upscale_method": "nearest-exact", "keep_proportion": "crop",
"pad_color": "0, 0, 0", "crop_position": "center", "divisible_by": 2
}, "_meta": { "title": "Resize End" }},
"16": { "class_type": "CLIPVisionEncode", "inputs": { "clip_vision": ["4", 0], "image": ["14", 0], "crop": "center" }},
"17": { "class_type": "CLIPVisionEncode", "inputs": { "clip_vision": ["4", 0], "image": ["15", 0], "crop": "center" }},
"18": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["11", 1], "text": "<positive prompt>" }, "_meta": { "title": "Positive" }},
"19": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["11", 1], "text": "The tones are vibrant, overexposed, static, details are unclear, subtitles, style, work, painting, image, still, overall grayish, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, distorted limbs, merged fingers, motionless image, cluttered background, three legs, many people in the background, walking backwards" }, "_meta": { "title": "Negative" }},
"20": { "class_type": "WanFirstLastFrameToVideo", "inputs": {
"positive": ["18", 0], "negative": ["19", 0], "vae": ["5", 0],
"clip_vision_start_image": ["16", 0], "clip_vision_end_image": ["17", 0],
"start_image": ["14", 0], "end_image": ["15", 0],
"width": ["15", 1], "height": ["15", 2], "length": 81, "batch_size": 1
}},
"21": { "class_type": "KSamplerAdvanced", "inputs": {
"model": ["11", 0], "positive": ["20", 0], "negative": ["20", 1], "latent_image": ["20", 2],
"add_noise": "enable", "noise_seed": 0, "steps": 4, "cfg": 1,
"sampler_name": "uni_pc", "scheduler": "beta",
"start_at_step": 0, "end_at_step": 2, "return_with_leftover_noise": "enable"
}, "_meta": { "title": "Hi Pass" }},
"22": { "class_type": "KSamplerAdvanced", "inputs": {
"model": ["13", 0], "positive": ["20", 0], "negative": ["20", 1], "latent_image": ["21", 0],
"add_noise": "disable", "noise_seed": 0, "steps": 4, "cfg": 1,
"sampler_name": "uni_pc", "scheduler": "beta",
"start_at_step": 2, "end_at_step": 4, "return_with_leftover_noise": "disable"
}, "_meta": { "title": "Lo Pass" }},
"23": { "class_type": "VAEDecode", "inputs": { "samples": ["22", 0], "vae": ["5", 0] }},
"24": { "class_type": "VHS_VideoCombine", "inputs": {
"images": ["23", 0], "frame_rate": 16, "loop_count": 0,
"filename_prefix": "wan_flf", "format": "video/h264-mp4",
"pingpong": false, "save_output": true,
"pix_fmt": "yuv420p", "crf": 19, "save_metadata": true, "trim_to_audio": false
}}
}
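When editing a workflow like the one above by hand, it is easy to leave a link pointing at a node ID that no longer exists. A small Python sketch that scans for such dangling links; `dangling_links` is a hypothetical helper, and it assumes links are two-element lists of the form `[node_id, output_index]`, as in the JSON above.

```python
def dangling_links(workflow: dict) -> list:
    """List (node_id, input_name, target_id) for links that point at a missing node."""
    missing = []
    for node_id, node in workflow.items():
        for name, value in node.get("inputs", {}).items():
            is_link = (isinstance(value, list) and len(value) == 2
                       and isinstance(value[0], str) and isinstance(value[1], int))
            if is_link and value[0] not in workflow:
                missing.append((node_id, name, value[0]))
    return missing


# Fragment in which the Lo pass (node "22") was deleted by mistake.
fragment = {
    "5": {"class_type": "VAELoader", "inputs": {"vae_name": "wan_2.1_vae.safetensors"}},
    "23": {"class_type": "VAEDecode", "inputs": {"samples": ["22", 0], "vae": ["5", 0]}},
}
print(dangling_links(fragment))  # prints: [('23', 'samples', '22')]
```

To check a saved file, load it with `json.load` and pass the resulting dict to the same function.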
Optional: Video Upscaling with SeedVR2
Add after VAEDecode for AI-powered video upscaling to 1080p. Use VRAM_Debug to free VRAM between generation and upscaling:
{
"25": { "class_type": "VRAM_Debug", "inputs": {
"image_pass": ["23", 0], "empty_cache": true, "gc_collect": true, "unload_all_models": true
}},
"26": { "class_type": "SeedVR2LoadDiTModel", "inputs": {
"model": "seedvr2_ema_3b_fp8_e4m3fn.safetensors", "device": "cuda:0",
"blocks_to_swap": 0, "swap_io_components": false, "cache_model": false, "attention_mode": "sdpa"
}},
"27": { "class_type": "SeedVR2LoadVAEModel", "inputs": {
"model": "ema_vae_fp16.safetensors", "device": "cuda:0",
"encode_tiled": false, "decode_tiled": false, "cache_model": false
}},
"28": { "class_type": "SeedVR2VideoUpscaler", "inputs": {
"image": ["25", 1], "dit": ["26", 0], "vae": ["27", 0],
"seed": 0, "resolution": 1080, "max_resolution": 0,
"batch_size": 5, "uniform_batch_size": false, "color_correction": "lab"
}},
"29": { "class_type": "VHS_VideoCombine", "inputs": {
"images": ["28", 0], "frame_rate": 16, "loop_count": 0,
"filename_prefix": "wan_flf_upscaled", "format": "video/h264-mp4",
"pingpong": false, "save_output": true,
"pix_fmt": "yuv420p", "crf": 19, "save_metadata": true, "trim_to_audio": false
}}
}
Alternative: GGUF Models with Lightning LoRAs
When using GGUF Q8 models instead of Remix, add paired lightning LoRAs:
Hi path: UnetLoaderGGUF(HN Q8) → ModelSamplingSD3(shift=5) → LoraLoaderModelOnly(hi_noise_lightning) → Hi Common Stack → Hi Lora Stack
Lo path: UnetLoaderGGUF(LN Q8) → ModelSamplingSD3(shift=5) → LoraLoaderModelOnly(lo_noise_lightning) → Lo Common Stack → Lo Lora Stack
LoRA files:
- Unknown\no tags\wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors
- Unknown\no tags\wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors
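The extra LoRA step on each path can be sketched in Python as follows. The helper name `lightning_lora_node` is hypothetical, and the strength of 1.0 is an assumption that this document does not specify; the LoRA path must match how the file is stored in your own loras folder.

```python
def lightning_lora_node(model_link, lora_name, strength=1.0):
    """Build a LoraLoaderModelOnly node that sits between ModelSamplingSD3 and the stacks."""
    return {
        "class_type": "LoraLoaderModelOnly",
        "inputs": {
            "model": list(model_link),
            "lora_name": lora_name,
            "strength_model": strength,  # assumed default, not stated in this document
        },
    }


# Node IDs "8" and "9" are the Hi and Lo ModelSamplingSD3 nodes from the complete workflow.
hi_lightning = lightning_lora_node(
    ["8", 0], "wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors")
lo_lightning = lightning_lora_node(
    ["9", 0], "wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors")
```

The Common stacks would then take their `model` input from these nodes instead of directly from ModelSamplingSD3.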
Approach 2: WanVideoWrapper (Advanced Control)
Uses the WanVideoWrapper custom node pack for more control over conditioning, caching, context windows, and advanced features.
Key Differences from Native
- Uses WANVIDEOMODEL type instead of generic MODEL
- Uses WANVIDIMAGE_EMBEDS for conditioning instead of CONDITIONING
- Has its own sampler (WanVideoSampler) with shift parameter and scheduler options
- Supports TeaCache, MagCache, EasyCache for speed optimization
- Supports context windows for longer videos
- VACE module provides more flexible frame conditioning
VACE-Based FLF Pipeline
WanVideoModelLoader → WANVIDEOMODEL
WanVideoVAELoader → WANVAE
WanVideoTextEncode → WANVIDEOTEXTEMBEDS
WanVideoClipVisionEncode (start + end images) → WANVIDIMAGE_CLIPEMBEDS
WanVideoVACEStartToEndFrame (start_image, end_image, num_frames=81)
→ images batch, masks
WanVideoVACEEncode (vae, input_frames, input_masks, width, height, num_frames)
→ WANVIDIMAGE_EMBEDS (vace_embeds)
WanVideoSampler (model, image_embeds, text_embeds, steps, cfg, shift, scheduler)
→ LATENT
WanVideoDecode (vae, samples) → IMAGE → VHS_VideoCombine → MP4
WanVideoSampler Settings
| Parameter | Standard | Lightning | Notes |
|---|---|---|---|
| steps | 30 | 4 | |
| cfg | 6.0 | 1.0 | |
| shift | 5.0 | 5.0 | Flow matching shift |
| scheduler | unipc | euler | WanVideoWrapper has own schedulers |
| force_offload | true | true | Move model to CPU after sampling |
When to Use WanVideoWrapper vs Native
| Feature | Native | WanVideoWrapper |
|---|---|---|
| Simplicity | Simpler | More complex |
| Dual Hi-Lo | Manual two-pass | May handle internally |
| LoRA loading | Lora Loader Stack (rgthree) | WanVideoLoraSelect / WanVideoSetLoRAs |
| Caching (TeaCache) | Not available | Built-in |
| Context windows | Not available | WanVideoContextOptions |
| Block swap (VRAM) | Not available | WanVideoBlockSwap |
| VACE conditioning | Not available | Full VACE support |
| Long video (>81 frames) | Limited | InfiniteTalk / context windows |
Recommendation: Use Native dual hi-lo for standard FLF transitions. Use WanVideoWrapper when you need caching, context windows, VRAM management, or advanced conditioning.
Resolution & Frame Count
Standard Resolutions
| Aspect | Resolution | Megapixels |
|---|---|---|
| Portrait 2:3 | 480x720 | 0.35MP (recommended default) |
| Landscape 16:9 | 832x480 | 0.4MP |
| Portrait 9:16 | 480x832 | 0.4MP |
| Square | 640x640 | 0.4MP |
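Width and height must be divisible by 16; all standard resolutions above satisfy this. Use ImageResizeKJv2 with keep_proportion: crop. Note that divisible_by: 2 only prevents odd dimensions, so choose a 16-aligned target width and height yourself.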
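The divisibility rule can be checked before a workflow is built. A minimal Python sketch; the helper names are hypothetical, and rounding down to the nearest multiple is a design choice made here, not something this document prescribes.

```python
def is_valid_resolution(width: int, height: int, multiple: int = 16) -> bool:
    """True when both dimensions are divisible by the required multiple."""
    return width % multiple == 0 and height % multiple == 0


def snap_down(value: int, multiple: int = 16) -> int:
    """Round a dimension down to the nearest multiple (never below one multiple)."""
    return max(multiple, (value // multiple) * multiple)


for w, h in [(480, 720), (832, 480), (480, 832), (640, 640)]:
    assert is_valid_resolution(w, h)  # every standard resolution passes

print(snap_down(725), snap_down(480))  # prints: 720 480
```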
Frame Count
- 81 frames at 16fps = ~5 seconds (default, recommended)
- 49 frames at 16fps = ~3 seconds (faster, less motion)
- 121 frames at 16fps = ~7.5 seconds (longer, more VRAM)
- Frame count should be 4n + 1 (1, 5, 9, ..., 49, 81, 121)
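The 4n + 1 rule and the durations above can be sketched in Python. The helper names are hypothetical; snapping to the nearest valid count (ties round up) is a choice made for this example.

```python
def snap_frame_count(frames: int) -> int:
    """Return the nearest frame count of the form 4n + 1 (ties round up)."""
    if frames < 1:
        raise ValueError("frame count must be at least 1")
    return ((frames + 1) // 4) * 4 + 1


def duration_seconds(frames: int, fps: int = 16) -> float:
    """Clip length in seconds at the given frame rate."""
    return frames / fps


print(snap_frame_count(80))   # prints: 81
print(duration_seconds(81))   # prints: 5.0625
print(duration_seconds(121))  # prints: 7.5625
```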
Frame Rate
Standard: 16 fps for WAN 2.2 output.
Video Output
VHS_VideoCombine
{
"class_type": "VHS_VideoCombine",
"inputs": {
"images": ["<vae_decode>", 0],
"frame_rate": 16,
"loop_count": 0,
"filename_prefix": "wan_flf",
"format": "video/h264-mp4",
"pingpong": false,
"save_output": true,
"pix_fmt": "yuv420p",
"crf": 19,
"save_metadata": true,
"trim_to_audio": false
}
}
VRAM Considerations
Dual Hi-Lo with Remix fp16
- Two UNETs loaded sequentially (ComfyUI offloads between passes): ~14GB each
- NSFW UMT5-XXL bf16: ~8GB (offloaded after text encoding)
- CLIP Vision H: ~1.5GB (offloaded after encoding)
- VAE: ~200MB
- Latent (81 frames at 480x720): ~1-2GB
ComfyUI manages VRAM by offloading models between passes. The Hi UNET is offloaded before the Lo UNET loads.
Tips
- Always clear_vram before switching to WAN from another model family
- Use VRAM_Debug node between generation and SeedVR2 upscaling to free all VRAM
- For 24GB GPUs, 81 frames at 480x720 is the practical maximum
- Remix NSFW models have lightning baked in: no separate LoRA needed, 4 steps total
Morph LoRAs (Smooth Metamorphosis)
By default, FLF produces a transition/dissolve between frames. For true morphing (one shape seamlessly reshaping into another), use a morph LoRA on both Hi and Lo paths.
Magical Morph (Recommended)
| Variant | File | Strength | Notes |
|---|---|---|---|
| HighNoise | wan2.2_i2v_magical_morph_highnoise.safetensors | 0.7-1.0 | Apply to Hi Common stack |
| LowNoise | wan2.2_i2v_magical_morph_lownoise.safetensors | 0.7-1.0 | Apply to Lo Common stack |
- Source: NikolaSigmoid/wan2.2-i2v-loras-magical-morph
- No trigger word needed — the LoRA modifies the denoising behavior
- Strength 1.0 can add visual sparkle/particle effects. Reduce to 0.7-0.8 for cleaner morphs
- Works with Remix NSFW models (no conflict with built-in lightning)
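Applying the morph LoRA pair to both Common stacks can be sketched in Python. This is an illustration only: `add_lora` is a hypothetical helper that fills the first free slot of a 'Lora Loader Stack (rgthree)' node, using the slot names from the complete workflow in this document; the strength of 0.8 is one value from the recommended range.

```python
def add_lora(stack_node: dict, lora_name: str, strength: float) -> dict:
    """Return a copy of a stack node with the LoRA placed in its first free slot."""
    inputs = dict(stack_node["inputs"])
    for slot in range(1, 5):
        key = f"lora_{slot:02d}"
        if inputs.get(key, "None") == "None":
            inputs[key] = lora_name
            inputs[f"strength_{slot:02d}"] = strength
            return {**stack_node, "inputs": inputs}
    raise ValueError("all 4 LoRA slots are in use")


def empty_stack(model_link, clip_link):
    """Helper for this example: a stack node with all four slots unused."""
    inputs = {"model": model_link, "clip": clip_link}
    for slot in range(1, 5):
        inputs[f"lora_{slot:02d}"] = "None"
        inputs[f"strength_{slot:02d}"] = 1
    return {"class_type": "Lora Loader Stack (rgthree)", "inputs": inputs}


# The Hi and Lo variants go on their matching paths with the same strength.
hi_common = add_lora(empty_stack(["8", 0], ["3", 0]),
                     "wan2.2_i2v_magical_morph_highnoise.safetensors", 0.8)
lo_common = add_lora(empty_stack(["9", 0], ["3", 0]),
                     "wan2.2_i2v_magical_morph_lownoise.safetensors", 0.8)
print(hi_common["inputs"]["lora_01"])  # prints: wan2.2_i2v_magical_morph_highnoise.safetensors
```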
SkinMorph Redmond (Alternative — Face/Body Focus)
For person-to-person morphs (identity, gender transforms):
- Trigger word: Skin morph
- Strength: 0.8-1.0
- Source: CivitAI
Prompt Tips
Describe the transition motion, not just the start/end states:
Good: "A small cat sitting on the ground smoothly transforms and grows into a woman standing tall, seamless transformation, cinematic"
Bad: "A cat and a girl"
IMPORTANT — Prompt language affects visuals:
- AVOID words like "magical", "enchanted", "mystical" — they cause literal sparkle/particle effects
- USE clean motion language: "smoothly transforms", "gradually reshapes", "seamlessly morphs", "transitions into"
- The morph LoRA handles the morphing effect — the prompt should describe motion and form change, not style
- Include scale/position cues when subjects differ in size: "grows into", "expands upward", "shrinks down"
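The word-choice guidance above can be turned into a quick check before a prompt is queued. A minimal Python sketch; `find_discouraged_words` is a hypothetical helper, and the word list contains only the three examples named above.

```python
import re

DISCOURAGED = ("magical", "enchanted", "mystical")


def find_discouraged_words(prompt: str) -> list:
    """Return the discouraged style words that appear in the prompt, sorted."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return sorted(word for word in DISCOURAGED if word in words)


good = ("A small cat sitting on the ground smoothly transforms and grows into "
        "a woman standing tall, seamless transformation, cinematic")
bad = "A magical cat turns into a girl in a mystical forest"

print(find_discouraged_words(good))  # prints: []
print(find_discouraged_words(bad))   # prints: ['magical', 'mystical']
```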
Settings Quick Reference
| Config | Lightning (Remix) | Standard |
|---|---|---|
| Models | Remix NSFW Hi+Lo fp16 | Official Hi+Lo fp8 |
| CLIP | nsfw_wan_umt5-xxl_bf16_fixed | umt5_xxl_fp8_e4m3fn_scaled |
| ModelSamplingSD3 shift | 5 | 8 |
| Total steps | 4 | 20 |
| Hi pass end_at_step | 2 | 10 |
| CFG | 1 | 4 |
| Sampler | uni_pc | euler |
| Scheduler | beta | simple |
| External LoRA needed | No (built-in) | Yes (paired hi/lo) |
Multi-Step Pipeline Pattern
Anchor Frame Strategy (Proportions)
When the start and end frames have different subject sizes (e.g., small cat → tall person), generate the "anchor" frame first — the one with the most complex composition — then use Qwen Edit to create the other frame from it. This ensures:
- Consistent background/scene between frames
- Correct relative proportions (the edit inherits the scene scale)
- Better FLF results since both frames share the same visual context
Example — Cat-to-Girl Morph:
- Generate girl standing in front of barn with Z-Image (she fills the frame)
- Qwen Edit: "Replace the woman with a small cat sitting at the bottom of the image"
- FLF: cat (start) → girl (end) — proportions are correct because the barn establishes scale
Anti-pattern: Generating cat and girl independently produces mismatched scale.
Full Pipeline
- Generate anchor frame with Z-Image/SDXL/Flux (portrait orientation for standing subjects)
- Qwen Edit to create second frame — the edit preserves scene context
- Clear VRAM between model families
- Upload both frames with upload_image
- Run dual hi-lo FLF with morph LoRA if morphing is desired
- Optionally upscale with SeedVR2 to 1080p
Proven timing on RTX 4090: Z-Image (35s) → Qwen Edit (78s) → WAN FLF 81 frames (139s) = ~4 minutes total.
Working with Saved Workflows
Use analyze_workflow to understand any saved WAN FLF workflow before modifying or executing it. It returns a structured summary with sections, node IDs, key settings, and virtual wire connections — no raw JSON needed.
analyze_workflow("Wan FirstLastFrame Advanced.json") # summary view (default)
analyze_workflow("Wan FirstLastFrame Advanced.json", view="flat") # mermaid diagram
Only use get_workflow when you need the raw JSON for enqueue_workflow or modify_workflow.