sparseforge-hessian-mask - SKILL.md Agent Skill

name: sparseforge-hessian-mask
description: Efficient semi-structured LLM sparsification via annealing of Hessian-mask guided pruning. Achieves high sparsity with minimal accuracy loss using second-order importance estimation.
category: deep-learning
tags: [LLM, sparsification, pruning, Hessian, compression, efficiency]
trigger: sparseforge, hessian mask, semi-structured pruning, LLM sparsification, 2:4 sparsity, model compression

SparseForge: Hessian-Mask Guided Semi-Structured Pruning

Overview

SparseForge achieves efficient semi-structured (e.g., 2:4) sparsity in LLMs by using Hessian-based second-order importance estimation with an annealing schedule to guide pruning decisions.

Core Technique

Hessian-Driven Importance: Compute the diagonal of the Hessian (or its Fisher information approximation) to estimate how sensitive the loss is to each parameter
Mask Annealing: Instead of one-shot pruning, gradually anneal a soft mask toward a binary one over training steps
Semi-Structured Constraints: Enforce N:M sparsity patterns (e.g., 2:4, where 2 out of every 4 consecutive weights are zero) for hardware efficiency
Recovery Fine-Tuning: Brief fine-tuning with the annealed mask to recover accuracy
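
The first and third ideas above can be sketched in plain Python. This is a minimal illustration, not SparseForge's actual implementation: the helper names (`fisher_diagonal`, `saliency`, `nm_mask`) are hypothetical, the score `0.5 * h * w**2` is the classic Optimal Brain Damage saliency and is assumed here rather than taken from the source, and the weights and gradients are made-up toy values. A real implementation would use a tensor library instead of lists.

```python
def fisher_diagonal(per_sample_grads):
    """Empirical Fisher diagonal: mean of squared per-sample gradients."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def saliency(weights, hess_diag):
    """Assumed OBD-style score: estimated loss increase from zeroing a weight."""
    return [0.5 * h * w * w for w, h in zip(weights, hess_diag)]


def nm_mask(scores, n_keep=2, m=4):
    """Binary mask keeping the n_keep highest-scoring entries per group of m."""
    if len(scores) % m != 0:
        raise ValueError("number of scores must be a multiple of m")
    mask = [0] * len(scores)
    for start in range(0, len(scores), m):
        group = range(start, start + m)
        keep = sorted(group, key=lambda i: scores[i], reverse=True)[:n_keep]
        for i in keep:
            mask[i] = 1
    return mask


# Toy data: eight weights (two groups of four) and two per-sample gradients.
weights = [0.9, -0.8, 0.1, 0.2, 0.5, 0.4, -0.3, 0.05]
grads = [
    [0.1, 0.1, 2.0, 1.0, 0.2, 1.0, 1.0, 0.1],
    [0.1, 0.3, 2.0, 1.0, 0.2, 1.0, 1.0, 0.1],
]

h = fisher_diagonal(grads)
hessian_mask = nm_mask(saliency(weights, h))
magnitude_mask = nm_mask([abs(w) for w in weights])
```

On this toy input the two criteria disagree: `magnitude_mask` keeps the two largest weights in each group, while `hessian_mask` keeps small weights whose gradients are large, which is the situation where a second-order score is expected to help.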

Key Benefits

Higher accuracy at the same sparsity than magnitude-based pruning
Hardware-friendly 2:4 sparsity patterns compatible with NVIDIA Sparse Tensor Cores (Ampere and later)
Annealing avoids the sharp accuracy drops that aggressive one-shot pruning can cause

Implementation Steps

Compute the per-parameter Hessian diagonal via the Fisher approximation (the diagonal of the gradient outer product, i.e. averaged squared gradients)
Initialize a soft mask with a sigmoid parametrization
Anneal mask temperature over training: soft → hard binary decisions
Apply N:M structural constraints during mask hardening
Fine-tune with masked weights, optionally with distillation loss from dense model
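
The mask-annealing steps above can be sketched as follows. This is a hedged illustration under stated assumptions: the exponential temperature schedule, its endpoints (`t_start=1.0`, `t_end=0.01`), and the function names are hypothetical choices, not values from the source. The mask logits are held fixed here for clarity; in training they would be updated by gradient descent alongside the weights.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def temperature(step, total_steps, t_start=1.0, t_end=0.01):
    """Assumed schedule: exponential decay from t_start to t_end."""
    frac = step / max(total_steps - 1, 1)
    return t_start * (t_end / t_start) ** frac


def soft_mask(logits, temp):
    """Sigmoid-parametrized mask; values approach 0 or 1 as temp shrinks."""
    return [sigmoid(z / temp) for z in logits]


def harden_nm(logits, n_keep=2, m=4):
    """Final binary mask: keep the n_keep largest logits per group of m."""
    mask = [0] * len(logits)
    for start in range(0, len(logits), m):
        order = sorted(range(start, start + m),
                       key=lambda i: logits[i], reverse=True)
        for i in order[:n_keep]:
            mask[i] = 1
    return mask


logits = [1.2, -0.4, 0.3, -2.0]  # hypothetical mask logits for one group
total_steps = 5

# Record (temperature, soft mask) at each step to see the mask sharpen.
trace = []
for step in range(total_steps):
    t = temperature(step, total_steps)
    trace.append((t, soft_mask(logits, t)))

final_mask = harden_nm(logits)
```

Hardening uses a per-group top-n selection rather than a plain 0.5 threshold because thresholding alone does not guarantee exactly `n_keep` survivors per group. A schedule that reaches `t_end` in very few steps behaves much like one-shot pruning, which is the "too fast" failure mode.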

Pitfalls

The full Hessian is too expensive to compute at LLM scale; use a diagonal or Fisher approximation instead
Annealing schedule is critical: too fast → accuracy collapse; too slow → wasted compute
N:M sparsity yields an inference speedup only with specialized kernel support or careful indexing
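
To make the indexing concern concrete, here is a small sketch of a compressed 2:4 layout: each group of four weights is stored as two values plus their in-group positions. The layout and the names `compress_2of4` and `sparse_dot` are illustrative assumptions; they convey the general idea of value-plus-index storage and are not the exact format used by any particular hardware or library.

```python
def compress_2of4(row):
    """Pack a 2:4-sparse row into (values, positions), two slots per group."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values, positions = [], []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        kept = [i for i, v in enumerate(group) if v != 0]
        if len(kept) > 2:
            raise ValueError("group violates 2:4 sparsity")
        # Pad with zero-valued slots so every group stores exactly two entries.
        spare = [i for i in range(4) if i not in kept]
        kept = sorted(kept + spare[:2 - len(kept)])
        values.extend(group[i] for i in kept)
        positions.extend(kept)
    return values, positions


def sparse_dot(values, positions, x):
    """Dot product that touches only the stored half of the weights."""
    total = 0.0
    for k, (v, p) in enumerate(zip(values, positions)):
        # Slot k belongs to group k // 2; p is the offset inside that group.
        total += v * x[(k // 2) * 4 + p]
    return total


row = [0.0, 1.5, 0.0, -2.0, 3.0, 0.0, 0.0, 0.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

values, positions = compress_2of4(row)
dense_result = sum(w * xi for w, xi in zip(row, x))
sparse_result = sparse_dot(values, positions, x)
```

The sparse path performs half the multiplications of the dense one, but in pure Python the index arithmetic costs more than it saves. That is the point of the pitfall: the reduction in arithmetic turns into wall-clock speedup only when a kernel or hardware unit handles the position lookup efficiently.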

Activation Keywords

sparseforge, hessian mask, semi-structured pruning, LLM sparsification, 2:4 sparsity, model compression, Fisher information, mask annealing