kernelwiki
Use when the user asks about optimizing NVIDIA Blackwell (SM100, B200) or Hopper (SM90, H100) GPU kernels, including tcgen05, TMEM, CLC, NVFP4, 2-SM cooperative kernels, warp specialization, FlashAttention-4, DeepGEMM, FlashMLA, MoE, grouped GEMM, and CuTe-DSL/PTX/Triton on Blackwell, or when the user wants concrete PR references from CUTLASS, SGLang, vLLM, FlashInfer, or PyTorch. Do NOT use for generic CUDA Q&A that is not Blackwell/Hopper-specific, for host-side framework integration, or for distributed systems (DeepEP/EPLB/DualPipe).
Install via CLI
npx skills add https://github.com/mit-han-lab/KernelWiki --skill kernelwiki
Repository Details
Stars: 254
Forks: 28
Branch: main
Path: SKILL.md