fp8-gemm-tuning-sglang-aiter

star 104

Use when trying to optimize end-to-end SGLang performance with gemm tuning for FP8 models on AMD HIP/ROCm by replacing the default Triton GEMM backend with a tuned Composable Kernel (CK) path through aiter; this skill is the verified playbook for that entire process, using FP8 block-wise GEMM (gemm_a8w8_blockscale) as the primary worked example—GEMM shape/dispatch logging in SGLang, CK composable-kernel tuning, and AITER_CONFIG_GEMM_A8W8_BLOCKSCALE CSV integration. FP8 blockscale and bpreshuffle should also apply by switch the place for dumping gemm and the ck tool used for tuning.

AMD-AGI By AMD-AGI schedule Updated 5/18/2026

Skill instructions (SKILL.md) could not be loaded from local cache or raw GitHub repository.

Install via CLI
npx skills add https://github.com/AMD-AGI/GEAK --skill fp8-gemm-tuning-sglang-aiter
Repository Details
star Stars 104
call_split Forks 27
navigation Branch main
article Path SKILL.md
More from Creator