fp8-gemm-tuning-sglang-aiter
Use when optimizing end-to-end SGLang performance for FP8 models on AMD HIP/ROCm through GEMM tuning, by replacing the default Triton GEMM backend with a tuned Composable Kernel (CK) path through aiter. This skill is the verified playbook for the entire process and uses FP8 block-wise GEMM (gemm_a8w8_blockscale) as the primary worked example. It covers GEMM shape/dispatch logging in SGLang, CK tuning, and integration of the tuned results through the AITER_CONFIG_GEMM_A8W8_BLOCKSCALE CSV. The same process should also apply to the FP8 blockscale and bpreshuffle variants by switching where the GEMM shapes are dumped and which CK tool is used for tuning.
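The shape-logging and CSV-integration steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the skill's actual code: `GemmShapeRecorder`, `launch_env`, and `FakeTensor` are hypothetical names; the `[M, K]` activation and `[N, K]` weight layout and the `M,N,K` header of the untuned-shape CSV are assumptions. Only the `AITER_CONFIG_GEMM_A8W8_BLOCKSCALE` variable name comes from the description.

```python
import csv
import functools
import io
import os


class GemmShapeRecorder:
    """Hypothetical helper that records the distinct (M, N, K) shapes a GEMM sees."""

    def __init__(self):
        self.shapes = set()

    def wrap(self, gemm_fn):
        # Assumption: activation is [M, K] and weight is [N, K].
        @functools.wraps(gemm_fn)
        def wrapper(xq, wq, *args, **kwargs):
            m, k = xq.shape
            n, k_w = wq.shape
            if k != k_w:
                raise ValueError(f"inner dimensions differ: {k} vs {k_w}")
            self.shapes.add((m, n, k))  # a set keeps each shape only once
            return gemm_fn(xq, wq, *args, **kwargs)

        return wrapper

    def write_csv(self, fileobj):
        # Assumed untuned-CSV layout: header "M,N,K", then one row per shape.
        writer = csv.writer(fileobj)
        writer.writerow(["M", "N", "K"])
        for m, n, k in sorted(self.shapes):
            writer.writerow([m, n, k])


def launch_env(tuned_csv_path, base_env=None):
    """Return an environment that points aiter at the tuned CSV before SGLang starts."""
    env = dict(os.environ if base_env is None else base_env)
    env["AITER_CONFIG_GEMM_A8W8_BLOCKSCALE"] = tuned_csv_path
    return env


class FakeTensor:
    """Stand-in with only a .shape attribute, so the sketch runs without torch."""

    def __init__(self, *shape):
        self.shape = shape


# Demo: wrap a no-op stand-in for the real FP8 GEMM and feed it a few shapes.
recorder = GemmShapeRecorder()
gemm = recorder.wrap(lambda xq, wq, *rest: None)
gemm(FakeTensor(16, 4096), FakeTensor(1024, 4096))
gemm(FakeTensor(16, 4096), FakeTensor(1024, 4096))  # duplicate, recorded once
gemm(FakeTensor(1, 4096), FakeTensor(1024, 4096))

buf = io.StringIO()
recorder.write_csv(buf)
print(buf.getvalue(), end="")
```

The dict returned by `launch_env` can be passed as `env=` to `subprocess.run` when starting the SGLang server, so the tuned CSV path is visible to aiter without modifying the parent shell.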
Install via CLI
npx skills add https://github.com/AMD-AGI/GEAK --skill fp8-gemm-tuning-sglang-aiter
Repository Details
Stars: 104
Forks: 27
Branch: main
Path: SKILL.md