add-afm-model - SKILL.md Agent Skill

name: add-afm-model description: Add support for a new HuggingFace MLX model to AFM. Use when user wants to add, onboard, or check compatibility of a model — handles everything from "already supported" to implementing new architectures. user_invocable: true

Investigate and add support for a HuggingFace MLX model to the AFM server.

/add-afm-model <model-id> — e.g., /add-afm-model mlx-community/Qwen3-8B-4bit
/add-afm-model <url> — e.g., /add-afm-model https://huggingface.co/mlx-community/Qwen3-8B-4bit

Extract the HuggingFace model ID from the user's input:

Fetch https://huggingface.co/<model-id>/resolve/main/config.json using WebFetch.

Not MLX check: If config.json has no quantization or quantization_config field, the model is not MLX-quantized. Inform the user:

"This model is not in MLX format. Look for an MLX-quantized version on huggingface.co/mlx-community, or quantize it yourself with mlx_lm.convert."

Stop here if not MLX.

Read the model_type field from config.json. This is the key that maps to a Swift model implementation.

Also note: architectures, num_experts/num_local_experts (MoE indicator), image_token_id/vision_config (VLM indicator).

Search Scripts/patches/LLMModelFactory.swift for the model_type string in the LLMTypeRegistry.shared dictionary (lines ~25-80).

If found → Already supported. Tell the user:

"This model is already supported! Run it with:
MACAFM_MLX_MODEL_CACHE=/Volumes/edata/models/vesta-test-cache afm mlx -m <model-id> --port 9999
```"

Suggest running /test-macafm for validation if this is a new model variant.

Stop here if already registered.

The model_type is NOT in the registry. Now determine if an existing Swift implementation can handle it.

List files in vendor/mlx-swift-lm/Libraries/MLXLLM/Models/
Search for the model's base architecture name (e.g., if model_type is foo_moe, check for Foo.swift or similar)
Read the model's config.json fields and compare against existing implementations — some architectures handle variants (e.g., DeepseekV3 handles kimi_k2, Qwen2 handles acereason)
Check if an existing model has a dense fallback (e.g., numExperts == 0 path)

If a compatible architecture exists → Registry-only fix. Proceed to Tier: Registry Addition.

Search for the model_type in the Python mlx-lm library:

Fetch https://github.com/ml-explore/mlx-lm/tree/main/mlx_lm/models (or search GitHub)
Look for a Python file matching the model_type

If Python implementation exists → Port to Swift. Proceed to Tier: Port from Python.

If no Python implementation → Implement from scratch. Proceed to Tier: New Architecture.

The simplest case — architecture exists, just needs a type alias.

Read references/implementation-guide.md for the patch system details
Add the model_type to LLMTypeRegistry.shared in Scripts/patches/LLMModelFactory.swift, mapping to the correct Configuration/Model pair
If the model is a VLM (has vision_config/image_token_id), also add to Scripts/patches/VLMModelFactory.swift
If the model has a new tool call format, update Scripts/patches/ToolCallFormat.swift infer() method
Apply patches: ./Scripts/apply-mlx-patches.sh
Build: swift build (or /build-afm for full rebuild)
Verify: start server and test with a simple prompt

Port the Python mlx-lm implementation to Swift.

Read references/implementation-guide.md for the full implementation pattern
Read references/model-investigation.md for config.json field mapping
Fetch and study the Python implementation from mlx-lm
Find the closest existing Swift model to use as a template (check vendor/mlx-swift-lm/Libraries/MLXLLM/Models/)
Create the new Swift file in Scripts/patches/<ModelName>.swift
Implement: Configuration (Codable) → Attention → MLP → TransformerBlock → Model
Add to PATCH_FILES, TARGET_PATHS, NEW_FILES in Scripts/apply-mlx-patches.sh
Register in LLMTypeRegistry (and VLM if needed)
Add weight sanitization if needed (in the Configuration's sanitize())
Apply, build, verify

No existing implementation anywhere. Research and implement from scratch.

Read references/implementation-guide.md and references/model-investigation.md
Find architecture documentation: paper, blog post, or reference implementation
Study config.json thoroughly for all architecture-specific fields
Find the closest existing Swift model as a starting template
Follow the same implementation steps as "Port from Python" above
Pay special attention to: attention patterns, normalization layers, MoE routing, positional embeddings
Test extensively — new architectures often have subtle bugs

NEVER edit files in vendor/ directly — all changes go through Scripts/patches/
Always check VLM registry too if the model has vision capabilities
Use MACAFM_MLX_MODEL_CACHE=/Volumes/edata/models/vesta-test-cache to avoid re-downloading
After adding a new model, suggest running /test-macafm for full validation