---
name: huawei-cloud-msmodelslim-model-adapt
description: |-
  Create basic Transformers model adapters for msModelSlim. Implements required interfaces and completes a four-step verification workflow: generate test model -> full fallback quantization -> weight verification -> quantization description validation.
  Use this skill when the user wants to: (1) create msModelSlim adapters for decoder-only LLMs, (2) adapt understanding VLM text backbones for quantization, (3) implement W8A8/W4A16 quantization workflows for new models.
  Trigger: user mentions "msModelSlim", "adapter", "model adapter", "quantization", "W8A8", "W4A16", "transformers", "LLM", "VLM", "adapter creation", "适配器", "模型适配", "量化", "模型适配器", "LLM量化"
compatibility:
  - transformers >= 4.40.0
  - msmodelslim >= 1.0.0
tags: [msModelSlim, adapter, quantization, model]
allowed-tools:
  - python3
  - bash
---
Huawei Cloud msModelSlim Model Adapter
Overview
This skill describes how to create basic adapters for new models so that they can run W8A8/W4A16 quantization workflows in msModelSlim.
Architecture: Model Analysis -> Adapter Creation -> Registration -> Verification (4 Steps)
Related Skills:
- `huawei-cloud-msmodelslim-model-analysis`: Model structure analysis before adapter implementation
- `huawei-cloud-ascend-profiler-db-explorer`: Optional performance analysis after deployment
Scope
Supported:
- Decoder-only LLM
- Understanding VLM (text/LLM backbone only)
Not supported:
- Multimodal generation (Stable Diffusion/Flux/Wan)
- Encoder-only models
- Non-Transformers architectures
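A quick first check of whether a model falls within this scope is to look at its `config.json`. The sketch below is a heuristic under stated assumptions, not an msModelSlim check: it treats an `*ForCausalLM` architecture as a decoder-only LLM, treats a config with a nested `vision_config` or `text_config` as an understanding VLM whose text backbone can be adapted, and rejects encoder-decoder models. The function names `classify_scope` and `classify_config_file` are hypothetical, and the result should still be confirmed by reading `modeling_*.py`.

```python
import json
from typing import Any, Dict


def classify_scope(config: Dict[str, Any]) -> str:
    """Heuristically map a parsed config.json to this skill's scope.

    Returns "llm", "vlm-text-backbone", or "unsupported". These rules are
    illustrative assumptions, not checks performed by msModelSlim.
    """
    architectures = config.get("architectures") or []
    if config.get("is_encoder_decoder", False):
        # Encoder-decoder models (e.g. T5) fall outside the supported scope.
        return "unsupported"
    if "vision_config" in config or "text_config" in config:
        # Understanding VLMs usually describe their vision tower separately;
        # only the text/LLM backbone is in scope.
        return "vlm-text-backbone"
    if any(name.endswith("ForCausalLM") for name in architectures):
        return "llm"
    # Anything else (e.g. *ForMaskedLM encoders) is treated as unsupported.
    return "unsupported"


def classify_config_file(path: str) -> str:
    """Load a local config.json and classify it."""
    with open(path, encoding="utf-8") as f:
        return classify_scope(json.load(f))
```

For example, a config whose `architectures` is `["Qwen2ForCausalLM"]` classifies as `"llm"`, while `["BertForMaskedLM"]` classifies as `"unsupported"`.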
Architecture
```text
┌─────────────────────────────────────────────────────────────┐
│               msModelSlim Model Adapter Skill               │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────────────┐    ┌──────────────────────────────┐   │
│  │  Model Analysis  │───▶│  Adapter Creation            │   │
│  │  - config.json   │    │  - LLM Adapter Template      │   │
│  │  - modeling_*.py │    │  - VLM Adapter Template      │   │
│  └──────────────────┘    │  - Required Interfaces       │   │
│                          └──────────────────────────────┘   │
│                                          │                  │
│                                          ▼                  │
│                                 ┌──────────────────┐        │
│                                 │  Registration    │        │
│                                 │  & Installation  │        │
│                                 └──────────────────┘        │
│                                          │                  │
│                                          ▼                  │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                Verification (4 Steps)                │   │
│  │  1. Generate Test Model → 2. Full Fallback Quant     │   │
│  │  3. Weight Verification → 4. Quant Description       │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
```
Architecture Components
This skill involves the following tools and platforms:
- msModelSlim: Huawei Cloud's model quantization framework for efficient model compression
- Transformers Library: Hugging Face Transformers for model loading and processing
- ModelScope: Model download and management platform
- Ascend NPU: Target hardware for quantized model deployment
Use Cases
Typical Problem Scenarios:
- Need to deploy LLMs with a reduced memory footprint on Ascend NPUs
- Want faster inference without significant accuracy loss
- Migrating models that don't have built-in msModelSlim support
- Need W8A8/W4A16 quantization for decoder-only LLM or VLM text backbones
Typical User Phrases:
- "How to quantize my custom LLM model for Ascend?"
- "Create msModelSlim adapter for Qwen model"
- "Implement W4A16 quantization workflow"
- "Adapt my VLM text backbone for quantization"
- "How to add quantization support for new models?"
Core Workflow
1. Preparation
- Download Model: We recommend using `modelscope download` to fetch only the non-weight files.
  - Example: `modelscope download --model <org>/<model> --local_dir ./models/<name> --exclude '*.safetensors'`
- Analyze Model: Read `config.json` and `modeling_*.py` to confirm the model structure and implementation.
  - See: Model Analysis Guide
2. Create Adapter
- Use Templates:
  - LLM: `assets/model_adapter_template.py`
  - VLM: `assets/vlm_model_adapter_template.py`
- Implement Interfaces: Implement `handle_dataset`, `init_model`, `generate_model_visit`, `generate_model_forward`, and `enable_kv_cache`.
- Key Principles:
  - `visit` and `forward` must be strictly consistent.
  - For MoE models, unpacking the experts into plain linear layers is recommended.
- See: Implementation Guide
3. Registration & Installation
- Register the model and its entry in `config/config.ini`, then run `bash install.sh`.
- See: Registration Guide
4. Verify Adapter (Required)
- You must run the four-step verification: (1) generate a test model; (2) run full fallback quantization; (3) verify that the full fallback model matches the float weights exactly and can be loaded and saved completely; (4) verify that the actual quantization workflow works, including rule validation of the quantization description file.
- See: Verification Guide
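One way to make the principle that `visit` and `forward` must be strictly consistent testable is to require both passes to produce the same layers in the same order. The sketch below assumes, for illustration only, that `generate_model_visit` and `generate_model_forward` are generators yielding a `(name, layer)` pair per decoder layer; the real msModelSlim signatures and yielded objects may differ, so `ToyLayer`, `ToyAdapter`, and `check_visit_forward_consistent` are hypothetical stand-ins that use no real model.

```python
from typing import Iterator, List, Tuple


class ToyLayer:
    """Stand-in for a decoder layer: scales its input by a constant."""

    def __init__(self, scale: float) -> None:
        self.scale = scale

    def __call__(self, x: float) -> float:
        return x * self.scale


class ToyAdapter:
    """Hypothetical adapter. Method names follow this skill; the signatures
    and yielded (name, layer) pairs are assumptions, not msModelSlim's API."""

    def __init__(self, num_layers: int) -> None:
        self.layers = [ToyLayer(1.0 + i) for i in range(num_layers)]

    def generate_model_visit(self) -> Iterator[Tuple[str, ToyLayer]]:
        # Walk the decoder layers in execution order without running them.
        for i, layer in enumerate(self.layers):
            yield f"model.layers.{i}", layer

    def generate_model_forward(self, x: float) -> Iterator[Tuple[str, ToyLayer]]:
        # Run the layers one at a time, yielding each layer just before it
        # executes so the caller sees the same sequence as in the visit pass.
        for i, layer in enumerate(self.layers):
            yield f"model.layers.{i}", layer
            x = layer(x)


def check_visit_forward_consistent(adapter: ToyAdapter, sample: float) -> List[str]:
    """Return the layer names if both passes yield the same (name, layer)
    sequence; raise ValueError otherwise."""
    visited = [(name, id(layer)) for name, layer in adapter.generate_model_visit()]
    forwarded = [(name, id(layer)) for name, layer in adapter.generate_model_forward(sample)]
    if visited != forwarded:
        raise ValueError(f"visit/forward mismatch: {visited} vs {forwarded}")
    return [name for name, _ in visited]
```

Here `check_visit_forward_consistent(ToyAdapter(2), 1.0)` returns `['model.layers.0', 'model.layers.1']`; an adapter whose visit pass skipped or reordered a layer would raise `ValueError` instead.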
Common Scripts
Scripts in the `scripts/` directory:
- `scripts/step1_generate_test_model.py`
- `scripts/step2_run_quantization.py`
- `scripts/step3_verify_weights.py`
- `scripts/step4_verify_quant_description.py`
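Steps 3 and 4 come down to two comparisons. The stdlib-only sketch below illustrates them; it is not the logic of `step3_verify_weights.py` or `step4_verify_quant_description.py`. It assumes weights are given as dicts mapping tensor names to flattened value lists (in practice you would load both checkpoints, for example from safetensors), and that the quantization description maps each saved tensor name to a type string, with `"FLOAT"` marking tensors kept in floating point and `model_quant_type` as a top-level metadata key. Those layout details, the function names, and the specific rules are assumptions; the real validation rules may be stricter.

```python
from typing import Dict, Iterable, List, Tuple

Weights = Dict[str, List[float]]  # tensor name -> flattened values (stand-in for real tensors)


def verify_full_fallback(float_weights: Weights, fallback_weights: Weights) -> List[str]:
    """Step 3 sketch: a full-fallback checkpoint must equal the float one exactly."""
    problems: List[str] = []
    problems += [f"missing tensor: {n}" for n in sorted(set(float_weights) - set(fallback_weights))]
    problems += [f"unexpected tensor: {n}" for n in sorted(set(fallback_weights) - set(float_weights))]
    for name in sorted(set(float_weights) & set(fallback_weights)):
        # Nothing is quantized under full fallback, so values must be identical.
        if float_weights[name] != fallback_weights[name]:
            problems.append(f"value mismatch: {name}")
    return problems


def verify_description(
    description: Dict[str, str],
    saved_names: Iterable[str],
    full_fallback: bool,
    meta_keys: Tuple[str, ...] = ("model_quant_type",),  # assumed top-level metadata key
) -> List[str]:
    """Step 4 sketch: illustrative rules for a name -> quant-type description."""
    described = {k: v for k, v in description.items() if k not in meta_keys}
    saved = set(saved_names)
    problems: List[str] = []
    problems += [f"undescribed tensor: {n}" for n in sorted(saved - set(described))]
    problems += [f"described but not saved: {n}" for n in sorted(set(described) - saved)]
    # "FLOAT" is assumed to mark tensors that were not quantized.
    quantized = sorted(n for n, kind in described.items() if kind != "FLOAT")
    if full_fallback:
        problems += [f"expected FLOAT under full fallback: {n}" for n in quantized]
    elif not quantized:
        problems.append("no tensor is marked as quantized")
    return problems
```

Both functions return a list of problems, so an empty list means the check passed. With real checkpoints, the list comparison would be replaced by `torch.equal` on the loaded tensors.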
Prerequisites
System Requirements
- Python 3.8+
- transformers >= 4.40.0
- msmodelslim >= 1.0.0
Environment Check
Check that Python 3, transformers, and msmodelslim are available:
```bash
python3 --version                               # Python 3.8 or later
python3 -c "import transformers; print('OK')"   # Transformers library
python3 -c "import msmodelslim; print('OK')"    # msModelSlim library
```
If any of them is missing:
```bash
pip3 install --user transformers msmodelslim
```
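The one-liners above only confirm that the packages import. To also check the minimum versions listed under System Requirements, a small script can read installed versions through `importlib.metadata` (available since Python 3.8). This is a sketch: it assumes msModelSlim is registered under the distribution name `msmodelslim`, which a source install via `install.sh` may not do, and its simple parser ignores pre-release and local suffixes such as `rc1` or `+cpu`.

```python
import re
import sys
from importlib import metadata
from typing import Dict, Tuple

# Minimum versions from the System Requirements above.
MIN_VERSIONS: Dict[str, str] = {"transformers": "4.40.0", "msmodelslim": "1.0.0"}


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse the leading numeric part of a version: '4.46.0.dev0' -> (4, 46, 0).

    Pre-release and local suffixes are ignored; missing parts count as 0.
    """
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def meets_minimum(installed: str, required: str) -> bool:
    return parse_version(installed) >= parse_version(required)


def check_environment() -> bool:
    """Print one status line per requirement and return True if all pass."""
    ok = sys.version_info >= (3, 8)
    print(f"python {sys.version.split()[0]}: {'OK' if ok else 'needs >= 3.8'}")
    for dist, minimum in MIN_VERSIONS.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            print(f"{dist}: not installed")
            ok = False
            continue
        good = meets_minimum(installed, minimum)
        status = "OK" if good else f"needs >= {minimum}"
        print(f"{dist} {installed}: {status}")
        ok = ok and good
    return ok
```

To use it as a gate in a shell pipeline, end the script with `sys.exit(0 if check_environment() else 1)`.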
Reference Documents
| Document | Description |
|---|---|
| Model Analysis Guide | Model structure analysis guide |
| Implementation Guide | Adapter implementation instructions |
| Registration Guide | Registration and installation guide |
| Verification Guide | Four-step verification workflow |
| Interface Checklist | Required interface implementation checklist |
| Core Workflow | Core workflow documentation |
| Acceptance Criteria | Functional acceptance criteria |
| Troubleshooting | Common issues and solutions |
Requirements
- transformers >= 4.40.0 installed
- msmodelslim >= 1.0.0 installed
- A Transformers model to adapt
- An understanding of the target quantization scheme (W8A8/W4A16)
Core Commands
```bash
# Create a model adapter
python3 scripts/create_adapter.py \
    --model Qwen2-7B \
    --quantization W8A8

# Run the four-step verification
python3 scripts/verify_adapter.py --adapter ./adapter.py
```
Parameter Confirmation
| Parameter | Description | Required |
|---|---|---|
| model | Model name or path | Yes |
| quantization | Quantization scheme (W8A8/W4A16) | Yes |
| output | Adapter output path | No |
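The parameter table maps naturally onto a command-line parser. The sketch below uses `argparse` to enforce the required `model` and `quantization` parameters and the optional `output`. It is a hypothetical illustration of how such a script could validate its input, not the actual parser of `scripts/create_adapter.py`; the `./adapter.py` default and the case-insensitive scheme matching are assumptions.

```python
import argparse
from typing import List, Optional

SCHEMES = ("W8A8", "W4A16")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the parameter table; not create_adapter.py's own."""
    parser = argparse.ArgumentParser(description="Create an msModelSlim model adapter (sketch)")
    parser.add_argument("--model", required=True, help="Model name or path")
    # type=str.upper runs before the choices check, so "w8a8" is accepted as "W8A8".
    parser.add_argument("--quantization", required=True, type=str.upper, choices=SCHEMES,
                        help="Quantization scheme")
    parser.add_argument("--output", default="./adapter.py",
                        help="Adapter output path (default is an assumption)")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argv=None makes argparse read sys.argv[1:].
    return build_parser().parse_args(argv)
```

For example, `parse(["--model", "Qwen2-7B", "--quantization", "W8A8"])` yields `model='Qwen2-7B'`, `quantization='W8A8'`, and `output='./adapter.py'`. An unsupported scheme such as `W4A8` makes argparse print a usage error and exit with status 2.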