deploy-deepseek-mlc - SKILL.md Agent Skill

name: deploy-deepseek-mlc description: Deploy DeepSeek on Jetson Orin using MLC (Machine Learning Compilation) for optimized edge inference. Uses Docker/jetson-containers. Requires Jetson with >8GB RAM and JetPack 5.1.1+.

Deploy DeepSeek on Jetson with MLC

Execution model

Run one phase at a time. After each phase:

Relay all command output to the user.
If output contains [STOP] → stop immediately, consult the failure decision tree below.
If output ends with [OK] → tell the user "Phase N complete" and proceed to the next phase.

Prerequisites

Requirement	Minimum
Hardware	reComputer J4012 (Jetson Orin NX 16GB) or equivalent
RAM	>8 GB (16 GB recommended for DeepSeek-R1 7B+)
JetPack	5.1.1+ (JetPack 6.x preferred)
Storage	SSD strongly recommended — model weights are large
Internet	Required for Docker pull and model download

Phase 1 — Preflight

Verify JetPack version, available RAM, and disk space before touching Docker.

cat /etc/nv_tegra_release
free -h
df -h /
df -h /ssd 2>/dev/null || true

Expected: L4T R35.x (JP5) or R36.x (JP6), ≥8 GB RAM free, ≥50 GB disk available. [OK] when all three pass. [STOP] if RAM or disk is insufficient.

Phase 2 — Install Docker + nvidia-container

sudo apt update

# JetPack 5.x
sudo apt install -y nvidia-container

# JetPack 6.x — also install curl, then Docker
sudo apt install -y nvidia-container curl
curl https://get.docker.com | sh
sudo systemctl --now enable docker

# Add current user to docker group
sudo usermod -aG docker $USER
newgrp docker

Verify:

docker --version
docker run --rm --runtime nvidia --gpus all ubuntu:22.04 nvidia-smi

Expected: nvidia-smi output shows the Jetson GPU. [OK] when GPU is visible inside the container.

Move Docker storage to SSD (strongly recommended)

Edit /etc/docker/daemon.json:

{
  "data-root": "/ssd/docker",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

sudo systemctl restart docker
docker info | grep "Docker Root Dir"

[OK] when Docker Root Dir points to your SSD path.

Phase 3 — Pull MLC container and download DeepSeek model

# JP5.x:
docker pull dustynv/mlc-llm:r35.4.1

# JP6.x:
docker pull dustynv/mlc-llm:r36.2.0

docker images | grep mlc-llm

Download model weights inside the container:

docker run -it --rm \
  --runtime nvidia \
  --network host \
  -v /ssd/models:/models \
  dustynv/mlc-llm:r36.2.0 \
  bash -c "huggingface-cli download deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --local-dir /models/deepseek-r1-7b"

[OK] when model files are present under /ssd/models/. [STOP] if download fails — see failure decision tree.

Phase 4 — Launch inference

docker run -it --rm \
  --runtime nvidia \
  --network host \
  -v /ssd/models:/models \
  dustynv/mlc-llm:r36.2.0 \
  python3 -m mlc_llm serve /models/deepseek-r1-7b \
    --device cuda \
    --host 0.0.0.0 \
    --port 8080

Test the endpoint:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-r1-7b","messages":[{"role":"user","content":"Hello"}]}'

[OK] when the API returns a JSON response with a completion.

For full step-by-step commands, screenshots, and model configuration options, read references/source.body.md.

Failure decision tree

Symptom	Action
`docker: command not found`	Re-run the `curl https://get.docker.com \| sh` step. Confirm `sudo systemctl enable --now docker`.
`nvidia-container` install fails	Confirm JetPack version with `cat /etc/nv_tegra_release`. JP5 and JP6 have different package names — check `references/source.body.md` for the exact apt source.
`nvidia-smi` not visible inside container	nvidia-container-runtime not configured. Verify `/etc/docker/daemon.json` has the `nvidia` runtime entry and restart Docker.
OOM / killed during inference	Model too large for available RAM. Try a smaller distill variant (1.5B or 7B). Ensure no other heavy processes are running.
Model download fails / times out	Check internet connectivity. Retry with `huggingface-cli download --resume-download`. If HuggingFace is blocked, use a mirror or pre-download on another machine.
`docker pull` fails with no space	Docker root is on eMMC. Move Docker data root to SSD (Phase 2 SSD step).
Inference endpoint returns 500	Model path inside container may be wrong. Verify the `-v` mount and the path passed to `mlc_llm serve`.

Reference files

references/source.body.md — full original Seeed tutorial with complete MLC configuration, model options, and effect demonstration (reference only)