deploy-deepseek-mlc

Stars: 50

Deploy DeepSeek on Jetson Orin using MLC (Machine Learning Compilation) for optimized edge inference. Uses Docker/jetson-containers. Requires Jetson with >8GB RAM and JetPack 5.1.1+.

By Seeed-Projects, updated 3/11/2026


Deploy DeepSeek on Jetson with MLC


Execution model

Run one phase at a time. After each phase:

  • Relay all command output to the user.
  • If output contains [STOP] → stop immediately and consult the failure decision tree below.
  • If output ends with [OK] → tell the user "Phase N complete" and proceed to the next phase.

Prerequisites

Minimum requirements:

  • Hardware: reComputer J4012 (Jetson Orin NX 16GB) or equivalent
  • RAM: >8 GB (16 GB recommended for DeepSeek-R1 7B and larger)
  • JetPack: 5.1.1 or later (JetPack 6.x preferred)
  • Storage: SSD strongly recommended, since model weights are large
  • Internet: required for the Docker pull and the model download

Phase 1 — Preflight

Verify JetPack version, available RAM, and disk space before touching Docker.

cat /etc/nv_tegra_release
free -h
df -h /
df -h /ssd 2>/dev/null || true

Expected: L4T R35.x (JP5) or R36.x (JP6), ≥8 GB RAM free, ≥50 GB disk available. [OK] when all three pass. [STOP] if RAM or disk is insufficient.
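The execution model expects each phase to end in [OK] or [STOP], but the Phase 1 commands only print raw numbers. The bash sketch below turns those numbers into a verdict. It assumes the thresholds stated above (8 GB available RAM, 50 GB free disk) and that /ssd, when mounted, is the drive that will hold Docker data and models. It checks only the L4T major release, not the 5.1.1 minimum. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical Phase 1 helpers. Thresholds mirror the "Expected" line above.

# Map the L4T major release in /etc/nv_tegra_release to a JetPack major version.
jetpack_major() {
  if [[ $1 =~ R([0-9]+) ]]; then
    case ${BASH_REMATCH[1]} in
      35) echo 5; return 0 ;;   # L4T R35.x -> JetPack 5.x
      36) echo 6; return 0 ;;   # L4T R36.x -> JetPack 6.x
    esac
  fi
  echo unknown
  return 1
}

# Compare available RAM and free disk (both in kB) against the thresholds.
preflight_verdict() {
  local mem_kb=${1:-0} disk_kb=${2:-0}
  local min_mem_kb=$((8 * 1024 * 1024))    # 8 GiB expressed in kB
  local min_disk_kb=$((50 * 1024 * 1024))  # 50 GiB expressed in kB
  if (( mem_kb < min_mem_kb )); then echo "[STOP] less than 8 GB RAM available"; return 1; fi
  if (( disk_kb < min_disk_kb )); then echo "[STOP] less than 50 GB disk free"; return 1; fi
  echo "[OK]"
}

# Collect live values on the Jetson and print the verdict as the last line.
preflight() {
  local release jp mem_kb disk_kb target=/
  release=$(head -n 1 /etc/nv_tegra_release 2>/dev/null)
  if ! jp=$(jetpack_major "$release"); then
    echo "[STOP] unrecognised L4T release: ${release:-<missing>}"
    return 1
  fi
  echo "JetPack ${jp}.x detected"
  mem_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  if [[ -d /ssd ]]; then target=/ssd; fi   # prefer the SSD when it is mounted
  disk_kb=$(df -Pk "$target" | awk 'NR == 2 {print $4}')
  preflight_verdict "$mem_kb" "$disk_kb"
}
```

Save it as, for example, preflight.sh. Then run `source preflight.sh && preflight` on the Jetson. The final line is either [OK] or a [STOP] reason.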


Phase 2 — Install Docker + nvidia-container

sudo apt update

# JetPack 5.x
sudo apt install -y nvidia-container

# JetPack 6.x — also install curl, then Docker
sudo apt install -y nvidia-container curl
curl https://get.docker.com | sh
sudo systemctl --now enable docker

# Add current user to docker group
sudo usermod -aG docker $USER
newgrp docker    # starts a shell with the new group; alternatively, log out and back in

Verify:

docker --version
docker run --rm --runtime nvidia --gpus all ubuntu:22.04 nvidia-smi

Expected: nvidia-smi output shows the Jetson GPU. [OK] when GPU is visible inside the container.

Move Docker storage to SSD (strongly recommended)

Edit /etc/docker/daemon.json:

{
  "data-root": "/ssd/docker",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
Then restart Docker and confirm the new data root:

sudo systemctl restart docker
docker info | grep "Docker Root Dir"

[OK] when Docker Root Dir points to your SSD path.
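Hand-editing daemon.json is easy to get wrong, and Docker will not start if the file is not valid JSON. The sketch below renders the same configuration shown above for a chosen data root, so the result can be inspected and validated before it is installed. render_daemon_json is a hypothetical name. The default /ssd/docker path is the one used in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print this guide's daemon.json with a chosen data root.
# It only writes to stdout, so nothing on the system changes until you pipe it.
render_daemon_json() {
  local data_root=${1:-/ssd/docker}
  cat <<EOF
{
  "data-root": "${data_root}",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
}

# Example (on the Jetson; back up first, since JetPack may already ship a daemon.json):
#   sudo cp /etc/docker/daemon.json /etc/docker/daemon.json.bak
#   render_daemon_json /ssd/docker | sudo tee /etc/docker/daemon.json >/dev/null
#   python3 -m json.tool /etc/docker/daemon.json >/dev/null && sudo systemctl restart docker
#   docker info --format '{{.DockerRootDir}}'   # should print /ssd/docker
```

Changing data-root does not move existing images. Anything pulled before the switch stays under the old root and has to be pulled again.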


Phase 3 — Pull MLC container and download DeepSeek model

# JP5.x:
docker pull dustynv/mlc-llm:r35.4.1

# JP6.x:
docker pull dustynv/mlc-llm:r36.2.0

docker images | grep mlc-llm
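The later commands hard-code the JetPack 6 tag, so JetPack 5 users have to edit each one. The sketch below derives the tag from /etc/nv_tegra_release instead. It knows only the two tags listed above. The function and the MLC_IMAGE variable are hypothetical. jetson-containers also ships an autotag helper that resolves compatible images more generally.

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose the dustynv/mlc-llm tag used in this guide
# from the L4T release line (R35 -> JetPack 5.x, R36 -> JetPack 6.x).
mlc_image_for_release() {
  if [[ $1 =~ R([0-9]+) ]]; then
    case ${BASH_REMATCH[1]} in
      35) echo "dustynv/mlc-llm:r35.4.1"; return 0 ;;
      36) echo "dustynv/mlc-llm:r36.2.0"; return 0 ;;
    esac
  fi
  echo "no MLC tag known for: $1" >&2
  return 1
}

# Example (on the Jetson):
#   MLC_IMAGE=$(mlc_image_for_release "$(head -n 1 /etc/nv_tegra_release)") &&
#     docker pull "$MLC_IMAGE"
# then use "$MLC_IMAGE" in place of dustynv/mlc-llm:r36.2.0 in the commands below.
```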

Download the model weights inside the container (on JetPack 5.x, use dustynv/mlc-llm:r35.4.1 instead of r36.2.0 here and in Phase 4):

docker run -it --rm \
  --runtime nvidia \
  --network host \
  -v /ssd/models:/models \
  dustynv/mlc-llm:r36.2.0 \
  bash -c "huggingface-cli download deepseek-ai/DeepSeek-R1-Distill-Qwen-7B --local-dir /models/deepseek-r1-7b"

[OK] when model files are present under /ssd/models/. [STOP] if download fails — see failure decision tree.
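The [OK] condition above can be checked mechanically. The minimal sketch below assumes the checkpoint uses the usual Hugging Face layout: a config.json plus one or more *.safetensors weight shards. check_model_dir is a hypothetical name. Its default path is the host side of the -v mount used above.

```bash
#!/usr/bin/env bash
# Hypothetical Phase 3 check: is a Hugging Face-style checkpoint present?
check_model_dir() {
  local dir=${1:-/ssd/models/deepseek-r1-7b}
  if [[ ! -f $dir/config.json ]]; then
    echo "[STOP] $dir/config.json missing"
    return 1
  fi
  # compgen -G succeeds only if the glob matches at least one file.
  if ! compgen -G "$dir/*.safetensors" >/dev/null; then
    echo "[STOP] no *.safetensors shards in $dir"
    return 1
  fi
  echo "[OK] $(du -sh "$dir" | cut -f1) in $dir"
}

# Example (on the Jetson):
#   check_model_dir /ssd/models/deepseek-r1-7b
```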


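Phase 4 — Launch inference

Note: mlc_llm serve expects a model in MLC format, meaning a directory containing mlc-chat-config.json and converted (usually quantized) weight shards. The raw Hugging Face checkpoint downloaded in Phase 3 generally has to be converted first with mlc_llm convert_weight and mlc_llm gen_config, or replaced with a prebuilt MLC build of the model. See references/source.body.md for the model configuration options.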

docker run -it --rm \
  --runtime nvidia \
  --network host \
  -v /ssd/models:/models \
  dustynv/mlc-llm:r36.2.0 \
  python3 -m mlc_llm serve /models/deepseek-r1-7b \
    --device cuda \
    --host 0.0.0.0 \
    --port 8080
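Loading a 7B model on a Jetson can take a while, so running the curl test immediately after launch may fail with "connection refused". The sketch below polls until the server returns any HTTP response. It deliberately checks only that something is answering on the port and does not assume any particular route. wait_for_http and the container name deepseek-mlc are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until something answers HTTP at a URL.
# Without -f, curl exits 0 for any HTTP response (even 404), so this checks
# that the server is up, not that a specific route exists.
wait_for_http() {
  local url=${1:-http://localhost:8080/} timeout=${2:-600} interval=${3:-5}
  local waited=0
  until curl -s -o /dev/null --max-time 10 "$url"; do
    if (( waited >= timeout )); then
      echo "[STOP] nothing answering at $url after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "[OK] $url is answering"
}

# Example: start the server detached (-d) instead of with -it --rm, then wait.
#   docker run -d --name deepseek-mlc --runtime nvidia --network host \
#     -v /ssd/models:/models dustynv/mlc-llm:r36.2.0 \
#     python3 -m mlc_llm serve /models/deepseek-r1-7b --device cuda --host 0.0.0.0 --port 8080
#   wait_for_http http://localhost:8080/ 900 || docker logs --tail 20 deepseek-mlc
```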

Test the endpoint:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"deepseek-r1-7b","messages":[{"role":"user","content":"Hello"}]}'

[OK] when the API returns a JSON response with a completion.
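The raw JSON is awkward to read, and DeepSeek-R1 distills typically put their reasoning before the answer inside <think>...</think>. The sketch below sends a prompt and prints only the text after the last </think> tag. It assumes python3 on the host for JSON handling and the OpenAI-style response shape used above (choices[0].message.content). ask and extract_answer are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around the Phase 4 endpoint.

# Read a chat-completions JSON body on stdin and print the final answer.
extract_answer() {
  python3 -c '
import json, sys
body = json.load(sys.stdin)
text = body["choices"][0]["message"]["content"]
# Drop any reasoning block: keep only what follows the last </think> tag.
print(text.split("</think>")[-1].strip())
'
}

# Send one prompt to the local server and print the cleaned-up answer.
ask() {
  local prompt=$1 url=${2:-http://localhost:8080/v1/chat/completions}
  local payload
  # json.dumps escapes quotes and newlines in the prompt safely.
  payload=$(python3 -c 'import json, sys; print(json.dumps({"model": "deepseek-r1-7b", "messages": [{"role": "user", "content": sys.argv[1]}]}))' "$prompt") || return 1
  curl -sf --max-time 600 "$url" -H "Content-Type: application/json" -d "$payload" | extract_answer
}

# Example (with the server from Phase 4 running):
#   ask "Explain what a Jetson Orin NX is in one sentence."
```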

For full step-by-step commands, screenshots, and model configuration options, read references/source.body.md.


Failure decision tree

  • docker: command not found
    Re-run the curl https://get.docker.com | sh step, then run sudo systemctl enable --now docker.
  • nvidia-container install fails
    Confirm the JetPack version with cat /etc/nv_tegra_release. JP5 and JP6 use different package names; check references/source.body.md for the exact apt source.
  • nvidia-smi not visible inside the container
    nvidia-container-runtime is not configured. Verify that /etc/docker/daemon.json has the nvidia runtime entry, then restart Docker.
  • OOM / process killed during inference
    The model is too large for the available RAM. Try a smaller distill variant: 7B if you were running a larger one, or 1.5B (deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) if 7B itself does not fit. Make sure no other heavy processes are running.
  • Model download fails or times out
    Check internet connectivity, then re-run the same huggingface-cli download command. Recent huggingface_hub releases resume partial downloads automatically; older ones need --resume-download. If Hugging Face is blocked, point the HF_ENDPOINT environment variable at a mirror, or pre-download the weights on another machine.
  • docker pull fails with "no space left on device"
    Docker's data root is still on the eMMC. Move it to the SSD (see the SSD step in Phase 2).
  • Inference endpoint returns HTTP 500
    The model path inside the container may be wrong. Verify the -v mount and the path passed to mlc_llm serve.

Reference files

  • references/source.body.md — full original Seeed tutorial with complete MLC configuration, model options, and effect demonstration (reference only)
Install via CLI
npx skills add https://github.com/Seeed-Projects/Seeed-Jetson-DevelopTool --skill deploy-deepseek-mlc