name: legacy_libreyolo-add-native-detection-model description: >- Legacy guide for adding a detection model to LibreYOLO before the explicit task-paradigm refactor. Covers both YOLO-grid families (YOLOX, YOLOv9, YOLO-NAS) and DETR-style families (RF-DETR, D-FINE; future RT-DETR), but should be treated as pre-task-architecture guidance.
Legacy: adding a detection model to LibreYOLO
This skill documents the pre-task-refactor detection-model workflow. Prefer a new task-aware skill for future detection, segmentation, pose, or classification model additions.
1. What LibreYOLO is
A single LibreYOLO("Libre<Family><size>.pt") factory dispatches to family-local
implementations. Each family lives entirely inside libreyolo/models/<family>/
and plugs into 4 shared ABCs. The factory iterates BaseModel._registry (auto-
populated via __init_subclass__) and the first can_load(state_dict) match wins.
libreyolo/
├── models/<family>/ ← all family code lives here
├── training/ ← BaseTrainer, schedulers, EMA, TrainConfig
├── validation/ ← DetectionValidator + BaseValPreprocessor subclasses
├── backends/ ← ONNX / TensorRT / OpenVINO / NCNN / TorchScript runtime
├── export/ ← BaseExporter and per-format export helpers
└── data/ ← YOLODataset / COCODataset / dataloader / collate
2. Currently supported
| Family | Sizes | Paradigm | Native code | License (upstream) |
|---|---|---|---|---|
| YOLOX | n / t / s / m / l / x | YOLO-grid | yes | Apache-2.0 |
| YOLOv9 | t / s / m / c | YOLO-grid | yes | MIT (MultimediaTechLab/YOLO) |
| YOLO-NAS | s / m / l | YOLO-grid | yes | Apache-2.0 code; weights non-redistributable (Deci CDN) |
| RF-DETR | n / s / m / l | DETR | wrapper | Apache-2.0 |
| D-FINE | n / s / m / l / x | DETR | yes | Apache-2.0 |
⚠️ YOLOv9 license is not the same as the original paper repo. LibreYOLO's
YOLOv9 port follows MultimediaTechLab/YOLO (MIT, by Kin-Yiu Wong and Hao-Tang Tsui),
not WongKinYiu/yolov9 (GPL-3.0, the paper's reference implementation). When
you encounter a model with multiple upstream forks, check each — the permissive
fork is what makes integration into MIT-licensed LibreYOLO clean.
"Native code" means the model + loss + trainer live in libreyolo/models/<family>/.
"Wrapper" means the family delegates to a separate PyPI package (RF-DETR uses
rfdetr directly; the LibreYOLO code is mostly an adapter).
3. License check — do this first
LibreYOLO core is MIT, and the project explicitly tries to stay MIT-compatible. Before porting anything, check both licenses on the upstream:
- The code — read the upstream's
LICENSEfile (don't trust the README). - The weights — check the HuggingFace model card / release page / project site. Code and weights can have different licenses (e.g. Apache code + GPL weights, or permissive code + a custom commercial-restriction license on weights).
| Upstream license | Code in main? | Rehost weights under LibreYOLO/ HF? |
Action |
|---|---|---|---|
| MIT / Apache-2.0 / BSD | ✅ | ✅ | normal integration; ship NOTICE per upload-hf-model skill |
| GPL-3.0 | ⚠️ ship as plugin only | ❌ never (forces GPL on users) | libreyolo-<family> separate package; user downloads weights from upstream |
| AGPL-3.0 | ⚠️ plugin only | ❌ never (network-use clause is broader) | same as GPL but stricter |
| Custom / non-redistributable | case-by-case | ❌ usually | link to upstream CDN, like YOLO-NAS does for Deci's bucket |
Code-license rationale: GPL is "viral" — putting GPL code inside MIT-licensed LibreYOLO would force the entire library to become GPL, breaking every downstream user who relies on MIT terms.
Weights-license rationale: the legal status of weights under GPL is unsettled, but the conservative interpretation says any code that links a loaded GPL weight becomes a combined work and inherits GPL on distribution. Don't expose users to that without an explicit opt-in.
Already-shipped examples:
- D-FINE — Apache-2.0 code + Apache-2.0 weights → clean integration in core, weights rehosted on
LibreYOLO/LibreDFINE*. - RF-DETR — Apache-2.0 → clean.
- YOLOv9 — MIT code + MIT weights, via
MultimediaTechLab/YOLO(Kin-Yiu Wong & Hao-Tang Tsui). Ported from the permissive fork, notWongKinYiu/yolov9(GPL-3.0). - YOLO-NAS — Apache-2.0 code + custom Deci CDN for weights → linked rather than rehosted.
- YOLO-World — GPL-3.0 code + GPL-3.0 weights → flagged in #108; plugin-only is the right call even though wondervictor (paper first author) actively distributes.
When unsure, fetch the actual LICENSE file:
curl -sL https://raw.githubusercontent.com/<org>/<repo>/<branch>/LICENSE | head -3
The first line is canonical. "GNU GENERAL PUBLIC LICENSE Version 3" or "GNU AFFERO GENERAL PUBLIC LICENSE Version 3" → copyleft, treat carefully. "MIT License" or "Apache License Version 2.0" → permissive, proceed.
For weights, check the HF model card YAML at the top:
license: gpl-3.0 # ← copyleft
license: apache-2.0 # ← permissive
license: cc-by-nc-4.0 # ← non-commercial: usually a hard "no" for inclusion
4. The 4 ABCs every family plugs into
| ABC | File | Required overrides |
|---|---|---|
BaseModel |
models/base/model.py |
can_load, detect_size, detect_nb_classes, _init_model, _get_available_layers, _preprocess, _forward, _postprocess, _get_preprocess_numpy |
BaseTrainer |
training/trainer.py |
_config_class, get_model_family, get_model_tag, create_transforms, create_scheduler, get_loss_components |
TrainConfig |
training/config.py |
dataclass subclass with kw_only=True, override only fields that differ |
BaseValPreprocessor |
validation/preprocessors.py |
__call__, normalize, optionally uses_letterbox and custom_normalization |
BaseModel also exposes a set of ClassVars every family must set
(missing or empty values silently break the factory's auto-routing):
| ClassVar | What it controls |
|---|---|
FAMILY: str |
family identifier (e.g. "yolox", "deim"); used by the factory to gate per-family kwargs and by the conversion script's model_family metadata field |
FILENAME_PREFIX: str |
e.g. "LibreYOLOX", "LibreDFINE"; drives detect_size_from_filename and the rehosted-weights filename convention |
WEIGHT_EXT: str |
usually ".pt"; only override if your weights need a different extension |
INPUT_SIZES: dict[str, int] |
size code -> input resolution; used to validate the size= arg and to drive the val preprocessor's expected canvas |
TRAIN_CONFIG: type[TrainConfig] | None |
wires the family's dataclass to the model class so model.train(...) builds the right config; set this to None only for inference-only ports |
SUPPORTS_SEG: bool |
default False; flip to True if your family also ships a segmentation head. The factory routes task="seg" requests via this flag (replacing the old FAMILY == "rfdetr" special-case in commit d300f1c) |
val_preprocessor_class |
BaseValPreprocessor subclass; defaults to StandardValPreprocessor if unset |
Auto-registration kicks in on import: models/__init__.py adds one line per
family. Import order = can_load priority when heuristics overlap.
Checkpoint loading goes through libreyolo.utils.serialization.load_untrusted_torch_file,
not raw torch.load. That helper centralises the weights_only=False
choice (PyTorch 2.6 default change) and adds an explanatory error
context. Anything that loads a .pt outside BaseModel._load_weights
should use the same helper rather than bypassing it.
5. Two architectural patterns
Pick one. The contracts diverge non-trivially.
YOLO-grid pattern (YOLOX, YOLOv9, YOLO-NAS)
- Model output: per-scale tensor list. Shape and contents differ across families: YOLOX is
(B, 5+nc, H, W)per scale (4 reg + 1 objectness + nc classes), with grid offsets and exp applied only in export mode; YOLOv9 / YOLO-NAS drop the objectness channel and emit(B, 4+nc, N)-shaped tensors. Confirm your family's exact shape against the upstream head. - Training targets:
(B, max_labels, 5)padded[class, cx, cy, w, h]pixel coords. - Loss: per-anchor + assignment (SimOTA / TaskAlignedAssigner / DFL).
- Augmentation: numpy/cv2, mosaic + mixup central. Lives in
training/augment.py(MosaicMixupDataset,random_affine, etc.) — reused across YOLO families. - ONNX: 1 output named
"output", opset 13 default works. - Detection head exposes
self.export: boolflipped by the exporter atexport/exporter.py:_model_context. - NCNN / OpenVINO / TensorRT / TorchScript all work out of the box.
DETR pattern (RF-DETR, D-FINE; future RT-DETR)
- Model output: dict
{"pred_logits": (B, Q, nc), "pred_boxes": (B, Q, 4)}cxcywh in [0, 1]. - Training targets:
list[dict{labels, boxes_cxcywh_normalized}]per image — no padding. - Loss: Hungarian matching + auxiliary outputs across decoder layers (FGL, GO-LSD for D-FINE).
- Augmentation: torchvision v2 transforms with
tv_tensors.Image+tv_tensors.BoundingBoxes. - Multi-scale: per-batch random resize via a custom collate (
BatchImageCollateFunction-style). - Backbone LR multiplier (0.1× or 0.5×) is standard. Implies per-group LR application
in
_train_epoch— needs to be overridden, hooks aren't enough. - Gradient clipping (
max_norm=0.1) is standard. - ONNX: 2 outputs
["pred_logits", "pred_boxes"], opset ≥ 16 (forgrid_sample). - Export wrapper: a small
nn.Modulethat callsmodel.deploy()(recursiveconvert_to_deployon every submodule that defines it) and flattens dict→tuple. - NCNN does not work for DETR-family models — its op registry lacks
topk. - EMA mid-training decay change (
set_decay) is sometimes used to stabilize the final phase after augmentation stops.
Sibling architectures
When your family is the same architecture as an existing family with a
different training objective (e.g. a port whose architecture is identical
to an existing port, with only the loss / matcher changed), landmine #3's
"match on tokens unique to your architecture" advice fails because the
architectures are literally identical and both can_load checks fire on
the same checkpoint. The disambiguation pattern:
- Embed an explicit
model_familyfield in the converted checkpoint's metadata (the conversion script's job). This is the strongest signal. - Use the family's
FILENAME_PREFIX(e.g.LibreDFINE,LibreDEIM) indetect_size_from_filenameas a fallback hint when metadata is absent, e.g. someone hands you a raw upstream.pththey renamed. Each sibling abstains on the other's prefix. - Order the registry imports in
libreyolo/models/__init__.pyso the more-specific family loads first;BaseModel._registryis walked in import order and the firstcan_loadmatch wins. - Raise an explicit "ambiguous between {A, B}" error on a true tie (architecture-equal checkpoint, no metadata, no filename hint) rather than silently picking one.
This pattern is reusable for any descendant family that inherits an existing port's architecture.
6. The training-recipe trade-off (read this before claiming a port is "done")
LibreYOLO has explicitly chosen not to reproduce upstream paper recipes for from-scratch training. From-scratch reproduction would require sponsoring hundreds of GPU-hours per family and matching every augmentation, EMA quirk, loss weight, and warmup detail. That's not what 99% of users want.
What LibreYOLO does aim for:
- Inference parity — bit-equivalent outputs vs. upstream on the released checkpoints. This is non-negotiable.
- Best-possible fine-tuning — a user can load the upstream pretrained
checkpoint and fine-tune on a small custom dataset, getting within ~1 mAP of
what
python -m upstream.train ...would have produced.
Concretely, this means every existing family has gaps relative to its upstream training recipe, and that's by design. Examples we've already accepted:
- D-FINE skips Objects365 pretraining (not relevant for fine-tune users).
- D-FINE used to skip
RandomZoomOut/RandomIoUCrop/RandomPhotometricDistortin v1 (added in a later commit, ~+1.5 mAP gain). - YOLOv9 doesn't ship the auxiliary "branch" head used during from-scratch training.
- YOLO-NAS uses LibreYOLO's letterbox preprocessing (close but not identical to SuperGradients' exact pipeline) — documented as a known parity gap.
When you port a new model:
- Spend agentic time reading the upstream training code first. The model's forward pass is the easy 30%. Augmentations, optimizer param groups, LR schedule shape, EMA dynamics, multi-scale collation, gradient clipping — that's the other 70%, and it dominates fine-tune quality.
- Decide explicitly which pieces you skip. Document them in a commit message or in the family's docstring.
- Aim for fine-tune parity, not paper parity. Test by loading upstream
weights, fine-tuning on
coco128ormarblesfor ~10 epochs, and verifying mAP improves. - Don't pretend gaps don't exist. If the augmentation chain is half what
upstream uses, say so. A 5-line
transforms.pythat says "TODO: portRandomZoomOut" is more honest than silently shipping a degraded recipe.
A useful agent prompt:
"In <upstream-repo>/, identify every augmentation, loss weight, optimizer
param group, LR schedule, and EMA behavior used during training. Output a
concrete checklist of what would need to be ported."
The minimum bar for "this port loads upstream weights correctly"
Before claiming inference is correct, run a tensor-equivalence check:
- Import the upstream model class side-by-side with yours.
- Build both with the same config / size; cross-load the upstream
state_dictinto yours and inspect the missing/unexpected key diff. Usestrict=Trueonly if your port loads the full upstream state dict (this is the case for D-FINE/DEIM-style DETR ports that mirror upstream attribute naming exactly). For ports that intentionally drop upstream layers (YOLOX strips training-state buffers; YOLOv9 drops the auxiliary head and remaps legacydetect.*->head.*keys; YOLO-NAS unwraps SuperGradients EMA buffers), usestrict=Falseand assert that the missing/unexpected key set matches a documented expected set. Silent unexpected drift, not the strict mode itself, is the thing to catch. See landmine #14 for_strict_loading(). - Run identical inputs through both at FP32 and assert
max_abs_diff == 0on the output tensors that come from layers present in both models, in botheval()andtrain()modes.
This recipe validates architectural fidelity, attribute naming, and state-dict compatibility in one shot. Save the script as a one-off under the family's test directory; it pays for itself on every future upstream version bump. Fine-tune sanity (load -> 10 epochs on coco128 or marbles -> mAP improves) is the second gate, not the first.
If your family is a wrapper around an upstream PyPI package (RF-DETR pattern), this check reduces to "import the upstream package and verify it produces the documented outputs"; substitute accordingly.
7. Per-family integration: what each one actually shipped
YOLOX (models/yolox/)
- Files:
__init__.py,model.py,nn.py,trainer.py,utils.py,loss.py(6 files). - Pattern: YOLO-grid. Pixel cxcywh targets, BGR 0–255 inference, letterbox preprocessing.
- Augmentations: reuses shared
libreyolo/training/augment.py(mosaic + mixup); no family-localtransforms.py. - Recipe gaps: minimal. Closest to upstream of any family.
YOLOv9 (models/yolo9/)
- Files: 7 files. Largest YOLO-grid port (~2.3k LoC) due to ELAN/RepNCSPELAN modules.
- Pattern: YOLO-grid. RGB 0-1 normalisation. Validation path letterboxes; the default
inference path uses plain
Image.resize(utils.py:_postprocessdefaultsletterbox=False), a known asymmetry called out in the val-preprocessor docstring. - Recipe gaps: from-scratch auxiliary head dropped; mixup disabled; trainer builds
three param groups (BN / Conv / Bias) at the same
lrrather than upstream's three at distinct LRs (no backbone-LR split).
YOLO-NAS (models/yolonas/)
- Files: 7 files. Native nn but state-dict-compatible with SuperGradients' SG checkpoints.
- Pattern: YOLO-grid.
- Recipe gaps: SG's exact augmentation pipeline replaced by LibreYOLO's standard letterbox path (documented).
- Quirks: weights download from Deci's CDN, not LibreYOLO's HF org (license).
RF-DETR (models/rfdetr/)
- Files: 6 files (
__init__.py,config.py,model.py,nn.py,trainer.py,utils.py). RF-DETR is the only family that keeps its<Family>Configfamily-local (inmodels/rfdetr/config.py) rather than appending it tolibreyolo/training/config.py. - Pattern: DETR. Wrapper, not native — delegates to the
rfdetrPyPI package. - Subprocess isolation lives in
tests/e2e/test_rf1_training.py, not in the trainer itself; the trainer calls upstreammodel.train()directly. - Recipe gaps: training is upstream's, so few. Inference path adapts upstream's postprocessor (cxcywh → xyxy, COCO 91→80 class remap).
D-FINE (models/dfine/)
- Files: 16 files. Largest port (~4k LoC).
- Architecture:
nn.py,backbone.py,encoder.py,decoder.py,common.py,ms_deform.py - Numerical core:
fdr.py(FDR math, separately parity-tested),box_ops.py - Loss:
loss.py,matcher.py,denoising.py - Wrapper + IO:
model.py,utils.py,transforms.py,trainer.py
- Architecture:
- Pattern: DETR.
- Recipe gaps: from-scratch HGNetV2 pretrained backbone download disabled (users start from D-FINE's own COCO/obj2coco checkpoints). Backbone-LR multiplier added in v2. Fine-tune now closely matches upstream's recipe.
Shared conversion helpers (weights/_conversion_utils.py)
When a family does need a conversion script, use the shared helpers rather than reinventing the plumbing. They cover the parts that every script ends up needing:
| Helper | Purpose |
|---|---|
add_repo_root_to_path() |
so python weights/convert_*.py can import libreyolo.* cleanly |
load_checkpoint(path) |
torch.load with map_location="cpu" and weights_only=False; centralises the post-PyTorch-2.6 default change so it cannot bite per-script |
extract_state_dict(ckpt, *, prefer_ema=True) |
unwraps the common upstream layouts: {"ema": {"module": ...}}, {"model": ...}, {"state_dict": ...}, raw dicts, or anything with a .state_dict() method |
strip_state_dict_prefix(state_dict, prefix) |
drops a leading prefix when the upstream wrapper is model.model.<...> |
wrap_libreyolo_checkpoint(state_dict, *, model_family, size, nc, names=None) |
builds the canonical metadata-wrapped LibreYOLO format; build_class_names(nc) falls back to COCO-80 names for nc=80, generic class_<i> otherwise |
save_checkpoint(checkpoint, output_path) |
creates parent dirs and writes |
If upstream ships .safetensors (DEIMv2 was the first family to surface
this), don't try to feed it through load_checkpoint — torch.load
won't read it. Dispatch on Path(input).suffix == ".safetensors",
construct a fresh native model instance (e.g. LibreDEIMv2Model(...)),
load with safetensors.torch.load_model(model, path, strict=True), and
read .state_dict() back. strict=True is the right default here: it
fails the conversion loudly on any structural drift between upstream's
safetensors layout and your port, which is exactly the bug class a
silent unwrap would hide. safetensors is added as a runtime dep in
pyproject.toml rather than an extra, since DETR families are
increasingly publishing weights in this format.
weights/convert_deimv2_weights.py is the reference implementation.
weights/README.md classifies each shipped conversion as one of three
tiers, which is the right framing for a new one too:
- metadata-wrap (D-FINE, DEIM): module names already match,
extract_state_dict->wrap_libreyolo_checkpoint->save_checkpoint, ~50-100 LoC end-to-end. - light structural (RT-DETR HGNetv2): EMA unwrap + a small set of
encoder/decoder key remaps + drop-list for tensors absent in
LibreYOLO's port. Saves a flat converted
state_dict(no metadata wrap on this path historically). - heavy structural (YOLOv9): translate numbered upstream layer indices into LibreYOLO semantic module names, remap sublayer names for ELAN / RepNCSPELAN / AConv / ADown / SPP / heads, skip the auxiliary head, inject fixed DFL weights. Hundreds of LoC.
Reference test for the helpers: tests/unit/test_weight_conversion_utils.py.
Files-touched matrix (universal centralizing files)
Every family edits these:
| File | Why |
|---|---|
libreyolo/models/<family>/{__init__.py, model.py, nn.py, trainer.py, utils.py} |
family-local code |
libreyolo/models/__init__.py |
one-line family import (drives auto-registration order) |
libreyolo/__init__.py |
LibreYOLO<Family> export + __all__ |
libreyolo/training/config.py |
append <Family>Config(TrainConfig) (RF-DETR is the exception: keeps RFDETRConfig family-local at models/rfdetr/config.py) |
libreyolo/validation/preprocessors.py |
append <Family>ValPreprocessor |
tests/unit/test_<family>_*.py |
parity / shape / loss / smoke tests against upstream — done before claiming inference is correct |
tests/e2e/conftest.py |
append rows to MODEL_CATALOG |
Conditional edits depending on family:
| File | When |
|---|---|
libreyolo/models/<family>/loss.py |
non-trivial loss (everyone except RF-DETR) |
libreyolo/models/<family>/transforms.py |
augmentation diverges from the shared training/augment.py mosaic+mixup default (D-FINE adds tv2-based ops; YOLOv9 / YOLO-NAS subclass) |
libreyolo/training/scheduler.py |
paper recipe needs an LR shape that doesn't exist yet — e.g. D-FINE added FlatCosineScheduler (warmup → flat → cosine tail). Add a new generic BaseScheduler subclass; do not put schedulers under models/<family>/ |
libreyolo/training/ema.py |
EMA decay needs to change at runtime (e.g. mid-train restart). D-FINE added set_decay(decay, ramp=False) — generic enough to leave shared |
libreyolo/backends/base.py |
output shape diverges from YOLO grid (DETR families need it) |
libreyolo/backends/tensorrt.py |
output names differ from "output" (DETR families) |
libreyolo/export/exporter.py |
needs an _model_context branch (D-FINE has one for the deploy wrapper) |
libreyolo/export/onnx.py |
output count differs from 1 or 3 (DETR's 2-output case) |
weights/convert_<family>_weights.py |
needed when upstream ships a checkpoint format LibreYOLO can't load directly (extra wrapping, EMA buffer drops, key remaps, or just no model_family metadata). Skip it when (a) your family is a wrapper that consumes upstream checkpoints in-process (RF-DETR), (b) your top-level module attributes mirror upstream's so SG/upstream state_dicts load with _strict_loading=False plus an in-process unwrap helper (YOLO-NAS, YOLOX), or (c) the conversion is trivial enough to keep inside LibreYOLO("upstream.pt"). When you do write one: wrap with metadata (model_family, size, nc, names) so the factory routes without filename heuristics; print a missing/unexpected-key diff after loading the wrapped dict into a fresh model (silent drops are a frequent source of slow-burn fine-tune bugs); write atomically (.tmp + rename) so an interrupted run can't half-write a corrupt .pt; fail loudly on shape mismatches. The script can stay under ~100 LoC if your top-level attributes mirror upstream's names so no key remapping is needed (YOLO-NAS is the canonical example of this design). |
pyproject.toml |
mandatory for wrapper integrations (RF-DETR's [rfdetr] extra is required, not optional — the wrapper is non-functional without the dep). For native ports, only if you genuinely can't avoid a new dep. |
8. The integration-proof tests
You're integrated when both pass for every size of your family.
| Test | What it proves | Notes |
|---|---|---|
tests/e2e/test_val_coco128.py |
Inference loads + runs; preprocessing + class mapping + postprocessing are correct. | Runs model.val(data="coco128.yaml"), asserts mAP50-95 ≥ 0.18. |
tests/e2e/test_rf1_training.py |
Training improves the model on a real dataset (marbles). | Trains 10 epochs, asserts post-mAP > pre-mAP and post-mAP ≥ 0.05. |
To wire your family into both: append rows to MODEL_CATALOG in
tests/e2e/conftest.py. Both tests parametrize over the catalog.
Optional faithfulness gate
test_val_coco128's mAP50-95 ≥ 0.18 floor is a sanity check that
preprocessing and class mapping are wired correctly; it is not a
faithfulness check. A silent regression in the conversion script or
numerical drift in the model port can still leave the floor intact while
losing several mAP. The faithfulness gate is loading the converted
official checkpoint and asserting full-COCO mAP >= published - 0.5.
Recommended pattern when users care about matching upstream's published
numbers: tests/nightly/test_<family>_official_ckpt_map.py, gated on a
<FAMILY>_OFFICIAL_CKPT_DIR env var, kept out of the default suite to
avoid pulling multi-GB weights in CI. This is opt-in by design.
DETR families: skip the last_loss < first_loss assertion in test_rf1_training.
DETR total loss is the sum of ~38 weighted aux terms (per-decoder-layer + pre +
encoder-aux + DN paths) and is too noisy on small datasets for monotonic-decrease
to be reliable. RF-DETR's branch and the D-FINE branch both exempt themselves.
9. Silent-corruption landmines (in priority order)
The ones below have actually burned integrations in this repo. Each line is a one-shot: [which family hit it] — what to do.
- Color space mismatch between training transform and val preprocessor (YOLOX BGR vs YOLOv9 RGB) — pin the convention in both docstrings, cross-check.
- Target format mismatch (D-FINE) — DETR criteria want
list[dict]but the data pipeline yields padded(B, max_labels, 5); translate inon_forward. can_load()too greedy (RF-DETR almost stole D-FINE checkpoints) — match on tokens unique to your architecture; never"backbone"or"weight".- Backbone LR multiplier missing (DETR families) — silent ~0.5 mAP loss in
fine-tuning. Implies per-group LR +
_train_epochoverride. - Multi-scale collate epoch propagation (D-FINE) — collate needs
set_epoch()called from the trainer at each epoch start. - Stop-epoch augmentation policy (D-FINE) — disable
RandomZoomOut/RandomIoUCropetc. at epoch N. Different fromno_aug_epochs(which kills mosaic for last N). labels_getter=lambdais unpicklable under Python 3.14'sforkserver(D-FINE on macOS) — use a module-level function forSanitizeBoundingBoxes.RandomIoUCrophas nopparameter in torchvision v2 — wrap withRandomApply.- MPS-specific torch bugs in DETR backward (D-FINE) — provide a
_setup_deviceoverride that falls back to CPU; CUDA path stays unchanged. - Post-train device drift (D-FINE) — when the trainer fell back to CPU, the
wrapper's
self.deviceis still MPS;model.val()aftermodel.train()hits a device mismatch. Endtrain()withself.model.to(self.device). - ONNX opset 13 default — DETR families with deformable attention need ≥ 16.
Set per-family default in
BaseExporter.__call__. - NCNN can't handle DETR ops (D-FINE) — block the export early with
NotImplementedErrorinstead of producing a graph the runtime can't load. head.exportflag missing (YOLO-grid families) — without it, ONNX bakes static shapes that work only at the exact resolution exported.strict=Truestate-dict loading — override_strict_loading() = Falseif upstream checkpoints carry EMA buffers, profiling state, or auxiliary heads.- Cross-family rejection on cross-family transfer — when intentionally
splicing weights, pop
model_familyfrom the donor checkpoint dict. - EMA decay too low — if you lower it from 0.9999 without good reason, early-epoch evals show flat or decreasing mAP because the EMA hasn't settled.
- Letterbox vs. plain resize —
uses_letterboxproperty must match the training transform;DetectionValidatorreads it for target rescaling. _train_epochoverride drift (DETR families) — when you copy the parent loop to add per-group LR + grad clip + epoch propagation, leave a comment "kept in sync withBaseTrainer._train_epochas of" so drift is auditable. Promote to shared hooks if a third family needs the same overrides. - Per-size defaults copy-pasted from one size to all. When upstream
ships per-size YAMLs, build a side-by-side table of every override per
size before assuming the s config applies to n. DETR examples:
freeze_at,freeze_norm, backbone-LR multiplier, EMA decay,lr_gamma, peaklr. YOLO-grid examples: BN epsilon / momentum overrides on the smallest size,INPUT_SIZES, depth/width multipliers, head reg-max. The default is rarely uniform across sizes. A single-row table that's wrong on n/m is a silent ~1 mAP regression on those sizes. Cross-check each row against the actual upstream YAML, not just the first one you ported. <Family>Config.min_lr_ratiois one knob trying to cover several upstreamlr_gammavalues.FlatCosineSchedulercomputesmin_lr = lr * min_lr_ratio, so the ratio is the upstreamlr_gammaby another name. Different families pick different values (D-FINE:1.0= no cosine decay, by design; DEIM:0.5; ECDET:0.5); within a family, sizes can disagree too (DEIM-N overrides to1.0). The trap is picking any uniform value without cross-checking. A ratio of1.0is correct when upstream'slr_gammais1.0and a silent bug otherwise; the ratio is not the issue, the cross-check is.- Wasted ImageNet backbone download in
_init_model(RT-DETR fixed inbf16a2b). When a user constructs your wrapper from a LibreYOLO checkpoint,BaseModel.__init__will eventually load their weights and overwrite anything you initialised the backbone with. If_init_modelunconditionally fetches the upstream ImageNet-pretrained backbone, you pay a 50-100 MB download for nothing on every checkpoint load. Two related pieces guard against it: (a)BaseModel.__init__peeks at the checkpoint viacls.detect_sizebefore the first_init_modelcall so the right architecture is built from the start, which means yourdetect_sizehas to work on raw upstreamstate_dicts (not just LibreYOLO-wrapped ones); (b) inside_init_model, checkself._loading_pretrained_checkpointand passbackbone_pretrained=False(or the equivalent kwarg your model uses) when the flag is set.
10. Workflow
- Check both licenses — code (upstream
LICENSEfile) and weights (HF model card). If either is GPL/AGPL, plan for a plugin-only ship; if weights are non-redistributable, link to the upstream CDN like YOLO-NAS. - Pick the pattern: YOLO-grid or DETR. Skim the existing family that's closest.
- Audit upstream's training recipe with an agent. Decide what you skip.
- Implement family-local code (
models/<family>/) — the model, postprocess, and inference wrapper first. Verify inference parity using the recipe in §6 (cross-load upstreamstate_dict, assertmax_abs_diff == 0on identical inputs over the layers present in both models). For wrapper integrations like RF-DETR, substitute "verify the wrapped package produces its documented outputs." - Wire central files —
models/__init__.py,__init__.py,config.py,validation/preprocessors.py. Family must load viaLibreYOLO("Libre<Family>s.pt"). - Implement the trainer —
trainer.py,transforms.py,loss.py. Verify the loss matches upstream on synthetic inputs (parity test, 1e-5 tolerance). - Wire into
MODEL_CATALOGand runtest_val_coco128+test_rf1_training. - Test ONNX export at minimum. If the rest of the export formats are family- compatible, run them too.
- Upload weights to HuggingFace under
LibreYOLO/Libre<Family><size>/— see the separatelibreyolo-upload-hf-modelskill (skip if upstream license forbids redistribution; link to upstream CDN instead).