legacy-libreyolo-add-native-detection-model

name: legacy_libreyolo-add-native-detection-model
description: >-
  Legacy guide for adding a detection model to LibreYOLO before the explicit task-paradigm refactor. Covers both YOLO-grid families (YOLOX, YOLOv9, YOLO-NAS) and DETR-style families (RF-DETR, D-FINE; future RT-DETR), but should be treated as pre-task-architecture guidance.

Legacy: adding a detection model to LibreYOLO

This skill documents the pre-task-refactor detection-model workflow. Prefer a new task-aware skill for future detection, segmentation, pose, or classification model additions.

1. What LibreYOLO is

A single LibreYOLO("Libre<Family><size>.pt") factory dispatches to family-local implementations. Each family lives entirely inside libreyolo/models/<family>/ and plugs into 4 shared ABCs. The factory iterates BaseModel._registry (auto-populated via __init_subclass__), and the first can_load(state_dict) match wins.

libreyolo/
├── models/<family>/      ← all family code lives here
├── training/             ← BaseTrainer, schedulers, EMA, TrainConfig
├── validation/           ← DetectionValidator + BaseValPreprocessor subclasses
├── backends/             ← ONNX / TensorRT / OpenVINO / NCNN / TorchScript runtime
├── export/               ← BaseExporter and per-format export helpers
└── data/                 ← YOLODataset / COCODataset / dataloader / collate
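
The registry dispatch described above can be sketched in a few lines. This is a minimal illustration, not LibreYOLO's actual code: the stand-in `BaseModel`, the two families, their key prefixes, and `resolve_family` are all hypothetical names chosen for the example.

```python
from __future__ import annotations


class BaseModel:
    """Stand-in for LibreYOLO's BaseModel; only the registry mechanics are shown."""

    _registry: list[type["BaseModel"]] = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Subclasses register themselves in definition (import) order.
        BaseModel._registry.append(cls)

    @classmethod
    def can_load(cls, state_dict: dict) -> bool:
        raise NotImplementedError


class FamilyA(BaseModel):  # hypothetical family
    @classmethod
    def can_load(cls, state_dict: dict) -> bool:
        # Match on a token assumed to be unique to this architecture.
        return any(key.startswith("family_a_head.") for key in state_dict)


class FamilyB(BaseModel):  # hypothetical family
    @classmethod
    def can_load(cls, state_dict: dict) -> bool:
        return any(key.startswith("family_b_decoder.") for key in state_dict)


def resolve_family(state_dict: dict) -> type[BaseModel]:
    """Return the first registered family whose can_load() accepts the checkpoint."""
    for family in BaseModel._registry:
        if family.can_load(state_dict):
            return family
    raise ValueError("no registered family recognises this checkpoint")
```

Because the registry is a plain list filled at class-definition time, the order of the family imports is the only thing that decides which family wins when two `can_load` heuristics overlap.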

2. Currently supported

Family	Sizes	Paradigm	Native code	License (upstream)
YOLOX	n / t / s / m / l / x	YOLO-grid	yes	Apache-2.0
YOLOv9	t / s / m / c	YOLO-grid	yes	MIT (`MultimediaTechLab/YOLO`)
YOLO-NAS	s / m / l	YOLO-grid	yes	Apache-2.0 code; weights non-redistributable (Deci CDN)
RF-DETR	n / s / m / l	DETR	wrapper	Apache-2.0
D-FINE	n / s / m / l / x	DETR	yes	Apache-2.0

⚠️ The license of LibreYOLO's YOLOv9 port is not the same as the original paper repo's. The port follows MultimediaTechLab/YOLO (MIT, by Kin-Yiu Wong and Hao-Tang Tsui), not WongKinYiu/yolov9 (GPL-3.0, the paper's reference implementation). When a model has multiple upstream forks, check the license of each one: a permissive fork is what makes integration into MIT-licensed LibreYOLO clean.

"Native code" means the model + loss + trainer live in libreyolo/models/<family>/. "Wrapper" means the family delegates to a separate PyPI package (RF-DETR uses rfdetr directly; the LibreYOLO code is mostly an adapter).

3. License check — do this first

LibreYOLO core is MIT, and the project explicitly tries to stay MIT-compatible. Before porting anything, check both licenses on the upstream:

The code — read the upstream's LICENSE file (don't trust the README).
The weights — check the HuggingFace model card / release page / project site. Code and weights can have different licenses (e.g. Apache code + GPL weights, or permissive code + a custom commercial-restriction license on weights).

Upstream license	Code in main?	Rehost weights under `LibreYOLO/` HF?	Action
MIT / Apache-2.0 / BSD	✅	✅	normal integration; ship NOTICE per upload-hf-model skill
GPL-3.0	⚠️ ship as plugin only	❌ never (forces GPL on users)	`libreyolo-<family>` separate package; user downloads weights from upstream
AGPL-3.0	⚠️ plugin only	❌ never (network-use clause is broader)	same as GPL but stricter
Custom / non-redistributable	case-by-case	❌ usually	link to upstream CDN, like YOLO-NAS does for Deci's bucket

Code-license rationale: GPL is "viral" — putting GPL code inside MIT-licensed LibreYOLO would force the entire library to become GPL, breaking every downstream user who relies on MIT terms.

Weights-license rationale: the legal status of weights under GPL is unsettled, but the conservative interpretation says any code that links a loaded GPL weight becomes a combined work and inherits GPL on distribution. Don't expose users to that without an explicit opt-in.

Already-shipped examples:

D-FINE — Apache-2.0 code + Apache-2.0 weights → clean integration in core, weights rehosted on LibreYOLO/LibreDFINE*.
RF-DETR — Apache-2.0 → clean.
YOLOv9 — MIT code + MIT weights, via MultimediaTechLab/YOLO (Kin-Yiu Wong & Hao-Tang Tsui). Ported from the permissive fork, not WongKinYiu/yolov9 (GPL-3.0).
YOLO-NAS — Apache-2.0 code + custom Deci CDN for weights → linked rather than rehosted.
YOLO-World — GPL-3.0 code + GPL-3.0 weights → flagged in #108; plugin-only is the right call even though wondervictor (paper first author) actively distributes.

When unsure, fetch the actual LICENSE file:

curl -sL https://raw.githubusercontent.com/<org>/<repo>/<branch>/LICENSE | head -3

The first line is canonical. "GNU GENERAL PUBLIC LICENSE Version 3" or "GNU AFFERO GENERAL PUBLIC LICENSE Version 3" → copyleft, treat carefully. "MIT License" or "Apache License Version 2.0" → permissive, proceed.
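
The first-line rule can be turned into a small triage helper. This is a sketch under assumptions: `classify_license` is a hypothetical name, the string matches are only the ones named in this section (plus BSD from the table above), and anything it does not recognise should be read by a human rather than trusted.

```python
def classify_license(first_lines: str) -> str:
    """Classify the opening lines of a LICENSE file.

    Returns 'copyleft', 'permissive', or 'unknown'. 'unknown' means
    "read the file yourself", not "safe".
    """
    # Collapse whitespace so centred headers such as the GPL's still match.
    text = " ".join(first_lines.upper().split())
    if "GNU AFFERO GENERAL PUBLIC LICENSE" in text:
        return "copyleft"
    if "GNU GENERAL PUBLIC LICENSE" in text:
        return "copyleft"
    if "MIT LICENSE" in text or "APACHE LICENSE" in text or "BSD" in text:
        return "permissive"
    return "unknown"
```

Feeding it the output of the `curl ... | head -3` command above is enough for a first pass; custom or dual licenses fall through to `unknown` on purpose.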

For weights, check the HF model card YAML at the top:

license: gpl-3.0    # ← copyleft
license: apache-2.0 # ← permissive
license: cc-by-nc-4.0 # ← non-commercial: usually a hard "no" for inclusion
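
Reading that field programmatically is straightforward. The sketch below assumes the model card is a README whose YAML front matter is delimited by `---` lines and that `license:` is a simple scalar; `read_card_license`, `weights_policy`, and the policy strings are hypothetical names, and the mapping only encodes the table from this section.

```python
from __future__ import annotations

PERMISSIVE = {"mit", "apache-2.0", "bsd-3-clause"}
COPYLEFT = {"gpl-3.0", "agpl-3.0"}


def read_card_license(card_text: str) -> str | None:
    """Return the `license:` value from a model card's front matter, if any."""
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no front matter block
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter
        key, sep, value = line.partition(":")
        if sep and key.strip() == "license":
            # Drop trailing comments and optional quotes.
            value = value.split("#", 1)[0].strip().strip("'\"")
            return value.lower() or None
    return None


def weights_policy(license_id: str | None) -> str:
    """Map a license identifier onto the rehosting decision from the table."""
    if license_id in PERMISSIVE:
        return "rehost-ok"
    if license_id in COPYLEFT or (license_id or "").startswith("cc-by-nc"):
        return "do-not-rehost"
    return "case-by-case"
```

A missing or unrecognised license deliberately lands in `case-by-case` so that nothing is rehosted by default.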

4. The 4 ABCs every family plugs into

ABC	File	Required overrides
`BaseModel`	`models/base/model.py`	`can_load`, `detect_size`, `detect_nb_classes`, `_init_model`, `_get_available_layers`, `_preprocess`, `_forward`, `_postprocess`, `_get_preprocess_numpy`
`BaseTrainer`	`training/trainer.py`	`_config_class`, `get_model_family`, `get_model_tag`, `create_transforms`, `create_scheduler`, `get_loss_components`
`TrainConfig`	`training/config.py`	dataclass subclass with `kw_only=True`, override only fields that differ
`BaseValPreprocessor`	`validation/preprocessors.py`	`__call__`, `normalize`, optionally `uses_letterbox` and `custom_normalization`

BaseModel also exposes a set of ClassVars every family must set (missing or empty values silently break the factory's auto-routing):

ClassVar	What it controls
`FAMILY: str`	family identifier (e.g. `"yolox"`, `"deim"`); used by the factory to gate per-family kwargs and by the conversion script's `model_family` metadata field
`FILENAME_PREFIX: str`	e.g. `"LibreYOLOX"`, `"LibreDFINE"`; drives `detect_size_from_filename` and the rehosted-weights filename convention
`WEIGHT_EXT: str`	usually `".pt"`; only override if your weights need a different extension
`INPUT_SIZES: dict[str, int]`	size code -> input resolution; used to validate the `size=` arg and to drive the val preprocessor's expected canvas
`TRAIN_CONFIG: type[TrainConfig] | None`	wires the family's dataclass to the model class so `model.train(...)` builds the right config; set this to `None` only for inference-only ports
`SUPPORTS_SEG: bool`	default `False`; flip to `True` if your family also ships a segmentation head. The factory routes `task="seg"` requests via this flag (replacing the old `FAMILY == "rfdetr"` special-case in commit `d300f1c`)
`val_preprocessor_class`	`BaseValPreprocessor` subclass; defaults to `StandardValPreprocessor` if unset

Auto-registration kicks in on import: models/__init__.py adds one line per family. Import order = can_load priority when heuristics overlap.

Checkpoint loading goes through libreyolo.utils.serialization.load_untrusted_torch_file, not raw torch.load. That helper centralises the weights_only=False choice (PyTorch 2.6 changed the torch.load default to weights_only=True) and adds an explanatory error context. Anything that loads a .pt outside BaseModel._load_weights should use the same helper rather than bypassing it.

5. Two architectural patterns

Pick one. The contracts diverge non-trivially.

YOLO-grid pattern (YOLOX, YOLOv9, YOLO-NAS)

Model output: per-scale tensor list. Shape and contents differ across families: YOLOX is (B, 5+nc, H, W) per scale (4 reg + 1 objectness + nc classes), with grid offsets and exp applied only in export mode; YOLOv9 / YOLO-NAS drop the objectness channel and emit (B, 4+nc, N)-shaped tensors. Confirm your family's exact shape against the upstream head.
Training targets: (B, max_labels, 5) padded [class, cx, cy, w, h] pixel coords.
Loss: per-anchor + assignment (SimOTA / TaskAlignedAssigner / DFL).
Augmentation: numpy/cv2, mosaic + mixup central. Lives in training/augment.py (MosaicMixupDataset, random_affine, etc.) — reused across YOLO families.
ONNX: 1 output named "output", opset 13 default works.
Detection head exposes self.export: bool flipped by the exporter at export/exporter.py:_model_context.
NCNN / OpenVINO / TensorRT / TorchScript all work out of the box.

DETR pattern (RF-DETR, D-FINE; future RT-DETR)

Model output: dict {"pred_logits": (B, Q, nc), "pred_boxes": (B, Q, 4)} cxcywh in [0, 1].
Training targets: list[dict{labels, boxes_cxcywh_normalized}] per image — no padding.
Loss: Hungarian matching + auxiliary outputs across decoder layers (FGL, GO-LSD for D-FINE).
Augmentation: torchvision v2 transforms with tv_tensors.Image + tv_tensors.BoundingBoxes.
Multi-scale: per-batch random resize via a custom collate (BatchImageCollateFunction-style).
Backbone LR multiplier (0.1× or 0.5×) is standard. It implies per-group LR application in _train_epoch, which must be overridden; hooks aren't enough.
Gradient clipping (max_norm=0.1) is standard.
ONNX: 2 outputs ["pred_logits", "pred_boxes"], opset ≥ 16 (for grid_sample).
Export wrapper: a small nn.Module that calls model.deploy() (recursive convert_to_deploy on every submodule that defines it) and flattens dict→tuple.
NCNN does not work for DETR-family models — its op registry lacks topk.
EMA mid-training decay change (set_decay) is sometimes used to stabilize the final phase after augmentation stops.
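
The two target formats differ enough that the translation is worth seeing once. The sketch below uses plain nested lists so it runs without torch; the function name, the `"labels"` / `"boxes"` dict keys, and the "zero width or height means padding" convention are assumptions for illustration, so check them against your criterion and data pipeline.

```python
def padded_to_detr_targets(padded, img_w, img_h):
    """Convert YOLO-grid targets to DETR-style per-image dicts.

    padded: B x max_labels x 5 nested lists of [class, cx, cy, w, h] in pixels.
    Returns one dict per image with integer labels and cxcywh boxes in [0, 1].
    """
    targets = []
    for rows in padded:
        labels, boxes = [], []
        for cls_id, cx, cy, w, h in rows:
            if w <= 0 or h <= 0:
                continue  # assumed padding row, carries no object
            labels.append(int(cls_id))
            boxes.append([cx / img_w, cy / img_h, w / img_w, h / img_h])
        targets.append({"labels": labels, "boxes": boxes})
    return targets
```

An image with no objects yields empty lists rather than being dropped, which keeps the output aligned with the batch dimension.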

Sibling architectures

When your family shares its architecture with an existing family and differs only in training objective (for example, only the loss or matcher changed), landmine #3's "match on tokens unique to your architecture" advice fails: the architectures are identical, so both can_load checks fire on the same checkpoint. The disambiguation pattern:

Embed an explicit model_family field in the converted checkpoint's metadata (the conversion script's job). This is the strongest signal.
Use the family's FILENAME_PREFIX (e.g. LibreDFINE, LibreDEIM) in detect_size_from_filename as a fallback hint when metadata is absent, e.g. someone hands you a raw upstream .pth they renamed. Each sibling abstains on the other's prefix.
Order the registry imports in libreyolo/models/__init__.py so the more-specific family loads first; BaseModel._registry is walked in import order and the first can_load match wins.
Raise an explicit "ambiguous between {A, B}" error on a true tie (architecture-equal checkpoint, no metadata, no filename hint) rather than silently picking one.

This pattern is reusable for any descendant family that inherits an existing port's architecture.
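
The precedence above (metadata, then filename prefix, then an explicit error) might look like the following. It is a sketch only: `resolve_sibling` and the prefix table are hypothetical, and it assumes `model_family` is stored as a top-level key of the checkpoint dict.

```python
from __future__ import annotations

from pathlib import Path

# Prefixes taken from the examples in this section.
SIBLING_PREFIXES = {"dfine": "LibreDFINE", "deim": "LibreDEIM"}


def resolve_sibling(checkpoint: dict, filename: str | None, candidates: list[str]) -> str:
    """Pick one family among architecture-identical siblings, or fail loudly."""
    # 1. Explicit metadata is the strongest signal.
    family = checkpoint.get("model_family")
    if family in candidates:
        return family
    # 2. Fall back to the filename prefix when metadata is absent.
    if filename:
        name = Path(filename).name
        hits = [f for f in candidates if name.startswith(SIBLING_PREFIXES[f])]
        if len(hits) == 1:
            return hits[0]
    # 3. A true tie is an error, never a silent pick.
    raise ValueError(
        f"ambiguous between {sorted(candidates)}: "
        "no model_family metadata and no filename hint"
    )
```

In the real factory each sibling's `can_load` would apply the same logic from its own side and abstain when the hint points at the other family.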

6. The training-recipe trade-off (read this before claiming a port is "done")

LibreYOLO has explicitly chosen not to reproduce upstream paper recipes for from-scratch training. From-scratch reproduction would require sponsoring hundreds of GPU-hours per family and matching every augmentation, EMA quirk, loss weight, and warmup detail. That's not what 99% of users want.

What LibreYOLO does aim for:

Inference parity — bit-equivalent outputs vs. upstream on the released checkpoints. This is non-negotiable.
Best-possible fine-tuning — a user can load the upstream pretrained checkpoint and fine-tune on a small custom dataset, getting within ~1 mAP of what python -m upstream.train ... would have produced.

Concretely, this means every existing family has gaps relative to its upstream training recipe, and that's by design. Examples we've already accepted:

D-FINE skips Objects365 pretraining (not relevant for fine-tune users).
D-FINE used to skip RandomZoomOut / RandomIoUCrop / RandomPhotometricDistort in v1 (added in a later commit, ~+1.5 mAP gain).
YOLOv9 doesn't ship the auxiliary "branch" head used during from-scratch training.
YOLO-NAS uses LibreYOLO's letterbox preprocessing (close but not identical to SuperGradients' exact pipeline) — documented as a known parity gap.

When you port a new model:

Spend agentic time reading the upstream training code first. The model's forward pass is the easy 30%. Augmentations, optimizer param groups, LR schedule shape, EMA dynamics, multi-scale collation, gradient clipping — that's the other 70%, and it dominates fine-tune quality.
Decide explicitly which pieces you skip. Document them in a commit message or in the family's docstring.
Aim for fine-tune parity, not paper parity. Test by loading upstream weights, fine-tuning on coco128 or marbles for ~10 epochs, and verifying mAP improves.
Don't pretend gaps don't exist. If the augmentation chain is half what upstream uses, say so. A 5-line transforms.py that says "TODO: port RandomZoomOut" is more honest than silently shipping a degraded recipe.

A useful agent prompt: "In <upstream-repo>/, identify every augmentation, loss weight, optimizer param group, LR schedule, and EMA behavior used during training. Output a concrete checklist of what would need to be ported."

The minimum bar for "this port loads upstream weights correctly"

Before claiming inference is correct, run a tensor-equivalence check:

Import the upstream model class side-by-side with yours.
Build both with the same config / size; cross-load the upstream state_dict into yours and inspect the missing/unexpected key diff. Use strict=True only if your port loads the full upstream state dict (this is the case for D-FINE/DEIM-style DETR ports that mirror upstream attribute naming exactly). For ports that intentionally drop upstream layers (YOLOX strips training-state buffers; YOLOv9 drops the auxiliary head and remaps legacy detect.* -> head.* keys; YOLO-NAS unwraps SuperGradients EMA buffers), use strict=False and assert that the missing/unexpected key set matches a documented expected set. Silent unexpected drift, not the strict mode itself, is the thing to catch. See landmine #14 for _strict_loading().
Run identical inputs through both at FP32 and assert max_abs_diff == 0 on the output tensors that come from layers present in both models, in both eval() and train() modes.

This recipe validates architectural fidelity, attribute naming, and state-dict compatibility in one shot. Save the script as a one-off under the family's test directory; it pays for itself on every future upstream version bump. Fine-tune sanity (load -> 10 epochs on coco128 or marbles -> mAP improves) is the second gate, not the first.
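
The key-diff and output-diff halves of that check can be sketched without torch. The helper names are hypothetical; in a real script the key sets come from `state_dict().keys()` on both models, and the output comparison would typically be `(a - b).abs().max()` on tensors.

```python
def key_diff(upstream_keys, port_keys):
    """Return (missing, unexpected) from the port's point of view."""
    upstream, port = set(upstream_keys), set(port_keys)
    missing = port - upstream      # the port expects these; upstream lacks them
    unexpected = upstream - port   # upstream ships these; the port has no slot
    return missing, unexpected


def assert_expected_drift(upstream_keys, port_keys,
                          expected_missing=(), expected_unexpected=()):
    """Fail unless the key drift equals the documented expected sets."""
    missing, unexpected = key_diff(upstream_keys, port_keys)
    assert missing == set(expected_missing), f"undocumented missing keys: {missing}"
    assert unexpected == set(expected_unexpected), (
        f"undocumented unexpected keys: {unexpected}"
    )


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(a) != len(b):
        raise ValueError("outputs have different lengths")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)
```

With empty expected sets this behaves like `strict=True`; with documented sets it is the `strict=False` path that still catches undocumented drift.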

If your family is a wrapper around an upstream PyPI package (RF-DETR pattern), this check reduces to "import the upstream package and verify it produces the documented outputs"; substitute accordingly.

7. Per-family integration: what each one actually shipped

YOLOX (`models/yolox/`)

Files: __init__.py, model.py, nn.py, trainer.py, utils.py, loss.py (6 files).
Pattern: YOLO-grid. Pixel cxcywh targets, BGR 0–255 inference, letterbox preprocessing.
Augmentations: reuses shared libreyolo/training/augment.py (mosaic + mixup); no family-local transforms.py.
Recipe gaps: minimal. Closest to upstream of any family.

YOLOv9 (`models/yolo9/`)

Files: 7 files. Largest YOLO-grid port (~2.3k LoC) due to ELAN/RepNCSPELAN modules.
Pattern: YOLO-grid. RGB 0-1 normalisation. Validation path letterboxes; the default inference path uses plain Image.resize (utils.py:_postprocess defaults letterbox=False), a known asymmetry called out in the val-preprocessor docstring.
Recipe gaps: from-scratch auxiliary head dropped; mixup disabled; trainer builds three param groups (BN / Conv / Bias) at the same lr rather than upstream's three at distinct LRs (no backbone-LR split).

YOLO-NAS (`models/yolonas/`)

Files: 7 files. Native nn, but state-dict-compatible with SuperGradients (SG) checkpoints.
Pattern: YOLO-grid.
Recipe gaps: SG's exact augmentation pipeline replaced by LibreYOLO's standard letterbox path (documented).
Quirks: weights download from Deci's CDN, not LibreYOLO's HF org (license).

RF-DETR (`models/rfdetr/`)

Files: 6 files (__init__.py, config.py, model.py, nn.py, trainer.py, utils.py). RF-DETR is the only family that keeps its <Family>Config family-local (in models/rfdetr/config.py) rather than appending it to libreyolo/training/config.py.
Pattern: DETR. Wrapper, not native — delegates to the rfdetr PyPI package.
Subprocess isolation lives in tests/e2e/test_rf1_training.py, not in the trainer itself; the trainer calls upstream model.train() directly.
Recipe gaps: training is upstream's, so few. Inference path adapts upstream's postprocessor (cxcywh → xyxy, COCO 91→80 class remap).

D-FINE (`models/dfine/`)

Files: 16 files. Largest port (~4k LoC).
- Architecture: nn.py, backbone.py, encoder.py, decoder.py, common.py, ms_deform.py
- Numerical core: fdr.py (FDR math, separately parity-tested), box_ops.py
- Loss: loss.py, matcher.py, denoising.py
- Wrapper + IO: model.py, utils.py, transforms.py, trainer.py
Pattern: DETR.
Recipe gaps: the HGNetV2 pretrained-backbone download used for from-scratch training is disabled (users start from D-FINE's own COCO/obj2coco checkpoints). A backbone-LR multiplier was added in v2. Fine-tuning now closely matches upstream's recipe.

Shared conversion helpers (`weights/_conversion_utils.py`)

When a family does need a conversion script, use the shared helpers rather than reinventing the plumbing. They cover the parts that every script ends up needing:

Helper	Purpose
`add_repo_root_to_path()`	so `python weights/convert_.py` can import `libreyolo.` cleanly
`load_checkpoint(path)`	`torch.load` with `map_location="cpu"` and `weights_only=False`; centralises the post-PyTorch-2.6 default change so it cannot bite per-script
`extract_state_dict(ckpt, *, prefer_ema=True)`	unwraps the common upstream layouts: `{"ema": {"module": ...}}`, `{"model": ...}`, `{"state_dict": ...}`, raw dicts, or anything with a `.state_dict()` method
`strip_state_dict_prefix(state_dict, prefix)`	drops a leading prefix when the upstream wrapper is `model.model.<...>`
`wrap_libreyolo_checkpoint(state_dict, *, model_family, size, nc, names=None)`	builds the canonical metadata-wrapped LibreYOLO format; `build_class_names(nc)` falls back to COCO-80 names for `nc=80`, generic `class_<i>` otherwise
`save_checkpoint(checkpoint, output_path)`	creates parent dirs and writes
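
To make the unwrap order concrete, here is a rough sketch of the layouts the table lists. It is not the code in `weights/_conversion_utils.py`; the `_sketch` function names and the exact precedence are assumptions, and the real helpers should be used in actual conversion scripts.

```python
def extract_state_dict_sketch(ckpt, *, prefer_ema=True):
    """Unwrap common upstream checkpoint layouts into a flat state dict."""
    if not isinstance(ckpt, dict):
        # Anything with a .state_dict() method, e.g. a pickled module.
        return dict(ckpt.state_dict())
    candidates = []
    ema = ckpt.get("ema")
    if prefer_ema and isinstance(ema, dict) and "module" in ema:
        candidates.append(ema["module"])  # {"ema": {"module": ...}}
    candidates += [ckpt[key] for key in ("model", "state_dict") if key in ckpt]
    for value in candidates:
        if hasattr(value, "state_dict"):
            return dict(value.state_dict())
        if isinstance(value, dict):
            return value
    return ckpt  # already a raw state dict


def strip_prefix_sketch(state_dict, prefix):
    """Drop a leading prefix such as 'model.' from every key that has it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }
```

The EMA branch comes first because upstream EMA weights are usually the ones that reproduce the published numbers.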

If upstream ships .safetensors (DEIMv2 was the first family to surface this), don't try to feed it through load_checkpoint — torch.load won't read it. Dispatch on Path(input).suffix == ".safetensors", construct a fresh native model instance (e.g. LibreDEIMv2Model(...)), load with safetensors.torch.load_model(model, path, strict=True), and read .state_dict() back. strict=True is the right default here: it fails the conversion loudly on any structural drift between upstream's safetensors layout and your port, which is exactly the bug class a silent unwrap would hide. safetensors is added as a runtime dep in pyproject.toml rather than an extra, since DETR families are increasingly publishing weights in this format. weights/convert_deimv2_weights.py is the reference implementation.

weights/README.md classifies each shipped conversion as one of three tiers, which is the right framing for a new one too:

metadata-wrap (D-FINE, DEIM): module names already match, extract_state_dict -> wrap_libreyolo_checkpoint -> save_checkpoint, ~50-100 LoC end-to-end.
light structural (RT-DETR HGNetv2): EMA unwrap + a small set of encoder/decoder key remaps + drop-list for tensors absent in LibreYOLO's port. Saves a flat converted state_dict (no metadata wrap on this path historically).
heavy structural (YOLOv9): translate numbered upstream layer indices into LibreYOLO semantic module names, remap sublayer names for ELAN / RepNCSPELAN / AConv / ADown / SPP / heads, skip the auxiliary head, inject fixed DFL weights. Hundreds of LoC.

Reference test for the helpers: tests/unit/test_weight_conversion_utils.py.

Files-touched matrix (universal centralizing files)

Every family edits these:

File	Why
`libreyolo/models/<family>/{__init__.py, model.py, nn.py, trainer.py, utils.py}`	family-local code
`libreyolo/models/__init__.py`	one-line family import (drives auto-registration order)
`libreyolo/__init__.py`	`LibreYOLO<Family>` export + `__all__`
`libreyolo/training/config.py`	append `<Family>Config(TrainConfig)` (RF-DETR is the exception: keeps `RFDETRConfig` family-local at `models/rfdetr/config.py`)
`libreyolo/validation/preprocessors.py`	append `<Family>ValPreprocessor`
`tests/unit/test_<family>_*.py`	parity / shape / loss / smoke tests against upstream — done before claiming inference is correct
`tests/e2e/conftest.py`	append rows to `MODEL_CATALOG`

Conditional edits depending on family:

File	When
`libreyolo/models/<family>/loss.py`	non-trivial loss (everyone except RF-DETR)
`libreyolo/models/<family>/transforms.py`	augmentation diverges from the shared `training/augment.py` mosaic+mixup default (D-FINE adds tv2-based ops; YOLOv9 / YOLO-NAS subclass)
`libreyolo/training/scheduler.py`	paper recipe needs an LR shape that doesn't exist yet — e.g. D-FINE added `FlatCosineScheduler` (warmup → flat → cosine tail). Add a new generic `BaseScheduler` subclass; do not put schedulers under `models/<family>/`
`libreyolo/training/ema.py`	EMA decay needs to change at runtime (e.g. mid-train restart). D-FINE added `set_decay(decay, ramp=False)` — generic enough to leave shared
`libreyolo/backends/base.py`	output shape diverges from YOLO grid (DETR families need it)
`libreyolo/backends/tensorrt.py`	output names differ from `"output"` (DETR families)
`libreyolo/export/exporter.py`	needs an `_model_context` branch (D-FINE has one for the deploy wrapper)
`libreyolo/export/onnx.py`	output count differs from 1 or 3 (DETR's 2-output case)
`weights/convert_<family>_weights.py`	needed when upstream ships a checkpoint format LibreYOLO can't load directly (extra wrapping, EMA buffer drops, key remaps, or just no `model_family` metadata). Skip it when (a) your family is a wrapper that consumes upstream checkpoints in-process (RF-DETR), (b) your top-level module attributes mirror upstream's so SG/upstream `state_dict`s load with `_strict_loading=False` plus an in-process unwrap helper (YOLO-NAS, YOLOX), or (c) the conversion is trivial enough to keep inside `LibreYOLO("upstream.pt")`. When you do write one: wrap with metadata (`model_family`, `size`, `nc`, `names`) so the factory routes without filename heuristics; print a missing/unexpected-key diff after loading the wrapped dict into a fresh model (silent drops are a frequent source of slow-burn fine-tune bugs); write atomically (`.tmp` + rename) so an interrupted run can't half-write a corrupt `.pt`; fail loudly on shape mismatches. The script can stay under ~100 LoC if your top-level attributes mirror upstream's names so no key remapping is needed (YOLO-NAS is the canonical example of this design).
`pyproject.toml`	mandatory for wrapper integrations (RF-DETR's `[rfdetr]` extra is required, not optional — the wrapper is non-functional without the dep). For native ports, only if you genuinely can't avoid a new dep.
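
The "write atomically (`.tmp` + rename)" advice for conversion scripts can be sketched with the standard library alone. `atomic_write_bytes` is a hypothetical helper; a real script would more likely call `torch.save` on the temporary path and then rename it.

```python
import os
from pathlib import Path


def atomic_write_bytes(payload: bytes, output_path) -> Path:
    """Write payload to output_path so readers never see a partial file."""
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # Keep the temp file in the same directory so the rename stays on one filesystem.
    tmp_path = output_path.with_name(output_path.name + ".tmp")
    with open(tmp_path, "wb") as handle:
        handle.write(payload)
        handle.flush()
        os.fsync(handle.fileno())  # push bytes to disk before the rename
    os.replace(tmp_path, output_path)  # overwrites any existing destination
    return output_path
```

If the process dies before `os.replace`, the destination is either absent or still the previous complete file; only a stray `.tmp` is left behind.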

8. The integration-proof tests

You're integrated when both pass for every size of your family.

Test	What it proves	Notes
`tests/e2e/test_val_coco128.py`	Inference loads + runs; preprocessing + class mapping + postprocessing are correct.	Runs `model.val(data="coco128.yaml")`, asserts mAP50-95 ≥ 0.18.
`tests/e2e/test_rf1_training.py`	Training improves the model on a real dataset (marbles).	Trains 10 epochs, asserts post-mAP > pre-mAP and post-mAP ≥ 0.05.

To wire your family into both: append rows to MODEL_CATALOG in tests/e2e/conftest.py. Both tests parametrize over the catalog.

Optional faithfulness gate

test_val_coco128's mAP50-95 ≥ 0.18 floor is a sanity check that preprocessing and class mapping are wired correctly; it is not a faithfulness check. A silent regression in the conversion script or numerical drift in the model port can still leave the floor intact while losing several mAP. The faithfulness gate is loading the converted official checkpoint and asserting full-COCO mAP >= published - 0.5.

Recommended pattern when users care about matching upstream's published numbers: tests/nightly/test_<family>_official_ckpt_map.py, gated on a <FAMILY>_OFFICIAL_CKPT_DIR env var, kept out of the default suite to avoid pulling multi-GB weights in CI. This is opt-in by design.
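
An opt-in gate of this kind might be sketched as follows. It uses `unittest` so that it runs on the standard library alone; in a pytest suite the equivalent gating marker is `pytest.mark.skipif`. The env var name, `PUBLISHED_MAP`, and `evaluate_full_coco_map` are hypothetical placeholders, and both mAP values are assumed to be in points on a 0 to 100 scale.

```python
import os
import unittest

ENV_VAR = "MYFAMILY_OFFICIAL_CKPT_DIR"  # hypothetical <FAMILY>_OFFICIAL_CKPT_DIR
PUBLISHED_MAP = None  # fill in from upstream's published table before enabling


def passes_faithfulness_gate(measured_map, published_map, tolerance=0.5):
    """True when the measured mAP is within `tolerance` points of published."""
    return measured_map >= published_map - tolerance


def evaluate_full_coco_map(ckpt_dir):
    """Placeholder: load the converted official checkpoint and run full-COCO val."""
    raise NotImplementedError("wire this to the family's validation entry point")


@unittest.skipUnless(os.environ.get(ENV_VAR), f"set {ENV_VAR} to run this gate")
class OfficialCheckpointMapTest(unittest.TestCase):
    def test_map_close_to_published(self):
        self.assertIsNotNone(PUBLISHED_MAP, "set PUBLISHED_MAP first")
        measured = evaluate_full_coco_map(os.environ[ENV_VAR])
        self.assertTrue(passes_faithfulness_gate(measured, PUBLISHED_MAP))
```

Because the class is skipped whenever the variable is unset, the default suite never touches the multi-GB weights.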

DETR families: skip the last_loss < first_loss assertion in test_rf1_training. DETR total loss is the sum of ~38 weighted aux terms (per-decoder-layer + pre + encoder-aux + DN paths) and is too noisy on small datasets for monotonic-decrease to be reliable. RF-DETR's branch and the D-FINE branch both exempt themselves.

9. Silent-corruption landmines (in priority order)

The ones below have actually burned integrations in this repo. Each line is a one-shot: [which family hit it] — what to do.

Color space mismatch between training transform and val preprocessor (YOLOX BGR vs YOLOv9 RGB) — pin the convention in both docstrings, cross-check.
Target format mismatch (D-FINE) — DETR criteria want list[dict] but the data pipeline yields padded (B, max_labels, 5); translate in on_forward.
can_load() too greedy (RF-DETR almost stole D-FINE checkpoints) — match on tokens unique to your architecture; never "backbone" or "weight".
Backbone LR multiplier missing (DETR families) — silent ~0.5 mAP loss in fine-tuning. Implies per-group LR + _train_epoch override.
Multi-scale collate epoch propagation (D-FINE) — collate needs set_epoch() called from the trainer at each epoch start.
Stop-epoch augmentation policy (D-FINE) — disable RandomZoomOut/RandomIoUCrop etc. at epoch N. Different from no_aug_epochs (which kills mosaic for last N).
labels_getter=lambda is unpicklable under Python 3.14's forkserver (D-FINE on macOS) — use a module-level function for SanitizeBoundingBoxes.
RandomIoUCrop has no p parameter in torchvision v2 — wrap with RandomApply.
MPS-specific torch bugs in DETR backward (D-FINE) — provide a _setup_device override that falls back to CPU; CUDA path stays unchanged.
Post-train device drift (D-FINE) — when the trainer fell back to CPU, the wrapper's self.device is still MPS; model.val() after model.train() hits a device mismatch. End train() with self.model.to(self.device).
ONNX opset 13 default — DETR families with deformable attention need ≥ 16. Set per-family default in BaseExporter.__call__.
NCNN can't handle DETR ops (D-FINE) — block the export early with NotImplementedError instead of producing a graph the runtime can't load.
head.export flag missing (YOLO-grid families) — without it, ONNX bakes static shapes that work only at the exact resolution exported.
strict=True state-dict loading — override _strict_loading() = False if upstream checkpoints carry EMA buffers, profiling state, or auxiliary heads.
Cross-family rejection on cross-family transfer — when intentionally splicing weights, pop model_family from the donor checkpoint dict.
EMA decay too low — if you lower it from 0.9999 without good reason, early-epoch evals show flat or decreasing mAP because the EMA hasn't settled.
Letterbox vs. plain resize — uses_letterbox property must match the training transform; DetectionValidator reads it for target rescaling.
_train_epoch override drift (DETR families) — when you copy the parent loop to add per-group LR + grad clip + epoch propagation, leave a comment "kept in sync with BaseTrainer._train_epoch as of " so drift is auditable. Promote to shared hooks if a third family needs the same overrides.
Per-size defaults copy-pasted from one size to all. When upstream ships per-size YAMLs, build a side-by-side table of every override per size before assuming the s config applies to n. DETR examples: freeze_at, freeze_norm, backbone-LR multiplier, EMA decay, lr_gamma, peak lr. YOLO-grid examples: BN epsilon / momentum overrides on the smallest size, INPUT_SIZES, depth/width multipliers, head reg-max. The default is rarely uniform across sizes. A single-row table that's wrong on n/m is a silent ~1 mAP regression on those sizes. Cross-check each row against the actual upstream YAML, not just the first one you ported.
<Family>Config.min_lr_ratio is one knob trying to cover several upstream lr_gamma values. FlatCosineScheduler computes min_lr = lr * min_lr_ratio, so the ratio is the upstream lr_gamma by another name. Different families pick different values (D-FINE: 1.0 = no cosine decay, by design; DEIM: 0.5; ECDET: 0.5); within a family, sizes can disagree too (DEIM-N overrides to 1.0). The trap is picking any uniform value without cross-checking. A ratio of 1.0 is correct when upstream's lr_gamma is 1.0 and a silent bug otherwise; the ratio is not the issue, the cross-check is.
Wasted ImageNet backbone download in _init_model (RT-DETR fixed in bf16a2b). When a user constructs your wrapper from a LibreYOLO checkpoint, BaseModel.__init__ will eventually load their weights and overwrite anything you initialised the backbone with. If _init_model unconditionally fetches the upstream ImageNet-pretrained backbone, you pay a 50-100 MB download for nothing on every checkpoint load. Two related pieces guard against it: (a) BaseModel.__init__ peeks at the checkpoint via cls.detect_size before the first _init_model call so the right architecture is built from the start, which means your detect_size has to work on raw upstream state_dicts (not just LibreYOLO-wrapped ones); (b) inside _init_model, check self._loading_pretrained_checkpoint and pass backbone_pretrained=False (or the equivalent kwarg your model uses) when the flag is set.
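
To see why `min_lr_ratio` behaves like upstream's `lr_gamma`, here is a sketch of a warmup, flat, cosine-tail schedule. It is not LibreYOLO's `FlatCosineScheduler`: the function name, the linear warmup, and the step-based arguments are assumptions; only `min_lr = lr * min_lr_ratio` and the overall shape come from this document.

```python
import math


def flat_cosine_lr(step, total_steps, warmup_steps, flat_steps, lr, min_lr_ratio):
    """Learning rate at `step` for a warmup -> flat -> cosine-tail schedule."""
    min_lr = lr * min_lr_ratio
    if step < warmup_steps:
        # Linear warmup up to the peak LR (assumed shape).
        return lr * (step + 1) / warmup_steps
    if step < warmup_steps + flat_steps:
        return lr  # flat phase at peak LR
    decay_steps = max(1, total_steps - warmup_steps - flat_steps)
    progress = min(1.0, (step - warmup_steps - flat_steps) / decay_steps)
    # Cosine tail from lr down to min_lr.
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With `min_lr_ratio=1.0` the cosine term is multiplied by zero and the tail stays flat at the peak LR, which is the D-FINE behaviour described above; any smaller ratio decays toward `lr * min_lr_ratio`.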

10. Workflow

Check both licenses — code (upstream LICENSE file) and weights (HF model card). If either is GPL/AGPL, plan for a plugin-only ship; if weights are non-redistributable, link to the upstream CDN like YOLO-NAS.
Pick the pattern: YOLO-grid or DETR. Skim the existing family that's closest.
Audit upstream's training recipe with an agent. Decide what you skip.
Implement family-local code (models/<family>/) — the model, postprocess, and inference wrapper first. Verify inference parity using the recipe in §6 (cross-load upstream state_dict, assert max_abs_diff == 0 on identical inputs over the layers present in both models). For wrapper integrations like RF-DETR, substitute "verify the wrapped package produces its documented outputs."
Wire central files — models/__init__.py, __init__.py, config.py, validation/preprocessors.py. Family must load via LibreYOLO("Libre<Family>s.pt").
Implement the trainer — trainer.py, transforms.py, loss.py. Verify the loss matches upstream on synthetic inputs (parity test, 1e-5 tolerance).
Wire into MODEL_CATALOG and run test_val_coco128 + test_rf1_training.
Test ONNX export at minimum. If the rest of the export formats are family-compatible, run them too.
Upload weights to HuggingFace under LibreYOLO/Libre<Family><size>/ — see the separate libreyolo-upload-hf-model skill (skip if upstream license forbids redistribution; link to upstream CDN instead).