add-model

name: add-model description: Add a new ML model to the TabArena benchmark system. Use this skill whenever the user wants to integrate a new tabular ML model into TabArena — even if they just say "add X model", "integrate X", "support X", or "wrap X for the benchmark". Creates all required files: the AutoGluon model wrapper, the search-space generator, the per-model `info.py`, a test, and the `pyproject.toml` extra. Reads existing similar models for inspiration and optionally fetches documentation URLs to understand the new model's API. argument-hint: [] [] user-invocable: true

Add Model to TabArena

This skill integrates a new tabular ML model into the TabArena benchmark.

Every model lives in one folder at packages/tabarena/src/tabarena/models/<ModelKey>/. That folder contains the wrapper, the HPO generator, and the metadata — and is auto-discovered by tabarena.models._registry.discover_models(). There is no separate benchmark/models/ag/ layout anymore.

Per model, you create up to 5 source files plus one test file, then edit two existing files.

Step 0: Gather inputs

Parse $ARGUMENTS for the model name. Then collect (ask only for what's missing or unclear):

Input	Example	Notes
`ModelName`	`"TabPFN-2.6"`	Human-readable display name
`ModelKey`	`"tabpfnv26"`	Snake_case folder/file key (derive from ModelName)
`ClassName`	`"TabPFNv26"`	CamelCase class prefix (derive from ModelName)
`ag_key`	`"TA-TABPFN-2.6"`	AutoGluon registry key; prefix with `"TA-"`
`ag_name`	`"TA-TabPFN-2.6"`	AutoGluon display name; same as `ag_key` with proper casing
`pip_package`	`"tabpfn>=7.0.0"`	Pip install spec for pyproject.toml
`doc_url`	`"https://..."`	Documentation / GitHub / paper URL
`model_type`	`foundation`	`foundation`, `torch`, or `sklearn`
`supports_gpu`	`true`	Whether the model uses GPU
`problem_types`	`binary,multiclass,regression`	Supported task types

Deriving keys: "TabPFN-2.6" → key "tabpfnv26", class prefix "TabPFNv26". "TabSTAR" → key "tabstar", class prefix "TabStar". Strip hyphens, lowercase for key; CamelCase for class.

Step 1: Understand the model API

If doc_url was provided, fetch it with WebFetch to understand:

Import path (e.g., from tabstar.tabstar_model import TabSTARClassifier)
Constructor parameters and their defaults
.fit(X, y, ...) signature
.predict() / .predict_proba() signature
Key hyperparameters to expose

Step 2: Pick the right base class and reference model

Choose the most similar existing model to read for detailed inspiration:

Model type	Base class	Read this reference model
Foundation / pre-trained / GPU (e.g. TabPFN, SAP-RPT-OSS, TabSTAR)	`AbstractTorchModel`	`packages/tabarena/src/tabarena/models/sap_rpt_oss/model.py`
Torch NN trained from scratch (e.g. TabM, RealMLP)	`AbstractTorchModel`	`packages/tabarena/src/tabarena/models/tabm/model.py`
CPU / sklearn-like (e.g. KNN)	`AbstractModel`	`packages/tabarena/src/tabarena/models/knn/model.py`

Read the reference model file now (use the Read tool). Use it as a structural guide — you will adapt rather than copy.

Also read the annotated patterns in references/model_patterns.md — it contains templates for model.py, hpo.py, info.py, and the test file.

Step 3: Create new files

Create these files (paths relative to the repo root):

3a. `packages/tabarena/src/tabarena/models/{ModelKey}/init.py`

Re-export the public symbols so from tabarena.models.{ModelKey} import ... works:

from __future__ import annotations

from tabarena.models.{ModelKey}.hpo import gen_{ModelKey}
from tabarena.models.{ModelKey}.info import {ModelKey}_info, {ModelKey}_method_metadata

__all__ = ["gen_{ModelKey}", "{ModelKey}_info", "{ModelKey}_method_metadata"]

3b. `packages/tabarena/src/tabarena/models/{ModelKey}/model.py`

The AutoGluon wrapper class. Use the template in references/model_patterns.md section "Model wrapper template". Key points:

Start with from __future__ import annotations
Inherit from AbstractTorchModel (GPU/torch models) or AbstractModel (CPU models)
Set ag_key, ag_name, ag_priority = 65, seed_name = "random_state"
Implement _fit(), _set_default_params(), supported_problem_types()
For GPU models: also implement get_device(), _set_device(), _get_default_resources(), get_minimum_resources(), _get_default_ag_args_ensemble() (with fold_fitting_strategy: sequential_local), _class_tags() (with can_estimate_memory_usage_static: False), _more_tags() (with can_refit_full: True)
Docstring must include: description, paper title, authors, codebase URL, license
Keep optional third-party imports (the wrapped library itself) inside _fit / per-method scope so importing this module never requires the optional dep at top-level

3c. `packages/tabarena/src/tabarena/models/{ModelKey}/hpo.py`

The search-space generator. By default use an empty search space (like TabPFN-2.6) — only add hyperparameters if the user explicitly asks or if the model has obvious tunable knobs. See template in references/model_patterns.md section "hpo.py template".

3d. `packages/tabarena/src/tabarena/models/{ModelKey}/info.py`

Defines {ModelKey}_method_metadata: MethodMetadata and {ModelKey}_info: ModelInfo. info.py is the single source the auto-discovery registry walks — populating it correctly is how the model becomes visible to discover_models(). See template in references/model_patterns.md section "info.py template".

3e. Multi-file support code (optional)

If the wrapper needs helper modules (preprocessors, vendored upstream code, large internal classes), put them in a private subfolder of packages/tabarena/src/tabarena/models/{ModelKey}/:

_internal/ — for hand-written helpers (preprocessors, internal classes, adapters)
_vendor/ — only for code copied verbatim from an upstream project; keep the original layout/license alongside

Both subfolders need their own empty __init__.py. Import them from model.py via absolute paths, e.g. from tabarena.models.{ModelKey}._internal.preprocessing import Preprocessor.

3f. `tst/models/test_{ModelKey}.py`

See template in references/model_patterns.md section "Test template". Include a minimal FitHelper.verify_model() call with model_hyperparameters={} (add a speed-up param if the model has one like max_epochs=1). Wrap the import in try/except ImportError and pytest.skip(...) so the test is automatically skipped when the optional dependency isn't installed.

Step 4: Edit existing files

Edit both locations in a single pass (read each file first, then edit):

4a. `packages/tabarena/src/tabarena/models/init.py`

Add a lazy entry for the new class so from tabarena.models import {ClassName}Model works:

_LAZY_CLASSES = {
    ...
    "{ClassName}Model": "tabarena.models.{ModelKey}.model",
    ...
}

Also add "{ClassName}Model" to __all__ and (under TYPE_CHECKING) to the static from tabarena.models.{ModelKey}.model import {ClassName}Model block, both kept alphabetised.

4b. `packages/tabarena/src/tabarena/models/utils.py`

Add to the name_to_import_map dict in get_configs_generator_from_name(). The key is the friendly model name (often the same as ModelName):

"{ModelName}": lambda: importlib.import_module("tabarena.models.{ModelKey}.hpo").gen_{ModelKey},

4c. `packages/tabarena/pyproject.toml`

The pyproject.toml defines a per-model extra for every supported model, plus three union extras built via self-references ("tabarena[<name>]"):

benchmark — the curated core set used for standard benchmarking. Stable and resolver-friendly. Do not add a new model here unless the user explicitly says it belongs in the core set.
extended — the layered set installed on top of benchmark for the broader model zoo. This is where most new models go.
all — experimental union of benchmark + extended + special-cased extras like probmetrics (which has conflict-prone deps and is excluded from extended on purpose). Updated automatically via tabarena[extended], so usually no manual edit needed unless the model is conflict-prone.

Always declare the pip spec exactly once in the per-model extra, then reference the model by name in the union(s). Never paste the raw {pip_package} into a union extra.

Step 1 — declare the per-model extra under [project.optional-dependencies]:

{ModelKey} = ["{pip_package}"]

Step 2 — add it to the right union via self-reference:

Situation	Edit
Default: new extended model	Add `"tabarena[{ModelKey}]"` to the `extended` extra.
Core benchmark model (only if user explicitly says so)	Add `"tabarena[{ModelKey}]"` to the `benchmark` extra.
Model has known dependency conflicts (rare, like `probmetrics`)	Skip both `benchmark` and `extended`; add `"tabarena[{ModelKey}]"` to `all` only.

After this, users can install the model alone (uv sync --extra benchmark --extra {ModelKey}), as part of the extended set (uv sync --extra benchmark --extra extended), or via --extra all.

Step 3 — verify the per-model extra matches info.py with the drift checker:

python -m tabarena.tools.sync_pyproject_extras

packages/tabarena/src/tabarena/tools/sync_pyproject_extras.py aggregates every ModelInfo.pip_extra from the registry and compares it against [project.optional-dependencies] in packages/tabarena/pyproject.toml, printing per-folder OK/DRIFT. Add --check to make it exit non-zero on drift (CI mode). Run it after editing either side so the two stay in sync.

Step 5: Auto-derived registries (no manual edit)

These pieces pick up the new model automatically once Step 3 lands — do not edit them by hand:

packages/tabarena/src/tabarena/models/_registry.py — discover_models() walks tabarena/models/*/info.py and collects every ModelInfo found. As long as info.py exports a top-level ModelInfo instance, the model joins MODEL_REGISTRY.
packages/tabarena/src/tabarena/benchmark/exec_models/registry.py — auto-derives tabarena_model_registry from get_model_registry(), so the new class becomes available through the AG registry on the next import.

Step 6: Lint

Run ruff on the new files:

ruff check --fix packages/tabarena/src/tabarena/models/{ModelKey}/ tst/models/test_{ModelKey}.py

Fix any reported issues.

Step 7: Metadata artifact (optional — only if the model has been benchmarked)

If the model already has benchmark results to register in TabArena's artifact system, add a metadata entry to the dated batch file:

packages/tabarena/src/tabarena/nips2025_utils/artifacts/_tabarena_method_metadata_YYYY_MM_DD.py

Either add to the latest file or create a new dated file if the benchmarking run is new.

Each entry is a MethodMetadata(...) object (same class used in info.py, so the entry can be the {ModelKey}_method_metadata you already defined). Then import it in _tabarena_method_metadata.py:

from tabarena.nips2025_utils.artifacts._tabarena_method_metadata_YYYY_MM_DD import (
    {ModelKey}_metadata,
)

If the model has not been benchmarked yet, skip this step entirely — info.py already declares the metadata for the registry; the artifact entry is only needed when results files actually exist.

Step 8: Report

Summarize what was created/edited:

List new files created
List files edited and what was added
Note any TODOs left for the user (e.g., implementing _predict_proba if the library API is unclear, tuning ag_priority, adding a real search space later, registering benchmark artifacts after a real run)