---
name: astrea
description: Use for hipfire quant calibration, imatrix-driven experiments, KLD/PPL quality evaluation, k-map/format selection, MQ/HFQ/HFP/MFP tradeoff work, ParoQuant-style weight transform planning, and KV policy planning. Use when deciding whether a calibrated model candidate should be promoted, rejected, packaged, or sent through Atlas for AR/DFlash perf validation.
---
Astrea
Astrea is hipfire's agent-native model calibration harness. It is a Python CLI for humans and a workflow contract for agents. The CLI emits plain JSON artifacts for weight calibration, ParoQuant-style transform planning, KV-cache policy planning, and future single-file model packaging; the agent supplies judgment, guardrails, and the next experiment.
Core Rule
Do not claim a quant candidate is better without measured quality evidence.
Do not claim it is ship-ready without runtime compatibility and perf evidence.
Do not claim Astrea wrote a packaged model unless the loader/package work exists;
bundle-plan is a contract artifact, not a model writer.
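The Core Rule works as an evidence ladder. Below is a minimal sketch of how an agent might encode it. It assumes a hypothetical `Evidence` record whose field names are illustrative and are not Astrea's artifact schema.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    # Hypothetical flags; Astrea's JSON artifacts use their own keys.
    quality_improved: bool = False  # measured KLD/PPL rows show an improvement
    runtime_compat: bool = False    # candidate loads and runs on the target runtime
    perf: bool = False              # Atlas AR/DFlash perf rows exist
    package_written: bool = False   # loader/package work exists and wrote the model


def allowed_claims(ev: Evidence) -> list[str]:
    """Return the claims the Core Rule permits for the given evidence."""
    claims = []
    if ev.quality_improved:
        claims.append("better")
        # Assumption: ship-ready also presupposes the quality evidence above.
        if ev.runtime_compat and ev.perf:
            claims.append("ship-ready")
    # A bundle-plan artifact alone never justifies setting package_written.
    if ev.package_written:
        claims.append("packaged")
    return claims
```

Runtime and perf evidence without a measured quality win yields no claim at all. This matches the rule that perf rows never stand in for quality evidence.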
CLI
Run from the hipfire repo root:
```bash
python3 scripts/astrea.py inspect --model MODEL [--imatrix IMATRIX] [--format FORMAT] [--pretty] [--out PATH]
python3 scripts/astrea.py imatrix-join --model MODEL --imatrix IMATRIX [--max-tensors N] [--pretty] [--out PATH]
python3 scripts/astrea.py fingerprint [--engine-root REPO] [--pretty] [--out PATH]
python3 scripts/astrea.py plan --model MODEL --format FORMAT --method METHOD [--recipe-stage STAGE:METHOD] [--imatrix IMATRIX] [--source-dir BF16_DIR] [--eval-command CMD] [--atlas-command CMD] [--pretty] [--out PATH]
python3 scripts/astrea.py calibrate --plan PLAN.json [--source-dir BF16_DIR] [--write-candidate] [--max-tensors N] [--tensor-filter NAME] [--workers N] --pretty [--out PATH]
python3 scripts/astrea.py eval --plan PLAN.json [--run] --pretty [--out PATH]
python3 scripts/astrea.py metrics --quality-json result-data.json --candidate-variant NAME [--baseline-variant NAME] [--floor-variant NAME] [--arch ARCH] [--scoring-mode MODE] [--engine-root REPO] --pretty [--out PATH]
python3 scripts/astrea.py policy --model MODEL --base-format FORMAT --promotion-format FORMAT (--sensitivity-json SCORES.json | --imatrix IMATRIX) --max-extra-bytes N [--method METHOD] [--objective dynamic-tensor-policy|moe-probe|model-ingress|kv-policy] [--domain weights|kv] [--model-family FAMILY] --pretty [--out PATH]
python3 scripts/astrea.py promote --policy POLICY.json --source-dir BF16_DIR --output CANDIDATE.hfq [--max-tensors N] [--tensor-filter NAME] --pretty [--out PATH]
python3 scripts/astrea.py kv-profile --model MODEL [--mode q8|asym3|triattn|cask|turbo3|rotor] [--triattn PATH] [--model-family FAMILY] [--engine-root REPO] --pretty [--out PATH]
python3 scripts/astrea.py bundle-plan --model MODEL --output MODEL.hfq [--include weights|paro|kv-policy|triattn|evidence] [--triattn PATH] [--policy-id ID] --pretty [--out PATH]
python3 scripts/astrea.py report ARTIFACT.json ... --pretty [--out PATH]
```
Prefer --pretty for human review and compact JSON for ledgers. Use --out
for reproducible run directories; it writes JSON to the path and leaves stdout
empty.
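Because `--out` writes JSON to a file and leaves stdout empty, you can assemble a run directory by pointing each stage at its own path. The sketch below assumes a placeholder model path and the `mfp4` + `imatrix-scale` pairing. The file layout is illustrative, and it uses only flags from the listing above.

```python
import subprocess
from pathlib import Path

CLI = ["python3", "scripts/astrea.py"]


def stage_commands(run_dir: Path, model: str, fmt: str, method: str) -> list[list[str]]:
    """Build argv lists that write each stage's JSON into run_dir via --out."""
    plan = run_dir / "plan.json"
    return [
        CLI + ["inspect", "--model", model, "--format", fmt,
               "--out", str(run_dir / "inspect.json")],
        CLI + ["fingerprint", "--out", str(run_dir / "fingerprint.json")],
        CLI + ["plan", "--model", model, "--format", fmt, "--method", method,
               "--out", str(plan)],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Execute each stage in order from the hipfire repo root; stop on failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Omitting `--pretty` keeps the files compact for ledgers. Create `run_dir` before calling `run_all`, because this section does not say whether `--out` creates parent directories.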
Workflow
- Identify the target model, desired format, reference model, eval dataset, and budget.
- Run `inspect` to fingerprint the model and imatrix inputs. Use `imatrix-join` when you need a focused report of GGUF imatrix tensor coverage against HFQ tensor names before planning a calibration run.
- Run `fingerprint` to capture the engine path. This records git state, relevant source hashes, `HIPFIRE_ROPE_INTERLEAVED_LEGACY`, and whether the default Qwen3.5 FA RoPE path is `halfsplit`, `interleaved_legacy`, or `unknown`.
- Run `plan` to create a bounded experiment artifact. Calibration methods are stackable recipe stages. Use repeated `--method` flags for the candidate stack. Add optional repeated `--recipe-stage` flags when you need an explicit stage order such as `scale_search:imatrix-scale`, `activation_aware:awq`, `rounding:gptq`, `promotion:kmap`, or `transform:quarot`.
- Run `calibrate`. Without `--write-candidate`, Astrea joins GGUF imatrix logical tensors to HFQ tensor names and reports whether the candidate is ready for a weight-mutation pass. With `--write-candidate`, Astrea can write either an MFP4 imatrix-scale candidate or an MQ4 AWQ-style activation-weighted clipping candidate. It does this by copying the base HFQ and patching selected same-size tensor byte ranges. Use `--max-tensors` or `--tensor-filter` for smoke passes before a full rewrite. Use `--workers N` for process-parallel tensor rewrites on large models; start with 4 workers unless you know the memory headroom.
- Run `eval` with KLD/PPL commands against a BF16 or accepted higher-precision reference.
- Run `metrics` on the `kld_reduce.py` `result-data.json` artifact. Include a Q8 or accepted high-precision floor row when available so that Astrea can report above-floor KLD and the recovered quantization damage percentage. Always pass `--engine-root` when the evaluated engine is not the checkout running Astrea.
- Run `policy` when you want an Unsloth-like dynamic quant policy. It ranks tensors by sensitivity per added byte and emits a mixed-format promotion recipe under a size budget. Use `--objective moe-probe` for MoE models and `--objective model-ingress` when bringing up a new model family. These add router/expert and alias-map probe work items to the artifact, respectively. Add repeated `--domain` flags when the policy spans both weight transforms and KV-cache policy. Use `--method paroquant` to add the Paro weight-transform lane. This remains a planned transform section until the quantizer and runtime have a compatible implementation. For rotated MQ/MFP bases promoted to Q8/F16, Astrea automatically bundles runtime anchor projections (`q`, `qkv`, `gate`) with their dependent projections, so mixed-format candidates do not read stale normalized activation buffers.
- Run `promote` when a policy selects tensors for mixed-format promotion. Currently this writes selected `q8` promotions as runtime-compatible `Q8F16` tensor records and rebuilds the HFQ index/data payload. Use `--max-tensors` for smoke candidates before writing a full policy. Legacy policies are also expanded with the required runtime anchors at write time. Re-run `metrics` after every promotion candidate, because the policy byte model is only a selector, not quality evidence.
- Run `kv-profile` when a candidate changes KV-cache behavior or when a model should carry an embedded KV policy. Include at least the current baseline (`asym3`) and the candidates under investigation (`triattn`/`cask`, `turbo3`, `rotor`, or related modes). The output is the policy/evidence shape that Atlas should join against AR and DFlash perf rows.
- Run `bundle-plan` when the candidate needs a future single-file model package. The target is an HFQ package-style container with weights, transform metadata, KV policy, and TriAttention/CASK centers embedded inside the model artifact. External sidecars are not the target. Loader, daemon, CLI, and kernel support remain deferred runtime work until implemented.
- If quality improves, run Atlas AR and DFlash perf collection before making any promotion claim.
- Use `report` to summarize evidence and recommend promote, reject, or iterate.
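The `fingerprint` step records source hashes and the RoPE environment switch. The sketch below shows the general idea with `hashlib`. The list of relevant source files and the output keys are assumptions. The sketch does not capture git state or attempt Astrea's `halfsplit`/`interleaved_legacy`/`unknown` classification.

```python
import hashlib
import os
from pathlib import Path
from typing import Optional


def source_hashes(engine_root: str, rel_paths: list[str]) -> dict[str, Optional[str]]:
    """SHA-256 of each listed source file; None when the file is missing."""
    hashes: dict[str, Optional[str]] = {}
    for rel in rel_paths:
        path = Path(engine_root) / rel
        hashes[rel] = hashlib.sha256(path.read_bytes()).hexdigest() if path.is_file() else None
    return hashes


def engine_fingerprint(engine_root: str, rel_paths: list[str]) -> dict:
    """Hypothetical fingerprint record: source hashes plus the RoPE env switch."""
    return {
        "source_hashes": source_hashes(engine_root, rel_paths),
        # Recorded verbatim; None means the variable is unset.
        "HIPFIRE_ROPE_INTERLEAVED_LEGACY": os.environ.get("HIPFIRE_ROPE_INTERLEAVED_LEGACY"),
    }
```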
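The `--recipe-stage STAGE:METHOD` convention for `plan` can be parsed into an ordered stack. This sketch assumes that stage names are limited to the examples listed in the workflow (`scale_search`, `activation_aware`, `rounding`, `promotion`, `transform`). Astrea's real parser may accept more stages or validate them differently.

```python
# Assumed stage vocabulary, taken from the examples in the workflow above.
KNOWN_STAGES = {"scale_search", "activation_aware", "rounding", "promotion", "transform"}


def parse_recipe_stages(specs: list[str]) -> list[tuple[str, str]]:
    """Parse repeated STAGE:METHOD flags, preserving command-line order."""
    stages = []
    for spec in specs:
        stage, sep, method = spec.partition(":")
        if not sep or not stage or not method:
            raise ValueError(f"expected STAGE:METHOD, got {spec!r}")
        if stage not in KNOWN_STAGES:
            raise ValueError(f"unknown stage {stage!r}")
        stages.append((stage, method))
    return stages
```

The parser keeps flag order instead of sorting, because the point of explicit stages is to fix the order in which the recipe runs.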
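The `metrics` step reports above-floor KLD and recovered quantization damage, but this section does not define those quantities. The sketch below assumes the natural reading: the baseline is the uncalibrated quant of the same format, and the floor is the Q8 or accepted high-precision row. Astrea's exact formulas may differ.

```python
from typing import Optional


def above_floor_kld(candidate: float, floor: float) -> float:
    """KLD the candidate adds on top of the Q8/high-precision floor row."""
    return candidate - floor


def recovered_damage_pct(candidate: float, baseline: float, floor: float) -> Optional[float]:
    """Percent of the baseline's above-floor KLD that calibration removed.

    100 means the candidate reached the floor; a negative value means it is
    worse than the uncalibrated baseline.
    """
    damage = baseline - floor
    if damage <= 0:
        return None  # baseline already at or below the floor; the ratio is undefined
    return 100.0 * (baseline - candidate) / damage
```

With illustrative numbers (not measured results), a baseline KLD of 0.05, a floor of 0.01, and a candidate at 0.03 give an above-floor KLD of 0.02 and about 50% recovered damage.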
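The `policy` step ranks tensors by sensitivity per added byte under a size budget. Below is a greedy sketch of that selection. Here, caller-supplied bundles stand in for Astrea's runtime-anchor bundling, and each bundle is scored and admitted as one unit. The `TensorScore` fields and the tensor names in any example are hypothetical, and Astrea's actual ranking may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class TensorScore:
    name: str
    sensitivity: float  # higher = more quality damage at the base format
    extra_bytes: int    # size increase if promoted to the promotion format


def select_promotions(
    scores: Sequence[TensorScore],
    max_extra_bytes: int,
    bundles: Iterable[Sequence[str]] = (),
) -> tuple[list[str], int]:
    """Greedy pick by sensitivity per added byte under a byte budget.

    `bundles` lists tensors that must be promoted together (for example an
    anchor projection plus its dependents); groups are assumed disjoint.
    """
    by_name = {s.name: s for s in scores}
    units: list[list[str]] = []
    grouped: set[str] = set()
    for group in bundles:
        members = [n for n in group if n in by_name]
        if members:
            units.append(members)
            grouped.update(members)
    units += [[s.name] for s in scores if s.name not in grouped]

    def cost(unit: list[str]) -> int:
        return sum(by_name[n].extra_bytes for n in unit)

    def density(unit: list[str]) -> float:
        gain = sum(by_name[n].sensitivity for n in unit)
        c = cost(unit)
        return gain / c if c > 0 else float("inf")

    chosen: list[str] = []
    spent = 0
    # Highest sensitivity-per-byte first; skip units that no longer fit.
    for unit in sorted(units, key=density, reverse=True):
        if spent + cost(unit) <= max_extra_bytes:
            chosen.extend(unit)
            spent += cost(unit)
    return chosen, spent
```

Greedy density ordering approximates the underlying knapsack problem rather than solving it exactly. Scoring a bundle as one unit keeps anchors all-or-nothing, which is the property the runtime needs. As with Astrea's byte model, the output is only a selector; `metrics` still has to confirm quality.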
Format Guidance
- Start with `mfp4` + `imatrix-scale` when reproducing the known high-signal calibration path.
- For `mq4`, Astrea can now write a same-format AWQ-style activation-weighted clipping candidate. The first 9B run improved PPL but slightly worsened KLD, so treat this recipe as an empirical lane to iterate on, not a validated win. Compare stackable recipes such as AWQ, imatrix-scale, GPTQ, k-map/promotion, and transform stages empirically.
- Treat ParoQuant as the highest-priority transform lane to prototype next, once the existing imatrix/AWQ/GPTQ/k-map evidence loop is reliable. It needs a producer-consumer contract, not just an Astrea plan.
- Treat `asym3` as the current KV baseline. TriAttention/CASK data becomes packageable, persistent data once it is embedded in the model package instead of shipped as a sidecar. TurboQuant-like and RotorQuant-like KV policies remain research candidates until kernels, loader metadata, and AR/DFlash quality gates exist.
- For MoE, separate router tensors, expert tensors, and shared dense tensors in the policy artifact. Justify expert promotion with the expert-hit distribution plus quality deltas, not with static tensor names alone.
- Keep `mq3`, `mq4`, `mq6`, `hfq4`, `hfq6`, `hfp4`, and `mfp4` eligible for experiments, but tie every recommendation to quality and perf artifacts.
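The MoE guidance separates tensor roles and weighs expert promotion by how often each expert is actually routed to. The sketch below shows both pieces. The name patterns are hypothetical (Hugging Face-style names; real HFQ/GGUF names vary by model family), and the routing-trace input is an assumed list of selected expert ids.

```python
import re
from collections import Counter

# Hypothetical name patterns; adjust per model family.
ROUTER_RE = re.compile(r"\.(router|mlp\.gate)\.")
EXPERT_RE = re.compile(r"\.experts\.(\d+)\.")


def partition_moe(names: list[str]) -> dict[str, list[str]]:
    """Split tensor names into router, per-expert, and shared dense groups."""
    groups: dict[str, list[str]] = {"router": [], "expert": [], "shared": []}
    for name in names:
        if ROUTER_RE.search(name):
            groups["router"].append(name)
        elif EXPERT_RE.search(name):
            groups["expert"].append(name)
        else:
            groups["shared"].append(name)
    return groups


def expert_hit_share(routed_expert_ids: list[int]) -> dict[int, float]:
    """Fraction of routing decisions that selected each expert id."""
    counts = Counter(routed_expert_ids)
    total = sum(counts.values())
    return {eid: c / total for eid, c in counts.items()} if total else {}
```

A hit share only indicates where promotion bytes are likely to matter. It still has to be paired with measured quality deltas before an expert is promoted.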
Guardrails
- Preserve producer-consumer contracts. HFP/MFP candidates must remain compatible with the current fast-path block-size/runtime requirements unless the quantizer, loader, docs, and kernels are updated together.
- Attach exact eval commands, reference model, dataset/chunk count, and output artifact size to quality claims.
- Do not compare KLD/PPL rows across different engine fingerprints or RoPE conventions without explicitly marking the comparison as historical.
- Attach Atlas rows for AR and DFlash when the candidate affects runtime formats used by both paths.
- Treat dry-run Astrea artifacts as plans only, not calibrated weights.
- Treat `kv-profile` and `bundle-plan` artifacts as policy contracts only. They should drive loader/kernel/Atlas work, but they do not prove runtime support.
- If an eval run emits non-finite logits, fail the candidate. Do not accept `KLD=0` rows from older evaluators unless the logit path is confirmed finite.
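The last two guardrails and the fingerprint-comparison rule can be checked mechanically before rows enter a report. The sketch below makes two assumptions: an eval row exposes its logits as an iterable of floats, and fingerprints are dicts. The key names `git_commit`, `source_hashes`, and `rope_path` are hypothetical.

```python
import math
from typing import Iterable


def check_eval_row(
    logits: Iterable[float],
    kld: float,
    older_evaluator: bool = False,
    logit_path_confirmed_finite: bool = False,
) -> tuple[bool, str]:
    """Apply the non-finite-logit and KLD=0 guardrails to one eval row."""
    if any(not math.isfinite(x) for x in logits):
        return False, "non-finite logits: fail the candidate"
    if kld == 0.0 and older_evaluator and not logit_path_confirmed_finite:
        return False, "KLD=0 from an older evaluator without a confirmed-finite logit path"
    return True, "ok"


def comparison_label(fp_a: dict, fp_b: dict,
                     keys: tuple[str, ...] = ("git_commit", "source_hashes", "rope_path")) -> str:
    """'direct' only when the chosen engine-fingerprint fields match; else 'historical'."""
    same = all(fp_a.get(k) == fp_b.get(k) for k in keys)
    return "direct" if same else "historical"
```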