name: rummi-leveling-ml
description: Run and evolve the Rummi Poker leveling/ML pipeline: fresh Dart simulator data collection, chunked contest_policy_v1 runs, economy audit, feature table generation, supervised model smoke reports, and safe tracked report updates. Use when the user says "레벨링 ML", "fresh 데이터", "밸런스 ML", "학습 데이터 쌓기", "시뮬 데이터 수집", or asks to continue the internal leveling data pipeline.
Rummi Leveling ML
Use this skill from the repository root: /Users/cheng80/Desktop/flame_binggo_card.
Core policy:
- Use internal Dart simulator logs first. Do not rely on external poker/RL datasets for runtime balance truth.
- Keep old/archive rows out of current training inputs unless they are explicitly used as a low-trust historical prior.
- Keep raw JSONL, generated CSV, venvs, and logs ignored. Commit lightweight metadata, metrics, feature importance, reports, scripts, and docs.
- Models create candidate evidence only. Never auto-apply target, boss, economy, market, item, or Jester balance changes without human review.
- Prefer /Users/cheng80/flutter/bin/dart and /Users/cheng80/flutter/bin/flutter.
Quick Run
For the standard fresh pipeline, run:
.agents/skills/rummi-leveling-ml/scripts/run_fresh_leveling_pipeline.sh
Useful overrides:
CHUNKS=3 RUNS_PER_CHUNK=2 DO_MODEL=0 \
.agents/skills/rummi-leveling-ml/scripts/run_fresh_leveling_pipeline.sh
BOT=planner_v2 OUT_PREFIX=logs/sim/fresh_runtime_planner_probe \
.agents/skills/rummi-leveling-ml/scripts/run_fresh_leveling_pipeline.sh
Grid sweep mode widens the default market/loadout axes:
MODE=grid CHUNKS=12 RUNS_PER_CHUNK=4 \
.agents/skills/rummi-leveling-ml/scripts/run_fresh_leveling_pipeline.sh
Model target controls:
MODEL_TARGETS="clear_rate avg_score_ratio cleared_majority" \
.agents/skills/rummi-leveling-ml/scripts/run_fresh_leveling_pipeline.sh
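The overrides above are plain environment variables, so the same invocations can be driven from Python. This is a minimal sketch, assuming the script takes its settings only from the environment; build_pipeline_env and run_pipeline are hypothetical helpers, not part of the repository.

```python
import os
import subprocess

SCRIPT = ".agents/skills/rummi-leveling-ml/scripts/run_fresh_leveling_pipeline.sh"


def build_pipeline_env(overrides, base=None):
    """Return a copy of the base environment with overrides applied as strings."""
    env = dict(os.environ if base is None else base)
    for key, value in overrides.items():
        env[key] = str(value)  # environment values must be strings
    return env


def run_pipeline(overrides):
    """Run the fresh pipeline from the repository root; raises if it exits non-zero."""
    return subprocess.run([SCRIPT], env=build_pipeline_env(overrides), check=True)


# Equivalent to `CHUNKS=3 RUNS_PER_CHUNK=2 DO_MODEL=0 <script>`:
#   run_pipeline({"CHUNKS": 3, "RUNS_PER_CHUNK": 2, "DO_MODEL": 0})
```

Keeping the environment construction in its own function makes the overrides easy to inspect without launching a run.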
Default pipeline:
- tools/sim/chunked_balance_run.py --resume
- tools/sim/economy_audit.py
- tools/leveling/build_feature_table.py
- optional tools/leveling/train_leveling_model.py
  - clear_rate: regression
  - avg_score_ratio: regression
  - cleared_majority: classification
- print row count, summary paths, metadata/report paths
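The pairing of model targets and task types in the pipeline above can be captured as a small lookup. This is a sketch only: TARGET_TASKS and parse_model_targets are hypothetical names, and reading MODEL_TARGETS as a whitespace-separated list is inferred from the example value, not confirmed against train_leveling_model.py.

```python
# Task type per model target, as listed in the default pipeline.
TARGET_TASKS = {
    "clear_rate": "regression",
    "avg_score_ratio": "regression",
    "cleared_majority": "classification",
}


def parse_model_targets(raw):
    """Split a MODEL_TARGETS-style string and pair each target with its task type."""
    targets = raw.split()
    unknown = [t for t in targets if t not in TARGET_TASKS]
    if unknown:
        # Fail loudly instead of silently training on an unexpected target.
        raise ValueError(f"unknown model targets: {unknown}")
    return [(t, TARGET_TASKS[t]) for t in targets]
```

Rejecting unknown names up front keeps a typo in MODEL_TARGETS from quietly producing a run with fewer models than intended.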
Manual Checks
After a run, inspect:
wc -l "$OUT_PREFIX.jsonl"
cat "$OUT_PREFIX"_manifest.json
cat "$OUT_PREFIX"_economy_audit.json
Expected quality gates:
- JSONL rows >= requested minimum, normally 5,000+ for a real fresh dataset.
- manifest complete: true.
- missing_cost_events: 0 in economy audit for shop slot market traces.
- model smoke report explicitly says whether it is only a scaffold or usable for candidate ranking.
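The first three gates can be checked mechanically. This is a minimal sketch, assuming complete and missing_cost_events are top-level keys in the manifest and economy audit JSON files (the key locations are an assumption); check_quality_gates is a hypothetical helper. The smoke-report gate still needs a human read.

```python
import json
from pathlib import Path


def check_quality_gates(out_prefix, min_rows=5000):
    """Return a list of failed-gate messages; an empty list means all gates passed."""
    failures = []

    # Gate 1: row count. Counts non-blank lines, so it can differ from
    # `wc -l` if the file contains blank lines or lacks a trailing newline.
    with open(f"{out_prefix}.jsonl", encoding="utf-8") as fh:
        rows = sum(1 for line in fh if line.strip())
    if rows < min_rows:
        failures.append(f"only {rows} JSONL rows, expected >= {min_rows}")

    # Gate 2: manifest must report a complete run (assumed top-level key).
    manifest = json.loads(Path(f"{out_prefix}_manifest.json").read_text(encoding="utf-8"))
    if manifest.get("complete") is not True:
        failures.append("manifest does not report complete: true")

    # Gate 3: economy audit must show no missing cost events (assumed top-level key).
    audit = json.loads(Path(f"{out_prefix}_economy_audit.json").read_text(encoding="utf-8"))
    if audit.get("missing_cost_events") != 0:
        failures.append(f"missing_cost_events is {audit.get('missing_cost_events')!r}, expected 0")

    return failures
```

Returning messages rather than a boolean lets a caller print every failed gate in one pass.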
Commit Guidance
Commit these if changed:
- analysis/leveling/data/features/*.metadata.json
- analysis/leveling/models/**/*_metrics.json
- analysis/leveling/models/**/*_feature_importance.csv
- analysis/leveling/reports/*.md
- tools/sim/*.dart, tools/sim/*.py, tools/leveling/*.py
- this skill and its scripts
Do not commit:
- logs/
- analysis/leveling/generated/
- .venv_leveling/
- raw JSONL or generated feature CSV
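The two lists above can be turned into a rough pre-commit triage. This is a sketch under assumptions: classify is a hypothetical helper, any .jsonl file is treated as raw data, and fnmatchcase lets * match across /, so the patterns are looser than git's own globbing. Paths that match neither list, including the skill's own files and generated feature CSVs stored outside the ignored directories, fall through to manual review.

```python
from fnmatch import fnmatchcase

# Patterns copied from the "commit these" list.
COMMIT_PATTERNS = [
    "analysis/leveling/data/features/*.metadata.json",
    "analysis/leveling/models/**/*_metrics.json",
    "analysis/leveling/models/**/*_feature_importance.csv",
    "analysis/leveling/reports/*.md",
    "tools/sim/*.dart",
    "tools/sim/*.py",
    "tools/leveling/*.py",
]

# Directories from the "do not commit" list.
IGNORED_PREFIXES = ("logs/", "analysis/leveling/generated/", ".venv_leveling/")


def classify(path):
    """Return 'ignore', 'commit', or 'review' for a repo-relative POSIX path."""
    # Ignore rules win over commit patterns.
    if path.startswith(IGNORED_PREFIXES) or path.endswith(".jsonl"):
        return "ignore"
    if any(fnmatchcase(path, pattern) for pattern in COMMIT_PATTERNS):
        return "commit"
    return "review"  # not covered by either list; decide by hand
```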
Validation before commit:
/Users/cheng80/flutter/bin/dart analyze tools/sim/run_balance_sim.dart tools/sim/summarize_balance_jsonl.dart
/Users/cheng80/flutter/bin/flutter test test/tools/sim/chunked_balance_run_test.dart
python3 -m py_compile tools/sim/chunked_balance_run.py