name: optimize-model
description: Deeply optimize a Xatu CBT transformation model query using live ClickHouse evidence. Use when the user wants performance analysis (not code edits) for models/transformations/*.sql, including dependency rendering, schema introspection, benchmark runs, and high-impact recommendations.
compatibility: Requires Bash 3.2+, Python 3, jq, HTTP access to ClickHouse on port 8123, and internet access for web research.
argument-hint: ""
Optimize CBT Transformation Model (No Model Edits)
Analyze transformation model $ARGUMENTS and report high-impact ClickHouse performance improvements.
Never modify the model file. Only report findings and recommendations.
Defaults To Confirm First
Before any analysis, collect and confirm (allow overrides):
Credentials and endpoint/database overrides are auto-loaded from the project .env by the run wrappers
(EXTERNAL_USER/EXTERNAL_PASS for the raw cluster, TRANSFORM_USER/TRANSFORM_PASS for the refined cluster).
Already-exported shell variables take precedence over .env. Note these are the PRODUCTION cluster credentials,
separate from the local docker CLICKHOUSE_USERNAME/CLICKHOUSE_PASSWORD.
- External models cluster:
endpoint:http://chendpoint-clickhouse-raw.analytics.production.ethpandaops:8123default_database:defaultusername:EXTERNAL_USER(from.env, default empty)password:EXTERNAL_PASS(from.env, default empty)
- Transformation models cluster:
endpoint:http://chendpoint-clickhouse-refined.analytics.production.ethpandaops:8123default_database:mainnetusername:TRANSFORM_USER(from.env, default empty)password:TRANSFORM_PASS(from.env, default empty)access_to_external_cluster:cluster('{raw}', database.table_name)- Cluster substitution rule: when template uses
cluster(...), external dependencies are resolved to<table>_localautomatically.
If any value is missing or ambiguous, ask follow-up questions before running benchmarks.
Credential preflight (gate before any cluster query)
The raw and refined clusters require authentication, so empty credentials will fail with
AUTHENTICATION_FAILED rather than run anonymously. Before the first run_prepare.sh/benchmark call,
resolve the credentials exactly as the wrappers do and check whether they are populated:
bash -c '. .claude/skills/optimize-model/scripts/_load_env.sh; \
echo "EXTERNAL=${EXTERNAL_USER:+set} TRANSFORM=${TRANSFORM_USER:+set}"'
If the credentials for a cluster you are about to query are empty, STOP and elicit a choice from the
user with AskUserQuestion (do not silently run and let it fail):
- Add credentials — ask the user to populate
EXTERNAL_USER/EXTERNAL_PASSand/orTRANSFORM_USER/TRANSFORM_PASSin.env(or export them), then re-resolve and continue. - Continue unauthenticated — proceed anyway. Only valid if the target cluster permits anonymous
access; otherwise expect the run to fail with
AUTHENTICATION_FAILED.
Only run the benchmark scripts once the user has chosen. Skip this gate (no prompt) when the needed credentials are already present.
Inputs
- Target transformation model (example):
models/transformations/fct_block.sql - Optional benchmark window overrides.
- Optional period mode:
sane(default): use frontmatter interval with protective caps to avoid timeouts.daring: expand beyond sane range for deeper testing.
Workflow
- Resolve target model path
- Accept full path or model name.
- Normalize to
models/transformations/<name>.sql. - Stop with a clear error if not found.
- Run one-shot preparation via wrapper (no flag discovery)
- Run:
SESSION_ID="$(date +%s)-$$"
PREP_OUTPUT="/tmp/optimize-model.${SESSION_ID}.prepare.json"
SESSION_ID="$SESSION_ID" PREP_OUTPUT="$PREP_OUTPUT" \
.claude/skills/optimize-model/scripts/run_prepare.sh "$ARGUMENTS"
- This command generates session-isolated
/tmpartifacts. Artifact names also include the bounds window, so multi-window runs (one prep per window) never overwrite each other's rendered SQL or bench/hash outputs; downstream wrappers resolve paths via the prep manifest (paths,artifact_prefix). - Use
summary,period,unresolved_fragments, andsummary.introspection_okfrom$PREP_OUTPUT. - Confirm
period.bounds_start/period.bounds_endare sane before benchmarking. - If unresolved fragments remain, ask targeted follow-up questions and rerun.
- Introspect dependency schemas
- This is already executed by prep wrapper.
- Always use
SHOW CREATE TABLEfor dependencies. - If a dependency is
Distributed, also inspect its target table (often_local) and reason from local engine/order/partition keys. - Treat
_localtableORDER BYas the effective primary key whenPRIMARY KEYis not explicitly defined. - For any
cluster(...)table access, always target_localtables instead of Distributed table names.
- Early version-aware research pass (recommended before benchmarking)
- Read ClickHouse versions from prep output first:
jq '.clickhouse_versions' "$PREP_OUTPUT"
- Use web search (explicitly) to review docs and changelog/release notes for the detected version before designing candidate rewrites.
- Prefer doing this before baseline/candidate benchmarking; if a quick baseline run already happened, continue and complete research before final recommendations.
- Use official ClickHouse docs first, then release/changelog pages relevant to the detected version and newer versions.
- Include version-aware search terms such as:
clickhouse <major.minor> release notesclickhouse <major.minor> query optimizationclickhouse replacingmergetree final performance
- Use references:
references/clickhouse-playbook.mdreferences/dependency-resolution.mdreferences/research-sources.md
- Benchmark rendered query (baseline, fast sample first)
- Run via wrapper:
SESSION_ID="$SESSION_ID" \
.claude/skills/optimize-model/scripts/run_benchmark.sh "$PREP_OUTPUT"
- Wrapper output includes
BENCH_OUTPUT=...path. - Do not manually widen bounds unless user opts into
PERIOD_MODE=daringor provides explicit bounds. - The benchmark must disable query caches where supported.
- Note cache caveats (filesystem/page cache may still influence timings).
- Use explain output and query_log metrics for evidence.
- Validate across multiple windows (required before recommendation)
- Use at least 2 windows:
sanewindow from frontmatter interval (default).- One additional window (
daringor explicit custom bounds).
- Prefer 3 windows when runtime is acceptable (short/sane/daring) to reduce overfitting.
- For each window:
- Produce window-specific baseline SQL via prep.
- Benchmark baseline.
- Benchmark each candidate rewrite derived from that same window-specific SQL via:
SESSION_ID="$SESSION_ID" WINDOW_LABEL="sane" CANDIDATE_LABEL="argmax" \
.claude/skills/optimize-model/scripts/run_candidate_benchmark.sh "$PREP_OUTPUT" "/tmp/candidate.sql"
- Run hash correctness checks baseline vs candidate for that same window.
- It is acceptable to keep candidates with negative impact in the comparison table; mark them as rejected.
- Hash correctness gate (required for every rewritten query)
- For each candidate optimization SQL file, run:
SESSION_ID="$SESSION_ID" \
.claude/skills/optimize-model/scripts/run_hash_check.sh "$PREP_OUTPUT" "/tmp/candidate.sql"
- Only treat a candidate as valid when
hashes_match = true. - If hash check fails, reject that candidate regardless of speed gains.
- Wrapper exit semantics:
- Hash mismatches are reported via
HASH_STATUS=mismatchandHASH_MATCH=false(non-fatal by default). - Set
HASH_MISMATCH_FAIL=1to make mismatches return exit code3. - Execution/runtime failures still return non-zero and
HASH_STATUS=error.
- Hash mismatches are reported via
- Hash check script behavior:
- Uses a version gate first:
- ClickHouse
>= 25.8: triesFORMAT Hash. - ClickHouse
< 25.8: skipsFORMAT Hashand directly uses exact byte-level comparison.
- ClickHouse
- If
FORMAT Hashis attempted but returnsUNKNOWN_FORMAT, automatically falls back to exact byte-level comparison using:ORDER BY tuple(*)FORMAT RowBinary(value-based fallback)- SHA256 over the response stream
- Uses a version gate first:
- If both methods fail, report that explicitly and do not claim exact-result equivalence.
- Research-backed optimization pass
- Use findings from the early research pass to prioritize only high-impact candidates.
- Candidate rewrites are allowed in temporary SQL files (for benchmarking/hash checks), including moving/combining/removing CTEs when readability and performance both improve.
- Recommendation filter
- Recommend only changes likely to produce meaningful gains.
- Do not trade readability for tiny gains.
- Explicitly evaluate memory impact.
- For ReplacingMergeTree tables, test whether
FINALcan be removed and replaced with dedup logic such asargMax(column, update_date_time)whenupdate_date_timeis the version column. - Do not treat “no duplicates right now” as sufficient evidence to remove
FINAL; future duplicates are expected in ReplacingMergeTree workflows. - Only recommend the
argMax(..., update_date_time)pattern when benchmark evidence and result checks show correctness is preserved. - Always flag sparse high-cardinality key join risks (address/username-like keys) when relevant.
- A candidate is recommended only if gains hold across multiple windows versus original baseline.
Strict Rules
- Never edit model SQL files.
- Temporary candidate SQL rewrites in
/tmpare allowed for experiments and benchmarking. - Never propose materialized views or projections for this workflow.
- Always use HTTP on port
8123for ClickHouse access (curlstyle / HTTP API). - Never use
clickhouse-clientor any native-port (9000) workflow in this skill. - Wrapper shell scripts must remain Bash
3.2+compatible (macOS default bash). - Do not run
--helpfor skill scripts during normal flow; use the canonical commands documented above. - If a script fails, inspect stderr and rerun with corrected arguments instead of probing help output.
- Always run hash correctness checks before recommending query rewrites.
- Keep recommendations evidence-based (benchmark + explain + schema context).
- If evidence is weak or mixed, explicitly say so.
- Final report must include a
Research Evidencesection with version(s), web search queries, and source links used. - Final report must include a comparison table for baseline vs each candidate across tested windows.
- Final report must include the final recommended query text (or explicitly state no rewrite is recommended).
Output Format
Provide exactly these sections:
Input ConfirmationRendered Runnable SQLDependency & Schema FindingsResearch EvidenceBenchmark Results (with cache caveats)Hash Correctness CheckOptimization Comparison TableTop Optimizations (high impact only)Recommended Final QueryWhat Not To ChangeOpen Questions
Research Evidence requirements:
- Include detected ClickHouse version(s) used for analysis.
- Include the exact web search queries used.
- Include links to docs/changelog pages actually consulted.
- Include a short applicability note per key source (why it matters for this version/workload).
Hash Correctness Check requirements:
- Include which method was used:
format_hashorrowbinary_sha256_fallback. - For
rowbinary_sha256_fallback, report that comparison semantics are value-based (FORMAT RowBinary).
Optimization Comparison Table requirements:
- One row per candidate per window.
- Include baseline and candidate metrics side by side and percentage deltas.
- Minimum columns:
candidatewindow_labelhashes_matchbaseline_wall_time_s_mediancandidate_wall_time_s_mediandelta_wall_time_pctbaseline_read_bytes_mediancandidate_read_bytes_mediandelta_read_bytes_pctbaseline_peak_memory_usage_maxcandidate_peak_memory_usage_maxdelta_peak_memory_pctdecision(recommendedorrejected)
Script Notes
- Wrapper prep:
.claude/skills/optimize-model/scripts/run_prepare.sh - Wrapper benchmark:
.claude/skills/optimize-model/scripts/run_benchmark.sh - Wrapper candidate benchmark:
.claude/skills/optimize-model/scripts/run_candidate_benchmark.sh - Wrapper hash check:
.claude/skills/optimize-model/scripts/run_hash_check.sh - Underlying prep:
.claude/skills/optimize-model/scripts/prepare_model_analysis.py - Resolver:
.claude/skills/optimize-model/scripts/resolve_model_sql.py - Schema introspection:
.claude/skills/optimize-model/scripts/introspect_tables.py - Benchmarking:
.claude/skills/optimize-model/scripts/benchmark_query.py - Hash compare:
.claude/skills/optimize-model/scripts/compare_query_hash.py
If a script fails, surface exact error text and continue manually only if safety and correctness are preserved.