optimize-model - SKILL.md Agent Skill

name: optimize-model description: Deeply optimize a Xatu CBT transformation model query using live ClickHouse evidence. Use when the user wants performance analysis (not code edits) for `models/transformations/*.sql`, including dependency rendering, schema introspection, benchmark runs, and high-impact recommendations. compatibility: Requires Bash 3.2+, Python 3, jq, HTTP access to ClickHouse on port 8123, and internet access for web research. argument-hint: ""

Optimize CBT Transformation Model (No Model Edits)

Analyze transformation model $ARGUMENTS and report high-impact ClickHouse performance improvements.

Never modify the model file. Only report findings and recommendations.

Defaults To Confirm First

Before any analysis, collect and confirm (allow overrides):

Credentials and endpoint/database overrides are auto-loaded from the project .env by the run wrappers (EXTERNAL_USER/EXTERNAL_PASS for the raw cluster, TRANSFORM_USER/TRANSFORM_PASS for the refined cluster). Already-exported shell variables take precedence over .env. Note these are the PRODUCTION cluster credentials, separate from the local docker CLICKHOUSE_USERNAME/CLICKHOUSE_PASSWORD.

External models cluster:
- endpoint: http://chendpoint-clickhouse-raw.analytics.production.ethpandaops:8123
- default_database: default
- username: EXTERNAL_USER (from .env, default empty)
- password: EXTERNAL_PASS (from .env, default empty)
Transformation models cluster:
- endpoint: http://chendpoint-clickhouse-refined.analytics.production.ethpandaops:8123
- default_database: mainnet
- username: TRANSFORM_USER (from .env, default empty)
- password: TRANSFORM_PASS (from .env, default empty)
- access_to_external_cluster: cluster('{raw}', database.table_name)
- Cluster substitution rule: when template uses cluster(...), external dependencies are resolved to <table>_local automatically.

If any value is missing or ambiguous, ask follow-up questions before running benchmarks.

Credential preflight (gate before any cluster query)

The raw and refined clusters require authentication, so empty credentials will fail with AUTHENTICATION_FAILED rather than run anonymously. Before the first run_prepare.sh/benchmark call, resolve the credentials exactly as the wrappers do and check whether they are populated:

bash -c '. .claude/skills/optimize-model/scripts/_load_env.sh; \
  echo "EXTERNAL=${EXTERNAL_USER:+set} TRANSFORM=${TRANSFORM_USER:+set}"'

If the credentials for a cluster you are about to query are empty, STOP and elicit a choice from the user with AskUserQuestion (do not silently run and let it fail):

Add credentials — ask the user to populate EXTERNAL_USER/EXTERNAL_PASS and/or TRANSFORM_USER/TRANSFORM_PASS in .env (or export them), then re-resolve and continue.
Continue unauthenticated — proceed anyway. Only valid if the target cluster permits anonymous access; otherwise expect the run to fail with AUTHENTICATION_FAILED.

Only run the benchmark scripts once the user has chosen. Skip this gate (no prompt) when the needed credentials are already present.

Inputs

Target transformation model (example): models/transformations/fct_block.sql
Optional benchmark window overrides.
Optional period mode:
- sane (default): use frontmatter interval with protective caps to avoid timeouts.
- daring: expand beyond sane range for deeper testing.

Workflow

Resolve target model path

Accept full path or model name.
Normalize to models/transformations/<name>.sql.
Stop with a clear error if not found.

Run one-shot preparation via wrapper (no flag discovery)

Run:

SESSION_ID="$(date +%s)-$$"
PREP_OUTPUT="/tmp/optimize-model.${SESSION_ID}.prepare.json"

SESSION_ID="$SESSION_ID" PREP_OUTPUT="$PREP_OUTPUT" \
  .claude/skills/optimize-model/scripts/run_prepare.sh "$ARGUMENTS"

This command generates session-isolated /tmp artifacts. Artifact names also include the bounds window, so multi-window runs (one prep per window) never overwrite each other's rendered SQL or bench/hash outputs; downstream wrappers resolve paths via the prep manifest (paths, artifact_prefix).
Use summary, period, unresolved_fragments, and summary.introspection_ok from $PREP_OUTPUT.
Confirm period.bounds_start/period.bounds_end are sane before benchmarking.
If unresolved fragments remain, ask targeted follow-up questions and rerun.

Introspect dependency schemas

This is already executed by prep wrapper.
Always use SHOW CREATE TABLE for dependencies.
If a dependency is Distributed, also inspect its target table (often _local) and reason from local engine/order/partition keys.
Treat _local table ORDER BY as the effective primary key when PRIMARY KEY is not explicitly defined.
For any cluster(...) table access, always target _local tables instead of Distributed table names.

Early version-aware research pass (recommended before benchmarking)

Read ClickHouse versions from prep output first:
- jq '.clickhouse_versions' "$PREP_OUTPUT"
Use web search (explicitly) to review docs and changelog/release notes for the detected version before designing candidate rewrites.
Prefer doing this before baseline/candidate benchmarking; if a quick baseline run already happened, continue and complete research before final recommendations.
Use official ClickHouse docs first, then release/changelog pages relevant to the detected version and newer versions.
Include version-aware search terms such as:
- clickhouse <major.minor> release notes
- clickhouse <major.minor> query optimization
- clickhouse replacingmergetree final performance
Use references:
- references/clickhouse-playbook.md
- references/dependency-resolution.md
- references/research-sources.md

Benchmark rendered query (baseline, fast sample first)

Run via wrapper:

SESSION_ID="$SESSION_ID" \
  .claude/skills/optimize-model/scripts/run_benchmark.sh "$PREP_OUTPUT"

Wrapper output includes BENCH_OUTPUT=... path.
Do not manually widen bounds unless user opts into PERIOD_MODE=daring or provides explicit bounds.
The benchmark must disable query caches where supported.
Note cache caveats (filesystem/page cache may still influence timings).
Use explain output and query_log metrics for evidence.

Validate across multiple windows (required before recommendation)

Use at least 2 windows:
- sane window from frontmatter interval (default).
- One additional window (daring or explicit custom bounds).
Prefer 3 windows when runtime is acceptable (short/sane/daring) to reduce overfitting.
For each window:
- Produce window-specific baseline SQL via prep.
- Benchmark baseline.
- Benchmark each candidate rewrite derived from that same window-specific SQL via:

SESSION_ID="$SESSION_ID" WINDOW_LABEL="sane" CANDIDATE_LABEL="argmax" \
  .claude/skills/optimize-model/scripts/run_candidate_benchmark.sh "$PREP_OUTPUT" "/tmp/candidate.sql"

Run hash correctness checks baseline vs candidate for that same window.
It is acceptable to keep candidates with negative impact in the comparison table; mark them as rejected.

Hash correctness gate (required for every rewritten query)

For each candidate optimization SQL file, run:

SESSION_ID="$SESSION_ID" \
  .claude/skills/optimize-model/scripts/run_hash_check.sh "$PREP_OUTPUT" "/tmp/candidate.sql"

Only treat a candidate as valid when hashes_match = true.
If hash check fails, reject that candidate regardless of speed gains.
Wrapper exit semantics:
- Hash mismatches are reported via HASH_STATUS=mismatch and HASH_MATCH=false (non-fatal by default).
- Set HASH_MISMATCH_FAIL=1 to make mismatches return exit code 3.
- Execution/runtime failures still return non-zero and HASH_STATUS=error.
Hash check script behavior:
- Uses a version gate first:
  - ClickHouse >= 25.8: tries FORMAT Hash.
  - ClickHouse < 25.8: skips FORMAT Hash and directly uses exact byte-level comparison.
- If FORMAT Hash is attempted but returns UNKNOWN_FORMAT, automatically falls back to exact byte-level comparison using:
  - ORDER BY tuple(*)
  - FORMAT RowBinary (value-based fallback)
  - SHA256 over the response stream
If both methods fail, report that explicitly and do not claim exact-result equivalence.

Research-backed optimization pass

Use findings from the early research pass to prioritize only high-impact candidates.
Candidate rewrites are allowed in temporary SQL files (for benchmarking/hash checks), including moving/combining/removing CTEs when readability and performance both improve.

Recommendation filter

Recommend only changes likely to produce meaningful gains.
Do not trade readability for tiny gains.
Explicitly evaluate memory impact.
For ReplacingMergeTree tables, test whether FINAL can be removed and replaced with dedup logic such as argMax(column, update_date_time) when update_date_time is the version column.
Do not treat “no duplicates right now” as sufficient evidence to remove FINAL; future duplicates are expected in ReplacingMergeTree workflows.
Only recommend the argMax(..., update_date_time) pattern when benchmark evidence and result checks show correctness is preserved.
Always flag sparse high-cardinality key join risks (address/username-like keys) when relevant.
A candidate is recommended only if gains hold across multiple windows versus original baseline.

Strict Rules

Never edit model SQL files.
Temporary candidate SQL rewrites in /tmp are allowed for experiments and benchmarking.
Never propose materialized views or projections for this workflow.
Always use HTTP on port 8123 for ClickHouse access (curl style / HTTP API).
Never use clickhouse-client or any native-port (9000) workflow in this skill.
Wrapper shell scripts must remain Bash 3.2+ compatible (macOS default bash).
Do not run --help for skill scripts during normal flow; use the canonical commands documented above.
If a script fails, inspect stderr and rerun with corrected arguments instead of probing help output.
Always run hash correctness checks before recommending query rewrites.
Keep recommendations evidence-based (benchmark + explain + schema context).
If evidence is weak or mixed, explicitly say so.
Final report must include a Research Evidence section with version(s), web search queries, and source links used.
Final report must include a comparison table for baseline vs each candidate across tested windows.
Final report must include the final recommended query text (or explicitly state no rewrite is recommended).

Output Format

Provide exactly these sections:

Input Confirmation
Rendered Runnable SQL
Dependency & Schema Findings
Research Evidence
Benchmark Results (with cache caveats)
Hash Correctness Check
Optimization Comparison Table
Top Optimizations (high impact only)
Recommended Final Query
What Not To Change
Open Questions

Research Evidence requirements:

Include detected ClickHouse version(s) used for analysis.
Include the exact web search queries used.
Include links to docs/changelog pages actually consulted.
Include a short applicability note per key source (why it matters for this version/workload).

Hash Correctness Check requirements:

Include which method was used: format_hash or rowbinary_sha256_fallback.
For rowbinary_sha256_fallback, report that comparison semantics are value-based (FORMAT RowBinary).

Optimization Comparison Table requirements:

One row per candidate per window.
Include baseline and candidate metrics side by side and percentage deltas.
Minimum columns:
- candidate
- window_label
- hashes_match
- baseline_wall_time_s_median
- candidate_wall_time_s_median
- delta_wall_time_pct
- baseline_read_bytes_median
- candidate_read_bytes_median
- delta_read_bytes_pct
- baseline_peak_memory_usage_max
- candidate_peak_memory_usage_max
- delta_peak_memory_pct
- decision (recommended or rejected)

Script Notes

Wrapper prep: .claude/skills/optimize-model/scripts/run_prepare.sh
Wrapper benchmark: .claude/skills/optimize-model/scripts/run_benchmark.sh
Wrapper candidate benchmark: .claude/skills/optimize-model/scripts/run_candidate_benchmark.sh
Wrapper hash check: .claude/skills/optimize-model/scripts/run_hash_check.sh
Underlying prep: .claude/skills/optimize-model/scripts/prepare_model_analysis.py
Resolver: .claude/skills/optimize-model/scripts/resolve_model_sql.py
Schema introspection: .claude/skills/optimize-model/scripts/introspect_tables.py
Benchmarking: .claude/skills/optimize-model/scripts/benchmark_query.py
Hash compare: .claude/skills/optimize-model/scripts/compare_query_hash.py

If a script fails, surface exact error text and continue manually only if safety and correctness are preserved.