---
name: bisect
description: >
  Use this skill to find which commit introduced a bug or regression. Uses git bisect with automated build and test. Trigger when a bug appeared recently, a query started failing, performance regressed, or the user wants to compare behavior between two commits.
argument-hint: "[good-commit] [bad-commit] [test-command-or-sql]"
disable-model-invocation: true
---
Commit Comparison / Bisect Tool
Find which commit introduced a bug using git bisect with an automated build and test. Compare query output across commits against the DuckDB CPU baseline.
Reference: See .claude/skills/_shared/build-and-query.md for shared infrastructure (build modes, query execution, result comparison, change tracking).
Workflow
Mode 1: Automated Bisect (commit range)
Parse arguments:
- $ARGUMENTS[0] = good commit (SHA, tag, or "N commits ago", e.g., "10 commits ago")
- $ARGUMENTS[1] = bad commit (default: HEAD)
- $ARGUMENTS[2] = test command or SQL query
- If the "N commits ago" syntax is used, resolve it with:
git rev-parse HEAD~N
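Argument resolution can be sketched as two small shell functions. The names to_revspec and resolve_commit are hypothetical helpers, not part of the skill; the only git call is the git rev-parse step above, with --verify so an unknown revision fails instead of being echoed back.

```shell
# Sketch: turn a commit argument into something git rev-parse accepts.
# SHAs, tags, and branch names pass through unchanged; "N commits ago"
# (or "1 commit ago") becomes HEAD~N. Helper names are hypothetical.
to_revspec() {
  case $1 in
    *" commits ago" | *" commit ago")
      local n=${1%% *}            # leading word, e.g. "10"
      case $n in
        '' | *[!0-9]*)
          printf 'not a commit count: %s\n' "$1" >&2
          return 1
          ;;
      esac
      printf 'HEAD~%s\n' "$n"
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}

# Resolve an argument to a full commit SHA, failing if git cannot.
resolve_commit() {
  local rev
  rev=$(to_revspec "$1") || return 1
  git rev-parse --verify --quiet "$rev^{commit}"
}
```

Resolving to a full SHA up front also keeps the range stable, since HEAD itself moves once bisecting starts.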
Pre-flight checks:
- Warn about uncommitted changes:
git status --porcelain
- If the output is non-empty (the tree is dirty), ask the user to stash or commit first
- Show commit range:
git log --oneline <good>..<bad>
- Estimate bisect steps: approximately log2(N), where N is the number of commits in the range (git rev-list --count <good>..<bad>)
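A minimal sketch of the pre-flight checks, assuming the good and bad revisions are already resolved. The function names require_clean_tree, estimate_steps, and preflight are illustrative. The step estimate is ceil(log2(N)), a rule of thumb; git prints its own "roughly N steps" figure once bisecting starts.

```shell
# Fail when git status --porcelain prints anything (modified, staged,
# or untracked files).
require_clean_tree() {
  if [ -n "$(git status --porcelain)" ]; then
    echo "working tree is dirty; stash or commit before bisecting" >&2
    return 1
  fi
}

# Approximate bisect steps for N candidate commits: ceil(log2(N)).
estimate_steps() {
  local steps=0
  while [ $((1 << steps)) -lt "$1" ]; do
    steps=$((steps + 1))
  done
  printf '%s\n' "$steps"
}

# $1 = good revision, $2 = bad revision (both already resolved).
preflight() {
  local count
  require_clean_tree || return 1
  git log --oneline "$1..$2"
  count=$(git rev-list --count "$1..$2") || return 1
  printf '%s commits in range, roughly %s bisect steps\n' \
    "$count" "$(estimate_steps "$count")"
}
```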
Establish the CPU baseline (if the test is a SQL query): run the query through DuckDB on the CPU to get the expected result, and save it to a temp file.
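One way to capture the baseline, assuming a duckdb CLI is on PATH and accepts the -csv and -noheader options plus the "FILENAME SQL" positional form (check against the installed version). The database path, output path, and function names are placeholders. Rows are sorted so the later comparison ignores ordering; the GPU side must emit the same headerless CSV layout for the comparison to be meaningful.

```shell
# Sort rows bytewise so the comparison ignores row order.
normalize_rows() {
  LC_ALL=C sort
}

# Run the query on DuckDB (CPU) and write sorted CSV rows to $3.
# $1 = database file, $2 = SQL text, $3 = baseline output path.
capture_cpu_baseline() {
  local raw
  raw=$(mktemp) || return 1
  # Write to a temp file first so a DuckDB failure is not masked by sort.
  if ! duckdb -csv -noheader "$1" "$2" > "$raw"; then
    rm -f "$raw"
    return 1
  fi
  normalize_rows < "$raw" > "$3"
  rm -f "$raw"
}
```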
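Create the bisect test script at /tmp/claude-1000/sirius_bisect_test.sh. Ask the user which build preset to use: release (fastest), relwithdebinfo (with debug symbols), or clang-debug (full debug). Skeleton:
    #!/bin/bash
    set -e
    # Build with the chosen preset (see Claude.md). Under set -e a failed build
    # would exit with the build's own status and be marked bad, so map it to 125 (skip).
    <build_command> || exit 125
    # Run the test; its exit status decides good (0) or bad (1-127, except 125)
    <test_command>
For SQL queries, the script also: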
- Captures GPU output
- Compares against the saved CPU baseline
- Exits 0 if match (good), 1 if mismatch (bad), 125 if build fails (skip)
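The SQL-query variant of the test script might look like the sketch below. It is an illustration, not the skill's actual script: build and run_gpu_query are placeholders to replace with the real commands, the baseline and output paths are hypothetical, and passing the SQL as the script's first argument is an assumption. It follows the git bisect run exit-code convention, where 125 means skip and codes above 127 abort the bisect, so a missing baseline stops the run instead of marking every commit bad.

```shell
#!/bin/bash
# Sketch of /tmp/claude-1000/sirius_bisect_test.sh for a SQL-query bisect.
# Assumption: the SQL text is the first argument, e.g.
#   git bisect run /tmp/claude-1000/sirius_bisect_test.sh "<sql>"
# git bisect run exit codes: 0 = good, 1-127 (except 125) = bad,
# 125 = skip, anything above 127 aborts the bisect.

# Hypothetical locations; the baseline is written before bisecting starts.
BASELINE=${BASELINE:-/tmp/claude-1000/sirius_bisect_baseline.txt}
GPU_OUT=${GPU_OUT:-/tmp/claude-1000/sirius_bisect_gpu.txt}

build() {
  # Placeholder: run the build for the chosen preset (see Claude.md).
  echo "build command not configured" >&2
  return 1
}

run_gpu_query() {
  # Placeholder: run "$1" through the GPU path and print its rows to
  # stdout in the same format as the baseline.
  echo "GPU query command not configured" >&2
  return 1
}

# True when both files hold the same rows, ignoring row order.
results_match() {
  [ "$(LC_ALL=C sort "$1")" = "$(LC_ALL=C sort "$2")" ]
}

main() {
  if [ ! -r "$BASELINE" ]; then
    echo "missing CPU baseline: $BASELINE" >&2
    exit 128                                  # abort the whole bisect
  fi
  build || exit 125                           # does not build: skip
  run_gpu_query "$1" > "$GPU_OUT" || exit 1   # crash or error: bad
  if results_match "$BASELINE" "$GPU_OUT"; then
    exit 0                                    # matches CPU: good
  fi
  exit 1                                      # mismatch: bad
}

# Run only when a query is supplied, so the functions can be sourced
# and exercised on their own.
if [ "$#" -gt 0 ]; then
  main "$@"
fi
```

Catching the query failure and exiting 1 matters: if the GPU process is killed by a signal, its raw status (e.g. 139) would otherwise be above 127 and abort the bisect instead of marking the commit bad.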
Execute automated bisect:
git bisect start <bad> <good>
git bisect run /tmp/claude-1000/sirius_bisect_test.sh
Show progress updates during the bisect (current step / total estimated steps).
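Progress reporting can lean on the status line git prints after each step ("Bisecting: N revisions left to test after this (roughly M steps)") and on the "<sha> is the first bad commit" line that git bisect run prints when it finishes. The sketch below (function names hypothetical) tees the run output to a temp log, turns each status line into a short progress message, and extracts the culprit SHA from the log.

```shell
# Turn git's per-step status lines, e.g.
#   Bisecting: 42 revisions left to test after this (roughly 5 steps)
# into short progress messages; all other lines are dropped.
bisect_progress() {
  local step=0 line left
  while IFS= read -r line; do
    case $line in
      Bisecting:*"(roughly "*)
        step=$((step + 1))
        left=${line##*"(roughly "}   # e.g. "5 steps)"
        left=${left%% *}             # e.g. "5"
        printf 'step %d: roughly %s step(s) left\n' "$step" "$left"
        ;;
    esac
  done
}

# Print the SHA from git's "<sha> is the first bad commit" line, if any.
first_bad_from_log() {
  sed -n 's/^\([0-9a-f]\{7,\}\) is the first bad commit$/\1/p' "$1"
}

# $1 = good, $2 = bad, remaining args = test command and its arguments.
run_bisect() {
  local good bad log
  good=$1
  bad=$2
  shift 2
  log=$(mktemp) || return 1
  git bisect start "$bad" "$good" || return 1
  # tee keeps the full log while the filter reports each finished step.
  git bisect run "$@" 2>&1 | tee "$log" | bisect_progress
  first_bad_from_log "$log"
}
```

If skipped commits leave the answer ambiguous, git prints a list of candidates instead of the "first bad commit" line, so first_bad_from_log prints nothing and the full log must be read.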
Report the first bad commit:
- Show commit message, author, date
- Show the diff:
git show <bad-commit>
- Analyze the changes and explain what likely caused the regression
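A compact report of the first bad commit can be built from git show alone; report_commit is an illustrative name. The header uses git's pretty-format placeholders (%H hash, %an and %ae author, %ad date, %s subject), and the second call prints the diffstat followed by the full patch for analysis.

```shell
# Sketch: summarize a commit, then show what it changed.
report_commit() {
  local sha
  sha=$1
  # Header only (-s suppresses the diff): hash, author, date, subject.
  git show -s --format='commit %H%nAuthor: %an <%ae>%nDate:   %ad%n%n    %s' "$sha" || return 1
  printf '\n'
  # Diffstat followed by the full patch, without repeating the header.
  git show --stat --patch --format= "$sha"
}
```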
Pipeline-level comparison (optional): run tools/parse_pipeline_log.py on the logs from the last good commit and the first bad commit to compare per-operator row counts.
Cleanup:
git bisect reset
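Because git bisect moves HEAD, it helps to guarantee the reset even if the run is interrupted. This sketch assumes that a session is active while git's BISECT_START bookkeeping file exists in the git directory; the function names are hypothetical.

```shell
# True while a bisect session is active in this repository.
# Assumption: git keeps BISECT_START in the git directory during a session.
bisect_in_progress() {
  local gitdir
  gitdir=$(git rev-parse --git-dir) || return 1
  [ -f "$gitdir/BISECT_START" ]
}

# Reset only when needed, so the function is safe to call more than once.
cleanup_bisect() {
  if bisect_in_progress; then
    git bisect reset
  fi
}

# Typical registration, before git bisect start:
#   trap cleanup_bisect EXIT
#   trap 'cleanup_bisect; exit 130' INT TERM
```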
Mode 2: Manual Comparison (two specific commits)
If the user provides just two specific commits (not a range for bisect):
- Checkout commit A, build, run query, capture output + logs
- Checkout commit B, build, run query, capture output + logs
- Diff both the query outputs and the logs, highlighting:
- Result differences (wrong values, missing/extra rows)
- Code path differences (different operators used, different pipeline stages)
- Performance differences (timing, memory usage from logs)
- Return to original branch:
git checkout <original-branch>
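Mode 2 can be sketched as follows. build and run_query are hypothetical placeholders for the project's build step and query runner (see the shared build-and-query reference), and the temp-directory layout is an assumption. Both revisions are resolved to SHAs before any checkout, because relative names such as HEAD~1 would otherwise change meaning once HEAD moves.

```shell
# Hypothetical placeholders; replace with the project's real commands.
build() { echo "build command not configured" >&2; return 1; }
run_query() { echo "query command not configured" >&2; return 1; }

# Check out one commit, build it, and capture sorted output plus logs.
# $1 = commit SHA, $2 = file prefix, $3 = SQL text.
capture_at() {
  git checkout -q "$1" || return 1
  build > "$2.build.log" 2>&1 || return 1
  run_query "$3" > "$2.out" 2> "$2.log" || return 1
  LC_ALL=C sort "$2.out" > "$2.sorted"
}

# $1 = commit A, $2 = commit B, $3 = SQL text.
compare_commits() {
  local a b orig dir rc
  rc=0
  # Resolve first: relative names like HEAD~1 change meaning after checkout.
  a=$(git rev-parse --verify --quiet "$1^{commit}") || return 1
  b=$(git rev-parse --verify --quiet "$2^{commit}") || return 1
  # Remember the starting branch, or the SHA if HEAD is detached.
  orig=$(git symbolic-ref -q --short HEAD || git rev-parse HEAD) || return 1
  dir=$(mktemp -d) || return 1
  if ! { capture_at "$a" "$dir/a" "$3" && capture_at "$b" "$dir/b" "$3"; }; then
    rc=1
  fi
  git checkout -q "$orig"          # always return to where we started
  [ "$rc" -eq 0 ] || return 1
  printf 'artifacts in %s\n' "$dir"
  # Order-insensitive result diff, then log diff; differences are
  # reported, not treated as failures.
  if diff -u "$dir/a.sorted" "$dir/b.sorted"; then
    printf 'query outputs match\n'
  fi
  diff -u "$dir/a.log" "$dir/b.log" || true
}
```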
Key Design Decisions
- Exit code 125 skips commits that don't build (common in CUDA projects where intermediate commits may break)
- Support both SQL query tests and unit test invocations
- Warn about uncommitted changes before starting bisect (bisect changes HEAD)
- CPU baseline captured once before bisect starts, reused for all steps
- For SQL queries, results are sorted before comparison to handle ordering differences
Important Notes
- git bisect changes HEAD, so the user should not have uncommitted work
- Each bisect step requires a full rebuild, which can be slow for large codebases
- If many commits don't build, bisect may take longer than expected due to skips
- The user can interrupt bisect at any time with
git bisect reset