name: fix-ci
description: Check CI status, analyze test failures, auto-fix obvious issues or discuss with user
argument-hint: "[PR number or URL]"
allowed-tools: Bash, Read, Edit, Grep, Glob, Agent, AskUserQuestion
Analyze CI failures for PR $ARGUMENTS (or the current branch's PR if no argument given).
Step 1: Discover build status
1a. Find the PR
# If $ARGUMENTS is a number or URL, use it directly.
# Otherwise detect from current branch:
gh pr view --json number,title,headRefName --jq '{number, title, headRefName}'
1b. Quick test summary (cheapest check, 1 API call)
Parse the buildId from any check URL first (see Step 1c), then hit the test summary endpoint:
curl -s "https://dev.azure.com/{org}/{project_guid}/_apis/test/resultsummarybybuild?buildId={buildId}&api-version=7.0-preview"
No auth needed. Returns aggregate counts instantly (~1KB):
{
"aggregatedResultsAnalysis": {
"totalTests": 279518,
"resultsByOutcome": {
"Passed": {"count": 276837},
"Failed": {"count": 24},
"NotExecuted": {"count": 2657}
},
"runSummaryByOutcome": {
"Failed": {"runsCount": 14},
"Passed": {"runsCount": 23}
}
}
}
If Failed.count == 0 and all checks passed — report success, stop.
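The zero-failure check can be sketched as a small filter over the summary JSON. This assumes a standalone `jq` binary is installed (the document itself only relies on the `--jq` flag built into `gh`), and `failed_count` is a hypothetical helper name.

```shell
# failed_count: read a resultsummarybybuild JSON document on stdin and print
# the number of failed tests. Prints 0 when the "Failed" bucket is absent,
# which is the case for a fully green run.
failed_count() {
  jq -r '.aggregatedResultsAnalysis.resultsByOutcome.Failed.count // 0'
}

# Usage, with the same placeholders as the curl call above:
#   curl -s "<test summary URL>" | failed_count
```

A result of `0` is only half of the success condition; the check statuses from Step 1c still have to be all passing.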
1c. Fetch all CI checks
Use gh pr checks text format (tab-separated) — it reliably includes Azure URLs unlike the JSON statusCheckRollup where detailsUrl can be null.
gh pr checks $PR 2>&1
Output format (tab-separated):
CheckName\tstatus\tduration\tURL
Parse and classify each check:
- `pass`: succeeded
- `fail`: failed
- `pending`: still running
Group checks by source:
- Azure Pipelines: URL contains `dev.azure.com`. These are the test jobs.
- GitHub Actions: URL is on `github.com` and its path contains `/actions/` (the owner and repo sit between the two, see the URL shape in Step 1d). These include `build` (compilation), `gitleaks` (secret scanning), and `Danger` (PR linting). Failed GitHub Actions checks are relevant, so report them.
- Other: `CodeRabbit` (no URL or review-only). Skip.
Extract buildId from the first Azure URL (all checks in one pipeline run share the same buildId).
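The grouping by source can be sketched with `awk` over the tab-separated output. `classify_checks` and the source labels (`azure`, `github-actions`, `other`) are hypothetical names, and the column order is assumed to be the one shown above.

```shell
# classify_checks: read `gh pr checks` tab-separated output on stdin and
# print "<source>\t<status>\t<name>\t<url>" for every check.
classify_checks() {
  awk -F '\t' '
    {
      name = $1; status = $2; url = $4
      if (url ~ /dev\.azure\.com/) {
        src = "azure"
      } else if (url ~ /github\.com\/.*\/actions\//) {
        src = "github-actions"
      } else {
        src = "other"            # no URL, or a review-only check
      }
      printf "%s\t%s\t%s\t%s\n", src, status, name, url
    }'
}

# Usage:
#   gh pr checks "$PR" 2>&1 | classify_checks
```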
1d. Report non-Azure failures
For each failed GitHub Actions check, fetch its details:
# Extract run_id and job_id from URL: https://github.com/{owner}/{repo}/actions/runs/{run_id}/job/{job_id}
gh run view {run_id} --json jobs --jq '.jobs[] | select(.conclusion == "failure") | {name, conclusion}'
Report these as Category E (non-test failures) in the final output. Common cases:
- `Danger`: PR convention issues (title format, description, labels). Show the latest PR comment with `gh pr view $PR --json comments --jq '.comments[-1].body'` (the `--jq` flag only works together with `--json`).
- `build`: compilation failure. The job log contains the error.
- `gitleaks`: secret detected in diff.
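Pulling `run_id` and `job_id` out of a check URL can be sketched with `sed`, assuming the URL shape quoted above. `gha_ids` is a hypothetical helper name.

```shell
# gha_ids: print "<run_id> <job_id>" for a GitHub Actions job URL.
# Prints nothing when the URL does not have the expected shape.
gha_ids() {
  printf '%s\n' "$1" |
    sed -n 's#^https://github\.com/[^/]*/[^/]*/actions/runs/\([0-9][0-9]*\)/job/\([0-9][0-9]*\).*#\1 \2#p'
}

# Usage:
#   set -- $(gha_ids "$url")    # $1 is the run_id, $2 is the job_id
```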
1e. Triage
- All passed + summary shows 0 failures: report success, stop.
- Some failed: proceed to Step 2 with failed checks. Report summary: "24 test failures across 14 runs out of 279K tests."
- All pending, none failed: report "CI still running, no failures yet. Re-run this command later or use `/loop 2m /fix-ci $PR` to monitor."
- Mix of pending + failed: proceed with failed checks immediately. Don't wait for the full run.
- Only non-Azure checks failed (e.g., Danger, build): report those directly, skip Steps 2-3.
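The triage rules above reduce to a decision on three counts. The sketch below uses a hypothetical `triage` helper and hypothetical action labels; it leaves out the cross-check against the test summary from Step 1b.

```shell
# triage: decide the next action from check counts.
#   $1 = failed Azure checks, $2 = failed non-Azure checks, $3 = pending checks
triage() (
  azure_failed=$1
  other_failed=$2
  pending=$3
  if [ "$azure_failed" -gt 0 ]; then
    echo "analyze-tests"       # go to Step 2 now, even if other checks are pending
  elif [ "$other_failed" -gt 0 ]; then
    echo "report-non-azure"    # report directly, skip Steps 2-3
  elif [ "$pending" -gt 0 ]; then
    echo "still-running"
  else
    echo "success"
  fi
)
```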
Step 2: Get failed test names (no auth, no log download)
Use the Azure Test Results microservice at vstmr.dev.azure.com. This is a different hostname from the build APIs and serves test result data publicly for public projects.
curl -s "https://vstmr.dev.azure.com/questdb/questdb/_apis/testresults/resultsbybuild?buildId={buildId}&publishContext=CI&outcomes=Failed&\$top=200&api-version=5.2-preview.1"
No authentication or special headers needed. Returns all failed test results (~1-5KB):
[
{
"automatedTestName": "test[/sql/sample_by_fill.test]",
"automatedTestStorage": "io.questdb.test.sqllogictest.SqlTest",
"outcome": "Failed",
"runId": 795795,
"durationInMs": 1067.0,
"id": 100001,
"testCaseTitle": "test[/sql/sample_by_fill.test]"
}
]
Key fields:
- `automatedTestName` / `testCaseTitle`: the test name
- `automatedTestStorage`: the test class (e.g., `io.questdb.test.sqllogictest.SqlTest`)
- `runId`: which CI job run this failure came from
- `outcome`: always "Failed" given the filter
Deduplicate by test name — the same test fails across multiple platforms (mac, windows, linux). Group by automatedTestName, collect runIds to know which platforms failed.
If this returns 0 results but Step 1b showed failures, fall back to Step 2b (log tail parsing). This can happen if the pipeline doesn't publish JUnit test results.
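The deduplication by test name described above can be sketched with `jq`, assuming a standalone `jq` binary is available and the response is the bare JSON array shown earlier. `group_failures` is a hypothetical helper name.

```shell
# group_failures: read the resultsbybuild JSON array on stdin and print one
# line per distinct test: "<class>\t<test name>\t<comma-separated runIds>".
group_failures() {
  jq -r '
    group_by(.automatedTestName)[]
    | [ .[0].automatedTestStorage,
        .[0].automatedTestName,
        (map(.runId) | unique | map(tostring) | join(",")) ]
    | @tsv'
}

# Usage:
#   curl -s "<resultsbybuild URL>" | group_failures
```

An empty output here, combined with a non-zero failure count from Step 1b, is the signal to fall back to log tail parsing.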
2a (optional). Enrich with error messages via PAT
The vstmr endpoint returns test names but NOT errorMessage, stackTrace, or computerName. If AZURE_DEVOPS_PAT is set, enrich each failed test with full details.
Step 2 gives us runId per failed test. Use the authenticated test/runs/{runId}/results endpoint to get error details:
# For each unique runId from Step 2:
curl -s -u ":$AZURE_DEVOPS_PAT" \
-H "X-TFS-FedAuthRedirect: Suppress" \
"https://dev.azure.com/questdb/questdb/_apis/test/runs/{runId}/results?outcomes=Failed&api-version=7.0"
This returns full details per failed test:
- `errorMessage`: the assertion failure or exception message
- `stackTrace`: full Java stack trace
- `automatedTestName`, `automatedTestStorage`: test identity
- `failingSince`: when this test started failing
- `failureType`: type of failure
With this data, skip directly to Step 4 (classification).
Note: test/runs?buildId=... (listing runs by build) requires Build: Read scope and returns 403 with Test Management: Read alone. But test/runs/{runId}/results works with just Test Management: Read — and we already have runIds from the unauthenticated Step 2.
If no PAT is set, suggest the user create one for richer failure data:
To get error messages and stack traces without downloading logs, set `AZURE_DEVOPS_PAT`:
- Go to https://dev.azure.com/questdb/_usersSettings/tokens
- Click "New Token"
- Set scope: Test Management → Read (the only scope needed)
- Add to `~/.zshenv`: `export AZURE_DEVOPS_PAT=<token>`

Without it, I can still see which tests failed but need to download log tails for error details.
Only suggest this once per session, and only if log tail parsing is actually needed (i.e., the test names alone aren't enough for classification).
2b. Fallback: log tail parsing (no auth)
Use this when Step 2 returns 0 results or when error messages are needed and no PAT is available.
Parse Azure URLs
Extract org, project GUID, buildId, jobId from each failed check's URL:
https://dev.azure.com/{org}/{project_guid}/_build/results?buildId={buildId}&view=logs&jobId={jobId}
All failed checks share the same buildId. Deduplicate: fetch the timeline only once per buildId.
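Splitting a check URL into its parts can be sketched with `sed`, assuming the URL shape shown above. `azure_url_field` is a hypothetical helper name.

```shell
# azure_url_field: print one component of an Azure check URL.
#   $1 = URL, $2 = org | project | buildId | jobId
azure_url_field() {
  case "$2" in
    org)
      printf '%s\n' "$1" | sed -n 's#^https://dev\.azure\.com/\([^/]*\)/.*#\1#p' ;;
    project)
      printf '%s\n' "$1" | sed -n 's#^https://dev\.azure\.com/[^/]*/\([^/]*\)/.*#\1#p' ;;
    buildId|jobId)
      # Query parameter: take everything up to the next "&".
      printf '%s\n' "$1" | sed -n "s#.*[?&]$2=\\([^&]*\\).*#\\1#p" ;;
  esac
}

# Usage:
#   buildId=$(azure_url_field "$url" buildId)
```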
2c. Fetch timeline
curl -s "https://dev.azure.com/{org}/{project_guid}/_apis/build/builds/{buildId}/timeline?api-version=7.0"
No authentication needed (public project).
Parse the JSON response. Records form a tree: Stage -> Job -> Task. For each failed check's jobId:
- Find all records where `type == "Task"` AND `parentId == jobId` AND `result == "failed"`
- Extract `name` and `log.id` from each failed task
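The record filter above can be sketched with `jq`, assuming a standalone `jq` binary is available. `failed_tasks` is a hypothetical helper name.

```shell
# failed_tasks: read a build timeline JSON document on stdin and print
# "<log id>\t<task name>" for every failed Task directly under the given job.
#   $1 = jobId taken from the failed check's URL
failed_tasks() {
  jq -r --arg job "$1" '
    .records[]
    | select(.type == "Task" and .parentId == $job and .result == "failed")
    | [ ((.log.id // "") | tostring), .name ]
    | @tsv'
}

# Usage (fetch the timeline once per buildId, then filter per job):
#   curl -s "<timeline URL>" > /tmp/timeline.json
#   failed_tasks "$jobId" < /tmp/timeline.json
```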
2d. Classify failed steps
- "Run tests" or "Run tests with Coverage" — test failure, proceed to log analysis
- "Compile with Maven" — compilation error, report to user directly
- Other steps — infrastructure failure (checkout, install, upload), report as-is
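The step classification is a plain name match. A sketch, with `classify_step` and its output labels as hypothetical names:

```shell
# classify_step: map a failed timeline task name to a failure kind.
classify_step() {
  case "$1" in
    "Run tests"|"Run tests with Coverage") echo "test-failure" ;;       # go on to log analysis
    "Compile with Maven")                  echo "compilation-error" ;;  # report directly
    *)                                     echo "infrastructure" ;;     # checkout, install, upload, ...
  esac
}
```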
2e. Get log line counts
curl -s "https://dev.azure.com/{org}/{project_guid}/_apis/build/builds/{buildId}/logs?api-version=7.0"
Response: {"value": [{"id": N, "lineCount": M, ...}, ...]}. Extract lineCount for each failed step's logId.
Step 3: Fetch and parse error summaries
3a. Download tail of each failed test log
Maven Surefire writes the error summary at the END of its output. Use the line-range API to fetch only the last ~500 lines:
curl -s "https://dev.azure.com/{org}/{project_guid}/_apis/build/builds/{buildId}/logs/{logId}?api-version=7.0&startLine={lineCount - 500}&endLine={lineCount}"
This fetches ~50KB instead of ~200MB. Save to a temp file for parsing.
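For logs shorter than 500 lines, `lineCount - 500` goes negative, so the start of the range needs clamping. A sketch with a hypothetical `tail_range` helper; clamping to 1 assumes the API counts lines from 1.

```shell
# tail_range: print "<startLine> <endLine>" covering the last N lines of a log.
#   $1 = lineCount from the logs listing, $2 = tail size (default 500)
tail_range() (
  line_count=$1
  tail_size=${2:-500}
  start=$((line_count - tail_size))
  if [ "$start" -lt 1 ]; then
    start=1                    # short log: fetch it from the beginning
  fi
  echo "$start $line_count"
)

# Usage:
#   set -- $(tail_range "$lineCount")
#   curl -s "<log URL>?api-version=7.0&startLine=$1&endLine=$2" -o /tmp/ci-tail.txt
```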
3b. Parse the Surefire summary
Look for these patterns in the downloaded tail:
Test error summary (exceptions during test execution):
[ERROR] Errors:
[ERROR] ClassName.testMethod:lineNum->...chain... >> ExceptionType message
Test failure summary (assertion mismatches):
[ERROR] Failures:
[ERROR] ClassName.testMethod:lineNum expected:<X> but was:<Y>
Totals line:
[ERROR] Tests run: N, Failures: M, Errors: K, Skipped: L
Compilation error (different pattern entirely):
[ERROR] COMPILATION ERROR
[ERROR] /path/to/File.java:[line,col] error: ...
For each failed test, extract:
- Class name (e.g., `SqlParserTest`, `WindowFunctionTest`); the Surefire summary prints the simple name, not the fully-qualified one
- Test method name
- Error type: assertion mismatch vs exception
- Error message / exception chain
- Line number in test source (from the `:lineNum` in the chain)
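The extraction above can be sketched with `awk`, assuming the summary lines follow the `[ERROR]` patterns quoted earlier and may carry a timestamp prefix. `parse_surefire` is a hypothetical helper name.

```shell
# parse_surefire: read a log tail on stdin and print one line per failed test:
#   "<failure|error>\t<class>\t<method>\t<line>\t<message>"
parse_surefire() {
  awk '
    /\[ERROR\][ \t]+Failures:/  { kind = "failure"; next }
    /\[ERROR\][ \t]+Errors:/    { kind = "error";   next }
    /\[ERROR\][ \t]+Tests run:/ { kind = "";        next }   # totals line ends the summary
    kind != "" && /\[ERROR\]/ {
      line = $0
      sub(/^.*\[ERROR\][ \t]+/, "", line)          # drop any timestamp and the [ERROR] prefix
      if (line !~ /^[A-Za-z_][^. \t]*\.[^: \t]+:[0-9]+/) next
      id = line;  sub(/[ \t].*$/, "", id)          # ClassName.testMethod:lineNum[->chain]
      msg = substr(line, length(id) + 1); sub(/^[ \t]+/, "", msg)
      cls = id;   sub(/\..*$/, "", cls)
      rest = substr(id, length(cls) + 2)           # testMethod:lineNum[->chain]
      method = rest; sub(/:.*$/, "", method)
      num = substr(rest, length(method) + 2); sub(/[^0-9].*$/, "", num)
      printf "%s\t%s\t%s\t%s\t%s\n", kind, cls, method, num, msg
    }'
}

# Usage:
#   parse_surefire < /tmp/ci-tail.txt
```

Multi-line assertion diffs are not reassembled here; only the first line of each summary entry is captured.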
3c. Deduplicate across jobs
The same test might fail on multiple platforms (mac-griffin, windows-griffin, linux-griffin). Group failures by ClassName.testMethod — if the error message is the same across platforms, it's one logical failure. Note which platforms are affected.
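The cross-platform grouping can be sketched with `awk`. The input format here (one `platform<TAB>ClassName.testMethod<TAB>message` line per failure) is an assumption about how the parsed results were collected, and `group_by_test` is a hypothetical helper name.

```shell
# group_by_test: read "<platform>\t<Class.method>\t<message>" lines on stdin and
# print "<Class.method>\t<message>\t<comma-separated platforms>", one line per
# distinct (test, message) pair, in order of first appearance.
group_by_test() {
  awk -F '\t' '
    {
      key = $2 "\t" $3
      if (!(key in platforms)) {
        order[++n] = key
        platforms[key] = $1
      } else {
        have = "," platforms[key] ","
        want = "," $1 ","
        if (index(have, want) == 0) {
          platforms[key] = platforms[key] "," $1
        }
      }
    }
    END {
      for (i = 1; i <= n; i++) printf "%s\t%s\n", order[i], platforms[order[i]]
    }'
}
```

Keying on the message as well as the test name keeps a platform-specific error from being folded into an unrelated one.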
Step 4: Classify failures
4a. Get PR diff
gh pr diff $PR
Parse to understand:
- Which source files changed (production code vs test code)
- What functions/methods were modified
- What behavior changes the PR introduces
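A first pass over the diff can list the changed files and split them into production and test code. The sketch assumes a Maven-style layout where test sources live under `/src/test/`; `changed_files` is a hypothetical helper name. Deleted files appear as `+++ /dev/null` and are skipped.

```shell
# changed_files: read a unified diff on stdin and print
# "<test|production>\t<path>" for every file that exists after the change.
changed_files() {
  sed -n 's#^+++ b/##p' |
    awk '{ kind = ($0 ~ /\/src\/test\//) ? "test" : "production"; print kind "\t" $0 }'
}

# Usage:
#   gh pr diff "$PR" | changed_files
```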
4b. Cross-reference each failure group with the PR diff
For each failed test group, determine the category:
Category A — Auto-fixable (ALL must hold):
- The failure is an assertion mismatch (`Failures:` section, not `Errors:` section)
- The PR modifies production code in the area the test covers (same package, same class, related function)
- The PR already updated similar test assertions in the same or other test files (pattern exists in the diff)
- The fix is mechanical: swap the expected value to match actual output
- Guard-removal gate: if a test changed from "expected to fail" to "now succeeds" (e.g., error test that no longer errors), check the PR diff for removed guards (methods like `guardAgainst...`, `throw SqlException` blocks, early-return checks). If the PR removed a guard WITHOUT replacing the underlying functionality, the test likely exposes an unhandled code path, so escalate to Category B/D. Only auto-fix if the PR replaced the guarded code with a new implementation that handles the case.
The heuristic: "Did the PR replace the functionality, or just remove the gate?" Replacing → auto-fix. Removing gate without replacement → discuss.
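A rough way to surface candidate guard removals is to grep the diff for deleted lines that look like guards. The patterns below are illustrative assumptions taken from the examples above; early-return checks are left out because they have no reliable textual signature. `removed_guards` is a hypothetical helper name.

```shell
# removed_guards: read a unified diff on stdin and print deleted lines that
# look like guards. The "---" file headers are excluded by the first grep.
removed_guards() {
  grep -E '^-[^-]' | grep -E 'guardAgainst|throw[[:space:]]+SqlException'
}

# Usage:
#   gh pr diff "$PR" | removed_guards
```

A hit only says where to look. Whether the PR replaced the guarded functionality still needs a read of the surrounding added lines.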
Category B — Behavior precision:
- The failure connects to PR changes (related classes/packages)
- But the PR did NOT already update similar tests, so intent is unclear
- Or the failure is an exception (not just a different value) that might indicate the test expected the old behavior to continue
- Or a guard was removed and the test now succeeds where it previously failed — need to verify the old code path is actually handled
Category C — Potential regression:
- The failing test is in a package/class NOT touched by the PR
- No clear connection between the test's subject and the PR's changes
Category D — Potential incompleteness:
- The test covers an edge case (NULL input, empty table, boundary value, special characters)
- The PR introduced new logic but the test suggests it doesn't handle this case
- Often: test expected a query to succeed, but new code throws an exception for this input
Category E — Non-test failure:
- Compilation error
- Infrastructure issue (timeout, OOM, disk full, network error)
- Flaky test (known flaky pattern, random ordering issue)
Step 5: Act
For Category A (auto-fix):
1. Find the test source file:
   - Convert class name to path: `io.questdb.test.griffin.FooTest` -> search in `core/src/test/java/`
   - Use Glob: `**/FooTest.java`
2. Get the expected vs actual values:
   - If the Surefire summary contains the full assertion diff (common with `assertEquals`): use directly
   - If the summary only has an exception message (common with `assertQueryNoLeakCheck`): need to search the full log
   - To search: use line-range chunks. First, find the `<<< FAILURE!` line for this test method:

         # Search in chunks of 50K lines from the end, looking for the test method name + FAILURE
         curl -s "...logs/{logId}?startLine={lineCount-50000}&endLine={lineCount}" -o /tmp/ci-chunk.txt
         grep -n "testMethodName" /tmp/ci-chunk.txt | grep -i "FAILURE\|ERROR\|expected\|but had"

   - Read +-100 lines around the match to get the full assertion diff
3. Read the test method in the source file. Find the assertion call and its expected value.
4. Update the expected value to match the actual output. Use the Edit tool.
5. Report what was changed:

       Auto-fixed: FooTest#testBar
       - Updated expected output: [brief description of what changed]
       - Reason: PR changed [behavior X], test expected old output
       - Platforms affected: mac-griffin, windows-griffin, linux-griffin
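When the fully-qualified class name is known (for example from `automatedTestStorage`), the class-to-path step can be sketched as below. `class_to_path` is a hypothetical helper; the default source root comes from the example above, and nested-class suffixes are assumed to use the `$Inner` form.

```shell
# class_to_path: convert a fully-qualified class name to a source file path.
#   $1 = class name, $2 = source root (default core/src/test/java)
class_to_path() (
  fqcn=${1%%\$*}                       # drop any nested-class suffix ($Inner)
  root=${2:-core/src/test/java}
  printf '%s/%s.java\n' "$root" "$(printf '%s' "$fqcn" | tr . /)"
)
```

If the computed path does not exist, the Glob search by simple class name remains the fallback.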
For Categories B-E (discuss with user):
Present a structured report. Group by category, within each category group by similarity.
## CI Failures: PR #NNN — [PR title]
Analyzed N failed jobs across M platforms.
### Auto-fixed (if any)
- `FooTest#testBar`: updated expected output — [description]
### Needs Discussion
#### Potential Regression (Category C)
Tests in areas NOT touched by this PR:
- `BarTest#testQux`: NullPointerException at SomeClass.java:42
Platforms: linux-griffin, mac-griffin
[Stack trace summary]
#### Potential Incompleteness (Category D)
New logic may not handle these cases:
- `WindowFunctionTest#testWindowAsArg`: SqlException "Window function is not allowed in context of aggregation"
The PR added [feature X] but these tests show queries that combine window functions with aggregation.
Platforms: all
#### Behavior Precision (Category B)
Connected to PR changes but need review:
- `SqlParserTest#testWindowFuncOrder`: expected query model differs from actual
The PR changed [parser behavior X]; this test may need updating or may reveal unintended side effect.
#### Non-test Failures (Category E)
- Job `windows-cairo-2`: "Compile with Maven" step failed — compilation error in FooBar.java:123
After presenting, ask the user how to proceed with each group.
Azure API Reference
Test Results (vstmr.dev.azure.com) — no auth, no special headers
The test results microservice lives on a separate hostname. No authentication or special headers needed.
| Endpoint | Returns |
|---|---|
| `.../resultsbybuild?buildId={id}&publishContext=CI&outcomes=Failed&$top=200&api-version=5.2-preview.1` | Array of failed test results: `automatedTestName`, `automatedTestStorage`, `outcome`, `runId`, `durationInMs` |
| `.../resultdetailsbybuild?buildId={id}&publishContext=CI&groupBy=TestRun&$filter=Outcome eq Failed&shouldIncludeResults=true&queryRunSummaryForInProgress=false&api-version=5.2-preview.1` | Failed results grouped by test run, with counts per outcome |
Base: https://vstmr.dev.azure.com/questdb/questdb/_apis/testresults
Test Result Details (dev.azure.com) — PAT with Test Management: Read
| Endpoint | Returns |
|---|---|
| `https://dev.azure.com/questdb/questdb/_apis/test/runs/{runId}/results?outcomes=Failed&api-version=7.0` | Full details: `automatedTestName`, `errorMessage`, `stackTrace`, `failingSince`, `failureType` |
Auth: curl -u ":$AZURE_DEVOPS_PAT". Note: test/runs?buildId=... (listing runs) needs Build: Read scope, but test/runs/{runId}/results works with just Test Management: Read since we get runIds from the unauthenticated vstmr call.
Test Summary (dev.azure.com) — no auth
| Endpoint | Returns |
|---|---|
| `https://dev.azure.com/questdb/{project_guid}/_apis/test/resultsummarybybuild?buildId={id}&api-version=7.0-preview` | Aggregate counts: total, passed, failed, not executed |
Build APIs (dev.azure.com) — no auth
Base: https://dev.azure.com/questdb/{project_guid}/_apis/build
The project GUID is embedded in check URLs. Parse it from there rather than hardcoding.
| Endpoint | Returns |
|---|---|
| `/builds/{buildId}/timeline?api-version=7.0` | `{records: [{id, parentId, type, name, result, state, log: {id}, order}]}` |
| `/builds/{buildId}/logs?api-version=7.0` | `{value: [{id, lineCount, createdOn}]}` |
| `/builds/{buildId}/logs/{logId}?api-version=7.0&startLine=N&endLine=M` | Plain text, lines N through M |
Timeline record types
- `Stage`: pipeline stage (parent of Jobs)
- `Job`: a CI job (parent of Tasks), maps to a GitHub check
- `Task`: a step within a job, has `log.id` for log download
Status mapping
- `result="succeeded"` → success
- `result="failed"` → failure
- `result="skipped"` → skipped
- `result="canceled"` or `"cancelled"` → cancelled
- `state="completed"` with no result → success
- Otherwise → pending
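The mapping above can be sketched as a single `case` statement. `map_status` is a hypothetical helper name.

```shell
# map_status: map a timeline record's result and state to a check status.
#   $1 = result (may be empty), $2 = state
map_status() (
  result=$1
  state=$2
  case "$result" in
    succeeded)          echo success ;;
    failed)             echo failure ;;
    skipped)            echo skipped ;;
    canceled|cancelled) echo cancelled ;;
    *)
      if [ -z "$result" ] && [ "$state" = "completed" ]; then
        echo success     # completed with no result
      else
        echo pending
      fi ;;
  esac
)
```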