name: compare-test-runs
description: Compares two test runs to identify new failures, newly flaky tests, fixed tests, and duration regressions. Can be invoked with test run IDs, dashboard URLs, or branch names.
Compare Test Runs
Quick Start
You'll typically receive two test run identifiers. Follow these steps:
- Run `tuist test show <id> --json` for both base and head test runs.
- Run `tuist test module list <test-run-id> --json` and `tuist test suite list <test-run-id> --json` to get module and suite breakdowns.
- Run `tuist test case run list <identifier> --json` to get individual test case results.
- Compare failures, flaky tests, durations, and overall status.
- Inspect failing test cases with `tuist test case run show <id> --json`.
- Summarize findings with actionable recommendations.
Step 1: Resolve Test Runs
If base/head are test run IDs or dashboard URLs
Fetch each directly:
tuist test show <base-id> --json
tuist test show <head-id> --json
If base/head are branch names
List recent test runs on each branch to identify test run IDs:
tuist test list --git-branch <base-branch> --json --page-size 5
tuist test list --git-branch <head-branch> --json --page-size 5
Pick the latest test run ID from each branch's results.
Defaults
- If no base is provided, use the project's default branch (usually `main`).
- If no head is provided, detect the current git branch.
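The defaults above can be sketched as a small helper. This is a minimal illustration, not part of the tuist CLI: `resolve_refs` and `current_git_branch` are hypothetical names, the default branch is assumed to be `main`, and the current branch is read with `git rev-parse --abbrev-ref HEAD`.

```python
import subprocess


def current_git_branch(run=subprocess.run):
    """Return the checked-out branch name via `git rev-parse --abbrev-ref HEAD`."""
    result = run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def resolve_refs(base=None, head=None, default_branch="main", run=subprocess.run):
    """Fill in a missing base or head (hypothetical helper).

    base falls back to the assumed default branch; head falls back to the
    branch currently checked out in the working directory.
    """
    resolved_base = base or default_branch
    resolved_head = head or current_git_branch(run)
    return resolved_base, resolved_head
```

On a detached HEAD, `git rev-parse --abbrev-ref HEAD` prints `HEAD` rather than a branch name, so in that case it is safer to ask for an explicit head.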
Step 2: Compare Top-Level Metrics
After fetching both test runs, compare:
| Metric | What to check |
|---|---|
| `status` | Flag if base passed but head failed |
| `duration` | Flag if head is >10% slower |
| `total_test_count` | Note if test count changed (new or removed tests) |
| `failed_test_count` | Compare failure counts |
| `flaky_test_count` | Compare flaky counts |
| `avg_test_duration` | Flag significant changes |
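The checks in the table can be sketched as a comparison over the two parsed JSON objects. This is a minimal illustration under assumptions: the field names come from the table, the status values `success` and `failure` are assumed from the example summary later in this document, and `compare_top_level` and `percent_change` are hypothetical helper names.

```python
def percent_change(base_value, head_value):
    """Relative change from base to head in percent, or None if it cannot be computed."""
    if not base_value or head_value is None:
        return None
    return (head_value - base_value) / base_value * 100.0


def compare_top_level(base, head, slow_threshold_pct=10.0):
    """Return human-readable flags for the top-level metrics of two test runs."""
    flags = []

    # Assumed status values; adjust if the real JSON uses different strings.
    if base.get("status") == "success" and head.get("status") == "failure":
        flags.append("status: base passed but head failed")

    # Flag only when head is more than the threshold slower than base.
    duration_change = percent_change(base.get("duration"), head.get("duration"))
    if duration_change is not None and duration_change > slow_threshold_pct:
        flags.append(f"duration: head is {duration_change:.1f}% slower")

    # Report any change in the count fields, with a signed delta.
    for field in ("total_test_count", "failed_test_count", "flaky_test_count"):
        base_count = base.get(field, 0)
        head_count = head.get(field, 0)
        delta = head_count - base_count
        if delta != 0:
            flags.append(f"{field}: {base_count} -> {head_count} ({delta:+})")

    return flags
```

The 10% threshold is kept as a parameter so it can be loosened for runs with noisy durations.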
Step 3: Get Module and Suite Breakdowns
Fetch module and suite-level results for both test runs to understand which areas regressed:
tuist test module list <base-test-run-id> --json
tuist test module list <head-test-run-id> --json
tuist test suite list <base-test-run-id> --json
tuist test suite list <head-test-run-id> --json
Match modules and suites by name across both runs to identify areas with new failures or duration regressions.
Step 4: Get Individual Test Case Results
Fetch test case runs for both test runs, running the command once per run:
tuist test case run list <identifier> --json --page-size 100
Match test cases by their name + module_name + suite_name across both runs.
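One way to do the matching is to index each run by a composite key and compare the key sets. A minimal sketch, assuming each test case run is a dictionary with `name`, `module_name`, and `suite_name` fields; `case_key` and `match_cases` are hypothetical helper names.

```python
def case_key(case):
    """Identity of a test case across runs: module, suite, and test name."""
    return (case["module_name"], case["suite_name"], case["name"])


def match_cases(base_cases, head_cases):
    """Pair test case runs from two test runs by their composite key.

    If a key appears more than once in one run, the last entry wins.
    """
    base_index = {case_key(c): c for c in base_cases}
    head_index = {case_key(c): c for c in head_cases}

    shared = base_index.keys() & head_index.keys()
    matched = {k: (base_index[k], head_index[k]) for k in shared}
    only_in_base = {k: base_index[k] for k in base_index.keys() - head_index.keys()}
    only_in_head = {k: head_index[k] for k in head_index.keys() - base_index.keys()}
    return matched, only_in_base, only_in_head
```

Keying on all three fields avoids false matches when two suites contain a test with the same name.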
Step 5: Classify Changes
Group test cases into categories:
- New failures: Tests that passed in base but failed in head.
- Fixed tests: Tests that failed in base but passed in head.
- Newly flaky: Tests not flaky in base but flaky in head.
- No longer flaky: Tests that were flaky in base but stable in head.
- New tests: Tests present in head but not in base.
- Removed tests: Tests present in base but not in head.
- Duration regressions: Tests with >50% duration increase.
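The categories above can be sketched as a single pass over the matched test cases. This is an illustration under assumptions, not the actual tuist schema: each test case run is assumed to carry `status` (`success` or `failure`), a boolean `is_flaky`, and a numeric `duration`, and `classify` is a hypothetical helper name.

```python
def classify(base_cases, head_cases, duration_threshold_pct=50.0):
    """Group test cases into the change categories used in the comparison."""

    def key(case):
        return (case["module_name"], case["suite_name"], case["name"])

    base = {key(c): c for c in base_cases}
    head = {key(c): c for c in head_cases}

    result = {
        "new_failures": [], "fixed_tests": [], "newly_flaky": [],
        "no_longer_flaky": [], "new_tests": [], "removed_tests": [],
        "duration_regressions": [],
    }
    result["new_tests"] = sorted(head.keys() - base.keys())
    result["removed_tests"] = sorted(base.keys() - head.keys())

    for k in sorted(base.keys() & head.keys()):
        b, h = base[k], head[k]

        # Pass/fail transitions (assumed status strings).
        if b.get("status") == "success" and h.get("status") == "failure":
            result["new_failures"].append(k)
        elif b.get("status") == "failure" and h.get("status") == "success":
            result["fixed_tests"].append(k)

        # Flakiness transitions (assumed boolean field).
        b_flaky, h_flaky = bool(b.get("is_flaky")), bool(h.get("is_flaky"))
        if h_flaky and not b_flaky:
            result["newly_flaky"].append(k)
        elif b_flaky and not h_flaky:
            result["no_longer_flaky"].append(k)

        # Duration regression: more than the threshold increase over base.
        if b.get("duration") and h.get("duration") is not None:
            increase = (h["duration"] - b["duration"]) / b["duration"] * 100.0
            if increase > duration_threshold_pct:
                result["duration_regressions"].append(k)

    return result
```

A test can land in more than one category, for example both newly flaky and a duration regression.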
Step 6: Inspect Failures
For each new failure, get detailed information:
tuist test case run show <test-case-run-id> --json
Key fields to examine:
- `failures[].message` -- the assertion or error message
- `failures[].path` -- source file path
- `failures[].line_number` -- exact line of failure
- `failures[].issue_type` -- type of issue
- `repetitions` -- if present, shows retry behavior (flaky detection)
- `crash_report` -- crash data if test runner crashed
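These fields can be turned into summary lines with a short formatter. A minimal sketch, assuming the parsed output is a dictionary whose `failures` entry is a list of objects with the fields listed above; `format_failures` is a hypothetical helper name and the exact JSON shape is not confirmed here.

```python
def format_failures(case_run):
    """Render failure details of one test case run as summary lines."""
    lines = []
    for failure in case_run.get("failures") or []:
        message = failure.get("message", "<no message>")
        path = failure.get("path")
        line_number = failure.get("line_number")

        # Prefer "path:line" when both are known, otherwise fall back gracefully.
        if path and line_number is not None:
            location = f"{path}:{line_number}"
        else:
            location = path or "<unknown location>"

        lines.append(f'Message: "{message}"')
        lines.append(f"File: {location}")
        if failure.get("issue_type"):
            lines.append(f"Type: {failure['issue_type']}")

    # Optional fields: only mention them when present.
    if case_run.get("repetitions"):
        lines.append("Repetitions present: check retry behavior for flakiness")
    if case_run.get("crash_report"):
        lines.append("Crash report present: the test runner may have crashed")
    return lines
```

The `Message` and `File` lines use the same layout as the example summary in this document, so they can be pasted into the report directly.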
Step 7: Inspect Attachments
The tuist test case run show output includes attachment and crash report information. Review:
- Screenshots or UI test artifacts
- Log files or crash reports
- Any diagnostic data attached to failing runs
Summary Format
Produce a summary with:
- Overall verdict: Better, worse, or neutral compared to base.
- New failures: List each with failure message, file path, and line number.
- New flaky tests: List with flakiness context.
- Fixed tests: List tests that are now passing.
- Duration: Overall and notable per-test regressions.
- Recommendations: Actionable next steps for each issue.
Example:
Test Run Comparison: base (run-123 on main) vs head (run-456 on feature-x)
Status: success -> failure -- REGRESSION
Duration: 120.5s -> 145.2s (+20%)
Tests: 342 -> 345 (3 new tests)
Failures: 0 -> 2 (2 new failures)
Flaky: 1 -> 3 (2 newly flaky)
New Failures:
1. AuthModuleTests/LoginTests/test_login_with_expired_token
Message: "Expected status 401, got 500"
File: Tests/AuthModuleTests/LoginTests.swift:42
Likely cause: Server error handling changed for expired tokens
2. NetworkTests/RetryTests/test_retry_on_timeout
Message: "Timed out waiting for retry"
File: Tests/NetworkTests/RetryTests.swift:87
Likely cause: Timeout threshold too low after network layer refactor
Newly Flaky:
1. CacheTests/WriteCacheTests/test_concurrent_writes (flaky in 3/5 runs)
Recommendations:
- Fix expired token handling in AuthModule
- Increase timeout in RetryTests or mock the network layer
- Investigate concurrent write synchronization in CacheTests
Done Checklist
- Resolved both base and head test runs
- Compared top-level metrics
- Fetched module and suite breakdowns for both runs
- Identified new failures, fixed tests, and flaky changes
- Inspected failure details for new failures
- Provided actionable recommendations with file paths