name: retry_flaky_tests
description: Automatically retry failed required CI checks on GitHub PRs, with smart waiting and filtering logic
arguments:
  - name: pr_number
    description: PR number to check. If not provided, will detect from current branch
    required: false
Retry Flaky Tests Skill
Automatically handles flaky CI tests by retrying only the failed required checks on a GitHub PR. Saves time by removing the need to monitor runs manually and re-trigger failed jobs one at a time.
Usage
/retry-flaky-tests [--pr_number 123] [--wait_for_completion true] [--max_retries 3] [--poll_interval 30]
Key Features
- Smart Filtering: Only retries required checks (ignores optional ones)
- Wait Logic: Can wait for in-progress runs to complete before retrying
- Batch Processing: Handles multiple failed checks in one command
- Status Awareness: Distinguishes between failed, in-progress, cancelled states
- Retry Limiting: Prevents infinite retry loops with max attempts
- Real-time Feedback: Shows progress and results throughout process
Process
Step 1: Identify Target PR
If --pr_number provided:
gh pr view {pr_number} --json number,headRefName,baseRefName
If not provided, detect from current branch:
# Get current branch
current_branch=$(git branch --show-current)
# Find PR for this branch
gh pr list --head "$current_branch" --json number,title,headRefName
Validation:
- Ensure PR exists and is open
- Confirm we have appropriate permissions
- Extract PR number, head branch, base branch for later use
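The detection and validation steps above could be combined roughly as follows. This is a minimal sketch, not a definitive implementation: `resolve_pr_number` is a hypothetical helper name, and it assumes `gh` is authenticated for the repository.

```bash
# Sketch: resolve the target PR number (explicit argument wins, else current branch).
# resolve_pr_number is a hypothetical helper name, not part of gh.
resolve_pr_number() {
  local pr_number="${1:-}" branch state
  if [ -z "$pr_number" ]; then
    branch=$(git branch --show-current)
    # '// empty' prints nothing when no open PR matches the branch
    pr_number=$(gh pr list --head "$branch" --state open \
      --json number --jq '.[0].number // empty')
  fi
  if [ -z "$pr_number" ]; then
    echo "No open PR found for the current branch" >&2
    return 1
  fi
  # Validation: make sure the PR is still open before doing any work
  state=$(gh pr view "$pr_number" --json state --jq '.state')
  if [ "$state" != "OPEN" ]; then
    echo "PR #$pr_number is not open (state: $state)" >&2
    return 1
  fi
  echo "$pr_number"
}
```

Calling `pr_number=$(resolve_pr_number "$1") || exit 1` at the top of a script would stop early when no usable PR exists.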
Step 2: Get Current Check Status
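# Get all check runs for the PR (gh pr checks reports state and bucket rather than status and conclusion)
gh pr checks {pr_number} --json name,state,bucket,link,workflow
# Get workflow runs with status and conclusion (gh run list has no --pr flag; filter by the PR's head branch)
gh run list --branch {head_branch} --json databaseId,name,status,conclusion,workflowName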
Parse results to categorize checks:
- ✅ Passed: status: "completed", conclusion: "success"
- ❌ Failed: status: "completed", conclusion: "failure"
- ⏸️ In Progress: status: "in_progress" or status: "queued"
- ⏭️ Skipped/Cancelled: conclusion: "skipped" or conclusion: "cancelled"
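The mapping above can be expressed as a small pure function. This is an illustrative sketch: `categorize_check` and its category labels are hypothetical names, and the inputs are assumed to be the `status` and `conclusion` values of a workflow run.

```bash
# Sketch: map a run's status/conclusion pair to one of the categories above.
# categorize_check and the printed labels are hypothetical names.
categorize_check() {
  local status="$1" conclusion="$2"
  case "$status" in
    queued|in_progress) echo "in_progress"; return 0 ;;
  esac
  case "$conclusion" in
    success)           echo "passed" ;;
    failure)           echo "failed" ;;
    skipped|cancelled) echo "skipped_or_cancelled" ;;
    *)                 echo "other" ;;  # e.g. neutral or timed_out
  esac
}
```

For example, `categorize_check completed failure` prints `failed`, which is the only category the retry step acts on.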
Step 3: Identify Required vs Optional Checks
Get branch protection rules:
# Get protection rules for base branch (usually main/master)
gh api repos/{owner}/{repo}/branches/{base_branch}/protection \
--jq '.required_status_checks.contexts[]'
Cross-reference with current checks:
- Match check names from Step 2 with required contexts from protection rules
- Build list of required checks that failed
- Exclude optional checks from retry logic
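One way to do the cross-reference is a whole-line, literal match between the two name lists. The sketch below assumes both lists are newline-separated strings and that check names match the required contexts exactly; `required_failed_checks` is a hypothetical helper name.

```bash
# Sketch: print the failed checks that also appear in the required contexts.
# $1: newline-separated required contexts, $2: newline-separated failed check names.
required_failed_checks() {
  local required="$1" failed="$2" name
  printf '%s\n' "$failed" | while IFS= read -r name; do
    [ -n "$name" ] || continue
    # -F literal match, -x whole line, -q quiet
    if printf '%s\n' "$required" | grep -Fxq -- "$name"; then
      echo "$name"
    fi
  done
}
```

Exact matching is deliberately strict here: a check that reports under a different name than its required context would be treated as optional, so the names may need normalizing in practice.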
Fallback if no protection rules:
# If branch protection API fails, use heuristics:
# - Checks with "required" in description
# - Common required check patterns: "build", "test", "lint", "security"
# - Exclude obvious optional ones: "codecov", "sonar", "deploy-preview"
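The name-based part of this fallback might look like the sketch below. The patterns are the ones listed above, applied case-insensitively to the check name; `looks_required` is a hypothetical helper, and the result is only a guess that can misclassify checks in either direction.

```bash
# Sketch: heuristic guess at whether a check is required, based on its name only.
# Returns 0 (true) for likely-required names, 1 otherwise.
looks_required() {
  local name
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    # Exclusions are tested first so "codecov/test" is not treated as required
    *codecov*|*sonar*|*deploy-preview*) return 1 ;;
    *build*|*test*|*lint*|*security*)   return 0 ;;
    *)                                  return 1 ;;
  esac
}
```

A caller could then filter with `if looks_required "$check_name"; then ...; fi` whenever the protection rules cannot be read.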
Step 4: Wait Logic (if enabled)
If --wait_for_completion true:
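# Count checks that are still queued or running
in_progress_count=$(gh pr checks {pr_number} --json bucket --jq '[.[] | select(.bucket == "pending")] | length')
while [[ $in_progress_count -gt 0 ]]; do
  echo "⏳ Waiting for $in_progress_count checks to complete..."
  sleep {poll_interval}
  # Re-check status and update in_progress_count
  in_progress_count=$(gh pr checks {pr_number} --json bucket --jq '[.[] | select(.bucket == "pending")] | length')
done
echo "✅ All checks completed. Proceeding with retry logic..."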
Progress indicators:
- Show which checks are still running
- Display estimated time remaining (if available from API)
- Allow user to interrupt waiting with Ctrl+C
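A polling loop that also names the checks still running could be sketched as follows. `wait_for_checks` is a hypothetical helper; it assumes the `bucket` field of `gh pr checks --json` reports `pending` for queued or running checks. No estimated time remaining is shown, and Ctrl+C interrupts the loop through the shell's default signal handling.

```bash
# Sketch: poll until no check is pending, listing the ones still running.
# $1: PR number, $2: poll interval in seconds (30 is an assumed default).
wait_for_checks() {
  local pr_number="$1" poll_interval="${2:-30}" pending
  while :; do
    # gh pr checks can exit non-zero while checks are pending, hence '|| true'.
    # Caveat: a genuine gh error also yields an empty list and ends the wait.
    pending=$(gh pr checks "$pr_number" --json name,bucket \
      --jq '.[] | select(.bucket == "pending") | .name' || true)
    [ -n "$pending" ] || break
    echo "⏳ Still running:"
    printf '%s\n' "$pending" | sed 's/^/  - /'
    sleep "$poll_interval"
  done
  echo "✅ All checks completed."
}
```

A production version would probably add an overall timeout so a stuck runner cannot block the skill indefinitely.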
Step 5: Smart Retry Logic
For each failed required check:
# Get the most recent run ID for this workflow on the PR's head branch
run_id=$(gh run list --branch {head_branch} --workflow "{workflow_name}" \
--limit 1 --json databaseId --jq '.[0].databaseId')
# Retry only the failed jobs (not successful ones)
gh run rerun "$run_id" --failed
echo "🔄 Retried {workflow_name} (run #$run_id)"
Retry strategy:
- Only retry checks that both:
  - Are required for merge
  - Have conclusion: "failure" (not cancelled/skipped)
- Track retry attempts to respect --max_retries
- Wait between retries to avoid rate limiting
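Retry limiting could lean on the attempt counter that GitHub keeps for each run instead of local state. The sketch below is one possible approach, not the skill's confirmed behavior: `retry_run_if_allowed` is a hypothetical helper, and it assumes every earlier rerun of the run (including manual ones) counts toward the limit.

```bash
# Sketch: rerun the failed jobs of a run unless the retry budget is spent.
# $1: run ID, $2: max retries (3 is an assumed default).
retry_run_if_allowed() {
  local run_id="$1" max_retries="${2:-3}" attempt retries_done
  # 'attempt' is 1 for the original run and increases with each rerun
  attempt=$(gh run view "$run_id" --json attempt --jq '.attempt')
  retries_done=$(( ${attempt:-1} - 1 ))
  if [ "$retries_done" -ge "$max_retries" ]; then
    echo "⛔ Run $run_id was already retried $retries_done time(s); not retrying again."
    return 1
  fi
  gh run rerun "$run_id" --failed || return 1
  echo "🔄 Retried run $run_id (retry $((retries_done + 1)) of $max_retries)"
}
```

Because the counter lives on GitHub, the limit survives across separate invocations of the skill, at the cost of one extra API call per run.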
Step 6: Post-Retry Monitoring
After triggering retries:
echo "🚀 Triggered retries for $retry_count required checks:"
for check in "${retried_checks[@]}"; do
echo " - $check"
done
echo ""
echo "📊 Current status summary:"
echo " ✅ Passed: $passed_count"
echo " 🔄 Retrying: $retry_count"
echo " ⏸️ In Progress: $in_progress_count"
echo " ❌ Still Failed: $still_failed_count"
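The counters used in the summary have to be computed somewhere. One hedged way is to tally the `bucket` values reported by `gh pr checks`, as sketched below; `summarize_buckets` is a hypothetical helper, and the bucket names `pass`, `fail`, and `pending` are assumed from the `gh pr checks --json bucket` output.

```bash
# Sketch: read one bucket value per line on stdin and print a status summary.
summarize_buckets() {
  local passed=0 failed=0 pending=0 other=0 bucket
  while IFS= read -r bucket; do
    case "$bucket" in
      "")      ;;                            # ignore blank lines
      pass)    passed=$((passed + 1)) ;;
      fail)    failed=$((failed + 1)) ;;
      pending) pending=$((pending + 1)) ;;
      *)       other=$((other + 1)) ;;       # skipped or cancelled checks
    esac
  done
  echo "📊 Current status summary:"
  echo "  ✅ Passed: $passed"
  echo "  ⏸️ In Progress: $pending"
  echo "  ❌ Failed: $failed"
  echo "  ⏭️ Skipped/Cancelled: $other"
}
```

It would typically be fed with `gh pr checks "$pr_number" --json bucket --jq '.[].bucket' | summarize_buckets`, where `$pr_number` is the PR resolved in Step 1.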
This skill transforms flaky CI from a time-sink into an automated background task! 🚀