name: debugging-ci-failures
description: >
Debugs failing GitHub Actions CI runs for PostHog PRs, commits, and branches,
and answers broad CI-health questions ("is CI red?", "is master green today?",
"what's broken right now?"). Use when the user asks why CI is red, asks for the
current CI or master status, or mentions a failing check, GitHub Actions run,
Depot runner, workflow, job, shard, flaky test, lint failure, typecheck
failure, snapshot diff, migration check, generated types drift, or skills
build failure. Start with the hogli ci:insights digest (aggregated cross-run
CI intelligence), then guides read-only inspection, failure classification,
smallest local reproduction with hogli, and safe reporting without rerunning CI
or posting to GitHub.
Debugging PostHog CI failures
Find the first meaningful failure, classify it, reproduce the smallest useful case locally when appropriate, and report the result. Avoid public-visible or irreversible actions unless the user explicitly asks.
Always start with the hogli ci:insights digest. It is the institutional,
cross-run source of truth, and it is fresher and more reliable than scraping
gh run list / the GitHub Actions API, which can lag or rate-limit.
This skill triages and classifies. Once a failure is confirmed flaky, hand off
to the fixing-flaky-tests skill, which owns local reproduction, root-cause
fixing, and N-run validation.
Safety rules
Do not do any of these without explicit approval in the current conversation:
- Rerun or cancel a GitHub Actions run.
- Post a GitHub comment, PR review, or issue comment through any CLI, MCP, or API tool.
- Push commits, force-push, rename branches, or delete branches.
- Edit
.github/workflows/files (CI infra changes need human review). - Merge, close, or reopen the PR.
- Accept or update snapshots.
Read-only gh calls and read-only GitHub tools are fine. If you need to
change local Git state, make sure it is necessary for the task and does not
overwrite unrelated work.
Workflow
1. Start with CI insights (always first)
hogli ci:insights is the institutional CI-intelligence backend. It aggregates
cross-run history — recurring flakes, occurrence counts, confidence, and any
proposed or merged fix — that a single run can't show. Consult it before any raw
gh log archaeology; the raw GitHub Actions API can lag or rate-limit, while the
insights digest is the freshest aggregated view.
hogli ci:insights # digest for the current repo + branch
hogli ci:insights search "<error or test name>" # match a specific failure
hogli ci:insights view <id> # one insight + its remediation actions
hogli ci:insights plan <id> # print the recommended fix plan (does not apply it)
- Broad question ("is CI red?", "is master green today?", "what's broken right now?"): the no-arg digest answers directly — it lists open / in-progress / resolved counts and the most recent insights with severity and confidence. You often do not need a target PR or run at all; report from the digest.
- Specific failure: run
search "<error>"to match it before reading logs. When a matching insight exists, weigh its confidence and occurrence history, and note whether a fix is already merged (the failure may already be resolved onmaster) or proposed (a plan you can adapt).
hogli ci:insights prints a setup hint if the backend isn't installed or
authenticated — if so, fall back to the gh-based inspection below. Surface what
you find per the Safety rules — do not auto-apply a fix.
2. Find the failing run (for a specific failure)
Determine the target in this order:
- If the user gave a PR number, run ID, check name, or branch, use it.
- Otherwise, infer from the current branch with
gh pr view --json number,headRefName,statusCheckRollup. - If neither works, ask the user for a PR URL or run ID. Do not guess.
Inspect read-only:
gh pr checks <pr>
gh pr view <pr> --json statusCheckRollup
gh run view <run-id> --json jobs,conclusion,name,workflowName,url
gh run view <run-id> --log-failed
Use the full job log only when --log-failed lacks the failing command or
enough surrounding output:
gh run view <run-id> --log --job <job-id>
Extract these before classifying:
- Workflow name or file, e.g.
.github/workflows/ci-backend.yml. - Job name, e.g.
backend-tests (4/10). - Step name, e.g.
Run pytest. - Failing command and the smallest useful output excerpt.
When scanning logs, search for FAIL, Error, error:, assert,
Traceback, exit code, and ##[error]. Stop at the first failing step that
explains the run's conclusion. Keep excerpts under 40 lines.
Classification
| Signal in the log | Class | First action |
|---|---|---|
AssertionError, test diff, FAILED test_... in a committed test file |
code regression | reproduce with hogli test <path>::<test> |
Test failed here, passed on master or on rerun in the same PR |
flaky test | confirm against master history; to fix, use fixing-flaky-tests |
ruff, oxlint, stylelint, markdownlint, prettier errors |
lint | hogli lint:python:fix or hogli format on touched files |
mypy, pyright, tsc, typescript:check errors |
typecheck | run the same checker locally, not the full suite |
| Chromatic / Storybook / Playwright visual diff, snapshot mismatch | snapshot / visual | surface the diff URL; do NOT auto-accept snapshots |
manage.py migrate error, migrations:check failure, missing migration |
migration / schema | hogli migrations:check locally |
| OpenAPI schema diff, generated API types out of sync | codegen drift | hogli build:openapi |
Cannot connect, ECONNREFUSED, OOM, runner killed, setup step timeout |
infra / runner | treat as transient; report, do not fix |
apt-get, uv sync, pnpm install, docker pull, setup action failures |
environment / setup | diff .nvmrc, pyproject.toml, package.json, Dockerfiles |
hogli lint:skills, hogli build:skills failure |
skills build | run the same hogli command locally |
SDK compat check, ci-survey-sdk-check, cross-version failure |
SDK compatibility | check SDK version matrix for the affected package |
If multiple signals match, choose the most specific class. For example, prefer codegen drift over lint, migration over typecheck, and snapshot / visual over a generic Playwright test failure.
Local reproduction
Run only the narrowest command that exercises the failure. If the command shape
is unclear, read .agents/skills/hogli/SKILL.md and hogli <command> --help.
| Class | Repro guidance |
|---|---|
| code regression | hogli test path/to/test.py::TestClass::test_method or hogli test <file.test.ts> |
| flaky test | Hand off to the fixing-flaky-tests skill. |
| lint | Use the failing formatter/linter on touched files, e.g. hogli format:python. |
| typecheck | Run the failing checker, e.g. pnpm --filter=@posthog/frontend typescript:check. |
| snapshot / visual | Run the specific Playwright or Storybook workflow; read playwright-test if needed. |
| migration / schema | hogli migrations:check; run migrations only if the user agrees. |
| codegen drift | hogli build:openapi. |
| infra / runner | No local repro. Report and stop. |
| environment / setup | Reproduce the setup step only if cheap and relevant to changed files. |
| skills build | hogli lint:skills; if that passes, hogli build:skills. |
Do NOT run hogli test with no arguments. Do NOT run hogli nuke or
hogli dev:reset as a shortcut. Do NOT bypass hooks with --no-verify.
PostHog CI notes
- Most PostHog jobs run on
depot-ubuntu-latestordepot-ubuntu-latest-16. Depot runs surface logs through the GitHub Actions UI /gh run viewjust like standard GitHub-hosted runners. There is no separate Depot console that agents can query in this environment. - If a job fails before
Checkoutcompletes (no app code ran), classify asinfra / runner. Do not propose code fixes. - PostHog CI frequently parallelizes the same test class across N shards
(
backend-tests (3/10)style). Reproduce from the specific failing test path, not the shard index.
Report shape
Keep the response short. Include one likely-cause sentence and avoid deeper speculation.
Target: PR #<num> - run <run-id> (<workflow file>)
Failing job: <job name>
Failing step: <step name>
Command: <failing command>
Excerpt:
<up to 40 lines, trimmed around the failure>
Classification: <class from the table>
Shadow run: <yes | no>
Likely cause: <one sentence>
Local repro: <exact command, or "none">
Next action (needs your approval):
- <push fix | rerun job | update snapshot | none>
If the classification is infra / runner or a shadow run, say so and stop;
do not propose a code change.