debugging-ci-failures


Debugs failing GitHub Actions CI runs for PostHog PRs, commits, and branches, and answers broad CI-health questions ("is CI red?", "is master green today?", "what's broken right now?"). Use when the user asks why CI is red, asks for the current CI or master status, or mentions a failing check, GitHub Actions run, Depot runner, workflow, job, shard, flaky test, lint failure, typecheck failure, snapshot diff, migration check, generated types drift, or skills build failure. Starts with the `hogli ci:insights` digest (aggregated cross-run CI intelligence), then guides read-only inspection, failure classification, the smallest local reproduction with hogli, and safe reporting without rerunning CI or posting to GitHub.

By PostHog. Updated 6/2/2026.


Debugging PostHog CI failures

Find the first meaningful failure, classify it, reproduce the smallest useful case locally when appropriate, and report the result. Avoid public-visible or irreversible actions unless the user explicitly asks.

Always start with the hogli ci:insights digest. It is the institutional, cross-run source of truth, and it is fresher and more reliable than scraping gh run list / the GitHub Actions API, which can lag or rate-limit.

This skill triages and classifies. Once a failure is confirmed flaky, hand off to the fixing-flaky-tests skill, which owns local reproduction, root-cause fixing, and N-run validation.

Safety rules

Do not do any of these without explicit approval in the current conversation:

  • Rerun or cancel a GitHub Actions run.
  • Post a GitHub comment, PR review, or issue comment through any CLI, MCP, or API tool.
  • Push commits, force-push, rename branches, or delete branches.
  • Edit .github/workflows/ files (CI infra changes need human review).
  • Merge, close, or reopen the PR.
  • Accept or update snapshots.

Read-only gh calls and read-only GitHub tools are fine. If you need to change local Git state, make sure it is necessary for the task and does not overwrite unrelated work.
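
The approval rules above can be sketched as a small pre-flight check. This is a minimal illustration, not part of the skill: the `needs_approval` helper and its command list are hypothetical, and the list is deliberately partial (it does not cover branch renames or deletes, workflow file edits, snapshot updates, or write calls made through `gh api` or MCP tools).

```python
# Hypothetical approval gate. The prefixes below are an assumption drawn from
# the safety rules above, not an exhaustive or official policy.
NEEDS_APPROVAL = {
    ("gh", "run", "rerun"),
    ("gh", "run", "cancel"),
    ("gh", "pr", "comment"),
    ("gh", "pr", "review"),
    ("gh", "pr", "merge"),
    ("gh", "pr", "close"),
    ("gh", "pr", "reopen"),
    ("gh", "issue", "comment"),
    ("git", "push"),
}


def needs_approval(argv: list[str]) -> bool:
    """Return True when argv starts with a command the safety rules gate."""
    return any(tuple(argv[: len(prefix)]) == prefix for prefix in NEEDS_APPROVAL)
```

A read-only call such as `["gh", "run", "view", "<run-id>", "--log-failed"]` passes through, while `["gh", "run", "rerun", "<run-id>"]` is flagged for explicit approval.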

Workflow

1. Start with CI insights (always first)

hogli ci:insights is the institutional CI-intelligence backend. It aggregates cross-run history — recurring flakes, occurrence counts, confidence, and any proposed or merged fix — that a single run can't show. Consult it before any raw gh log archaeology; the raw GitHub Actions API can lag or rate-limit, while the insights digest is the freshest aggregated view.

hogli ci:insights                                # digest for the current repo + branch
hogli ci:insights search "<error or test name>"  # match a specific failure
hogli ci:insights view <id>                      # one insight + its remediation actions
hogli ci:insights plan <id>                      # print the recommended fix plan (does not apply it)

  • Broad question ("is CI red?", "is master green today?", "what's broken right now?"): the no-arg digest answers directly: it lists open / in-progress / resolved counts and the most recent insights with severity and confidence. You often do not need a target PR or run at all; report from the digest.
  • Specific failure: run search "<error>" to match it before reading logs. When a matching insight exists, weigh its confidence and occurrence history, and note whether a fix is already merged (the failure may already be resolved on master) or proposed (a plan you can adapt).

hogli ci:insights prints a setup hint if the backend isn't installed or authenticated — if so, fall back to the gh-based inspection below. Surface what you find per the Safety rules — do not auto-apply a fix.
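
As a rough sketch of the "digest first, search for specifics" rule, the helper below only builds the argument list for the first command to run. The `insights_argv` name is hypothetical; the subcommands are the ones listed above. It does not attempt to detect the setup hint, because the hint's wording is not specified here.

```python
from typing import Optional


def insights_argv(error_text: Optional[str] = None) -> list[str]:
    """Build the first command: the digest for broad questions, search for a specific failure."""
    base = ["hogli", "ci:insights"]
    if error_text is None or not error_text.strip():
        # Broad CI-health question: the no-arg digest answers directly.
        return base
    # Specific failure: match it against known insights before reading logs.
    return base + ["search", error_text.strip()]
```

Passing the list to `subprocess.run` avoids shell-quoting problems when the error text contains spaces or quotes.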

2. Find the failing run (for a specific failure)

Determine the target in this order:

  1. If the user gave a PR number, run ID, check name, or branch, use it.
  2. Otherwise, infer from the current branch with gh pr view --json number,headRefName,statusCheckRollup.
  3. If neither works, ask the user for a PR URL or run ID. Do not guess.

Inspect read-only:

gh pr checks <pr>
gh pr view <pr> --json statusCheckRollup
gh run view <run-id> --json jobs,conclusion,name,workflowName,url
gh run view <run-id> --log-failed

Use the full job log only when --log-failed lacks the failing command or enough surrounding output:

gh run view <run-id> --log --job <job-id>
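
To pick out the failing checks from `gh pr view <pr> --json statusCheckRollup`, a short filter can help. This is a sketch under assumptions: the field names (`conclusion`, `state`, `name`, `context`) and the set of failing outcomes reflect the shape `gh` commonly returns, so verify them against real output before relying on this. `failing_checks` is a hypothetical helper.

```python
import json

# Assumed set of outcomes that count as a failed check.
FAILING = {"FAILURE", "TIMED_OUT", "CANCELLED", "ERROR"}


def failing_checks(pr_view_json: str) -> list[str]:
    """Return the names of failed checks from `gh pr view --json statusCheckRollup` output."""
    payload = json.loads(pr_view_json)
    names = []
    for check in payload.get("statusCheckRollup") or []:
        # Assumption: check runs carry "conclusion", commit statuses carry "state".
        outcome = check.get("conclusion") or check.get("state") or ""
        if outcome.upper() in FAILING:
            names.append(check.get("name") or check.get("context") or "<unnamed>")
    return names
```

Checks that are still running have no conclusion yet, so they are skipped and never reported as failures.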

Extract these before classifying:

  • Workflow name or file, e.g. .github/workflows/ci-backend.yml.
  • Job name, e.g. backend-tests (4/10).
  • Step name, e.g. Run pytest.
  • Failing command and the smallest useful output excerpt.

When scanning logs, search for FAIL, Error, error:, assert, Traceback, exit code, and ##[error]. Stop at the first failing step that explains the run's conclusion. Keep excerpts under 40 lines.
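
The scanning rule above can be approximated in a few lines. This sketch treats the first marker hit as the first meaningful failure, which is a simplification: a human still has to confirm that the hit explains the run's conclusion. The helper name and the `before` context size are assumptions; the 40-line cap follows the report template's "up to 40 lines".

```python
import re

# Markers listed in the text above, matched case-sensitively as written.
FAILURE_MARKERS = re.compile(r"FAIL|Error|error:|assert|Traceback|exit code|##\[error\]")


def first_failure_excerpt(log_text: str, max_lines: int = 40, before: int = 5) -> list[str]:
    """Return at most max_lines of log, starting a few lines before the first marker hit."""
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        if FAILURE_MARKERS.search(line):
            start = max(0, index - before)
            return lines[start : start + max_lines]
    # No marker found: return nothing rather than guessing at an excerpt.
    return []
```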

Classification

| Signal in the log | Class | First action |
| --- | --- | --- |
| AssertionError, test diff, FAILED test_... in a committed test file | code regression | reproduce with hogli test <path>::<test> |
| Test failed here, passed on master or on rerun in the same PR | flaky test | confirm against master history; to fix, use fixing-flaky-tests |
| ruff, oxlint, stylelint, markdownlint, prettier errors | lint | hogli lint:python:fix or hogli format on touched files |
| mypy, pyright, tsc, typescript:check errors | typecheck | run the same checker locally, not the full suite |
| Chromatic / Storybook / Playwright visual diff, snapshot mismatch | snapshot / visual | surface the diff URL; do NOT auto-accept snapshots |
| manage.py migrate error, migrations:check failure, missing migration | migration / schema | hogli migrations:check locally |
| OpenAPI schema diff, generated API types out of sync | codegen drift | hogli build:openapi |
| Cannot connect, ECONNREFUSED, OOM, runner killed, setup step timeout | infra / runner | treat as transient; report, do not fix |
| apt-get, uv sync, pnpm install, docker pull, setup action failures | environment / setup | diff .nvmrc, pyproject.toml, package.json, Dockerfiles |
| hogli lint:skills, hogli build:skills failure | skills build | run the same hogli command locally |
| SDK compat check, ci-survey-sdk-check, cross-version failure | SDK compatibility | check SDK version matrix for the affected package |

If multiple signals match, choose the most specific class. For example, prefer codegen drift over lint, migration over typecheck, and snapshot / visual over a generic Playwright test failure.
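
One way to encode "most specific class wins" is an ordered rule list where the first match is returned. The sketch below is illustrative only: the patterns are assumptions distilled from the signals above, the ordering beyond the three stated preferences is a guess, and the flaky test class is left out because it depends on run history rather than on a single log excerpt.

```python
import re

# Ordered from most to least specific, so the first match wins.
# Patterns are illustrative assumptions, not PostHog's actual matching rules.
RULES = [
    ("codegen drift", r"OpenAPI schema diff|generated API types"),
    ("migration / schema", r"manage\.py migrate|migrations:check|missing migration"),
    ("snapshot / visual", r"Chromatic|Storybook|visual diff|snapshot mismatch"),
    ("skills build", r"hogli (lint|build):skills"),
    ("SDK compatibility", r"SDK compat|ci-survey-sdk-check"),
    ("infra / runner", r"Cannot connect|ECONNREFUSED|\bOOM\b|runner killed"),
    ("environment / setup", r"apt-get|uv sync|pnpm install|docker pull"),
    ("typecheck", r"\b(mypy|pyright|tsc)\b|typescript:check"),
    ("lint", r"\b(ruff|oxlint|stylelint|markdownlint|prettier)\b"),
    ("code regression", r"AssertionError|FAILED test_"),
]


def classify(excerpt: str) -> str:
    """Return the first (most specific) class whose pattern appears in the excerpt."""
    for label, pattern in RULES:
        if re.search(pattern, excerpt):
            return label
    return "unclassified"
```

Because codegen drift sits above lint and migration sits above typecheck, an excerpt that mentions both resolves to the more specific class, matching the preference stated above.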

Local reproduction

Run only the narrowest command that exercises the failure. If the command shape is unclear, read .agents/skills/hogli/SKILL.md and hogli <command> --help.

| Class | Repro guidance |
| --- | --- |
| code regression | hogli test path/to/test.py::TestClass::test_method or hogli test <file.test.ts> |
| flaky test | Hand off to the fixing-flaky-tests skill. |
| lint | Use the failing formatter/linter on touched files, e.g. hogli format:python. |
| typecheck | Run the failing checker, e.g. pnpm --filter=@posthog/frontend typescript:check. |
| snapshot / visual | Run the specific Playwright or Storybook workflow; read playwright-test if needed. |
| migration / schema | hogli migrations:check; run migrations only if the user agrees. |
| codegen drift | hogli build:openapi. |
| infra / runner | No local repro. Report and stop. |
| environment / setup | Reproduce the setup step only if cheap and relevant to changed files. |
| skills build | hogli lint:skills; if that passes, hogli build:skills. |

Do NOT run hogli test with no arguments. Do NOT run hogli nuke or hogli dev:reset as a shortcut. Do NOT bypass hooks with --no-verify.
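
The three prohibitions above are easy to check mechanically before running a local command. A minimal sketch, assuming commands are handled as argument lists and that anything after `hogli test` not starting with `-` is a test target; `repro_violation` is a hypothetical helper.

```python
from typing import Optional

# Commands the text above rules out as shortcuts.
FORBIDDEN_SHORTCUTS = {("hogli", "nuke"), ("hogli", "dev:reset")}


def repro_violation(argv: list[str]) -> Optional[str]:
    """Return a reason string when argv breaks one of the rules above, else None."""
    if "--no-verify" in argv:
        return "bypasses hooks with --no-verify"
    if tuple(argv[:2]) in FORBIDDEN_SHORTCUTS:
        return f"{' '.join(argv[:2])} is not an acceptable shortcut"
    if argv[:2] == ["hogli", "test"]:
        # Assumption: positional arguments (not flags) name the test target.
        targets = [arg for arg in argv[2:] if not arg.startswith("-")]
        if not targets:
            return "hogli test needs a specific path or test id"
    return None
```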

PostHog CI notes

  • Most PostHog jobs run on depot-ubuntu-latest or depot-ubuntu-latest-16. Depot runs surface logs through the GitHub Actions UI / gh run view just like standard GitHub-hosted runners. There is no separate Depot console that agents can query in this environment.
  • If a job fails before Checkout completes (no app code ran), classify as infra / runner. Do not propose code fixes.
  • PostHog CI frequently parallelizes the same test class across N shards (backend-tests (3/10) style). Reproduce from the specific failing test path, not the shard index.
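
To illustrate the sharding note, the sketch below separates the shard suffix from a job name and builds repro commands from failing test ids instead. It assumes the `name (i/N)` job-name style shown above and pytest-style `FAILED <path>::<test>` lines in the log; both helpers are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Shard(NamedTuple):
    job: str
    index: int
    total: int


# Matches job names like "backend-tests (3/10)".
SHARD_PATTERN = re.compile(r"^(?P<job>.+?)\s*\((?P<index>\d+)/(?P<total>\d+)\)$")
# Assumption: failures appear as pytest-style "FAILED path::test" entries.
FAILED_LINE = re.compile(r"FAILED (\S+::\S+)")


def parse_shard(job_name: str) -> Optional[Shard]:
    """Split a sharded job name into base job, shard index, and shard count."""
    match = SHARD_PATTERN.match(job_name.strip())
    if match is None:
        return None
    return Shard(match["job"], int(match["index"]), int(match["total"]))


def repro_commands(log_text: str) -> list[str]:
    """Build one `hogli test` command per distinct failing test id, ignoring the shard."""
    seen: list[str] = []
    for node_id in FAILED_LINE.findall(log_text):
        if node_id not in seen:
            seen.append(node_id)
    return [f"hogli test {node_id}" for node_id in seen]
```

The shard index is useful for finding the right job log, but the repro command depends only on the test id, so it stays valid if the sharding layout changes.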

Report shape

Keep the response short. Include one likely-cause sentence and avoid deeper speculation.

Target: PR #<num> - run <run-id> (<workflow file>)
Failing job:   <job name>
Failing step:  <step name>
Command:       <failing command>
Excerpt:
  <up to 40 lines, trimmed around the failure>

Classification: <class from the table>
Shadow run:     <yes | no>
Likely cause:   <one sentence>
Local repro:    <exact command, or "none">
Next action (needs your approval):
  - <push fix | rerun job | update snapshot | none>

If the classification is infra / runner or a shadow run, say so and stop; do not propose a code change.
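
The report template above can be rendered from structured fields, which makes the 40-line excerpt cap and the fixed field order hard to get wrong. A minimal sketch: the `CIReport` dataclass and its field names are hypothetical and simply mirror the template.

```python
from dataclasses import dataclass


@dataclass
class CIReport:
    pr: int
    run_id: int
    workflow_file: str
    job: str
    step: str
    command: str
    excerpt: list[str]
    classification: str
    shadow_run: bool
    likely_cause: str
    local_repro: str = "none"
    next_action: str = "none"

    def render(self) -> str:
        """Render the report in the shape shown above, capping the excerpt at 40 lines."""
        excerpt = "\n".join(f"  {line}" for line in self.excerpt[:40])
        return "\n".join([
            f"Target: PR #{self.pr} - run {self.run_id} ({self.workflow_file})",
            f"Failing job:   {self.job}",
            f"Failing step:  {self.step}",
            f"Command:       {self.command}",
            "Excerpt:",
            excerpt,
            "",
            f"Classification: {self.classification}",
            f"Shadow run:     {'yes' if self.shadow_run else 'no'}",
            f"Likely cause:   {self.likely_cause}",
            f"Local repro:    {self.local_repro}",
            "Next action (needs your approval):",
            f"  - {self.next_action}",
        ])
```

Defaulting `local_repro` and `next_action` to "none" matches the infra / runner case, where the report stops without proposing a change.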

Install via CLI
npx skills add https://github.com/PostHog/posthog --skill debugging-ci-failures