dag-reasoning - SKILL.md Agent Skill

name: dag-reasoning
description: >
  Reason about causal structure using directed acyclic graphs for epidemiological studies. Use when identifying confounders, mediators, or colliders; when deciding which variables to adjust for; or when a reviewer questions the adjustment strategy.

DAG-Based Causal Reasoning

Variable classification rules

Before adjusting for any variable, classify its role:

Role	Definition	Adjust?	Consequence of error
Confounder	Common cause of exposure and outcome	Yes	Omitting it leaves confounding bias
Mediator	On causal pathway from exposure to outcome	No (unless decomposing)	Adjusting removes the mediated part of the effect of interest
Collider	Common effect of exposure and outcome	No	Adjusting opens a spurious path
Instrument	Affects exposure only, not outcome directly	No (use for IV methods)	Adjusting reduces precision and can amplify bias from unmeasured confounding
Precision variable	Predicts outcome, unrelated to exposure	Optional	Improves precision if included
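The table can be turned into a rough mechanical check once the DAG is written down as an edge list. The sketch below is a simplification under stated assumptions: the function names, the example variables (Age, Biomarker, Hospitalized, Genotype, Sex), and the rule ordering are hypothetical, and the "instrument" branch only checks directed paths, so it does not rule out an unmeasured common cause shared with the outcome.

```python
from collections import defaultdict

def ancestors(edges, node, removed=frozenset()):
    """Return every node with a directed path into `node`, ignoring `removed` nodes."""
    parents = defaultdict(set)
    for src, dst in edges:
        if src not in removed and dst not in removed:
            parents[dst].add(src)
    found, stack = set(), [node]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def classify(edges, exposure, outcome, var):
    """Classify `var` relative to the exposure -> outcome effect (simplified rules)."""
    causes_x = var in ancestors(edges, exposure)
    # Does var reach the outcome along a directed path that avoids the exposure?
    causes_y_not_via_x = var in ancestors(edges, outcome, removed={exposure})
    # Does the exposure reach var along a directed path that avoids the outcome?
    caused_by_x = exposure in ancestors(edges, var, removed={outcome})
    caused_by_y = outcome in ancestors(edges, var)
    if caused_by_x and caused_by_y:
        return "collider"
    if caused_by_x and var in ancestors(edges, outcome):
        return "mediator"
    if causes_x and causes_y_not_via_x:
        return "confounder"
    if causes_x:
        return "instrument"
    if causes_y_not_via_x:
        return "precision variable"
    return "unclassified"

# Hypothetical example DAG, one edge per (cause, effect) pair.
edges = [
    ("Age", "Exposure"), ("Age", "Outcome"),
    ("Exposure", "Biomarker"), ("Biomarker", "Outcome"),
    ("Exposure", "Outcome"),
    ("Exposure", "Hospitalized"), ("Outcome", "Hospitalized"),
    ("Genotype", "Exposure"),
    ("Sex", "Outcome"),
]
roles = {v: classify(edges, "Exposure", "Outcome", v)
         for v in ("Age", "Biomarker", "Hospitalized", "Genotype", "Sex")}
print(roles)
```

The classification is only as good as the drawn DAG: a missing arrow silently changes a variable's role.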

The backdoor criterion

To estimate the causal effect of X on Y:

List all paths from X to Y
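Identify which paths are causal (every arrow points away from X toward Y) and which are backdoor paths (non-causal paths that begin with an arrow into X)
Find a set of variables that blocks all backdoor paths, contains no descendants of X, and does not open new non-causal paths by conditioning on a collider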
This set is a valid adjustment set

A path is blocked if it passes through:

A non-collider on the path that you condition on
A collider on the path, provided you condition on neither it nor any of its descendants
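The procedure and blocking rules above can be sketched as a brute-force check for small DAGs. This is a minimal illustration, not a replacement for dedicated tools: the function names and the example graph (C, U1, U2, M) are assumptions, path enumeration is exponential in graph size, and the sketch treats a collider as unblocked when one of its descendants is conditioned on, which is the standard d-separation refinement.

```python
def descendants(edges, node):
    """Return every node reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for src, dst in edges:
            if src == current and dst not in found:
                found.add(dst)
                stack.append(dst)
    return found

def simple_paths(edges, start, end, path=None):
    """Yield every simple path from start to end, ignoring arrow direction."""
    path = path or [start]
    if path[-1] == end:
        yield path
        return
    for src, dst in edges:
        if src == path[-1] and dst not in path:
            yield from simple_paths(edges, start, end, path + [dst])
        elif dst == path[-1] and src not in path:
            yield from simple_paths(edges, start, end, path + [src])

def is_blocked(edges, path, adjusted):
    """Apply the blocking rules to each interior node of the path."""
    edge_set = set(edges)
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, node) in edge_set and (nxt, node) in edge_set
        if is_collider:
            # A collider blocks unless it, or one of its descendants, is adjusted for.
            if node not in adjusted and not (descendants(edges, node) & adjusted):
                return True
        elif node in adjusted:
            return True
    return False

def satisfies_backdoor(edges, exposure, outcome, adjusted):
    """True if `adjusted` blocks every backdoor path and has no descendant of the exposure."""
    adjusted = set(adjusted)
    if adjusted & descendants(edges, exposure):
        return False
    edge_set = set(edges)
    return all(
        is_blocked(edges, path, adjusted)
        for path in simple_paths(edges, exposure, outcome)
        if (path[1], exposure) in edge_set  # backdoor path: begins with an arrow into the exposure
    )

# Hypothetical DAG: C confounds X and Y; M is a common effect of U1 and U2.
edges = [("C", "X"), ("C", "Y"), ("X", "Y"),
         ("U1", "X"), ("U1", "M"), ("U2", "M"), ("U2", "Y")]
for candidate in (set(), {"C"}, {"C", "M"}, {"C", "M", "U1"}):
    print(sorted(candidate), satisfies_backdoor(edges, "X", "Y", candidate))
```

In this example {C} is valid, adding M breaks it (the collider path opens), and adding U1 as well repairs it.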

Drawing DAGs

Use ASCII art for simple DAGs:

  Confounder
   /      \
  v        v
Exposure --> Outcome

Exposure --> Mediator --> Outcome

Exposure --> Collider <-- Outcome
(DO NOT condition on Collider)

Guidelines:

Include all variables in the analysis plus important omitted ones
Every missing arrow is an assumption (no direct effect)
Time flows left to right where possible
Mark adjusted variables with brackets: [Variable]
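If the DAG is kept as an edge list, the bracket convention can be applied automatically when printing it. A small sketch, where `render_dag` is a hypothetical helper name and the one-edge-per-line layout is an assumption rather than a required format:

```python
def render_dag(edges, adjusted=()):
    """Return one 'A --> B' line per edge, wrapping adjusted variables in brackets."""
    def label(name):
        return f"[{name}]" if name in adjusted else name
    return "\n".join(f"{label(src)} --> {label(dst)}" for src, dst in edges)

edges = [("Confounder", "Exposure"), ("Confounder", "Outcome"), ("Exposure", "Outcome")]
text = render_dag(edges, adjusted={"Confounder"})
print(text)
```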

Common pitfalls

Table 2 fallacy

Interpreting coefficients of adjustment variables as if they were causal effects of those variables. Each variable in a model has a different confounding structure; the adjustment set valid for the primary exposure is generally not valid for other variables.
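A small simulation can make the fallacy concrete. The sketch below assumes a hypothetical linear DAG (C -> X, X -> Y with effect 1.0, C -> Y with direct effect 0.5) and uses only the standard library; the helper names are illustrative. In the model Y ~ X + C, the coefficient of X estimates its total effect, but the coefficient of C estimates only its direct effect, because X is a mediator for C.

```python
import random

rng = random.Random(0)
n = 20_000
c = [rng.gauss(0, 1) for _ in range(n)]                          # confounder
x = [ci + rng.gauss(0, 1) for ci in c]                           # exposure: C -> X (1.0)
y = [xi + 0.5 * ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]    # X -> Y (1.0), C -> Y (0.5)

def slope(pred, resp):
    """Least-squares slope of resp on pred."""
    mp, mr = sum(pred) / len(pred), sum(resp) / len(resp)
    sxy = sum((p - mp) * (r - mr) for p, r in zip(pred, resp))
    sxx = sum((p - mp) ** 2 for p in pred)
    return sxy / sxx

def residuals(resp, pred):
    """Residuals of resp after a simple regression on pred."""
    b = slope(pred, resp)
    mp, mr = sum(pred) / len(pred), sum(resp) / len(resp)
    return [(r - mr) - b * (p - mp) for p, r in zip(pred, resp)]

def adjusted_coef(target, other, resp):
    """Coefficient of target in resp ~ target + other (Frisch-Waugh-Lovell)."""
    return slope(residuals(target, other), residuals(resp, other))

coef_x = adjusted_coef(x, c, y)   # exposure coefficient, adjusted for C
coef_c = adjusted_coef(c, x, y)   # "Table 2" coefficient of C, adjusted for X
total_c = slope(c, y)             # total effect of C (C has no confounders in this DAG)
print(round(coef_x, 2), round(coef_c, 2), round(total_c, 2))
```

Under these assumed parameters the three estimates should land near 1.0, 0.5, and 1.5: reading the adjusted coefficient of C as "the effect of C" understates its total effect by the part that runs through X.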

Overadjustment

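Conditioning on a variable affected by the exposure or the outcome: a mediator (which removes part of the effect of interest) or a descendant of the outcome (which can induce bias). Example: adjusting for a symptom score measured after the outcome has occurred.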

M-bias

Conditioning on a variable that is a common effect of two unobserved causes (one of the exposure, one of the outcome). This opens a path that was previously blocked.
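The M-structure can be simulated with binary variables to see the induced bias. This is a sketch under assumed, hypothetical probabilities; the true effect of X on Y is set to zero, so any nonzero risk difference is bias.

```python
import random

rng = random.Random(0)

def bern(p):
    """Draw 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0

rows = []
for _ in range(100_000):
    u1, u2 = bern(0.5), bern(0.5)          # unobserved causes
    x = bern(0.2 + 0.6 * u1)               # exposure depends on U1 only
    m = bern(0.1 + 0.4 * u1 + 0.4 * u2)    # M is a common effect of U1 and U2
    y = bern(0.2 + 0.6 * u2)               # outcome depends on U2 only: true effect of X is 0
    rows.append((x, m, y))

def risk_difference(data):
    """P(Y=1 | X=1) - P(Y=1 | X=0) in the given rows."""
    exposed = [y for x, _, y in data if x == 1]
    unexposed = [y for x, _, y in data if x == 0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)

crude = risk_difference(rows)                               # M left alone: path stays blocked
within_m1 = risk_difference([r for r in rows if r[1] == 1]) # conditioning on M = 1
within_m0 = risk_difference([r for r in rows if r[1] == 0]) # conditioning on M = 0
print(round(crude, 3), round(within_m1, 3), round(within_m0, 3))
```

With these assumed probabilities the crude risk difference should sit near zero, while both M strata show a negative difference of roughly -0.06, even though M precedes nothing on the causal path.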

Adjusting for a proxy

Using a proxy for a confounder may not fully block the backdoor path. Residual confounding grows as the proxy measures the confounder less accurately, approaching the unadjusted bias for a very poor proxy.
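How much confounding a proxy leaves behind can be worked out exactly in a simple linear setting. The sketch below assumes a hypothetical model with unit variances: C -> X and C -> Y with coefficient 1, no effect of X on Y, and a proxy P = C + noise. It computes the population coefficient of X in Y ~ X + P from the covariances, so any nonzero value is residual confounding.

```python
def residual_bias(noise_sd):
    """Population coefficient of X in Y ~ X + P when the true effect of X is zero."""
    var_x = 2.0                      # Var(X) = Var(C) + Var(e_x) = 1 + 1
    var_p = 1.0 + noise_sd ** 2      # Var(P) = Var(C) + proxy noise variance
    cov_xy = cov_xp = cov_yp = 1.0   # every covariance runs through C, which has variance 1
    numerator = cov_xy * var_p - cov_yp * cov_xp
    denominator = var_x * var_p - cov_xp ** 2
    return numerator / denominator

unadjusted_bias = 1.0 / 2.0          # Cov(X, Y) / Var(X) with no adjustment at all
for sd in (0.0, 0.5, 1.0, 2.0, 10.0):
    print(f"proxy noise sd={sd}: bias={residual_bias(sd):.3f}")
```

In this setting the bias equals s^2 / (1 + 2 s^2) for noise standard deviation s: zero for a perfect proxy, rising with measurement error, and levelling off at the unadjusted bias of 0.5 rather than growing without bound.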

Time-varying confounding

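Arises when a confounder of later exposure is itself affected by earlier exposure. Standard regression adjustment cannot handle this: adjusting for the confounder blocks or distorts part of the earlier exposure's effect, while not adjusting leaves the later exposure confounded. Use g-methods instead, such as marginal structural models or g-estimation.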
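A two-period simulation illustrates why neither ignoring nor stratifying on the time-varying confounder works, and how inverse probability weighting (the usual way to fit a marginal structural model) recovers the effect. Everything here is a hypothetical setup: A0 is randomised, L1 depends on A0 and an unmeasured U, A1 depends on L1, and each exposure adds 1 to Y, so the "always exposed vs never exposed" effect is 2.

```python
import random

rng = random.Random(1)

def bern(p):
    """Draw 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0

rows = []
for _ in range(80_000):
    u = bern(0.5)                          # unmeasured cause of L1 and Y
    a0 = bern(0.5)                         # first exposure, randomised
    l1 = bern(0.2 + 0.3 * a0 + 0.4 * u)    # confounder affected by prior exposure A0
    a1 = bern(0.2 + 0.6 * l1)              # second exposure depends on L1
    y = a0 + a1 + 2 * u + rng.gauss(0, 1)  # true effect of (1,1) vs (0,0) is 2
    rows.append((a0, l1, a1, y))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def cell_mean(a0, a1, l1=None):
    """Mean outcome among rows with the given exposures (optionally within an L1 stratum)."""
    return mean(y for r_a0, r_l1, r_a1, y in rows
                if r_a0 == a0 and r_a1 == a1 and (l1 is None or r_l1 == l1))

# 1. No adjustment: A1 is confounded through A1 <- L1 <- U -> Y.
naive = cell_mean(1, 1) - cell_mean(0, 0)

# 2. Stratify on L1, then average over its distribution: conditioning on L1
#    (a collider on A0 -> L1 <- U) opens a non-causal path from A0 to Y.
p_l1 = mean(r[1] for r in rows)
stratified = sum(w * (cell_mean(1, 1, l1) - cell_mean(0, 0, l1))
                 for l1, w in ((1, p_l1), (0, 1 - p_l1)))

# 3. Inverse probability weighting: weight by 1 / P(A1 = observed | A0, L1),
#    with the probabilities estimated from cell frequencies.
p_a1 = {(a0, l1): mean(r[2] for r in rows if r[0] == a0 and r[1] == l1)
        for a0 in (0, 1) for l1 in (0, 1)}

def ipw_mean(a0, a1):
    num = den = 0.0
    for r_a0, l1, r_a1, y in rows:
        if r_a0 == a0 and r_a1 == a1:
            p = p_a1[(a0, l1)]
            w = 1.0 / (p if a1 == 1 else 1.0 - p)
            num += w * y
            den += w
    return num / den

ipw = ipw_mean(1, 1) - ipw_mean(0, 0)
print(round(naive, 2), round(stratified, 2), round(ipw, 2))
```

Under these assumed parameters the large-sample values work out to about 2.41 (unadjusted), 1.73 (stratified on L1), and 2.00 (weighted), so the two conventional analyses err in opposite directions.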

Hierarchical/nested structures

When units are nested (e.g., models within method classes, patients within hospitals):

Effects can operate at different levels
Adjusting for group-level variables (or group fixed effects) when the exposure varies only at the group level can absorb the exposure signal
Consider whether random effects are appropriate vs fixed effects
A variable that is constant within clusters cannot confound within-cluster comparisons
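The last point follows directly from within-cluster centering, which is what a fixed-effects comparison does implicitly. A tiny sketch with made-up data (the hospital labels, `teaching`, and `dose` are hypothetical): subtracting cluster means wipes out any cluster-constant variable, while a unit-level variable keeps its within-cluster variation.

```python
from collections import defaultdict

def center_within_clusters(values, clusters):
    """Subtract each cluster's mean from its members' values."""
    totals, counts = defaultdict(float), defaultdict(int)
    for value, cluster in zip(values, clusters):
        totals[cluster] += value
        counts[cluster] += 1
    return [value - totals[cluster] / counts[cluster]
            for value, cluster in zip(values, clusters)]

hospital = ["A", "A", "A", "B", "B", "B"]
teaching = [1, 1, 1, 0, 0, 0]                 # hospital-level: constant within cluster
dose = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]         # patient-level exposure

teaching_within = center_within_clusters(teaching, hospital)
dose_within = center_within_clusters(dose, hospital)
print(teaching_within)
print(dose_within)
```

The same arithmetic explains the warning about group-level exposures: if the exposure itself is constant within clusters, centering removes it too and nothing is left to estimate.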

Applying DAGs to forecast evaluation

Example for a study of method effects on forecast accuracy:

                Trend
                 |
                 v
Method -----> Accuracy <----- Horizon
  ^              ^               |
  |              |               |
  +-- Location --+               |
  |              |               |
  +---- Time ----+---------------+

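Method -> Accuracy: The effect of interest
Location, Time, Trend, Horizon -> Accuracy: Determinants of prediction difficulty
Location, Time -> Method: Present if some methods are used in specific locations/periods; this makes Location and Time confounders (adjust)
Trend, Horizon: No arrow into Method, so they act as precision variables rather than confounders (adjusting is optional but improves precision)
Model: Nested within Method; random effect absorbs model-specific skill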
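The arrows listed above can be encoded as an edge list to read off an adjustment set. The sketch relies on a standard result: when all parents of the exposure are observed, adjusting for them satisfies the backdoor criterion. It encodes only the arrows described in the text; the connector between Time and Horizon in the diagram is not described, so it is left out here as an assumption.

```python
# Edges as (cause, effect), taken from the arrow descriptions in the text.
edges = [
    ("Method", "Accuracy"),
    ("Location", "Method"), ("Location", "Accuracy"),
    ("Time", "Method"), ("Time", "Accuracy"),
    ("Trend", "Accuracy"),
    ("Horizon", "Accuracy"),
]

def parents(node):
    """Direct causes of `node` in the edge list."""
    return {src for src, dst in edges if dst == node}

# Parents of the exposure block every backdoor path out of Method.
adjustment_set = parents("Method")
# Remaining causes of Accuracy do not confound Method; they only explain outcome variance.
precision_only = parents("Accuracy") - adjustment_set - {"Method"}

print("adjust for:", sorted(adjustment_set))
print("optional (precision):", sorted(precision_only))
```

If methods were in fact chosen based on horizon or trend, the corresponding arrow into Method would need to be added and the variable would move into the adjustment set.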

Key question: Is "number of forecasts submitted" a confounder, mediator, or collider?

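If the exposure leads teams to submit more forecasts (exposure -> submissions) and the added experience improves accuracy (submissions -> outcome): mediator. Do not adjust when estimating the total effect.
If the exposure affects participation and better accuracy also leads to continued participation (exposure -> submissions <- outcome): collider on that path. Adjusting for it, or restricting to frequent submitters, may induce bias.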