aer-identification - SKILL.md Agent Skill

name: aer-identification description: Use when selecting, implementing, or stress-testing the causal identification strategy for an empirical economics manuscript — difference-in-differences (including staggered designs), instrumental variables (including weak-IV-robust inference), regression discontinuity, synthetic control, or shift-share / Bartik. Apply before writing the introduction or results.

AER Identification

Overview

In modern AER-track empirical economics, identification is the paper. A weak design cannot be rescued by clever writing, more controls, or a larger sample. This skill walks through the five canonical design-based strategies, the modern defaults that have replaced naive textbook implementations, and the referee-anticipating tests each demands.

If the identification strategy is fragile, return to aer-topic-selection. There is no point polishing an indefensible empirical strategy.

When to Use

Designing the empirical strategy for a new project
The current strategy is TWFE / first-stage F / naive RDD and the referee will flag it
A prior submission was rejected on identification grounds and the design needs rebuilding
Choosing between two candidate identification strategies for the same question

Master Decision Tree

Is treatment assignment plausibly random conditional on observables?
├── Yes, by design (RCT, lottery) → run the RCT analysis; register PAP via AEA RCT Registry
└── No → identification must come from variation
    ├── Sharp threshold in a running variable → RDD (sharp or fuzzy)
    ├── Discrete policy change in some units, not others, over time → DiD
    │     ├── Single treatment date → canonical 2×2 DiD
    │     └── Staggered adoption → Callaway-Sant'Anna or Borusyak-Jaravel-Spiess
    ├── Endogenous regressor + plausibly exogenous shifter → IV
    │     ├── Shifter × pre-existing exposure shares → shift-share / Bartik
    │     └── Single instrument → weak-IV-robust inference if F < 50
    ├── One treated unit / aggregate intervention → synthetic control
    └── None of the above → reconsider the question

Difference-in-Differences

Canonical 2×2 (single treatment date, two groups)

Use TWFE if and only if:

Treatment timing is simultaneous for all treated units
The control group is never treated
Treatment-effect heterogeneity is implausible

Otherwise, TWFE produces biased and often sign-flipped estimates.

Staggered Adoption (most modern applications)

Do not use TWFE. Use one of:

Callaway and Sant'Anna (2021) — csdid (Stata), did (R). Identifies group-time average treatment effects (ATT(g,t)); estimands are doubly robust; supports event-study aggregation.
Borusyak, Jaravel, and Spiess (2024) — imputation estimator.
de Chaisemartin and D'Haultfœuille (2020) — did_multiplegt.
Sun and Abraham (2021) — interaction-weighted estimator for event studies.

Required diagnostics:

Goodman-Bacon decomposition to show the share of weight from "forbidden" comparisons under TWFE
Event-study plot with the imputation or Callaway-Sant'Anna estimator
Pre-trends test reported as the joint test, not just the visual
Heterogeneity by treatment cohort

Pre-Trends

A flat pre-trend is necessary but not sufficient. Report:

Visual event-study plot with 95% confidence intervals
Formal joint test of pre-period coefficients (p-value)
Honest DiD (Rambachan-Roth 2023) sensitivity bounds for the post-period

Instrumental Variables

Weak Instruments

The first-stage F > 10 rule is obsolete. Modern conventions:

For just-identified models: report Anderson-Rubin (AR) confidence sets as the primary inference. AR has correct size regardless of instrument strength.
For F < 50: 2SLS confidence intervals are unreliable; AR is required, not optional.
Stock-Yogo critical values for TSLS bias assume homoskedasticity and are rarely valid in modern clustered settings.

Use weakivtest (Stata), ivDiag (R), or the Olea-Pflueger effective F statistic.

Exclusion Restriction

The IV's credibility depends on a story, not a test. State the exclusion restriction in one sentence in the introduction and defend it with:

Institutional narrative (one paragraph)
A placebo regression where the instrument predicts an outcome it should not affect
Sensitivity analysis: how much exclusion-restriction violation would overturn the result (Conley et al. 2012)

Shift-Share / Bartik

Two valid sources of identification, with very different implications:

Exogenous shares (Goldsmith-Pinkham, Sorkin, Swift 2020) — argue that pre-existing exposure shares are conditionally exogenous; report the Rotemberg weights and inspect the top-5 industries driving identification.
Exogenous shocks (Borusyak, Hull, Jaravel 2022; Adão, Kolesár, Morales 2019) — argue that aggregate shocks are as-good-as-random; report shock-level inference.

Pick one explicitly. Do not hand-wave between the two.

Regression Discontinuity

Modern Defaults

Local linear regression with a triangular kernel. Polynomials of order > 1 are discouraged (Gelman-Imbens 2019).
MSE-optimal bandwidth (Calonico-Cattaneo-Titiunik 2014) with the robust bias-corrected confidence interval. Use rdrobust.
Donut RDD if bunching near the cutoff is a concern.
Covariate adjustment for efficiency; main result must hold without it.

Required Diagnostics

McCrary (2008) / Cattaneo-Jansson-Ma (2020) density test for manipulation of the running variable
Balance tests on predetermined covariates at the cutoff
Placebo cutoffs away from the true threshold
Bandwidth sensitivity — show the estimate across at least three bandwidths
Visual RD plot using rdplot with the binning method explicitly stated

Synthetic Control

When Appropriate

One (or few) treated units
Long pre-treatment outcome series (≥ 10 periods)
A large donor pool of plausibly comparable untreated units
Aggregate intervention (policy at the country, state, city level)

Modern Extensions

Generalized synthetic control (Xu 2017) for multiple treated units
Augmented synthetic control (Ben-Michael, Feller, Rothstein 2021) for bias correction
Synthetic DiD (Arkhangelsky et al. 2021) combining SCM and DiD weighting

Required Diagnostics

Placebo (in-time): apply SCM to pre-treatment fake intervention dates
Placebo (in-space): apply SCM to every donor as if it were treated; report the distribution of placebo effects
Permutation inference / Fisher exact p-value
Weight vector reported in the appendix; donors with > 10% weight discussed

Field Experiments and RCTs

If the paper uses a field experiment:

Register with AEA RCT Registry before the intervention begins. AEA journals require this prior to submission.
Pre-analysis plan (PAP) posted before unblinding. Per Olken and others, keep the PAP moderate in scope — pre-specify primary outcomes and the analysis specification, leave exploratory work clearly labeled as such.
Power calculations in the manuscript or appendix.
Multiple-hypothesis correction if more than one primary outcome.
Attrition documented and tested for differential attrition by treatment arm.

Mechanism vs. Identification

A common confusion: identification answers whether X causes Y; mechanism answers why. Mechanism evidence should not weaken the identification of the main effect. Run:

Subgroup heterogeneity (does the effect concentrate where theory predicts?)
Mediation analysis only if the mediator is itself plausibly exogenous (rare)
Auxiliary outcomes consistent with the proposed channel

Red Flags for Referees

TWFE on staggered data with no Goodman-Bacon decomposition
First-stage F = 12 cited as evidence of instrument strength
RDD with a polynomial of order 4
Synthetic control with no placebo inference
DiD with a "control group" of eventually-treated units
IV exclusion restriction defended only by "we control for X"
Quoting an Angrist-Pischke citation as a substitute for showing the diagnostic

Repository Resources

When working from the AER-skills repository or plugin bundle, load only the relevant resource:

Estimator defaults, package calls, diagnostics, and citations: docs/methods-reference.md
Staggered DiD implementation: templates/stata/03_main_did.do, templates/r/03_main_did.R, or templates/python/main_did.py
Worked empirical examples: examples/aer-exemplars.md and examples/modern-aer-exemplars.md

Use the methods reference before drafting prose: it fixes the estimand, diagnostic, inference method, and citation that the manuscript must report.

Identification Gate

Do not advance to robustness or writing until, for the chosen design, all are true:

A modern estimator is used — no TWFE on staggered data, no first-stage-F-only IV, no high-order-polynomial RDD
Every required diagnostic for the design (see the per-design lists above) is run and reported
Inference matches the design — cluster-robust / AR / wild bootstrap / permutation, not default OLS SEs by reflex
The identifying assumption is stated in one sentence, ready to drop into the introduction
No item in "Red Flags for Referees" is present

Handoff

STRATEGY: <DiD | IV | RDD | SCM | shift-share | RCT>
MODERN ESTIMATOR USED: <yes / no / which>
REQUIRED DIAGNOSTICS REPORTED: <list>
INFERENCE METHOD: <robust / cluster-robust / AR / wild bootstrap / permutation>
WEAK-IV / TWFE / POLY-ORDER RED FLAGS: <list or "none">
NEXT SKILL: aer-robustness

Anti-Patterns

Defending an old design ("the prior literature used TWFE") when modern estimators exist
Reporting OLS-with-controls as the main specification and IV/RD as "robustness"
Using more than one identification strategy as if they were independent confirmations when they share identifying variation
Footnoting the identifying assumption instead of stating it in the introduction