name: psychbull-moderators-and-bias
description: Use when explaining heterogeneity and probing robustness in a Psychological Bulletin meta-analysis — moderator/subgroup analysis, meta-regression, and publication-bias diagnostics (funnel, Egger, trim-and-fill, PET-PEESE, p-curve, selection models) plus sensitivity analyses. Extends the core model; estimation lives in psychbull-meta-analysis-methods.
Moderators & Publication Bias (psychbull-moderators-and-bias)
Once a pooled effect and its heterogeneity exist, two questions decide the paper's credibility: what
explains the variation (moderators), and is the effect an artifact of selective reporting
(publication bias). Psychological Bulletin reviewers scrutinize both, and MARS (the APA Meta-Analysis
Reporting Standards) requires that the bias assessment be reported. This skill extends the core model in psychbull-meta-analysis-methods.
When to trigger
- Testing pre-specified moderators / meta-regression to explain heterogeneity
- Running publication-bias diagnostics
- A reviewer asks for sensitivity / robustness analyses
- Reconciling conflicting signals across bias tests
Moderators & meta-regression
- Pre-specify moderators in the protocol; treat unplanned ones as exploratory and label them.
- Use mixed-effects meta-regression (categorical subgroups and continuous moderators); report the moderator coefficient, its CI, residual heterogeneity, and R² analog (variance explained).
- Beware ecological/aggregation bias (study-level moderators ≠ individual-level), multiple testing across many moderators, and confounded moderators; interpret cautiously.
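The R² analog in the list above is the proportional drop in between-study variance once the moderator is in the model: R² = 1 − τ²(residual) / τ²(total), truncated at 0. The sketch below illustrates that arithmetic for a single categorical moderator. It is a simplification under stated assumptions, not a substitute for a fitted mixed-effects model: it uses the DerSimonian-Laird (method-of-moments) estimator rather than REML, pools the residual τ² across subgroups, and runs on made-up data. All function names and the `groups` dictionary are hypothetical.

```python
# Minimal sketch: R^2 analog for one categorical moderator.
# Assumption: DerSimonian-Laird tau^2 (not REML); data below are invented.

def q_c_df(effects, variances):
    """Return Cochran's Q, the DL scaling constant C, and df for one set of studies."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    mean_fe = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - mean_fe) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w * w for w in weights) / sum_w
    return q, c, len(effects) - 1


def tau2_total(effects, variances):
    """Between-study variance ignoring the moderator."""
    q, c, df = q_c_df(effects, variances)
    return max(0.0, (q - df) / c)


def tau2_residual(groups):
    """Pooled within-subgroup variance; groups maps label -> (effects, variances)."""
    parts = [q_c_df(y, v) for y, v in groups.values()]
    q = sum(p[0] for p in parts)
    c = sum(p[1] for p in parts)
    df = sum(p[2] for p in parts)
    return max(0.0, (q - df) / c)


def r2_analog(groups):
    """Share of between-study variance accounted for by the moderator."""
    all_y = [y for ys, _ in groups.values() for y in ys]
    all_v = [v for _, vs in groups.values() for v in vs]
    total = tau2_total(all_y, all_v)
    if total == 0.0:
        return 0.0  # nothing to explain
    return max(0.0, 1.0 - tau2_residual(groups) / total)


# Hypothetical two-level moderator, four studies per level.
groups = {
    "in_person": ([-0.05, 0.30, 0.10, 0.25], [0.01] * 4),
    "online": ([0.35, 0.70, 0.45, 0.60], [0.01] * 4),
}
print(f"R2 analog: {r2_analog(groups):.2f}")
```

With only four studies per cell, an estimate like this is unstable, which is why k per cell should be reported alongside it.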
Publication-bias diagnostics (run several, not one)
- Funnel plot (with contour enhancement) — visual asymmetry; not proof on its own.
- Egger's regression / rank tests — small-study effects, with the usual caveats under high heterogeneity.
- Trim-and-fill — imputes "missing" studies; treat as sensitivity, not truth.
- PET-PEESE — regression-based bias-adjusted estimate.
- p-curve / p-uniform — evidential value and right-skew vs. p-hacking signatures.
- Three-parameter selection models (weightr) — model the selection process directly.
No single test is decisive; converging evidence across methods is the standard, and all are weak under strong heterogeneity — say so.
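Two of the regression-based diagnostics share one fit. A weighted regression of effect size on standard error (weights 1/SE²) yields the PET intercept as the bias-adjusted estimate, and its slope is the same quantity Egger's test examines as the small-study effect. PEESE repeats the fit with SE² as the predictor. The sketch below shows only these point estimates under simple assumptions. It omits the significance test on the PET intercept that the conditional PET-PEESE rule needs, as well as standard errors and any handling of heterogeneity. Function names and data are hypothetical.

```python
# Minimal sketch: PET and PEESE point estimates via one-predictor weighted
# least squares. Assumption: no inference (SEs, p-values) is computed here.

def wls_line(x, y, weights):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sum_w = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / sum_w
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / sum_w
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def pet_peese(effects, std_errors):
    """Return the PET intercept, the small-study slope, and the PEESE intercept."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pet, small_study_slope = wls_line(std_errors, effects, weights)
    peese, _ = wls_line([se ** 2 for se in std_errors], effects, weights)
    return {"pet": pet, "small_study_slope": small_study_slope, "peese": peese}


# Invented data in which smaller studies (larger SE) report larger effects.
ses = [0.05, 0.10, 0.15, 0.20, 0.25]
effects = [0.22, 0.31, 0.33, 0.42, 0.44]
result = pet_peese(effects, ses)
print({name: round(value, 3) for name, value in result.items()})
```

A positive slope here signals small-study effects, not publication bias specifically; genuine heterogeneity can produce the same pattern.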
Sensitivity & robustness
- Leave-one-out and influence/outlier diagnostics; refit without high-leverage studies.
- Sensitivity to the effect-size metric, the model (RVE vs. multilevel), and borderline inclusion decisions.
- Subset by study quality / risk of bias; published vs. grey literature.
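Leave-one-out is mechanical enough to sketch: refit the pooled estimate k times, dropping one study each time, and report the spread. The version below assumes a plain DerSimonian-Laird random-effects model with independent effects, so it does not cover the RVE or multilevel cases named above. The function names and data are hypothetical.

```python
# Minimal sketch: leave-one-out for a DerSimonian-Laird random-effects model.
# Assumption: independent effect sizes, one per study; data are invented.

def random_effects_estimate(effects, variances):
    """Pooled random-effects estimate using the DL tau^2."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    mean_fe = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - mean_fe) ** 2 for w, y in zip(weights, effects))
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    re_weights = [1.0 / (v + tau2) for v in variances]
    return sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)


def leave_one_out(effects, variances):
    """Pooled estimate with each study omitted in turn (same order as input)."""
    return [
        random_effects_estimate(effects[:i] + effects[i + 1:],
                                variances[:i] + variances[i + 1:])
        for i in range(len(effects))
    ]


# Four consistent studies plus one outlier.
effects = [0.30, 0.30, 0.30, 0.30, 0.90]
variances = [0.01] * 5
full = random_effects_estimate(effects, variances)
loo = leave_one_out(effects, variances)
print(f"full: {full:.2f}, leave-one-out range: [{min(loo):.2f}, {max(loo):.2f}]")
```

A single study whose omission moves the estimate far more than any other is a candidate for the influence diagnostics and the refit without high-leverage studies.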
Anti-patterns
- Mining dozens of moderators and theorizing about the one that hits (HARKing), with no multiple-testing correction or caveat
- A single bias test reported as if it settled the question
- Trim-and-fill or PET-PEESE reported as the "true" effect rather than a sensitivity bound
- Ignoring that bias diagnostics behave poorly under high heterogeneity
- Subgroup claims from tiny k (few studies per cell)
What Psychological Bulletin referees demand here
The APA's flagship review journal treats moderator and bias work as the place where a competent meta-analysis either earns trust or collapses. Referees at this venue apply a recognizable bar:
| Referee expectation | Pass | Desk-reject / major-revision trigger |
|---|---|---|
| Moderators pre-registered | Listed in protocol, confirmatory vs. exploratory labeled | Moderators appear only in Results, none in the protocol — read as fishing |
| Multiple bias diagnostics | Funnel + Egger + selection model + PET-PEESE converge | One funnel plot, eyeballed, called "no evidence of bias" |
| Bias caveats under heterogeneity | States that diagnostics weaken when I² is high | Egger taken at face value with I² = 75% |
| Subgroup k disclosed | k per cell reported; thin cells flagged | A moderator "effect" rests on a cell of k = 3 |
| Sensitivity breadth | Leave-one-out + metric + model + quality subsets | A single estimate, no robustness at all |
Worked vignette — bias and moderators on an intervention synthesis
Illustrative numbers only — not real data. A random-effects synthesis of a self-affirmation intervention pools k = 42 effects, g = 0.34, 95% CI [0.24, 0.44], I² = 68%, τ² = 0.041. The moderator/bias pass under this skill's rules:
- Pre-specified moderator (delivery format, 3 levels): mixed-effects meta-regression gives an R²-analog of 0.22; residual I² drops to 51%. Confirmatory, so it carries theoretical weight.
- Exploratory moderator (publication year): tested but labeled exploratory; the slope is null and reported as such, not spun.
- Bias diagnostics run together: funnel asymmetry is visible; Egger p = 0.03; trim-and-fill adds 6 imputed studies and shifts g to 0.27 (a sensitivity bound, not "the truth"); a three-parameter selection model lands g ≈ 0.25; PET-PEESE gives 0.21. Convergence says the effect is real but likely inflated, so the abstract reports the range, not the rosy 0.34.
- Sensitivity: leave-one-out moves g within [0.31, 0.36]; restricting to low-risk-of-bias studies (k = 19) gives 0.29. The bottom line is hedged accordingly.
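The "report the range" step in the vignette is simple arithmetic. The sketch below takes the illustrative estimates quoted above (still not real data) and computes the span of bias-adjusted values and how far each sits below the uncorrected g. The variable names are hypothetical.

```python
# Minimal sketch: summarize the vignette's illustrative bias-adjusted estimates.
naive_g = 0.34
adjusted = {"trim_and_fill": 0.27, "selection_model": 0.25, "pet_peese": 0.21}

lo, hi = min(adjusted.values()), max(adjusted.values())

# Proportional shrinkage of each adjusted estimate relative to the naive g.
shrinkage = {name: 1.0 - g / naive_g for name, g in adjusted.items()}

print(f"bias-adjusted range: [{lo:.2f}, {hi:.2f}] vs. uncorrected {naive_g:.2f}")
for name, share in shrinkage.items():
    print(f"{name}: {share:.0%} below the uncorrected estimate")
```

On these numbers the adjusted estimates run from 0.21 to 0.27, roughly 21% to 38% below the uncorrected 0.34.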
Referee pushback → venue-specific fix
- "Your moderators look post-hoc." → Cite the protocol; relabel any unplanned moderator as exploratory.
- "A single funnel plot is not a bias analysis." → Add Egger, a selection model, and PET-PEESE; report convergence and the heterogeneity caveat.
- "Subgroup claim rests on too few studies." → Disclose k per cell; down-weight thin-cell claims.
Output format
【Moderators】pre-specified vs exploratory; meta-regression coef + CI + R²
【Residual heterogeneity】after moderators
【Bias diagnostics】funnel / Egger / trim-fill / PET-PEESE / p-curve / selection — converge?
【Sensitivity】leave-one-out, metric, model, quality subsets
【Bottom line】is the effect robust? [statement]
【Next】psychbull-theory-integration
Supplementary resources
- ../../resources/external_tools.md — metafor, dmetar (PET-PEESE), weightr, puniform, p-curve
- ../../resources/official-source-map.md — MARS bias-assessment reporting