methods-reporting - SKILL.md Agent Skill

name: methods-reporting
description: Check methods reporting against CONSORT, JARS, DA-RT standards.
argument-hint: "[paste your methods section or describe what to audit]"

Transparent Methods Reporter

Instructions

1. Design Documentation and Pre-Registration

This section covers upstream documentation that makes transparent reporting possible. A well-documented bad design is still a bad design; transparency tools are necessary but not sufficient (Druckman 2022).

Design Document: Before data collection, create a comprehensive document recording all decisions and their rationale -- motivation, stimuli, outcome measures, predictions, analysis plans, and logistics. This "design document" is the upstream practice that enables downstream transparent reporting (Druckman 2022, Ch. 5).
Pre-Registration vs. Pre-Analysis Plan: Distinguish between study registration (basic: recording the study's existence, hypotheses, and population in a public repository) and a pre-analysis plan, or PAP (detailed: specifying exact statistical tests, evidence thresholds, and contingency plans). Require the latter for confirmatory experiments. Name the repository used (OSF, EGAP, or AsPredicted) and provide the registration ID. For PAP structure, cross-reference the pre-registration-writing skill; for estimand, SESOI (smallest effect size of interest), and primary/secondary/exploratory classification, cross-reference hypothesis-building.
Six Core Pre-Registration Elements (Lakens's JARS-aligned summary): For preregistered experiments, ensure the following six elements are specified, following Lakens's (2025) summary of the APA Journal Article Reporting Standards for quantitative research (JARS-Quant; Appelbaum et al. 2018): (1) randomization procedure, (2) inclusion/exclusion criteria, (3) sampling procedures and expected participation rate, (4) sample size justification with power analysis or precision rationale, (5) data diagnostics (exclusion criteria, missing data handling, outlier definitions, assumption checks), and (6) analytic strategy organized into primary, secondary, and exploratory tiers. The primary JARS-Quant tables (Appelbaum et al. 2018, American Psychologist; incorporated into APA 2020, Publication Manual, 7th ed., ch. 3) contain a longer branching item set covering observational, clinical-trial, longitudinal, replication, and N-of-1 designs; the six elements above are Lakens's pedagogical condensation, not JARS-Quant in full. For the broader reproducibility context in which JARS sits, see Munafò et al. (2017, "A Manifesto for Reproducible Science").
Minimal vs. Complete Pre-Registration. Waldron and Allen (2022) show that minimal pre-registrations (hypotheses only, or under-specified analysis plans) re-admit exploratory researcher degrees of freedom under a confirmatory banner. A pre-registration that lists hypotheses but leaves variables, exclusion rules, or model specifications open is closer to a public hypothesis than a confirmatory PAP. Require complete specification (items 1--6 above) when a study will be reported as confirmatory.
Analysis Code: The gold standard is to preregister analysis code that runs on a simulated dataset, eliminating ambiguity about all analytical decisions (Lakens 2025). For conjoint analyses, provide the regression specification in code form (e.g., an R lm() call or a feols() call from the fixest package).
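The simulated-data practice can be sketched in Python as follows. This is a minimal illustration, assuming a two-arm design, a continuous outcome, and a difference-in-means confirmatory test; the function names, effect size, and seed are hypothetical. The point is that the analysis function is written and frozen before data collection, then run unchanged on the real data.

```python
import random
import statistics

def simulate_two_arm(n_per_arm, effect, seed):
    """Simulate a two-arm experiment with a hypothetical true effect (outcome SD = 1)."""
    rng = random.Random(seed)
    rows = []
    for arm in ("control", "treatment"):
        shift = effect if arm == "treatment" else 0.0
        for _ in range(n_per_arm):
            rows.append({"arm": arm, "y": rng.gauss(shift, 1.0)})
    return rows

def preregistered_analysis(rows):
    """Confirmatory estimate frozen in the registration: treatment minus control mean."""
    treated = [r["y"] for r in rows if r["arm"] == "treatment"]
    control = [r["y"] for r in rows if r["arm"] == "control"]
    return statistics.fmean(treated) - statistics.fmean(control)

# Dry run on simulated data; the identical function is later applied to the real data.
simulated = simulate_two_arm(n_per_arm=5000, effect=0.3, seed=2024)
estimate = preregistered_analysis(simulated)
print(f"estimate on simulated data: {estimate:.3f}")
```

Because the seed is fixed, anyone can rerun the dry run and confirm that the registered code recovers the simulated effect.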
Pilot Documentation: Document all pilot studies, including not only manipulation check results but also response rates, tests of recruitment language, and any modifications made as a result (Druckman 2022, Ch. 5). For conjoints, pilot whether respondents attend to all attributes, find the attribute combinations plausible, and process the display as intended.

2. Subjects, Recruitment, and Setting

Eligibility: Explicitly state who was eligible to participate and the criteria for subject selection.
Target Population: State the target population to which inference is intended. This includes the units, contexts, measures, and outcomes (Druckman 2022, Ch. 6). Distinguish between the sampling frame (who could be reached) and the target population (who the theory applies to).
Timeline: Report the exact dates of the recruitment period and when the experiments were conducted, including any repeated measurements or follow-ups. For "firehouse studies" conducted in response to real-world events, document the lag between the event and data collection (Mutz 2011).
Provider Details: For survey experiments, identify the survey firm used and describe their recruitment methods if they are not universally known. Note whether the sample is probability-based or nonprobability (e.g., a quota-matched online panel) and the implications for generalizability.
Response Rates: Provide the response rate and specify the exact formula used to calculate it (e.g., which of the AAPOR Standard Definitions rates, RR1 through RR6, is reported).
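As one example of a fully specified formula, the sketch below computes AAPOR Response Rate 1 (RR1), which divides completed interviews by all cases of known and unknown eligibility and treats partial interviews as nonrespondents. The disposition counts are hypothetical; if a different AAPOR rate or a vendor-specific completion rate is reported, name it and show its formula instead.

```python
def aapor_rr1(complete, partial, refusal, non_contact, other,
              unknown_household, unknown_other):
    """AAPOR Response Rate 1: completes over all eligible and unknown-eligibility
    cases, with partial interviews counted as nonrespondents."""
    denominator = ((complete + partial)
                   + (refusal + non_contact + other)
                   + (unknown_household + unknown_other))
    if denominator == 0:
        raise ValueError("no cases in the denominator")
    return complete / denominator

# Hypothetical final disposition counts for an invited sample.
rr1 = aapor_rr1(complete=800, partial=50, refusal=300, non_contact=600,
                other=50, unknown_household=150, unknown_other=50)
print(f"RR1 = {rr1:.3f}")  # RR1 = 0.400
```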
Survey Error Pipeline: Report how each of three sequential error sources was addressed: (1) coverage error (does the sampling frame reach the target population?), (2) sampling error (does the sample represent the frame?), and (3) nonresponse error (do respondents differ from non-respondents?) (Stantcheva 2023).
Benchmark Validation: Compare sample demographics and key attitude measures to those from existing high-quality, representative surveys that serve as benchmarks for the target population (Stantcheva 2023).
Context: Detail the specific setting (e.g., lab, field, online panel) and relevant geographic or institutional characteristics of the population.
Incentives: Describe the form and amount of any incentives provided to participants (Gerber et al. 2014).

3. Allocation and Treatment

Randomization Procedure: State clearly whether random assignment was used and describe the specific procedure (e.g., simple randomization, blocking, stratification, or restrictions). Identify the software or tool used for randomization (Gerber et al. 2014).
Unit of Randomization: Explicitly define the unit of randomization -- whether individuals, households, groups, or clusters.
Assignment Sequence: Provide details on the exact randomization sequence: who generated it, when it was generated, and whether it was concealed from researchers during enrollment (Gerber et al. 2014).
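The randomization, unit, and sequence items above become auditable when assignments are generated from a recorded seed. The following is a minimal Python sketch of complete randomization within blocks, assuming individuals are the unit of randomization; the block labels, arm names, and seed are hypothetical. Studies that rely on a survey platform's built-in randomizer should name that tool instead.

```python
import random

def block_randomize(units, arms, seed):
    """Complete randomization within blocks, reproducible from `seed`.

    units: list of (unit_id, block) pairs. Returns {unit_id: arm}.
    """
    rng = random.Random(seed)
    blocks = {}
    for unit_id, block in units:
        blocks.setdefault(block, []).append(unit_id)
    assignment = {}
    for block in sorted(blocks):  # fixed iteration order keeps the draw reproducible
        members = blocks[block]
        order = list(arms)
        rng.shuffle(order)  # randomize which arm absorbs any remainder in odd-sized blocks
        labels = [order[i % len(order)] for i in range(len(members))]
        rng.shuffle(labels)
        for unit_id, label in zip(members, labels):
            assignment[unit_id] = label
    return assignment

# Archive the seed, software version, and generation date with the registration.
SEED = 20240501  # hypothetical seed
units = [(f"r{i:03d}", "women" if i % 2 else "men") for i in range(10)]
assignment = block_randomize(units, arms=["control", "treatment"], seed=SEED)
for unit_id, block in units:
    print(unit_id, block, assignment[unit_id])
```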
Blinding: Report whether single-blinding (subjects unaware of condition), double-blinding (subjects and analysts unaware), or no blinding was used (Gerber et al. 2014).
Baseline Balance: Provide a table of baseline means and standard deviations for demographic characteristics and other pretreatment measures across all experimental groups to detect potential errors in assignment.
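A baseline balance table reduces to group-wise means and standard deviations of pretreatment variables. The sketch below assumes respondent records stored as dictionaries; the variables and values are hypothetical, and statistics.stdev needs at least two respondents per group.

```python
import statistics
from collections import defaultdict

def balance_table(rows, group_key, covariates):
    """Mean and SD of each pretreatment covariate, by experimental group."""
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_key]].append(row)
    table = {}
    for group, members in sorted(by_group.items()):
        table[group] = {}
        for cov in covariates:
            values = [m[cov] for m in members]
            table[group][cov] = (statistics.fmean(values), statistics.stdev(values))
    return table

# Hypothetical pretreatment data.
rows = [
    {"arm": "control", "age": 34, "female": 1},
    {"arm": "control", "age": 51, "female": 0},
    {"arm": "control", "age": 42, "female": 1},
    {"arm": "treatment", "age": 29, "female": 0},
    {"arm": "treatment", "age": 47, "female": 1},
    {"arm": "treatment", "age": 55, "female": 1},
]
for group, stats in balance_table(rows, "arm", ["age", "female"]).items():
    for cov, (mean, sd) in stats.items():
        print(f"{group:<10} {cov:<7} mean={mean:6.2f} sd={sd:6.2f}")
```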
Intervention Detail: Describe every treatment condition and the control condition in detail. This must include exact stimuli, scripts, images, or question wordings. Specify the mode of delivery (e.g., text, audio, video, in-person) (Gerber et al. 2014).
Material Availability: Ensure complete treatment materials (vignettes, mailings, software programs) are provided in an appendix for replication. Material availability is not just a reporting requirement; it is infrastructure for cumulative science (Druckman 2022).
Manipulation Checks: If manipulation checks are used, report their exact wording, placement in the survey flow, and results. Do not selectively exclude respondents who "fail" manipulation checks without pre-specifying this exclusion rule and reporting results both with and without exclusions (Druckman 2022). Place comprehension checks at the end of the survey or after outcome elicitation to avoid signaling the study's purpose (Stantcheva 2023).
Question Wording Standards: Use item-specific scales rather than agree-disagree, true-false, or yes-no formats to reduce acquiescence bias. Randomize the order of response options for nominal items; for ordinal items, keep the scale order intact but reverse it for a random subset of respondents. For forced-choice items, separate the question stem from the response alternatives with a semantic pause (Stantcheva 2023).
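The ordering rule for response options can be sketched as below: nominal options are fully shuffled, while ordinal scales keep their order and are reversed with probability 0.5 per respondent. This is an illustrative Python sketch with hypothetical item labels and seed; the realized order should also be saved with each response so it can be reported.

```python
import random

def present_options(options, kind, rng):
    """Return the display order of response options for one respondent.

    nominal: fully shuffled, since the categories have no inherent order.
    ordinal: scale order kept, but reversed with probability 0.5.
    """
    if kind == "nominal":
        shown = list(options)
        rng.shuffle(shown)
        return shown
    if kind == "ordinal":
        return list(reversed(options)) if rng.random() < 0.5 else list(options)
    raise ValueError(f"unknown item kind: {kind!r}")

rng = random.Random(7)  # hypothetical per-survey seed
parties = ["Party A", "Party B", "Party C", "Independent"]
trust = ["None at all", "A little", "A moderate amount", "A lot", "A great deal"]
print(present_options(parties, "nominal", rng))
print(present_options(trust, "ordinal", rng))
```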
Soft Launch: Before full deployment, run a small-scale "soft launch" of the complete survey to check for technical issues in the survey flow (loading, display, branching logic), separate from content pretesting. Document any issues discovered and modifications made (Stantcheva 2023).

4. Measurement and Sample Flow

Variable Definitions: Provide precise definitions for how all primary outcomes, secondary outcomes, and covariates are measured and coded.
Index Construction: If an index is used, explain exactly how it was constructed.
CONSORT-Style Sample Flow: Document the sample at every stage, following the enrollment → allocation → follow-up → analysis structure of the CONSORT 2010 flow diagram (Schulz, Altman, and Moher 2010, BMJ 340:c332). The CONSORT 2010 Statement includes a 25-item checklist and a four-stage flow diagram; experiments reported in political science and social psychology typically adopt the flow structure even when the full 25-item medical-trial checklist does not apply.
- The number initially assessed for eligibility.
- Any exclusions prior to random assignment and the specific reasons for them.
- The number of subjects assigned to each experimental group.
- The proportion of each group that actually received the intended intervention and reasons for non-receipt.
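The counts in the bullets above must reconcile with one another, which is easy to check mechanically before drawing the flow diagram. The following Python sketch assumes a simple two-arm flow; the class names, field names, and numbers are hypothetical, and the checks cover only arithmetic consistency, not the reasons for exclusion or non-receipt.

```python
from dataclasses import dataclass

@dataclass
class ArmFlow:
    allocated: int          # randomized to this arm
    received: int           # actually received the intended intervention
    lost_to_follow_up: int  # no outcome data
    analyzed: int           # included in the final analysis

@dataclass
class ConsortFlow:
    assessed: int                        # assessed for eligibility
    excluded_before_randomization: dict  # reason -> count
    arms: dict                           # arm name -> ArmFlow

    def check(self):
        """Return arithmetic inconsistencies in the flow (empty list if none)."""
        problems = []
        randomized = self.assessed - sum(self.excluded_before_randomization.values())
        allocated = sum(arm.allocated for arm in self.arms.values())
        if randomized != allocated:
            problems.append(f"randomized ({randomized}) != allocated ({allocated})")
        for name, arm in self.arms.items():
            if arm.received > arm.allocated:
                problems.append(f"{name}: received exceeds allocated")
            if arm.analyzed > arm.allocated - arm.lost_to_follow_up:
                problems.append(f"{name}: analyzed exceeds allocated minus lost")
        return problems

flow = ConsortFlow(
    assessed=1200,
    excluded_before_randomization={"failed screener": 150, "declined consent": 50},
    arms={
        "control": ArmFlow(allocated=500, received=500, lost_to_follow_up=40, analyzed=460),
        "treatment": ArmFlow(allocated=500, received=470, lost_to_follow_up=55, analyzed=445),
    },
)
print(flow.check() or "flow counts are consistent")
```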
Attrition and Missing Data: Report the number of subjects in each group who dropped out or lack outcome data. Examine whether this attrition is related to treatment assignment. Report missing data handling procedures (listwise deletion, imputation, or other methods) and justify the choice (Lakens 2025, citing Wicherts et al. 2016).
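One simple check of whether attrition is related to treatment assignment compares missingness rates across arms. The sketch below uses a large-sample two-proportion z-test with hypothetical counts. A nonsignificant result does not show that attrition is unrelated to treatment, so it may be worth pairing with comparisons of baseline covariates among those who remain.

```python
from math import sqrt
from statistics import NormalDist

def differential_attrition(assigned_a, missing_a, assigned_b, missing_b):
    """Two-proportion z-test of outcome missingness, arm A versus arm B."""
    rate_a = missing_a / assigned_a
    rate_b = missing_b / assigned_b
    pooled = (missing_a + missing_b) / (assigned_a + assigned_b)
    se = sqrt(pooled * (1 - pooled) * (1 / assigned_a + 1 / assigned_b))
    if se == 0:
        raise ValueError("no variation in missingness; the test is undefined")
    z = (rate_a - rate_b) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return rate_a, rate_b, z, p

# Hypothetical counts: 40 of 500 control and 55 of 500 treatment lack outcome data.
rate_c, rate_t, z, p = differential_attrition(500, 40, 500, 55)
print(f"missing: control {rate_c:.1%}, treatment {rate_t:.1%}; z = {z:.2f}, p = {p:.3f}")
```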
Outlier Procedures: If outliers are excluded or winsorized, state the definition used and report results both with and without outlier treatment.
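Reporting results with and without outlier treatment can be as simple as computing the estimate twice. The sketch below winsorizes at the 5th and 95th percentiles; the cutoffs, the percentile definition (statistics.quantiles with method="inclusive"), and the data are assumptions chosen for illustration, and the actual rule belongs in the pre-registration.

```python
import statistics

def winsorize(values, lower=5, upper=95):
    """Clamp values to the `lower`th and `upper`th percentiles (integers 1-99).

    Percentiles use statistics.quantiles(..., method="inclusive"); other
    definitions exist, so name the one you used.
    """
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower - 1], cuts[upper - 1]
    return [min(max(v, low), high) for v in values]

# Hypothetical outcome: 98 ordinary responses plus two implausible values.
responses = [10 + (i % 7) for i in range(98)] + [500, 800]
print(f"mean, no outlier treatment:   {statistics.fmean(responses):.2f}")
print(f"mean, winsorized at 5th/95th: {statistics.fmean(winsorize(responses)):.2f}")
```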
Analysis Sample: State the number of subjects included in the final analysis for each group and provide a rationale for any cases deleted at this stage.

5. Statistical Analysis

Sample Size Justification: Report the type of sample size justification used. Lakens (2025) identifies six types: (1) measure the entire population, (2) resource constraints, (3) accuracy (confidence interval width), (4) a priori power analysis, (5) heuristics, or (6) no justification. For a priori power analyses, report: the test used, the assumed effect size and its source, alpha, power, and the resulting N. For resource-constrained designs, conduct a sensitivity analysis reporting the minimum detectable effect given the available N.
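For the a priori and sensitivity cases, the standard normal-approximation formulas for a two-sided, two-sample comparison of means can be sketched as below, with d as the standardized effect size. This is an approximation: t-based tools such as G*Power or the R pwr package return slightly larger sample sizes (for d = 0.5, alpha = 0.05, and 80% power, about 64 per group rather than 63), so report which tool was actually used.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample test of means
    (normal approximation; exact t-based tools give slightly larger n)."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)

def minimum_detectable_effect(n, alpha=0.05, power=0.80):
    """Smallest standardized effect detectable with n per group (sensitivity analysis)."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) * sqrt(2 / n)

print(n_per_group(0.5))                          # 63
print(round(minimum_detectable_effect(200), 3))  # 0.28
```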
Raw Results First: Always report unadjusted (raw) results alongside any corrected or reweighted results, either as the primary presentation or in an appendix. This allows readers to assess the impact of any statistical adjustments (Stantcheva 2023).
Intent-to-Treat (ITT): Prioritize ITT analysis by reporting sample means, standard deviations, and Ns for outcome variables for the entire collection of subjects assigned to a group, regardless of whether they received the treatment.
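The key point of ITT is that the grouping variable is assignment, not receipt. A minimal Python sketch with hypothetical records, in which one respondent assigned to treatment was never reached but is still counted in the treatment arm:

```python
import statistics

def itt_summary(rows):
    """Mean, SD, and N of the outcome by *assigned* arm, ignoring receipt."""
    summary = {}
    for arm in sorted({r["assigned"] for r in rows}):
        y = [r["y"] for r in rows if r["assigned"] == arm]
        summary[arm] = (statistics.fmean(y), statistics.stdev(y), len(y))
    return summary

# Hypothetical records; `received` is recorded but deliberately not used for grouping.
rows = [
    {"assigned": "control", "received": False, "y": 3.0},
    {"assigned": "control", "received": False, "y": 4.0},
    {"assigned": "control", "received": False, "y": 5.0},
    {"assigned": "treatment", "received": True, "y": 6.0},
    {"assigned": "treatment", "received": True, "y": 7.0},
    {"assigned": "treatment", "received": False, "y": 4.0},  # assigned but not reached
]
for arm, (mean, sd, n) in itt_summary(rows).items():
    print(f"{arm:<10} mean={mean:.2f} sd={sd:.2f} n={n}")
```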
Clustering and Weights: Note if the level of analysis differs from the level of randomization and describe any weighting procedures in detail.
Three-Tier Results Labeling: Label every reported result as primary (Type I and Type II error rates controlled), secondary (Type I controlled, Type II not), or exploratory (error rates uncontrolled). Exploratory results must be "clearly labeled, justified, methodologically sound, and informative" (Lakens 2025; Druckman 2022, citing JEPS review criteria).
Equivalence Test Reporting: When reporting equivalence tests (TOST procedure), report: the equivalence bounds in raw effect size units (not Cohen's d), the higher p-value of the two one-sided tests, and the 90% confidence interval (not 95%, because TOST uses two one-sided tests at alpha = 0.05). Never claim "no effect" -- instead state that effects more extreme than the equivalence bounds were rejected (Lakens 2025).
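The TOST reporting items can be sketched with a large-sample z approximation, assuming an estimate and standard error already expressed in raw outcome units. With small samples, t-based implementations (for example, Lakens's TOSTER package for R) are more appropriate. The bounds and numbers below are hypothetical.

```python
from statistics import NormalDist

def tost_z(estimate, se, lower, upper, alpha=0.05):
    """Two one-sided tests against raw-unit equivalence bounds (z approximation).

    Returns the larger one-sided p-value and the (1 - 2*alpha) confidence
    interval, i.e. the 90% CI when alpha = 0.05.
    """
    std_normal = NormalDist()
    p_lower = 1 - std_normal.cdf((estimate - lower) / se)  # H0: effect <= lower bound
    p_upper = std_normal.cdf((estimate - upper) / se)      # H0: effect >= upper bound
    z_crit = std_normal.inv_cdf(1 - alpha)
    ci = (estimate - z_crit * se, estimate + z_crit * se)
    return max(p_lower, p_upper), ci

# Hypothetical: a 0.4-point effect (SE 0.5) on a 0-100 thermometer, bounds of +/-2 points.
p, ci = tost_z(estimate=0.4, se=0.5, lower=-2.0, upper=2.0)
print(f"larger one-sided p = {p:.4f}; 90% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Here the larger one-sided p-value is below 0.05 and the 90% interval lies inside the bounds, so effects more extreme than 2 points in either direction are rejected; in this construction the two criteria always agree.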
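AMCE Estimation: For conjoint designs, specify how average marginal component effects (AMCEs) are estimated: the estimator (a linear probability model, or LPM, with respondent fixed effects is standard), the clustering structure (standard errors clustered at the respondent level), and how marginal means are computed.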
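As a sketch of what these quantities are, the Python example below simulates a forced-choice conjoint with one randomized attribute and computes marginal means, an AMCE, and a respondent-level cluster-bootstrap standard error. It deliberately swaps the regression for the difference-in-means estimator, which targets the same AMCE only when attributes are randomized independently and uniformly without restrictions; it is not the LPM with respondent fixed effects described above. All attribute names, levels, and utilities are hypothetical. The number of rows equals the effective sample size (Respondents × Tasks × Profiles).

```python
import random
import statistics
from collections import defaultdict

LEVELS = ["none", "some", "lots"]                  # hypothetical attribute levels
UTILITY = {"none": 0.0, "some": 0.3, "lots": 0.6}  # hypothetical true part-worths

def simulate(n_resp=300, n_tasks=5, seed=42):
    """Forced choice between 2 profiles per task; one uniformly randomized attribute."""
    rng = random.Random(seed)
    rows = []
    for resp in range(n_resp):
        for task in range(n_tasks):
            shown = [rng.choice(LEVELS) for _ in range(2)]
            utility = [UTILITY[level] + rng.gauss(0, 1) for level in shown]
            winner = 0 if utility[0] >= utility[1] else 1
            for k, level in enumerate(shown):
                rows.append({"respondent": resp, "task": task,
                             "experience": level, "chosen": int(k == winner)})
    return rows

def marginal_means(rows, attribute):
    """Share of profiles chosen among those displaying each level."""
    by_level = defaultdict(list)
    for r in rows:
        by_level[r[attribute]].append(r["chosen"])
    return {level: statistics.fmean(v) for level, v in by_level.items()}

def amce(rows, attribute, level, baseline):
    """Difference-in-means AMCE; assumes independent, uniform randomization."""
    mm = marginal_means(rows, attribute)
    return mm[level] - mm[baseline]

def cluster_bootstrap_se(rows, stat, n_boot=200, seed=1):
    """SE from resampling whole respondents, mirroring respondent-level clustering."""
    rng = random.Random(seed)
    by_resp = defaultdict(list)
    for r in rows:
        by_resp[r["respondent"]].append(r)
    ids = list(by_resp)
    draws = [stat([row for i in rng.choices(ids, k=len(ids)) for row in by_resp[i]])
             for _ in range(n_boot)]
    return statistics.stdev(draws)

rows = simulate()
print("effective N (respondents x tasks x profiles):", len(rows))
print("marginal means:", marginal_means(rows, "experience"))
est = amce(rows, "experience", "lots", "none")
se = cluster_bootstrap_se(rows, lambda d: amce(d, "experience", "lots", "none"))
print(f"AMCE lots vs none: {est:.3f} (cluster-bootstrap SE {se:.3f})")
```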
Interaction Specification: If interaction models are pre-registered, report the exact interaction terms, the hypothesis each tests, and the visualization method (e.g., conditional marginal means plots).
Cross-Group Models: If the design is fielded across multiple sites/countries, specify whether per-group models are estimated separately or pooled with group × attribute interactions.
Pre-Specified Figures: List all planned figures with a brief description of what each shows and which hypothesis it tests.
Sensitivity and Robustness: Beyond replicating results on secondary DVs, specify planned robustness checks: specification curves, alternative exclusion criteria, and alternative model specifications. This addresses the "garden of forking paths" concern: the risk that researchers unconsciously make analytic decisions that produce results which might not appear under other reasonable decisions (Gelman and Loken 2014). See also the 34-item checklist of researcher degrees of freedom in Wicherts et al. (2016), a comprehensive enumeration of the analytic choices that transparent reporting should close off.
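A specification curve can be produced by running the same estimator over every combination of pre-specified analytic choices. The Python sketch below assumes two exclusion choices and two outcomes; the variable names, thresholds, and simulated data are hypothetical, and a real specification curve would also vary the planned model specifications.

```python
import itertools
import random
import statistics

def simulate(n=400, seed=3):
    """Hypothetical survey data with attention scores, speeder flags, and two outcomes."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        treated = i % 2 == 0
        shift = 0.2 if treated else 0.0
        rows.append({
            "treated": treated,
            "attention": rng.randint(0, 3),
            "speeder": rng.random() < 0.1,
            "y_primary": rng.gauss(shift, 1.0),
            "y_secondary": rng.gauss(shift, 1.0),
        })
    return rows

def estimate_path(rows, min_attention, drop_speeders, outcome):
    """One analytic path: apply the exclusion choices, then difference in means."""
    kept = [r for r in rows
            if r["attention"] >= min_attention and not (drop_speeders and r["speeder"])]
    treated = [r[outcome] for r in kept if r["treated"]]
    control = [r[outcome] for r in kept if not r["treated"]]
    return statistics.fmean(treated) - statistics.fmean(control), len(kept)

def specification_curve(rows, choices):
    """Estimate every combination of pre-specified choices, sorted by estimate."""
    names = list(choices)
    results = []
    for combo in itertools.product(*(choices[name] for name in names)):
        spec = dict(zip(names, combo))
        est, n_kept = estimate_path(rows, **spec)
        results.append((est, n_kept, spec))
    return sorted(results, key=lambda t: t[0])

choices = {
    "min_attention": [0, 2],
    "drop_speeders": [False, True],
    "outcome": ["y_primary", "y_secondary"],
}
rows = simulate()
for est, n_kept, spec in specification_curve(rows, choices):
    print(f"{est:+.3f}  n={n_kept:<4} {spec}")
```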

6. Conjoint-Specific Reporting

When the experiment uses a conjoint or factorial vignette design, report the complete attribute table (with levels and reference categories), any randomization constraints and their justification, the task structure (number of tasks, forced choice vs. rating, attribute and profile randomization), clearly distinguished primary and secondary DVs with exact wording, all post-block items, and the effective sample size (Respondents × Tasks × Profiles). For factorial vignettes, also report the assembly template and worked examples. The primary methodological references are Hainmueller, Hopkins, and Yamamoto (2014) and Bansak, Hainmueller, Hopkins, and Yamamoto (2021, Advances in Experimental Political Science).

Conjoint-specific reporting: see reference/conjoint-reporting.md.

7. Validity Framework

Four Validity Types: Evaluate the design against Druckman's (2022) four validity types: (1) construct validity (do measures capture the intended concepts?), (2) statistical conclusion validity (are the statistical inferences correct?), (3) internal validity (is the causal claim warranted?), and (4) external validity (does the finding generalize?). Random assignment provides internal validity; representative sampling provides generalizability -- these are independent contributions (Mutz 2011).
Deviation Reporting: When deviating from a preregistered plan, document: (a) what changed, (b) why, (c) the impact on severity (does the deviation make the test more or less capable of falsifying the hypothesis?), and (d) the impact on validity (does the deviation improve or degrade the design's ability to measure what it claims?). Not all deviations reduce quality -- fixing a validity problem can increase test severity (Lakens 2025). This skill enforces disclosure of deviations; any narrative framing of revisions belongs in the narrative-building skill, not here.

8. Open Science Infrastructure

DA-RT Three Pillars: Adhere to the Data Access and Research Transparency principles articulated for political science: (1) data access -- share replication data, (2) production transparency -- document how data were generated, and (3) analytic transparency -- provide complete analysis code. The canonical formulation is Lupia and Elman (2014, PS: Political Science & Politics 47(1), 19--42), the lead article of the DA-RT symposium; the same principles are codified in the APSA Guide to Professional Ethics in Political Science and extended by the APSA (2020) Principles and Guidance for Human Subjects Research. Cross-journal operationalization appears in the Transparency and Openness Promotion (TOP) Guidelines (Nosek et al. 2015, Science 348:1422--1425), which translate DA-RT-style commitments into eight journal-policy standards (citation, data, analytic methods, research materials, design, pre-registration of studies, pre-registration of analyses, replication) at three enforcement levels. See Christensen, Freese, and Miguel (2019) for implementation guidance.
Data Sharing Plan: Specify what data will be shared, in what format, with what documentation, and on what platform. For cross-national studies, address varying privacy regimes across national contexts (Christensen et al. 2019).
Reproducible Workflow: Provide version-controlled analysis code that reproduces all results from raw data to published figures. The path from raw data to published results should be fully documented and executable, not just described in prose (Christensen et al. 2019).
During-Collection Documentation: Maintain session logs, variable creation decisions, case selection decisions, and analytic code version control throughout data collection -- not only after (Druckman 2022, Ch. 5).
IRB and Ethics: Report IRB approval details. For audit studies or studies involving deception, note the specific ethical provisions. Begin the IRB process early -- it can take weeks or months (Druckman 2022). Political-science-specific human-subjects guidance is in APSA (2020, Principles and Guidance for Human Subjects Research), which covers power differentials, deception, consent, and the obligation to acknowledge and justify deviations from the principles in published work. For manuscript-level checks on data availability, ethics, IRB, and funding statements, cross-reference the paper-review-lite skill (or its heavier standalone counterpart, presubmit).

Quality Checks

The full 45-item reporting checklist synthesizes the 19-item APSA Experimental Section rubric (Gerber et al. 2014), JARS-Quant (Appelbaum et al. 2018; APA 2020) as condensed by Lakens (2025), DA-RT principles (Lupia and Elman 2014; Nosek et al. 2015), conjoint best practice (Hainmueller, Hopkins, and Yamamoto 2014; Bansak et al. 2021), and survey-quality recommendations (Stantcheva 2023). Items are grouped into four blocks: Core Reporting (1--19), Conjoint-Specific Reporting (20--28), Design Transparency (29--40), and Survey Design Quality (41--45). No single authority mandates all 45 as a unit; treat the list as a recommended reporting ceiling, not a minimum-mandatory floor.

For a paragraph-level worked example illustrating how items 1--19 can be compactly reported in ~650 words of prose plus named supplementary materials, see reference/example-methods-paragraph.md.

Full checklist: see reference/checklist.md.