name: capture-environment description: Snapshot the computational environment for a replication package — detects the analysis stack (R / Stata / Python) and emits the right lockfiles (renv.lock + sessionInfo.txt, requirements.txt / environment.yml / uv.lock, Stata version + ado package list), records seeds and RNG kind, optionally writes a pinning Dockerfile, and produces a paste-ready "Computational requirements" block. Use when user says "capture the environment", "snapshot my dependencies", "pin the versions", "make a renv.lock / requirements.txt", "make this byte-reproducible", or before releasing a replication package to openICPSR / the AEA Data Editor. argument-hint: "[project-dir] [--docker] [--no-verify] (project-dir defaults to repo root)" allowed-tools: ["Read", "Grep", "Glob", "Write", "Bash"] effort: medium
/capture-environment — snapshot the computational environment
A replication package that runs on the author's laptop in 2026 and nowhere else in 2029 is not reproducible. This skill captures the exact computational environment — language versions, package versions, seeds, RNG kind, and (optionally) the OS layer — so a referee, the AEA Data Editor, or future-you can reconstruct it. It detects which stack the project uses and emits the artifacts that stack's ecosystem expects, then verifies the lockfile installs clean.
Core principle: Pin everything a result depends on. Display rounding aside, a re-run on a pinned environment should reproduce the paper to the replication-protocol.md tolerances — byte-identical when the optional Dockerfile is used.
When to use
- Before releasing a replication package to openICPSR, Zenodo, Dataverse, or a journal archive — the AEA Data Editor / DCAS standard expects a documented, version-pinned environment.
- Before submission, alongside
/audit-reproducibility— that skill checks the numbers; this one captures the environment those numbers were produced in (itssessionInfo.txtrequirement is satisfied by this skill). - After adding or upgrading a package mid-project — re-snapshot so the lockfile doesn't drift from what the code actually loads.
- When handing a project to a co-author or RA who needs to reconstruct your stack.
Inputs
$0— project directory. Defaults to the repo root. The skill looks underscripts/R/,scripts/stata/,scripts/python/.--docker— also emit aDockerfilepinning OS + language version + system libraries for byte-identical reproduction.--no-verify— skip Phase 3 (the best-effort clean-install check). Useful in CI or when the toolchain isn't installed locally.
Workflow
Phase 0: Detect the stack
Glob for stack signals and decide which capture paths to run (a project may be multi-language — DiD in R, an IV robustness check in Stata):
| Signal | Stack | Capture path |
|---|---|---|
scripts/R/*.R, DESCRIPTION, renv/, *.Rproj |
R | renv + sessionInfo |
scripts/python/*.py, *.ipynb, pyproject.toml, requirements.txt, environment.yml, uv.lock |
Python | pip / conda / uv |
scripts/stata/*.do |
Stata | version + ado list |
If no signal is found, report and stop — there is no environment to capture.
Phase 1: Capture per language
R — emit two artifacts:
renv.lockviarenv::snapshot()(runrenv::init(bare = TRUE)first if the project isn't renv-managed; snapshot records every package + version + source/remote and the R version). Honors the seed conventions inr-code-conventions.md.sessionInfo.txtviaRscript -e "writeLines(capture.output(sessionInfo()), 'scripts/R/_outputs/sessionInfo.txt')"— the human-readable companion/audit-reproducibilitylooks for.
Python — emit whichever matches the project's existing tooling (do not invent a new one):
uv.lock(preferred whenpyproject.toml+uvpresent — fully-resolved, hashed, cross-platform):uv lock/uv export --format requirements-txt > requirements.txt.requirements.txtviapip freeze(orpython -m pip freeze) for a venv/pip project — pin==exactly.environment.ymlviaconda env export --no-buildsfor a conda project. Always also record the interpreter version (python --version) in the report.
Stata — Stata has no lockfile, so capture the closest equivalents (mirrors stata-code-conventions.md §3):
- The pinned
versionline each.dofile declares (e.g.version 18) — grepscripts/stata/*.doand report the version actually pinned. - An ado/plus package inventory: a small
.dothat runswhichon the user-installed commands the pipeline uses (reghdfe,ivreg2,estout/esttab,rdrobust,csdid, …) plusado dirandabout, logged toscripts/stata/_outputs/sessionInfo.txt. - A note that Stata version pinning is semantic (
version 18fixes command behavior), not a binary pin — the Dockerfile (Phase 2) cannot help here because Stata is licensed and not redistributable; record the exact Stata version + flavor (SE/MP/IC) + update level in the report so a replicator can match it.
Phase 1b: Record seeds and RNG
Grep the analysis scripts for the master seed and RNG kind so the "Computational requirements" block can state them:
- R:
set.seed(YYYYMMDD), andRNGkind()— flag"L'Ecuyer-CMRG"if parallel/Monte Carlo work is present (seesimulation-conventions.md). - Stata:
set seedandset sortseed. - Python:
numpy.random.default_rng(seed)/random.seed()/ framework seeds.
If the pipeline does randomized work (bootstrap, MC, RCT re-randomization, permutation inference) and no seed is found, surface it as a WARNING — an unseeded random result is not reproducible.
Phase 2: Dockerfile (only with --docker)
Emit a Dockerfile that pins the OS + language version + system libraries for byte-identical reproduction:
- R →
FROM rocker/r-ver:<X.Y.Z>(Rocker pins the R version),COPY renv.lock,RUN R -e "renv::restore()", plusapt-get installfor system libs the packages need (e.g.libcurl4-openssl-dev,libgdal-devfor spatial work). - Python →
FROM python:<X.Y.Z>-slim,COPY requirements.txt/uv.lock,RUN pip install -r requirements.txt(oruv sync --frozen). - Stata → cannot pin the licensed binary; emit a
Dockerfilestub that documents the expected Stata version + flavor and leaves thestatainstall/license step to the replicator (with a comment pointing at the AEA's guidance on Stata images).
Pin a digest where possible (FROM image@sha256:…) so the base image can't drift.
Phase 3: Verify the lockfile installs clean (best-effort; skip with --no-verify)
Attempt a clean restore in a throwaway location and report PASS / FAIL — never overwrite the working environment:
- R:
renv::restore()into a temp library, orRscript -e "renv::status()"for a dry check. - Python:
uv sync --frozen/pip install --dry-run -r requirements.txtinto a fresh venv. - Docker (if
--docker):docker buildthe image.
A FAIL here means the lockfile references a package version that can't be resolved (yanked release, private remote, platform-specific wheel). Report it; do not auto-edit the lockfile.
Phase 4: Report
Print a paste-ready block and write it to scripts/<lang>/_outputs/computational_requirements.md:
## Computational requirements
**Software:** R 4.4.1 (or: Stata 18.0 SE, update 2026-01-15; Python 3.12.3)
**OS used:** macOS 15.5 (arm64) — Dockerfile pins Ubuntu 24.04 for portability
**Key packages:** fixest 0.12.1, did 2.1.2 (full list in renv.lock)
**Random seeds:** set.seed(20260609); RNGkind("L'Ecuyer-CMRG") for the bootstrap
**Approx. runtime:** [author confirms — e.g. ~12 min, 8 cores]
**Lockfiles in package:** renv.lock, scripts/R/_outputs/sessionInfo.txt[, Dockerfile]
Pre-fill software/package/seed lines from the captured artifacts; leave runtime for the author to confirm.
Output / artifacts
| Stack | Files written |
|---|---|
| R | renv.lock, scripts/R/_outputs/sessionInfo.txt |
| Python | requirements.txt or environment.yml or uv.lock (matching project tooling) |
| Stata | scripts/stata/_outputs/sessionInfo.txt (version + ado list) |
Any (--docker) |
Dockerfile |
| Always | scripts/<lang>/_outputs/computational_requirements.md (the paste-ready block) |
Exit behavior
- All captures succeeded, verify PASS (or
--no-verify): exit 0, requirements block printed. - A missing-seed WARNING on a randomized pipeline: exit 0 with the warning surfaced — reproducibility is compromised but the snapshot still wrote.
- Verify FAIL (lockfile won't resolve): exit 1, so the skill can gate a pre-release
/commit. Report the unresolvable package; do not silently "fix" the lockfile. - No stack detected in Phase 0: exit 1 with the directories searched.
Cross-references
.claude/rules/replication-protocol.md— the tolerance contract a pinned environment is meant to reproduce..claude/rules/r-code-conventions.md— R seeding + output-path conventions this skill reads..claude/rules/stata-code-conventions.md— §3sessionInfo.txt+version-pinning the Stata path mirrors..claude/rules/simulation-conventions.md— L'Ecuyer streams for reproducible parallel/MC work..claude/rules/confidential-data.md— when raw data is restricted, the environment still ships even though the data does not; coordinate the README's "data availability" section with this block./audit-reproducibility— consumes thesessionInfo.txtthis skill produces; run it after./data-analysis,/stata-replication,/simulation-study— the pipelines whose environment this snapshots.- AEA Data Editor checklist / openICPSR / DCAS — the external standards this skill targets.
What this skill does NOT do
- Re-run your analysis or check your numbers. It captures the environment;
/audit-reproducibilityverifies the manuscript's numeric claims against the outputs. - Package or de-identify data. Lockfiles describe software, not data. Disclosure avoidance, de-identification, and data-availability statements are out of scope — see
confidential-data.md. - Upgrade or "fix" your dependencies. It records what the code currently uses. If a verify FAIL surfaces a yanked version, you decide whether to pin an alternative.
- Pin a Stata binary. Stata is licensed and not redistributable; the skill records the exact version/flavor/update so a replicator can match it, but cannot containerize it.