capture-environment


Snapshot the computational environment for a replication package — detects the analysis stack (R / Stata / Python) and emits the right lockfiles (renv.lock + sessionInfo.txt, requirements.txt / environment.yml / uv.lock, Stata version + ado package list), records seeds and RNG kind, optionally writes a pinning Dockerfile, and produces a paste-ready "Computational requirements" block. Use when user says "capture the environment", "snapshot my dependencies", "pin the versions", "make a renv.lock / requirements.txt", "make this byte-reproducible", or before releasing a replication package to openICPSR / the AEA Data Editor.

By pedrohcgs · Updated 6/9/2026

name: capture-environment
description: Snapshot the computational environment for a replication package. Detects the analysis stack (R / Stata / Python) and emits the right lockfiles (renv.lock + sessionInfo.txt, requirements.txt / environment.yml / uv.lock, Stata version + ado package list), records seeds and RNG kind, optionally writes a pinning Dockerfile, and produces a paste-ready "Computational requirements" block. Use when user says "capture the environment", "snapshot my dependencies", "pin the versions", "make a renv.lock / requirements.txt", "make this byte-reproducible", or before releasing a replication package to openICPSR / the AEA Data Editor.
argument-hint: "[project-dir] [--docker] [--no-verify] (project-dir defaults to repo root)"
allowed-tools: ["Read", "Grep", "Glob", "Write", "Bash"]
effort: medium

/capture-environment — snapshot the computational environment

A replication package that runs on the author's laptop in 2026 and nowhere else in 2029 is not reproducible. This skill captures the exact computational environment — language versions, package versions, seeds, RNG kind, and (optionally) the OS layer — so a referee, the AEA Data Editor, or future-you can reconstruct it. It detects which stack the project uses and emits the artifacts that stack's ecosystem expects, then verifies the lockfile installs clean.

Core principle: Pin everything a result depends on. Display rounding aside, a re-run on a pinned environment should reproduce the paper to the replication-protocol.md tolerances — byte-identical when the optional Dockerfile is used.

When to use

  • Before releasing a replication package to openICPSR, Zenodo, Dataverse, or a journal archive — the AEA Data Editor / DCAS standard expects a documented, version-pinned environment.
  • Before submission, alongside /audit-reproducibility — that skill checks the numbers; this one captures the environment those numbers were produced in (its sessionInfo.txt requirement is satisfied by this skill).
  • After adding or upgrading a package mid-project — re-snapshot so the lockfile doesn't drift from what the code actually loads.
  • When handing a project to a co-author or RA who needs to reconstruct your stack.

Inputs

  • $0 — project directory. Defaults to the repo root. The skill looks under scripts/R/, scripts/stata/, scripts/python/.
  • --docker — also emit a Dockerfile pinning OS + language version + system libraries for byte-identical reproduction.
  • --no-verify — skip Phase 3 (the best-effort clean-install check). Useful in CI or when the toolchain isn't installed locally.

Workflow

Phase 0: Detect the stack

Glob for stack signals and decide which capture paths to run (a project may be multi-language — DiD in R, an IV robustness check in Stata):

| Signal | Stack | Capture path |
|---|---|---|
| scripts/R/*.R, DESCRIPTION, renv/, *.Rproj | R | renv + sessionInfo |
| scripts/python/*.py, *.ipynb, pyproject.toml, requirements.txt, environment.yml, uv.lock | Python | pip / conda / uv |
| scripts/stata/*.do | Stata | version + ado list |

If no signal is found, report and stop — there is no environment to capture.
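
The detection step can be sketched as a set of glob checks against the project directory. This is a minimal illustration, not the skill's actual implementation: the names `STACK_SIGNALS` and `detect_stacks` are hypothetical, and the patterns are copied from the signal table above.

```python
from pathlib import Path

# Signal globs per stack, taken from the table above.
# STACK_SIGNALS and detect_stacks are illustrative names, not part of the skill.
STACK_SIGNALS = {
    "R": ["scripts/R/*.R", "DESCRIPTION", "renv", "*.Rproj"],
    "Python": [
        "scripts/python/*.py", "*.ipynb", "pyproject.toml",
        "requirements.txt", "environment.yml", "uv.lock",
    ],
    "Stata": ["scripts/stata/*.do"],
}


def detect_stacks(project_dir):
    """Return the sorted list of stacks with at least one signal under project_dir."""
    root = Path(project_dir)
    found = []
    for stack, patterns in STACK_SIGNALS.items():
        # A stack counts as present if any one of its patterns matches a path.
        if any(any(root.glob(pattern)) for pattern in patterns):
            found.append(stack)
    return sorted(found)
```

An empty return value corresponds to the "report and stop" case; a list with more than one entry is the multi-language case, where each capture path runs in turn.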

Phase 1: Capture per language

R — emit two artifacts:

  • renv.lock via renv::snapshot() (run renv::init(bare = TRUE) first if the project isn't renv-managed; snapshot records every package + version + source/remote and the R version). Honors the seed conventions in r-code-conventions.md.
  • sessionInfo.txt via Rscript -e "writeLines(capture.output(sessionInfo()), 'scripts/R/_outputs/sessionInfo.txt')" — the human-readable companion /audit-reproducibility looks for.

Python — emit whichever matches the project's existing tooling (do not invent a new one):

  • uv.lock (preferred when pyproject.toml + uv present; fully resolved, hashed, cross-platform): run uv lock, and optionally uv export --format requirements-txt > requirements.txt for replicators who use pip.
  • requirements.txt via pip freeze (or python -m pip freeze) for a venv/pip project — pin == exactly.
  • environment.yml via conda env export --no-builds for a conda project. For every Python path, also record the interpreter version (python --version) in the report.
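
One way to "match the project's existing tooling" is to pick the capture command from the files already present. The sketch below is an assumption-laden illustration: `choose_python_capture` is a hypothetical helper, "uv present" is interpreted as the `uv` executable being on PATH, and the fallback order (uv, then conda, then pip) is a guess at a sensible precedence rather than something the skill specifies.

```python
import shutil
from pathlib import Path


def choose_python_capture(project_dir, uv_available=None):
    """Return the capture command (as an argv list) suited to the project's tooling."""
    root = Path(project_dir)
    if uv_available is None:
        # Assumption: "uv present" means the uv executable is on PATH.
        uv_available = shutil.which("uv") is not None
    if (root / "pyproject.toml").exists() and uv_available:
        return ["uv", "lock"]
    if (root / "environment.yml").exists():
        return ["conda", "env", "export", "--no-builds"]
    # Default for venv/pip projects; the output is redirected to requirements.txt.
    return ["python", "-m", "pip", "freeze"]
```

The function only selects a command; running it and writing the lockfile is left to the caller, so an existing lockfile is never replaced by one from a different tool.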

Stata — Stata has no lockfile, so capture the closest equivalents (mirrors stata-code-conventions.md §3):

  • The pinned version line each .do file declares (e.g. version 18) — grep scripts/stata/*.do and report the version actually pinned.
  • An ado/plus package inventory: a small .do that runs which on the user-installed commands the pipeline uses (reghdfe, ivreg2, estout/esttab, rdrobust, csdid, …) plus ado dir and about, logged to scripts/stata/_outputs/sessionInfo.txt.
  • A note that Stata version pinning is semantic (version 18 fixes command behavior), not a binary pin. The Dockerfile (Phase 2) cannot supply the binary because Stata is licensed and not redistributable, so record the exact Stata version + flavor (BE/SE/MP; BE was called IC before Stata 17) + update level in the report so a replicator can match it.
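
The first bullet (reporting the version each .do file actually pins) could look like the following sketch. The regular expression and the function name `pinned_stata_versions` are assumptions for illustration; the pattern only recognizes a `version` statement at the start of a line and ignores commented-out ones.

```python
import re
from pathlib import Path

# Matches e.g. "version 18" or "  version 17.0" at the start of a line.
# Illustrative pattern; it does not handle every way a .do file can set a version.
VERSION_LINE = re.compile(r"^\s*version\s+(\d+(?:\.\d+)?)\b", re.MULTILINE)


def pinned_stata_versions(do_dir):
    """Map each .do file name to the first version it pins, or None if unpinned."""
    result = {}
    for do_file in sorted(Path(do_dir).glob("*.do")):
        text = do_file.read_text(encoding="utf-8", errors="replace")
        match = VERSION_LINE.search(text)
        result[do_file.name] = match.group(1) if match else None
    return result
```

Files that map to `None` declare no version and are worth listing in the report, since their behavior depends on whichever Stata release the replicator happens to run.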

Phase 1b: Record seeds and RNG

Grep the analysis scripts for the master seed and RNG kind so the "Computational requirements" block can state them:

  • R: set.seed(YYYYMMDD), and RNGkind() — flag "L'Ecuyer-CMRG" if parallel/Monte Carlo work is present (see simulation-conventions.md).
  • Stata: set seed and set sortseed.
  • Python: numpy.random.default_rng(seed) / random.seed() / framework seeds.

If the pipeline does randomized work (bootstrap, MC, RCT re-randomization, permutation inference) and no seed is found, surface it as a WARNING — an unseeded random result is not reproducible.
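
A rough sketch of the seed scan and the missing-seed warning follows. Everything here is illustrative: `SEED_PATTERNS`, `RANDOMIZED_HINTS`, and `scan_seeds` are hypothetical names, the patterns only catch literal integer seeds, and the list of "randomized work" keywords is a small assumed sample, not an exhaustive detector.

```python
import re

# Patterns for literal integer seeds only; a seed passed as a variable is missed.
SEED_PATTERNS = {
    "R": re.compile(r"set\.seed\(\s*(\d+)\s*\)"),
    "Stata": re.compile(r"^\s*set\s+seed\s+(\d+)", re.MULTILINE),
    "Python": re.compile(r"(?:default_rng|random\.seed)\(\s*(\d+)\s*\)"),
}

# Assumed keywords that hint at randomized work (bootstrap, permutation, draws).
RANDOMIZED_HINTS = re.compile(
    r"bootstrap|boot\(|permut|simulate|sample\(|rnorm|runif", re.IGNORECASE
)


def scan_seeds(source, stack):
    """Return (seeds, warning) for the text of one script in the given stack."""
    seeds = SEED_PATTERNS[stack].findall(source)
    warning = None
    if not seeds and RANDOMIZED_HINTS.search(source):
        warning = "WARNING: randomized work detected but no seed found"
    return seeds, warning
```

For example, `scan_seeds("set.seed(20260609)", "R")` returns the seed as a string in a one-element list with no warning, while a script that calls `rnorm` without any `set.seed` yields an empty list and the warning text.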

Phase 2: Dockerfile (only with --docker)

Emit a Dockerfile that pins the OS + language version + system libraries for byte-identical reproduction:

  • R → FROM rocker/r-ver:<X.Y.Z> (Rocker pins the R version), COPY renv.lock, RUN R -e "install.packages('renv')" (needed when the image does not already provide renv), RUN R -e "renv::restore()", plus apt-get install for system libs the packages need (e.g. libcurl4-openssl-dev, libgdal-dev for spatial work).
  • Python → FROM python:<X.Y.Z>-slim, COPY requirements.txt / uv.lock, RUN pip install -r requirements.txt (or, with uv installed in the image and pyproject.toml also copied, uv sync --frozen).
  • Stata → cannot pin the licensed binary; emit a Dockerfile stub that documents the expected Stata version + flavor and leaves the stata install/license step to the replicator (with a comment pointing at the AEA's guidance on Stata images).

Pin a digest where possible (FROM image@sha256:…) so the base image can't drift.
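
As an illustration of the R path, the Dockerfile text can be assembled from the captured R version, the system libraries, and an optional digest. This is a sketch under stated assumptions: `render_r_dockerfile` is a hypothetical helper, the `/project` working directory is arbitrary, and installing renv with a plain `install.packages('renv')` assumes the base image has a usable default CRAN mirror and leaves the renv version itself unpinned.

```python
def render_r_dockerfile(r_version, system_libs=(), digest=None):
    """Return Dockerfile text that restores renv.lock on a rocker/r-ver base image."""
    base = f"rocker/r-ver:{r_version}"
    if digest:
        # Appending @sha256:... pins the exact image so the tag cannot drift.
        base = f"{base}@{digest}"
    lines = [f"FROM {base}"]
    if system_libs:
        packages = " ".join(sorted(system_libs))
        lines.append(
            "RUN apt-get update && apt-get install -y --no-install-recommends "
            f"{packages} && rm -rf /var/lib/apt/lists/*"
        )
    lines += [
        "WORKDIR /project",  # arbitrary location chosen for this sketch
        "COPY renv.lock renv.lock",
        # Assumption: the base image does not ship renv, so install it first.
        "RUN R -e \"install.packages('renv')\"",
        "RUN R -e \"renv::restore()\"",
    ]
    return "\n".join(lines) + "\n"
```

Calling `render_r_dockerfile("4.4.1", ["libgdal-dev", "libcurl4-openssl-dev"])` produces a file whose first line is `FROM rocker/r-ver:4.4.1`; passing a digest captured from the registry adds the `@sha256:…` suffix to that line.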

Phase 3: Verify the lockfile installs clean (best-effort; skip with --no-verify)

Attempt a clean restore in a throwaway location and report PASS / FAIL — never overwrite the working environment:

  • R: renv::restore() into a temp library, or Rscript -e "renv::status()" for a dry check.
  • Python: uv sync --frozen / pip install --dry-run -r requirements.txt into a fresh venv.
  • Docker (if --docker): docker build the image.

A FAIL here means the lockfile references a package version that can't be resolved (yanked release, private remote, platform-specific wheel). Report it; do not auto-edit the lockfile.
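
The "throwaway location" idea can be sketched by running the check command from a temporary directory and mapping its exit status to PASS / FAIL. The helper name `verify_in_scratch_dir` is hypothetical, and treating a missing toolchain as FAIL is an assumption of this sketch (the skill itself suggests --no-verify for that situation).

```python
import subprocess
import sys
import tempfile


def verify_in_scratch_dir(command):
    """Run command from a throwaway directory and return 'PASS' or 'FAIL'."""
    with tempfile.TemporaryDirectory() as scratch:
        try:
            completed = subprocess.run(
                command, cwd=scratch, capture_output=True, text=True
            )
        except FileNotFoundError:
            # The executable is not installed; this sketch reports that as FAIL.
            return "FAIL"
        return "PASS" if completed.returncode == 0 else "FAIL"


if __name__ == "__main__":
    # Trivial command so the sketch runs anywhere; a real check would pass the
    # pip dry-run command with an absolute path to requirements.txt.
    print(verify_in_scratch_dir([sys.executable, "-c", "print('ok')"]))
```

Because the command runs with the scratch directory as its working directory, any lockfile path passed to it has to be absolute. The function only reports the outcome and never edits the lockfile.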

Phase 4: Report

Print a paste-ready block and write it to scripts/<lang>/_outputs/computational_requirements.md:

## Computational requirements

**Software:** R 4.4.1 (or: Stata 18.0 SE, update 2026-01-15; Python 3.12.3)
**OS used:** macOS 15.5 (arm64) — Dockerfile pins Ubuntu 24.04 for portability
**Key packages:** fixest 0.12.1, did 2.1.2 (full list in renv.lock)
**Random seeds:** set.seed(20260609); RNGkind("L'Ecuyer-CMRG") for the bootstrap
**Approx. runtime:** [author confirms — e.g. ~12 min, 8 cores]
**Lockfiles in package:** renv.lock, scripts/R/_outputs/sessionInfo.txt[, Dockerfile]

Pre-fill software/package/seed lines from the captured artifacts; leave runtime for the author to confirm.
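
Pre-filling the block amounts to string templating over the captured values. A minimal sketch, assuming the values have already been extracted into plain strings and lists; `render_requirements` and its parameter names are hypothetical.

```python
def render_requirements(software, os_used, key_packages, seeds, lockfiles, runtime=None):
    """Return the paste-ready 'Computational requirements' block as markdown text."""
    # Runtime is left as a placeholder unless the author has confirmed a value.
    runtime_text = runtime or "[author confirms]"
    lines = [
        "## Computational requirements",
        "",
        f"**Software:** {software}",
        f"**OS used:** {os_used}",
        f"**Key packages:** {', '.join(key_packages)}",
        f"**Random seeds:** {seeds}",
        f"**Approx. runtime:** {runtime_text}",
        f"**Lockfiles in package:** {', '.join(lockfiles)}",
    ]
    return "\n".join(lines) + "\n"
```

Keeping the runtime as an explicit placeholder makes it obvious in the written file which line still needs the author's input.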

Output / artifacts

| Stack | Files written |
|---|---|
| R | renv.lock, scripts/R/_outputs/sessionInfo.txt |
| Python | requirements.txt or environment.yml or uv.lock (matching project tooling) |
| Stata | scripts/stata/_outputs/sessionInfo.txt (version + ado list) |
| Any (--docker) | Dockerfile |
| Always | scripts/<lang>/_outputs/computational_requirements.md (the paste-ready block) |

Exit behavior

  • All captures succeeded, verify PASS (or --no-verify): exit 0, requirements block printed.
  • A missing-seed WARNING on a randomized pipeline: exit 0 with the warning surfaced — reproducibility is compromised but the snapshot still wrote.
  • Verify FAIL (lockfile won't resolve): exit 1, so the skill can gate a pre-release /commit. Report the unresolvable package; do not silently "fix" the lockfile.
  • No stack detected in Phase 0: exit 1 with the directories searched.
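
The four cases above reduce to a small decision function. This is a sketch with hypothetical names: `exit_code` takes the list of detected stacks, a verify result of "PASS", "FAIL", or `None` (for --no-verify), and a flag for the missing-seed warning.

```python
def exit_code(stacks_detected, verify_result=None, missing_seed_warning=False):
    """Map the run's outcome to a process exit code following the rules above."""
    if not stacks_detected:
        return 1  # nothing to capture
    if verify_result == "FAIL":
        return 1  # lockfile did not resolve; gate the release
    # A missing-seed warning is surfaced to the user but does not change the code.
    return 0
```

Under this mapping a caller such as a pre-release hook only needs to test for a non-zero status; the warning text travels separately in the printed report.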

Cross-references

What this skill does NOT do

  • Re-run your analysis or check your numbers. It captures the environment; /audit-reproducibility verifies the manuscript's numeric claims against the outputs.
  • Package or de-identify data. Lockfiles describe software, not data. Disclosure avoidance, de-identification, and data-availability statements are out of scope — see confidential-data.md.
  • Upgrade or "fix" your dependencies. It records what the code currently uses. If a verify FAIL surfaces a yanked version, you decide whether to pin an alternative.
  • Pin a Stata binary. Stata is licensed and not redistributable; the skill records the exact version/flavor/update so a replicator can match it, but cannot containerize it.
Install via CLI
npx skills add https://github.com/pedrohcgs/claude-code-my-workflow --skill capture-environment