---
name: data-management-plan
description: Draft a funder-compliant Data Management Plan (NSF DMP, NIH DMS Policy 2023, ERC, Horizon Europe) by composing the confidential-data and environment-capture primitives. Sections cover data description, formats/metadata, storage/backup, access/sharing, preservation/archiving, and roles. Use when the user says "data management plan", "DMP", "DMSP", "NIH data sharing plan", "write the data plan for my grant", or when a grant proposal needs a data-management section. NOT a submission tool; it produces a draft the user pastes into the funder portal (DMPTool, NIH ASSIST, Horizon Europe portal).
argument-hint: "[--funder nsf|nih|erc|horizon] [--input <path>] [--no-verify]"
disable-model-invocation: true
allowed-tools: ["Read", "Grep", "Glob", "Write", "Task"]
effort: medium
---

`/data-management-plan` — Funder-Compliant DMP Generator

Produce a Data Management Plan ready to paste into a funder portal. This skill writes the prose and structure; it does not submit anywhere. It is a composition skill — it folds the disclosure-avoidance / IRB rules from .claude/rules/confidential-data.md and the environment + replication-package plan from /capture-environment and /replication-package into a single funder-shaped document.

When to use

- Writing a grant proposal. NSF requires a Data Management Plan (2 pages max) with every proposal. NIH requires a Data Management and Sharing (DMS) Plan for applications that will generate scientific data (2023 policy). ERC and Horizon Europe require a Data Management Plan from funded projects. /grant-proposal calls this skill for that section.
- Before data collection on a funded project. The plan is a commitment made at award time and reported against at renewal.
- When restricted or human-subjects data is involved. The access/sharing and preservation sections change materially; see Phase 2.

When NOT to use

- For a clinical-trial data-sharing statement governed by ICMJE / ClinicalTrials.gov: use the trial sponsor's template.
- As a substitute for IRB protocol text: the DMP references IRB constraints; it is not the protocol itself.

Inputs

- $0 --funder nsf|nih|erc|horizon: target funder profile. If omitted, Phase 0 detects it from --input or asks once.
- --input <path>: a research spec (/interview-me output under quality_reports/specs/), a grant draft, or a passport-adjacent description. The skill extracts data types, sample, and identification strategy from it.
- --no-verify: skip the Phase 4 citation/standard post-flight (inherited from /preregister).
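
If the arguments arrive as a CLI-style string, the three flags map naturally onto Python's standard `argparse`. The sketch below assumes that. The skill itself receives its arguments as text, and `parse_args` is a hypothetical helper, not part of the skill.

```python
import argparse

FUNDERS = ("nsf", "nih", "erc", "horizon")


def parse_args(argv=None):
    """Hypothetical parser for the skill's flags (illustration only)."""
    parser = argparse.ArgumentParser(prog="data-management-plan")
    # None means "not given": Phase 0 then infers the funder or asks once.
    parser.add_argument("--funder", choices=FUNDERS, default=None)
    # Research spec, grant draft, or description to extract data details from.
    parser.add_argument("--input", dest="input_path", default=None)
    # --no-verify flips verify to False, which skips the Phase 4 post-flight.
    parser.add_argument("--no-verify", dest="verify", action="store_false")
    return parser.parse_args(argv)


args = parse_args(["--funder", "nih", "--no-verify"])
print(args.funder, args.input_path, args.verify)  # nih None False
```

An unknown funder such as `--funder dfg` is rejected by `choices` before any drafting starts.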

Workflow

Phase 0 — Detect funder + data sensitivity

Resolve the funder (--funder, else infer from --input, else ask once). Load its section schema:

| Funder | Plan name | Required sections (abridged) |
|---|---|---|
| NSF | Data Management Plan (2 pp max) | data types · standards · access/sharing · re-use/redistribution · archiving |
| NIH | DMS Plan (2023 policy) | data type · tools/software · standards · preservation/access/timelines · access/distribution + reuse · oversight |
| ERC | DMP (Horizon Europe Annex) | data summary · FAIR data per dataset (findable, accessible, interoperable, re-usable) · resource allocation · security · ethics |
| Horizon Europe | DMP (Horizon Europe DMP template) | same FAIR-first structure as ERC; open by default: "as open as possible, as closed as necessary" |
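
Phase 0 resolves the funder in a fixed order: the explicit flag first, then inference from the input, then a single question. The Python sketch below shows that order. The keyword patterns are illustrative assumptions, and so is the rule that an ambiguous match triggers the question. `FUNDER_SCHEMA` abridges the table above.

```python
import re

# Abridged section schema per funder, following the table above.
FUNDER_SCHEMA = {
    "nsf": ["data types", "standards", "access/sharing",
            "re-use/redistribution", "archiving"],
    "nih": ["data type", "tools/software", "standards",
            "preservation/access/timelines", "access/distribution + reuse",
            "oversight"],
    "erc": ["data summary", "FAIR data", "resource allocation",
            "security", "ethics"],
}
# Horizon Europe shares the FAIR-first structure used for ERC.
FUNDER_SCHEMA["horizon"] = list(FUNDER_SCHEMA["erc"])

# Hypothetical detection patterns (case-sensitive acronyms or full names).
FUNDER_PATTERNS = {
    "nsf": r"\bNSF\b|National Science Foundation",
    "nih": r"\bNIH\b|National Institutes of Health",
    "erc": r"\bERC\b|European Research Council",
    "horizon": r"\bHorizon Europe\b",
}


def resolve_funder(flag, input_text, ask):
    """Return a funder key: the flag, else a unique match in the input, else ask()."""
    if flag:
        return flag
    text = input_text or ""
    hits = [f for f, pattern in FUNDER_PATTERNS.items() if re.search(pattern, text)]
    if len(hits) == 1:
        return hits[0]
    # Zero or several matches (an ERC draft often also mentions Horizon Europe):
    # ask the user exactly once.
    return ask()


funder = resolve_funder(None, "Proposal to the National Institutes of Health", ask=lambda: "nsf")
print(funder, len(FUNDER_SCHEMA[funder]))  # nih 6
```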

Classify the data into one of three classes (this drives Phases 2–3):
- Public (open survey, scraped public records, simulated): minimal restrictions.
- Restricted (admin/tax/Census, proprietary, licensed under a DUA): access procedures dominate.
- Human-subjects (PII, biospecimen-linked, survey with identifiers): IRB + disclosure avoidance dominate.

If the data is restricted or human-subjects, set sensitive = true and run Phase 2 in full. If it is purely public, Phase 2 reduces to a short paragraph.
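
The classification rule can be sketched as follows. The trait flags are hypothetical. The rule that the most restrictive dataset governs a multi-dataset project is also an assumption, since the skill text classifies the data as a whole.

```python
from dataclasses import dataclass

# Order from least to most restrictive.
RANK = {"public": 0, "restricted": 1, "human-subjects": 2}


@dataclass
class DatasetTraits:
    """Hypothetical flags describing one dataset."""
    has_identifiers: bool = False      # PII or survey with identifiers
    biospecimen_linked: bool = False
    admin_or_census: bool = False      # admin / tax / Census records
    proprietary: bool = False
    under_dua: bool = False            # licensed under a data use agreement


def classify(t):
    """Map one dataset's traits to public, restricted, or human-subjects."""
    if t.has_identifiers or t.biospecimen_linked:
        return "human-subjects"
    if t.admin_or_census or t.proprietary or t.under_dua:
        return "restricted"
    return "public"


def classify_project(datasets):
    """Return (data_class, sensitive); the most restrictive dataset wins."""
    classes = [classify(t) for t in datasets] or ["public"]
    worst = max(classes, key=RANK.__getitem__)
    return worst, worst != "public"


print(classify_project([DatasetTraits(), DatasetTraits(under_dua=True)]))
# ('restricted', True)
```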

Phase 1 — Scaffold sections from the funder profile

Generate the six house sections, mapped onto the funder's required headings:

- Data description & types: what data, from which source, at what volume, and which formats the project produces. Be specific: panel/admin microdata, RCT outcomes, event-study event files, replication intermediates (.rds/.dta/.parquet).
- Formats & metadata standards: open, non-proprietary formats where possible (.csv/.parquet over .dta), codebooks, and a named metadata standard (DDI, Dublin Core, or a domain schema). Name the standard; don't write "appropriate metadata".
- Storage & backup: during the project, encrypted institutional storage, a 3-2-1 backup (three copies, on two different media, one off-site), and version control for code only (never commit raw restricted data to git).
- Access & sharing: who can access the data, when, and under what terms. For restricted data this is the restricted-data access procedure (see Phase 2).
- Preservation & archiving: a named repository with a persistent identifier (see Phase 3).
- Roles & responsibilities: PI as data steward, data manager, institutional support, and a succession plan.
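
The 3-2-1 commitment in the storage section can be checked mechanically before it goes into the plan. The sketch below assumes each copy is described by a medium label and an off-site flag. The `Copy` type and the example locations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str   # hypothetical label, e.g. "lab-nas"
    medium: str     # e.g. "disk", "tape", "cloud"
    offsite: bool


def meets_3_2_1(copies):
    """At least 3 copies, on at least 2 distinct media, with at least 1 off-site."""
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


plan = [
    Copy("encrypted-workstation", "disk", offsite=False),   # working copy
    Copy("lab-nas", "disk", offsite=False),
    Copy("institutional-cloud", "cloud", offsite=True),
]
print(meets_3_2_1(plan))       # True
print(meets_3_2_1(plan[:2]))   # False: two copies, one medium, none off-site
```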

For any required field the input does not supply, write [CLARIFY: <specific question>] rather than fabricating — same convention as /preregister.
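
The [CLARIFY:] convention reduces to one rule. If extracted text exists, render the section from it. Otherwise, emit a placeholder that carries a specific question. In the sketch below, `render_section`, `draft`, and the sample text and questions are hypothetical, and the generated draft is assumed to use Markdown headings.

```python
def render_section(heading, extracted, question):
    """Render one section; never invent content for a missing field."""
    if extracted and extracted.strip():
        body = extracted.strip()
    else:
        body = f"[CLARIFY: {question}]"
    return f"## {heading}\n\n{body}\n"


def draft(sections):
    """sections: list of (heading, extracted_text_or_None, question) tuples."""
    return "\n".join(render_section(h, text, q) for h, text, q in sections)


def count_clarify(markdown):
    """Count unresolved placeholders for the Phase 5 summary."""
    return markdown.count("[CLARIFY:")


doc = draft([
    ("Data description & types", "Linked administrative panel (illustrative).", "What data?"),
    ("Roles & responsibilities", None, "Who is the named data steward and their successor?"),
])
print(count_clarify(doc))  # 1
```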

Phase 2 — Fold in disclosure-avoidance + IRB constraints (only if `sensitive = true`)

Pull the relevant rules from .claude/rules/confidential-data.md and weave them into the access & sharing and preservation sections:

- Restricted data → describe the access path, not the data. State the data provider, the DUA/restricted-use agreement, and how a replicator obtains access (e.g., FSRDC application, openICPSR restricted-access tier, provider application). The data itself is not deposited; the path to it is.
- Human-subjects → IRB + minimization. Reference the IRB protocol number (or [CLARIFY:]), the consent terms governing sharing, and the de-identification plan. Shared outputs are de-identified as the consent terms require.
- Disclosure avoidance for any released microdata or tables. Name the technique: suppression of small cells (n < threshold), rounding, top-coding, noise infusion, or aggregation. For tabular output, state the minimum cell-count rule. Defer the actual pre-release scan to /disclosure-check, and say so in the plan ("released outputs pass /disclosure-check before deposit").
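
Two of the named techniques, small-cell suppression and top-coding, are short enough to show. The sketch is illustrative only. The threshold of 10 is a placeholder, so use the data provider's rule. The code performs primary suppression only and adds no complementary suppression, so suppressed cells could still be recovered from margins. It does not replace /disclosure-check.

```python
def suppress_small_cells(counts, threshold=10, marker="suppressed"):
    """Primary suppression: replace every cell count below `threshold`."""
    return {cell: (marker if n < threshold else n) for cell, n in counts.items()}


def top_code(values, cap):
    """Top-coding: values above `cap` are reported as `cap`."""
    return [min(v, cap) for v in values]


# Hypothetical tabulation keyed by (county, arm).
table = {("county A", "treated"): 4, ("county A", "control"): 37,
         ("county B", "treated"): 12, ("county B", "control"): 9}
print(suppress_small_cells(table))
print(top_code([42_000, 95_000, 1_250_000], cap=250_000))  # [42000, 95000, 250000]
```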

Phase 3 — Fold in the computational-environment + replication-package plan

The DMP should commit to reproducibility, not just data deposit:

- Environment capture. State that the computational environment will be captured (R: sessionInfo() / renv.lock; Stata: version plus the ado-file dependencies of the .do files; Python: requirements.txt or a container). Point to /capture-environment as the mechanism. The AEA Data Editor and DCAS (the Data and Code Availability Standard) expect this.
- Replication package. Commit to depositing a replication package (code + non-restricted data + a master run script + README) in a trusted repository. Point to /replication-package as the builder.
- Repository choice: match the data class.
  - Economics / social science → openICPSR (AEA's home; DCAS-compliant) or Harvard Dataverse.
  - Restricted data → openICPSR restricted-access tier or the provider's enclave (FSRDC); deposit code + metadata, not the microdata.
  - Domain repositories → field-specific archives (e.g., ICPSR proper, GenBank) where the funder or community expects them; Zenodo is a generalist option for code.
- Persistent identifier and timeline. State the DOI and the deposit timeline (e.g., "at publication" or "within 12 months of project end"). NIH expects data to be shared no later than publication or the end of the award period, whichever comes first.
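
The repository defaults and the NIH timing rule can be written as two small functions. The sketch makes these assumptions:

- The field labels are hypothetical.
- Human-subjects outputs are deposited only after the Phase 2 de-identification, so they follow the field default.
- A funder-mandated repository overrides everything here.

```python
from datetime import date


def default_repository(data_class, field):
    """Phase 3 defaults; a funder-mandated repository always takes precedence."""
    if data_class == "restricted":
        return "openICPSR restricted-access tier or provider enclave (code + metadata only)"
    if field in {"economics", "social science"}:
        return "openICPSR or Harvard Dataverse"
    return "[CLARIFY: Which domain repository does the funder or community expect?]"


def nih_sharing_deadline(publication, award_end):
    """Share no later than publication or award end, whichever comes first.

    Either date may be None (e.g. no publication scheduled yet).
    """
    known = [d for d in (publication, award_end) if d is not None]
    return min(known) if known else None


print(default_repository("public", "economics"))                   # openICPSR or Harvard Dataverse
print(nih_sharing_deadline(date(2027, 3, 1), date(2028, 8, 31)))  # 2027-03-01
print(nih_sharing_deadline(None, date(2028, 8, 31)))               # 2028-08-31
```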

Phase 4 — Post-flight (skip with `--no-verify`)

If the draft cites a funder policy or standard by name or number (e.g., "per NIH NOT-OD-21-013", "DCAS v1"), invoke /verify-claims via Task to confirm each citation resolves. The forked claim-verifier receives only the extracted citations, never the draft itself. Surface any FAIL or PARTIAL result.
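
Because the verifier stays blind to the draft, the citations are extracted first and only those strings are passed to /verify-claims. In the sketch below, the two regular expressions cover the examples named above. They are assumptions, not an exhaustive list of policy identifiers.

```python
import re

# Hypothetical patterns for the citation styles named in Phase 4.
CITATION_PATTERNS = [
    r"\bNOT-OD-\d{2}-\d{3}\b",        # NIH Guide notices, e.g. NOT-OD-21-013
    r"\bDCAS v\d+(?:\.\d+)?\b",       # AEA Data and Code Availability Standard
]


def extract_policy_citations(draft):
    """Return unique citation strings in pattern order; the draft itself stays local."""
    found = []
    for pattern in CITATION_PATTERNS:
        for match in re.findall(pattern, draft):
            if match not in found:
                found.append(match)
    return found


text = "Sharing follows NIH NOT-OD-21-013 and DCAS v1; see NOT-OD-21-013 again."
print(extract_policy_citations(text))  # ['NOT-OD-21-013', 'DCAS v1']
```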

Phase 5 — Output

Write the draft to quality_reports/dmp/YYYY-MM-DD_<funder>_<slug>.md and a funder checklist alongside it.

    ✓ DMP draft saved: quality_reports/dmp/<file>.md
      Funder: <nsf|nih|erc|horizon>   Data class: <public|restricted|human-subjects>
      Sections: <count> total — <complete> complete, <clarify> with [CLARIFY:] placeholders
      Disclosure/IRB folded in: <yes (Phase 2) | n/a — public data>
      Repository: <openICPSR | Dataverse | domain repo>   PID: <DOI planned | [CLARIFY:]>
      Policy citations verified: <PASS>/<PARTIAL>/<FAIL>  (or "none to verify")
      Next: resolve [CLARIFY:] items, then paste into <DMPTool | NIH ASSIST | Horizon portal>
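
The output filename follows the `YYYY-MM-DD_<funder>_<slug>.md` pattern. The sketch below assumes the slug comes from lowercasing a project title and collapsing each run of non-alphanumeric characters into a hyphen. That derivation and `dmp_path` are hypothetical.

```python
import re
from datetime import date
from pathlib import Path


def dmp_path(funder, title, today=None, root="quality_reports/dmp"):
    """Build quality_reports/dmp/YYYY-MM-DD_<funder>_<slug>.md (slug rule assumed)."""
    today = today or date.today()
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return Path(root) / f"{today.isoformat()}_{funder}_{slug}.md"


print(dmp_path("nih", "Tax Records & Labor Supply", today=date(2025, 1, 15)).as_posix())
# quality_reports/dmp/2025-01-15_nih_tax-records-labor-supply.md
```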

The funder checklist is a table: each required section → present? → complete / [CLARIFY:], so the user sees at a glance whether the plan will pass the funder's compliance check.

Exit behavior

- All required sections present, zero [CLARIFY:] → "DMP READY", checklist all green.
- Any required section unresolved → "INCOMPLETE — N MUST items unresolved", listed in the checklist. The draft is still written (so the user can fill it in), but not marked ready.

This skill does not block anything; it produces a document. The gate is the funder's, not ours.
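
The checklist and the exit verdict follow directly from the drafted sections. The sketch assumes the draft is held as a dict from required heading to section text. `checklist` and `verdict` are hypothetical names. The verdict is returned as a (status, count) pair rather than the final message string.

```python
def checklist(required, sections):
    """Return Markdown table rows and the number of unresolved required sections."""
    rows = ["| Required section | Present? | Status |", "|---|---|---|"]
    unresolved = 0
    for heading in required:
        text = sections.get(heading)
        present = text is not None
        complete = present and "[CLARIFY:" not in text
        if not complete:
            unresolved += 1
        status = "complete" if complete else ("[CLARIFY:]" if present else "missing")
        rows.append(f"| {heading} | {'yes' if present else 'no'} | {status} |")
    return rows, unresolved


def verdict(unresolved):
    """The draft is always written; only the label changes."""
    return ("DMP READY", 0) if unresolved == 0 else ("INCOMPLETE", unresolved)


rows, n = checklist(
    ["data type", "standards", "oversight"],
    {"data type": "Survey microdata.", "standards": "[CLARIFY: Which metadata standard?]"},
)
print("\n".join(rows))
print(verdict(n))  # ('INCOMPLETE', 2)
```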

Cross-references

- .claude/rules/confidential-data.md: restricted-data / IRB / disclosure-avoidance rules folded in at Phase 2.
- .claude/skills/disclosure-check/SKILL.md: the pre-release disclosure scan that the plan commits released outputs to.
- .claude/skills/capture-environment/SKILL.md: the environment-capture mechanism Phase 3 references.
- .claude/skills/replication-package/SKILL.md: the replication-package builder Phase 3 commits to.
- .claude/skills/grant-proposal/SKILL.md: calls this skill for the proposal's data-management section.
- .claude/skills/preregister/SKILL.md: sibling document generator; shares the MUST/[CLARIFY:] and post-flight conventions.
- .claude/rules/replication-protocol.md: the reproducibility contract the deposited package must satisfy.

What this skill does NOT do

- Submit the plan. It writes a Markdown draft; the user pastes it into DMPTool / NIH ASSIST / the Horizon portal.
- Run the disclosure scan or build the package. It commits the project to /disclosure-check, /capture-environment, and /replication-package and references them; it does not execute them.
- Write the IRB protocol. It references the protocol number and consent terms; the protocol is authored separately.
- Choose a repository for you when the funder mandates one. If NIH names a domain repository for your data type, that mandate overrides the Phase 3 defaults, and the skill flags it as [CLARIFY:] rather than guessing.