name: dataclaw description: > Export Claude Code, Codex, and other coding-agent conversation history to Hugging Face. Use when the user asks about exporting conversations, uploading to Hugging Face, configuring DataClaw, reviewing PII/secrets in exports, or managing their dataset. allowed-tools: Bash(dataclaw *), Bash(hf auth login ), Bash(pip install dataclaw), Bash(grep *)
DataClaw Skill
THE RULE
Every dataclaw command outputs next_steps. FOLLOW THEM.
Do not memorize the flow. Do not skip steps. Do not improvise.
Run the command -> read the output -> follow next_steps. That's it.
Runtime guidance follows this checklist:
- Install
- Install skill
- Prep 3A. Choose source scope 3B. Choose project scope 3C. Set redacted strings
- Export locally
- Review and confirm
- Publish
The CLI tracks your stage as 1-4: auth -> configure -> review -> done.
dataclaw export (push) is gated - you must run dataclaw confirm first or it will refuse.
Getting Started
Run dataclaw status (or dataclaw prep for full details) and follow the next_steps.
Output Format
dataclaw prep,dataclaw config,dataclaw status, anddataclaw confirmoutput pure JSONdataclaw exportoutputs human-readable text followed by---DATACLAW_JSON---and a JSON block- Always parse the JSON and act on
next_steps
Key fields:
stage/stage_number/total_stages- where you arenext_steps- follow these in ordernext_command- the single most important command to run next (null if user input needed first)
PII Audit (Stage 5)
After dataclaw export --no-push, follow the next_steps in the JSON output. The flow is:
- Ask the user their full name - then grep the export for it
- Run the pii_commands from the JSON output and review results with the user
- Ask the user what else to look for - company names, client names, private URLs, other people's names, custom domains
- Deep manual scan - sample ~20 sessions (beginning, middle, end) and look for anything sensitive the regex missed
- Fix and re-export if anything found:
dataclaw config --redact "string"thendataclaw export --no-push - Run
dataclaw confirmwith text attestations - pass--full-name,--attest-full-name,--attest-sensitive, and--attest-manual-scan. It runs PII scan, verifies attestations, shows project breakdown, and unlocks pushing. - Push only after explicit user confirmation:
dataclaw export --publish-attestation "User explicitly approved publishing to Hugging Face."
Commands Reference
dataclaw status # Show current stage and next steps
dataclaw prep # Discover projects, check HF auth
dataclaw prep --source all # Prep with all sources explicitly selected
dataclaw prep --source claude # Prep using one supported source key (for example claude)
dataclaw confirm --full-name "NAME" --attest-full-name "..." --attest-sensitive "..." --attest-manual-scan "..." # Scan PII, verify attestations, unlock pushing
dataclaw confirm --file /path/to/file.jsonl --full-name "NAME" --attest-full-name "..." --attest-sensitive "..." --attest-manual-scan "..." # Confirm a specific export file
dataclaw list # List all projects with exclusion status
dataclaw list --source all # List all sources
dataclaw list --source claude # List projects for one supported source key
dataclaw config # Show current config
dataclaw config --repo user/my-dataset # Set HF repo
dataclaw config --source all # REQUIRED source scope selection (<source|all>; for example claude or codex)
dataclaw config --exclude "a,b" # Add excluded projects (appends)
dataclaw config --redact "str1,str2" # Add strings to redact (appends)
dataclaw config --redact-usernames "u1,u2" # Add usernames to anonymize (appends)
dataclaw config --confirm-projects # Mark project selection as confirmed
dataclaw export --publish-attestation "..." # Export and push (requires dataclaw confirm first)
dataclaw export --no-push # Export locally only
dataclaw export --source all --no-push # Export all configured sources locally
dataclaw export --source claude --no-push # Export one supported source scope locally
dataclaw export --all-projects # Include everything (ignore exclusions)
dataclaw export --no-thinking # Exclude extended thinking blocks
dataclaw export -o /path/to/file.jsonl # Custom output path
dataclaw update-skill claude # Install/update the dataclaw skill for Claude Code
Gotchas
- Never run bare
hf auth login- it's interactive and will hang. Always use--token. --exclude,--redact,--redact-usernamesAPPEND - they never overwrite. Safe to call repeatedly.- Source selection is REQUIRED before export - explicitly set
dataclaw config --source <source|all>(for exampleclaudeorcodex), or pass--source ...on export. dataclaw prepoutputs pure JSON - parse it directly.- Always export with
--no-pushfirst - review before publishing. dataclaw export(push) reuses the exact file reviewed bydataclaw confirm- if that file is missing, re-export locally and re-confirm before pushing.dataclaw export(push) requiresdataclaw confirmfirst - it will refuse otherwise. Re-exporting with--no-pushresets this.- PII audit is critical - automated redaction is not foolproof.
- Large exports take time - 500+ sessions may take 1-3 minutes. Use a generous timeout.
Prerequisite
command -v dataclaw >/dev/null 2>&1 && echo "dataclaw: installed" || echo "NOT INSTALLED - run: pip install dataclaw"