name: pyarchivist
description: Archive online content into archives/ with automatic index.md updates via the pyarchivist tool.
pyarchivist Workflow
Continuous improvement: see continuous_improvement.md in this folder for a history of feedback and tips for using pyarchivist.
Use this skill when archiving web content, media, or online documents into the knowledge base.
What pyarchivist does
pyarchivist/ is a git submodule that automatically archives online content to archives/ and updates index.md files with metadata (source URL, timestamp, file hash).
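As an illustration of the three metadata fields named above, here is a minimal sketch of how such a record could be assembled. The `build_metadata` helper, the field names, and the choice of SHA-256 are assumptions made for this example; they are not taken from pyarchivist itself.

```python
import hashlib
from datetime import datetime, timezone


def build_metadata(source_url: str, content: bytes) -> dict:
    """Hypothetical helper: collect source URL, timestamp, and content hash."""
    return {
        "source_url": source_url,
        # UTC timestamp in ISO 8601 form, truncated to whole seconds.
        "archived_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        # SHA-256 is an assumption; pyarchivist may use a different hash.
        "sha256": hashlib.sha256(content).hexdigest(),
    }


meta = build_metadata("https://example.org/article", b"example page body")
print(meta)
```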
When to use
- Archiving articles, web pages, or media from online sources
- Storing Wikimedia Commons images alongside notes
- Creating permanent backups of time-sensitive online content
- Auto-maintaining archives/index.md files
Basic workflow
- Use pyarchivist's interface (CLI or Python API) to download and archive content
- Specify target directory (archives/Wikimedia Commons/ for media, archives/sparse/ for documents)
- pyarchivist auto-generates metadata (timestamp, source URL, content hash)
- index.md entries are auto-created with source and timestamp information
- Filenames are generated consistently (hash-based for deduplication or descriptive for media)
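To show why hash-based filenames deduplicate content, the sketch below names a file after the hash of its bytes, so identical downloads map to the same name regardless of where they came from. `hashed_filename` is a hypothetical helper, and both SHA-256 and keeping the URL's file extension are assumptions; pyarchivist's actual naming scheme may differ.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlparse


def hashed_filename(content: bytes, source_url: str) -> str:
    """Hypothetical helper: name a file by content hash plus the URL's extension."""
    digest = hashlib.sha256(content).hexdigest()
    # Take the extension from the URL path only, ignoring any query string.
    suffix = PurePosixPath(urlparse(source_url).path).suffix
    return f"{digest}{suffix}"


first = hashed_filename(b"same bytes", "https://example.org/a.pdf")
mirror = hashed_filename(b"same bytes", "https://example.org/mirror/a.pdf")
print(first == mirror)  # identical content yields an identical filename
```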
Best practices
- Let pyarchivist handle file naming and index.md updates
- Use archives/Wikimedia Commons/ for images/media with descriptive names
- Use archives/sparse/ for miscellaneous content (filenames are hash-based and generated automatically)
- Always preserve source URL and timestamp metadata in index.md
- Check that index.md was updated correctly after archiving
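The final check in the list above can be scripted. The sketch below assumes only that an index.md entry contains the archived filename and the source URL somewhere in its text; the entry layout written in the demo is made up for illustration and is not pyarchivist's real format. `index_mentions` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def index_mentions(index_path: Path, filename: str, source_url: str) -> bool:
    """Hypothetical check: True if index.md references both the file and its URL."""
    if not index_path.is_file():
        return False
    text = index_path.read_text(encoding="utf-8")
    return filename in text and source_url in text


# Demo against a throwaway index.md with an invented entry layout.
with tempfile.TemporaryDirectory() as tmp:
    index = Path(tmp) / "index.md"
    index.write_text(
        "- abc123.pdf | https://example.org/a.pdf | 2024-01-01T00:00:00+00:00\n",
        encoding="utf-8",
    )
    found = index_mentions(index, "abc123.pdf", "https://example.org/a.pdf")
    missing = index_mentions(index, "zzz999.pdf", "https://example.org/other.pdf")

print(found, missing)
```

A substring match like this only confirms that an entry exists; it does not validate the timestamp or hash, which would require knowing the real entry format.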
Typical command pattern
uv run -m pyarchivist [options] --target <archives/folder> <source_url>
(Exact interface depends on pyarchivist's implementation)
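When the command is launched from a script, building it as an argument list avoids quoting problems, which matters here because archives/Wikimedia Commons/ contains a space. The sketch below only assembles the pattern shown above and prints it; it does not run anything. `build_command` is a hypothetical helper, and the real options accepted by pyarchivist are not known from this document.

```python
import shlex
from typing import List, Sequence


def build_command(target: str, source_url: str, options: Sequence[str] = ()) -> List[str]:
    """Hypothetical helper: assemble the documented command pattern as an argv list."""
    return ["uv", "run", "-m", "pyarchivist", *options, "--target", target, source_url]


cmd = build_command("archives/Wikimedia Commons/", "https://example.org/image.jpg")
# shlex.join renders a shell-safe string; the path with a space gets quoted.
print(shlex.join(cmd))
```

Passing the list form to `subprocess.run` would hand each argument over unchanged, so the space in the directory name needs no manual escaping.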
When in doubt
Consult the pyarchivist documentation or ask the user for guidance on specific archiving needs.