name: asta-documents
description: Local document metadata index for scientific documents
Asta Documents Management
Use this skill when the user asks to store a document "in Asta" or retrieve "from Asta". Use it when the
user references an "Asta document" or anything with an asta:// URI.
This skill provides complete document management functionality for tracking research papers, documentation, and resources using the asta-documents CLI.
What it does: Track document metadata (URLs, summaries, tags) in a local index. Think of it as a smart bookmark manager with powerful search capabilities.
Installation
1. Install the CLI Tool
# Install globally using uv
uv tool install git+https://github.com/allenai/asta-resource-repo.git
Prerequisites: Python 3.10+ and uv package manager
Verify installation with asta-documents --help
Quick Command Reference
Add the --json flag to any command for machine-readable output.
# List documents
asta-documents list
asta-documents list --tags="ai,research"
# Search document summaries
asta-documents search "query"
# Search by specific field
asta-documents search "title words" --name
asta-documents search "ai,nlp" --tags
asta-documents search ".year > 2020" --extra
# Add document
asta-documents add <url> --name="Title" --summary="Description" --tags="tag1,tag2" --extra='{"author": "Smith et al", "year": 2024, "venue": "NeurIPS"}'
# Get document metadata
asta-documents get <uuid>
# Update document
asta-documents update <uuid> --name="New Title" --tags="new,tags"
# Fetch document content
asta-documents fetch <uuid> -o /tmp/document.pdf
# Manage tags
asta-documents add-tags <uuid> --tags="new,tags"
asta-documents remove-tags <uuid> --tags="old,tags"
# Cache management
asta-documents cache list
asta-documents cache stats
asta-documents cache clean --days 7
# Summary information (document counts)
asta-documents show
Always use the command line interface for all operations to ensure proper index management and caching. Avoid direct read/write operations on the index file.
Fetch Document Content
The index stores metadata only. The content of a document is retrievable via its URL. The fetch command retrieves the content and caches it locally for future use.
Fetch to file (with automatic caching):
asta-documents fetch <uuid> -o /tmp/document.pdf
Cache Management
List cached items:
asta-documents cache list
Show cache statistics:
asta-documents cache stats
Clean old cache entries:
# Remove items older than N days
asta-documents cache clean --days 14
Clear entire cache:
asta-documents cache clear
asta-documents cache clear -y # Skip confirmation
Show specific item details:
asta-documents cache info <hash>
Common Workflows
Workflow 1: Add and Organize Papers
# Add research paper
asta-documents add https://arxiv.org/pdf/1706.03762.pdf \
  --name="Attention Is All You Need" \
  --summary="Seminal paper introducing Transformer architecture" \
  --tags="ai,research,nlp,transformers" \
  --mime-type="application/pdf" \
  --extra='{"author": "Vaswani et al", "year": 2017, "venue": "NeurIPS"}'
# Search papers by tag
asta-documents search "transformers" --tags
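When the add command is generated from a script, quoting the --extra JSON by hand is error-prone. The following is a minimal sketch, assuming only the flags shown above (--name, --summary, --tags, --extra); the helper name build_add_command is hypothetical and not part of the CLI.

```python
import json
import shlex


def build_add_command(url, name, summary, tags, extra):
    """Return the argv list for an `asta-documents add` call."""
    return [
        "asta-documents", "add", url,
        f"--name={name}",
        f"--summary={summary}",
        f"--tags={','.join(tags)}",
        # json.dumps guarantees valid JSON for --extra (double quotes, no trailing commas).
        f"--extra={json.dumps(extra)}",
    ]


cmd = build_add_command(
    "https://arxiv.org/pdf/1706.03762.pdf",
    name="Attention Is All You Need",
    summary="Seminal paper introducing Transformer architecture",
    tags=["ai", "nlp", "transformers"],
    extra={"author": "Vaswani et al", "year": 2017, "venue": "NeurIPS"},
)
# shlex.join quotes each argument so the printed line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) instead of printing it avoids shell quoting altogether.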
Workflow 2: Search and Fetch
# Search for relevant documents
asta-documents search "transformer architecture" --show-scores
# Get metadata for top result (using UUID from search results)
asta-documents get 6MNxGbWGRC
# Fetch content
asta-documents fetch 6MNxGbWGRC -o /tmp/paper.pdf -q
# Read with PDF support
# Read(/tmp/paper.pdf)
Workflow 3: Search with JSON Processing
# Search and extract UUIDs
RESULTS=$(asta-documents search "query" --json)
# Get first UUID (example with Python)
UUID=$(echo "$RESULTS" | python3 -c "import sys,json; results=json.load(sys.stdin); print(results[0]['result']['uuid'] if results else '')")
# Fetch that document
asta-documents fetch "$UUID" -o result.pdf
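For longer scripts, the inline python3 one-liner above can be replaced with a small function. This sketch assumes the same JSON shape the one-liner relies on (a list of objects with a nested result.uuid field); the sample payload and the function name first_uuid are hypothetical.

```python
import json


def first_uuid(raw):
    """Return the UUID of the top search hit, or None if there are no hits."""
    results = json.loads(raw)
    if not results:
        return None
    # Assumed shape, taken from the one-liner above: [{"result": {"uuid": ...}}, ...]
    return results[0]["result"]["uuid"]


sample = '[{"result": {"uuid": "6MNxGbWGRC"}}]'  # hypothetical search --json output
print(first_uuid(sample))
print(first_uuid("[]"))
```

Returning None for an empty result list lets the caller skip the fetch step instead of passing an empty UUID to the CLI.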
Workflow 4: Bulk Tag Management
# List documents with old tag
DOCS=$(asta-documents list --tags="old-tag" --json)
# For each, remove old tag and add new
for uuid in $(echo "$DOCS" | python3 -c "import sys,json; print('\\n'.join([d['uuid'] for d in json.load(sys.stdin)]))"); do
  asta-documents remove-tags "$uuid" --tags="old-tag"
  asta-documents add-tags "$uuid" --tags="new-tag"
done
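The same retagging loop can be planned in Python before anything is executed, which makes it easy to review the commands first. This is a sketch under the assumption that list --json returns a list of objects with a uuid field, as the loop above implies; the sample payload and the function name retag_commands are hypothetical.

```python
import json


def retag_commands(list_json, old_tag, new_tag):
    """Build the remove-tags/add-tags argv pairs for every listed document."""
    commands = []
    for doc in json.loads(list_json):
        uuid = doc["uuid"]
        commands.append(["asta-documents", "remove-tags", uuid, f"--tags={old_tag}"])
        commands.append(["asta-documents", "add-tags", uuid, f"--tags={new_tag}"])
    return commands


sample = '[{"uuid": "6MNxGbWGRC"}, {"uuid": "aB3dE9fGh1"}]'  # hypothetical list --json output
for argv in retag_commands(sample, "old-tag", "new-tag"):
    # Dry run: print each command; use subprocess.run(argv, check=True) to execute it.
    print(" ".join(argv))
```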
Workflow 5: Update Multiple Fields
# Get current metadata (using UUID)
asta-documents get 6MNxGbWGRC
# Update multiple fields
asta-documents update 6MNxGbWGRC \
  --name="Updated Title" \
  --summary="Updated summary with more details" \
  --tags="updated,revised,2025"
Workflow 6: Cache Maintenance
# Check cache usage
asta-documents cache stats
# List what's cached
asta-documents cache list
# Remove old entries if cache is large
asta-documents cache clean --days 7
# Verify cache reduction
asta-documents cache stats
Field-Specific Search
Asta uses different search strategies optimized for each document field:
--name (Name search):
- Simple case-insensitive word matching
- Splits query into words, matches any word in name
- Score = (matched words / total query words)
- Fast, no indexing needed
- Example:
asta-documents search "Attention" --name
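The name score described above can be illustrated with a few lines of Python. This is a sketch of the stated formula only, not the tool's actual code; in particular, whole-word matching on whitespace-separated words is an assumption, and name_score is a hypothetical name.

```python
def name_score(query, name):
    """Fraction of query words that appear as words in the name (case-insensitive)."""
    query_words = query.lower().split()
    if not query_words:
        return 0.0
    name_words = set(name.lower().split())
    matched = sum(1 for word in query_words if word in name_words)
    return matched / len(query_words)


# One of the two query words ("attention") occurs in the name, so this prints 0.5.
print(name_score("attention transformers", "Attention Is All You Need"))
```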
--tags (Tag search):
- Comma-separated tag matching
- Case-insensitive
- Score = (matched tags / total query tags)
- Finds documents with any matching tags
- Example:
asta-documents search "ai,nlp" --tags
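Tag scoring follows the same pattern, with the query split on commas instead of whitespace. Again this is only a sketch of the formula above, assuming tags are compared after trimming and lowercasing; tag_score is a hypothetical name.

```python
def tag_score(query, doc_tags):
    """Fraction of comma-separated query tags present on the document."""
    query_tags = [tag.strip().lower() for tag in query.split(",") if tag.strip()]
    if not query_tags:
        return 0.0
    have = {tag.lower() for tag in doc_tags}
    matched = sum(1 for tag in query_tags if tag in have)
    return matched / len(query_tags)


# The document carries "ai" but not "nlp", so this prints 0.5.
print(tag_score("ai,nlp", ["AI", "research"]))
```

A document with a score above zero has at least one matching tag, which corresponds to the "any matching tags" behavior described above.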
--summary (Summary search, default):
- Uses the best available method automatically, in this order:
  - Hybrid (BM25 + semantic embeddings) → best quality
  - BM25 (keyword relevance ranking) → fast, indexed
  - FTS5 (full-text search) → fallback
  - Simple (substring matching) → always available
- Optimized for natural language queries
- Understands semantic meaning when the hybrid method is available
- Example:
asta-documents search "papers about transformers"
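The automatic method selection amounts to picking the first available entry from a fixed preference list. The sketch below illustrates that idea only; the lowercase method labels and the function pick_search_method are hypothetical and do not correspond to CLI flags or internal names.

```python
def pick_search_method(available):
    """Return the first available method in the documented preference order."""
    preference = ["hybrid", "bm25", "fts5", "simple"]
    for method in preference:
        if method in available:
            return method
    # Simple substring matching is documented as always available.
    return "simple"


print(pick_search_method({"fts5", "simple"}))
print(pick_search_method({"hybrid", "bm25", "fts5", "simple"}))
```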
--extra (Extra metadata search):
- JSONPath-like query syntax
- Supported operators: >, >=, <, <=, ==, contains (numeric and string comparisons)
- Examples:
asta-documents search ".year > 2020" --extra
asta-documents search ".author contains Smith" --extra
asta-documents search ".venue == NeurIPS" --extra
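To make the query syntax concrete, here is a rough Python sketch of how a ".field operator value" expression could be evaluated against a document's extra metadata. The real parser is not shown in this document, so the regular expression, the numeric-versus-string rule, and the function name matches are all assumptions.

```python
import operator
import re

OPS = {
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "==": operator.eq,
    "contains": lambda value, needle: needle in value,
}
# Two-character operators are listed first so ">=" is not read as ">".
QUERY = re.compile(r"^\.(\w+)\s*(>=|<=|==|>|<|contains)\s*(.+)$")


def matches(query, extra):
    """Evaluate a '.field op value' query against one extra-metadata dict."""
    parsed = QUERY.match(query.strip())
    if parsed is None:
        raise ValueError(f"unsupported query: {query!r}")
    field, op, raw = parsed.groups()
    if field not in extra:
        return False
    value = extra[field]
    # Assumption: compare numerically when the stored value is a number.
    if isinstance(value, (int, float)) and op != "contains":
        try:
            return OPS[op](value, float(raw))
        except ValueError:
            return False
    # Otherwise compare as text.
    return OPS[op](str(value), raw.strip())


extra = {"author": "Smith et al", "year": 2024, "venue": "NeurIPS"}
for query in (".year > 2020", ".author contains Smith", ".venue == NeurIPS"):
    print(query, matches(query, extra))
```

All three example queries match the sample metadata, which is the same --extra payload used in the add command of the quick reference.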
Output Formats
Human-readable (default):
- Formatted tables and lists
- Color-coded (if terminal supports)
- Progress messages
JSON (--json flag):
- Machine-readable
- All fields included
- For scripting and integration
Verbose (-v flag for list):
- Shows all metadata fields
- Includes extra metadata
- Full URIs and timestamps
Best Practices
- Use descriptive summaries: They're indexed for search
- Tag consistently: Establish a tagging scheme
- Use extra metadata: Store author, year, venue for papers
- Let fetch handle caching: Don't manually check cache
- Use JSON for scripting: More reliable than parsing text
- Use quiet mode in scripts: -q suppresses progress messages
Troubleshooting
"asta-documents: command not found"
- Verify installation:
uv tool list | grep asta
- Add to PATH:
export PATH="$HOME/.local/bin:$PATH"
- Reinstall:
uv tool install --reinstall git+https://github.com/allenai/asta-resource-repo.git
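A script that depends on the CLI can run the same check up front. This is a small sketch using only the Python standard library; it assumes the tool is installed under ~/.local/bin when it is not on PATH, as the export line above suggests, and find_cli is a hypothetical helper name.

```python
import os
import shutil


def find_cli(name="asta-documents"):
    """Return the CLI's path, also checking ~/.local/bin, or None if absent."""
    found = shutil.which(name)
    if found:
        return found
    # Fall back to the directory the PATH fix above adds.
    local_bin = os.path.expanduser("~/.local/bin")
    return shutil.which(name, path=local_bin)


path = find_cli()
print(path if path else "asta-documents not found; see the steps above")
```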
"Document not found"
- Verify URI:
asta-documents list --json | grep <partial-uri>
- Check namespace: URIs are namespace-specific
- Ensure there is an index file at .asta/documents/index.yaml
"Fetch failed"
- Check that the URL is accessible:
curl -I <url>
- Try force refresh: --force
- Check network connection
"Search returns no results"
- Try simpler query terms
- Search by name or tags for exact matching:
asta-documents search "keyword" --name
asta-documents search "tag" --tags
- Check if documents exist:
asta-documents list
"Cache is large"
- Check size:
asta-documents cache stats
- Clean old entries:
asta-documents cache clean --days 7
- Clear if needed:
asta-documents cache clear -y
Updating
Update the CLI tool:
uv tool install --reinstall git+https://github.com/allenai/asta-resource-repo.git
Update the skill:
curl -o ~/.claude/skills/asta-documents.md https://raw.githubusercontent.com/allenai/asta-resource-repo/main/skills/asta-documents.md