name: mosaic
description: >
  Expert knowledge of MOSAIC (Multi-source Scientific Article Indexer and Collector), a CLI tool for searching, downloading, and managing scientific papers from 21 sources with a single command. Use this skill whenever the user asks about: building a bibliography programmatically, searching for papers across multiple sources, downloading OA PDFs, formatting citation strings (BibTeX/APA/MLA/Chicago), exporting to BibTeX/Zotero/Obsidian, interpreting mosaic --json output in AI agent or CI workflows, RAG over a paper library, semantic search over a local paper library, finding similar papers, analysing citation networks, comparing papers across structured dimensions, or any task that involves mosaic search/get/cite/similar/ask/chat/index/network/compare/skill commands. When in doubt, trigger this skill: it is better to consult it unnecessarily than to miss it.
MOSAIC Expert Knowledge
MOSAIC fans out paper searches across 21 scientific sources, deduplicates results by DOI, caches them in a local SQLite database, and can download OA PDFs. It provides structured JSON output for AI agent and CI workflows.
CLI Commands
mosaic search "query" # search all enabled sources
mosaic get <doi> # fetch metadata + download PDF by DOI
mosaic cite <doi> # format citation string (BibTeX/APA/MLA/Chicago/…)
mosaic similar <doi|arxiv_id> # find related papers via OpenAlex + Semantic Scholar
mosaic network # explore citation network, identify hubs and clusters
mosaic compare # structured comparison table across cached papers (LLM or metadata)
mosaic index # build/update vector index for RAG
mosaic ask "question" # RAG Q&A over cached papers
mosaic chat # interactive multi-turn RAG session
mosaic config --show # view or edit configuration
mosaic cache list # inspect local SQLite cache
mosaic cache stats # cache statistics
mosaic notebook create "topic" # create a Google NotebookLM notebook
mosaic auth login elsevier # browser session for authenticated PDF access
mosaic skill install # install this Claude Code skill to the current project
mosaic skill install --global # install to ~/.claude/skills/ (available in all projects)
mosaic skill show # print skill content to stdout
JSON Output (scripting / AI agents)
Add --json to search or similar for machine-readable stdout. All rich table output is
suppressed; results are written to stdout as a single JSON object. Papers are still saved to the
local cache so subsequent --cached queries work immediately.
mosaic search "attention mechanism" --max 20 --oa-only --json
mosaic similar 10.48550/arXiv.1706.03762 --max 15 --json
JSON schema — search
{
"status": "ok",
"query": "attention mechanism",
"count": 3,
"papers": [
{
"title": "Attention Is All You Need",
"authors": ["Vaswani, Ashish", "Shazeer, Noam"],
"year": 2017,
"doi": "10.48550/arXiv.1706.03762",
"arxiv_id": "1706.03762",
"pii": null,
"abstract": "The dominant sequence transduction models...",
"journal": null,
"volume": null,
"issue": null,
"pages": null,
"pdf_url": "https://arxiv.org/pdf/1706.03762",
"source": "arxiv",
"is_open_access": true,
"url": "https://arxiv.org/abs/1706.03762",
"citation_count": 50000,
"relevance_score": null,
"uid": "10.48550/arxiv.1706.03762"
}
],
"errors": []
}
status is "ok" whenever the command itself succeeds; entries in errors are non-fatal warnings from individual sources, not fatal failures.
uid is the deduplication key used by the cache: prefers DOI → arxiv_id → pii → title slug.
Fields are always present; unavailable values are null.
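The notes above translate fairly directly into consumer code. The following is a minimal sketch, assuming the payload shape shown in the schema; `summarize_payload` is a hypothetical helper (not part of MOSAIC), the sample payload is trimmed to the fields it uses, and the warning string is illustrative only, since the exact format of `errors` entries is not documented here.

```python
import json


def summarize_payload(raw: str) -> dict:
    """Parse `mosaic search --json` output and summarise it (hypothetical helper)."""
    data = json.loads(raw)
    papers = data["papers"]
    return {
        # Entries in "errors" are non-fatal, so report them instead of raising.
        "warnings": list(data["errors"]),
        # Every field is present, so direct indexing is safe; only the value may be None.
        "with_doi": [p["uid"] for p in papers if p["doi"] is not None],
        "without_pdf": [p["uid"] for p in papers if p["pdf_url"] is None],
        # citation_count may be null, so coerce it to 0 before summing.
        "total_citations": sum(p["citation_count"] or 0 for p in papers),
    }


# Illustrative payload, trimmed to the fields used above.
sample = json.dumps({
    "status": "ok",
    "query": "attention mechanism",
    "count": 2,
    "papers": [
        {"uid": "10.48550/arxiv.1706.03762", "doi": "10.48550/arXiv.1706.03762",
         "pdf_url": "https://arxiv.org/pdf/1706.03762", "citation_count": 50000},
        {"uid": "some-title-slug", "doi": None, "pdf_url": None, "citation_count": None},
    ],
    "errors": ["ieee: request timed out"],  # assumed format, for illustration
})

summary = summarize_payload(sample)
print(summary)
```

Keying everything on `uid` rather than `doi` keeps papers without a DOI addressable, which matches the deduplication key described above.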
JSON schema — similar
Same as above but with an extra "seed" key:
{
"status": "ok",
"seed": "Attention Is All You Need",
"query": "10.48550/arXiv.1706.03762",
"count": 10,
"papers": [...],
"errors": []
}
Exit code is 0 on success, 1 on fatal failure (bad identifier, no results).
Agent scripting — bash
result=$(mosaic search "transformer architecture" --max 30 --oa-only --json)
count=$(echo "$result" | jq '.count')
dois=$(echo "$result" | jq -r '.papers[].doi | select(. != null)')
pdfs=$(echo "$result" | jq -r '.papers[] | select(.pdf_url != null) | .doi')
# grep -c . counts non-empty lines, so an empty $pdfs reports 0 instead of 1
echo "Found $count papers, $(echo "$pdfs" | grep -c .) with PDF"
Agent scripting — Python
import json, subprocess

def mosaic_json(args: list[str]) -> dict:
    """Run mosaic with the given arguments and parse its JSON stdout."""
    r = subprocess.run(["mosaic"] + args, capture_output=True, text=True, check=False)
    # A non-zero exit with empty stdout means there is no JSON to parse.
    if r.returncode != 0 and not r.stdout.strip():
        raise RuntimeError(f"mosaic failed: {r.stderr}")
    return json.loads(r.stdout)

# Search and parse
data = mosaic_json(["search", "FDTD high-order", "--max", "25", "--json"])
papers = data["papers"]
oa_papers = [p for p in papers if p["is_open_access"]]
print(f"Found {data['count']} papers, {len(oa_papers)} open-access")

# Find similar to the most-cited result (default=None covers an empty result list)
top = max(papers, key=lambda p: p["citation_count"] or 0, default=None)
if top and top["doi"]:
    related = mosaic_json(["similar", top["doi"], "--max", "10", "--json"])
search Command
mosaic search "query" [OPTIONS]
| Option | Default | Description |
|---|---|---|
| --max, -n | 10 | Max results per source |
| --source, -s | (all) | Limit to one source shorthand (see table below) |
| --oa-only | off | Open-access papers only |
| --pdf-only | off | Papers with downloadable PDF only |
| --year, -y | — | Year filter: "2020", "2020-2024", or "2020,2022,2024" |
| --author, -a | — | Author name filter (repeatable) |
| --journal, -j | — | Journal name filter (substring match) |
| --field, -f | all | Scope query to "title", "abstract", or "all" |
| --raw-query | — | Send query directly to source API, bypass field transforms |
| --sort | — | Sort order: "citations", "year", or "relevance" |
| --download, -d | off | Download available PDFs after search |
| --output, -o | — | Save results to file (.md, .csv, .json, .bib, .ris); repeatable |
| --cached | off | Search only the local cache (no network requests) |
| --semantic | off | Search local vector index by meaning (requires mosaic index + embedding model); shows Sim. column |
| --downloaded-only | off | Restrict to papers with a locally downloaded PDF (only with --cached or --semantic) |
| --prefer-cache | off | Prefer richer cached records over freshly fetched data |
| --stats | off | Print per-source counts and deduplication stats |
| --zotero | off | Export results to Zotero |
| --zotero-collection | — | Zotero collection name (created if missing) |
| --obsidian | off | Export results as notes to an Obsidian vault |
| --json | off | Emit structured JSON to stdout (suppresses table output) |
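When options are assembled programmatically, repeatable flags such as --author and --output are easy to get wrong. The sketch below is one possible way to build the argument list; `build_search_args` is a hypothetical helper (not part of MOSAIC) that uses only flags from the table above, and the author names are placeholders.

```python
from typing import Optional, Sequence


def build_search_args(
    query: str,
    *,
    max_results: Optional[int] = None,
    source: Optional[str] = None,
    oa_only: bool = False,
    year: Optional[str] = None,
    authors: Sequence[str] = (),
    outputs: Sequence[str] = (),
    as_json: bool = False,
) -> list[str]:
    """Build an argv list for `mosaic search` (hypothetical helper)."""
    args = ["mosaic", "search", query]
    if max_results is not None:
        args += ["--max", str(max_results)]
    if source:
        args += ["--source", source]
    if oa_only:
        args.append("--oa-only")
    if year:
        args += ["--year", year]
    # Repeatable flags are emitted once per value.
    for name in authors:
        args += ["--author", name]
    for path in outputs:
        args += ["--output", path]
    if as_json:
        args.append("--json")
    return args


cmd = build_search_args(
    "diffusion models",
    max_results=20,
    oa_only=True,
    year="2020-2024",
    authors=["Ho", "Song"],
    outputs=["refs.bib", "summary.md"],
)
print(cmd)
```

The resulting list can be passed directly to `subprocess.run(cmd)`; passing a list rather than a shell string avoids quoting problems with multi-word queries.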
Source Shorthands
| Shorthand | Source | Coverage | Auth |
|---|---|---|---|
| arxiv | arXiv | Physics, CS, Math, Biology | None |
| ss | Semantic Scholar | 214 M papers, all disciplines | Optional key |
| sd | ScienceDirect | Elsevier journals & books | API key or browser |
| sp | Springer (browser) | Springer, Nature (browser) | [browser] extra |
| springer | Springer API | OA Springer/Nature articles | Free API key |
| doaj | DOAJ | 8 M+ fully OA articles | None |
| epmc | Europe PMC | 45 M biomedical papers | None |
| oa | OpenAlex | 250 M+ works | None |
| base | BASE | 300 M+ from 10k+ repos | None |
| core | CORE | 200 M+ OA full-text | Free API key |
| ads | NASA ADS | Astronomy & astrophysics | Free API token |
| ieee | IEEE Xplore | 5 M+ IEEE papers | Free API key |
| zenodo | Zenodo | 3 M+ OA research outputs | None |
| crossref | Crossref | 150 M+ DOI registry | None |
| dblp | DBLP | 6 M+ CS publications | None |
| hal | HAL | 1.5 M+ French academic OA | None |
| pubmed | PubMed | 35 M+ biomedical citations | Optional key |
| pmc | PubMed Central | 5 M+ free full-text biomedical | Optional key |
| rxiv | bioRxiv/medRxiv | Life science preprints | None |
| pedro | PEDro | Physiotherapy evidence | Fair-use ack |
| scopus | Scopus | 90 M+ Elsevier citations | API key or browser |
get Command
mosaic get <doi> # single DOI — fetch metadata + download PDF
mosaic get --from refs.bib # bulk-download from BibTeX file
mosaic get --from library.csv # bulk-download from CSV file (must have 'doi' column)
Options: --oa-only, --download-dir, --zotero, --zotero-collection, --obsidian.
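For bulk downloads driven by earlier search results, the CSV input can be generated from the JSON paper objects. The sketch below assumes that a file containing only a doi column is sufficient, which the note above suggests but does not spell out; `write_doi_csv` is a hypothetical helper, not part of MOSAIC.

```python
import csv
import io


def write_doi_csv(papers, handle) -> int:
    """Write one row per unique DOI; return the number of rows written."""
    writer = csv.writer(handle)
    writer.writerow(["doi"])  # the column name `mosaic get --from` expects
    seen = set()
    written = 0
    for paper in papers:
        doi = paper.get("doi")
        # DOIs are case-insensitive, so compare them in lower case.
        if doi and doi.lower() not in seen:
            seen.add(doi.lower())
            writer.writerow([doi])
            written += 1
    return written


papers = [
    {"title": "Attention Is All You Need", "doi": "10.48550/arXiv.1706.03762"},
    {"title": "Paper without a DOI", "doi": None},
    {"title": "Same paper, different case", "doi": "10.48550/arxiv.1706.03762"},
]

buf = io.StringIO()
count = write_doi_csv(papers, buf)
print(count)
print(buf.getvalue())
```

To write a real file, open it with `open("library.csv", "w", newline="")` and pass the handle instead of the `StringIO` buffer, then run `mosaic get --from library.csv`.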
cite Command
Format and print a citation string for a paper by DOI. Checks the local cache first; falls back to Crossref on a cache miss. BibTeX is rendered locally; all other styles use Crossref content negotiation (network required).
mosaic cite <doi> # BibTeX (default) — no network if cached
mosaic cite <doi> --style apa # APA via doi.org content negotiation
mosaic cite <doi> --style mla
mosaic cite <doi> --style chicago
mosaic cite <doi> --style harvard
mosaic cite <doi> --style vancouver
mosaic cite <doi> --style apa --copy # copy to clipboard (pbcopy/xclip/clip fallback)
| Option | Default | Description |
|---|---|---|
| --style, -s | bibtex | Citation style; tab-completes: bibtex apa mla chicago harvard vancouver |
| --copy, -c | off | Copy result to clipboard |
similar Command
mosaic similar 10.48550/arXiv.1706.03762 # by DOI
mosaic similar arxiv:1706.03762 # by arXiv ID
mosaic similar <doi> --max 20 --sort citations --json
Uses OpenAlex related_works (always) and Semantic Scholar recommendations (when API key is
configured). Options are the same as search minus --source and --year.
Export Formats
| Extension | Format |
|---|---|
| .bib | BibTeX |
| .ris | RIS (Mendeley, Endnote, Reference Manager) |
| .csv | CSV table |
| .json | JSON array of paper objects |
| .md / .markdown | Markdown table |
# Save to multiple formats in one command
mosaic search "deep learning" --output refs.bib --output summary.md
network Command
Explore the local citation graph built by mosaic index --enrich-citations.
mosaic network [OPTIONS]
| Option | Default | Description |
|---|---|---|
| --query, -q | — | Seed graph from cached papers matching this query (BFS subgraph) |
| --depth | 2 | Citation hops to follow from seed papers |
| --min-connections | 1 | Exclude papers with fewer edges than this |
| --cluster | off | Group papers into topic clusters (Louvain if networkx installed, else connected components) |
| --output, -o | — | Write graph to file: .json (D3/Gephi node-link), .gv (Graphviz DOT), .md (Mermaid) |
| --top | 5 | Most-connected papers to show per cluster in terminal output |
Requires citation edges — run mosaic index --enrich-citations first.
Louvain clustering requires networkx: pipx inject mosaic-search networkx.
# Most-connected papers in the full graph
mosaic network --top 10
# Topic subgraph with community clusters
mosaic network --query "transformer attention" --depth 2 --cluster --top 5
# Export for downstream tools
mosaic network --output graph.json # D3.js / Gephi / NetworkX
mosaic network --output graph.gv # Graphviz: dot -Tpng graph.gv -o graph.png
mosaic network --output graph.md # Mermaid diagram for README / Obsidian
# Combine: topic subgraph → cluster report → save Mermaid
mosaic network --query "diffusion models" --cluster --top 5 --output diffusion.md
JSON node-link schema
{
"nodes": [
{
"id": "doi:10.48550/arxiv.1706.03762",
"title": "Attention Is All You Need",
"year": 2017,
"authors": "Vaswani et al.",
"citation_count": 85000,
"cluster": 0
}
],
"links": [
{ "source": "doi:10.48550/...", "target": "doi:10.18653/..." }
]
}
cluster is null when --cluster is not used.
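A node-link export in this shape can be analysed with the standard library alone. The sketch below ranks papers by number of links, treating each link as undirected for counting purposes (an assumption; the terminal report may count connections differently). `top_connected` is a hypothetical helper and the graph data is made up for illustration.

```python
from collections import Counter


def top_connected(graph: dict, n: int = 5) -> list[tuple[str, int]]:
    """Return the n paper titles with the most links, counting both endpoints."""
    degree = Counter()
    for link in graph["links"]:
        degree[link["source"]] += 1
        degree[link["target"]] += 1
    titles = {node["id"]: node["title"] for node in graph["nodes"]}
    # Sort by descending degree, then by id so ties are deterministic.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return [(titles.get(node_id, node_id), count) for node_id, count in ranked[:n]]


# Illustrative graph following the node-link schema above.
graph = {
    "nodes": [
        {"id": "doi:a", "title": "Paper A", "year": 2017, "authors": "A et al.",
         "citation_count": 10, "cluster": None},
        {"id": "doi:b", "title": "Paper B", "year": 2019, "authors": "B et al.",
         "citation_count": 5, "cluster": None},
        {"id": "doi:c", "title": "Paper C", "year": 2021, "authors": "C et al.",
         "citation_count": 1, "cluster": None},
    ],
    "links": [
        {"source": "doi:b", "target": "doi:a"},
        {"source": "doi:c", "target": "doi:a"},
    ],
}

ranking = top_connected(graph, n=2)
print(ranking)
```

For a real export, load the file with `json.load` and pass the resulting dict to `top_connected`.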
compare Command
Generate a structured comparison table across cached papers. With a configured LLM, extracts dimensions from each paper's title + abstract. Without one, populates only metadata fields and prints a notice — never fails silently.
mosaic compare [OPTIONS]
| Option | Default | Description |
|---|---|---|
| --query, -q | — | Filter papers from cache by title/abstract |
| --from | — | Load papers from a .bib or .csv file |
| --max, -n | 20 | Maximum number of papers to compare |
| --dimensions | method,dataset,metric,result | Comma-separated comparison axes |
| --output, -o | — | Write table to file: .md, .csv, .json |
| --sort | — | Pre-sort papers: citations (most cited first) or year (newest first) |
# Compare top-cited cached papers on a topic (LLM fills in method/dataset/metric/result)
mosaic compare --query "diffusion models" --sort citations -n 15
# Save as Markdown
mosaic compare --query "transformer attention" --output comparison.md
# Custom dimensions from a BibTeX file
mosaic compare --from refs.bib --dimensions "method,dataset,BLEU,limitations"
# Export as CSV for Excel / Google Sheets
mosaic compare --query "GNN" -n 20 --output gnn-comparison.csv
# Export as JSON for scripting
mosaic compare --query "protein folding" --output folding.json
Metadata-only dimensions (no LLM needed): year, source, journal, doi, authors,
citations. All other dimension names require an LLM and return – without one.
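Before running a comparison without an LLM, it can help to check which requested dimensions will actually be filled. A minimal sketch, using the metadata-only names listed above; `split_dimensions` is a hypothetical helper, and matching names case-insensitively is an assumption rather than documented behaviour.

```python
# Dimension names that the text above lists as metadata-only.
METADATA_DIMENSIONS = {"year", "source", "journal", "doi", "authors", "citations"}


def split_dimensions(spec: str) -> tuple[list[str], list[str]]:
    """Split a --dimensions value into (metadata-only, LLM-required) names."""
    names = [name.strip() for name in spec.split(",") if name.strip()]
    metadata = [n for n in names if n.lower() in METADATA_DIMENSIONS]
    needs_llm = [n for n in names if n.lower() not in METADATA_DIMENSIONS]
    return metadata, needs_llm


metadata, needs_llm = split_dimensions("year,method,dataset,citations,BLEU")
print("metadata only:", metadata)
print("needs an LLM:", needs_llm)
```

An empty `needs_llm` list means the comparison can be produced from cached metadata alone.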
LLM setup (same config as RAG):
mosaic config --llm-provider openai --llm-api-key YOUR_KEY
# or Anthropic:
mosaic config --llm-provider anthropic --llm-api-key YOUR_KEY
# or local Ollama:
mosaic config --llm-provider openai --llm-base-url http://localhost:11434/v1 --llm-api-key ollama
RAG Commands
# 1. Build/update the vector index (incremental — already-indexed papers are skipped)
mosaic index
# 2. Semantic search — retrieve by meaning, no LLM needed at query time
mosaic search "methods that learn without labels" --semantic # ranked paper list + Sim. column
mosaic search "attention mechanism" --semantic --downloaded-only # only papers on disk
mosaic search "diffusion model" --semantic -n 20 --sort citations # sort by citations after retrieval
# 3. Single-shot analysis (LLM required)
mosaic ask "What FDTD schemes achieve high-order accuracy in time?" --mode synthesis
mosaic ask "What open problems remain in discontinuous Galerkin methods?" --mode gaps
mosaic ask "Compare DDPM, DDIM, and score SDE" --mode compare --output report.md
mosaic ask "Extract all methods with accuracy claims" --mode extract
# 4. Interactive session
mosaic chat
--semantic: embeds the query and retrieves top-k papers from the vector index. Shows a Sim.
column (0–1). No LLM needed at query time. Requires mosaic index + embedding model.
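To make the retrieval step concrete, the sketch below ranks toy vectors by cosine similarity and keeps the top k. It only illustrates the general embed-and-rank idea: the similarity metric MOSAIC uses, its embedding dimensions, and how sqlite-vec stores vectors are not stated here, so cosine similarity, the three-dimensional vectors, and the helper names `cosine` and `top_k` are all assumptions.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=2):
    """Return the k (score, title) pairs most similar to the query vector."""
    scored = [(cosine(query_vec, vec), title) for title, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "embeddings"; real ones come from the configured embedding model.
index = {
    "Paper on self-supervised learning": [0.9, 0.1, 0.0],
    "Paper on FDTD schemes": [0.0, 0.2, 0.9],
    "Paper on contrastive pre-training": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # stands in for the embedded query text

results = top_k(query, index)
for score, title in results:
    print(f"{score:.2f}  {title}")
```

Because ranking depends only on vector similarity, no LLM call is involved at query time, which is consistent with the description above.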
Modes for mosaic ask: synthesis (state of the art), gaps (open problems), compare
(side-by-side methods), extract (structured per-paper data extraction).
Requires sqlite-vec (pipx inject mosaic-search sqlite-vec) and a configured embedding model + LLM.
See mosaic config --embedding-model ... / --llm-provider ....
Configuration
# View full config (TOML-formatted)
mosaic config --show
# Essential setup
mosaic config --unpaywall-email you@example.com # enables Unpaywall PDF fallback
# API keys
mosaic config --elsevier-key YOUR_KEY # ScienceDirect
mosaic config --ss-key YOUR_KEY # Semantic Scholar
mosaic config --springer-key YOUR_KEY # Springer API
mosaic config --ads-key YOUR_KEY # NASA ADS
mosaic config --ieee-key YOUR_KEY # IEEE Xplore
# LLM (for RAG and relevance ranking)
mosaic config \
--llm-provider openai \
--llm-api-key YOUR_KEY \
--llm-model gpt-4o-mini
# Ollama (local LLM — no data leaves your machine)
mosaic config \
--embedding-model snowflake-arctic-embed2 \
--embedding-base-url http://localhost:11434/v1 \
--embedding-api-key ollama \
--llm-provider openai \
--llm-base-url http://localhost:11434/v1 \
--llm-api-key ollama \
--llm-model llama3.2
# Enable/disable sources
mosaic config --enable-source scopus
mosaic config --disable-source pedro
# Download location
mosaic config --download-dir ~/papers/
Config file: ~/.config/mosaic/config.toml
Cache DB: ~/.local/share/mosaic/cache.db
Default downloads: ~/mosaic-papers/
AI Agent Workflow: Building a Bibliography
import json, subprocess

def mosaic(args: list[str]) -> dict:
    """Run mosaic with the given arguments and parse its JSON stdout."""
    r = subprocess.run(["mosaic"] + args, capture_output=True, text=True, check=False)
    if r.returncode != 0 and not r.stdout.strip():
        raise RuntimeError(r.stderr)
    return json.loads(r.stdout)

# --- Step 1: Search multiple related queries ---
all_papers: list[dict] = []
queries = [
    "transformer self-attention",
    "BERT language model pre-training",
    "GPT autoregressive language model",
]
for q in queries:
    data = mosaic(["search", q, "--max", "15", "--oa-only", "--json"])
    all_papers.extend(data["papers"])

# --- Step 2: Deduplicate by uid (DOI / arXiv ID) ---
seen: set[str] = set()
unique: list[dict] = []
for p in all_papers:
    if p["uid"] not in seen:
        seen.add(p["uid"])
        unique.append(p)

# --- Step 3: Expand with similar papers for the top-cited seed ---
# default=None avoids a ValueError when no query returned any papers
most_cited = max(unique, key=lambda p: p["citation_count"] or 0, default=None)
if most_cited and most_cited.get("doi"):
    related = mosaic(["similar", most_cited["doi"], "--max", "10", "--json"])
    for p in related["papers"]:
        if p["uid"] not in seen:
            seen.add(p["uid"])
            unique.append(p)

# --- Step 4: Export cached results to BibTeX ---
# (the cache holds all papers from steps 1-3; this call exports those matching the first query)
subprocess.run(["mosaic", "search", queries[0], "--cached", "--output", "bibliography.bib"])

# --- Step 5: Download all OA PDFs ---
for p in unique:
    if p["pdf_url"] and p.get("doi"):
        subprocess.run(["mosaic", "get", p["doi"]])

# --- Step 6: Index, enrich citations, and ask ---
subprocess.run(["mosaic", "index", "--enrich-citations"])
subprocess.run(["mosaic", "ask", "Summarise the evolution of attention mechanisms",
                "--mode", "synthesis", "--output", "synthesis.md"])

# --- Step 7: Explore the citation network ---
subprocess.run(["mosaic", "network", "--query", "attention mechanism",
                "--cluster", "--top", "5", "--output", "network.md"])

# --- Step 8: Compare methods across top-cited papers ---
subprocess.run(["mosaic", "compare", "--query", "attention mechanism",
                "--sort", "citations", "-n", "20", "--output", "comparison.md"])
Zotero Integration
# Push search results to a Zotero collection (Zotero must be running)
mosaic search "deep learning" --max 20 --zotero --zotero-collection "Deep Learning"
# Push + download PDFs
mosaic search "protein folding" --oa-only --download --zotero --zotero-collection "Bioinformatics"
# Bulk-download an existing .bib file and send to Zotero
mosaic get --from refs.bib --zotero --zotero-collection "Imported"
# Web API (no Zotero app needed)
mosaic config --zotero-key YOUR_WEB_API_KEY
mosaic search "FDTD" --zotero
Obsidian Integration
mosaic config --obsidian-vault ~/Notes
mosaic search "quantum computing" --obsidian --obsidian-folder "Papers/Quantum"
Each note gets YAML frontmatter, >[!abstract] callout, metadata table, and [[wikilinks]].
Skill Installation
# Install to current project's .claude/skills/mosaic/ — enables /mosaic in this project
mosaic skill install
# Install globally to ~/.claude/skills/mosaic/ — enables /mosaic in all projects
mosaic skill install --global
# Inspect the bundled skill content
mosaic skill show
After installation, restart Claude Code or open a new session. The /mosaic slash command will
then be available in that project's Claude Code context, or in all projects after a --global install.