name: literature-search
description: "Comprehensive multi-database scientific literature search orchestrating Semantic Scholar, OpenAlex, arXiv, PubMed, and CrossRef. Use when: (1) systematic literature review, (2) finding all relevant papers on a topic, (3) checking state of the art, (4) building comprehensive bibliographies. NOT for: single-database queries (use specific search skills), data analysis (use code-execution)."
metadata: { "openclaw": { "emoji": "๐" } }
Literature Search (Meta Skill)
Orchestrate comprehensive literature searches across multiple databases. Always execute real API calls; never fabricate results or rely on training data.
Priority Order of Databases
- Semantic Scholar (PRIMARY): best relevance ranking, AI TLDR summaries, citation graph
- OpenAlex (PRIMARY): 250M+ works, powerful filtering, open access URLs
- arXiv: preprints in physics, math, CS, biology, finance, statistics
- PubMed: biomedical and life sciences (NCBI may be unreachable from some networks)
- CrossRef: DOI resolution and metadata only (NOT for search; poor relevance ranking)
IMPORTANT: CrossRef search results are poorly ranked by relevance. Never use CrossRef as the primary search engine. Use it only for DOI-based lookups and metadata enrichment.
Mandatory Search Protocol
Every literature search MUST follow this protocol:
Step 1: Semantic Scholar Search (always do this first)
# Primary search: returns papers ranked by relevance with AI summaries
curl -s "https://api.semanticscholar.org/graph/v1/paper/search?\
query=YOUR+SEARCH+TERMS&limit=10&\
fields=title,authors,year,abstract,citationCount,influentialCitationCount,\
isOpenAccess,openAccessPdf,url,externalIds,tldr,venue,publicationDate"
Parse results with:
| python3 -c "
import sys, json
data = json.load(sys.stdin)
print(f'Total: {data.get(\"total\", 0)} papers')
for i, p in enumerate(data.get('data') or []):
    # Show at most three authors, then 'et al.'
    authors = ', '.join(a['name'] for a in (p.get('authors') or [])[:3])
    if len(p.get('authors') or []) > 3: authors += ' et al.'
    # tldr can be null in the response, so fall back to an empty dict
    tldr = p.get('tldr') or {}
    tldr_text = tldr['text'][:150] if tldr.get('text') else 'N/A'
    oa = 'OA' if p.get('isOpenAccess') else 'closed'
    doi = (p.get('externalIds') or {}).get('DOI', '')
    print(f'[{i+1}] {p[\"title\"]}')
    print(f'    {authors} ({p.get(\"year\") or \"?\"}) - {p.get(\"venue\") or \"?\"}')
    print(f'    Cited: {p.get(\"citationCount\",0)} (influential: {p.get(\"influentialCitationCount\",0)}) {oa}')
    print(f'    TLDR: {tldr_text}')
    print(f'    DOI: {doi}')
    print()
"
Useful filters:
- `year=2022-2025`: restrict by year range
- `fieldsOfStudy=Computer Science`: filter by discipline
- `minCitationCount=10`: only papers with at least 10 citations
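The filters above are ordinary query parameters, so they can be appended programmatically. The following is a minimal Python sketch; the helper name `build_s2_search_url` is hypothetical, while the endpoint, field list, and filter names are the ones used in this step.

```python
from urllib.parse import urlencode

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"
S2_FIELDS = (
    "title,authors,year,abstract,citationCount,influentialCitationCount,"
    "isOpenAccess,openAccessPdf,url,externalIds,tldr,venue,publicationDate"
)


def build_s2_search_url(query, limit=10, **filters):
    """Hypothetical helper: build a search URL, skipping filters set to None."""
    params = {"query": query, "limit": limit, "fields": S2_FIELDS}
    params.update({k: v for k, v in filters.items() if v is not None})
    # safe="," keeps the field list readable; spaces are encoded as "+"
    return f"{S2_SEARCH}?{urlencode(params, safe=',')}"


url = build_s2_search_url(
    "graph neural networks",
    year="2022-2025",
    fieldsOfStudy="Computer Science",
    minCitationCount=10,
)
print(url)
```

The printed URL can be passed to `curl -s` exactly like the hand-written one in this step.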
Step 2: OpenAlex Search (for broader coverage + OA links)
# Complementary search with powerful filtering
curl -s "https://api.openalex.org/works?\
search=YOUR+SEARCH+TERMS&per_page=10&\
sort=relevance_score:desc&\
select=title,publication_year,cited_by_count,doi,authorships,open_access,\
primary_location,abstract_inverted_index&\
mailto=scienceclaw@openclaw.ai"
Useful filters (append to URL as &filter=):
- `publication_year:2023-2025`: year range
- `cited_by_count:>50`: more than 50 citations
- `open_access.is_oa:true`: only open access
- `authorships.author.id:A5023888391`: by author OpenAlex ID
- `concepts.id:C41008148`: by concept (e.g., Computer Science)
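Two details of the OpenAlex step are easy to get wrong: several filters go into a single comma-separated `filter=` parameter, and `abstract_inverted_index` maps each word to its positions rather than holding plain text. A minimal Python sketch of both follows; the helper names and the `you@example.org` address are placeholders, not part of the skill.

```python
from urllib.parse import urlencode


def build_openalex_url(search, filters=(), per_page=10, mailto="you@example.org"):
    """Hypothetical helper: combine filters into one comma-separated parameter."""
    params = {"search": search, "per_page": per_page, "sort": "relevance_score:desc"}
    if filters:
        params["filter"] = ",".join(filters)
    params["mailto"] = mailto
    # Keep ":" and "," literal so the filter string stays readable
    return "https://api.openalex.org/works?" + urlencode(params, safe=":,")


def rebuild_abstract(inverted_index):
    """Turn {word: [positions]} back into a plain-text abstract."""
    if not inverted_index:
        return ""
    by_position = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    return " ".join(by_position[pos] for pos in sorted(by_position))


url = build_openalex_url(
    "transformers",
    filters=["publication_year:2023-2025", "open_access.is_oa:true"],
)
print(url)
# Toy index for illustration only, not real API output
print(rebuild_abstract({"Deep": [0], "learning": [1, 3], "for": [2]}))
```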
Step 3: Discipline-Specific Database (if relevant)
| Discipline | Additional Database | Skill |
|---|---|---|
| Biomedicine / Clinical | PubMed | pubmed-search |
| Physics / CS / Math | arXiv | arxiv-search |
| Computer Science | DBLP | dblp-search |
| Economics / Social Sci | SSRN/RePEc | ssrn-econpapers |
Step 4: Deduplication and Ranking
Match across databases by DOI (most reliable), then normalized title. Rank by: Semantic Scholar relevance > citation count > influential citations > recency.
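The matching and ranking rule can be sketched in Python as follows. This is one possible reading of the rule, not a prescribed implementation: the record keys (`doi`, `title`, `s2_rank`, `citations`, `influential`, `year`) are assumed names for fields already merged from the API responses, and the sample records are placeholders rather than real papers.

```python
import re


def normalize_doi(doi):
    """Lowercase a DOI and strip common resolver prefixes."""
    doi = (doi or "").strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)


def normalize_title(title):
    """Lowercase and collapse punctuation/whitespace for fuzzy-free matching."""
    return re.sub(r"[^a-z0-9]+", " ", (title or "").lower()).strip()


def dedupe(records):
    """Keep the first record seen per DOI, falling back to normalized title."""
    seen_dois, seen_titles, unique = set(), set(), []
    for rec in records:
        doi = normalize_doi(rec.get("doi"))
        title = normalize_title(rec.get("title"))
        if (doi and doi in seen_dois) or (title and title in seen_titles):
            continue
        if doi:
            seen_dois.add(doi)
        if title:
            seen_titles.add(title)
        unique.append(rec)
    return unique


def rank(records):
    """Sort by S2 rank, then citations, influential citations, recency."""
    def key(rec):
        s2_rank = rec.get("s2_rank")
        return (
            s2_rank if s2_rank is not None else float("inf"),
            -(rec.get("citations") or 0),
            -(rec.get("influential") or 0),
            -(rec.get("year") or 0),
        )
    return sorted(records, key=key)


# Placeholder records for illustration only
records = [
    {"title": "Example Paper A", "doi": "10.0000/example.a", "s2_rank": 2},
    {"title": "Example paper A.", "doi": "https://doi.org/10.0000/EXAMPLE.A"},
    {"title": "Example Paper B", "doi": None, "s2_rank": 1},
    {"title": "example  paper b", "doi": None},
]
merged = rank(dedupe(records))
print([r["title"] for r in merged])
```

Because the title fallback also merges a preprint and its published version when their titles match, list Semantic Scholar results first so that the retained record carries the S2 rank.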
Step 5: Citation Chaining (for thorough searches)
For the top 3-5 seed papers, retrieve both their references and the papers that cite them:
# Forward citations (who cites this paper)
curl -s "https://api.semanticscholar.org/graph/v1/paper/{paperId}/citations?\
fields=title,year,citationCount,venue&limit=20"
# Backward references (what this paper cites)
curl -s "https://api.semanticscholar.org/graph/v1/paper/{paperId}/references?\
fields=title,year,citationCount,venue&limit=20"
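For several seed papers it can be convenient to generate these URLs and unwrap the responses in Python. The sketch below assumes each row of the response wraps the paper under `citingPaper` (citations) or `citedPaper` (references); the helper names are hypothetical and the sample payload is a placeholder, not real API output.

```python
from urllib.parse import urlencode

S2_PAPER = "https://api.semanticscholar.org/graph/v1/paper"
CHAIN_FIELDS = "title,year,citationCount,venue"


def chain_url(paper_id, direction, limit=20):
    """Build a forward ('citations') or backward ('references') URL."""
    if direction not in ("citations", "references"):
        raise ValueError("direction must be 'citations' or 'references'")
    query = urlencode({"fields": CHAIN_FIELDS, "limit": limit}, safe=",")
    return f"{S2_PAPER}/{paper_id}/{direction}?{query}"


def unwrap(payload, direction):
    """Extract the nested paper objects from a chaining response."""
    key = "citingPaper" if direction == "citations" else "citedPaper"
    rows = payload.get("data") or []
    return [row[key] for row in rows if row.get(key)]


print(chain_url("abc123", "citations"))
# Placeholder payload shaped like the assumed response
sample = {"data": [{"citingPaper": {"title": "Placeholder", "year": 2024}}]}
print(unwrap(sample, "citations"))
```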
Step 6: Paper Recommendations (for discovery)
# Find similar papers
curl -s "https://api.semanticscholar.org/recommendations/v1/papers/\
forpaper/{paperId}?fields=title,year,citationCount,tldr&limit=10"
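Recommendations often overlap with papers already collected in earlier steps. A minimal Python sketch for keeping only the new ones follows; it assumes the response carries its results under a `recommendedPapers` key with a `paperId` on each entry, and the function name and sample IDs are placeholders.

```python
def new_recommendations(payload, known_ids):
    """Return recommended papers whose paperId is not already known."""
    papers = payload.get("recommendedPapers") or []
    return [
        p for p in papers
        if p.get("paperId") and p["paperId"] not in known_ids
    ]


# Placeholder payload for illustration only
sample = {"recommendedPapers": [{"paperId": "p1"}, {"paperId": "p2"}, {}]}
print(new_recommendations(sample, known_ids={"p1"}))
```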
Search Quality Checklist
Before presenting results, verify:
- At least Semantic Scholar was searched with a real API call
- Results contain real DOIs/paper IDs (not fabricated)
- Citation counts are from the API (not estimated)
- Each paper has a verifiable identifier (DOI, arXiv ID, PMID, or S2 URL)
- TLDR summaries are from Semantic Scholar (not self-generated)
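Part of this checklist can be checked mechanically before results are presented. The Python sketch below only tests structure (an identifier is present, the citation count came back from the API); it cannot prove that a record is genuine, so it complements rather than replaces the rule of using real API calls. The function names are hypothetical, and the `externalIds` keys `DOI`, `ArXiv`, and `PubMed` are assumed to follow Semantic Scholar's naming.

```python
def has_verifiable_id(paper):
    """True if the record carries a DOI, arXiv ID, PMID, or URL."""
    ext = paper.get("externalIds") or {}
    return any(ext.get(k) for k in ("DOI", "ArXiv", "PubMed")) or bool(paper.get("url"))


def checklist_failures(papers):
    """Return (position, reason) pairs for records that fail a check."""
    failures = []
    for i, paper in enumerate(papers, start=1):
        if not has_verifiable_id(paper):
            failures.append((i, "no verifiable identifier"))
        if paper.get("citationCount") is None:
            failures.append((i, "citation count missing from API response"))
    return failures


# Placeholder records for illustration only
papers = [
    {"externalIds": {"DOI": "10.0000/placeholder"}, "citationCount": 0},
    {"externalIds": None, "url": None},
]
print(checklist_failures(papers))
```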
Output Format
[1] Title
Authors (Year) - Venue
Cited: N (influential: M) OA/closed
TLDR: AI-generated summary from Semantic Scholar
DOI: 10.xxxx/xxxxx | arXiv: xxxx.xxxxx | PMID: xxxxxxxx
URL: https://...
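A minimal Python sketch of a formatter for this layout is shown below. It assumes a paper dict shaped like a Semantic Scholar search result (with `externalIds` keys `DOI`, `ArXiv`, `PubMed`), prints `N/A` for anything the API did not return instead of guessing, and uses a plain-text open-access marker; the function name and sample record are placeholders.

```python
def format_entry(index, paper):
    """Render one paper in the output format, using only returned fields."""
    names = [a.get("name", "?") for a in (paper.get("authors") or [])]
    authors = ", ".join(names[:3]) + (" et al." if len(names) > 3 else "")
    ext = paper.get("externalIds") or {}
    labels = (("DOI", "DOI"), ("arXiv", "ArXiv"), ("PMID", "PubMed"))
    ids = [f"{label}: {ext[key]}" for label, key in labels if ext.get(key)]
    tldr = (paper.get("tldr") or {}).get("text") or "N/A"
    cited = paper.get("citationCount")
    influential = paper.get("influentialCitationCount")
    access = "OA" if paper.get("isOpenAccess") else "closed"

    def shown(value):
        return "N/A" if value is None else value

    return "\n".join([
        f"[{index}] {paper.get('title') or '?'}",
        f"{authors} ({paper.get('year') or '?'}) - {paper.get('venue') or '?'}",
        f"Cited: {shown(cited)} (influential: {shown(influential)}) {access}",
        f"TLDR: {tldr}",
        " | ".join(ids) if ids else "DOI: N/A",
        f"URL: {paper.get('url') or 'N/A'}",
    ])


# Placeholder record for illustration only
paper = {
    "title": "Placeholder title",
    "authors": [{"name": "A"}, {"name": "B"}, {"name": "C"}, {"name": "D"}],
    "year": 2024,
    "venue": None,
    "citationCount": 3,
    "influentialCitationCount": 1,
    "isOpenAccess": True,
    "externalIds": {"DOI": "10.0000/placeholder", "ArXiv": "0000.00000"},
    "tldr": None,
    "url": "https://example.org/paper",
}
print(format_entry(1, paper))
```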
Zero-Hallucination Rule (ABSOLUTE)
Every citation detail must come from a tool result in this conversation.
- NEVER fabricate or "fill in" paper titles, authors, DOIs, PMIDs, citation counts, or journal names from training data
- NEVER say "a well-known study by X et al." without having searched for it first
- If a search returns 0 results, report that honestly; do not substitute training knowledge
- If a tool returns partial metadata (title but no DOI), report only what the tool returned
- Before presenting any paper, verify: Did a tool in THIS conversation return this information?
Common Pitfalls to Avoid
- DO NOT use CrossRef `/works?query=` for discovery; its relevance ranking is poor
- DO NOT fabricate paper titles, authors, or DOIs from training knowledge
- DO NOT skip API calls and rely on what you "know" about the literature
- DO NOT present Semantic Scholar TLDRs as your own analysis
- ALWAYS run the actual curl commands and parse real responses
- ALWAYS include at least one verifiable identifier per paper
- ALWAYS self-check: every detail in your response must trace back to a tool result
Rate Limits
| Database | Without Key | With Key |
|---|---|---|
| Semantic Scholar | 100 req/5 min | 1/sec sustained |
| OpenAlex | 10 req/sec (polite pool with mailto) | Same |
| arXiv | ~1 req/3 sec | Same |
| CrossRef | 1 req/sec | 50 req/sec (with mailto) |
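One simple way to stay inside these limits is to enforce a minimum interval between requests to the same database. The Python sketch below derives the intervals from the "Without Key" column of the table; the `Throttle` class and `MIN_INTERVAL_SECONDS` mapping are hypothetical names, and the clock and sleep functions are injectable so the logic can be exercised without real delays.

```python
import time

# Seconds between requests, derived from the "Without Key" column above
MIN_INTERVAL_SECONDS = {
    "semantic_scholar": 300 / 100,  # 100 requests per 5 minutes
    "openalex": 1 / 10,             # 10 requests per second
    "arxiv": 3.0,                   # roughly 1 request per 3 seconds
    "crossref": 1.0,                # 1 request per second
}


class Throttle:
    """Block until at least min_interval seconds have passed since the last call."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


s2_throttle = Throttle(MIN_INTERVAL_SECONDS["semantic_scholar"])
s2_throttle.wait()  # first call returns immediately; later calls may sleep
```

Calling `wait()` immediately before each request to a given database keeps sequential calls spaced out; it does not handle retries after an HTTP 429 response.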