name: chem-paper-search
description: Searches Semantic Scholar and the open web for chemistry / chemical engineering / materials science papers given a topic and keyword sets. Returns a triaged table with title, first author, year, citation count, PDF availability, and one-line summary. Used as a sub-skill by paper-mentor during Phase 2; can also be invoked directly. Triggers on phrases like "find papers on", "幫我找文獻", "search Semantic Scholar", "literature on X".
version: "0.1.0"
chem-paper-search — Chemistry / Materials Paper Discovery
What this does
Given a research topic + 3-5 keyword sets, this skill returns a curated list of 10-20 papers, prioritizing those with open-access PDFs.
This is a sub-skill called by paper-mentor Phase 2. When invoked directly, it produces the same output as the literature search step.
Inputs (when called)
topic: "narrow bandgap Sn-Pb perovskite stability"
keyword_sets:
- english_broad: "Sn-Pb perovskite stability"
- english_narrow: "GASCN additive Sn-Pb perovskite crystallization"
- chinese_optional: "錫鉛鈣鈦礦 穩定性"
target_count: 15 # default
year_min: 2020 # default (last 5 years from current)
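The inputs above can be modelled as a small typed container. This is a minimal sketch, not part of the skill itself: the `SearchRequest` class name is hypothetical, and the `year_min` default assumes "last 5 years" means the current calendar year minus 5.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SearchRequest:
    """Hypothetical container mirroring the inputs listed above."""

    topic: str
    keyword_sets: dict[str, str]
    target_count: int = 15
    # Assumed default window: the last 5 years, counted back from the current year.
    year_min: int = field(default_factory=lambda: date.today().year - 5)


request = SearchRequest(
    topic="narrow bandgap Sn-Pb perovskite stability",
    keyword_sets={
        "english_broad": "Sn-Pb perovskite stability",
        "english_narrow": "GASCN additive Sn-Pb perovskite crystallization",
    },
)
```

Computing the default at call time keeps the search window from going stale as the calendar year changes.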
Workflow
Step 1: Semantic Scholar search (primary)
Use WebFetch to call Semantic Scholar API:
https://api.semanticscholar.org/graph/v1/paper/search?query={keywords}
&fields=title,authors,year,abstract,citationCount,openAccessPdf,tldr
&limit={target_count}
&sort=citationCount
For each keyword set, run the API call and merge the results. Deduplicate by title, ignoring case and punctuation, since the same paper often appears under several keyword sets.
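A minimal sketch of Step 1's URL construction and merge step, under stated assumptions: the helper names (`build_search_url`, `normalize_title`, `merge_results`) are hypothetical, the sketch leaves out the `sort` parameter and orders results client-side by `citationCount` instead, and title normalization (lowercase, letters and digits only) is one possible reading of "deduplicate by title".

```python
from urllib.parse import urlencode

API_URL = "https://api.semanticscholar.org/graph/v1/paper/search"
FIELDS = "title,authors,year,abstract,citationCount,openAccessPdf,tldr"


def build_search_url(keywords: str, target_count: int = 15) -> str:
    """Build the Step 1 request URL for one keyword set."""
    params = {"query": keywords, "fields": FIELDS, "limit": target_count}
    return f"{API_URL}?{urlencode(params)}"


def normalize_title(title: str) -> str:
    """Lowercase and keep only letters/digits so near-identical titles match."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def merge_results(result_lists: list[list[dict]]) -> list[dict]:
    """Merge per-keyword-set results, keep the first copy of each title,
    then order by citation count (highest first)."""
    seen: set[str] = set()
    merged: list[dict] = []
    for papers in result_lists:
        for paper in papers:
            key = normalize_title(paper.get("title") or "")
            if key and key not in seen:
                seen.add(key)
                merged.append(paper)
    return sorted(merged, key=lambda p: p.get("citationCount") or 0, reverse=True)
```

Each keyword set gets its own URL; the parsed `data` arrays from the responses are what `merge_results` expects as input.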
Step 2: WebSearch fallback
If the Semantic Scholar API rate-limits the request (HTTP 429) or returns fewer than 5 results:
On a 429, first retry with increasing backoff (10s, 30s, 60s); most 429s clear within a minute.
If the 429 persists, or the result set is still too small, fall back to WebSearch:
WebSearch: site:semanticscholar.org {topic} {year_min}..{current_year}
Then WebFetch the Semantic Scholar pages to extract metadata.
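The retry-then-fallback logic can be sketched as follows. This is illustrative only: `fetch` is a hypothetical callable that performs one API request and returns `(status_code, parsed_json_or_None)`, and a `None` return value is the assumed signal for the caller to switch to the WebSearch fallback.

```python
import time
from typing import Callable, Optional

BACKOFF_SECONDS = (10, 30, 60)  # delays taken from the text above


def fetch_with_backoff(
    fetch: Callable[[], tuple[int, Optional[dict]]],
    delays: tuple[int, ...] = BACKOFF_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch(); on HTTP 429 wait and retry. Return None if still failing."""
    status, payload = fetch()
    for delay in delays:
        if status != 429:
            break
        sleep(delay)
        status, payload = fetch()
    return payload if status == 200 else None
```

Passing `sleep` as a parameter is a design choice for this sketch: it lets the retry schedule be exercised without actually waiting.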
⚠️ Known limitation: WebFetch-scraped metadata loses structured fields (citationCount, openAccessPdf, tldr). When fields are missing:
- Mark cites = unknown, pdf = unknown, summary = unknown
- Do NOT auto-filter these papers; show them all and let the user / orchestrator decide
- Note in output: "⚠️ Step 3 filter ran with reduced metadata; manual review recommended"
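One way to apply the "mark as unknown" rule is sketched below. The function names are hypothetical, and the sketch assumes that `tldr`, when present, is an object with a `text` key and that a missing `openAccessPdf` key (as opposed to an explicit null) means the field was lost in scraping.

```python
UNKNOWN = "unknown"


def table_fields(paper: dict) -> dict:
    """Map raw metadata to the cites / pdf / summary columns,
    marking fields that did not survive scraping as unknown."""
    fields = {"cites": UNKNOWN, "pdf": UNKNOWN, "summary": UNKNOWN}
    if paper.get("citationCount") is not None:
        fields["cites"] = paper["citationCount"]
    if "openAccessPdf" in paper:
        # Key present: a null value means "no PDF", not "unknown".
        fields["pdf"] = "yes" if paper["openAccessPdf"] else "no"
    if paper.get("tldr"):
        fields["summary"] = paper["tldr"].get("text", UNKNOWN)
    return fields


def has_reduced_metadata(papers: list[dict]) -> bool:
    """True if any paper has at least one unknown column (triggers the warning note)."""
    return any(UNKNOWN in table_fields(p).values() for p in papers)
```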
Step 3: Filter
Apply filters in order:
- ✅ Has openAccessPdf → PRIORITY
- ✅ citationCount > 20 (or > 5 for papers <2 years old)
- ✅ year >= year_min
- ❌ Predatory journal flag (use Beall's list — check journal name)
- ❌ Off-topic (manual review of abstract)
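The mechanical filters can be sketched as below; the off-topic check stays manual. Assumptions to note: `journalName` is a hypothetical key (the Step 1 field list does not request journal data), `predatory_journals` is a caller-supplied set of lowercase names, "papers <2 years old" is read as `current_year - year < 2`, and papers with missing year or citation data are kept for manual review per the Step 2 rule.

```python
from datetime import date
from typing import Optional


def passes_filters(
    paper: dict,
    year_min: int,
    current_year: Optional[int] = None,
    predatory_journals: frozenset = frozenset(),
) -> bool:
    """Apply the year, predatory-journal, and citation-count filters."""
    current_year = current_year or date.today().year
    year, cites = paper.get("year"), paper.get("citationCount")
    if year is None or cites is None:
        return True  # reduced metadata: do not auto-filter
    if year < year_min:
        return False
    if (paper.get("journalName") or "").lower() in predatory_journals:
        return False
    threshold = 5 if current_year - year < 2 else 20
    return cites > threshold


def filter_and_rank(papers: list[dict], year_min: int, **kwargs) -> list[dict]:
    """Keep papers that pass, listing those with an open-access PDF first."""
    kept = [p for p in papers if passes_filters(p, year_min, **kwargs)]
    # Stable sort: False (has PDF) sorts before True (no PDF).
    return sorted(kept, key=lambda p: not p.get("openAccessPdf"))
```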
Step 4: Triage
Mark each paper as:
- 🔴 Must-read (必讀): Direct competitor, foundational, or very high citations
- 🟡 Reference (參考): Useful method or comparison data
- ⚪ Skippable (可略): Low rigor or off-topic borderline
Step 5: Output
# Literature Search Results — [topic]
**Date**: YYYY-MM-DD
**Searches run**: [list keyword sets]
**Total found**: N (after dedup + filter)
## Triaged table
| # | Tag | Title | First author | Year | Cites | PDF | Summary |
|---|-----|-------|--------------|------|-------|-----|---------|
| 1 | 🔴 | ... | ... | 2024 | 142 | ✅ | One-line tldr from Semantic Scholar |
| ... |
## Recommended reading order
1. [#1] — start here because [reason]
2. [#3] — read for method comparison
3. [#7] — read for competing mechanism interpretation
## Searches that returned nothing (if any)
Document any keyword sets that returned 0 results — this is **valuable evidence of a gap** for the gap report.
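The triaged table in the template above could be rendered with a helper along these lines. This is a sketch under assumptions: `render_table` and `cell` are hypothetical names, `authors` is assumed to be a list of objects with a `name` key, `tldr` an object with a `text` key, and any missing value is written as `unknown`.

```python
HEADER = (
    "| # | Tag | Title | First author | Year | Cites | PDF | Summary |\n"
    "|---|-----|-------|--------------|------|-------|-----|---------|"
)


def cell(value) -> str:
    """Format one table cell: None becomes 'unknown', pipes are escaped."""
    if value is None:
        return "unknown"
    return str(value).replace("|", "\\|").replace("\n", " ")


def render_table(rows: list[tuple[str, dict]]) -> str:
    """Render (triage_tag, paper) pairs as the markdown table from the template."""
    lines = [HEADER]
    for index, (tag, paper) in enumerate(rows, start=1):
        authors = paper.get("authors") or []
        first_author = authors[0].get("name") if authors else None
        if "openAccessPdf" not in paper:
            pdf = None  # field lost in scraping
        else:
            pdf = "✅" if paper["openAccessPdf"] else "no"
        summary = (paper.get("tldr") or {}).get("text")
        values = [index, tag, paper.get("title"), first_author,
                  paper.get("year"), paper.get("citationCount"), pdf, summary]
        lines.append("| " + " | ".join(cell(v) for v in values) + " |")
    return "\n".join(lines)
```

Escaping `|` matters here because paper titles occasionally contain it, and an unescaped pipe would shift every later column.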
Backup search engines
If Semantic Scholar is down or insufficient:
| Engine | URL pattern | When to use |
|---|---|---|
| Elicit | https://elicit.com/search?q={query} | When you need AI-extracted research questions |
| Connected Papers | https://www.connectedpapers.com/search?q={query} | When you want a citation graph view |
| Google Scholar | (manual via WebSearch) | Last resort, low metadata quality |
Hard rules
- Never invent papers. If 0 results, say so explicitly.
- Always show searches that returned nothing — these are evidence of gaps.
- Always verify citation count with a second source if it seems high (>500).
- Filter out predatory journals. Generic titles of the form "International Journal of [Field] Research" from sketchy publishers should be excluded.
- Quote exact API responses when uncertain — do not paraphrase metadata.
Common issues
| Issue | Fix |
|---|---|
| Semantic Scholar returns 429 | Retry with backoff (10s, 30s, 60s). If it persists, switch to the WebSearch fallback. |
| All papers are too old | Lower year_min by 2 years and retry; if results are still old, the field is dormant. |
| No open PDFs | Widen the search to include arXiv preprints (they have open PDFs). |
| Topic is too narrow | Loosen by removing 1-2 keywords; the field may not have studied it. |
| Topic is too broad | Tighten by adding specific technique or material; expect 100+ raw results. |
Output downstream
This output feeds into:
- paper-mentor Phase 2 Step 3: passes the table to chem-nlm-helper for NotebookLM ingestion
- Direct invocation: returned to user as-is for manual reading