name: save-to-academic-notion
description: Save a URL/arXiv link into Selina's Notion Academic database as a new entry if missing, and populate metadata (title, authors, abstract). Supports arXiv papers (auto-fetch), non-arXiv URLs (web scraping fallback), and manual mode for agent-assisted saves. Use when the user says "save to notion", "add this paper", "save this arxiv", "put this link in my Academic database", or asks to create a new paper entry from a URL.
Save to Academic Notion (Selina)
Quick Usage (Automated)
Preferred method - use the Python script:
cd ~/.openclaw/workspace/skills/save-to-academic-notion
./save_paper.py <arxiv_id_or_url>
Examples:
# arXiv papers (auto-fetch metadata)
./save_paper.py 2501.12345
./save_paper.py https://arxiv.org/abs/2501.12345
./save_paper.py https://arxiv.org/pdf/2501.12345.pdf
# Non-arXiv URLs (web scraping fallback)
./save_paper.py https://openreview.net/forum?id=abc123
# Manual mode (when auto-extraction fails)
./save_paper.py --manual \
--title "Paper Title" \
--url "https://example.com/paper" \
--authors "Alice; Bob; Charlie" \
--abstract "This paper presents..."
# JSON output (for automation)
./save_paper.py 2501.12345 --json
What it does:
- Auto-fetches metadata from arXiv API (title, authors, abstract, date)
- For non-arXiv URLs: attempts web scraping fallback
- Checks for duplicates (via Link property)
- Creates page with proper properties and abstract as quote block
- Uses "Authors txt" field (semicolon-separated text, not multi-select)
- Returns Notion page URL
Output:
Fetching metadata for 2501.12345...
Creating Notion page for: Paper Title...
Authors: Author One, Author Two, ...
✓ Created: https://www.notion.so/...
If paper already exists:
✓ Paper already exists: https://www.notion.so/...
JSON output:
{
"ok": true,
"action": "created", // or "exists"
"page_id": "...",
"page_url": "https://www.notion.so/...",
"source": "arxiv" // or "web_fetch" or "manual"
}
Error handling (non-arXiv URLs): If web scraping fails, returns structured error with suggestion:
{
"ok": false,
"error": "Could not extract metadata from URL",
"suggestion": "Use browser tools to extract metadata, then use --manual mode"
}
Non-arXiv Paper Fallback
arXiv papers: Metadata auto-fetched from arXiv API (always works)
Non-arXiv URLs: Attempts web scraping via openclaw web_fetch
- Success: Creates page with extracted metadata (may be incomplete)
- Failure: Returns an error with a suggestion to use --manual mode
Manual mode workflow (for agent):
- Try auto-save first: ./save_paper.py <url> --json
- If ok: false:
  - Use browser tools to extract title, authors, abstract
  - Save with --manual mode:
    ./save_paper.py --manual \
      --title "Paper Title" \
      --url "https://..." \
      --authors "Author1; Author2; Author3" \
      --abstract "..." \
      --date "2026-03-10"
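The manual mode workflow above could be driven from Python roughly as follows. This is a minimal sketch, not part of the skill itself: the metadata dict shape, the `extract_metadata` callable (standing in for whatever browser tooling the agent uses), and the assumption that `--json` can be combined with `--manual` are all hypothetical. Only the flags shown in this document are used.

```python
import json
import subprocess

SCRIPT = "./save_paper.py"  # run from the skill directory, as in Quick Usage


def build_manual_command(meta):
    """Build the --manual argv from a metadata dict (hypothetical shape)."""
    cmd = [SCRIPT, "--manual", "--title", meta["title"], "--url", meta["url"]]
    if meta.get("authors"):
        # The script takes authors as one semicolon-separated string.
        cmd += ["--authors", "; ".join(meta["authors"])]
    if meta.get("abstract"):
        cmd += ["--abstract", meta["abstract"]]
    if meta.get("date"):
        cmd += ["--date", meta["date"]]
    # Assumption: --json is also accepted together with --manual.
    cmd.append("--json")
    return cmd


def run_json(cmd):
    """Run a command and parse its stdout as JSON."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return json.loads(proc.stdout)


def save_with_fallback(url, extract_metadata):
    """Try auto-save first; if the result has ok: false, retry in manual mode.

    extract_metadata is a caller-supplied callable returning a dict with
    title, url, and optionally authors (list), abstract, date.
    """
    result = run_json([SCRIPT, url, "--json"])
    if result.get("ok"):
        return result
    return run_json(build_manual_command(extract_metadata(url)))
```

Passing the arguments as a list avoids shell quoting problems when a title or abstract contains quotes or newlines.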
Supported URLs:
- arXiv (always works): https://arxiv.org/abs/...
- OpenReview, ACL Anthology, bioRxiv, journal sites (web scraping fallback)
- Any URL (manual mode as last resort)
Targets
- Academic paper database:
  - database_id: d3e3be7f-c96a-45de-8e7d-3a78298f9ccd
  - data_source_id (query): 73e9f7f8-c667-4279-a62f-2c16c1885d0f
Hard rules
- Only write when Selina explicitly asks to save/add/log.
- Deduplicate: do not create a new entry if it already exists.
- Prefer using the database property Link as the canonical key.
- Authors field: Uses "Authors txt" (rich text, semicolon-separated), not "Author" (multi-select)
Dedup workflow
- Normalize the URL (trim, remove tracking params when safe).
- Query the data source filtering:
  - Link equals the normalized URL
  - If arXiv: also try canonical https://arxiv.org/abs/<id> if the user provided a PDF link.
- If found: return the existing page URL; optionally offer to update metadata if missing.
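The dedup steps above can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the list of tracking parameters, the helper names, and the choice to drop arXiv version suffixes are hypothetical. The filter uses Notion's url property condition on Link, and the resulting dict is meant to be sent as the `filter` field of the query request against the data source.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: which query parameters count as "tracking" is a local choice.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid"}

# New-style arXiv IDs only (e.g. 2501.12345); old-style IDs are not handled.
ARXIV_RE = re.compile(
    r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?(?:\.pdf)?", re.I
)


def normalize_url(url):
    """Trim whitespace, lowercase the host, and drop tracking parameters."""
    parts = urlsplit(url.strip())
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not k.lower().startswith(TRACKING_PREFIXES)
        and k.lower() not in TRACKING_KEYS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(query), parts.fragment)
    )


def candidate_links(url):
    """Return the normalized URL plus the canonical arXiv abs URL, if any."""
    normalized = normalize_url(url)
    candidates = [normalized]
    match = ARXIV_RE.search(normalized)
    if match:
        # Assumption: the canonical form drops any version suffix such as v2.
        canonical = f"https://arxiv.org/abs/{match.group(1)}"
        if canonical not in candidates:
            candidates.append(canonical)
    return candidates


def dedup_filter(candidates):
    """Build a Notion filter matching any candidate in the Link property."""
    clauses = [{"property": "Link", "url": {"equals": c}} for c in candidates]
    return clauses[0] if len(clauses) == 1 else {"or": clauses}
```

For example, `dedup_filter(candidate_links("https://arxiv.org/pdf/2501.12345.pdf"))` yields an `or` filter over the PDF link and the abs link, so an entry saved under either form is found.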
Create workflow
If not found:
- Create a new page in the database:
  - POST /v1/pages with parent: {"database_id": <database_id>}
  - Set properties:
    - Name (title): best available title (from arXiv/metadata)
    - Link (url): normalized URL
    - Optionally: Publisher, Type, Publishing/Release Date, Discipline, Topics, Status
- Populate authors:
  - Updated (Mar 10, 2026): Uses Authors txt property (rich text, semicolon-separated)
  - Format: "Author1; Author2; Author3"
  - No longer uses: Author property (multi-select), which causes schema overflow with 1000+ unique authors
  - Example: properties["Authors txt"] = {"rich_text": [{"text": {"content": "Alice; Bob; Charlie"}}]}
- Populate abstract:
  - There is no dedicated "Abstract" property.
  - Put the abstract into the page body as blocks (preferred):
    - quote (common in Selina's pages) OR
    - heading_2: "Abstract"
    - paragraph (or quote content): abstract text
  - Optionally use TLDR for a short summary (not the full abstract), if Selina wants that later.
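Put together, the request body for POST /v1/pages might look like the sketch below. It covers only Name, Link, Authors txt, and the abstract as a quote block; the helper names are hypothetical, and the chunking reflects Notion's 2000-character limit on a single rich-text content string, which long abstracts can exceed. Sending the request (authentication headers, API version) is left out.

```python
DATABASE_ID = "d3e3be7f-c96a-45de-8e7d-3a78298f9ccd"  # from Targets
MAX_TEXT = 2000  # Notion limit for one rich-text content string


def rich_text(text):
    """Split text into rich-text objects of at most MAX_TEXT characters each."""
    return [
        {"type": "text", "text": {"content": text[i:i + MAX_TEXT]}}
        for i in range(0, len(text), MAX_TEXT)
    ]


def build_page_payload(title, link, authors=None, abstract=None):
    """Build the JSON body for creating one paper entry."""
    properties = {
        "Name": {"title": rich_text(title)},
        "Link": {"url": link},
    }
    if authors:
        # "Authors txt" is rich text holding one semicolon-separated string.
        properties["Authors txt"] = {"rich_text": rich_text("; ".join(authors))}

    payload = {"parent": {"database_id": DATABASE_ID}, "properties": properties}
    if abstract:
        # The abstract goes in the page body as a quote block, not a property.
        payload["children"] = [
            {"object": "block", "type": "quote", "quote": {"rich_text": rich_text(abstract)}}
        ]
    return payload
```

Building the payload separately from sending it makes the mapping easy to inspect before anything is written to the database.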
Metadata extraction
If URL is arXiv
- Accept input forms:
  - https://arxiv.org/abs/<id>
  - https://arxiv.org/pdf/<id>.pdf
  - bare <id> like 2501.01234
- Fetch metadata from one of:
  - arXiv API: http://export.arxiv.org/api/query?id_list=<id> (Atom XML)
  - Fallback: scrape the arXiv abs page.
- Extract:
- title
- authors
- abstract
- published date (optional)
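A minimal sketch of the arXiv path, using only the standard library. The function names are hypothetical and the sample feed is made up for illustration; the real API response contains more elements per entry. Whitespace is collapsed because titles and summaries in the feed are often wrapped across lines.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM = {"atom": "http://www.w3.org/2005/Atom"}
API_URL = "http://export.arxiv.org/api/query?id_list={id}"


def _clean(text):
    """Collapse runs of whitespace and newlines into single spaces."""
    return " ".join((text or "").split())


def parse_arxiv_atom(xml_data):
    """Extract title, authors, abstract, and date from an Atom feed."""
    root = ET.fromstring(xml_data)
    entry = root.find("atom:entry", ATOM)
    if entry is None:
        return None
    published = _clean(entry.findtext("atom:published", default="", namespaces=ATOM))
    return {
        "title": _clean(entry.findtext("atom:title", default="", namespaces=ATOM)),
        "authors": [
            _clean(a.findtext("atom:name", default="", namespaces=ATOM))
            for a in entry.findall("atom:author", ATOM)
        ],
        "abstract": _clean(entry.findtext("atom:summary", default="", namespaces=ATOM)),
        "published": published[:10] or None,  # keep only YYYY-MM-DD
    }


def fetch_arxiv_metadata(arxiv_id):
    """Fetch and parse metadata for one arXiv ID (requires network access)."""
    with urllib.request.urlopen(API_URL.format(id=arxiv_id), timeout=30) as resp:
        return parse_arxiv_atom(resp.read())


# Made-up sample feed, for illustration only.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>A Made-Up
      Paper Title</title>
    <summary>  We study a toy problem.
    </summary>
    <published>2025-01-21T18:00:00Z</published>
    <author><name>Alice Example</name></author>
    <author><name>Bob Example</name></author>
  </entry>
</feed>"""

meta = parse_arxiv_atom(SAMPLE)
print(meta["title"])
print("; ".join(meta["authors"]))
```

If the feed has no entry, `parse_arxiv_atom` returns None, which is the point at which the abs-page scrape fallback would apply.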
If URL is not arXiv
- Use web_fetch to get the page and extract:
- title (best guess)
- author list if present
- abstract/summary if present
- If not reliably available, create an entry with just Name + Link and leave placeholders.
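One way to approach the non-arXiv extraction is to read scholarly meta tags from the fetched HTML. The sketch below assumes the page exposes tags such as citation_title, citation_author, and citation_abstract (many publisher sites do, but not all), with Open Graph tags and the page title as fallbacks. The class and function names are hypothetical, and the sample page is made up.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect <meta> name/property values and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}  # lowercased tag name -> list of content values
        self.page_title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = (attrs.get("name") or attrs.get("property") or "").lower()
            content = attrs.get("content")
            if key and content:
                self.meta.setdefault(key, []).append(content.strip())
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.page_title += data


def _first(meta, *keys):
    """Return the first value found among the given meta keys, else None."""
    for key in keys:
        if meta.get(key):
            return meta[key][0]
    return None


def extract_metadata(html):
    """Best-guess title, authors, and abstract; missing fields stay empty."""
    parser = MetaExtractor()
    parser.feed(html)
    m = parser.meta
    return {
        "title": _first(m, "citation_title", "og:title") or parser.page_title.strip() or None,
        "authors": m.get("citation_author", []),
        "abstract": _first(m, "citation_abstract", "og:description", "description"),
    }


# Made-up sample page, for illustration only.
SAMPLE_HTML = """<html><head>
<title>Fallback Title | Example Venue</title>
<meta name="citation_title" content="A Made-Up Paper">
<meta name="citation_author" content="Alice Example">
<meta name="citation_author" content="Bob Example">
<meta name="description" content="A short summary.">
</head><body></body></html>"""

info = extract_metadata(SAMPLE_HTML)
```

When `authors` comes back empty or `abstract` is None, the entry can still be created with just Name + Link, as described above.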
Suggested property mapping (when confident)
- Publisher: set to bioRxiv, arXiv, NeurIPS, etc. only if clearly indicated.
- Type: default to Academic Journal for papers; Blog Post for posts.
- Status: default to Ready to Start.
- Discipline: add tags like ML, MechInterp, Geometry, ScienceofDL if strongly implied.
UX
Before writing:
- Confirm the normalized link + the guessed title.
After writing:
- Return the created/updated Notion page URL.
- Briefly list what fields were populated.