arxiv-automation - SKILL.md Agent Skill

name: arxiv-automation description: "Search and monitor arXiv papers. Query by topic, author, or category. Track new papers, download PDFs, and summarize abstracts for research workflows."

arXiv Automation

Search, monitor, and analyze academic papers from arXiv.

Capabilities

Search papers by keyword, author, category
Monitor new submissions in specific categories
Download PDFs for analysis
Extract and summarize abstracts
Track citation-worthy papers

Usage

Search Papers (arXiv API)

import urllib.request, urllib.parse, xml.etree.ElementTree as ET

def search_arxiv(query, max_results=10):
    base_url = "http://export.arxiv.org/api/query?"
    params = urllib.parse.urlencode({
        "search_query": query,
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending"
    })
    url = base_url + params
    response = urllib.request.urlopen(url).read()
    root = ET.fromstring(response)
    ns = {"atom": "http://www.w3.org/2005/Atom"}
    papers = []
    for entry in root.findall("atom:entry", ns):
        papers.append({
            "title": entry.find("atom:title", ns).text.strip(),
            "summary": entry.find("atom:summary", ns).text.strip()[:200],
            "link": entry.find("atom:id", ns).text,
            "published": entry.find("atom:published", ns).text,
            "authors": [a.find("atom:name", ns).text for a in entry.findall("atom:author", ns)]
        })
    return papers

# Example: search for LLM agent papers
papers = search_arxiv("all:LLM AND all:agent", max_results=5)
for p in papers:
    print(f"{p['title']}\n  {p['link']}\n  {', '.join(p['authors'][:3])}\n")

Monitor Categories

Common CS categories:

Category	Description
cs.AI	Artificial Intelligence
cs.CL	Computation and Language (NLP)
cs.LG	Machine Learning
cs.CV	Computer Vision
cs.SE	Software Engineering

RSS feeds: http://arxiv.org/rss/{category} (e.g., http://arxiv.org/rss/cs.AI)

Download PDF

# arXiv ID format: 2401.12345
arxiv_id = "2401.12345"
pdf_url = f"https://arxiv.org/pdf/{arxiv_id}.pdf"

Rate Limits

arXiv API: max 1 request per 3 seconds
Be respectful of arXiv's resources
Use RSS feeds for monitoring (less load than API queries)

Integration

Combine with pdf skill for PDF text extraction and analysis. Combine with rss-automation for periodic monitoring of new papers.