name: update-papers description: Update the research paper collection — download new papers from references.bib, retry failed downloads, sync to Google Drive, and regenerate index.md
Update Papers
Maintain the research paper collection in docs/research/. Downloads papers referenced in docs/paper/references.bib, manages the manifest, syncs to Google Drive, and keeps docs/research/index.md current.
Prerequisites
rcloneinstalled and configured with agdriveremotePAPERS_REMOTEenv var set (e.g.,gdrive:problemreductions-papers)
Step 1: Check Current Status
make papers-status
Note the counts: total entries, PDFs on disk, pending downloads, missing papers.
Step 2: Lookup New Papers
Run the lookup to find arxiv/OA URLs for any new entries in references.bib since the last run. This is incremental — it skips entries already found in the manifest.
make papers-lookup
Review the output:
- New arxiv papers found
- New OA (open access) papers found
- Papers with no free source (will need Sci-Hub in Step 4)
Step 3: Download Free Papers
Download papers with known free URLs (arxiv + open access). Skips PDFs already on disk.
make papers-download
If some OA downloads fail with 403, that's expected — publisher paywalls. These will be picked up by Sci-Hub in the next step.
Step 4: Fetch Remaining via Sci-Hub
For papers with DOIs that aren't on disk yet, try Sci-Hub mirrors. This is the slowest step (~5 seconds per paper).
make papers-scihub
The script tries multiple mirrors (sci-hub.ru, sci-hub.do, sci-hub.it.nf, sci-hub.es.ht). If all mirrors are down, retry later — the script is fully idempotent.
Step 4b: Manual Web Search for Remaining Failures
After Sci-Hub, check make papers-status for papers still missing. For each one with a DOI that Sci-Hub couldn't find:
- Web search for
"<title>" <first-author> PDF— try:- Author homepages (Stanford, university pages)
- Open-access publishers: LIPIcs/Dagstuhl (all free), HAL archives, ECCC
- Preprint servers: arxiv (search by title), IACR ePrint
- Download manually with
curl -L -o docs/research/raw/<key>.pdf "<url>" - Verify the file is a real PDF:
file docs/research/raw/<key>.pdf
Skip textbooks (garey1979, sipser2012, cormen2022, conway1967) — these aren't available as single PDFs.
Step 5: Regenerate Index
Update docs/research/index.md with the latest paper collection, cross-referenced against reduction rules and problem definitions in reductions.typ.
make papers-index
Verify the index looks correct:
- Check the download count at the top
- Spot-check that new papers appear in the correct section (rules / problems / other)
- Confirm PDF links resolve for newly downloaded papers
Step 6: Sync to Google Drive
Push updated PDFs and manifest to the shared Google Drive remote. Only uploads new/changed files.
First verify the remote is configured:
echo $PAPERS_REMOTE # should show e.g. gdrive:problemreductions-papers
# If empty, set it:
export PAPERS_REMOTE=gdrive:problemreductions-papers
Then push:
make papers-push
Step 7: Final Status
make papers-status
Report to the user:
- How many new papers were downloaded
- How many remain missing (and why: no DOI, textbooks, Sci-Hub mirrors down)
- Whether the Google Drive sync succeeded
One-Liner
For a full update in one command:
make papers && make papers-index
This runs: lookup → download → scihub → status → index.
Troubleshooting
Sci-Hub mirrors all fail: Mirrors rotate frequently. Update SCIHUB_DOMAINS in scripts/fetch_papers.py or retry later.
rclone auth expired: Run rclone config reconnect gdrive: to refresh the OAuth token.
Manifest is stale: Delete docs/research/manifest.json and re-run make papers-lookup to rebuild from scratch. Existing PDFs on disk are preserved.
New bib entry not appearing: Ensure the entry is in docs/paper/references.bib with proper formatting. The parser expects @type{key, ... } with fields like title, doi, author, year.