---
name: paper-close-reading
description: "Create a paper close-reading workflow from a prepared PDF, raw HTML, and figure images. In HTML mode, always make clean-original.html as Step 1 (PDF mode skips it), then write the Japanese close-reading as -ja.md (canonical, git-tracked, editable). render_all.sh renders ja.md into a self-contained -ja.html with embedded figures, served by fenrir_portal at http://fenrir:8080/paper/. Use when the user says paper-close-reading, 論文精読, clean-original.html, original.html, ja.md, 精読メモ, or wants a prepared paper tmp directory turned into readable original (embedded HTML) + ja (markdown canonical + portal HTML) artifacts."
metadata:
  short-description: "Build clean-original.html (English) + a canonical -ja.md from a paper PDF, raw HTML, and figures; the portal serves ja.html with embedded figures"
---
# Paper Close Reading

## Purpose

Turn a prepared paper staging directory into 3 reading artifacts:

- `clean-original.html`: English clean text + embedded figures (kept in `tmp/<slug>/` as the reading source, not committed)
- `<slug>-ja.md`: canonical Japanese close-reading with section summaries, figure interpretation, and 批判的コメント (critical comments); markdown, git-tracked, human-editable
- `<slug>.md`: lightweight index with title, authors, DOI, journal, links, and summary (git-tracked)

`render_all.sh` then renders both markdown files to `~/.local/share/life/_life/paper/`. It embeds local image references (`tmp/<slug>/Fig.1.png` etc.) as `data:` URIs, resized to a 1200px max width via Pillow, so portal pages stay reasonably small and figures display at a consistent size. It also regenerates the Link 集 table in `portfolio/paper-close-readings/README.md` from a glob, so a new paper appears on the portal index automatically. fenrir_portal serves the result at:

- `http://fenrir:8080/paper/`: auto-updated paper index (`README.html`)
- `http://fenrir:8080/paper/<slug>.html`: per-paper index page
- `http://fenrir:8080/paper/<slug>-ja.html`: full Japanese close-reading with embedded figures (max-width 1200px, `<figure class="md-figure">` styled via Newsprint CSS)

In HTML mode, `clean-original.html` is always Step 1 and the required foundation for everything that follows. PDF mode has no clean HTML to produce, so it skips this step.
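The size arithmetic behind the 1200px cap can be sketched with the standard library. This is a hypothetical illustration, not code taken from render_all.sh or embed_local_images_in_html.py (those use Pillow). It assumes the cap only ever downscales and keeps the aspect ratio, and both function names are made up. The sketch also shows why `data:` URIs weigh more than the image files themselves: base64 encodes every 3 bytes as 4 characters.

```python
import math

MAX_WIDTH = 1200  # the max width this skill applies to portal figures


def fit_to_max_width(width: int, height: int, max_width: int = MAX_WIDTH) -> tuple[int, int]:
    """Scale (width, height) down to max_width, preserving aspect ratio.

    Assumption: images already narrower than max_width are left as-is (no upscaling).
    """
    if width <= max_width:
        return width, height
    return max_width, max(1, round(height * max_width / width))


def base64_length(n_bytes: int) -> int:
    """Characters a base64 data: URI payload needs for n_bytes of image data (with padding)."""
    return 4 * math.ceil(n_bytes / 3)


print(fit_to_max_width(3000, 2000))  # (1200, 800)
print(base64_length(3_000_000))      # 4000000
```

Downscaling 3000×2000 to 1200×800 keeps 16% of the pixels. That reduction is likely where most of the portal size saving comes from, because the base64 overhead of roughly one third applies to whatever bytes remain.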
## Per-Paper tmp Subdirectory (REQUIRED)

Every paper's staging assets live in their own per-slug subdirectory, never in bare `tmp/`:

```text
portfolio/paper-close-readings/tmp/<slug>/
  original.html  clean-original.html  full.txt
  Fig.1.png  Fig.2.png  ...  Fig.N.png
```

where `<slug>` is the exact paper slug (= the `<slug>-ja.md` filename stem, e.g. `2026-06-08-pleyer-iron-porphyrin-biosignature`).

Why: bare `tmp/` is shared scratch space, and the next paper's fetch overwrites it. If `<slug>-ja.md` references bare `tmp/Fig.1.png`, then once the next paper is staged, those references silently point to a different paper's figures. A future re-render of the old ja.md would then embed the wrong images. Per-slug subdirectories make each paper self-contained, so references never go stale and `tmp/` itself stays empty between papers.

Therefore, throughout this skill, every `tmp/...` path means `tmp/<slug>/...`:

- staging input: `tmp/<slug>/original.html`, `tmp/<slug>/Fig.N.png`
- Step 1 output: `tmp/<slug>/clean-original.html`
- `<slug>-ja.md` figure refs: `![Fig. N short caption](tmp/<slug>/Fig.N.png)`
- `<slug>.md` index, clean-original path: `tmp/<slug>/clean-original.html`

`render_all.sh` resolves these paths relative to the `.md` location, so the subdirectory path works unchanged.
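You can catch the stale-reference problem described above by scanning a ja.md for image paths that sit under `tmp/` but not under `tmp/<slug>/`. This is a minimal sketch: the regex only covers the simple `![alt](path)` form used in this skill, and the helper name is hypothetical.

```python
import re

# ![alt](path): capture the path up to the closing parenthesis or whitespace.
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")


def find_stale_tmp_refs(markdown: str, slug: str) -> list[str]:
    """Return image paths under tmp/ that are not inside tmp/<slug>/."""
    prefix = f"tmp/{slug}/"
    return [
        path
        for path in IMAGE_REF.findall(markdown)
        if path.startswith("tmp/") and not path.startswith(prefix)
    ]


md = """\
![Fig. 1](tmp/Fig.1.png)
![Fig. 2](tmp/2026-06-08-pleyer-iron-porphyrin-biosignature/Fig.2.png)
"""
print(find_stale_tmp_refs(md, "2026-06-08-pleyer-iron-porphyrin-biosignature"))
# ['tmp/Fig.1.png']
```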
Step 0 (do this first): determine the slug from the paper metadata, then run `mkdir -p portfolio/paper-close-readings/tmp/<slug>/` and fetch/stage all assets into it. If the user pre-staged files in bare `tmp/`, move them into `tmp/<slug>/` before Step 1.
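Here is a Python sketch of Step 0. Both helpers are hypothetical. The slug shape (date, first author, keywords) is inferred from the example slug above and is not a documented rule. The sketch also assumes that dotfiles such as a `.gitkeep` belong in bare `tmp/` and should stay there.

```python
import re
import shutil
from pathlib import Path


def make_slug(date: str, first_author: str, keywords: list[str]) -> str:
    """Join date, author, and keywords into a lowercase, hyphenated slug."""
    parts = [date, first_author, *keywords]
    return "-".join(re.sub(r"[^a-z0-9]+", "-", p.lower()).strip("-") for p in parts)


def stage_into_slug_dir(tmp_root: Path, slug: str) -> Path:
    """Create tmp/<slug>/ (like mkdir -p) and move loose files from bare tmp/ into it."""
    slug_dir = tmp_root / slug
    slug_dir.mkdir(parents=True, exist_ok=True)
    for entry in list(tmp_root.iterdir()):  # snapshot the listing before moving anything
        # Subdirectories belong to other papers; dotfiles are left in place.
        if entry.is_file() and not entry.name.startswith("."):
            shutil.move(str(entry), str(slug_dir / entry.name))
    return slug_dir


slug = make_slug("2026-06-08", "Pleyer", ["iron", "porphyrin", "biosignature"])
print(slug)  # 2026-06-08-pleyer-iron-porphyrin-biosignature
```

To use it, call `stage_into_slug_dir(Path("portfolio/paper-close-readings/tmp"), slug)` after fetching. Any files left in bare `tmp/` from a manual download end up next to the new paper's other assets.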
## Expected Input

The staging directory is the per-slug subdir `portfolio/paper-close-readings/tmp/<slug>/` (see "Per-Paper tmp Subdirectory" above).

There are two input modes. Prefer HTML mode when it is available, and fall back to PDF mode when it is not.

### HTML mode (preferred)

- `tmp/<slug>/original.html`: raw page HTML from the publisher (with full body, figures, and references)
- `tmp/<slug>/Fig.1.png` (or `.jpg`) ... `tmp/<slug>/Fig.N.png`: extracted figure images

This mode gives accurate text and structure-preserving rendering. Acquire the HTML via either:

- `scripts/papers/fetch_html.py <publisher-url>`: playwright + stealth + UTokyo institutional IP; passes Cloudflare and gives full figure markup
- Browser "Save Page As → Web Page Complete" if the scripted fetch is blocked

### PDF mode (fallback)

Use this mode when HTML cannot be obtained:

- Cloudflare / anti-bot protection blocks `fetch_html.py` even with stealth
- The publisher has no web HTML edition (book chapters, some preprint servers)
- Off-campus, with no EZproxy access
- A paywall blocks the HTML but the PDF is accessible (e.g. via Paperpile)

Inputs:

- `tmp/<slug>/<author-year>.pdf`: the PDF file (often already in Paperpile: `~/Library/CloudStorage/GoogleDrive-38kta.lab@gmail.com/マイドライブ/Paperpile/<year>/...pdf`)
- `tmp/<slug>/Fig.1.png` ... `tmp/<slug>/Fig.N.png`: figures extracted manually (e.g. macOS Preview "export as image", or `pdftoppm` from poppler)

In PDF mode you skip Step 1 (`clean-original.html`), because there is no clean HTML to produce. Run `pdftotext -layout tmp/<slug>/<...>.pdf tmp/<slug>/full.txt` (poppler) once to get the text, then read directly from the PDF (or the text dump) when writing `<slug>-ja.md`. Note that pdftotext can lose table layout and may garble special characters or multi-column flow. If an ambiguous extraction affects the conclusion, flag it in the `## 批判的コメント` section of the close-reading.
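Wrapping that pdftotext call in Python lets the step skip cleanly when poppler is not installed, and it makes an obviously failed extraction easy to spot. This is a sketch under stated assumptions: the helper names are hypothetical, and the 500-character threshold for an image-only (scanned) PDF is an arbitrary heuristic, not a poppler feature.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def pdftotext_command(pdf: Path, out_txt: Path) -> list[str]:
    """The poppler invocation used in this skill: pdftotext -layout <pdf> <txt>."""
    return ["pdftotext", "-layout", str(pdf), str(out_txt)]


def likely_image_only(text: str, min_chars: int = 500) -> bool:
    """Heuristic: a near-empty dump usually means a scanned PDF with no text layer."""
    return len(text.strip()) < min_chars


def extract_full_text(pdf: Path, out_txt: Path) -> Optional[str]:
    """Run pdftotext if it is on PATH and return the text; None if poppler is missing."""
    if shutil.which("pdftotext") is None:
        return None
    subprocess.run(pdftotext_command(pdf, out_txt), check=True)
    return out_txt.read_text(encoding="utf-8", errors="replace")
```

When `likely_image_only` returns True, the PDF may need OCR or a purely visual reading. That uncertainty is exactly what belongs in `## 批判的コメント`.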
### Mode declaration

In the index `<slug>.md`, declare which mode was used so future you (and search) can find it:

```yaml
---
input_mode: html  # or: pdf
...
---
```

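The mode check from the workflow can be sketched as a small helper. It inspects the staging directory and emits the `input_mode` key. The code is hypothetical and writes only that one key; the real index carries the other fields elided as `...` above. When both files are present, HTML wins, which matches "prefer HTML mode when available".

```python
from pathlib import Path


def detect_input_mode(slug_dir: Path) -> str:
    """'html' when raw original.html was staged, else 'pdf' when a PDF is present."""
    if (slug_dir / "original.html").is_file():
        return "html"
    if any(slug_dir.glob("*.pdf")):
        return "pdf"
    raise FileNotFoundError(f"neither original.html nor a PDF found in {slug_dir}")


def input_mode_frontmatter(mode: str) -> str:
    """Frontmatter block holding only the input_mode key."""
    if mode not in ("html", "pdf"):
        raise ValueError(f"unknown input mode: {mode}")
    return f"---\ninput_mode: {mode}\n---\n"
```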
Before writing output, read the current repository's README.md and Rules.md when available.
## Required Workflow

- Step 0: Determine the slug and run `mkdir -p portfolio/paper-close-readings/tmp/<slug>/`. Fetch/stage all assets (raw HTML, figures, PDF) into that subdir. Move any pre-staged bare-`tmp/` files into `tmp/<slug>/`.
- Inspect `tmp/<slug>/`. Determine HTML mode (raw `original.html` available) vs PDF mode (only PDF + figures). Record the mode in the index `<slug>.md` frontmatter as `input_mode: html` or `input_mode: pdf`.
- Step 1: Create `tmp/<slug>/clean-original.html`, only in HTML mode. In PDF mode, skip this step and proceed directly to Step 2.
  - Keep the article body, figures, tables, and references.
  - Remove surrounding publisher UI, related-content blocks, and other page chrome.
  - Preserve the original English text.
  - Embed figure references (`src="Fig.1.png"`) as `data:` URIs using `embed_local_images_in_html.py` (optionally with `--max-width 1200` for a smaller offline copy).
- (HTML mode only) Create or refresh `tmp/<slug>/original.html` if a broader original reading copy is useful.
- Step 2: Write `<slug>-ja.md` at `portfolio/paper-close-readings/<slug>-ja.md`. Base it on `clean-original.html` (HTML mode) or on the PDF text plus a visual reading of the PDF (PDF mode).
  - Use the paper's own section structure as the spine.
  - Summarize `Introduction`, `Materials and methods`, `Results and discussion`, and `Conclusions` in Japanese.
  - Reference figures inline with markdown image syntax, e.g. `![Fig. 1 short caption](tmp/<slug>/Fig.1.png)`. These are embedded automatically during render. Keep `[` / `]` out of the alt text (e.g. write `FeCl(oep)`, not `[FeCl(oep)]`). Square brackets in the alt text break markdown `![...]()` image parsing, and the figure silently fails to embed.
  - Always include a `## 批判的コメント` section. In PDF mode, also flag any text-extraction uncertainty (table layout / equations / multi-column flow) so future you knows what to re-verify.
- Step 3: Write the `<slug>.md` index at `portfolio/paper-close-readings/<slug>.md` with `input_mode`, title, authors, DOI, journal, a one-line summary, and links to the ja and original files.
- Render to portal: run `bash scripts/render_all.sh`, or rely on the next scheduled render. `render_all.sh` resolves the `tmp/<slug>/` image paths relative to the source `.md` location, embeds them as `data:` URIs (with `--max-width 1200` resizing), and regenerates the `/paper/` index automatically. Verify that `grep -o 'data:image' <slug>-ja.html | wc -l` equals the figure count.

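The final check in the workflow above can also be done in Python. This is a minimal sketch with hypothetical helper names. It counts occurrences rather than lines, which matches `grep -o ... | wc -l`; plain `grep -c` counts matching lines and would undercount several images on one line. It assumes every local figure ref starts with `tmp/`, and any other `data:image` in the page (e.g. a CSS icon) would inflate the HTML count.

```python
import re

IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")


def figure_ref_count(markdown: str) -> int:
    """Number of markdown image refs pointing at local tmp/ figures."""
    return sum(1 for path in IMAGE_REF.findall(markdown) if path.startswith("tmp/"))


def embedded_image_count(html: str) -> int:
    """Occurrences of 'data:image', like grep -o 'data:image' | wc -l."""
    return html.count("data:image")


def figures_all_embedded(markdown: str, html: str) -> bool:
    return embedded_image_count(html) == figure_ref_count(markdown)
```

Typical use: read `<slug>-ja.md` and the rendered `<slug>-ja.html` as text and pass both to `figures_all_embedded`.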
## Step 1 Rule

In HTML mode, `clean-original.html` creation is mandatory.
Treat it as:
- the first stable artifact
- the source for later Japanese structuring
- the minimum English reading artifact worth preserving
Do not start by writing `<slug>-ja.md` from raw publisher HTML directly. Always pass through `clean-original.html` first so the Japanese version inherits a clean section structure.
## Repository-Specific Convention For life

When working inside the `life` repo:

- staging area: `portfolio/paper-close-readings/tmp/<slug>/` (gitignored, one subdir per paper, containing PDFs / raw HTML / figures / `clean-original.html`; bare `tmp/` stays empty between papers)
- canonical artifacts (git-tracked, in `portfolio/paper-close-readings/`):
  - `<slug>.md`: lightweight index
  - `<slug>-ja.md`: full Japanese close-reading
- portal output (gitignored, auto-rendered to `~/.local/share/life/_life/paper/`):
  - `<slug>.html`: index, served by Caddy at `http://fenrir:8080/paper/<slug>.html`
  - `<slug>-ja.html`: Japanese close-reading with embedded images, served at `http://fenrir:8080/paper/<slug>-ja.html`

The `<slug>.md` index is for:

- `input_mode` (html or pdf) in the frontmatter
- title, authors, DOI, journal
- local paths (`clean-original.html` in `tmp/<slug>/`, ja.md, ja.html)
- Drive / Paperpile links
- one-line summary
- figure roles (1-2 lines)
- strong conclusions
- weaknesses / unresolved points
- critical comments (high-level; the full version is in `<slug>-ja.md`)
- next reading targets

Do not commit PDFs, raw figure assets, or `clean-original.html`; all of these live in `tmp/<slug>/`.
## clean-original.html Rules
- Preserve the full original text of the body.
- Keep major article sections.
- Keep figure and table references readable.
- Keep references unless the user explicitly says to drop them.
- Prefer readability over publisher fidelity.
- Treat publisher structure as variable. Do not assume Springer-only markup.
- When the auto cleaner lands on a too-broad wrapper, trim obvious site chrome manually but keep the article text complete.
- After cleaning, embed local figures so it stays portable when reading offline.
Use the bundled script in auto mode first:
```bash
python3 <skill-dir>/scripts/make_clean_original.py tmp/<slug>/original.html tmp/<slug>/clean-original.html
# Manually rewrite any remaining publisher figure URLs to local Fig.N.* refs, then embed:
python3 scripts/embed_local_images_in_html.py --max-width 1200 tmp/<slug>/clean-original.html
```
The cleaner is publisher-agnostic by default and tries multiple DOM families:

- Nature / Springer style article bodies
- PLOS style `#artText` bodies
- generic `<article>` wrappers
- generic `<main>` wrappers

If auto mode fails, re-fetch the raw HTML (`scripts/papers/fetch_html.py`) so the body is available as `<article>` or `<main>` content, then rerun Step 1.
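To see which of those DOM families a page falls into before running the cleaner, you can use a stdlib-only probe like the one below. It is a hypothetical sketch, not the logic of `make_clean_original.py`. The `c-article-body` class used to detect Nature/Springer bodies is an assumption about current publisher markup.

```python
from html.parser import HTMLParser
from typing import Optional

# Priority mirrors the list above: publisher-specific bodies first, then generic wrappers.
PRIORITY = ["springer-nature", "plos-artText", "article", "main"]


class BodyFamilyDetector(HTMLParser):
    """Record which candidate body containers appear anywhere in the page."""

    def __init__(self) -> None:
        super().__init__()
        self.found: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if "c-article-body" in classes:  # assumed Nature/Springer body class
            self.found.add("springer-nature")
        if attr.get("id") == "artText":  # PLOS-style #artText body
            self.found.add("plos-artText")
        if tag in ("article", "main"):
            self.found.add(tag)


def detect_body_family(html: str) -> Optional[str]:
    """Return the highest-priority body family present, or None if none matched."""
    detector = BodyFamilyDetector()
    detector.feed(html)
    detector.close()
    return next((family for family in PRIORITY if family in detector.found), None)
```

A `None` result is a hint to re-fetch the HTML, as the fallback above describes.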
## ja.md Rules
Use the paper's own structure, then translate and reorganize into readable Japanese.
Minimum expected sections (as markdown headings):
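- `## Abstract`
- `## 導入の要点` (key points of the Introduction)
- `## Materials and methods 日本語整理` (Materials and methods, organized in Japanese)
- `## Results and discussion 日本語整理` (Results and discussion, organized in Japanese)
- `## Discussion の要点` (key points of the Discussion), if separable from the results
- `## Conclusions の要点` (key points of the Conclusions)
- `## 批判的コメント` (critical comments; mandatory, even when the paper is strong)
- `## 次に読むポイント` (what to read next)
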
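A quick completeness check for these headings might look like the sketch below. It requires exact heading lines and treats `## Discussion の要点` as optional, since the list above makes it conditional. The helper is hypothetical and not part of the repo scripts.

```python
REQUIRED_HEADINGS = [
    "## Abstract",
    "## 導入の要点",
    "## Materials and methods 日本語整理",
    "## Results and discussion 日本語整理",
    "## Conclusions の要点",
    "## 批判的コメント",
    "## 次に読むポイント",
]
# "## Discussion の要点" is optional (only when separable from results), so it is not listed.


def missing_headings(markdown: str) -> list[str]:
    """Required headings that do not appear as a whole line in the ja.md text."""
    present = {line.strip() for line in markdown.splitlines()}
    return [h for h in REQUIRED_HEADINGS if h not in present]
```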
Figure references use markdown image syntax with paths relative to the source `.md`:

`![Fig. 1 short caption](tmp/<slug>/Fig.1.png)`

The image is inlined as a `data:` URI when `render_all.sh` produces the portal HTML, so `<slug>-ja.md` needs no manual embedding step. Do not put `[` / `]` in the alt text: square brackets break markdown image parsing, and the figure silently fails to embed.
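Because the failure is silent, a small lint for bracketed alt text can be worth running before rendering. This is a minimal sketch with a hypothetical helper name. It treats the alt text as everything between `![` and the first `](` that follows, which is enough to flag both leading and embedded brackets.

```python
import re

# Alt text is everything between "![" and the first "](" after it.
ALT_TEXT = re.compile(r"!\[(.*?)\]\(")


def bad_alt_texts(markdown: str) -> list[str]:
    """Alt texts containing '[' or ']', which this skill says break embedding."""
    return [alt for alt in ALT_TEXT.findall(markdown) if "[" in alt or "]" in alt]


md = "![Fig. 2 FeCl(oep)](tmp/s/Fig.2.png)\n![Fig. 3 [FeCl(oep)]](tmp/s/Fig.3.png)\n"
print(bad_alt_texts(md))  # ['Fig. 3 [FeCl(oep)]']
```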
Always:
- distinguish what the data directly show from the authors' model
- mark limitations and uncertainty clearly
- leave critical comments even if the paper is strong
Read references/ja-html-template.md for the expected Japanese tone (the template is HTML-flavored but the structure carries over to markdown).
## Visual Style Rule

Use the daily-search-trend Newsprint-inspired CSS direction. `scripts/render_md.py` applies the Newsprint CSS to portal HTML automatically, so `<slug>-ja.html` needs no per-document style work.

For `tmp/<slug>/clean-original.html` (which is hand-cleaned from publisher HTML), keep restrained paper-like colors, thin rules, and readable serif typography. Avoid app-like card dashboards unless the user explicitly asks for a different visual language.
## Self-Contained Output Rule

The portal HTML produced from `<slug>-ja.md` must have all `tmp/<slug>/Fig.*.png` references embedded as `data:` URIs. The paper render pass in `scripts/render_all.sh` handles this automatically. It:

- Calls `embed_local_images_in_html.py --base-dir <source-dir> --max-width 1200` (Pillow-backed), so figures resolve from `portfolio/paper-close-readings/tmp/<slug>/` and are downscaled to a 1200px max width (consistent display and ~65% smaller portal pages).
- Calls `scripts/automation/regen_paper_index.py` to rewrite the auto-generated table in `portfolio/paper-close-readings/README.md` between the `<!-- AUTO:LINKS:START -->` / `<!-- AUTO:LINKS:END -->` markers. The new paper then appears at `http://fenrir:8080/paper/` (the portal index) automatically.

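The marker-based rewrite can be sketched as follows. This is not the contents of `regen_paper_index.py`; only the two marker strings come from this skill. The table columns, the newest-first ordering (which assumes date-prefixed slugs), and the link targets are assumptions for illustration.

```python
import re
from pathlib import Path

START = "<!-- AUTO:LINKS:START -->"
END = "<!-- AUTO:LINKS:END -->"


def slugs_from_dir(paper_dir: Path) -> list[str]:
    """Slugs of every <slug>-ja.md in the close-reading directory."""
    return sorted(p.name[: -len("-ja.md")] for p in paper_dir.glob("*-ja.md"))


def build_table(slugs: list[str]) -> str:
    """Markdown table, newest first (assumes date-prefixed slugs)."""
    rows = ["| Paper | Index | Close reading |", "| --- | --- | --- |"]
    for slug in sorted(slugs, reverse=True):
        rows.append(f"| {slug} | [index]({slug}.html) | [ja]({slug}-ja.html) |")
    return "\n".join(rows)


def replace_between_markers(readme: str, body: str) -> str:
    """Swap whatever sits between the AUTO:LINKS markers for body; keep the rest."""
    pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
    if not pattern.search(readme):
        raise ValueError("AUTO:LINKS markers not found in README")
    # A function replacement avoids backslash-escape surprises in body.
    return pattern.sub(lambda _m: f"{START}\n{body}\n{END}", readme, count=1)
```

Text outside the markers is left untouched, which means hand-written parts of the README survive every regeneration.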
For `tmp/<slug>/clean-original.html` (the English reading copy, which is not in the portal), embed images explicitly:

```bash
# Embed figures at their original resolution:
python3 scripts/embed_local_images_in_html.py tmp/<slug>/clean-original.html

# Or, instead of the command above, embed and downscale in one pass for offline reading on small screens:
python3 scripts/embed_local_images_in_html.py --max-width 1200 tmp/<slug>/clean-original.html
```
This makes the clean-original portable for offline reading even though it is not committed.
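For reference, the core of turning `src="Fig.N.png"` attributes into `data:` URIs looks roughly like this. It is a hypothetical stand-in for `embed_local_images_in_html.py`: it does no resizing, it handles no `--base-dir` flag (`base_dir` is just a function argument), and it only matches `Fig.*` names ending in `.png`, `.jpg`, or `.jpeg`.

```python
import base64
import mimetypes
import re
from pathlib import Path

# src="Fig.1.png", src="Fig.2.jpg", ... (local figure files only)
LOCAL_FIG_SRC = re.compile(r'src="(Fig\.[^"/]+\.(?:png|jpe?g))"')


def embed_local_figures(html: str, base_dir: Path) -> str:
    """Inline local Fig.* images as base64 data: URIs; leave missing files untouched."""

    def to_data_uri(match: re.Match) -> str:
        path = base_dir / match.group(1)
        if not path.is_file():
            return match.group(0)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f'src="data:{mime};base64,{payload}"'

    return LOCAL_FIG_SRC.sub(to_data_uri, html)
```

Usage: for `p = Path("tmp/<slug>/clean-original.html")`, write back `embed_local_figures(p.read_text(encoding="utf-8"), p.parent)`. Unresolved references stay as plain `src` attributes, so a missing figure remains visible in the source instead of disappearing.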
## Output Checklist

Before finishing, verify:

- `tmp/<slug>/clean-original.html` (HTML mode) exists, opens as valid HTML, and has embedded `data:image` URIs
- `portfolio/paper-close-readings/<slug>-ja.md` exists, with markdown image references like `![Fig. 1 short caption](tmp/<slug>/Fig.1.png)` and no `[` / `]` in alt text
- the `portfolio/paper-close-readings/<slug>.md` index references both files (clean-original path under `tmp/<slug>/`)
- bare `tmp/` contains no loose paper assets (everything is under `tmp/<slug>/`)
- the `## 批判的コメント` section is present in `<slug>-ja.md`
- after `bash scripts/render_all.sh`:
  - `~/.local/share/life/_life/paper/<slug>.html` and `<slug>-ja.html` exist
  - ja.html contains embedded `data:image` URIs (`grep -o 'data:image' <slug>-ja.html | wc -l` should equal the figure count; `grep -c` counts matching lines, not occurrences)
  - ja.html size is reasonable (low-MB range, since `--max-width 1200` is applied)
  - the Link 集 table in `portfolio/paper-close-readings/README.md` has a new row for this paper (auto-generated by `regen_paper_index.py`), and `http://fenrir:8080/paper/` shows it

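You can script the pre-render items of this checklist as in the sketch below. The function name and its problem-string format are hypothetical. It checks file presence, embedded images in clean-original (HTML mode only), the `## 批判的コメント` heading, and loose files in bare `tmp/`, ignoring dotfiles such as `.gitkeep`.

```python
from pathlib import Path


def check_paper(paper_dir: Path, slug: str, mode: str) -> list[str]:
    """Return problems found; an empty list means these checks passed."""
    problems: list[str] = []
    tmp = paper_dir / "tmp"
    if mode == "html":
        clean = tmp / slug / "clean-original.html"
        if not clean.is_file():
            problems.append(f"missing {clean}")
        elif "data:image" not in clean.read_text(encoding="utf-8"):
            problems.append(f"no embedded data:image in {clean}")
    ja = paper_dir / f"{slug}-ja.md"
    if not ja.is_file():
        problems.append(f"missing {ja}")
    elif "## 批判的コメント" not in ja.read_text(encoding="utf-8"):
        problems.append(f"no ## 批判的コメント section in {ja}")
    index = paper_dir / f"{slug}.md"
    if not index.is_file():
        problems.append(f"missing {index}")
    if tmp.is_dir():
        loose = sorted(p.name for p in tmp.iterdir() if p.is_file() and not p.name.startswith("."))
        if loose:
            problems.append(f"loose files in bare tmp/: {loose}")
    return problems
```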
## References

- `references/ja-html-template.md`: expected Japanese reading-note structure (HTML format, but the section flow applies to markdown too)
## Auto-finalize

After producing `<slug>-ja.md` and `<slug>.md`, run the shared finalize script. It is a no-op unless `AGENT_AUTO_COMMIT=1` is exported in the shell. On fenrir this is the default; on Air / mini-lab it is unset, so the call has no effect.
```bash
bash scripts/agent_auto_finalize.sh \
  -m "docs: 📝 paper-close-reading: <paper short title>" \
  portfolio/paper-close-readings/<slug>-ja.md \
  portfolio/paper-close-readings/<slug>.md
```
Pass only the canonical markdown files. Never pass the PDF, raw HTML, figure assets, or `clean-original.html`; all of these live in `tmp/<slug>/`, which is gitignored. The script commits with `-o` so other staged changes are not swept in.
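The gating and path rules above can be expressed as a small sketch. It is a guess at the behavior, not the contents of `agent_auto_finalize.sh`. It assumes the script stages the files with `git add` and then commits with `git commit -o` (`--only`), which commits just the listed paths even when other changes are staged. The function only builds the commands; it does not run git.

```python
import os
from pathlib import PurePosixPath
from typing import Mapping, Optional


def finalize_commands(
    message: str, paths: list[str], env: Mapping[str, str] = os.environ
) -> Optional[list[list[str]]]:
    """Return the git commands to run, or None when auto-commit is disabled."""
    if env.get("AGENT_AUTO_COMMIT") != "1":
        return None  # no-op, as on Air / mini-lab
    for p in paths:
        path = PurePosixPath(p)
        # Only canonical markdown outside tmp/ may be committed.
        if path.suffix != ".md" or "tmp" in path.parts:
            raise ValueError(f"not a canonical markdown file: {p}")
    return [
        ["git", "add", "--", *paths],
        ["git", "commit", "-o", "-m", message, "--", *paths],
    ]
```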