oma-scholar - SKILL.md Agent Skill

name: oma-scholar description: > Scholarly research companion using Knows sidecar spec (.knows.yaml). Generates, validates, reviews, queries, and compares structured research-paper sidecars, and fetches them from knows.academy. Use for academic literature search, survey synthesis, paper authoring assistance, and peer review with token-efficient claim/evidence/relation access.

Scholar - Research Paper Sidecar Companion

Scheduling

Goal

Search, fetch, generate, validate, analyze, review, and compare scholarly paper sidecars using the Knows .knows.yaml spec for token-efficient research workflows.

Intent signature

User asks for academic literature search, sidecar generation, sidecar validation, paper claims/evidence summary, structural paper comparison, or peer review as sidecar.
User references Knows, .knows.yaml, knows.academy, OpenAlex, claims, evidence, relations, or paper sidecars.

When to use

Reading research papers token-efficiently via Knows sidecars (~700 tokens for claims-only vs ~10K for full PDF)
Generating .knows.yaml sidecars from your own paper drafts, LaTeX, or research notes
Validating sidecar structure (rule-based) before sharing
Producing peer reviews as sidecars
Querying or summarizing existing sidecars
Structurally comparing two papers (claims, methods, evidence)
Searching/fetching sidecars from knows.academy (~50K papers indexed)

When NOT to use

General web search or non-academic content -> use oma-search
Translating papers -> use oma-translator
PDF parsing only (no sidecar) -> use oma-pdf
Submitting sidecars back to knows.academy -> out of scope (host LLM only consumes/produces locally)
Full peer-review workflow with editor system -> out of scope

Expected inputs

Paper, abstract, draft, LaTeX, research notes, sidecar file, DOI, OpenAlex ID, Knows record ID, or search query
Desired mode: generate, validate, review, analyze, compare, or remote fetch
Optional strictness, section filter, or CI behavior

Expected outputs

.knows.yaml sidecar, review sidecar, lint report, search/fetch result, natural-language analysis, or structural comparison
Sidecars conforming to v0.9.0 / paper@1 profile
Validation status and warnings before sharing generated sidecars

Dependencies

oma scholar CLI subcommands
knows.academy public API and OpenAlex fallback
resources/sidecar-spec.md, API endpoints, OpenAlex setup, upstream cache, checklist, and execution protocol

Control-flow features

Branches by mode, source availability, Knows/OpenAlex coverage, strict vs lenient validation, and fetched section
Reads/writes YAML sidecars and may call public APIs
Avoids fabrication when source evidence is missing

Structural Flow

Entry

Identify mode and source artifact/query.
Resolve paper identity through Knows or OpenAlex when needed.
Load sidecar spec and mode-specific protocol.

Scenes

PREPARE: Select mode and gather source or remote identifiers.
ACQUIRE: Fetch paper metadata, sidecar sections, or local source text.
REASON: Extract claims, evidence, relations, provenance, or comparison structure.
ACT: Generate, lint, review, analyze, compare, or fetch sidecar data.
VERIFY: Validate schema, enums, IDs, relations, and provenance.
FINALIZE: Return sidecar, report, summary, or comparison with caveats.

Transitions

If knows.academy lacks the paper, fall back to OpenAlex metadata/abstract.
If generating a sidecar, run lint before sharing.
If consuming third-party sidecars with dangling references, use lenient mode when appropriate.
If source evidence is absent, omit fields instead of guessing.

Failure and recovery

If remote API times out, retry or use OpenAlex fallback.
If YAML fails parsing, fix indentation and scalar types.
If relation density or orphan statements warn, add supported-by relations when source evidence supports them.

Exit

Success: requested sidecar operation completes with validation status.
Partial success: missing metadata, fallback source, or validation warnings are explicit.

Logical Operations

Actions

Action	SSL primitive	Evidence
Select mode	`SELECT`	Generate/Validate/Review/Analyze/Compare/Remote
Read paper or sidecar	`READ`	Source files or YAML
Request remote data	`REQUEST`	Knows/OpenAlex APIs
Infer claims/evidence/relations	`INFER`	Sidecar generation/analysis
Write sidecar	`WRITE`	`.knows.yaml` outputs
Validate sidecar	`VALIDATE`	`oma scholar lint`
Report result	`NOTIFY`	Summary or lint report

Tools and instruments

oma scholar search|resolve|get|lint
Knows public API, OpenAlex fallback, sidecar spec, checklist

Canonical command path

oma scholar search "<query>"
oma scholar resolve "<title-or-doi>"
oma scholar get "<record-id-or-doi>"
oma scholar lint "<paper.knows.yaml>"

Resource scope

Scope	Resource target
`LOCAL_FS`	Paper drafts, sidecar YAML, review sidecars
`NETWORK`	knows.academy and OpenAlex APIs
`PROCESS`	`oma scholar` CLI and lint
`USER_DATA`	User-provided paper content and research notes

Preconditions

Mode and source are identifiable.
Spec rules are available for generation or validation.

Effects and side effects

May create local sidecar or review sidecar files.
May query public scholarly APIs.
Does not submit sidecars back to knows.academy.

Guardrails

Target spec is v0.9.0 / paper@1 profile: verified against production sidecars from knows.academy; see resources/sidecar-spec.md
Host LLM generates sidecars: never shell out to anthropic SDK or external LLM CLI; this skill runs inside an agent
Anti-fabrication: if DOI/venue/year is not visible in source, omit the key entirely; never write doi: TODO or guess
Top-level metadata: title, authors, venue, year live at the top level (no metadata wrapper)
Field names are exact: statement_type, evidence_type, predicate, artifact_type (not type/claim)
Provenance has SINGLE actor: provenance.actor is one object, NOT a provenance.actors array
Confidence is an object: {claim_strength: ..., extraction_fidelity: ...}, both from high|medium|low
Coverage is an object: coverage.statements (4-value enum) + coverage.evidence (3-value enum)
Closed enums: actor tool|person|org (never ai/llm/model); artifact role subject|supporting|cited; predicates in present tense
Numbers unquoted: value: 22, never value: '22'
Relation density: average ≥1.5 relations per statement; every claim needs supported_by evidence (lint warns when ratio is below; orphan statements warned per-id)
ID format: descriptive kebab-case with prefix: stmt:privacy-budget-tradeoff, ev:cifar10-accuracy-table, art:paper
Validate before sharing: run oma scholar lint after Generate
Remote API has no auth: https://knows.academy/api/proxy/* is public; do not invent auth headers
Partial fetch param is section (singular): fixed enum statements|evidence|relations|artifacts|citation
OpenAlex key is optional: metadata enrichment only; gracefully degrade when missing
Sidecar content stays English: schema fields, IDs, statement text follow upstream convention; user-facing responses follow oma-config.yaml language
Spec drift awareness: our local rules track v0.9.0 production behavior, which differs from the upstream knows.md natural-language description; refresh resources/upstream-spec-cache.md periodically

Modes

Mode	Trigger	Output
Generate	"create sidecar from this paper / abstract / draft", "generate `.knows.yaml`"	`{paper}.knows.yaml` (host LLM emits, then `oma scholar lint` validates)
Validate	"lint this sidecar", "validate `.knows.yaml`"	Pass/fail report with file:line issues
Review	"peer review this paper as sidecar"	`{paper}.review.knows.yaml`
Analyze	"summarize this sidecar", "what claims does it make?"	Natural-language answer
Compare	"compare paper A and paper B structurally"	Diff table (claims/methods/evidence)
Remote	"find papers on X", "fetch sidecar :id", "get claims only for :id"	Search results / sidecar payload

Provider Fallback (knows.academy → OpenAlex)

knows.academy currently indexes only 2026 papers (~50K, mostly arXiv). For older or non-2026 papers (Transformer 2017, BERT 2018, classics, journals), the skill automatically falls back to OpenAlex for metadata and abstract.

Use the oma scholar CLI subcommands:

# Hybrid search: knows first, OpenAlex fallback
oma scholar search "vision language action"

# Cross-source resolve: figures out which source has the right paper
oma scholar resolve "Attention Is All You Need"

# Get by id (knows record_id, OpenAlex W-id, or DOI)
oma scholar get "10.48550/arXiv.1706.03762"

When OpenAlex returns the answer (knows.academy lacks the paper), use the returned abstract as input to Mode 1 Generate to produce a local sidecar.

How to Execute

Follow resources/execution-protocol.md step by step for the selected mode.

Quick Reference

Search (knows + auto OpenAlex fallback)

oma scholar search "diffusion super resolution"
oma scholar search --year-min 2024 "vision language action"

Find one specific paper

oma scholar resolve "Attention Is All You Need"
# returns top hit from each source + recommendation

Fetch a sidecar or work

# knows.academy full sidecar
oma scholar get "knows:generated/reconvla/1.0.0"

# Partial fetch (claims only, ~700 tokens, 93% reduction vs PDF)
oma scholar get --section statements "knows:generated/reconvla/1.0.0"

# By DOI or OpenAlex W-id (works regardless of knows.academy availability)
oma scholar get "10.48550/arXiv.1706.03762"

When knows.academy is unreachable, get knows:... automatically falls back to OpenAlex by extracting the slug from the record_id. The result is marked with fallback: "openalex" and contains metadata + abstract, useful for running Mode 1 Generate locally.

Validate

# Strict mode for own Generate output (default)
oma scholar lint paper.knows.yaml

# Lenient mode for third-party / fetched sidecars
oma scholar lint --lenient remote.knows.yaml

# Treat warnings as failures (CI mode)
oma scholar lint --fail-on-warning paper.knows.yaml

About 47% of knows.academy-served sidecars contain at least one dangling cross-reference (typo in subject_ref/object_ref, measured across 15 production samples). Use --lenient when consuming third-party records so these surface as warnings rather than blocking errors.

Raw API (when CLI is unavailable)

curl -s "https://knows.academy/api/proxy/search?q=..."
curl -s "https://knows.academy/api/proxy/sidecars/<encoded-id>"
curl -s "https://knows.academy/api/proxy/partial?record_id=<id>&section=statements"
curl -s "https://knows.academy/api/proxy/jobs/stats"   # platform health

Configuration

Project-specific settings: config/scholar-config.yaml

Troubleshooting

Issue	Solution
`[ERROR] *.value: numeric value '22' is quoted`	Remove quotes: `value: '22'` -> `value: 22`
`[ERROR] provenance.actor.type: 'ai' is not allowed`	Change to `tool`, `person`, or `org`
`[ERROR] *.type: use \`statement_type` instead of `type``	Rename `type` -> `statement_type` (or `evidence_type`/`predicate`/`artifact_type`)
`[ERROR] provenance.actors: v0.9 spec uses singular \`actor``	Replace `actors: [{...}]` array with `actor: {...}` object
`[ERROR] *.object_ref: reference 'X' does not match any defined id`	Fix the `subject_ref`/`object_ref` to point to a real id, OR use `--lenient` if consuming third-party data
`[WARN] relations: avg relations/statement is N.NN (target ≥ 1.5)`	Add more `supported_by`/`depends_on` relations
`[WARN] statements: only N statements; most papers warrant ≥ 8`	Expected when generating from abstract only; full-paper Generate should hit 15+
`[WARN] *.predicate: past-tense '...' is suspicious`	Switch to present tense (`evaluated_on` -> `evaluates_on`)
Remote API returns empty results	Try broader query; check `/api/proxy/jobs/stats`; CLI auto-falls-back to OpenAlex
`knows.academy search failed: fetch failed` (stderr)	Platform timeout; fallback to OpenAlex is automatic; retry later for sidecars
OpenAlex 403/429	Set `OPENALEX_API_KEY` (see `resources/setup-openalex.md`)
YAML won't parse	Check indentation; numbers/booleans must be unquoted; strings with `:` need quotes

References

Execution steps: resources/execution-protocol.md
Sidecar spec rules: resources/sidecar-spec.md
API endpoints: resources/api-endpoints.md
OpenAlex setup: resources/setup-openalex.md
Upstream spec snapshot: resources/upstream-spec-cache.md
Post-generation checklist: resources/checklist.md
CLI subcommands: oma scholar search|resolve|get|lint (implementation under cli/commands/scholar/)
Context loading: ../_shared/core/context-loading.md
Quality principles: ../_shared/core/quality-principles.md
i18n rules: ../../rules/i18n-guide.md