ocas-reach - SKILL.md Agent Skill

name: ocas-reach
description: 'Live world-data query engine. Queries real-time external APIs for factual ground truth: no synthesis, no opinion, no research. Routes requests through a registry of ~53 registered sources covering US government data, scholarly literature, weather and hazards, geocoding, finance and macro indicators, court records, nutrition, news events, property records, land due-diligence, and academic papers. Do not use for web research (use Sift), entity investigations (use Scout), or pattern analysis over historical signals (use Corvus).'
license: MIT
source: https://github.com/indigokarasu/reach
includes:
  - references/**
  - scripts/**
metadata:
  author: Indigo Karasu (indigokarasu)
  version: 3.9.0
  tags:
    - live-data
    - API
    - real-time
    - facts
    - research
  triggers:
    - live data query
    - real-time API
    - fact check
    - current data
    - external API query

ocas-reach: Live World-Data Query Engine

Reach is the system's sensory layer for verified ground truth. It answers "what is" — not "what was," not "what should be," not "what might be." Each registered source is a deterministic connector: structured query in, structured response with citation out. No synthesis. No interpretation. Reach returns the fact; downstream skills synthesize.

When to Use

The user asks about real-time external data with a known authoritative source (USGS, NOAA, FDA, FRED, Congress, Redfin, Open-Meteo, NASA, Wikidata, etc.)
The user wants a factual lookup ("what's the current GDP figure," "any recent FDA recalls of ," "is 123 Main St for sale," "papers about transformers since 2022")
The agent needs a citation-bearing fact to ground a downstream response
The request is a specific entity, measurement, or list, not an open-ended exploration

When NOT to Use

Open-ended research over many sources → use Sift
Investigating a person → use Scout
Detecting patterns in historical signals → use Corvus
Requests requiring summarization or synthesis (Reach returns the fact; the agent or another skill does the writing)

Responsibility Boundary

Reach owns: real-time external API calls, source-specific connector logic, source registry, query/response logging, monthly quota tracking, account ledger.

Reach does not own:

Web research over open content (Sift)
Person investigations (Scout)
Knowledge graph queries (Elephas)
Pattern analysis over journals (Corvus)
Long-form synthesis or briefing (Vesper)
Inbox / messaging (Dispatch)

A query that needs synthesis after fetching is a two-step request: Reach returns the fact, the agent or downstream skill synthesizes.

Optional Skill Cooperation

When present:

Vesper may include Reach lookups in morning briefings (overnight USGS earthquake summary, weather, FRED indicator deltas).
Sift may invoke Reach as one of many sources during deep research.
Scout may invoke Reach for property records and SEC filings during contact enrichment.
Custodian may check Reach's quota state when triaging API rate-limit issues.

Reach functions normally when none of these are present.

Source Registry

The authoritative source list lives in scripts/sources.yml (52 sources). A browsable index is in references/sources/index.md. See references/credential-files.md for credential storage and references/account_provisioning.md for account registration.

Account Creation

Reach is explicitly authorized to register accounts at sources that require them. See references/account_provisioning.md for the registration playbook. Reach is not authorized to pay for paid tiers, register under a different identity, accept arbitration waivers, or register at any source not listed in sources.yml.

Decision Model

8-step query pipeline: identify source → validate auth → check quota → build query → execute → log usage → write journal → return verbatim. No silent fall-through to alternative sources.
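The ordered, no-fallback behavior of the pipeline can be illustrated as a chain of steps that stops at the first failure. This is a minimal sketch under stated assumptions. Each step is a function over a shared context dict. `PipelineError`, the three stub steps, and the envelope fields are hypothetical names, not the actual scripts/reach.py internals. Only three of the eight steps are stubbed, to keep the example short.

```python
from typing import Callable


class PipelineError(Exception):
    """Raised by a step to stop the pipeline with an explicit outcome code."""

    def __init__(self, outcome: str, message: str):
        super().__init__(message)
        self.outcome = outcome


Step = Callable[[dict], dict]


def run_pipeline(ctx: dict, steps: list[tuple[str, Step]]) -> dict:
    """Run steps in order and stop at the first failure.

    There is deliberately no fallback. A failure returns an error envelope
    naming the requested source instead of retrying against another source.
    """
    for name, step in steps:
        try:
            ctx = step(ctx)
        except PipelineError as err:
            return {"ok": False, "source": ctx["source"], "failed_step": name,
                    "outcome": err.outcome, "error": str(err)}
    return {"ok": True, "source": ctx["source"], "outcome": "success",
            "result": ctx.get("response")}


# Hypothetical stub steps, for illustration only.
def validate_auth(ctx: dict) -> dict:
    if not ctx.get("api_key") and not ctx.get("auth_optional"):
        raise PipelineError("auth_missing", "no API key configured")
    return ctx


def check_quota(ctx: dict) -> dict:
    if ctx.get("remaining", 0) <= 0:
        raise PipelineError("quota_blocked", "quota exhausted")
    return ctx


def execute(ctx: dict) -> dict:
    ctx["response"] = {"value": 42}  # stand-in for the real API response
    return ctx


steps = [("validate_auth", validate_auth), ("check_quota", check_quota),
         ("execute", execute)]
print(run_pipeline({"source": "fred", "api_key": "k", "remaining": 5}, steps))
print(run_pipeline({"source": "fred", "api_key": "k", "remaining": 0}, steps))
```

The function returns the envelope rather than catching the error and trying elsewhere. This keeps the no-silent-fall-through rule: the caller always learns which source failed and at which step.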

Execution

Queries run through the CLI: python3 scripts/reach.py. See references/sources/index.md for the command reference and references/usage_tracking.md for quota management.

Adding a New Source

See references/source-evaluation-framework.md for the evaluation checklist and references/sources/_template.md for the source reference template.

Legacy Sources

katzilla and property_lookup predate the v3 registry. References: references/katzilla.md, references/property_lookup.md.

Methodology Notes

Reach/Sift boundary (Jun 12, 2026): Reach owns sources.yml, API connectors, MCP connections, quota tracking (CSAPI and all Reach-registered APIs), and the discovered-APIs catalog (references/discovered-apis.md). Sift owns research synthesis, web_search, and entity extraction. Sift never manages its own MCP connections or quota. It delegates to Reach for factual anchors via reach.csapi_check/increment and reach.query rapidapi.

RapidAPI is a general-purpose marketplace (Jun 12, 2026): 203 endpoints spanning finance, crypto, news, geo, weather, security, social, and travel. It is not a "local business search" tool. Route it via reach.query rapidapi. When one skill uses a narrow slice of a shared tool, don't let that usage define the tool for every skill. Always check the canonical source or definition of a tool shared across skills.

Support File Map

File	When to read
`references/sources/index.md`	Before picking which source to use for a query; when checking routing hints
`references/sources/<slug>.md`	Before building a query for a specific source; when you need actions, params, response shape, or pitfalls
`references/account_provisioning.md`	Before registering at a source that requires an account; when creating API keys
`references/usage_tracking.md`	When checking quota status or usage counts; when debugging rate-limit issues
`references/source-evaluation-framework.md`	Before adding a new source to the registry; when evaluating candidate APIs
`references/storage-layout.md`	When inspecting or configuring the on-disk data and journal directories
`references/okrs.md`	When reviewing OKR definitions or scoring skill performance
`references/api-mine-cron-notes.md`	When debugging api-mine cron behavior; when "0 new APIs" result needs interpretation

Recovery Behavior

This skill implements the recovery contract from spec-ocas-recovery.md.

Evidence: Every query run writes an evidence record to {agent_root}/commons/data/ocas-reach/evidence.jsonl, including no-op runs. The not_activity_reason field is mandatory when no side effects occur.
Gap detection: On every wake, Reach checks the evidence log. If more than 24h have passed since the last update-cron run, it logs gap_detected.
Degraded mode: When external APIs are unavailable, Reach logs degraded: <api> and returns partial results with an error envelope.
Log compaction: Evidence and usage logs older than 30 days are compacted. The last 7 days are retained.
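A minimal sketch of the gap check follows. It assumes that each evidence.jsonl line is a JSON object with a `job` name and an ISO-8601 `ts` timestamp carrying a UTC offset. Both field names and the `detect_gap` function are hypothetical, and the actual evidence record fields may differ.

```python
import json
from datetime import datetime, timedelta, timezone

GAP_THRESHOLD = timedelta(hours=24)


def detect_gap(evidence_lines, job="reach:update", now=None):
    """Return a gap_detected record if the newest `job` entry is over 24h old.

    Hypothetical schema: each line is JSON with "job" and an offset-aware
    ISO-8601 "ts" string, e.g. "2026-06-17T00:00:00+00:00".
    """
    if now is None:
        now = datetime.now(timezone.utc)
    latest = None
    for line in evidence_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("job") != job:
            continue
        ts = datetime.fromisoformat(record["ts"])
        if latest is None or ts > latest:
            latest = ts
    # Having no evidence at all for this job also counts as a gap.
    if latest is None or now - latest > GAP_THRESHOLD:
        return {"event": "gap_detected", "job": job,
                "last_seen": latest.isoformat() if latest else None}
    return None


log = ['{"job": "reach:update", "ts": "2026-06-17T00:00:00+00:00"}']
print(detect_gap(log, now=datetime(2026, 6, 18, 12, 0, tzinfo=timezone.utc)))
```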

Storage Layout

See references/storage-layout.md for the full directory structure.

Journal Outputs

Every reach.query run emits an Observation Journal. Account-registration runs (when Reach signs up at a new source) emit an Action Journal because side effects are involved (form submission, key storage).

Journal payload includes: source, action, params, outcome (success / auth_missing / source_error / parse_error / quota_blocked), and result_meta (extracted meta / citation / quality / http_status from the response — not the bulk payload).
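One way such a payload might be assembled is shown below. The outcome codes and the journal-type rule come from the text above. The function names, the `JOURNAL_META_KEYS` tuple, and the example source/action/params (`fred`, `series`, `series_id`) are hypothetical, not Reach's actual journal schema.

```python
import json
from enum import Enum


class Outcome(str, Enum):
    """Journal outcome codes, as listed in Journal Outputs."""
    SUCCESS = "success"
    AUTH_MISSING = "auth_missing"
    SOURCE_ERROR = "source_error"
    PARSE_ERROR = "parse_error"
    QUOTA_BLOCKED = "quota_blocked"


# Hypothetical: response keys copied into result_meta. The bulk payload is dropped.
JOURNAL_META_KEYS = ("meta", "citation", "quality", "http_status")


def journal_type(side_effects: bool) -> str:
    """Account registration (form submission, key storage) has side effects."""
    return "Action" if side_effects else "Observation"


def build_journal_payload(source: str, action: str, params: dict,
                          outcome: str, response: dict) -> dict:
    """Assemble a journal payload that keeps response metadata, not bulk data."""
    result_meta = {k: response[k] for k in JOURNAL_META_KEYS if k in response}
    return {
        "source": source,
        "action": action,
        "params": params,
        "outcome": Outcome(outcome).value,  # ValueError on an unknown outcome
        "result_meta": result_meta,
    }


payload = build_journal_payload(
    "fred", "series", {"series_id": "GDP"}, "success",
    {"http_status": 200, "citation": "FRED series GDP", "data": [1, 2, 3]},
)
print(journal_type(side_effects=False), json.dumps(payload))
```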

Background tasks

Job	Mechanism	Schedule	Purpose
`reach:update`	cron	`0 0 * * *`	Self-update from GitHub source
`reach:api-mine`	cron	`0 4 * * *`	Scan sessions for sites with APIs → `references/discovered-apis.md`

Source Discovery (reach:api-mine)

The reach:api-mine cron job scans all session transcripts (not just research sessions) for sites, services, databases, and archives that any skill uses or needs. For each site found, it evaluates:

Does it have an API? — Check for programmatic access (REST, GraphQL, etc.)
Can I access it? — Free/freemium/paid? API key required? Account signup possible?
What data does it provide? — What endpoints exist? What can you search/retrieve?
Does it belong as a preferred source? — Is the API meaningfully better than the current access method (web scraping, SearXNG, browser) for the skill that found it?

A site is only cataloged if it has a confirmed working API, the data is useful to an active skill/workflow, and the API is better than the current approach. Deduplicate against both sources.yml and references/discovered-apis.md.
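The cataloging rule and the deduplication step can be illustrated as a filter over candidate sites. This sketch assumes a hypothetical `Candidate` record whose boolean fields mirror the three criteria above. It also treats a site's domain, compared case-insensitively, as its identity when deduplicating against entries already in sources.yml and references/discovered-apis.md. The real api-mine job may key entries differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    domain: str
    has_working_api: bool       # confirmed programmatic access, not just docs
    serves_active_skill: bool   # data is useful to an active skill/workflow
    better_than_current: bool   # beats scraping, SearXNG, or browser access


def select_for_catalog(candidates, registered, discovered):
    """Keep candidates that meet all three criteria and are not already known."""
    known = {d.lower() for d in registered} | {d.lower() for d in discovered}
    selected = []
    for c in candidates:
        key = c.domain.lower()
        if key in known:
            continue  # already in sources.yml or discovered-apis.md
        if c.has_working_api and c.serves_active_skill and c.better_than_current:
            selected.append(c)
            known.add(key)  # also dedupe within this batch
    return selected


cands = [
    Candidate("api.example.org", True, True, True),
    Candidate("docs.example.net", True, False, True),   # no active skill needs it
    Candidate("API.example.org", True, True, True),     # duplicate within batch
]
print([c.domain for c in select_for_catalog(cands, {"earthquake.usgs.gov"}, set())])
```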

Catalog: references/discovered-apis.md — fully evaluated candidates ready for integration into sources.yml.

Key principle: Not a general search index. Only sites that specific skills already use or need, where an API would reduce friction vs. the current method.

reach.init registers reach:update on first invocation. Apart from the scheduled jobs above, Reach runs no operational background tasks: it is purely reactive to user and agent queries.

Session-retention limitation (Jun 18, 2026) — The session database (via session_search) only retains recent sessions (typically 48-72h of FTS5-indexed content). Older research sessions — even those with significant API discoveries — become unsearchable once they age out. This means:

The api-mine cron can only discover APIs from sessions that are still in the active session store
If a session produced API discoveries but was compacted/aged out before the cron ran, those discoveries are lost
Mitigation: When a session produces an API discovery, immediately write it to references/discovered-apis.md (don't rely on the cron to catch it later)
A "0 new APIs" result from the cron is NORMAL and expected when sessions are current — it means the catalog is up-to-date, not that the cron is broken

Self-Update

See references/self-update-reach.md.

Validation rules

A query is valid when:

The source is registered in sources.yml
The action is supported by the source
The required env var is present (or the source is marked auth: optional)
The daily/monthly quota has remaining budget

Otherwise return an explicit error envelope and write the failure to usage.jsonl with the appropriate status. Do not fall through to a different source silently — the operator will not know which source actually answered.
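The four checks and the explicit-error rule can be sketched as follows. Several details are hypothetical and for illustration only: the registry shape (`actions`, `env`, `auth`, `quota`), the `status` strings beyond those named in Journal Outputs, the example `FRED_API_KEY` variable, and the single-period quota counter. The real formats live in scripts/sources.yml and references/usage_tracking.md.

```python
import json
import os


def validate_query(registry, source, action, usage):
    """Return None if the query is valid, else an explicit error envelope.

    registry (hypothetical): {slug: {"actions": [...], "env": "VAR" or None,
                                     "auth": "required" | "optional", "quota": int}}
    usage (hypothetical): {slug: calls used in the current quota period}
    """
    spec = registry.get(source)
    if spec is None:
        return {"ok": False, "status": "unknown_source", "source": source}
    if action not in spec["actions"]:
        return {"ok": False, "status": "unsupported_action", "source": source}
    env = spec.get("env")
    if env and spec.get("auth") != "optional" and not os.environ.get(env):
        return {"ok": False, "status": "auth_missing", "source": source}
    if usage.get(source, 0) >= spec["quota"]:
        return {"ok": False, "status": "quota_blocked", "source": source}
    return None


def log_failure(path, envelope):
    """Append the failure envelope to a JSON Lines usage log."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(envelope) + "\n")


registry = {"fred": {"actions": ["series"], "env": "FRED_API_KEY",
                     "auth": "required", "quota": 1000}}
print(validate_query(registry, "fred", "series", usage={"fred": 1000}))
```

On a non-None result, the caller would pass the envelope to log_failure and return it unchanged. It would not retry the query against a different source.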

Gotchas

See references/gotchas.md for all operational pitfalls including quota exhaustion, User-Agent requirements, demo mode, rate-limit handling, account creation, non-commercial sources, and silent fall-through prevention.

Ontology types

Reach does not extract entities or emit Signals to Chronicle. It is a query passthrough — entity extraction from returned data is the responsibility of the calling skill.

OKRs

See references/okrs.md.

Visibility

public