name: fact-check description: | Single-claim adversarial verdict — supported / refuted / inconclusive with cited evidence. Use when one factual claim needs checking mid-conversation: 'is it true…?', 'really?', '這個說法對嗎' — even common knowledge. Full report → deep-deep-research. version: 0.1.0
fact-check
A lightweight, single-claim adversarial verifier. You hand it one factual
claim; it gathers a little evidence, runs the same adversarial quorum
deep-research uses, and returns a verdict — supported / refuted /
inconclusive with cited evidence and a confidence — not a multi-page
report.
This is the point-check counterpart to deep-research. Where
deep-research fans out across 3–6 angles and dozens of sources to synthesize a
breadth report, fact-check spends a small budget on one claim and stops at
the verdict. It reuses deep-research's verify + quorum primitives directly:
schemas.py, rank.py, prompts.py, dedup.py in this skill's scripts/
are byte-identical copies of deep-research's (kept in sync by a repo-level
MD5 drift check). Only factcheck.py — the Stage-C verdict mapper — is new.
Executor model — who does what
You (the agent running this skill) are the executor. You supply the LLM reasoning, the web tools, and the per-voter fan-out:
- LLM reasoning — you rank search hits, extract supporting/contradicting quotes, and cast the adversarial votes yourself, emitting JSON that conforms to a bundled schema.
- Web search — your host
WebSearchtool. - Web fetch — your host
WebFetchtool. - Fan-out — you dispatch the parallel verifier voters (see below).
The bundled scripts/*.py supply only deterministic logic — prompt text,
JSON schemas, URL-dedup, ranking, quorum, the verdict mapping. They make no
network calls and read no API keys. They are stdlib-only and run with plain
python.
No API key is required. This skill borrows the host agent's own LLM + web tools (your existing subscription) — there is no key to set, no separate program to install, no per-call API cost. The reasoning and I/O ride on the agent you are already in.
Run all python scripts/… commands from this skill's root directory (the
examples below keep the scripts/ prefix, so they resolve as written; the
scripts' flat imports work because Python puts the script's own directory on
sys.path). Do not add a second scripts/ by running from inside
scripts/ — python scripts/factcheck.py from there looks for
scripts/scripts/… and fails.
Portable fan-out convention
Stage B casts VOTES_PER_CLAIM = 3 independent verifier votes. Do these in
parallel by dispatching 3 subagents, per
loom-code:dispatching-parallel-agents: one fresh subagent per
voter_idx, dispatched in a single assistant message with multiple agent
calls so the harness runs them concurrently.
Describe and dispatch this work abstractly as "dispatch N subagents" — do
not hard-code the Claude Code Workflow tool. Stated abstractly, the fan-out
maps onto whatever concurrent-subagent primitive the host agent provides
(Claude Code, Codex, Cursor, …); binding to one harness's workflow primitive
would break agent-portability. The three voters are independent (same claim,
disjoint voter_idx, no shared files), exactly the case the fan-out is for.
Independence is a GATE, not a nicety. The quorum only means something if
the three votes come from independent contexts — separate parallel
subagents, or (if the host cannot parallelize) three fresh sequential
subagent contexts. Reasoning out three "votes" in your own single context
is not a quorum: the votes echo each other and factcheck.py cannot tell
them from one opinion copied three times. If you genuinely cannot achieve
independent fan-out and have to vote in one shared context, you MUST pass
--solo to Stage C — the verdict will be forced to inconclusive
(reason: no-quorum). Faking three same-context votes without --solo is the
gate-bypass this rule exists to forbid. Prefer real fan-out; --solo is a
last resort that yields a deliberately weak verdict.
Stage A — Gather evidence
Spend a small budget confirming and disconfirming the one claim.
Run host
WebSearchonce or twice: one query phrased to confirm the claim, one phrased to disconfirm it (e.g."<claim>"and"<claim> debunked OR false OR contradicted"). Keep it to 1–2 queries — this is a point-check, not a breadth sweep.Rank/filter the raw hits into structured results. Treat the claim as a single search angle. Get the ranking prompt:
python scripts/prompts.py search --angle '{"label":"claim","rationale":"why this checks the claim","query":"<q>"}' --question "<the claim>"(
rationaleis optional but include it — omitting it renders a dangling "claim — " in the prompt.) Reason over it and emit results conforming to:python scripts/schemas.py searchShape:
{results: [...]}, each result carrying at leasturl,title, andrelevance— an enum{high, medium, low}, not a number (passing an integer raisesValueError: unknown relevance).Dedup and cap to a small fetch budget — ≤6 sources for a point-check (well under deep-research's
MAX_FETCH = 15). The dedup script normalizes URLs (stripswww., trailing/, lowercases host+path) sowww.X.com/a/andx.com/acollapse to one:echo '{"results": [...], "seen": {}, "fetch_slots": 6}' | python scripts/dedup.pystdin
{results, seen, fetch_slots}→ stdout{novel, seen, slots}:novelis the deduped, budget-capped sources to fetch.dedup.pyitself never drops high-relevance hits, sonovelmay come back longer than the slot count — that is expected; you still cap fetching at the budget in step 4 (highest-relevance first), not here.For each novel source — highest-
relevancefirst, up to the ≤6 budget; high-relevance overflow past the budget is optional, not mandatory (this is the point-check's small-budget rule — do not fetch every novel source if that blows the budget) — fetch it with hostWebFetchand extract a supporting or contradicting quote bearing on the claim. If a fetch fails (paywall / dead link / HTTP error), the extraction prompt returnsclaims: []withsourceQuality: "unreliable"— drop that one source and continue; a single dead source is not the all-empty case below. Get the extraction prompt:python scripts/prompts.py fetch --source '{"url":"<url>","title":"<title>"}' --label "claim" --question "<the claim>"Reason over the fetched content and emit an extraction object conforming to:
python scripts/schemas.py extractShape:
{sourceQuality, publishDate, claims}— tag each extracted item with itssourceQuality∈ {primary,secondary,blog,forum,unreliable} andimportance∈ {central,supporting,tangential}. Collect the per-source quote, URL, and quality tag into one small evidence pool for the claim.
If no evidence is found, skip to Stage C with an empty verdict list — the
mapper returns inconclusive. Distinguish two empty cases, because they
mean opposite things to a reader:
- No referent — the claim names a specific entity / event / study and the
search finds no trace of it at all (not just thin coverage — nothing).
For a falsifiable, specific, recent claim, that absence is itself
disconfirming (a real version would leave a footprint). Pass
--no-referentto Stage C so the verdict is taggedreason: no-referent. - Insufficient evidence — the topic is real but coverage is thin,
paywalled, or conflicting. Do not pass the flag; the verdict is tagged
reason: insufficient-evidence.
Stage B — Verify (adversarial quorum)
The one claim faces VOTES_PER_CLAIM = 3 independent adversarial voters
whose job is to refute it, each grounded in the Stage-A evidence. Fan out
one subagent per voter_idx (0, 1, 2) per the convention above.
First, reduce the Stage-A evidence pool to one claim object. Stage A
produces a pool of extracted items (multiple sources, each with a quote),
but the verify prompt takes a single {claim, sourceUrl, sourceQuality, quote}. Collapse the pool to one object: set claim to the claim under
check, and sourceUrl / sourceQuality / quote to the single strongest
supporting item in the pool — strongest = highest sourceQuality, ties
broken by importance — or, if nothing supports it, the most relevant
contradicting item. All three voters receive this same object;
they diverge only by --voter-idx and by each independently WebSearch-ing
for counter-evidence (verify checklist step 2), so a narrow representative
quote does not lose the rest of the pool — the voters re-expand it. This
single-object reduction is the point-check's deliberate narrowing: unlike
deep-research, which ranks up to MAX_VERIFY_CLAIMS = 25 extracted claims
and casts 3 votes on each, fact-check casts exactly one 3-voter quorum
on the single top-level claim — which is why Stage C's factcheck.py takes
exactly three verdicts.
For each voter_idx:
Get that voter's prompt (the per-voter
--voter-idxdiversifies the three votes so they do not echo):python scripts/prompts.py verify --claim '{"claim":"<the claim>","sourceUrl":"<url>","sourceQuality":"<tag>","quote":"<quote>"}' --voter-idx <i> --question "<the claim>"The voter reasons (it may run its own
WebSearch/WebFetchto hunt counter-evidence) and emits a verdict conforming to:python scripts/schemas.py verdictShape:
{refuted: bool, evidence, confidence, counterSource?}, whereconfidence∈ {high,medium,low}. A voter that fails or returns nothing is an abstention — record its vote asnull, not as a non-refutation.Collect the three votes into an array (abstentions as
null).
Stage C — Verdict
Map the three votes to the 3-way taxonomy with factcheck.py. Two opt-in
flags, each set only in its specific case: --no-referent when Stage A
found the claim has no referent at all (see Stage A's two-empty-cases note),
and --solo when Stage B's votes did not come from independent contexts
(see Portable fan-out convention → independence gate):
echo '[<verdict>, <verdict-or-null>, <verdict>]' | python scripts/factcheck.py verdict
# claim whose named entity/event has zero footprint:
echo '[]' | python scripts/factcheck.py verdict --no-referent
# votes were reasoned in one shared context (no independent fan-out possible):
echo '[<verdict>, <verdict>, <verdict>]' | python scripts/factcheck.py verdict --solo
stdin: the verdicts array → stdout a JSON object
{verdict, reason, confidence, valid_count, refuted_count}:
supported(reason: quorum-survived) — survives the quorum (rank.quorum_survives: ≥2 valid votes AND fewer thanREFUTATIONS_REQUIRED = 2of them refute it).refuted(reason: quorum-refuted) — ≥REFUTATIONS_REQUIREDvalid votes carryrefuted: true.inconclusive— anything else (all-abstain[null, null, null], fewer than 2 valid votes, or empty input). The valid-count check gates first, so an all-abstain claim is never falsely "supported" on a refuted-count of 0. Thereasonfield splits this bucket:reason: no-referent—--no-referentwas passed: the claim's named entity/event has zero footprint, so absence is disconfirming. Surface this as "no trace of this exists — likely fabricated", not a neutral "couldn't determine". This is the F-003 guard against under-calling hoaxes.reason: no-quorum—--solowas passed: the votes did not come from independent contexts, so the quorum's integrity precondition failed and it cannot conclude (F-004). Say plainly that an independent check could not be run. This gate fires before the others — non-independent votes never drivesupported/refuted.reason: insufficient-evidence— real topic, thin/conflicting evidence: genuine "not enough to judge".
confidence ∈ {high, medium, low} is direction-aware: the strongest
confidence among the votes that drove the verdict (non-refuting for
supported, refuting for refuted); else low. Calibration caveat: when
the claim uses an absolute quantifier (never / always / all / none) but the
evidence only supports a conditional version, cap each voter's confidence
at medium — "honey never spoils" is over-stated even when "sealed honey is
shelf-stable" is well-sourced (factcheck.py cannot see claim scope, so this
is the voter's call).
Return the verdict to the user as a short answer: the claim, the
supported/refuted/inconclusive label, and — when inconclusive — which
reason (a no-referent answer must read as "likely fabricated", not
"undecided"), the confidence, and the cited quotes + source URLs from Stage A
that back it. Do not synthesize a full report — that is deep-research's job.
When a step fails
The scripts/*.py are deterministic and stdlib-only; failures are almost
always a malformed input you can fix and re-run.
- A script errors (traceback / non-zero exit) — it is a bad-input bug, not
a verdict. Re-read the schema (
python scripts/schemas.py <name>), fix the JSON you piped in (common cause: a value outside an enum, e.g. an integerrelevance), and re-run. Never swallow the error and proceed. - A host tool is missing (no
WebSearch/WebFetch) — this skill cannot run without them; say so plainly rather than fabricating evidence. With no evidence gathered, the honest output isinconclusive/reason: insufficient-evidence. - A single voter subagent fails — that vote is an abstention (
null), per Stage B; do not retry forever or substitute a non-refutation.
Script-invocation quick reference
| Stage | Command | stdin → stdout |
|---|---|---|
| A | prompts.py search --angle A --question Q |
— → search prompt |
| A | schemas.py search |
— → search schema |
| A | dedup.py |
{results, seen, fetch_slots} → {novel, seen, slots} |
| A | prompts.py fetch --source S --label L --question Q |
— → fetch prompt |
| A | schemas.py extract |
— → extract schema |
| B | prompts.py verify --claim C --voter-idx I --question Q |
— → verify prompt |
| B | schemas.py verdict |
— → verdict schema |
| C | factcheck.py verdict [--no-referent] [--solo] |
verdicts array → {verdict, reason, confidence, valid_count, refuted_count} |