name: zbeam-content-researcher description: "Builds a factual research brief for a Z-Beam page before writing. Run before any new page, in parallel with zbeam-differentiation-researcher."
Z-Beam Content Researcher
You are building a factual research brief for a single Z-Beam page. This brief is consumed
by zbeam-content-writer alongside the differentiation brief.
The goal is not general information about the topic — it is specific, quantitative, source-attributed data that cannot be reproduced from training knowledge alone. A page built from this brief should be measurably denser in cited technical data than any competitor page for the same keyword.
Input required: topic name, target keyword, and content type (application or location).
Optional input: existingPagePath — path to an existing frontmatter YAML for this slug
(e.g. frontmatter/materials/aluminum-laser-cleaning.yaml). When provided, runs in
adversarial delta mode (see below). When absent, runs standard mode.
Output: save to data/research/content-brief-[slug]-[YYYY-MM-DD].json
Adversarial delta mode — run when existingPagePath is provided
Read references/adversarial-delta-mode.md for the full protocol (Steps A–C, when to use).
Only needed when enriching an existing page — skip for new pages.
Citation-builder fallback check — run FIRST (before research director)
Before any other step, check whether the citation-builder has flagged this slug for targeted re-sourcing:
import json, glob
slug = '[slug]'
fallback_files = sorted(glob.glob(f'data/audit/needs-resourcing-{slug}-*.json'))
if fallback_files:
fallback = json.load(open(fallback_files[-1]))
print(f'⚡ TARGETED RE-SOURCING MODE — citation-builder flagged {len(fallback["resourcingNeeded"])} metrics:')
for m in fallback['resourcingNeeded']:
print(f' {m["metricName"]}: {m.get("suggestedQuery","")[:90]}')
print()
print('Instructions:', fallback.get('instructions',''))
targeted_mode = True
priority_queries = [m['suggestedQuery'] for m in fallback['resourcingNeeded']]
else:
targeted_mode = False
priority_queries = []
If targeted_mode is True: run Track 1 only, using priority_queries as the search strings. Skip all other tracks (2–6). Output the brief as normal — the citation-builder will re-run after and attempt to verify the new sources. Skip the research director check and track performance check below.
Step 0: Prior knowledge check — run FIRST, before any searches
Read skills/shared/session-primer.md for full protocol.
python3 skills/shared/check-prior-knowledge.py --slug [slug]
Act on recommendation:
USE_PRIOR→ load prior brief, skip settled facts, run only net-new searchesSUPPLEMENT→ load prior brief, target searches atresearchGapsonlyFULL_RESEARCH→ no prior knowledge, proceed with full protocol below
Also read relatedFindings in the output — cross-skill intelligence relevant to this topic.
Do not re-search facts already in topFindings.
Research director check — run before any searches
Research director check: Before searching, run:
ls data/research/content-directive-[slug]-*.json 2>/dev/null | sort | tail -1
If a scoped directive exists, load it — use directive.prioritySearches as first searches, directive.skipList to avoid re-sourcing, directive.bayAreaTargets + regulatoryTargets for context tracks, strategyWeighting to prioritize depth. If none exists, run the full general protocol below.
Category-data reuse check (run before any searches)
Before running any search tracks, check whether a category enricher package already exists for this slug's category/subcategory and is less than 90 days old:
import json, glob, os
from datetime import datetime, timezone
category = '[category]' # e.g., "metal"
subcategory = '[subcategory]' # e.g., "non-ferrous"
cat_key = f'{category}-{subcategory}'
pkg_files = sorted(glob.glob(f'data/research/category-data/{cat_key}-*.json'), reverse=True)
category_package = None
if pkg_files:
pkg = json.load(open(pkg_files[0]))
enriched_at = pkg.get('enrichedAt', '')[:10]
days_old = (datetime.now(timezone.utc).date() -
datetime.fromisoformat(enriched_at).date()).days if enriched_at else 999
if days_old <= 90:
category_package = pkg
print(f'Category package found: {pkg_files[0]} ({days_old} days old)')
print(f' techFindings: {len(pkg.get("categoryTechnicalFindings", []))}')
print(f' standards: {len(pkg.get("applicableStandards", []))}')
print(f' applicablePages: {pkg.get("applicablePages", [])}')
else:
print(f'Category package too old ({days_old} days) — run category-enricher first')
else:
print(f'No category package for {cat_key} — running full research')
If category_package is loaded:
- Skip Track 2 (standards) — load
pkg["applicableStandards"]directly - Skip Track 4 (Bay Area industry context) — load
pkg["categoryTechnicalFindings"] - Load
pkg["diverseFAQQuestions"]as FAQ seed (do not re-generate) - Still run Tracks 1, 3, 5, 6 for page-specific quantitative data and regulatory anchors
- Note reused package in brief:
categoryPackageReused: true, packageFile: [filename]
This saves 4–6 searches per page when enricher data is current.
Search track performance check (adaptive — run before any searches)
Read references/track-performance-check.md and execute it. Produces sorted_tracks (execution order) and compressed_tracks (1-probe-only tracks). Update the performance log after the brief is complete using the script at the bottom of that file.
Intelligence freshness check
Load skills/shared/intelligence-utils.md and run load_strategy_weights(). If age > 45 days, flag and continue.
Apply weights when prioritizing search tracks: high weights boost that track's search depth; low weights reduce it. firstMoverWindows from currentSignals → add +1 to related track depth.
Also read data/audit/researcher-track-performance.json if it exists — sort track_order by avgDataPoints descending to front-load high-yield tracks. Skip tracks with avgDataPoints == 0 after 3+ runs.
Research tracks
Read references/search-tracks.md for the full six-track protocol including search queries,
adaptive depth check after Tracks 1–3, and extraction rules per track.
Output format
{
"topic": "[topic]",
"targetKeyword": "[keyword]",
"slug": "[slug]",
"contentType": "application|location",
"researchedAt": "[ISO timestamp]",
"quantitativeDataPoints": [
{
"value": "1.8 J/cm²",
"parameter": "fluence",
"material": "A36 steel",
"contaminant": "mill scale",
"wavelength": "1064nm",
"result": "complete removal in single pass",
"source": "https://...",
"defensibility": "verified"
}
],
"standardsReferenced": [
{
"designation": "SSPC-SP 16",
"description": "Brush-Off Blast Cleaning of Coated and Uncoated Galvanized Steel",
"laserRelevance": "Laser cleaning accepted as equivalent per...",
"source": "https://..."
}
],
"findings": {
"technical": [
{ "claim": "...", "source": "https://...", "defensibility": "verified" }
],
"regulatory": [
{ "claim": "...", "ruleNumber": "...", "threshold": "...", "source": "https://..." }
],
"industryContext": [],
"competitorGaps": [
{ "competitorUrl": "https://...", "missing": ["no fluence data", "no BAAQMD detail"] }
],
"bayAreaContext": []
},
"dataDensityAssessment": {
"quantitativeDataPointsFound": 0,
"meetsMinimumThreshold": false,
"gaps": ["No fluence data found for this contaminant — writer must qualify all parameter claims"]
},
"suggestedAngles": [],
"deadEnds": [
"search query that returned zero usable results — exact string used",
"another failed query"
]
}
deadEnds[] is mandatory: after every track, if a search query returned zero usable results, append the exact query string to deadEnds[]. This prevents future research sessions from repeating failed searches. check-prior-knowledge.py reads this field and surfaces it before any new research begins.
Minimum standard: at least 2 sourced quantitative data points in quantitativeDataPoints; 3 is the target. Each data point must be specific enough to appear inside a claim sentence — a bare unit range with no mechanism or source does not qualify.
If this threshold is not met, set meetsMinimumThreshold: false and list the gaps. The writer uses this flag to calibrate specificity — pages below threshold must carry qualification language and should be flagged for enrichment. Quantity targets exist to ensure specificity; a page with 2 well-sourced figures outranks one with 4 vague ones.
Save to: data/research/content-brief-[slug]-[YYYY-MM-DD].json
Declare runConfig (run after saving brief)
import sys as _sys; _sys.path.insert(0, 'skills/shared')
from registry_utils import append_run, ContentResearcherRunConfig
append_run(
skill='zbeam-content-researcher',
tier=2,
run_config={
'slug': slug,
'contentType': content_type,
'tracksRun': sorted_tracks, # ordered list of tracks that ran
'tracksSkipped': low_yield, # tracks skipped due to low historical yield
'quantitativeDataPoints': len(quantitativeDataPoints),
'verifiedDataPoints': len([d for d in quantitativeDataPoints if d.get('defensibility') == 'verified']),
'meetsMinimumThreshold': assessment.get('meetsMinimumThreshold', False),
'directiveUsed': bool(directive_file), # True if research-director scoped this run
'researchBriefPath': brief_path,
}
)