zbeam-content-researcher

name: zbeam-content-researcher description: "Builds a factual research brief for a Z-Beam page before writing. Run before any new page, in parallel with zbeam-differentiation-researcher."

Z-Beam Content Researcher

You are building a factual research brief for a single Z-Beam page. This brief is consumed by zbeam-content-writer alongside the differentiation brief.

The goal is not general information about the topic — it is specific, quantitative, source-attributed data that cannot be reproduced from training knowledge alone. A page built from this brief should be measurably denser in cited technical data than any competitor page for the same keyword.

Input required: topic name, target keyword, and content type (application or location).

Optional input: existingPagePath — path to an existing frontmatter YAML for this slug (e.g. frontmatter/materials/aluminum-laser-cleaning.yaml). When provided, runs in adversarial delta mode (see below). When absent, runs standard mode.

Output: save to data/research/content-brief-[slug]-[YYYY-MM-DD].json

Adversarial delta mode — run when existingPagePath is provided

Read references/adversarial-delta-mode.md for the full protocol (Steps A–C, when to use). Only needed when enriching an existing page — skip for new pages.

Citation-builder fallback check — run FIRST (before research director)

Before any other step, check whether the citation-builder has flagged this slug for targeted re-sourcing:

import json, glob

slug = '[slug]'
fallback_files = sorted(glob.glob(f'data/audit/needs-resourcing-{slug}-*.json'))
if fallback_files:
    fallback = json.load(open(fallback_files[-1]))
    print(f'⚡ TARGETED RE-SOURCING MODE — citation-builder flagged {len(fallback["resourcingNeeded"])} metrics:')
    for m in fallback['resourcingNeeded']:
        print(f'   {m["metricName"]}: {m.get("suggestedQuery","")[:90]}')
    print()
    print('Instructions:', fallback.get('instructions',''))
    targeted_mode = True
    priority_queries = [m['suggestedQuery'] for m in fallback['resourcingNeeded']]
else:
    targeted_mode = False
    priority_queries = []

If targeted_mode is True: run Track 1 only, using priority_queries as the search strings. Skip all other tracks (2–6). Output the brief as normal — the citation-builder will re-run after and attempt to verify the new sources. Skip the research director check and track performance check below.

Step 0: Prior knowledge check — run FIRST, before any searches

Read skills/shared/session-primer.md for full protocol.

python3 skills/shared/check-prior-knowledge.py --slug [slug]

Act on recommendation:

USE_PRIOR → load prior brief, skip settled facts, run only net-new searches
SUPPLEMENT → load prior brief, target searches at researchGaps only
FULL_RESEARCH → no prior knowledge, proceed with full protocol below

Also read relatedFindings in the output — cross-skill intelligence relevant to this topic. Do not re-search facts already in topFindings.

Research director check — run before any searches

Research director check: Before searching, run:

ls data/research/content-directive-[slug]-*.json 2>/dev/null | sort | tail -1

If a scoped directive exists, load it — use directive.prioritySearches as first searches, directive.skipList to avoid re-sourcing, directive.bayAreaTargets + regulatoryTargets for context tracks, strategyWeighting to prioritize depth. If none exists, run the full general protocol below.

Category-data reuse check (run before any searches)

Before running any search tracks, check whether a category enricher package already exists for this slug's category/subcategory and is less than 90 days old:

import json, glob, os
from datetime import datetime, timezone

category    = '[category]'    # e.g., "metal"
subcategory = '[subcategory]' # e.g., "non-ferrous"
cat_key     = f'{category}-{subcategory}'

pkg_files = sorted(glob.glob(f'data/research/category-data/{cat_key}-*.json'), reverse=True)
category_package = None

if pkg_files:
    pkg = json.load(open(pkg_files[0]))
    enriched_at = pkg.get('enrichedAt', '')[:10]
    days_old = (datetime.now(timezone.utc).date() -
                datetime.fromisoformat(enriched_at).date()).days if enriched_at else 999
    if days_old <= 90:
        category_package = pkg
        print(f'Category package found: {pkg_files[0]} ({days_old} days old)')
        print(f'  techFindings: {len(pkg.get("categoryTechnicalFindings", []))}')
        print(f'  standards: {len(pkg.get("applicableStandards", []))}')
        print(f'  applicablePages: {pkg.get("applicablePages", [])}')
    else:
        print(f'Category package too old ({days_old} days) — run category-enricher first')
else:
    print(f'No category package for {cat_key} — running full research')

If category_package is loaded:

Skip Track 2 (standards) — load pkg["applicableStandards"] directly
Skip Track 4 (Bay Area industry context) — load pkg["categoryTechnicalFindings"]
Load pkg["diverseFAQQuestions"] as FAQ seed (do not re-generate)
Still run Tracks 1, 3, 5, 6 for page-specific quantitative data and regulatory anchors
Note reused package in brief: categoryPackageReused: true, packageFile: [filename]

This saves 4–6 searches per page when enricher data is current.

Search track performance check (adaptive — run before any searches)

Read references/track-performance-check.md and execute it. Produces sorted_tracks (execution order) and compressed_tracks (1-probe-only tracks). Update the performance log after the brief is complete using the script at the bottom of that file.

Intelligence freshness check

Load skills/shared/intelligence-utils.md and run load_strategy_weights(). If age > 45 days, flag and continue.

Apply weights when prioritizing search tracks: high weights boost that track's search depth; low weights reduce it. firstMoverWindows from currentSignals → add +1 to related track depth.

Also read `data/audit/researcher-track-performance.json` if it exists — sort `track_order` by `avgDataPoints` descending to front-load high-yield tracks. Skip tracks with `avgDataPoints == 0` after 3+ runs.

Research tracks

Read references/search-tracks.md for the full six-track protocol including search queries, adaptive depth check after Tracks 1–3, and extraction rules per track.

Output format

{
  "topic": "[topic]",
  "targetKeyword": "[keyword]",
  "slug": "[slug]",
  "contentType": "application|location",
  "researchedAt": "[ISO timestamp]",
  "quantitativeDataPoints": [
    {
      "value": "1.8 J/cm²",
      "parameter": "fluence",
      "material": "A36 steel",
      "contaminant": "mill scale",
      "wavelength": "1064nm",
      "result": "complete removal in single pass",
      "source": "https://...",
      "defensibility": "verified"
    }
  ],
  "standardsReferenced": [
    {
      "designation": "SSPC-SP 16",
      "description": "Brush-Off Blast Cleaning of Coated and Uncoated Galvanized Steel",
      "laserRelevance": "Laser cleaning accepted as equivalent per...",
      "source": "https://..."
    }
  ],
  "findings": {
    "technical": [
      { "claim": "...", "source": "https://...", "defensibility": "verified" }
    ],
    "regulatory": [
      { "claim": "...", "ruleNumber": "...", "threshold": "...", "source": "https://..." }
    ],
    "industryContext": [],
    "competitorGaps": [
      { "competitorUrl": "https://...", "missing": ["no fluence data", "no BAAQMD detail"] }
    ],
    "bayAreaContext": []
  },
  "dataDensityAssessment": {
    "quantitativeDataPointsFound": 0,
    "meetsMinimumThreshold": false,
    "gaps": ["No fluence data found for this contaminant — writer must qualify all parameter claims"]
  },
  "suggestedAngles": [],
  "deadEnds": [
    "search query that returned zero usable results — exact string used",
    "another failed query"
  ]
}

deadEnds[] is mandatory: after every track, if a search query returned zero usable results, append the exact query string to deadEnds[]. This prevents future research sessions from repeating failed searches. check-prior-knowledge.py reads this field and surfaces it before any new research begins.

Minimum standard: at least 2 sourced quantitative data points in quantitativeDataPoints; 3 is the target. Each data point must be specific enough to appear inside a claim sentence — a bare unit range with no mechanism or source does not qualify.

If this threshold is not met, set meetsMinimumThreshold: false and list the gaps. The writer uses this flag to calibrate specificity — pages below threshold must carry qualification language and should be flagged for enrichment. Quantity targets exist to ensure specificity; a page with 2 well-sourced figures outranks one with 4 vague ones.

Save to: data/research/content-brief-[slug]-[YYYY-MM-DD].json

Declare runConfig (run after saving brief)

import sys as _sys; _sys.path.insert(0, 'skills/shared')
from registry_utils import append_run, ContentResearcherRunConfig

append_run(
    skill='zbeam-content-researcher',
    tier=2,
    run_config={
        'slug':                    slug,
        'contentType':             content_type,
        'tracksRun':               sorted_tracks,        # ordered list of tracks that ran
        'tracksSkipped':           low_yield,            # tracks skipped due to low historical yield
        'quantitativeDataPoints':  len(quantitativeDataPoints),
        'verifiedDataPoints':      len([d for d in quantitativeDataPoints if d.get('defensibility') == 'verified']),
        'meetsMinimumThreshold':   assessment.get('meetsMinimumThreshold', False),
        'directiveUsed':           bool(directive_file),  # True if research-director scoped this run
        'researchBriefPath':       brief_path,
    }
)

name: zbeam-content-researcher description: "Builds a factual research brief for a Z-Beam page before writing. Run before any new page, in parallel with zbeam-differentiation-researcher."