decoy-deploy

DECOY SUP deployment — running ./deploy --decoy [scope flags], 5-phase spinup, behavior.json plumbing, audit semantics, hot-patch path. Three GPU tiers via --gpu {v100,rtx,rtx-a} (V100 → gemma4:26b, RTX 2080 Ti non-A pool → gemma4:e4b on B2R/S2R, RTX 2080 Ti A-pool → same model, separate physical cards). Inputs deployments/decoy-controls/config.yaml + /mnt/AXES2U1/feedback/decoy-controls/{controls (un-namespaced), {preset}_v{version}/{dataset} (feedback, needs --preset)}/. Outputs deployments/decoy-{controls,feedback-...}/runs/{run_id}/. Does NOT cover RAMPART AD enterprise (see /rampart-deploy) or GHOSTS NPC clients (see /ghosts-deploy). Cross-type CLI shape, fail-loud contract, and SSH key matrix live in CLAUDE.md.

By LampSteven17, updated 6/10/2026

name: decoy-deploy
type: skill

DECOY = synthetic user persona VMs that simulate human computer use. Each SUP is a brain (MCHP / BrowserUse / SmolAgents) optionally driven by a PHASE-tuned behavior.json. Brain naming scheme [Brain][Version].[Model] is in CLAUDE.md.

Inputs: deployments/decoy-controls/config.yaml; /mnt/AXES2U1/feedback/decoy-controls/controls/{behavior}/{sup}/behavior.json (baseline, un-namespaced); /mnt/AXES2U1/feedback/decoy-controls/{preset}_v{version}/{dataset}/{behavior}/{sup}/behavior.json (feedback, namespaced since 2026-06, needs --preset); INSTALL_SUP.sh + decoys/, cloned from GitHub at install time.
Outputs: deployments/decoy-{controls,feedback-{preset}-{dataset}-{scope}}/runs/{run_id}/ (inventory.ini, ssh_config_snippet.txt, deployment_type); per-VM /opt/ruse/deployed_sups/{key}/. {preset} is the sanitized full-namespace token including the version (stdctrlsv712), so different lineages/versions don't collide.
Manifest: manifest.json in the PHASE source; loaded via core/feedback.py::load_manifest and validated against the deploy type via validate_manifest_target.
Upstream: PHASE feedback engine (feedback_engine.baseline writes controls/; feedback_engine.decoy_generator writes {dataset}/).
Downstream: PHASE Zeek pipeline (PHASE.py --decoy). It reads experiments.json for active deploys (entries carry dataset/scope/gpu_tier descriptive fields since 2026-05-22, plus the preset sanitized-namespace token since 2026-06; see CLAUDE.md "experiments.json schema"). Dredge is a HISTORICAL op: it iterates EVERY decoy entry (active + torn-down), not just live ones, re-dredging stored Zeek conn.log bounded by [start_date, end_date]. Gotcha: stage2_dredge.py does exit 1 (aborting the WHOLE run) when an entry's window yields 0 conn rows. A same-day deploy/teardown (start_date == end_date, no traffic captured) leaves a 0-row entry that kills the pipeline. RUSE-side fix: remove the invalid entry from experiments.json (back it up first; the key is RUSE-owned state, NOT PHASE). Durable fix: PHASE skip-on-empty in stage2_dredge.py (PHASE-side, can't touch). Confirmed 2026-06-07: a June-3 decoy-feedback-expctrlsv716-vt50g-all entry (0-day span) aborted PHASE at [14/26]; removing it cleared the run.
Narrow exceptions: C0 (no install; bare Ubuntu, only provisioned + SSH-tested); M0 (upstream MITRE pyhuman, crash-loops on Linux by design because os.startfile() is Windows-only).

Topology

decoy-controls/config.yaml provisions 9 VMs (gemma-only; V100 + RTX + CPU pairs):

d-{dep_id}-C0-0          Bare Ubuntu control (no software)
d-{dep_id}-M0-0          Upstream MITRE pyhuman (read-only control)
d-{dep_id}-M1-0          MCHP baseline (no timing, no LLM)
d-{dep_id}-B0-gemma-0    BrowserUse + gemma4:26b on V100
d-{dep_id}-S0-gemma-0    SmolAgents  + gemma4:26b on V100
d-{dep_id}-B0R-gemma-0   BrowserUse + gemma4:e4b on RTX 2080 Ti (flavor rtx2080ti-A-1gpu.14vcpu.28g)
d-{dep_id}-S0R-gemma-0   SmolAgents  + gemma4:e4b on RTX 2080 Ti (flavor rtx2080ti-A-1gpu.14vcpu.28g)
d-{dep_id}-B0C-gemma-0   BrowserUse + gemma4:e2b on CPU
d-{dep_id}-S0C-gemma-0   SmolAgents  + gemma4:e2b on CPU

B0R/S0R baseline the RTX feedback tiers. R-tier reads its OWN baseline (2026-06-12): B2R.gemma → B.gemma/B0R.gemma, S2R.gemma → S.gemma/S0R.gemma — the behavior_dir is B.gemma/S.gemma (family dir; the regex [A-Z]* consumes the R) but the baseline_config keeps its R. The old R-strip (B2R→B0) was removed in both _derive_behavior_paths (spinup.py) and distribute-behavior-configs.yaml because PHASE now emits distinct B0R/S0R configs (differ from B0/S0 by seed/pools/behavior-modifiers/persistent_sessions on-off — 59-141 leaf keys); the strip shipped wrong-tier V100 behavior to the RTX SUPs. The VMs are still correctly on RTX hardware + gemma4:e4b; only the behavior.json content was wrong pre-fix. Placed on the rtx-a pool (rtx2080ti-A-1gpu) 2026-05-26 because the non-A rtx pool was full with sum25+vt1g feedback — the axyear rtx-a feedback deploy was dropped to make room (net-zero on rtx-a).

Feedback template (5 VMs per ./deploy --decoy --feedback). Shape varies by --gpu tier:

V100 tier (default, --gpu v100):

d-{dep_id}-M2-0          MCHP + PHASE timing
d-{dep_id}-B2-gemma-0    BrowserUse + gemma4:26b on V100  (flavor v100-1gpu.14vcpu.28g)
d-{dep_id}-S2-gemma-0    SmolAgents  + gemma4:26b on V100  (flavor v100-1gpu.14vcpu.28g)
d-{dep_id}-B2C-gemma-0   BrowserUse + gemma4:e2b on CPU
d-{dep_id}-S2C-gemma-0   SmolAgents  + gemma4:e2b on CPU

RTX tier (--gpu rtx, dep_name suffix -rtx):

d-{dep_id}-M2-0           MCHP + PHASE timing
d-{dep_id}-B2R.gemma-0    BrowserUse + gemma4:e4b on RTX 2080 Ti  (flavor rtx2080ti-1gpu.14vcpu.28g, PCI alias rtx2080ti:1)
d-{dep_id}-S2R.gemma-0    SmolAgents  + gemma4:e4b on RTX 2080 Ti  (flavor rtx2080ti-1gpu.14vcpu.28g, PCI alias rtx2080ti:1)
d-{dep_id}-B2C-gemma-0    BrowserUse + gemma4:e2b on CPU
d-{dep_id}-S2C-gemma-0    SmolAgents  + gemma4:e2b on CPU

RTX A-pool tier (--gpu rtx-a, dep_name suffix -rtx-a):

d-{dep_id}-M2-0           MCHP + PHASE timing
d-{dep_id}-B2R.gemma-0    BrowserUse + gemma4:e4b on RTX 2080 Ti  (flavor rtx2080ti-A-1gpu.14vcpu.28g, PCI alias 2080ti-rtx-a:1)
d-{dep_id}-S2R.gemma-0    SmolAgents  + gemma4:e4b on RTX 2080 Ti  (flavor rtx2080ti-A-1gpu.14vcpu.28g, PCI alias 2080ti-rtx-a:1)
d-{dep_id}-B2C-gemma-0    BrowserUse + gemma4:e2b on CPU
d-{dep_id}-S2C-gemma-0    SmolAgents  + gemma4:e2b on CPU

RTX and RTX-A use identical B2R.gemma / S2R.gemma behavior keys and the same gemma4:e4b runtime model. Only the OpenStack flavor differs — they map to two distinct physical card pools (separate PCI aliases). The -rtx vs -rtx-a deployment-name suffix lets the OpenStack provision calls land on either pool without VM-name collision, so you can fan across pools when one is exhausted. Each tier deploy is its own independent experiments.json entry — no automatic linkage. If you want to swap sum25 from V100 to RTX-A, run ./teardown on the V100 deployment first, then deploy the RTX-A one; both stay registered independently for as long as their VMs exist.

Plus d-{dep_id}-neighborhood-0 sidecar (feedback only, when any topology_mimicry rate is non-zero).

dep_id = {name_no_hyphens}{run_id} where run_id is MMDDYYHHmmss.
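The naming rule above can be sketched as follows. This is a minimal illustration, not code from the repository: `make_run_id`, `make_dep_id`, and `vm_name` are hypothetical helper names, and the sketch assumes `name` is the short deployment name (for example `controls`).

```python
from datetime import datetime


def make_run_id(now: datetime) -> str:
    """Format a timestamp as MMDDYYHHmmss."""
    return now.strftime("%m%d%y%H%M%S")


def make_dep_id(name: str, now: datetime) -> str:
    """dep_id = name with hyphens removed, followed by the run_id."""
    return name.replace("-", "") + make_run_id(now)


def vm_name(dep_id: str, behavior: str, index: int = 0) -> str:
    """Build a VM name of the form d-{dep_id}-{behavior}-{index}."""
    return f"d-{dep_id}-{behavior}-{index}"


dep_id = make_dep_id("controls", datetime(2026, 5, 8, 19, 31, 22))
print(vm_name(dep_id, "M1"))
```

With the timestamp shown, the printed name has the same shape as the hosts used in the SSH access examples.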

CLI scope flags

# --preset {preset}_v{version} REQUIRED whenever feedback is in scope (2026-06).
./deploy --decoy --preset std-ctrls_v7.1.2                          # controls + ALL feedback in that ns
./deploy --decoy --controls                                        # controls only (no --preset needed)
./deploy --decoy --feedback --preset std-ctrls_v7.1.2              # all feedback in ns (no controls)
./deploy --decoy --feedback --preset std-ctrls_v7.1.2 --target sum24   # single dataset
./deploy --decoy --feedback --preset std-ctrls_v7.1.2 --target sum24,axyear,vt50g  # batch on one tier
./deploy --decoy --feedback --source /path                        # explicit PHASE source (path encodes ns; no --preset)
./deploy --decoy --controls --preset std-ctrls_v7.1.2 --target sum24    # controls + single feedback
./deploy --decoy --feedback --preset std-ctrls_v7.1.2 --gpu rtx --target sum24   # RTX (PCI alias rtx2080ti:1)
./deploy --decoy --feedback --preset exp-ctrls_v7.1.6 --gpu rtx-a --target axall # other lineage + A-pool
./deploy --decoy --exp1 --preset exp-ctrls_v7.1.6                 # static tier plan (see below)

Static tier plans (--exp1, 2026-06-10)

Named operator-curated dataset→tier assignments in core/feedback.py::TIER_PLANS, sized to the physical GPU pools (totals are NOT queryable from OpenStack — operator knowledge baked in). exp1:

Tier | Datasets | Cards
--- | --- | ---
v100 | 2025, axall, axyear, fall24, fall25, spr25, sum24, sum25 | 16 of 19 (controls hold 2, 1 spare)
rtx | cptc8, cptc9 | 4 of 4
rtx-a | vt50g | 2 of 4 (controls B0R/S0R hold 2)

Rev 2026-06-16: the two non-A rtx slots were swapped from vt1g/vt10g to cptc8/cptc9 (operator decision), same 4-card footprint. cptc now ships physical connection_shape byte/duration targets (cptc9 also conn_state_mix), so it's a real Phase-1 shape target; this supersedes the old "structurally unreachable" framing (project_service_mix_targets, service-mix era). Its high target_conn_per_minute (185–208) WILL show a red BG/volume audit column. That's a D4-floor measurement artifact (RUSE-fact), see /decoy-audit.

Open question (2026-06-19): whether VOLUME matters to cptc's exp-model score is unresolved and PHASE-side. cptc9 is flagged coverage-limited, so do NOT assume "red BG = fine for the score." Resolves on PHASE's re-infer. See /feedback-investigation.

vt1g/vt10g are no longer in exp1; deploy them via --target if wanted. --exp1 implies --feedback, requires --preset, and conflicts with --target/--source/--gpu (each task carries its own gpu_tier, shown as [tier] in the task label and tier= in the plan confirm). Resolution is fail-loud per dataset; the whole plan aborts if any target is missing from the namespace. Tasks run sequentially like any batch. To change the split, edit TIER_PLANS (one dict); new plans get their own flag wired in __main__.py.
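A minimal sketch of how such a static plan could be expressed and expanded. The dict layout, `expand_plan`, and `cards_needed` are assumptions for illustration; the real `TIER_PLANS` in core/feedback.py may be shaped differently, and real resolution matches against directories on the mount, not a set of names. The card count assumes two GPU VMs per feedback deploy (one B brain, one S brain), as in the feedback templates.

```python
# Hypothetical layout: plan name -> gpu tier -> dataset targets.
TIER_PLANS = {
    "exp1": {
        "v100": ["2025", "axall", "axyear", "fall24", "fall25",
                 "spr25", "sum24", "sum25"],
        "rtx": ["cptc8", "cptc9"],
        "rtx-a": ["vt50g"],
    },
}

GPU_VMS_PER_DEPLOY = 2  # one B2*/B2R* VM plus one S2*/S2R* VM


def expand_plan(plan_name: str, available: set) -> list:
    """Return (dataset, gpu_tier) tasks; abort the whole plan if any is missing."""
    plan = TIER_PLANS[plan_name]
    tasks = [(ds, tier) for tier, datasets in plan.items() for ds in datasets]
    missing = [ds for ds, _ in tasks if ds not in available]
    if missing:
        raise ValueError(f"plan {plan_name!r} aborted, missing targets: {missing}")
    return tasks


def cards_needed(plan_name: str) -> dict:
    """GPU cards consumed per tier by the plan's feedback deploys."""
    return {tier: GPU_VMS_PER_DEPLOY * len(datasets)
            for tier, datasets in TIER_PLANS[plan_name].items()}


print(cards_needed("exp1"))
```

The per-tier counts line up with the Cards column of the plan (16, 4, and 2).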

Feedback namespace {preset}_v{version} (--preset, 2026-06)

Canonical reference (rampart/ghosts skills point here). PHASE feedback lives one level deeper, under a lineage namespace inserted between {type}-controls and {dataset}:

OLD:  /mnt/AXES2U1/feedback/{type}-controls/{dataset}/...
NEW:  /mnt/AXES2U1/feedback/{type}-controls/{preset}_v{version}/{dataset}/...
  • --preset NS REQUIRED for any feedback deploy (--feedback, --target, or the default controls+all-feedback). Missing/not-found → fail-loud, lists available namespaces. NOT needed for controls-only deploys (--controls) or --source /full/path (the explicit path already encodes the ns). controls/ stays un-namespaced ({type}-controls/controls/, reached via config.yaml behavior_source).
  • Mechanism: core/feedback.py::_type_root(deploy_type, preset) folds the preset into root resolution once → {type}-controls/{preset}. The discovery functions, the resolved source Path, and every downstream spinup/distribute glob inherit the namespace transparently — no per-site path edits. Config JSON schemas (behavior/user-roles/timeline) are UNCHANGED.
  • Lineage-assert: spinup compares each config's stamped _metadata.model_preset/model_version (DECOY) / _phase_metadata.* (RAMPART/GHOSTS) to the deployed ns and aborts on mismatch (absent stamp defers, per the manifest-optional contract).
  • Collision-safety: the deploy NAME stamps the FULL ns incl. version (sanitized [a-z0-9] via _ns_preset_token from source_dir.parent.name): std-ctrls_v7.1.2 → decoy-feedback-stdctrlsv712-{ds}-{scope} vs _v7.1.5 → …stdctrlsv715…. So two lineages OR two versions of one dataset get distinct deployment_name → run_dir → VM prefix (dep_id) → experiments.json key and coexist (no idempotent-refresh teardown clash). experiments.json carries a preset attr (sanitized token) per entry via deploy_metadata.derive_metadata.
  • Hard cutover: RUSE read-side + PHASE write-side migration land together — until PHASE creates the {ns} dirs on the mount, feedback deploys fail-loud (intended). Confirm w/ PHASE: stamp field names (model_preset/model_version), manifest.json placement (RUSE assumes per-dataset {ns}/{dataset}/manifest.json), and that ablation/ is a _metadata field, not a directory RUSE reads.
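The root resolution and name sanitization described in the bullets above can be sketched as below. These are stand-ins written for illustration, not the bodies of `_type_root` or `_ns_preset_token`; lowercasing before stripping is an assumption, and `feedback_deploy_name` is a hypothetical helper.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

FEEDBACK_ROOT = PurePosixPath("/mnt/AXES2U1/feedback")


def type_root(deploy_type: str, preset: Optional[str] = None) -> PurePosixPath:
    """Fold the optional namespace into the root once: {type}-controls/{preset}."""
    root = FEEDBACK_ROOT / f"{deploy_type}-controls"
    return root / preset if preset else root


def ns_preset_token(namespace: str) -> str:
    """Sanitize a namespace such as 'std-ctrls_v7.1.2' down to [a-z0-9]."""
    return re.sub(r"[^a-z0-9]", "", namespace.lower())


def feedback_deploy_name(source_dir: PurePosixPath, dataset: str, scope: str) -> str:
    """Stamp the full namespace (including version) into the deployment name."""
    token = ns_preset_token(source_dir.parent.name)
    return f"decoy-feedback-{token}-{dataset}-{scope}"


source = type_root("decoy", "std-ctrls_v7.1.2") / "vt-fall22-50gb"
print(feedback_deploy_name(source, "vt50g", "all"))
```

Because the version digits survive sanitization, two versions of the same lineage produce different tokens and therefore different deployment names.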

GPU tier selection via --gpu {v100,rtx,rtx-a} (default v100). RTX tiers swap B2.gemma/S2.gemma → B2R.gemma/S2R.gemma and the V100 flavor → RTX 2080 Ti flavor; M2 + B2C.gemma + S2C.gemma stay identical across tiers. The two RTX tiers target distinct physical card pools — when one pool is exhausted (No valid host was found on B2R/S2R provision), switch to the other. PHASE feedback is portable across gemma4 variants so the same .gemma/ source ships behavior.json for V100, RTX, and RTX-A deploys with no re-roll.

Granular per-config-file flags (--timing, --workflow, --modifiers, --sites, --prompts, --activity, --diversity, --variance, --all-feedback) were removed when PHASE consolidated to a single behavior.json per SUP. There's no longer a per-file filter to apply.

Batch is the default when --feedback is given without a single-target selector. CLI scans /mnt/AXES2U1/feedback/decoy-controls/, prompts confirmation, then deploys each task sequentially. No cross-deploy parallel fan-out — --parallel was removed 2026-05-11 (operator preference: clean inline output and easier debugging beat the wall-time win).

Dataset target aliases (core/feedback.py::DATASET_TARGETS): sum24 → summer24, spr25 → spring25, vt1g → vt-fall22-1gb, vt50g → vt-fall22-50gb, cptc8 → cptc8-23, axall → axes-all, 2025 → axes-2025. Resolution is substring against /mnt/AXES2U1/feedback/decoy-controls/.
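A minimal sketch of alias-plus-substring resolution. The alias table is copied from the list above; `resolve_target` is a hypothetical helper, and both the pass-through for un-aliased targets and the error on ambiguous matches are assumptions about behavior, not confirmed from the source.

```python
DATASET_TARGETS = {
    "sum24": "summer24",
    "spr25": "spring25",
    "vt1g": "vt-fall22-1gb",
    "vt50g": "vt-fall22-50gb",
    "cptc8": "cptc8-23",
    "axall": "axes-all",
    "2025": "axes-2025",
}


def resolve_target(target: str, dataset_dirs: list) -> str:
    """Map an alias to its dataset name, then substring-match directory names."""
    needle = DATASET_TARGETS.get(target, target)  # un-aliased targets pass through
    matches = [d for d in dataset_dirs if needle in d]
    if len(matches) != 1:
        # Fail loud on both "not found" and "ambiguous".
        raise ValueError(f"target {target!r} ({needle!r}) matched {matches}")
    return matches[0]


dirs = ["summer24", "spring25", "axes-all", "axes-2025",
        "vt-fall22-1gb", "vt-fall22-50gb"]
print(resolve_target("sum24", dirs))
```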

Deploy plan / confirm (core/plan.py::show_plan_and_confirm)

Before provisioning, the CLI prints a per-task plan and asks y/N. Each task renders a manifest summary (target env, preset, source path, generated_at_utc + age) AND a VMs to provision table (Behavior/Brain/Flavor/LLM model):

  • Feedback tasks: table from the GPU-tier template (FEEDBACK_TEMPLATES_BY_TIER[gpu_tier], 5 VMs) — tier= shown.
  • Controls task: table from decoy-controls/config.yaml's deployments (9 VMs: C0/M0/M1 + V100 B0/S0 + rtx-a B0R/S0R + CPU B0C/S0C) via config_vm_table_lines (added 2026-06; C0/M0 special-cased as bare ubuntu/MITRE pyhuman, not brain SUPs).

Quirk: a plan that is a single controls-only task auto-proceeds with NO y/N (if n==1 and is_controls: return True — "nothing to confirm"); it still prints the plan first. To force the gate on controls, bundle it with feedback (--controls --feedback …) so the plan is multi-task. A manifest.target ≠ deploy-type aborts the whole plan.

Spinup phases (decoy/spinup.py)

  1. _validate_behavior_source: walk every non-C0/M0 SUP's expected {behavior_dir}/{baseline_config}/behavior.json and abort with the missing-path list before any VM work.
     0.5. _teardown_matching_prior_runs: for each runs/{old_rid}/ whose saved config.yaml has the SAME gpu_tier AND the SAME deployments[] list as the new config, openstack-delete its VMs (wait_until_zero) and safe_rmtree the prior run_dir. This makes ./deploy idempotent against the same logical deploy, so orphans no longer accumulate across reruns. Mismatching prior runs are left intact (operator can ./teardown).
  2. Provision VMs (provision-vms.yaml) — abort if < 90% reach ACTIVE
  3. SSH connectivity test (Python concurrent.futures, 20 workers) — abort if < 90% reachable
  4. Install (install-sups.yaml) — stage1 system deps → reboot (exit 100) → stage2 brain deps + systemd service. INSTALL_SUP.sh runs with RUSE_NO_SERVICE_START=1 so the service is enabled but NOT started — distribute starts it (next phase) once behavior.json is on disk. M0 is started here (it skips distribute, expected to crash on Linux). C0 skipped.
  5. Distribute behavior configs (distribute-behavior-configs.yaml) — copy + JSON-validate + on-VM stat assert, then systemd state=started, poll up to 30s for state=active AND NRestarts ≤ 5, abort if either fails. With this ordering NRestarts stays at 0 on a clean deploy (pre-fix it sat at 60-100 from crash-loops in the install→distribute gap). C0/M0 skip via meta: end_host.
  6. Neighborhood sidecar (feedback only, gated on non-zero topology_mimicry)
  7. SSH config install (install_ssh_config() writes block to ~/.ssh/config) + PHASE register (return False → abort with return 1)
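Two of the checks in the phase list lend themselves to a short sketch: the prior-run match and the 90% gates. Both functions below are hypothetical illustrations; in particular, comparing `deployments` with plain list equality (order-sensitive) is an assumption.

```python
def matches_prior_run(old_cfg: dict, new_cfg: dict) -> bool:
    """A prior run is torn down only if gpu_tier AND deployments both match."""
    return (old_cfg.get("gpu_tier") == new_cfg.get("gpu_tier")
            and old_cfg.get("deployments") == new_cfg.get("deployments"))


def passes_gate(ok: int, total: int) -> bool:
    """True when at least 90% of VMs succeeded (integer math avoids float error)."""
    return total > 0 and ok * 10 >= total * 9


new = {"gpu_tier": "rtx", "deployments": ["M2", "B2R.gemma", "S2R.gemma"]}
same = {"gpu_tier": "rtx", "deployments": ["M2", "B2R.gemma", "S2R.gemma"]}
other_tier = {"gpu_tier": "rtx-a", "deployments": ["M2", "B2R.gemma", "S2R.gemma"]}

print(matches_prior_run(same, new), matches_prior_run(other_tier, new))
print(passes_gate(8, 9), passes_gate(4, 5))
```

Note that for the 9-VM controls deploy and the 5-VM feedback template a single failed VM already falls below 90% (8/9 ≈ 89%, 4/5 = 80%), so at these sizes the gate effectively requires every VM to succeed.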

Run outcome stamp (2026-06-05): run_dir/deploy_status.json is written failed right after run_dir.mkdir and flipped to ok only on the final clean return (install_result.rc == 0). Any phase abort / exception / kill leaves it failed, and ./teardown --decoy --failed targets it. Runs from before 2026-06-05 are unstamped (unknown) → not matched; use positional teardown or retro-stamp them. See core/run_status.py + the /teardown skill.
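The write-failed-first pattern can be sketched with a context manager. This is an illustration under stated assumptions, not the contents of core/run_status.py: the `{"status": ...}` JSON shape and the names `stamped_run`, `write_status`, and `read_status` are hypothetical.

```python
import json
from contextlib import contextmanager
from pathlib import Path

STATUS_FILE = "deploy_status.json"


def write_status(run_dir: Path, status: str) -> None:
    (run_dir / STATUS_FILE).write_text(json.dumps({"status": status}))


def read_status(run_dir: Path) -> str:
    """Unstamped (older) runs report 'unknown' and are not matched by --failed."""
    path = run_dir / STATUS_FILE
    if not path.exists():
        return "unknown"
    return json.loads(path.read_text())["status"]


@contextmanager
def stamped_run(run_dir: Path):
    """Stamp 'failed' up front; flip to 'ok' only if the body finishes cleanly."""
    run_dir.mkdir(parents=True, exist_ok=True)
    write_status(run_dir, "failed")
    yield run_dir
    # Not reached if the body raised, so the stamp stays 'failed'.
    write_status(run_dir, "ok")
```

Writing the pessimistic value first means a kill signal or an unhandled exception needs no cleanup handler: the stamp is already correct.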

Service naming

{behavior_lowercase}.service with dots → underscores:

  • M1 → m1.service
  • B0.gemma → b0_gemma.service
  • S2C.gemma → s2c_gemma.service
  • B2R.gemma → b2r_gemma.service (RTX, both pools)
  • S2R.gemma → s2r_gemma.service (RTX, both pools)

Per-behavior service, NOT generic mchp / bu / smol. Logs redirect to /opt/ruse/deployed_sups/{key}/logs/systemd.log and systemd_error.log — use tail, not journalctl -u.
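The naming rule and log locations above reduce to two small string transforms. A minimal sketch; `service_name` and `log_path` are hypothetical helper names, and `{key}` is assumed to be the behavior key itself (as in the SSH access examples).

```python
def service_name(behavior_key: str) -> str:
    """Lowercase the behavior key and turn dots into underscores."""
    return behavior_key.lower().replace(".", "_") + ".service"


def log_path(behavior_key: str, error: bool = False) -> str:
    """Per-SUP log file that systemd output is redirected to."""
    filename = "systemd_error.log" if error else "systemd.log"
    return f"/opt/ruse/deployed_sups/{behavior_key}/logs/{filename}"


for key in ("M1", "B0.gemma", "S2C.gemma", "B2R.gemma"):
    print(key, service_name(key), log_path(key))
```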

MCHP maintenance cron (auto-installed for M-brain VMs to mitigate Selenium/pyautogui memleak):

  • 0 3 * * * systemctl restart {svc}.service — daily restart at 03:00 UTC
  • 0 4 * * 0 /sbin/reboot — weekly reboot Sunday 04:00 UTC

SSH access

ssh d-controls050826193122-M1-0
ssh d-controls050826193122-B0-gemma-0 "systemctl status b0_gemma"

# Brain output (NOT journalctl)
ssh d-controls050826193122-B0-gemma-0 \
  "sudo tail -f /opt/ruse/deployed_sups/B0.gemma/logs/systemd.log"

# Structured agent log
ssh d-controls050826193122-B0-gemma-0 \
  "tail -f /opt/ruse/deployed_sups/B0.gemma/logs/latest.jsonl | jq ."

behavior.json schema (PHASE-emitted)

BehavioralConfig.load slices the file into 9 dataclass fields, no key renaming. See decoys/common/behavioral_config.py for the loader; consumers match the shape PHASE emits verbatim.

{
  "_metadata": {"source", "sup_config", "dataset", "current_score", "target_score",
                "generated_at", "mode", "ablation_gate", "timezone": "UTC",
                "seed": int},  // optional; PHASE-emitted, overrides CLI --seed default
                               // via peek_seed() in sup/__main__.py
  "timing": {
    "active_minute_windows": [[start_min, end_min), ...],   // hard 0/1 schedule
    "target_conn_per_minute_during_active": 7.0,
    "min_window_minutes": 15,
    "hard_fence_seconds": 90,
    "burst_percentiles": {
      "connections_per_burst":  {"5","25","50","75","95","max"},
      "idle_gap_minutes":       {"5","25","50","75","95"},
      "burst_duration_minutes": {"5","25","50","75","95"}
    },
    // hourly_distribution / activity_probability_per_hour / long_idle_*
    // were the pre-window soft schedule. Window-mode (2026-05-08)
    // replaced them with active_minute_windows + per-minute target rate.
    // PHASE no longer emits them. RUSE defaults hourly_fractions to
    // uniform [1/24]*24 if absent — windows gate the real schedule.
    "variance": {
      "cluster_size_sigma": 0.5, "idle_gap_sigma": 0.5,
      "hourly_std_targets": {
        "volume":   {"hourly_std_target": [24 floats]},
        "duration": {"hourly_std_target": [24 floats]}
      }
    }
  },
  "content": {
    "workflow_weights": {"BrowseWeb": 0.3, "GoogleSearch": 0.22, ...},
    "site_categories":  {"lightweight": 0.55, "medium": 0.3, "heavy": 0.15},
    "download_url_pool": ["https://...", ...],
    "whois_domain_pool": ["wikipedia.org", ...]
  },
  "behavior": {
    "page_dwell": {"min_seconds": 2, "max_seconds": 43},
    "navigation_clicks": {"min": 10, "max": 30},
    "keep_alive_probability": 0.8,
    "max_steps": 10,
    "enable_whois": true,
    "enable_download": true
  },
  "diversity": {
    "background_services": {
      "dns_per_hour": [24 ints], "http_head_per_hour": [24 ints],
      "ntp_checks_per_day": 4
    },
    "workflow_rotation": {"max_consecutive_same": 2, "min_distinct_per_cluster": 3},
    "topology_mimicry": {"inbound_smb_per_hour": ..., ...},
    "persistent_sessions": {                          // PersistentSession daemon (2026-06-11)
      "enabled": true,
      "session_opens_per_hour": [24 ints, UTC],       // new ssl session opens/hr; non-zero hours = active-hours envelope (day/night gate)
      "keepalive_interval_seconds": 45,               // PHASE upper bound; RUSE clamps actual send to <=10s
      "session_duration_seconds": 120,                // single target (median); RUSE owns lognormal spread, floors at 2s
      "orig_bytes_per_session": 2000,                 // FALLBACK scalar (2026-06-16) — connection_shape.orig_bytes preferred when present
      "endpoint_pool": ["https://...", ...]           // ~8 live external https (TCP-TLS) sites
    },
    "connection_shape": {                             // NEW Phase 1 (2026-06-16); absent → OFF, scalar fallback
      "enabled": true,
      "orig_bytes": {"p25":.., "p50":.., "p75":.., "p90":.., "max":..},  // ACTUATED (persistent-session per-conn sampling)
      "duration":   {"p25":.., "p50":.., "p75":.., "p90":.., "max":..},  // ACTUATED (per-conn lifetime)
      "orig_pkts":  {"p25":.., "p50":.., "p75":.., "p90":.., "max":..},  // parsed, NOT actuated → packetization build #3
      "resp_bytes": {"p25":.., "p50":.., "p75":.., "p90":.., "max":..},  // parsed, NOT actuated → response-endpoint build #2
      "resp_pkts":  {"p25":.., "p50":.., "p75":.., "p90":.., "max":..}   // parsed, NOT actuated → build #2
    },
    "conn_state_mix": {"SF":.., "failed_conn":.., "OTH":.., "RSTR":..}   // NEW Phase 1; FOLD: REJ+S0→failed_conn. ACTUATED: failed_conn rate via scripted_services. SF = uncontrolled baseline (reference only)
  },
  "prompt_content": "... optional free-form prompt guidance ..."
}

_metadata.mode ∈ {baseline, dumb_baseline, None}. Baseline mode emits a degenerate timing schema; emulation_loop._reload_behavioral_config detects via fc.mode in {"baseline", "dumb_baseline"} OR by schema sniff (burst_percentiles.burst_duration_minutes is not a dict) and skips CalibratedTiming/variance/activity setup. Workflow gating + content pools still honored.
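The two-pronged detection (explicit mode, else schema sniff) can be sketched on a plain dict. `is_baseline_config` is a hypothetical function operating on the raw JSON, whereas the real check runs on the loaded config object inside emulation_loop.

```python
BASELINE_MODES = {"baseline", "dumb_baseline"}


def is_baseline_config(cfg: dict) -> bool:
    """True if the config declares a baseline mode or has the degenerate timing schema."""
    mode = cfg.get("_metadata", {}).get("mode")
    if mode in BASELINE_MODES:
        return True
    # Schema sniff: feedback configs carry a percentile dict here.
    burst = cfg.get("timing", {}).get("burst_percentiles")
    duration = burst.get("burst_duration_minutes") if isinstance(burst, dict) else None
    return not isinstance(duration, dict)


feedback_cfg = {
    "_metadata": {"mode": None},
    "timing": {"burst_percentiles": {"burst_duration_minutes": {"50": 3.0}}},
}
baseline_cfg = {"_metadata": {"mode": "baseline"}, "timing": {}}

print(is_baseline_config(feedback_cfg), is_baseline_config(baseline_cfg))
```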

Per-flag workflow gating

behavior.behavior.{enable_whois, enable_download} controls workflow registration. PHASE feedback_engine.baseline emits both false (controls = single-workflow degenerate mode); feedback proper emits both true (or omits, defaulting true).

Brain | Both flags False | Both flags True
--- | --- | ---
Smol | BrowseWeb, WebSearch, BrowseYouTube (3) | + WhoisLookup, DownloadFiles (5)
BU | BrowseWeb, WebSearch, BrowseYouTube (3) | + WhoisLookup, DownloadFiles (5)
MCHP | 7 baseline (no whois, no download) | + WhoisLookup, DownloadFiles

Mechanism:

  • Smol/BU loaders — load_workflows(enable_whois=, enable_download=)
  • MCHP — BEHAVIOR_GATED_WORKFLOWS = {'download_files.py': 'enable_download', 'whois_lookup.py': 'enable_whois'} map; _load_workflows skips files whose flag is False
  • All 3 brains read flags via common.behavioral_config.load_workflow_gates(config_dir)
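A minimal sketch of the MCHP-style file filter. The map is copied from the bullet above; `select_workflow_files` is a hypothetical stand-in for the filtering step inside `_load_workflows`, and `browse_web.py` is an illustrative filename. A missing flag is treated as true, matching the "omits, defaulting true" behavior described earlier.

```python
BEHAVIOR_GATED_WORKFLOWS = {
    "download_files.py": "enable_download",
    "whois_lookup.py": "enable_whois",
}


def select_workflow_files(files: list, gates: dict) -> list:
    """Drop gated workflow files whose flag is explicitly False."""
    selected = []
    for name in files:
        flag = BEHAVIOR_GATED_WORKFLOWS.get(name)
        if flag is not None and not gates.get(flag, True):
            continue  # gated off by behavior.json
        selected.append(name)
    return selected


files = ["browse_web.py", "download_files.py", "whois_lookup.py"]
print(select_workflow_files(files, {"enable_whois": False, "enable_download": False}))
print(select_workflow_files(files, {}))
```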

WhoisLookup + DownloadFiles bypass the Agent's tool-decision loop:

  • Smol — dedicated workflow, ONE LiteLLMModel picker → domain/URL from PHASE pool
  • BU — dedicated workflow, ONE Ollama HTTP picker (loopback 127.0.0.1:11434, invisible to Zeek), browser never invoked
  • MCHP — random.choice(pool) no-LLM picker

Helpers in decoys/common/network/: whois.py, downloader.py, probes.py, neighborhood_traffic.py, youtube.py. Brain workflow files import directly — no cross-brain imports.

BrowseYouTube real streaming (2026-06-04)

All three brains now generate REAL video traffic from content.youtube_video_pool, each in-character (NOT unified onto one engine). Before this, none streamed:

  • MCHP / Firefox: autoplay was blocked → player never started. Fix: webdriver_helper.py sets media.autoplay.default=0 → Firefox streams (state 1, currentTime advances). Suggested-video selector fixed: watch-page recommendations live at #secondary a[href*="watch"] (the old By.ID 'video-title' only matched the SEARCH page, returned 0 on watch pages).
    • suggested_videos empty warning fix (2026-06-13, M-brain incl controls M1): the warning was a FALSE ALARM — NOT pool rot (warned videos oEmbed-200 alive, markup-identical to ones that succeed). Real cause = intermittent geckodriver 0.34/Firefox-151 Marionette instability. Fixes: (1) pin geckodriver 0.34.0 → 0.37.0 in INSTALL_SUP.sh = THE fix (canary repro on 0.37: warned video renders 40 #secondary links at 5s); (2) suggested-sidebar wait 5s → 10s (match search path; defensive — 5s was NOT universally short); (3) empty case [WARNING] → [INFO] + logger.info (was step_error, flooded the audit). Stale "dead video" comment removed.
  • BU / Chromium: needed --autoplay-policy=no-user-gesture-required. The 4 duplicate BrowserSession(args=[...]) lists were dedup'd into brains/browseruse/config.py::CHROMIUM_ARGS (single source of truth) with the autoplay arg added.
  • Smol / yt-dlp: no browser by design. In pool mode it now STREAMS the real media over HTTP via common/network/youtube.stream_youtube_video (yt-dlp resolves the googlevideo URL; ~30 MB byte-capped fetch) → genuine CDN traffic, in-character with its HTTP nature. No-pool falls back to the DuckDuckGo research path. Also fixed a name-match bug in smolagents/loop.py ("BrowseYoutube" → "BrowseYouTube") that meant the pool was NEVER wired into Smol. INSTALL_SUP.sh adds yt-dlp to Smol deps.

Dead-video guard (common/network/youtube.py, all three): the PHASE pool rots (~30% deleted/private). pick_available_video oEmbed-checks each pick (HTTP 401/404 → skip) before navigating/streaming. Root fix is PHASE-side (re-validate the pool at emit time); this is RUSE-side defense-in-depth.
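The guard can be sketched as an oEmbed status check in front of the pick. This is an illustration, not the code in common/network/youtube.py: the signature, the injectable `status_of` parameter (used so the logic can be exercised without network access), and the shuffle-then-scan strategy are all assumptions.

```python
import random
import urllib.error
import urllib.parse
import urllib.request

OEMBED_ENDPOINT = "https://www.youtube.com/oembed"
DEAD_STATUSES = {401, 404}  # private / deleted videos


def oembed_status(video_url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of YouTube's oEmbed lookup for a video URL."""
    query = urllib.parse.urlencode({"url": video_url, "format": "json"})
    try:
        with urllib.request.urlopen(f"{OEMBED_ENDPOINT}?{query}", timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx arrive as exceptions in urllib


def pick_available_video(pool, status_of=oembed_status, rng=random):
    """Pick a random pool entry, skipping ones the oEmbed check reports dead."""
    candidates = list(pool)
    rng.shuffle(candidates)
    for url in candidates:
        if status_of(url) not in DEAD_STATUSES:
            return url
    return None  # whole pool is dead; caller decides on a fallback
```

Connection-level failures (for example `urllib.error.URLError`) are deliberately left to propagate in this sketch; how the real guard treats them is not stated in this document.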

Distribute flow (distribute-behavior-configs.yaml)

  1. Derive baseline config key from versioned key: B2C.gemma → B0C.gemma, B2R.gemma → B0R.gemma, S2R.gemma → S0R.gemma, M2 → M1 (the R-strip was removed 2026-06-12 — R-tier reads its own B0R/S0R, see Topology note)
  2. Resolve {feedback_source}/{behavior_dir}/{baseline_config}/behavior.json
  3. python3 -m json.tool validate on localhost — corrupt aborts before shipping
  4. Copy to /opt/ruse/deployed_sups/{key}/behavioral_configurations/behavior.json
  5. Assert file on disk after copy
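Steps 1 and 2 of the flow can be sketched with one regular expression. This is a hypothetical reimplementation for illustration, not the code in `_derive_behavior_paths` or the playbook; in particular, the behavior_dir of `M` for MCHP keys simply falls out of the regex and is not confirmed by this document.

```python
import re
from pathlib import PurePosixPath

# brain letter, version digits, optional tier letters (R/C), optional ".model" suffix
_KEY = re.compile(r"^([A-Z])(\d+)([A-Z]*)((?:\..+)?)$")


def derive_paths(key: str) -> tuple:
    """Return (behavior_dir, baseline_config) for a versioned behavior key."""
    m = _KEY.match(key)
    if m is None:
        raise ValueError(f"unrecognized behavior key: {key!r}")
    brain, _version, tier, model = m.groups()
    baseline_version = "1" if brain == "M" else "0"  # M2 -> M1, others -> *0*
    behavior_dir = f"{brain}{model}"                  # family dir drops version and tier
    baseline_config = f"{brain}{baseline_version}{tier}{model}"  # tier letter is kept
    return behavior_dir, baseline_config


def behavior_json_path(feedback_source: str, key: str) -> PurePosixPath:
    behavior_dir, baseline_config = derive_paths(key)
    return PurePosixPath(feedback_source) / behavior_dir / baseline_config / "behavior.json"


for key in ("B2C.gemma", "B2R.gemma", "S2R.gemma", "M2"):
    print(key, derive_paths(key))
```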

Runs for ALL non-C0/M0 SUPs — controls' decoy-controls/config.yaml points behavior_source at /mnt/AXES2U1/feedback/decoy-controls/controls so baselines flow through the same path as feedback.

The controls/ slot is excluded from feedback dataset auto-discovery via core/feedback.py::BASELINE_DATASET_SLOTS = {"controls"} in three call sites: find_all_feedback_sources, auto_detect_feedback_source, find_feedback_by_target. To force PHASE re-roll the baseline: rm -rf /mnt/AXES2U1/feedback/decoy-controls/controls/.

LLM models

Alias | Ollama tag | Tier | Notes
--- | --- | --- | ---
gemma | gemma4:26b | V100 32GB | MoE 25.2B/3.8B active, fits 89% VRAM, ~10 tok/s. Used by B2.gemma / S2.gemma (V100 feedback) and B0.gemma / S0.gemma (V100 controls).
gemmar | gemma4:e4b | RTX 2080 Ti 11GB | Edge 4B variant (~3 GB int4 weights, ~10 GB loaded with KV cache). Used by B2R.gemma / S2R.gemma on both --gpu rtx and --gpu rtx-a deploys. Same model across both pools; only the underlying flavor / PCI alias differs.
gemmac | gemma4:e2b | CPU only | Edge-optimized 2.3B, ~7 tok/s on Smol; BU on CPU times out on big prompts. Used by B2C.gemma / S2C.gemma + B0C.gemma / S0C.gemma.
llama | llama3.1:8b | (legacy) | Kept for back-compat, not in any deploy template

Three gemma4 tiers (V100 / RTX / CPU) keep results structurally comparable — same family, different VRAM-fit variants. PHASE-shipped .gemma/ feedback is portable across all three.

Brain framework versions are PINNED (INSTALL_SUP.sh): browser-use==0.12.7, smolagents==1.25.0. The step-action log parser is keyed to each version's action/tool vocabulary — _BU_ACTION_MAP (brains/browseruse/agent.py) and _SMOL_ACTION_PATTERNS (common/logging/llm_callbacks.py). These libs rename actions between versions (browser-use's pre-0.12 go_to_url/click_element → 0.12 navigate/click), so an unpinned bump silently zeroes out per-step logging (confirmed 2026-05-25: ~99% of BU steps dropped). When bumping a pin, re-derive the maps from a live VM's emitted action names and update both in lockstep. A [parser-drift] [WARNING] (caught by ./audit's Warn column) fires if N consecutive responses parse but map to nothing.

Aliases must agree across four call sites: INSTALL_SUP.sh::MODEL_NAMES, decoys/common/config/model_config.py::MODELS, runner argparse choices=[...] in all three of run_browseruse.py, run_smolagents.py, run_mchp.py. Adding a new alias and missing any runner argparse silently crashes the SUP at startup — INSTALL_SUP.sh generates run_agent.sh with --model={alias}, the runner rejects it with argument --model: invalid choice, the service crash-loops, and NRestarts blows past the install-time 30s grace before distribute-behavior-configs.yaml's service-active assertion catches it (observed when gemmar was added to MODEL_NAMES + model_config.py but missed in the runners — commit f2ad12a is the fix; original miss was in 755fc0c).

get_num_ctx() in model_config.py detects nvidia-smi at runtime: GPU → num_ctx=32768, CPU → num_ctx=16384. Override via SUP_NUM_CTX. Ollama default is 4096 on CPU which silently truncates DOM/tool-use prompts.

Wired in:

  • BrowserUse (brains/browseruse/agent.py) — injected into Ollama client chat() options dict via create_logged_chat_ollama wrapper. Uses kwargs.get('options') or {} (browser_use sometimes passes options=None)
  • SmolAgents (brains/smolagents/agent.py + 3 workflow files) — passed as num_ctx in LiteLLMModel constructor
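The context-size selection described above can be sketched as follows. This is a stand-in, not the body of `get_num_ctx()` in model_config.py: detecting the GPU by looking for `nvidia-smi` on `PATH` is an assumption (the real check may invoke the binary), and the `env` / `has_gpu` parameters exist only to make the sketch testable.

```python
import os
import shutil
from typing import Mapping, Optional

GPU_NUM_CTX = 32768
CPU_NUM_CTX = 16384


def get_num_ctx(env: Optional[Mapping] = None, has_gpu: Optional[bool] = None) -> int:
    """Pick the Ollama context window: env override, else GPU/CPU default."""
    env = os.environ if env is None else env
    override = env.get("SUP_NUM_CTX")
    if override:
        return int(override)
    if has_gpu is None:
        # Assumption: presence of the nvidia-smi binary implies a usable GPU.
        has_gpu = shutil.which("nvidia-smi") is not None
    return GPU_NUM_CTX if has_gpu else CPU_NUM_CTX


# The value would then travel in the per-request options dict, for example:
options = {"num_ctx": get_num_ctx(env={}, has_gpu=False)}
print(options)
```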

BrowserUse Agent tuning (brains/browseruse/agent.py)

Non-default settings cap token usage to keep CPU BU forward-progressing:

Agent(
    task=full_prompt, llm=self._get_llm(), browser_session=...,
    use_vision=False,                   # gemma is text-only
    use_judge=False,                    # skip extra LLM eval per step
    max_clickable_elements_length=8000, # ~2K tokens vs 40K default
    max_history_items=5,
    include_attributes=["id", "class", "name", "type", "value",
        "placeholder", "aria-label", "role", "href", "title", "alt"],
    llm_timeout=300,                    # CPU LLM calls can take 2-3 min
)

Per-step uniform delay from behavior.behavior.page_dwell is wired via Agent(register_new_step_callback=...).

PHASE feedback runtime consumption

Loader (load_behavioral_config) → consumers:

| behavior.json path | BehavioralConfig field | Consumer |
| --- | --- | --- |
| timing.active_minute_windows + target_conn_per_minute_during_active + min_window_minutes + hard_fence_seconds | timing_profile | phase_timing.update_window_contract → window gate in emulation_loop + D4 deficit-burst in background_services |
| timing.burst_percentiles.* | timing_profile | CalibratedTimingConfig.{burst_duration,idle_gap,connections_per_burst} |
| timing.hourly_distribution (legacy, vestigial) | timing_profile | CalibratedTimingConfig.hourly_fractions — defaults uniform when absent; windows gate the real schedule |
| timing.variance.cluster_size_sigma | variance_injection | get_cluster_size() lognormal noise |
| timing.variance.idle_gap_sigma | variance_injection | get_cluster_delay() lognormal noise |
| timing.variance.hourly_std_targets.{volume,duration}.hourly_std_target | variance_injection | D1 per-hour sigma in _init_variance_targets |
| timing.activity_probability_per_hour | activity_pattern | should_skip_hour() |
| timing.long_idle_probability + long_idle_duration_minutes | activity_pattern | should_take_long_idle() |
| content.workflow_weights | workflow_weights | build_workflow_weights() for random.choices() |
| content.site_categories | site_config | SmolAgents BrowseWebWorkflow task pool filter |
| content.download_url_pool | download_url_pool | Smol/BU DownloadFiles LLM picker (falls back to FALLBACK_URLS) |
| content.whois_domain_pool | whois_domain_pool | Smol/BU/MCHP WhoisLookup (falls back to FALLBACK_DOMAINS) |
| behavior.page_dwell / navigation_clicks | behavior_modifiers | MCHP BrowseWeb.{min,max}_sleep_time; BU per-step delay |
| behavior.enable_whois / enable_download | (read via load_workflow_gates) | Workflow registration |
| behavior.keep_alive_probability | behavior_modifiers | MCHP BrowseWeb.keep_alive_probability |
| behavior.max_steps | behavior_modifiers | BU/Smol per-workflow max_steps |
| diversity.background_services.* | diversity_injection | BackgroundServiceGenerator (D4) — maybe_generate |
| diversity.background_services.{name}_enabled | (read by ScriptedServiceScheduler) | Phase-3 scripted protocol probes (scripted_services.py: smb/ldap/imap/doh/mdns/websocket/failed_conn). maybe_run fires from the in-window cluster loop (emulation_loop.py:583, same gating as D4) — never outside an active window. Per-probe cron slots (e.g. failed_conn :17/:47). Catch-up scheduling (2026-06-05, commit 26c2489): fires the latest slot at/before the current minute not yet fired this hour, so a sleepy loop that misses the exact minute still fires; the prior exact-minute match fired 0× over 8.5h. [scripted-svc] {name} ok= state= latency_ms= → stdout/systemd.log (+ jsonl info); [scripted-svc] {name}=enabled config marker is jsonl-only. |
| diversity.background_services.service_mix_targets | NOT CONSUMED — no RUSE reader (reverted). PHASE still emits it on cptc; RUSE silently ignores it. | service_mix_targets v1 — ABANDONED / DEAD-END (2026-06-09). DO NOT re-chase; REVERTED out of RUSE same day (revert bed8350 of commits 6ec6b8d+f53f79d). Note: with the service-mix precedence gone, cptc behavior.json's smb_enabled/failed_conn_enabled are honored normally again by ScriptedServiceScheduler (the old covered_services() force-disable is removed) — the scripted smb probe fires in-window as a plain fire-and-observe SYN ([scripted-svc] smb ok=False state=S0 is correct; no responder exists anymore). The idea: own-thread generators (common/network/service_mix.py) + a sidecar responder (common/network/service_responder.py: TCP 445 SMB, TCP 9997 splunk, UDP echo) to emit Zeek service types smb/splunk/udp that no workflow produces, to close the cptc service-mix gap. Built, deployed, validated on the live cptc9 dredge — and it CANNOT work. Conclusive reasons: (1) Vocabulary skew — targets are computed from the CPTC9-competition Zeek (TRAINING_DATA/CPTC9_24.parquet: smb/splunk/udp/dcerpc), but the AXES tap Zeek observing our SUPs never emits splunk (0/65 deploys) or bare udp (it labels discovery 137/5355/5353→dns, 138/1900→None, 123→ntp) and labels SMB only as gssapi,smb (auth'd SMB2), not bare smb. Responder mechanics worked (flows completed: 445 RSTR, 9997 RSTO, udp SF, data both ways) but all landed service=None — minimal SMB1 negotiate isn't what Zeek confirms; splunk has no analyzer on this sensor. (2) Exact-string target match in decoy_generator.py (PHASE, can't edit) — gssapi,smb ≠ smb, and critically quic,ssl ≠ ssl + http,websocket ≠ http: a REAL bug that under-credits the good axes targets. Carry-forward win → ask PHASE to normalize comma-joined Zeek labels component-wise at the target matcher (NOT the frozen LabelEncoder). (3) cptc structurally unreachable — wrong network + wrong sensor. std cptc9 model (MinMax, continuous-blind, log_transform_bytes=False) scored 0.50 = a false positive foolable via shared categorical vocab (dns/http/ssl); exp cptc9 model (quantile, 20 feat, continuous un-blinded) correctly rejects AXES browsing SUPs at ~0 because browsing-scale bytes/timing ≠ cptc9 hostile-competition scale (185 conn/min, 3.2 MB/conn splunk). No categorical responder or PHASE knob closes "wrong activity on wrong network." service_mix_targets is emitted ONLY on cptc (axes/vt are ≤1%-skew, workflow-reachable → field omitted) → it only applies where it can't work. Full write-up: memory project_service_mix_targets.md. |
| diversity.workflow_rotation.* | diversity_injection | D2 rotation in emulation_loop |
| diversity.persistent_sessions.* | diversity_injection | PersistentSession daemon (2026-06-11, common/network/persistent_session.py). Brain-agnostic background thread (NOT a workflow — never occupies the sequential slot), unlike D4/scripted-svc which are inline. Holds long-lived TCP-TLS sessions to endpoint_pool during the active-hours envelope (non-zero session_opens_per_hour hours, read circularly so student bands wrap midnight), opening new sessions spread across active minutes to close PHASE's ssl-dominant-minute / duration / orig_bytes gap. Start-minute binning → opens not concurrency; resolve-once + connect-by-IP → zero steady-state dns (so ssl starts win the per-minute MODE tie vs dns); lifetime = min(sampled_duration, time-to-block-end) → graceful FIN/SF at workday boundary; orig_bytes front-loaded into the first request. D4 net-out: daemon opens are subtracted from D4's deficit-burst via set_window_state(external_conns=) so total volume stays at target + mix shifts to ssl. Absent block or enabled:false → daemon off (no loader change — rides diversity_injection verbatim like topology_mimicry). Logs [psess] open/close/daemon started. Endpoint caveat: keep-alive longevity is endpoint-dependent (Fastly-fronted hosts cnn/stackoverflow/bloomberg/reddit/cisco server-close ~2 req; github/docs.python.org/azure/wikipedia sustain) — quick-closers still give the ssl-minute + orig_bytes win, only DURATION needs sustainers; RUSE clamps send to ≤10s, PHASE curates the pool. |
| diversity.connection_shape.{orig_bytes,duration} | connection_shape | Closed-loop ShapeController (Phase 1, 2026-06-16, common/network/shape_controller.py). Per-connection target percentile distributions {p25,p50,p75,p90,max}. The controller draws a per-conn target (NOT the p50 scalar) for the persistent-session channel and applies a bounded, damped multiplicative bias corrected each minute from an emit-side ledger (each closed session reports bytes_cum/wall-duration/SF; /proc can't see per-conn shape, RUSE spec §B.1) toward target p50. max is a HARD post-bias ceiling. When active it owns sampling and supersedes the persistent-session lognormal; absent/enabled:false/malformed-dist → warn-loud + fall back to scalar orig_bytes_per_session/session_duration_seconds (never crashes — additive). orig_pkts/resp_bytes/resp_pkts are PARSED but NOT actuated yet (packetization build #3 / response-endpoint build #2). Logs [shape] bytes_med=.../... bias=... failed_conn_rate=... each minute. Ships dormant until PHASE emits the block. |
| diversity.conn_state_mix.failed_conn | conn_state_mix | ShapeController → scripted_services failed_conn rate (Phase 1). FOLD (2026-06-16): PHASE collapses REJ+S0 into a single failed_conn fraction; SF is the uncontrolled baseline (reference only). Controller computes failed_conn_rate_per_min = failed_conn_frac × per-minute aggregate active_opens (own OutboundConnSampler — a valid count use of /proc); ScriptedServiceScheduler fires probe_failed_conn (S0/REJ to closed port) toward that rate via a per-minute budget, bypassing the fixed cron failed_conn slot when a target is present (other cron probes unchanged). Logs [scripted-svc] failed_conn ... src=rate. Honored regardless of the failed_conn_enabled cron toggle. REJ vs S0 split deferred to the RST/responder build #6. |
| diversity.topology_mimicry.inbound_*_per_hour | diversity_injection | Neighborhood sidecar daemon |
| _metadata.mode | mode | Baseline short-circuit in _reload_behavioral_config |
| _metadata.ablation_gate | ablation_gate | DEAD field (2026-06-12): PHASE deleted ablation_gate in the two-shapes simplification (8f91240a, 2026-05-08), so is_ablation_gated() is always False. The old [WARNING]-vs-[INFO] tag logic it drove is removed — section-absent status lines now ALWAYS emit [INFO] (optional under two-shapes; commit bc7aa66). Loader still slices the field if present (forward-compat), but no live consumer depends on it. |
| _metadata.seed | seed | sup/__main__.py peeks before random.seed(); overrides CLI --seed. Also propagated into neighborhood-sups.json top-level seed field for sidecar RNG anchor. AgentLogger.session_id derives from this via separate Random() instance (no global RNG consumption) |
| prompt_content | prompt_augmentation.prompt_content | G1: BU + Smol prompt prepend |
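
The catch-up scheduling described for the scripted probes in the table above can be sketched as a small slot picker. This is an illustration under one reading of the rule (take the latest cron slot at or before the current minute, and fire it only if it has not fired this hour); `slot_to_fire` is a hypothetical name, not the ScriptedServiceScheduler API.

```python
from typing import Iterable, Optional, Set


def slot_to_fire(slots: Iterable[int], minute: int,
                 fired_this_hour: Set[int]) -> Optional[int]:
    """Pick the cron slot (minute-of-hour) that is due, or None.

    An exact-minute match would require minute == slot, so a loop that
    sleeps past the slot never fires. Catch-up relaxes that to slot <= minute.
    """
    due = [s for s in slots if s <= minute]
    if not due:
        return None
    latest = max(due)
    # Fire at most once per slot per hour.
    return None if latest in fired_this_hour else latest
```

With slots :17 and :47, a loop that wakes at minute 23 still fires the :17 slot once, then nothing until minute 47 or later.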

Logging output (jsonl)

Each SUP writes events to /opt/ruse/deployed_sups/{key}/logs/session_{YYYY-MM-DD_HH-MM-SS}_{session_id}.jsonl (+ a latest.jsonl symlink). Envelope on every line: timestamp (naive local ISO; runtime hour-gating uses UTC separately — see CLAUDE.md UTC contract), session_id (8 hex, seed-derived → deterministic across replays), agent_type (config key), event_type, optional workflow, details. None values omitted.
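
A minimal sketch of the two envelope properties described above: a session_id that is deterministic for a given seed without consuming the global RNG, and omission of None values. The exact derivation inside AgentLogger is not shown in this document, so `getrandbits(32)` and both function names are assumptions.

```python
import json
import random
from datetime import datetime
from typing import Any, Dict, Optional


def derive_session_id(seed: int) -> str:
    """8 hex chars from a private Random(seed); the global RNG is untouched."""
    return f"{random.Random(seed).getrandbits(32):08x}"


def make_event(session_id: str, agent_type: str, event_type: str,
               workflow: Optional[str] = None,
               details: Optional[Dict[str, Any]] = None) -> str:
    """Serialize one jsonl line, dropping keys whose value is None."""
    envelope = {
        "timestamp": datetime.now().isoformat(),  # naive local ISO
        "session_id": session_id,
        "agent_type": agent_type,
        "event_type": event_type,
        "workflow": workflow,
        "details": details,
    }
    return json.dumps({k: v for k, v in envelope.items() if v is not None})
```

Using a separate `random.Random(seed)` instance means logging can never shift the sequence that the seeded global RNG hands to the workflows.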

17 event types: session_{start,success,fail,end}, workflow_{start,end}, step_{start,success,error}, llm_{request,response,error}, decision, timing_delay, warning, info, network_sample. PHASE-side consumers and the DuckDB collection (/mnt/AXES2U1/SUP_LOGS/sup-logs-<exp>.duckdb) read these directly. (A transient 18th type, background_service, existed only during the abandoned service_mix_targets v1 window, 2026-06-09 — reverted same day.)

network_sample (2026-06-01) is the representative traffic signal — emitted ~per-minute by background_services.py via OutboundConnSampler (common/network/conn_sampler.py). Workflow/step COUNTS are honest but are NOT a traffic proxy (a BU navigate step = a full page-load with dozens of sub-resource conns; an MCHP step = one local micro-action — ground-truthed 2026-06-01: on the wire BU ~18 conn/min ≫ MCHP ~1 ≫ Smol ~0.27, the inverse of the workflow-count ranking). details: active_opens (real outbound TCP conns opened in the window, incl. short-lived; from /proc/net/snmp Tcp:ActiveOpens delta; minor loopback noise), distinct_hosts (loopback-excluded external peers from /proc/net/tcp{,6}), d4_synthetic (legacy D4-only count, = the [bg-counter] conns= floor), window_s. The [bg-counter] systemd.log line gained matching active_opens=/hosts= fields. Cadence follows the inter-task maybe_generate call, so for slow BU it's per-workflow (minutes), not strictly per-minute — window_s carries the true interval and active_opens is a delta, so volume is still complete.
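
The active_opens figure is a delta of a cumulative kernel counter. The sketch below shows one way to read Tcp:ActiveOpens from /proc/net/snmp text and difference two samples; the function names are hypothetical and this is not the OutboundConnSampler code.

```python
def parse_active_opens(snmp_text: str) -> int:
    """Extract the cumulative Tcp ActiveOpens counter from /proc/net/snmp text.

    The file holds a header line and a value line per protocol, both
    prefixed with the protocol name ("Tcp:").
    """
    tcp_rows = [ln.split() for ln in snmp_text.splitlines()
                if ln.startswith("Tcp:")]
    header, values = tcp_rows[0], tcp_rows[1]
    return int(values[header.index("ActiveOpens")])


def active_opens_delta(previous: int, current: int) -> int:
    """Connections opened between two samples (clamped at zero)."""
    return max(0, current - previous)
```

Because the counter is cumulative, a sample taken after a long gap still accounts for every connection opened in between, which is why window_s plus the delta keeps volume complete even when the cadence is irregular.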

BU llm_error now also fires on cancelled/timeout (2026-06-01): CPU-slow LLM calls were cancelled mid-flight (CancelledError, a BaseException) and vanished silently (llm_request ≠ llm_response, llm_error=0). The wrapper now logs them (fatal=False) so the request/response gap is reconcilable.
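
The reason cancelled calls vanished is that asyncio.CancelledError derives from BaseException, so a plain `except Exception` never sees it. A minimal sketch of a wrapper that logs and re-raises; `logged_call` and the list-based log are hypothetical stand-ins for the real chat wrapper and logger.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def logged_call(call: Callable[[], Awaitable[Any]],
                      log: List[str]) -> Any:
    """Await call() and record request/response/error markers in log."""
    log.append("llm_request")
    try:
        result = await call()
    except asyncio.CancelledError:
        # Not an Exception subclass: needs its own clause to be logged.
        log.append("llm_error")  # non-fatal in the document's terms
        raise  # always re-raise so the cancellation still propagates
    except Exception:
        log.append("llm_error")
        raise
    log.append("llm_response")
    return result
```

Re-raising is deliberate: swallowing CancelledError would break timeouts and task cancellation in the caller.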

Canonical workflow field (2026-05-25)

The workflow top-level field carries workflow.name — the harmonized cross-brain identifier (BrowseWeb, BrowseYouTube, WebSearch, WhoisLookup, DownloadFiles, DocumentEditor, SpreadsheetEditor, ExecuteCommand, ListFiles, MicrosoftPaint). These match exactly the keys feedback_engine.decoy_generator emits in content.workflow_weights, so log events join to weights directly. Human task text moved to params.description; workflow_class was REMOVED (zero PHASE consumers used it). Workflow names DIVERGE from Python class names in MCHP (google_search.py class GoogleSearch → name WebSearch; browse_web.py class WebBrowse → name BrowseWeb) — the .name is the deliberately harmonized join key; class names stay legacy.
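
Because the workflow field and the content.workflow_weights keys share one vocabulary, observed workflow shares can be compared with configured weights by a plain key lookup. A rough sketch, assuming events are already parsed into dicts; `workflow_share_vs_weight` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def workflow_share_vs_weight(
    events: Iterable[dict], weights: Dict[str, float]
) -> Dict[str, Tuple[float, float]]:
    """Map workflow name -> (observed share, normalized configured weight)."""
    starts = Counter(
        e["workflow"] for e in events
        if e.get("event_type") == "workflow_start" and e.get("workflow")
    )
    total = sum(starts.values())
    weight_sum = sum(weights.values())
    return {
        name: (
            starts.get(name, 0) / total if total else 0.0,
            w / weight_sum if weight_sum else 0.0,
        )
        for name, w in weights.items()
    }
```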

Real per-step outcomes + durations (2026-05-25)

step_success/step_error and duration_ms reflect actual execution from authoritative sources per brain:

| Brain | Step source | Timing |
| --- | --- | --- |
| BrowserUse | walks AgentHistoryList returned by agent.run() (_log_bu_steps in brains/browseruse/agent.py); pairs model_output.action with ActionResult.error per step | batched at workflow-end |
| SmolAgents | CodeAgent(step_callbacks=[make_smol_step_callback(logger)]) over each ActionStep (code_action/error/timing in common/logging/llm_callbacks.py) | streamed per step |
| MCHP | hand-instrumented logger.step_start/success/error in each workflow file | streamed |

⚠️ BU batching caveat for inter-step timing: BU step_start timestamps cluster at workflow-end (since the history is walked once after agent.run() returns), so they're NOT meaningful for inter-step gap analysis (feedback_engine/knob_investigation/inter_step_timing.py). Use llm_request/llm_response timestamps (still streamed via the chat wrapper) for BU inter-step timing. Smol and MCHP stream normally.
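
For BU, inter-step gaps can be derived from the streamed llm_request events instead of the batched step_start events. A minimal sketch over already-parsed jsonl events; `llm_request_gaps` is a hypothetical helper and not part of inter_step_timing.py.

```python
from datetime import datetime
from typing import Iterable, List


def llm_request_gaps(events: Iterable[dict]) -> List[float]:
    """Seconds between consecutive llm_request events, in time order."""
    stamps = sorted(
        datetime.fromisoformat(e["timestamp"])
        for e in events
        if e.get("event_type") == "llm_request"
    )
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
```

The step_start events are ignored on purpose: for BU they all carry timestamps near the end of the workflow, so their differences would be close to zero regardless of the real pacing.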

Action / step vocabulary (version-coupled — see project_brain_lib_pin_parser_coupling memory)

  • _BU_ACTION_MAP (brains/browseruse/agent.py) maps the full browser-use 0.12.7 Tools.registry (24 actions: navigate, click, input, scroll, search, search_page, extract, find_elements, find_text, screenshot, evaluate, dropdown_options, select_dropdown, read_file, write_file, replace_file, save_as_pdf, upload_file, go_back, switch, close, send_keys, wait; done intentionally skipped). Derive from the registry (python -c "from browser_use.tools.service import Tools; print(sorted(Tools().registry.registry.actions))"), NOT sampled logs — sampling missed half on 2026-05-25 (drift guard caught read_file).
  • _SMOL_ACTION_PATTERNS (common/logging/llm_callbacks.py) is bounded by what we register: web_search/duckduckgo/DuckDuckGoSearchTool → search, visit_webpage → navigate, requests.get/urllib/fetch → navigate, print → scroll; final_answer skipped. Complete by construction.
  • MCHP: step names hardcoded in workflow files (open_application, edit_content, save_document, download_file, whois_lookup, etc.). No version-coupled vocabulary.

Parser-drift guard

Both BU (_log_bu_steps / _bu_note_drift) and Smol (_smol_code_unmatched) count consecutive unmapped action names / unmatched code turns. At threshold (BU=10, Smol=25) they print one [WARNING] [parser-drift] ... to stdout → systemd.log → caught by ./audit's Warn column. Validated 2026-05-27: caught read_file (an action the original observed-sample map missed). Pinned versions (browser-use==0.12.7, smolagents==1.25.0) are in INSTALL_SUP.sh so a silent bump can't break the maps unnoticed.
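
The guard logic can be sketched as a consecutive-miss counter that warns once. The class and message layout below are assumptions for illustration (the real message text after [parser-drift] is elided above); only the thresholds and the [WARNING] [parser-drift] prefix come from this document.

```python
from typing import Callable


class DriftGuard:
    """Warn once when too many consecutive action names fail to map."""

    def __init__(self, threshold: int, label: str,
                 emit: Callable[[str], None] = print) -> None:
        self.threshold = threshold
        self.label = label
        self.emit = emit
        self.consecutive = 0
        self.warned = False

    def observe(self, name: str, mapped: bool) -> None:
        if mapped:
            self.consecutive = 0  # any mapped action breaks the streak
            return
        self.consecutive += 1
        if self.consecutive >= self.threshold and not self.warned:
            self.warned = True  # a single line is enough for the audit grep
            self.emit(f"[WARNING] [parser-drift] {self.label}: "
                      f"{self.consecutive} consecutive unmapped "
                      f"(last={name!r})")
```

Counting consecutive misses rather than total misses keeps an occasional unknown action from tripping the guard, while a library bump that renames everything trips it quickly.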

DownloadFiles / WhoisLookup detail fields (2026-05-26 / -27)

The dedicated workflows now carry rich detail in step_success/_error (previously discarded on success):

  • download_file details: {url, outcome, host, bytes, content_type, elapsed_ms} + real duration_ms. MCHP variant: {source, bytes} from a ~/Downloads scandir-delta snapshot (no common downloader for MCHP).
  • whois_lookup: message = trimmed IANA referral (non-%-comment lines joined: refer / domain / organisation), details = {domain}, real duration_ms (the TCP/43 call time).

Schedule-idle ≠ stuck

Outside behavior.json active_minute_windows, the SUP emits an info event and sleeps without firing a workflow:

  • Feedback: [window] outside windows — sleeping Nmin until next start
  • Controls: [controls] outside windows — sleeping 5.0min

A SUP with workflows=0 AND these info lines AND svc=active (recent file mtime) is correctly idle per schedule — NOT hung. Different datasets have different windows, so simultaneous on-window/off-window splits across the fleet are normal (2026-05-27 redeploy audit: 35 on-window logging, 27 off-window idle, all healthy).
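
The three-way conjunction above can be written as a small classifier. This is a sketch with hypothetical names and verdict strings; the inputs (workflow count, info messages, service state) are assumed to have been collected already.

```python
import re
from typing import Iterable

# Matches both the feedback ("[window] ...") and controls ("[controls] ...")
# idle messages quoted above.
_IDLE_RE = re.compile(r"\[(window|controls)\] outside windows")


def classify_sup(workflow_count: int, info_messages: Iterable[str],
                 service_active: bool) -> str:
    """Return 'running', 'idle-per-schedule', 'suspect', or 'down'."""
    if not service_active:
        return "down"
    if workflow_count > 0:
        return "running"
    if any(_IDLE_RE.search(m) for m in info_messages):
        return "idle-per-schedule"
    # Active service, no workflows, no idle marker: worth a closer look.
    return "suspect"
```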

DuckDB collection

Periodic SSH-collection from /opt/ruse/deployed_sups/.../logs/*.jsonl into /mnt/AXES2U1/SUP_LOGS/sup-logs-<experiment>.duckdb events table. First-class extracted columns (queryable without JSON path): timestamp, session_id, agent_type, event_type, workflow, duration_ms, success, error_message, model, action, category, step_name, status, {input,output,total}_tokens, llm_output. The newer details payload fields (bytes, content_type, outcome, host, domain, description) live inside the details JSON column → query via JSON path, e.g. details->>'bytes'.

Topology mimicry (neighborhood sidecar)

Feedback-only. 1 small VM per deploy (d-{dep_id}-neighborhood-0, v1.small, bot-desktop keypair). Daemon common.network.neighborhood_traffic reads /etc/ruse-neighborhood/sups.json and synthesizes inbound TCP/UDP probes at each SUP IP.

10 probe types in decoys/common/network/probes.py: inbound_{smb,ldap,wsus,ntp_receive,printer,ipmi,winrm,mdns,ssdp,scan}_per_hour. Produces mixed conn_state (SF / S0 / REJ / RSTO / unidir) on Zeek rows from the SUP — fights local_orig=1 / ephemeral-port-only / conn_state=SF sandbox signal.

Deploy flow (decoy/spinup.py phase 5, after distribute):

  1. _synthesize_neighborhood_config walks each SUP's behavior.json, collects topology_mimicry rates, writes neighborhood-sups.json if any non-zero (else returns None → skip)
  2. _provision_and_install_neighborhood creates VM, writes neighborhood-inventory.ini, runs install-neighborhood.yaml (asserts ruse-neighborhood service active + NRestarts ≤ 5)
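
Step 1 of the flow above (collect rates, skip the sidecar when nothing is non-zero) might be sketched as follows. The input shape and the output keys `sups`, `ip`, and `rates` are assumptions for illustration; only the inbound_*_per_hour field pattern and the top-level seed come from this document.

```python
from typing import Any, Dict, List, Optional


def synthesize_neighborhood_config(
    sups: List[Dict[str, Any]], seed: Optional[int] = None
) -> Optional[Dict[str, Any]]:
    """Collect non-zero topology_mimicry rates per SUP, or None if none exist."""
    entries = []
    for sup in sups:
        mimicry = (sup["behavior"].get("diversity", {})
                   .get("topology_mimicry", {}))
        rates = {
            key: value for key, value in mimicry.items()
            if key.startswith("inbound_") and key.endswith("_per_hour")
            and value
        }
        if rates:
            entries.append({"ip": sup["ip"], "rates": rates})
    if not entries:
        return None  # caller skips sidecar provisioning entirely
    config: Dict[str, Any] = {"sups": entries}
    if seed is not None:
        config["seed"] = seed  # RNG anchor for the sidecar
    return config
```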

Audit excludes sidecars from orphan check (live in neighborhood-inventory.ini, not sup_hosts). Service-status audit not yet wired to main ./audit.

Hot-patch path

/opt/ruse/deployed_sups/{key}/decoys/ is a copy, not a symlink. Each install copies /opt/ruse/decoys/ → that path. git pull in /opt/ruse does NOT propagate. Hot-patch:

  1. git push from mlserv (INSTALL_SUP.sh and decoys/* are pulled from github at install time — clone URL in deployment_engine/playbooks/decoy/install-sups.yaml::ruse_repo)
  2. SSH the VM, cp changed files into per-deploy decoys/
  3. systemctl restart {svc}.service

Or teardown + redeploy.

Audit (./audit)

Per-VM checks across all DECOY VMs. Key columns:

  • Service — systemctl is-active + NRestarts + uptime probe. NRestarts is cumulative and never decays, so a service with high restart count from past crash-loops is still treated as OK (N restarts, stable Mm) if it's been continuously active ≥ 600s. Only services active < 600s with NRestarts > 10 are flagged FAIL (crash-looping).
  • M0 — reports EXPECTED (M0 upstream crashes on Linux)
  • Fdbk — checks for exactly 1 behavior.json in /opt/ruse/deployed_sups/*/behavioral_configurations/
  • Warn — counts [WARNING] vs [INFO] separately:
    • Baseline (bc_has_behavior=0): n/a (baseline) — runtime short-circuits
    • Feedback, 0 warn + N INFO: OK (N ablation-gated) — PHASE deliberately omitted sections
    • Feedback, N warn: FAIL (N unexpected warnings) — real bug

VM probe greps /opt/ruse/deployed_sups/{key}/logs/systemd.log for [WARNING] and [INFO].*ablation-gated.
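
The Service and Warn rules above reduce to two small decision functions. This sketch uses hypothetical names and assumes that an inactive service is a FAIL (the text only covers active services); the thresholds and verdict strings follow the bullets.

```python
import re


def service_verdict(is_active: bool, n_restarts: int,
                    active_seconds: float) -> str:
    """Apply the 600s stability window to a cumulative restart count."""
    if not is_active:
        return "FAIL"  # assumption: not stated explicitly in the text
    if active_seconds >= 600:
        return "OK"  # past crash-loops are forgiven once stable
    return "FAIL" if n_restarts > 10 else "OK"


def warn_verdict(has_behavior: bool, systemd_log: str) -> str:
    """Count [WARNING] lines and ablation-gated [INFO] lines separately."""
    if not has_behavior:
        return "n/a (baseline)"
    lines = systemd_log.splitlines()
    warns = sum(1 for ln in lines if "[WARNING]" in ln)
    infos = sum(1 for ln in lines
                if re.search(r"\[INFO\].*ablation-gated", ln))
    if warns:
        return f"FAIL ({warns} unexpected warnings)"
    return f"OK ({infos} ablation-gated)"
```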

Observability recipes

# What aborted the deploy?
grep -E "FAIL|ABORTING|FAILURES" deployments/logs/session-deploy-*.log | tail -30

# What did Ansible actually say per-task?
grep -E "FAILED|fatal|UNREACHABLE" deployments/logs/ansible-*.log | tail -30

# Per-VM behavior.json present?
./audit | grep Fdbk

# All behavior.json files PHASE wrote for a dataset (feedback tree is namespaced by {preset}_v{version})
ls /mnt/AXES2U1/feedback/decoy-controls/*_v*/sum24/*/*/behavior.json

Constraints

  • C0 no software, M0 read-only, no LLM fallback, MCHP no LLM (see CLAUDE.md)
  • Models run locally via Ollama
  • Per-deploy decoys/ is a COPY (see hot-patch path above)
  • INSTALL_SUP.sh + decoys/* pulled from github → push before deploy
  • VMs set America/New_York for log readability; runtime hour reads use datetime.now(timezone.utc).hour (UTC contract in CLAUDE.md)