name: xrpl-confluence
description: Drive the xrpl-confluence fuzzing harness via the confluence CLI (or legacy Makefile). Use when the user wants to boot/tear down a Kurtosis enclave from a Scenario YAML, run a scenario end-to-end, list/inspect findings, stream logs/events, pull corpus, replay a reproducer, or run the older make soak/make chaos flows. Triggers on intents like "confluence up", "confluence run", "confluence findings", "list findings", "pull reproducer", "replay finding", "tail control events", "run a soak", "start chaos test", "tear down enclave".
xrpl-confluence — fuzzing harness skill
xrpl-confluence orchestrates a multi-node XRPL test network (go-xrpl + rippled) inside a Kurtosis enclave and runs a fuzz sidecar against it. There are two CLI surfaces:
- `confluence` CLI (preferred): Cobra binary at `sidecar/cmd/confluence/`. Scenario-driven. This is the canonical interface.
- `Makefile` recipes (legacy): `make soak` / `make chaos` flows kept for backward compatibility.
All commands run from the repo root: /Users/thomashussenet/Documents/project_goXRPL/xrpl-confluence.
Building the CLI
The confluence binary isn't pre-installed. Build it on first use:
( cd sidecar && go build -o ../bin/confluence ./cmd/confluence )
export PATH="$PWD/bin:$PATH" # or invoke as ./bin/confluence
confluence version
bin/ is gitignored. Rebuild after pulling.
Prerequisites
- `kurtosis` CLI installed and engine running (`kurtosis engine status`).
- Docker running locally: the sidecar runs as a container, and scenarios pull/build node images.
- Go 1.25+ to build the CLI.
- For scenarios that drive container chaos (latency/restart/partition): the host-level docker-socket-proxy must be up. Start it with `make docker-proxy` (one-time per host).
If a prerequisite is missing, surface it and stop — don't auto-install Docker/Kurtosis.
Global flags (work on every subcommand)
- `--enclave <name>`: target a specific enclave. Defaults to the current-context enclave when running a Scenario.
- `--control-url http://host:port`: bypass the Kurtosis lookup and hit the control service directly.
- `--json`: emit machine-readable NDJSON/JSON instead of human tables. Use this when piping into `jq` or chaining commands.
Subcommand cheat sheet
| Command | Purpose |
|---|---|
| `confluence up -f scenarios/foo.yaml` | Boot an enclave from a Scenario YAML. |
| `confluence down [ENCLAVE]` | Tear down the current (or named) enclave. |
| `confluence ls` | List all confluence enclaves. |
| `confluence status` | Network status of the current enclave (nodes, peers, ledger). Add `-w` to refresh every 2 s. |
| `confluence run SCENARIO` | Boot + run + wait for budget/stop_on + optional tear-down. One-shot end-to-end. |
| `confluence replay REPRODUCER_ID` | Boot an enclave from a saved reproducer YAML. |
| `confluence logs -n NODE [-f] [--since 10m] [--grep regex]` | Stream a node's logs. |
| `confluence events` | Stream control-service SSE events as NDJSON (pipe to `jq`). |
| `confluence findings [--kind K] [--since ID] [--limit N]` | List findings from the running enclave. |
| `confluence finding show ID` | Show one finding in detail. |
| `confluence pull [--dest .confluence] [--corpus] [--no-findings]` | Mirror findings (and optionally corpus) from the enclave to the host. |
| `confluence scenario validate PATH` | Validate a Scenario YAML before booting. |
up flags
-f, --scenario PATH (required), --enclave NAME, --package DIR (default .), --tear-down-first (default true), --wait-control 60s,
--boot-hang-threshold 90s (kill the kurtosis CLI if it stays silent past this — watchdog for the 1-in-3 0% CPU hangs; 0 disables),
--rebuild-goxrpl PATH (docker build PATH and tag it with topology.goxrpl.image before booting — so you don't need to remember docker build -t goxrpl:latest <worktree> separately),
--rebuild-rippled PATH (same idea, for topology.rippled.image),
--with-dashboard (force observability.enabled=true regardless of YAML — flips the grafana sidecar on without editing the scenario).
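The `--boot-hang-threshold` behaviour described above (kill the child if it stays silent too long, `0` disables) can be illustrated with a minimal Python sketch. This is not the CLI's actual implementation, which lives in the Go sources; the function name `run_with_silence_watchdog` and its return shape are hypothetical.

```python
import queue
import subprocess
import sys
import threading


def run_with_silence_watchdog(cmd, threshold_s):
    """Run cmd; kill it if it prints nothing for threshold_s seconds.

    Returns (exit_code, tripped). threshold_s <= 0 disables the watchdog,
    mirroring the documented meaning of `--boot-hang-threshold 0`.
    Hypothetical helper, not part of the confluence CLI.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines = queue.Queue()

    def pump():
        # Forward each output line to the queue; None marks end of stream.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)

    threading.Thread(target=pump, daemon=True).start()
    timeout = threshold_s if threshold_s > 0 else None
    tripped = False
    while True:
        try:
            line = lines.get(timeout=timeout)
        except queue.Empty:
            # No output within the threshold: treat it as a hang and kill.
            tripped = True
            proc.kill()
            break
        if line is None:
            break  # child closed its output, i.e. it exited normally
        sys.stdout.write(line)
    return proc.wait(), tripped
```

The timer restarts on every line of output, so a slow but chatty boot is never killed; only sustained silence trips the watchdog.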
run flags
All up flags above PLUS:
-w, --wait (default true), --timeout DUR (defaults to 2× scenario budget), --down (default true — tear down on finish),
--budget DUR (override the scenario's budget.duration end-to-end — propagates into compile, control budget, and the CLI timeout; e.g. --budget 8h for overnight),
--resume-on-finding (after a stop_on-triggered finding, relaunch the run with the same scenario until --budget elapses — for overnight fuzzing that wants more than one failure per session),
--rotate-logs DIR (tail every service's kurtosis logs into DIR/<svc>.log, rotating at 50 MiB — survives overnight when the in-container ring buffer is too small for a post-mortem).
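As a rough illustration of how `--resume-on-finding` interacts with `--budget`, the relaunch loop might look like the Python sketch below. It is an assumption-laden model, not the CLI's code: `run_once` is a hypothetical callable standing in for one boot-and-run cycle, and the `"finding"` / `"budget"` return values are invented for the example.

```python
import time


def run_until_budget(run_once, budget_s, resume_on_finding, clock=time.monotonic):
    """Call run_once(remaining_s) until the overall budget is spent.

    run_once (hypothetical) runs the scenario for at most remaining_s seconds
    and returns "finding" if a stop_on predicate fired, or "budget" if time
    ran out. Returns the number of findings that ended a run.
    """
    deadline = clock() + budget_s
    findings = 0
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            break  # overall budget is exhausted
        outcome = run_once(remaining)
        if outcome != "finding":
            break  # the budget elapsed inside the run
        findings += 1
        if not resume_on_finding:
            break  # without the flag, the first finding ends the session
    return findings
```

Each relaunch receives only the time left in the overall budget, so an 8-hour session stays bounded no matter how many findings fire.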
pull flags
--dest .confluence (default), --findings (default true), --corpus (default false), --fuzz-service NAME (auto-detect if empty).
Common workflows
Validate then boot a scenario
confluence scenario validate scenarios/soak-mixed-3x2.yaml
confluence up -f scenarios/soak-mixed-3x2.yaml
confluence status -w
One-shot run (boot, wait for budget, tear down)
confluence run -f scenarios/soak-mixed-3x2.yaml
Useful for CI or "fire and forget" — exits when the scenario's budget elapses or a stop_on predicate fires.
Overnight session (rebuild from a worktree, dashboard on, log rotation, keep going through findings)
confluence run scenarios/soak-mixed-3x2.yaml \
--rebuild-goxrpl /path/to/goXRPL-branch \
--budget 8h \
--with-dashboard \
--rotate-logs ./logs \
--resume-on-finding
One command pins "this branch, this commit, rebuild if needed". Forgetting the separate docker build step used to silently test stale code.
Inspect findings while a run is in flight
confluence findings --limit 50
confluence findings --kind state_divergence
confluence findings --kind consensus_stall
confluence finding show <id>
confluence events | jq 'select(.type=="finding")'
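When `jq` is not available, the same filter as the last command above can be sketched in Python. Only the `type` field and its `"finding"` value come from the command above; any other event fields are assumptions, and `iter_findings` is a hypothetical helper name.

```python
import json


def iter_findings(lines):
    """Yield decoded events whose "type" is "finding" from NDJSON lines."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between events
        event = json.loads(raw)
        if event.get("type") == "finding":
            yield event


# Possible usage, reading the stream from stdin:
#   confluence events | python3 filter_findings.py
# with filter_findings.py ending in:
#   import sys
#   for event in iter_findings(sys.stdin):
#       print(json.dumps(event), flush=True)
```

Because the event stream does not return, run such a pipeline in the background rather than in the foreground.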
Two oracles run automatically in the control service for every confluence up (no scenario opt-in needed):
- `consensus_stall`: fires when any node's `closed_seq - validated_seq` exceeds 10 for ≥ 2 min (tunable via `--stall-gap-threshold` / `--stall-sustain` on `confluence-control`). Catches the "network silently broken, validated_seq frozen forever" case that `state_diff` alone misses, because `state_diff` only ticks when `validated_seq` advances.
- `state_divergence`: when two nodes report different ledger hashes at the same seq, the finding's `Detail` now embeds a `LedgerDiff` snapshot of every node's ledger at that seq: per-node root hashes, common tx hashes, `only_on_nodes`, and `suspect_tx_types` (union of `TransactionType` from any tx not on every node). `SuspectedComponents` mirrors `suspect_tx_types` for quick triage. Use `confluence finding show <id>` (or `jq '.detail'` on `confluence pull --findings`) to see the full diff; there is no need to grep container logs or hand-write a replay script.
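The two oracle rules can be modelled with a short Python sketch. This is an illustration of the documented thresholds and set logic, not the control service's implementation: the function names, the sample tuple layout, the `common_tx_hashes` key, and the shape of `only_on_nodes` (tx hash mapped to the nodes that have it) are all assumptions.

```python
def stall_sustained(samples, gap_threshold=10, sustain_s=120):
    """samples: time-ordered (timestamp_s, closed_seq, validated_seq), one node.

    True once closed_seq - validated_seq has exceeded gap_threshold
    continuously for at least sustain_s seconds.
    """
    since = None  # timestamp at which the current over-threshold streak began
    for ts, closed_seq, validated_seq in samples:
        if closed_seq - validated_seq > gap_threshold:
            if since is None:
                since = ts
            if ts - since >= sustain_s:
                return True
        else:
            since = None  # the gap recovered, so the streak resets
    return False


def ledger_diff(txs_by_node):
    """txs_by_node: {node: {tx_hash: TransactionType}} at one ledger seq."""
    common = set().union(*(txs.keys() for txs in txs_by_node.values()))
    for txs in txs_by_node.values():
        common.intersection_update(txs)  # keep hashes present on every node
    only_on_nodes = {}
    suspect = set()
    for node, txs in txs_by_node.items():
        for tx_hash, tx_type in txs.items():
            if tx_hash not in common:
                only_on_nodes.setdefault(tx_hash, []).append(node)
                suspect.add(tx_type)  # union of types missing somewhere
    return {
        "common_tx_hashes": sorted(common),
        "only_on_nodes": {h: sorted(n) for h, n in only_on_nodes.items()},
        "suspect_tx_types": sorted(suspect),
    }
```

In this model a gap that dips back under the threshold even once resets the 2-minute timer, which is one plausible reading of "for ≥ 2 min"; the real oracle may sample or debounce differently.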
Mirror findings + corpus to the host
confluence pull --corpus # → .confluence/findings + .confluence/corpus
.confluence/ is per-machine state (gitignored).
Replay a reproducer
confluence replay <reproducer-id>
Tear down
confluence down # current enclave
confluence down xrpl-soak # named
Legacy Makefile flow (soak / chaos)
The Makefile predates the CLI. Use it only when explicitly asked.
make soak # 2 go-xrpl + 3 rippled, tx_rate=5
make soak GOXRPL_COUNT=3 RIPPLED_COUNT=5 TX_RATE=10 OBSERVABILITY=1
make soak-status / soak-tail / soak-pull / soak-down
make docker-proxy # required for chaos (one-time)
make chaos # reads .chaos-schedule.json
make chaos-status / chaos-tail / chaos-pull / chaos-down
Tunables (Make vars): ENCLAVE, GOXRPL_COUNT, RIPPLED_COUNT, TX_RATE, ACCOUNTS, ROTATE_EVERY, MUTATION_RATE, CORPUS, OBSERVABILITY, ALERT_WEBHOOK_URL, CHAOS_SCHEDULE.
Failure-recovery patterns
- `kurtosis run` failed mid-startup → enclave may be partially up. Run `confluence ls` to inspect, then `confluence down <name>` (or `make soak-down`).
- `kurtosis boot watchdog tripped after Ns of silence` → the 0% CPU hang fired and was auto-killed; `up`/`run` retries the boot once with a fresh enclave. If both attempts trip, surface the message and stop. `--boot-hang-threshold 0` opts out entirely (don't use unless debugging the watchdog itself).
- `fuzz-soak service not found` on pull → enclave torn down or sidecar never started. Run `confluence ls` first.
- Chaos container actions silently no-op → docker-socket-proxy isn't running. Run `make docker-proxy`.
- Port 2375 already bound → another proxy is up; run `docker ps | grep docker-socket-proxy` and reuse or remove it.
- Control service health timeout on `up` → bump `--wait-control 120s`, then check `confluence logs -n control` (or via `kurtosis service logs`).
- Forgetting `docker build` before `up` → don't `docker build` separately; pass `--rebuild-goxrpl <path>` / `--rebuild-rippled <path>` to `up`/`run` so the image kurtosis pulls is byte-for-byte the one just built.
Files worth reading before non-trivial changes
- `sidecar/cmd/confluence/`: CLI source of truth (one file per subcommand).
- `sidecar/internal/api/`: control-service contract used by the CLI.
- `Makefile`: legacy CLI surface.
- `main.star` / `src/topology.star`: Kurtosis enclave topology.
- `scenarios/*.yaml`: declarative scenarios consumed by `up`/`run`.
- `docs/plans/`: milestone design docs (chaos-runner, etc.).
Output discipline
When booting a test the user usually wants:
- Confirmation it started + the enclave name.
- The dashboard URL if observability is on.
- The next command to inspect or pull results.
Never tail logs or events in the foreground unless asked — they don't return. Use run_in_background and report once a pattern of interest is hit. Prefer --json output when chaining commands.