name: investigate-echonet-resync
description: Investigate why an Echonet resync triggered. Use when someone mentions an Echonet resync, a failing block in an echonet-* namespace, or a block_hash_mismatch / transaction_commitment_mismatch / echonet_only_revert / gateway_error. The skill sets up port-forwards, prefers GCP Logs Explorer for history, cross-checks the produced block against mainnet, and recognizes recurring failure patterns before reporting back.
Investigate an Echonet Resync
Echonet is a Starknet replay layer. It pulls mainnet blocks from the public feeder gateway, forwards their txs into a local Apollo sequencer pod, and compares the locally-produced block against mainnet. When something diverges — a gateway reject, an echonet-only revert, a block-hash mismatch, a transaction-commitment mismatch — Echonet scales the sequencer to zero, rewinds state, and restarts. Your job is to figure out what triggered this specific resync.
You'll be given a namespace and a failure block number. Everything else you discover yourself.
1. Discover the environment
Echonet namespaces live in the cluster whose kubeconfig context contains sequencer-dev (or whatever the current dev cluster is called). The K8s objects use canonical names:
- Service:
echonet(port80) - Deployment:
echonet - Apollo sequencer: a separate deployment in the same namespace; find it with
kubectl -n <ns> get pods | grep -iE 'apollo|sequencer'.
kubectl config current-context # confirm you're on the dev cluster
kubectl get ns | grep -i echonet # list all echonet-* namespaces
kubectl -n <ns> get deploy,svc,pods # what's actually running
There can be several echonet namespaces in flight at once (e.g. echonet-committer3, echonet-14-2); always confirm which one the user named.
2. Set up a port-forward (if one isn't already running)
Pick any free local port — the canonical local ports are just convention.
LOCAL=18083 # or any free port
kubectl -n <ns> port-forward svc/echonet ${LOCAL}:80 \
>/tmp/pf-${LOCAL}.log 2>&1 &
curl -fsS "http://127.0.0.1:${LOCAL}/echonet/report/ui" >/dev/null && echo OK
If the port is occupied by a stale process, find it (ss -lntp | grep ${LOCAL} or lsof -iTCP:${LOCAL}) and kill it before retrying. Don't reuse a port without confirming what's on it.
3. Echonet's HTTP surface
Useful endpoints (all under http://127.0.0.1:${LOCAL}):
Reports / triage
/echonet/report/ui— interactive UI; lists past resyncs, lets you drill into pre-resync snapshots./echonet/report/ui/download— same data as a static archive./echonet/report— JSON form of the current live report (post-last-resync state)./echonet/report/text— plain-text version./echonet/block_dump?block_number=<N>&kind=<blob|block|state_update>— the raw payload Echonet stored for block N. Useblobfor the cende blob,blockfor the feeder-gateway-shaped block,state_updatefor the matching state update./echonet/get_block_metadata?block_number=<N>— Echonet's view of block metadata./echonet/get_tx_block_metadata?tx_hash=<H>— which mainnet block a given tx came from./echonet/get_starknet_version— the version Echonet thinks it's on.
Echonet sometimes evicts old blocks from in-memory storage. If a /feeder_gateway/get_block query returns no result, the block may already be archived on the PVC (see §5) or, post-resync, replaced with the now-successful re-run. In that case, the snapshot taken before the resync (§5) is the source of truth.
4. Logs — prefer GCP, not kubectl
kubectl logs only retains a short window (often hours) and is wiped when a pod restarts. Since a resync intentionally restarts the sequencer pod, the logs you actually need are usually already gone from kubectl. Default to GCP Logs Explorer.
GCP Logs Explorer
The cluster's GCP project is whatever project hosts the current kubectl context. Look it up rather than hardcoding:
# Project of the current cluster:
gcloud container clusters list \
--filter="name=$(kubectl config current-context | awk -F_ '{print $NF}')" \
--format='value(name,location,resourceLabels)'
# Or, if you know the context format `gke_<project>_<region>_<cluster>`:
kubectl config current-context | awk -F_ '{print "project="$2, "region="$3, "cluster="$4}'
Then either open the Logs Explorer UI in the browser, or query from the CLI:
PROJECT=<gcp-project>
NS=<namespace> # e.g. echonet-committer3
START="2026-06-07T08:00:00Z" # window around the resync
END="2026-06-07T10:00:00Z"
# Echonet (Flask) logs
gcloud logging read \
"resource.type=\"k8s_container\"
resource.labels.namespace_name=\"${NS}\"
resource.labels.container_name=\"echonet\"
timestamp>=\"${START}\" timestamp<=\"${END}\"" \
--project="${PROJECT}" --limit=2000 --format='value(timestamp,jsonPayload.message,textPayload)' \
> /tmp/echonet-${NS}.log
# Apollo sequencer logs — container name varies; check `kubectl -n <ns> get pod <p> -o yaml | grep -A1 'containers:'`
gcloud logging read \
"resource.type=\"k8s_container\"
resource.labels.namespace_name=\"${NS}\"
resource.labels.container_name=~\"apollo|sequencer\"
timestamp>=\"${START}\" timestamp<=\"${END}\"" \
--project="${PROJECT}" --limit=5000 --format='value(timestamp,jsonPayload.message,textPayload)' \
> /tmp/sequencer-${NS}.log
Useful filters to layer on:
- Resync triggers:
textPayload=~"Resync triggered|record_resync_cause|gateway_error|Forward failed|mismatch|429" - Block builder:
textPayload=~"block_builder|propose_block|consensus|batcher|cende_recorder" - Specific tx:
textPayload=~"<tx_hash>" - Specific block:
textPayload=~"block.{0,8}<block_number>"
A useful UI link template (substitute the four bracketed values):
https://console.cloud.google.com/logs/query;query=resource.type%3D%22k8s_container%22%0Aresource.labels.namespace_name%3D%22<NS>%22%0Aresource.labels.container_name%3D%22echonet%22?project=<PROJECT>
kubectl logs (fallback only)
Only useful if the resync is very recent and the pod hasn't restarted:
kubectl -n <ns> logs deploy/echonet --since=2h | \
grep -E 'Resync triggered|gateway_error|Forward failed|mismatch|429'
kubectl -n <ns> logs <sequencer-pod> --since=2h
5. Pre-resync report files (PVC)
Before every resync, Echonet writes a snapshot of its live state to disk. These are the most trustworthy record of what the system observed right before it gave up — they survive the resync itself, while in-memory state does not.
POD=$(kubectl -n <ns> get pod -l app.kubernetes.io/name=echonet -o jsonpath='{.items[0].metadata.name}')
kubectl -n <ns> exec "${POD}" -- ls -la /data/echonet/reports/ # find the snapshot near your timestamp
kubectl -n <ns> cp "${POD}:/data/echonet/reports/<file>" /tmp/ # pull it locally
Archived blocks evicted from memory live under the same PVC; if /feeder_gateway/get_block returns nothing, check there:
kubectl -n <ns> exec "${POD}" -- ls /data/echonet/ # discover the archive dir name
6. Compare Echonet's block to real mainnet
N=<failure_block_number>
PORT=<your_local_port>
curl "https://feeder.alpha-mainnet.starknet.io/feeder_gateway/get_block?blockNumber=${N}" > /tmp/mainnet_block_${N}.json
curl "https://feeder.alpha-mainnet.starknet.io/feeder_gateway/get_state_update?blockNumber=${N}" > /tmp/mainnet_su_${N}.json
curl "http://127.0.0.1:${PORT}/feeder_gateway/get_block?blockNumber=${N}" > /tmp/echo_block_${N}.json
curl "http://127.0.0.1:${PORT}/echonet/block_dump?block_number=${N}&kind=block" > /tmp/echo_dump_block_${N}.json
curl "http://127.0.0.1:${PORT}/echonet/block_dump?block_number=${N}&kind=state_update" > /tmp/echo_dump_su_${N}.json
Diff transaction_commitment, event_commitment, receipt_commitment, state_diff_commitment, block_hash, then drill into receipts (revert_error, actual_fee, events, messages_sent) and the state diff itself. The differing commitment narrows down which subsystem produced the divergence.
7. Code map (anchors in this repo)
echonet/transaction_sender.py— pulls feeder blocks, forwards txs to the local sequencer, evaluates resync policy.echonet/resync.py—ResyncPolicy.evaluate(trigger logic),ResyncExecutor.execute(scale-down → wipe → scale-up).echonet/shared_context.py— all shared mutable state: resync causes, mismatch tracking, block storage.echonet/echo_center.py— Flask handlers: cende write_blob,/l1RPC mock,/feeder_gateway/*, report UI.echonet/l1_logic/l1_manager.py,l1_blocks.py,l1_client.py— L1_HANDLER lookup against Alchemy.- Rust sequencer side:
crates/apollo_mempool/,crates/apollo_batcher/,crates/apollo_consensus_*/,crates/apollo_gateway/.
Look at recent commits on the active echonet branch for context:
git log --oneline -30 --all -- echonet/
8. Known failure patterns — check these first
If the symptom matches one of these, name the pattern explicitly in the report and confirm the deployed image actually contains the fix:
kubectl -n <ns> get pods -o jsonpath='{.items[*].spec.containers[*].image}' | tr ' ' '\n' | sort -u
git log --all --oneline -- echonet/ crates/apollo_mempool/ crates/apollo_batcher/ # cross-reference SHAs
Each namespace can run a different image; "fix already merged" doesn't mean "fix already deployed here."
- Cairo-native vs CASM revert traces.
echonet_only_revertwhoserevert_errordiffers from mainnet's only by the topmost VM frame (typically missing or extrapc=…). The resync clears the cairo-native cache so the second pass falls back to CASM and matches. Not actually a bug in Echonet — a Blockifier divergence. - Alchemy 429 → L1_HANDLER drop.
transaction_commitmentmismatch on a block that contains anL1_HANDLERtx. Look for HTTP429in Echonet logs near the failure window. Root cause:l1_manager.set_new_txsilently returns whenfind_l1_block_for_txfails, so the L1_HANDLER never reaches the local block.
9. Investigation workflow
- Open the UI report, find the entry for the failing block. Note the trigger tx_hash and reason (
gateway_error,echonet_only_revert,block_hash_mismatch,transaction_commitment_mismatch). - Branch on the reason:
block_hash_mismatch/transaction_commitment_mismatch— diff Echonet's block vs mainnet's (§6). Isolate the differing field; drill into the txs / events / state diff that produced it.gateway_error— pull the tx, grep Echonet + sequencer GCP logs (§4) for the tx_hash, identify the gateway response code and message.echonet_only_revert— fetchrevert_errorfrom both Echonet (block dump) and mainnet (feeder); diff them. Often the cairo-native pattern (§8.1).
- Verify the deployed image actually contains any fix you intend to invoke as "explained" (§8).
- Cross-check against known patterns before assuming a new bug.
- If the resync already replayed cleanly, that's still worth noting — but a successful retry doesn't make the original divergence "harmless"; it just means it's non-deterministic.
10. Safety constraints (hard rules)
- Never read secret files:
*secret*,*keys.json,.env*,echonet/k8s/echonet/secret.yaml. - Don't restart pods, scale deployments, redeploy, or trigger resyncs without explicit approval. The whole point of investigating a resync is preserving the evidence of what caused it.
- Read-only
kubectl logs,kubectl exec -- cat|ls|stat,kubectl cpout of the pod, port-forwards,gcloud logging read, and curling mainnet's public feeder are all fine without asking.
11. Reporting back
Give the user:
- Trigger tx hash and reason.
- Which field/commitment differed, with both values (if applicable).
- Whether it matches a known pattern (cite the section above) or appears novel.
- Where the bug lives (
file:line) and a concrete fix suggestion if you have one, including whether the namespace in question is running an image that already contains a known fix.