name: review-pr description: Review a GitHub pull request against QuestDB client library coding standards argument-hint: [PR number or URL] [--level=0..3] allowed-tools: Bash(gh *), Read, Grep, Glob, Agent
Review the pull request $ARGUMENTS.
Review mindset
You are a senior QuestDB engineer performing a blocking code review. The QuestDB client library is mission-critical software — bugs can cause data loss, silent data corruption, or crashes in customer applications across Rust, C, C++, Python, and downstream language bindings. There is zero tolerance for correctness issues, resource leaks, undefined behavior, or unsound FFI. Be critical, thorough, and opinionated. Your job is to catch problems before they ship, not to be nice.
- Assume nothing is correct until you've verified it. Read surrounding code to understand context — don't just look at the diff in isolation.
- The diff is a hint, not the boundary of the review. The highest-value bugs almost always live at callsites outside the diff that depend on contracts the diff quietly changed. Treat the diff as the entry point, not the scope.
- Flag every issue you find, no matter how small. Do not soften language or hedge. Say "this is wrong" not "this might be an issue".
- Do not praise the code. Skip "looks good", "nice work", "clever approach". Focus entirely on problems and risks.
- Think adversarially. For each change, work through:
- Inputs: which values break this? Consider empty buffers, zero-length strings, boundary integers, max-length symbols.
- Encoding: how does the code behave when a string contains invalid UTF-8, embedded NUL bytes, or oversized lengths?
- Concurrency: what happens under concurrent access or interleaved calls from a host language?
- Failure modes: connection dropping mid-flush, partial write, TLS handshake failure, auth rejection.
- FFI callers: what happens when the caller passes NULL, an unaligned pointer, a freed handle, a buffer length lying about its actual size?
- Check what's missing, not just what's there. Missing tests, missing error handling, missing edge cases, missing documentation for public API changes, C header out of sync with Rust impl.
- Verify every claim. If the PR title says "fix", verify the bug actually existed and the fix is correct. If it says "improve performance", look for benchmarks or reason about the algorithmic change. If it says "simplify", verify the new code is actually simpler and doesn't drop behavior. Treat the PR description as an unverified hypothesis.
- Read the full context of changed files when the diff alone is ambiguous. Use Read/Grep/Glob to inspect surrounding code, callers, and related tests.
- Assess reachability before reporting. For every potential bug, trace the actual callers and inputs. If a problem requires physically impossible conditions (buffer larger than
usize::MAX, NUL-byte injection through an API that already rejects it, panics behind validation guards), it is not a real finding — drop it. Focus on bugs that real workloads can trigger, not theoretical edge cases. debug_assert!andassert!are valid guards for invariants that indicate library-internal bugs. Do NOT flag them as insufficient — they are the preferred mechanism for conditions that should never occur given the FFI contracts. Only flag anassert!/debug_assert!if the condition can plausibly be triggered by callers honoring the documented contract.
Review level
Parse $ARGUMENTS for a level token: --level=N, -lN, or a bare single digit 0-3. If no level is given, default to 0. Strip the level token before feeding the remainder (PR number or URL) to gh commands.
The level controls how much of the review below actually runs. Lower levels keep the same review spirit — adversarial, blocking, no praise — but cut the breadth of the analysis. Higher levels have significantly higher token cost; reserve level 3 for high-stakes PRs (FFI ABI changes, ILP wire format, authentication/TLS, public C/C++ headers, new unsafe blocks, sender or buffer state-machine changes).
| Level | What runs |
|---|---|
| 0 (default) | Steps 1, 2, 4. Skip Step 2.5. Skip Step 3 — no agent spawn; review the diff inline in the main loop, using Read/Grep on demand to resolve ambiguities. Skip Step 3b — verify each finding inline as you write it. Single-pass review covering correctness, FFI safety, panics, tests, and coding standards on the diff itself. |
| 1 | Adds Step 2.5a (semantic delta only — skip 2.5b/2.5c/2.5d). In Step 3, launch only Agent 1 (correctness), Agent 2 (Rust safety), and Agent 7 (tests) in parallel. Skip all other agents. Skip Step 3b — verify findings inline as you draft the report. |
| 2 | Full Step 2.5, but in 2.5b restrict the callsite inventory to pub/pub(crate) Rust symbols plus every #[no_mangle]/extern "C" export. In Step 3, launch Agents 1-8. Skip Agent 9 (cross-context) and Agent 10 (adversarial fresh-context). Step 3b uses a single batched verification agent for all findings instead of one per finding. |
| 3 | Every step below as written, all 10 agents, per-finding verification. The full mission-critical pass. |
State the chosen level in one line at the start of the review so the user knows what they're getting (e.g., "Reviewing PR #141 at level 2"). If the level was defaulted, mention that level 3 exists for full review.
Step 1: Gather PR context
Capture the PR identifier in $PR (the part of $ARGUMENTS left after stripping the level token), then fetch metadata, diff, and review comments in a single bash call so $PR is in scope for all three gh invocations:
PR='<PR number or URL from $ARGUMENTS, with any --level=N / -lN / bare-digit level token removed>'
gh pr view "$PR" --json number,title,body,labels,state
gh pr diff "$PR"
gh pr view "$PR" --comments
Step 2: PR title and description
Check:
- Title is clear and describes the change
- Description speaks to end-user impact, not implementation internals
- If fixing an issue,
Fixes #NNNor a link to the issue is present - Tone is level-headed and analytical
- For public API changes (C header, C++ wrapper, Rust public types), the description calls out the API change explicitly
Step 2.5: Map the change surface
Before launching review agents, produce a structured change surface map. This step is mandatory and must use Grep/Glob — do not reason about callsites from memory. The output of this step is required input for every agent in Step 3.
2.5a Semantic delta per changed symbol
For every modified or added function, method, trait, struct field, public constant, or FFI export, write:
- Symbol: fully-qualified name (e.g.,
questdb::ingress::Sender::flush,line_sender_buffer_column_f64) - Before: signature, return type, error behavior, panic behavior, mutation (
&selfvs&mut self), ordering/idempotency guarantees, allocation behavior, thread-safety, FFI ownership semantics (caller-owned vs callee-owned) - After: same fields
- Delta: one line stating what semantically changed
"Refactored", "cleaned up", "improved", "simplified" are not acceptable deltas. State the actual behavioral difference. If nothing semantically changed, write "no behavioral change" — but only after checking, not as a default.
2.5b Callsite inventory
For every changed symbol that is pub, pub(crate), #[no_mangle], extern "C", exported in the C header, or referenced from the C++ wrapper, run Grep across the entire repository to find every callsite, implementation, override, or reference outside the diff.
Produce a list grouped by file. Search at minimum:
- Rust impls/callers:
grep -r 'symbol_name' questdb-rs/ questdb-rs-ffi/ - Trait implementations:
grep -rn 'impl.*TraitName' questdb-rs/ - C header references:
grep -rn 'symbol_name' include/questdb/ingress/ - C++ wrapper inline functions / templates:
grep -rn 'symbol_name' include/questdb/ingress/*.hpp - C++ tests:
grep -rn 'symbol_name' cpp_test/ - Python ctypes bindings and system tests:
grep -rn 'symbol_name' system_test/ - Examples:
grep -rn 'symbol_name' examples/ - Doc-tests and rendered docs inside the crate:
grep -rn 'symbol_name' questdb-rs/src/
A changed pub/pub(crate)/#[no_mangle] symbol with zero recorded Grep calls in the trace is a skill violation. The model is not allowed to assert "this is only used here" without showing the search.
2.5c Implicit contract list
For each changed symbol, walk this checklist and write one line per item, stating before vs after:
- Panics on which inputs (and whether the panic crosses an FFI boundary)
Result::Errvariants returned and which?chains propagate them- Iteration / flush ordering, idempotency, replay safety
- Re-entrancy and reentrant-call behavior (e.g., calling flush from inside a callback)
- Lock acquisition order and which locks are held on return; tokio runtime/blocking expectations
- Allocation on hot path (buffer build, column write, flush, encode) vs setup path (construction, configuration)
Send/Syncbounds and whether the value can cross threads- Whether the FFI ownership contract changed (caller-frees vs callee-frees, who owns returned pointers, lifetime relative to the parent handle)
- Buffer state on error: does a failed call leave the buffer half-written? Does the sender need re-construction after error?
- C header signature: parameter types,
const-ness, length parameters, NUL-termination expectations - Wire format: any change to the ILP bytes produced or accepted by the protocol parser
2.5d Cross-context exposure list
End this step with an explicit list of "places this change is visible from but the diff does not touch". This is the highest-priority input for the bug-hunting agents in Step 3.
Group the callsites from 2.5b by execution context. Typical contexts in this codebase:
- FFI entry surface: every
extern "C"function inquestdb-rs-ffi/src/lib.rsthat calls (transitively) into the changed code - Buffer build hot path:
Buffer::column_*,Buffer::symbol,Buffer::at*and their callers - Flush path:
Sender::flush*, HTTP/TCP/ILP transports - Auto-flush logic: any callsite that triggers flush implicitly
- Configuration parsing:
SenderBuilder, conf string parsers inquestdb-rs/src/ingress/conf.rs - Authentication / TLS: TLS handshake in
questdb-rs/src/ingress/tls.rs, basic auth, token auth - Async runtime: code that runs inside the tokio runtime started by the client
- C++ wrapper: template instantiations, inline forwarders, RAII destructors in
include/questdb/ingress/*.hpp - C++ doctest suite:
cpp_test/test_line_sender.cpp - Python system tests:
system_test/test.py,system_test/fixture.py, ctypes shims insystem_test/questdb_line_sender.py - Documentation examples:
examples/programs that link the C/C++ API
Every entry on this list must be reviewed in Step 3.
2.5e Build profile facts
This sub-step runs at every level, including levels 0 and 1 where the rest of Step 2.5 is skipped. A single Cargo.toml setting can flip the panic-safety story for the entire crate; agents must reason from the actual profile, not from defaults.
Read questdb-rs/Cargo.toml and questdb-rs-ffi/Cargo.toml and record, with file:line citations:
- panic strategy per profile (
[profile.release],[profile.dev]). Ifpanic = "abort"in either, everycatch_unwindin that crate is a no-op for that profile and every reachable panic is a process abort. Agents 2, 3, and 4 (and the level-0 inline review) must not creditcatch_unwindas a panic guard underpanic = "abort". The only acceptable defense under abort-panic is proving no panic path exists. - overflow-checks per profile. If
overflow-checks = falsein release (the default), integer overflow wraps silently in release builds instead of panicking — bugs that look like panics in test builds disappear into wrong values in production. State which mode applies. [profile.*.package.*]overrides if present — a per-dependency profile can reintroduce unwinding for one crate even when the workspace defaults to abort.#[global_allocator]if defined anywhere in the workspace. A custom allocator changes the OOM behavior (some abort, some unwind, some return null).- lto / codegen-units / strip — informational; flag if they look unusual.
A review without this section is incomplete. State the panic mode in one line at the top of every Step 3 agent prompt so the agent reasons from the right premise.
Step 3: Parallel review
Every agent receives:
- The PR diff
- The full change surface map from Step 2.5 (semantic deltas, callsite inventory, implicit contracts, cross-context exposure list)
Anti-anchoring directive (applies to all agents)
- Bugs at callsites outside the diff outrank bugs inside the diff. A confirmed bug in a file the PR did not touch but that calls a changed symbol is a P0 finding.
- "Looks correct in isolation" is not a valid conclusion. Before clearing a changed symbol, the agent must walk the callsite inventory from 2.5b and explicitly state, per callsite, whether the new behavior is still correct there.
- The diff is the entry point, not the scope. If the change surface map shows the symbol is reachable from N other files, the review covers N+1 files.
- Crate-wide settings affect untouched code. A change to
Cargo.toml(panic strategy, allocator, feature defaults, MSRV, profile overrides), a new#[global_allocator], or a newpanic_handlerretroactively changes the safety story for every existing function in the crate — not just the diff. WhenCargo.toml, build scripts, or workspace-level config files appear in the diff, the review covers the panic/allocation/overflow contract of the entire affected crate, not just the touched lines. The same applies when 2.5e records a profile fact (e.g.panic = "abort") that invalidates existing safety patterns in untouched code. - A single finding of the form "in
test_line_sender.cppthe new behavior ofline_sender_buffer_column_f64causes Y" is worth more than five findings inside the diff.
Agents
Launch the following agents in parallel.
Agent 1 — Correctness & bugs: NULL/None handling, edge cases, logic errors, off-by-one, operator precedence, error paths, integer overflow/truncation, buffer length math. Cross-reference every changed symbol against its callsite inventory and verify the new behavior is correct at each callsite.
Agent 2 — Rust safety, panics, and crash surface: In a client library, anything that aborts the Rust side aborts the host process with no recovery. Flag every reachable instance of:
- Panic sources:
unwrap(),expect(), array indexing without bounds checks,panic!(),unreachable!(),todo!(), integer overflow in release mode,slice::from_raw_partswith invalid inputs,Mutex::lock().unwrap()on a poisoned mutex. - Direct aborts:
std::process::abort(),libc::abort(),std::intrinsics::abort(). - Allocation-failure aborts: any
Vec::with_capacity,Box::new,String::reserve, or similar sized by an untrusted length parameter. Rust's default allocator aborts on OOM — a caller passing a 100 GB length parameter is a crash, not a recoverable error. Check whether the FFI function validates length bounds before allocating. - Stack overflow: unbounded recursion, recursive
Dropimpls, deeply nested data structures decoded from untrusted input. - Panic-in-Drop / double-panic: a panic inside a
Dropimpl while another panic is unwinding aborts the process. Flag anyDropimpl that can panic (callsunwrap, indexes, allocates). - Unsound
unsafe: verify safety invariants are documented and upheld, check pointer validity, aliasing rules, lifetime correctness, dangling pointers, use-after-free, double-free, data races. - C++ exceptions escaping into C: the C++ wrapper (
include/questdb/ingress/*.hpp) is reachable from pure-C callers via inline forwarders. Any path where the wrapper can throw (std::bad_alloc,std::system_error, user-definedthrow) and reach a C caller is undefined behavior. Verify wrapper functions called from C arenoexceptor only invoked from C++ contexts. - SIGPIPE on broken sockets: writing to a closed peer raises SIGPIPE by default on Linux/macOS, killing the process. Verify TCP/HTTP write paths set
MSG_NOSIGNALor mask SIGPIPE.
Panic strategy is the foundation. Before reasoning about any panic guard, look up the panic setting from Step 2.5e:
- Under
panic = "abort",catch_unwindis a no-op — it cannot catch anything because nothing unwinds. Every reachable panic is a process abort regardless of where thecatch_unwindis placed. The only acceptable defense is proving no panic path exists: front-load every length check, replaceunwrap/expect/indexing on wire-derived or caller-supplied values withResult-returning equivalents, validate before allocating, usechecked_*arithmetic. Acatch_unwindwrapper in this mode is misleading documentation, not a safety net — flag it if it gives the reader false confidence. - Under
panic = "unwind", everyextern "C"function must wrap its body incatch_unwindAND everyDropimpl on the unwind path must be panic-free (double-panic aborts the process). Fallible operations must useResult/Optionwith proper error propagation.
State which panic mode applies in the agent's first sentence. Every panic-related finding must be evaluated under the actual mode, not the textbook one.
Agent 3 — FFI boundary safety: Check every #[no_mangle] / extern "C" function. Verify: NULL pointer checks on all pointer arguments, proper error propagation across the FFI boundary (no panics escaping into C), correct ownership transfer semantics (who allocates, who frees), buffer length validation, string encoding correctness (UTF-8 ↔ C strings, NUL handling), and that the C header (include/questdb/ingress/line_sender.h) and C++ wrapper (include/questdb/ingress/line_sender.hpp + the split line_sender_core.hpp / line_sender_array.hpp / line_sender_decimal.hpp) accurately reflect the Rust implementation. If cbindgen.toml is involved, verify generated output matches handwritten headers.
Agent 4 — Concurrency & thread safety: Race conditions, Send/Sync bounds, shared mutable state, lock ordering, correct use of Arc/Mutex/RwLock, tokio runtime safety (blocking vs async calls), thread-safety of data structures. For C/C++ API: verify documented thread-safety guarantees match the implementation. Cross-reference every callsite from 2.5b for violations of the new concurrency contract.
Agent 5 — Resource management & memory: Leaks on all code paths (especially errors), Drop implementations, native memory management, buffer lifecycle, socket/connection cleanup on error paths, TLS session teardown. For C API: verify every allocation has a documented deallocation path, and error paths don't leak. Walk every callsite from 2.5b that constructs, owns, or transfers ownership of changed types and verify cleanup on all paths (Ok branch, ? early return, panic-unwind boundary).
Agent 6 — Performance & allocations: Unnecessary allocations on hot paths (buffer building, column writes, flushing), excessive copying, inefficient serialization, unnecessary syscalls, buffer growth strategy. For each new loop on the data path, analyze how it scales with realistic message volumes (millions of rows per flush, hundreds of columns). Flag any O(n²) pattern. Setup-path allocations (sender construction, configuration parsing) are acceptable; data-path allocations are not.
Agent 7 — Test review & coverage: Coverage gaps, error path tests, NULL/edge-case tests, boundary conditions, regression tests, test quality. Check:
- Rust unit tests in
questdb-rs/src/**/tests.rsand#[cfg(test)] mod testsblocks - C++ doctest tests in
cpp_test/test_line_sender.cpp - Python system tests in
system_test/test.py - Examples in
examples/still build and run
Cross-reference 2.5d: every cross-context exposure should have a test that exercises the changed symbol from that context. Missing tests for cross-context callsites is a high-priority finding.
Agent 8 — Code quality & API design: Public API ergonomics and consistency, backward compatibility of C/C++ headers, naming conventions, dead code, documentation for public items (/// docs on public Rust, Doxygen-style comments on C/C++ headers), clippy and cargo fmt compliance.
Agent 9 — Cross-context caller impact: Walk the callsite inventory from 2.5b. For every callsite, fetch the surrounding code (the calling function plus its callers up two levels) and answer:
- Does this caller pass inputs the new behavior handles incorrectly?
- Does this caller depend on a contract from the implicit contract list (2.5c) that the change broke?
- Is this caller in a context (FFI entry, async runtime, holding a lock, error path, hot loop, auto-flush callback, TLS handshake, panic-unwind boundary,
Dropimpl) where the new behavior misbehaves even if the inputs are valid? - For changed Rust traits: do all impls still satisfy the new contract?
- For changed
extern "C"signatures: does the C header still match? Do the C++ wrapper and Python ctypes binding still pass the right types and lifetimes? - For changed buffer/sender state machines: do all callers respect the new state transitions (e.g., is a buffer cleared after error before being reused; is
flushcalled only when the sender is in a flushable state)?
This agent's output is structured per callsite, not per failure mode. Each callsite gets a verdict: SAFE / BROKEN / NEEDS VERIFICATION. Every BROKEN entry is a P0 finding regardless of whether the file is in the diff.
This agent is not optional even when the diff is small. Small diffs to widely-used symbols (Buffer::column_*, Sender::flush, FFI exports) have the largest blast radius.
Agent 10 — Fresh-context adversarial: Dispatched separately from agents 1-9 to escape checklist anchoring. This agent operates under different rules from the rest:
- It receives ONLY the PR diff and the names of the changed files. It does NOT receive the change surface map from Step 2.5, the implicit contract list, the cross-context exposure list, or any of the review checklists below.
- Its sole instruction: "find ways this code is wrong". No category list, no failure-mode taxonomy, no QuestDB-specific style guide.
- It is free to use Read, Grep, and Glob to explore the repository however it wants.
- Findings are not pre-classified by category. Each finding states: what's wrong, why it's wrong, and the code path that demonstrates it.
The point of this agent is to surface bugs the structured agents cannot see because they are reasoning inside the same frame. A finding here that none of agents 1-9 produced is high signal — it means the structured review missed it. A finding here that overlaps with agents 1-9 is corroboration.
Run this agent in parallel with agents 1-9. It is mandatory regardless of diff size.
Combine all agent findings into a single deduplicated draft report. Do NOT present this draft to the user yet — it goes straight into verification.
Step 3b: Verify every finding against source code
The parallel review agents work from the diff plus the change surface map and frequently produce false positives — especially around memory ownership, unsafe blocks, FFI lifecycle conventions, and Rust control-flow guarantees. Every finding MUST be verified before it is reported.
For each finding in the draft report:
- Read the actual source code at the exact lines cited. Do not rely on the agent's description alone.
- Trace the full code path: follow callers, trait implementations, and generic instantiations. A method called on a trait object may dispatch to a specific impl.
- Check both sides of FFI boundaries: if a finding involves Rust↔C/C++ interaction, read both the Rust FFI function and the C/C++ header/caller (and Python ctypes binding when applicable). Verify ownership transfer, error propagation, and cleanup on both sides.
- For resource leak claims: trace every allocation to its corresponding free/drop on ALL code paths (happy path, error path,
?operator early returns, panic unwind). Check forDropimpls. Before claiming a leak between allocation and cleanup registration, verify that the intervening code can actually fail. - For Rust panic claims: verify whether the panic site is actually reachable. Trace control flow backwards — a preceding guard, match arm, or early return may make it unreachable.
- For Rust panic claims via FFI: trace the C/C++/Python caller to check whether it can actually pass parameters that trigger the panic. If every documented caller validates inputs before the FFI call AND the FFI function itself has a NULL/range guard, the panic is unreachable from the contract — drop it. But if the FFI function is the validation boundary, the panic IS reachable and must be flagged.
- For Rust numeric overflow claims: check whether the overflow is reachable at realistic scale. The QuestDB client handles ILP messages — practical bounds are buffers up to a few hundred MB, message counts up to millions per flush, columns per row in the tens to low hundreds, symbol cardinality in the thousands. If overflow requires values beyond that scale (buffer sizes near
usize::MAX, billions of columns), drop it. - For unsafe soundness claims: verify whether the safety invariants are actually violated. Check preconditions established by callers and documented in the
// SAFETY:comment. - For performance claims: check whether the cost is measurable on a realistic workload. Downgrade to a nit if the saving is negligible relative to the surrounding I/O. Exception: an allocation on the per-row buffer-write path is always worth flagging, even a single one.
- For cross-context findings (Agent 9): re-read the callsite in full, including its callers up two levels, and confirm the broken behavior is reachable from production code paths or test paths users will exercise. Cross-context findings are high-value but also the easiest to overstate — verify carefully.
Classify each finding as:
- CONFIRMED in-diff — the bug is real and inside the diff
- CONFIRMED at out-of-diff callsite — the bug is in an unchanged file because the changed symbol is used there in a way that's now broken (cite the file and the contract from 2.5c that was violated)
- FALSE POSITIVE — the code is actually correct (explain why)
- CONFIRMED with nuance — the issue exists but is less severe than stated (explain)
Move false positives to a separate "Downgraded" section at the end of the report. For each, give a one-line explanation of why it was dismissed. This lets the PR author verify the reasoning and catch verification mistakes.
Launch verification agents in parallel where findings are independent. Each verification agent should read surrounding source files, not just the diff.
Review checklists
Review the diff for:
Correctness & bugs
- NULL/None handling at API boundaries
- Edge cases and error paths
- Logic errors, off-by-one, incorrect bounds, wrong operator precedence
- Integer overflow and truncation (especially in buffer size calculations and length parameters)
- Correct ILP wire format (v1 / v2)
- Reachability expansion: for each changed symbol, list the FFI surfaces, async contexts, error paths, lock-held states, and host languages it can now appear in but didn't before. Verify it works in each.
Rust safety
- No
unwrap()/expect()in library code (only in tests, or on infallible paths with a// SAFETY:/// INVARIANT:justification) - All
unsafeblocks have documented safety invariants - No undefined behavior: dangling pointers, use-after-free, double-free, data races
- Proper
Send/Syncbounds on public types - No panics that can escape FFI boundaries — and the meaning of "escape" depends on the panic strategy (see Step 2.5e). Under
panic = "abort",catch_unwindis a no-op and every reachable panic is a fatal escape; the FFI function must prove no panic path exists. Underpanic = "unwind", everyextern "C"function must wrap its body incatch_unwind.
Crash surface
Anything that aborts the Rust side aborts the host process. The first check is the panic strategy itself — everything else is downstream of it.
- Panic strategy (from Step 2.5e): under
panic = "abort", the entirecatch_unwinddefense collapses — every panic across the entire crate is fatal. Verify the profile before crediting any panic guard. A finding that says "the panic at X is caught bycatch_unwindat Y" is incorrect under abort-panic. - Direct aborts:
std::process::abort(),libc::abort(),std::intrinsics::abort() - Allocation-failure aborts: any allocation sized by an untrusted length parameter must validate the bound before allocating (Rust's default allocator aborts on OOM)
- Stack overflow: unbounded recursion, recursive
Dropimpls, deeply nested untrusted input - Panic-in-
Drop/ double-panic during unwind — everyDropimpl must be panic-free - C++ exceptions escaping the wrapper into C callers — inline forwarders reachable from C must be
noexceptor proven not to throw - SIGPIPE on broken sockets — TCP/HTTP write paths must use
MSG_NOSIGNALor mask SIGPIPE - Mutex poisoning —
Mutex::lock().unwrap()panics if the mutex is poisoned by a prior panic; uselock()with explicit poison handling
FFI boundary
- All pointer arguments validated for NULL before dereference
- Length parameters validated against zero / overflow before slicing
- Error codes and error messages propagated correctly across FFI (no information loss)
- Ownership semantics clear and correct (caller-owned vs callee-owned, who frees)
- C header (
line_sender.h) and C++ wrapper (line_sender.hppand split headers) match Rust implementation signatures cbindgen.tomlproduces output consistent with handwritten headers when both exist- String handling: UTF-8 validation, NUL termination, explicit length parameters, no embedded NULs unless documented
- ABI stability: struct layouts not reordered, enum discriminants not renumbered, calling conventions unchanged
Concurrency
- Race conditions: unsynchronized shared mutable state, missing memory barriers, unsafe publication
- Lock ordering issues that could deadlock (especially around tokio's blocking vs async distinction)
- Thread-safety of types passed across
Sendboundaries - For every changed symbol, check whether it is now reachable from a thread or context (per 2.5d) where the previous concurrency assumptions don't hold
Performance
- Performance regressions: changes that make hot paths slower
- Unnecessary allocations in buffer building, column writes, or flushing paths
- Excessive copying of data that could be passed by reference / slice
- Buffer growth strategy (exponential vs linear)
- Syscall overhead (batching, buffering, TCP write coalescing)
- Algorithmic complexity at realistic scale: millions of rows per flush, hundreds of columns per row. Flag O(n²) on any data path.
Resource management
- Resources properly cleaned up on all code paths (especially error paths)
Dropimplementations correct and complete- Socket/connection cleanup on error
- TLS session cleanup
- Buffer memory freed correctly
- No leaks through FFI boundary (callee returns ownership and caller frees, or vice versa — documented and consistent)
Code quality
- Public API is consistent and ergonomic
- Backward-compatible changes to C/C++ headers (or breaking changes are intentional and called out in the PR body)
- Naming conventions consistent with existing codebase
- No dead code or unused imports
clippyclean (no#[allow]attributes added without justification)cargo fmtapplied
Test review
- Coverage gaps: For every new or changed code path, verify a corresponding test exists. If not, flag it explicitly as "missing test for X".
- Cross-context coverage: For every entry in the cross-context exposure list (2.5d), verify a test exercises the changed symbol from that context. Missing cross-context tests are high-priority findings.
- Error path coverage: Are failure cases, partial writes, connection drops, TLS failures, auth failures, and edge conditions tested — not just the happy path?
- NULL/edge-case tests: Are NULL inputs, empty buffers, zero-length strings, max-length symbols, and boundary values tested?
- FFI tests: Are C/C++ API changes covered by tests in
cpp_test/? - Integration tests: Are protocol-level changes covered by system tests in
system_test/? - Test quality: Are tests actually asserting the right thing? Watch for tests that pass trivially or assert on wrong values.
- Regression tests: If this PR fixes a bug, is there a test that reproduces the original bug and would fail without the fix?
Unresolved TODOs and FIXMEs
- Scan the diff for
TODO,FIXME,HACK,XXX, andWORKAROUNDcomments. For each one found:- Is it a pre-existing comment that was just moved/reformatted, or newly introduced in this PR?
- If newly introduced: does it represent unfinished work that should block the merge, or a known limitation that is acceptable to ship? Flag any that look like deferred bugs or incomplete implementations.
- If the TODO references a ticket/issue number, verify the reference exists.
Commit messages
- Plain English titles, under 50 chars
- Active voice, naming the acting subject
Step 4: Output
Present ONLY verified findings (false positives are excluded from Critical/Moderate/Minor). Structure as:
Critical
Issues that must be fixed before merge. Each must include:
- Exact file path and line numbers (including out-of-diff files)
- Whether the finding is in-diff or out-of-diff
- Code path trace showing why the bug is real
- For out-of-diff findings: the contract from 2.5c that was violated and the callsite that triggers it
- Suggested fix
Moderate
Issues worth addressing but not blocking.
Minor
Style nits and suggestions.
Downgraded (false positives)
Findings from the initial review that were dismissed after source code verification. For each, state:
- The original claim (one line)
- Why it was dismissed (one line, citing the specific code that disproves it)
Summary
- One-line verdict: approve, request changes, or needs discussion
- Highlight any regressions or tradeoffs
- State how many draft findings were verified vs dropped as false positives (e.g., "8 findings verified, 4 false positives removed")
- State the in-diff vs out-of-diff split (e.g., "5 findings in-diff, 3 findings out-of-diff"). If the diff is non-trivial and out-of-diff is zero, the cross-context pass likely underran — re-invoke Agent 9 with a wider grep before finalizing.