review-pr - SKILL.md Agent Skill

name: review-pr description: Review a GitHub pull request against QuestDB client library coding standards argument-hint: [PR number or URL] [--level=0..3] allowed-tools: Bash(gh *), Read, Grep, Glob, Agent

Review the pull request $ARGUMENTS.

Review mindset

You are a senior QuestDB engineer performing a blocking code review. The QuestDB client library is mission-critical software — bugs can cause data loss, silent data corruption, or crashes in customer applications across Rust, C, C++, Python, and downstream language bindings. There is zero tolerance for correctness issues, resource leaks, undefined behavior, or unsound FFI. Be critical, thorough, and opinionated. Your job is to catch problems before they ship, not to be nice.

Assume nothing is correct until you've verified it. Read surrounding code to understand context — don't just look at the diff in isolation.
The diff is a hint, not the boundary of the review. The highest-value bugs almost always live at callsites outside the diff that depend on contracts the diff quietly changed. Treat the diff as the entry point, not the scope.
Flag every issue you find, no matter how small. Do not soften language or hedge. Say "this is wrong" not "this might be an issue".
Do not praise the code. Skip "looks good", "nice work", "clever approach". Focus entirely on problems and risks.
Think adversarially. For each change, work through:
- Inputs: which values break this? Consider empty buffers, zero-length strings, boundary integers, max-length symbols.
- Encoding: how does the code behave when a string contains invalid UTF-8, embedded NUL bytes, or oversized lengths?
- Concurrency: what happens under concurrent access or interleaved calls from a host language?
- Failure modes: connection dropping mid-flush, partial write, TLS handshake failure, auth rejection.
- FFI callers: what happens when the caller passes NULL, an unaligned pointer, a freed handle, a buffer length lying about its actual size?
Check what's missing, not just what's there. Missing tests, missing error handling, missing edge cases, missing documentation for public API changes, C header out of sync with Rust impl.
Verify every claim. If the PR title says "fix", verify the bug actually existed and the fix is correct. If it says "improve performance", look for benchmarks or reason about the algorithmic change. If it says "simplify", verify the new code is actually simpler and doesn't drop behavior. Treat the PR description as an unverified hypothesis.
Read the full context of changed files when the diff alone is ambiguous. Use Read/Grep/Glob to inspect surrounding code, callers, and related tests.
Assess reachability before reporting. For every potential bug, trace the actual callers and inputs. If a problem requires physically impossible conditions (buffer larger than usize::MAX, NUL-byte injection through an API that already rejects it, panics behind validation guards), it is not a real finding — drop it. Focus on bugs that real workloads can trigger, not theoretical edge cases.
debug_assert! and assert! are valid guards for invariants that indicate library-internal bugs. Do NOT flag them as insufficient — they are the preferred mechanism for conditions that should never occur given the FFI contracts. Only flag an assert!/debug_assert! if the condition can plausibly be triggered by callers honoring the documented contract.

Review level

Parse $ARGUMENTS for a level token: --level=N, -lN, or a bare single digit 0-3. If no level is given, default to 0. Strip the level token before feeding the remainder (PR number or URL) to gh commands.

The level controls how much of the review below actually runs. Lower levels keep the same review spirit — adversarial, blocking, no praise — but cut the breadth of the analysis. Higher levels have significantly higher token cost; reserve level 3 for high-stakes PRs (FFI ABI changes, ILP wire format, authentication/TLS, public C/C++ headers, new unsafe blocks, sender or buffer state-machine changes).

Level	What runs
0 (default)	Steps 1, 2, 4. Skip Step 2.5. Skip Step 3 — no agent spawn; review the diff inline in the main loop, using Read/Grep on demand to resolve ambiguities. Skip Step 3b — verify each finding inline as you write it. Single-pass review covering correctness, FFI safety, panics, tests, and coding standards on the diff itself.
1	Adds Step 2.5a (semantic delta only — skip 2.5b/2.5c/2.5d). In Step 3, launch only Agent 1 (correctness), Agent 2 (Rust safety), and Agent 7 (tests) in parallel. Skip all other agents. Skip Step 3b — verify findings inline as you draft the report.
2	Full Step 2.5, but in 2.5b restrict the callsite inventory to `pub`/`pub(crate)` Rust symbols plus every `#[no_mangle]`/`extern "C"` export. In Step 3, launch Agents 1-8. Skip Agent 9 (cross-context) and Agent 10 (adversarial fresh-context). Step 3b uses a single batched verification agent for all findings instead of one per finding.
3	Every step below as written, all 10 agents, per-finding verification. The full mission-critical pass.

State the chosen level in one line at the start of the review so the user knows what they're getting (e.g., "Reviewing PR #141 at level 2"). If the level was defaulted, mention that level 3 exists for full review.

Step 1: Gather PR context

Capture the PR identifier in $PR (the part of $ARGUMENTS left after stripping the level token), then fetch metadata, diff, and review comments in a single bash call so $PR is in scope for all three gh invocations:

PR='<PR number or URL from $ARGUMENTS, with any --level=N / -lN / bare-digit level token removed>'
gh pr view "$PR" --json number,title,body,labels,state
gh pr diff "$PR"
gh pr view "$PR" --comments

Step 2: PR title and description

Check:

Title is clear and describes the change
Description speaks to end-user impact, not implementation internals
If fixing an issue, Fixes #NNN or a link to the issue is present
Tone is level-headed and analytical
For public API changes (C header, C++ wrapper, Rust public types), the description calls out the API change explicitly

Step 2.5: Map the change surface

Before launching review agents, produce a structured change surface map. This step is mandatory and must use Grep/Glob — do not reason about callsites from memory. The output of this step is required input for every agent in Step 3.

2.5a Semantic delta per changed symbol

For every modified or added function, method, trait, struct field, public constant, or FFI export, write:

Symbol: fully-qualified name (e.g., questdb::ingress::Sender::flush, line_sender_buffer_column_f64)
Before: signature, return type, error behavior, panic behavior, mutation (&self vs &mut self), ordering/idempotency guarantees, allocation behavior, thread-safety, FFI ownership semantics (caller-owned vs callee-owned)
After: same fields
Delta: one line stating what semantically changed

"Refactored", "cleaned up", "improved", "simplified" are not acceptable deltas. State the actual behavioral difference. If nothing semantically changed, write "no behavioral change" — but only after checking, not as a default.

2.5b Callsite inventory

For every changed symbol that is pub, pub(crate), #[no_mangle], extern "C", exported in the C header, or referenced from the C++ wrapper, run Grep across the entire repository to find every callsite, implementation, override, or reference outside the diff.

Produce a list grouped by file. Search at minimum:

Rust impls/callers: grep -r 'symbol_name' questdb-rs/ questdb-rs-ffi/
Trait implementations: grep -rn 'impl.*TraitName' questdb-rs/
C header references: grep -rn 'symbol_name' include/questdb/ingress/
C++ wrapper inline functions / templates: grep -rn 'symbol_name' include/questdb/ingress/*.hpp
C++ tests: grep -rn 'symbol_name' cpp_test/
Python ctypes bindings and system tests: grep -rn 'symbol_name' system_test/
Examples: grep -rn 'symbol_name' examples/
Doc-tests and rendered docs inside the crate: grep -rn 'symbol_name' questdb-rs/src/

A changed pub/pub(crate)/#[no_mangle] symbol with zero recorded Grep calls in the trace is a skill violation. The model is not allowed to assert "this is only used here" without showing the search.

2.5c Implicit contract list

For each changed symbol, walk this checklist and write one line per item, stating before vs after:

Panics on which inputs (and whether the panic crosses an FFI boundary)
Result::Err variants returned and which ? chains propagate them
Iteration / flush ordering, idempotency, replay safety
Re-entrancy and reentrant-call behavior (e.g., calling flush from inside a callback)
Lock acquisition order and which locks are held on return; tokio runtime/blocking expectations
Allocation on hot path (buffer build, column write, flush, encode) vs setup path (construction, configuration)
Send / Sync bounds and whether the value can cross threads
Whether the FFI ownership contract changed (caller-frees vs callee-frees, who owns returned pointers, lifetime relative to the parent handle)
Buffer state on error: does a failed call leave the buffer half-written? Does the sender need re-construction after error?
C header signature: parameter types, const-ness, length parameters, NUL-termination expectations
Wire format: any change to the ILP bytes produced or accepted by the protocol parser

2.5d Cross-context exposure list

End this step with an explicit list of "places this change is visible from but the diff does not touch". This is the highest-priority input for the bug-hunting agents in Step 3.

Group the callsites from 2.5b by execution context. Typical contexts in this codebase:

FFI entry surface: every extern "C" function in questdb-rs-ffi/src/lib.rs that calls (transitively) into the changed code
Buffer build hot path: Buffer::column_*, Buffer::symbol, Buffer::at* and their callers
Flush path: Sender::flush*, HTTP/TCP/ILP transports
Auto-flush logic: any callsite that triggers flush implicitly
Configuration parsing: SenderBuilder, conf string parsers in questdb-rs/src/ingress/conf.rs
Authentication / TLS: TLS handshake in questdb-rs/src/ingress/tls.rs, basic auth, token auth
Async runtime: code that runs inside the tokio runtime started by the client
C++ wrapper: template instantiations, inline forwarders, RAII destructors in include/questdb/ingress/*.hpp
C++ doctest suite: cpp_test/test_line_sender.cpp
Python system tests: system_test/test.py, system_test/fixture.py, ctypes shims in system_test/questdb_line_sender.py
Documentation examples: examples/ programs that link the C/C++ API

Every entry on this list must be reviewed in Step 3.

2.5e Build profile facts

This sub-step runs at every level, including levels 0 and 1 where the rest of Step 2.5 is skipped. A single Cargo.toml setting can flip the panic-safety story for the entire crate; agents must reason from the actual profile, not from defaults.

Read questdb-rs/Cargo.toml and questdb-rs-ffi/Cargo.toml and record, with file:line citations:

panic strategy per profile ([profile.release], [profile.dev]). If panic = "abort" in either, every catch_unwind in that crate is a no-op for that profile and every reachable panic is a process abort. Agents 2, 3, and 4 (and the level-0 inline review) must not credit catch_unwind as a panic guard under panic = "abort". The only acceptable defense under abort-panic is proving no panic path exists.
overflow-checks per profile. If overflow-checks = false in release (the default), integer overflow wraps silently in release builds instead of panicking — bugs that look like panics in test builds disappear into wrong values in production. State which mode applies.
[profile.*.package.*] overrides if present — a per-dependency profile can reintroduce unwinding for one crate even when the workspace defaults to abort.
#[global_allocator] if defined anywhere in the workspace. A custom allocator changes the OOM behavior (some abort, some unwind, some return null).
lto / codegen-units / strip — informational; flag if they look unusual.

A review without this section is incomplete. State the panic mode in one line at the top of every Step 3 agent prompt so the agent reasons from the right premise.

Step 3: Parallel review

Every agent receives:

The PR diff
The full change surface map from Step 2.5 (semantic deltas, callsite inventory, implicit contracts, cross-context exposure list)

Anti-anchoring directive (applies to all agents)

Bugs at callsites outside the diff outrank bugs inside the diff. A confirmed bug in a file the PR did not touch but that calls a changed symbol is a P0 finding.
"Looks correct in isolation" is not a valid conclusion. Before clearing a changed symbol, the agent must walk the callsite inventory from 2.5b and explicitly state, per callsite, whether the new behavior is still correct there.
The diff is the entry point, not the scope. If the change surface map shows the symbol is reachable from N other files, the review covers N+1 files.
Crate-wide settings affect untouched code. A change to Cargo.toml (panic strategy, allocator, feature defaults, MSRV, profile overrides), a new #[global_allocator], or a new panic_handler retroactively changes the safety story for every existing function in the crate — not just the diff. When Cargo.toml, build scripts, or workspace-level config files appear in the diff, the review covers the panic/allocation/overflow contract of the entire affected crate, not just the touched lines. The same applies when 2.5e records a profile fact (e.g. panic = "abort") that invalidates existing safety patterns in untouched code.
A single finding of the form "in test_line_sender.cpp the new behavior of line_sender_buffer_column_f64 causes Y" is worth more than five findings inside the diff.

Agents

Launch the following agents in parallel.

Agent 1 — Correctness & bugs: NULL/None handling, edge cases, logic errors, off-by-one, operator precedence, error paths, integer overflow/truncation, buffer length math. Cross-reference every changed symbol against its callsite inventory and verify the new behavior is correct at each callsite.

Agent 2 — Rust safety, panics, and crash surface: In a client library, anything that aborts the Rust side aborts the host process with no recovery. Flag every reachable instance of:

Panic sources: unwrap(), expect(), array indexing without bounds checks, panic!(), unreachable!(), todo!(), integer overflow in release mode, slice::from_raw_parts with invalid inputs, Mutex::lock().unwrap() on a poisoned mutex.
Direct aborts: std::process::abort(), libc::abort(), std::intrinsics::abort().
Allocation-failure aborts: any Vec::with_capacity, Box::new, String::reserve, or similar sized by an untrusted length parameter. Rust's default allocator aborts on OOM — a caller passing a 100 GB length parameter is a crash, not a recoverable error. Check whether the FFI function validates length bounds before allocating.
Stack overflow: unbounded recursion, recursive Drop impls, deeply nested data structures decoded from untrusted input.
Panic-in-Drop / double-panic: a panic inside a Drop impl while another panic is unwinding aborts the process. Flag any Drop impl that can panic (calls unwrap, indexes, allocates).
Unsound unsafe: verify safety invariants are documented and upheld, check pointer validity, aliasing rules, lifetime correctness, dangling pointers, use-after-free, double-free, data races.
C++ exceptions escaping into C: the C++ wrapper (include/questdb/ingress/*.hpp) is reachable from pure-C callers via inline forwarders. Any path where the wrapper can throw (std::bad_alloc, std::system_error, user-defined throw) and reach a C caller is undefined behavior. Verify wrapper functions called from C are noexcept or only invoked from C++ contexts.
SIGPIPE on broken sockets: writing to a closed peer raises SIGPIPE by default on Linux/macOS, killing the process. Verify TCP/HTTP write paths set MSG_NOSIGNAL or mask SIGPIPE.

Panic strategy is the foundation. Before reasoning about any panic guard, look up the panic setting from Step 2.5e:

Under panic = "abort", catch_unwind is a no-op — it cannot catch anything because nothing unwinds. Every reachable panic is a process abort regardless of where the catch_unwind is placed. The only acceptable defense is proving no panic path exists: front-load every length check, replace unwrap/expect/indexing on wire-derived or caller-supplied values with Result-returning equivalents, validate before allocating, use checked_* arithmetic. A catch_unwind wrapper in this mode is misleading documentation, not a safety net — flag it if it gives the reader false confidence.
Under panic = "unwind", every extern "C" function must wrap its body in catch_unwind AND every Drop impl on the unwind path must be panic-free (double-panic aborts the process). Fallible operations must use Result/Option with proper error propagation.

State which panic mode applies in the agent's first sentence. Every panic-related finding must be evaluated under the actual mode, not the textbook one.

Agent 3 — FFI boundary safety: Check every #[no_mangle] / extern "C" function. Verify: NULL pointer checks on all pointer arguments, proper error propagation across the FFI boundary (no panics escaping into C), correct ownership transfer semantics (who allocates, who frees), buffer length validation, string encoding correctness (UTF-8 ↔ C strings, NUL handling), and that the C header (include/questdb/ingress/line_sender.h) and C++ wrapper (include/questdb/ingress/line_sender.hpp + the split line_sender_core.hpp / line_sender_array.hpp / line_sender_decimal.hpp) accurately reflect the Rust implementation. If cbindgen.toml is involved, verify generated output matches handwritten headers.

Agent 4 — Concurrency & thread safety: Race conditions, Send/Sync bounds, shared mutable state, lock ordering, correct use of Arc/Mutex/RwLock, tokio runtime safety (blocking vs async calls), thread-safety of data structures. For C/C++ API: verify documented thread-safety guarantees match the implementation. Cross-reference every callsite from 2.5b for violations of the new concurrency contract.

Agent 5 — Resource management & memory: Leaks on all code paths (especially errors), Drop implementations, native memory management, buffer lifecycle, socket/connection cleanup on error paths, TLS session teardown. For C API: verify every allocation has a documented deallocation path, and error paths don't leak. Walk every callsite from 2.5b that constructs, owns, or transfers ownership of changed types and verify cleanup on all paths (Ok branch, ? early return, panic-unwind boundary).

Agent 6 — Performance & allocations: Unnecessary allocations on hot paths (buffer building, column writes, flushing), excessive copying, inefficient serialization, unnecessary syscalls, buffer growth strategy. For each new loop on the data path, analyze how it scales with realistic message volumes (millions of rows per flush, hundreds of columns). Flag any O(n²) pattern. Setup-path allocations (sender construction, configuration parsing) are acceptable; data-path allocations are not.

Agent 7 — Test review & coverage: Coverage gaps, error path tests, NULL/edge-case tests, boundary conditions, regression tests, test quality. Check:

Rust unit tests in questdb-rs/src/**/tests.rs and #[cfg(test)] mod tests blocks
C++ doctest tests in cpp_test/test_line_sender.cpp
Python system tests in system_test/test.py
Examples in examples/ still build and run

Cross-reference 2.5d: every cross-context exposure should have a test that exercises the changed symbol from that context. Missing tests for cross-context callsites is a high-priority finding.

Agent 8 — Code quality & API design: Public API ergonomics and consistency, backward compatibility of C/C++ headers, naming conventions, dead code, documentation for public items (/// docs on public Rust, Doxygen-style comments on C/C++ headers), clippy and cargo fmt compliance.

Agent 9 — Cross-context caller impact: Walk the callsite inventory from 2.5b. For every callsite, fetch the surrounding code (the calling function plus its callers up two levels) and answer:

Does this caller pass inputs the new behavior handles incorrectly?
Does this caller depend on a contract from the implicit contract list (2.5c) that the change broke?
Is this caller in a context (FFI entry, async runtime, holding a lock, error path, hot loop, auto-flush callback, TLS handshake, panic-unwind boundary, Drop impl) where the new behavior misbehaves even if the inputs are valid?
For changed Rust traits: do all impls still satisfy the new contract?
For changed extern "C" signatures: does the C header still match? Do the C++ wrapper and Python ctypes binding still pass the right types and lifetimes?
For changed buffer/sender state machines: do all callers respect the new state transitions (e.g., is a buffer cleared after error before being reused; is flush called only when the sender is in a flushable state)?

This agent's output is structured per callsite, not per failure mode. Each callsite gets a verdict: SAFE / BROKEN / NEEDS VERIFICATION. Every BROKEN entry is a P0 finding regardless of whether the file is in the diff.

This agent is not optional even when the diff is small. Small diffs to widely-used symbols (Buffer::column_*, Sender::flush, FFI exports) have the largest blast radius.

Agent 10 — Fresh-context adversarial: Dispatched separately from agents 1-9 to escape checklist anchoring. This agent operates under different rules from the rest:

It receives ONLY the PR diff and the names of the changed files. It does NOT receive the change surface map from Step 2.5, the implicit contract list, the cross-context exposure list, or any of the review checklists below.
Its sole instruction: "find ways this code is wrong". No category list, no failure-mode taxonomy, no QuestDB-specific style guide.
It is free to use Read, Grep, and Glob to explore the repository however it wants.
Findings are not pre-classified by category. Each finding states: what's wrong, why it's wrong, and the code path that demonstrates it.

The point of this agent is to surface bugs the structured agents cannot see because they are reasoning inside the same frame. A finding here that none of agents 1-9 produced is high signal — it means the structured review missed it. A finding here that overlaps with agents 1-9 is corroboration.

Run this agent in parallel with agents 1-9. It is mandatory regardless of diff size.

Combine all agent findings into a single deduplicated draft report. Do NOT present this draft to the user yet — it goes straight into verification.

Step 3b: Verify every finding against source code

The parallel review agents work from the diff plus the change surface map and frequently produce false positives — especially around memory ownership, unsafe blocks, FFI lifecycle conventions, and Rust control-flow guarantees. Every finding MUST be verified before it is reported.

For each finding in the draft report:

Read the actual source code at the exact lines cited. Do not rely on the agent's description alone.
Trace the full code path: follow callers, trait implementations, and generic instantiations. A method called on a trait object may dispatch to a specific impl.
Check both sides of FFI boundaries: if a finding involves Rust↔C/C++ interaction, read both the Rust FFI function and the C/C++ header/caller (and Python ctypes binding when applicable). Verify ownership transfer, error propagation, and cleanup on both sides.
For resource leak claims: trace every allocation to its corresponding free/drop on ALL code paths (happy path, error path, ? operator early returns, panic unwind). Check for Drop impls. Before claiming a leak between allocation and cleanup registration, verify that the intervening code can actually fail.
For Rust panic claims: verify whether the panic site is actually reachable. Trace control flow backwards — a preceding guard, match arm, or early return may make it unreachable.
For Rust panic claims via FFI: trace the C/C++/Python caller to check whether it can actually pass parameters that trigger the panic. If every documented caller validates inputs before the FFI call AND the FFI function itself has a NULL/range guard, the panic is unreachable from the contract — drop it. But if the FFI function is the validation boundary, the panic IS reachable and must be flagged.
For Rust numeric overflow claims: check whether the overflow is reachable at realistic scale. The QuestDB client handles ILP messages — practical bounds are buffers up to a few hundred MB, message counts up to millions per flush, columns per row in the tens to low hundreds, symbol cardinality in the thousands. If overflow requires values beyond that scale (buffer sizes near usize::MAX, billions of columns), drop it.
For unsafe soundness claims: verify whether the safety invariants are actually violated. Check preconditions established by callers and documented in the // SAFETY: comment.
For performance claims: check whether the cost is measurable on a realistic workload. Downgrade to a nit if the saving is negligible relative to the surrounding I/O. Exception: an allocation on the per-row buffer-write path is always worth flagging, even a single one.
For cross-context findings (Agent 9): re-read the callsite in full, including its callers up two levels, and confirm the broken behavior is reachable from production code paths or test paths users will exercise. Cross-context findings are high-value but also the easiest to overstate — verify carefully.

Classify each finding as:

CONFIRMED in-diff — the bug is real and inside the diff
CONFIRMED at out-of-diff callsite — the bug is in an unchanged file because the changed symbol is used there in a way that's now broken (cite the file and the contract from 2.5c that was violated)
FALSE POSITIVE — the code is actually correct (explain why)
CONFIRMED with nuance — the issue exists but is less severe than stated (explain)

Move false positives to a separate "Downgraded" section at the end of the report. For each, give a one-line explanation of why it was dismissed. This lets the PR author verify the reasoning and catch verification mistakes.

Launch verification agents in parallel where findings are independent. Each verification agent should read surrounding source files, not just the diff.

Review checklists

Review the diff for:

Correctness & bugs

NULL/None handling at API boundaries
Edge cases and error paths
Logic errors, off-by-one, incorrect bounds, wrong operator precedence
Integer overflow and truncation (especially in buffer size calculations and length parameters)
Correct ILP wire format (v1 / v2)
Reachability expansion: for each changed symbol, list the FFI surfaces, async contexts, error paths, lock-held states, and host languages it can now appear in but didn't before. Verify it works in each.

Rust safety

No unwrap()/expect() in library code (only in tests, or on infallible paths with a // SAFETY: / // INVARIANT: justification)
All unsafe blocks have documented safety invariants
No undefined behavior: dangling pointers, use-after-free, double-free, data races
Proper Send/Sync bounds on public types
No panics that can escape FFI boundaries — and the meaning of "escape" depends on the panic strategy (see Step 2.5e). Under panic = "abort", catch_unwind is a no-op and every reachable panic is a fatal escape; the FFI function must prove no panic path exists. Under panic = "unwind", every extern "C" function must wrap its body in catch_unwind.

Crash surface

Anything that aborts the Rust side aborts the host process. The first check is the panic strategy itself — everything else is downstream of it.

Panic strategy (from Step 2.5e): under panic = "abort", the entire catch_unwind defense collapses — every panic across the entire crate is fatal. Verify the profile before crediting any panic guard. A finding that says "the panic at X is caught by catch_unwind at Y" is incorrect under abort-panic.
Direct aborts: std::process::abort(), libc::abort(), std::intrinsics::abort()
Allocation-failure aborts: any allocation sized by an untrusted length parameter must validate the bound before allocating (Rust's default allocator aborts on OOM)
Stack overflow: unbounded recursion, recursive Drop impls, deeply nested untrusted input
Panic-in-Drop / double-panic during unwind — every Drop impl must be panic-free
C++ exceptions escaping the wrapper into C callers — inline forwarders reachable from C must be noexcept or proven not to throw
SIGPIPE on broken sockets — TCP/HTTP write paths must use MSG_NOSIGNAL or mask SIGPIPE
Mutex poisoning — Mutex::lock().unwrap() panics if the mutex is poisoned by a prior panic; use lock() with explicit poison handling

FFI boundary

All pointer arguments validated for NULL before dereference
Length parameters validated against zero / overflow before slicing
Error codes and error messages propagated correctly across FFI (no information loss)
Ownership semantics clear and correct (caller-owned vs callee-owned, who frees)
C header (line_sender.h) and C++ wrapper (line_sender.hpp and split headers) match Rust implementation signatures
cbindgen.toml produces output consistent with handwritten headers when both exist
String handling: UTF-8 validation, NUL termination, explicit length parameters, no embedded NULs unless documented
ABI stability: struct layouts not reordered, enum discriminants not renumbered, calling conventions unchanged

Concurrency

Race conditions: unsynchronized shared mutable state, missing memory barriers, unsafe publication
Lock ordering issues that could deadlock (especially around tokio's blocking vs async distinction)
Thread-safety of types passed across Send boundaries
For every changed symbol, check whether it is now reachable from a thread or context (per 2.5d) where the previous concurrency assumptions don't hold

Performance

Performance regressions: changes that make hot paths slower
Unnecessary allocations in buffer building, column writes, or flushing paths
Excessive copying of data that could be passed by reference / slice
Buffer growth strategy (exponential vs linear)
Syscall overhead (batching, buffering, TCP write coalescing)
Algorithmic complexity at realistic scale: millions of rows per flush, hundreds of columns per row. Flag O(n²) on any data path.

Resource management

Resources properly cleaned up on all code paths (especially error paths)
Drop implementations correct and complete
Socket/connection cleanup on error
TLS session cleanup
Buffer memory freed correctly
No leaks through FFI boundary (callee returns ownership and caller frees, or vice versa — documented and consistent)

Code quality

Public API is consistent and ergonomic
Backward-compatible changes to C/C++ headers (or breaking changes are intentional and called out in the PR body)
Naming conventions consistent with existing codebase
No dead code or unused imports
clippy clean (no #[allow] attributes added without justification)
cargo fmt applied

Test review

Coverage gaps: For every new or changed code path, verify a corresponding test exists. If not, flag it explicitly as "missing test for X".
Cross-context coverage: For every entry in the cross-context exposure list (2.5d), verify a test exercises the changed symbol from that context. Missing cross-context tests are high-priority findings.
Error path coverage: Are failure cases, partial writes, connection drops, TLS failures, auth failures, and edge conditions tested — not just the happy path?
NULL/edge-case tests: Are NULL inputs, empty buffers, zero-length strings, max-length symbols, and boundary values tested?
FFI tests: Are C/C++ API changes covered by tests in cpp_test/?
Integration tests: Are protocol-level changes covered by system tests in system_test/?
Test quality: Are tests actually asserting the right thing? Watch for tests that pass trivially or assert on wrong values.
Regression tests: If this PR fixes a bug, is there a test that reproduces the original bug and would fail without the fix?

Unresolved TODOs and FIXMEs

Scan the diff for TODO, FIXME, HACK, XXX, and WORKAROUND comments. For each one found:
- Is it a pre-existing comment that was just moved/reformatted, or newly introduced in this PR?
- If newly introduced: does it represent unfinished work that should block the merge, or a known limitation that is acceptable to ship? Flag any that look like deferred bugs or incomplete implementations.
- If the TODO references a ticket/issue number, verify the reference exists.

Commit messages

Plain English titles, under 50 chars
Active voice, naming the acting subject

Step 4: Output

Present ONLY verified findings (false positives are excluded from Critical/Moderate/Minor). Structure as:

Critical

Issues that must be fixed before merge. Each must include:

Exact file path and line numbers (including out-of-diff files)
Whether the finding is in-diff or out-of-diff
Code path trace showing why the bug is real
For out-of-diff findings: the contract from 2.5c that was violated and the callsite that triggers it
Suggested fix

Moderate

Issues worth addressing but not blocking.

Minor

Style nits and suggestions.

Downgraded (false positives)

Findings from the initial review that were dismissed after source code verification. For each, state:

The original claim (one line)
Why it was dismissed (one line, citing the specific code that disproves it)

Summary

One-line verdict: approve, request changes, or needs discussion
Highlight any regressions or tradeoffs
State how many draft findings were verified vs dropped as false positives (e.g., "8 findings verified, 4 false positives removed")
State the in-diff vs out-of-diff split (e.g., "5 findings in-diff, 3 findings out-of-diff"). If the diff is non-trivial and out-of-diff is zero, the cross-context pass likely underran — re-invoke Agent 9 with a wider grep before finalizing.