name: grill description: "Use when you want to stress-test code by having an adversarial reviewer try to break it - finding edge cases, security holes, race conditions, and logical flaws that normal reviews miss." user-invocable: true argument-hint: "[path, function, or module to grill]" allowed-tools: Read, Bash, Glob, Grep
/grill - Adversarial Code Challenge
Stress-test code by systematically trying to break it - finding edge cases, security holes, race conditions, and logical flaws that normal code reviews miss. This is not a review; this is an attack.
Target: $ARGUMENTS
When to Use
- Before deploying critical code (auth, payments, data handling)
- After writing complex business logic
- When code handles user input or external data
- Before security audits (find issues first)
- When you suspect a function has hidden edge cases
- After fixing a bug (verify the fix doesn't break under stress)
- NOT for: general code quality (use
/codereview) - NOT for: architecture assessment (use
@architect) - NOT for: test creation (use
/tdd- but grilling findings feed into test cases)
Iron Rule
╔══════════════════════════════════════════════════════════════╗
║ ASSUME THE CODE IS BROKEN UNTIL PROVEN OTHERWISE ║
║ Your job is to find HOW, not to confirm it works. ║
╚══════════════════════════════════════════════════════════════╝
Workflow
Step 1: Target Analysis
- Read the target code completely
- Understand the contract:
- What are the expected inputs? (types, ranges, formats)
- What are the expected outputs?
- What side effects does it have? (DB writes, API calls, file I/O)
- What invariants should always hold?
- Identify trust boundaries:
- Where does external data enter?
- Where does the code trust input without validation?
- Where are type assertions or casts used?
# Read the target
cat $TARGET 2>/dev/null || find . -name "$TARGET*" -not -path "*/node_modules/*" | head -5
# Find type assertions and unsafe casts
grep -rn "as any\|as unknown\|! \|!\.\|@ts-ignore\|@ts-expect-error" $TARGET --include="*.ts" --include="*.tsx" 2>/dev/null | head -15
Output:
## Target Profile
- **Function/Module:** [name]
- **Contract:** [inputs → outputs]
- **Side effects:** [DB, API, file, state]
- **Trust boundaries:** [where external data enters]
- **Unsafe patterns:** [casts, assertions, ignores]
BLOCKED until target is fully understood.
Step 2: Attack Vector Execution
Systematically attack across 5 categories:
Category 1: Input Attacks
Try to break the code with unexpected inputs:
| Attack | Input | Expected Failure |
|---|---|---|
| Null/undefined | null, undefined |
TypeError, crash |
| Empty values | "", [], {}, 0 |
Logic error, empty result |
| Type coercion | "0", "false", "null" |
Truthy/falsy confusion |
| Boundary values | Number.MAX_SAFE_INTEGER, -1, 0.1 + 0.2 |
Overflow, off-by-one, float precision |
| Long strings | "a".repeat(1_000_000) |
Memory exhaustion, buffer overflow |
| Special characters | <script>, '; DROP TABLE, ../../../etc/passwd |
XSS, SQL injection, path traversal |
| Unicode edge cases | "é", "🚀", "\u0000", RTL text |
Encoding errors, display corruption |
| Nested data | Deeply nested objects (100+ levels) | Stack overflow, infinite recursion |
| Prototype pollution | {"__proto__": {"admin": true}} |
Privilege escalation |
Category 2: State Attacks
Try to break the code through state manipulation:
| Attack | Scenario | Expected Failure |
|---|---|---|
| Concurrent calls | Same function called simultaneously | Race condition, double write |
| Out-of-order execution | Step 3 before step 1 | Undefined behavior |
| Stale state | Use cached/old data with new logic | Inconsistency |
| State pollution | Shared mutable state across calls | Side effect leaks |
| Re-entrance | Function calls itself indirectly | Infinite loop, deadlock |
Category 3: Boundary Attacks
Try to exploit boundaries:
| Attack | Scenario | Expected Failure |
|---|---|---|
| Off-by-one | First/last element, array[length] | IndexError, missing data |
| Empty collection | [], empty Map/Set |
Unexpected behavior on .map(), .reduce() |
| Single item | Array of 1, Map with 1 entry | Logic that assumes >1 items |
| Max limits | Max array size, max string length | Performance degradation |
| Pagination edges | Page 0, page -1, page past end | Error or empty result |
Category 4: Logic Attacks
Try to find logical flaws:
| Attack | Scenario | Expected Failure |
|---|---|---|
| Contradictory inputs | {min: 10, max: 5} |
No validation, unexpected range |
| Impossible states | Admin + banned, published + draft | State machine violation |
| Circular references | Object referencing itself | Stack overflow, infinite loop |
| Time-dependent logic | Timezone differences, DST, leap year | Wrong date calculations |
| Floating point | 0.1 + 0.2 !== 0.3 |
Financial calculation errors |
Category 5: Error Path Attacks
Try to trigger every error path:
| Attack | Scenario | Expected Failure |
|---|---|---|
| Network failure | API returns 500, timeout, DNS failure | Unhandled rejection, crash |
| Permission denied | File system, database, API auth | Silent failure, data loss |
| Resource exhaustion | Disk full, memory full, connection pool exhausted | Crash without graceful degradation |
| Partial failure | 3 of 5 batch operations fail | Inconsistent state |
| Corrupted data | Invalid JSON, truncated response, wrong encoding | Parse error, crash |
Step 3: Exploitation Report
For each vulnerability found:
### Vulnerability [N]: [Title]
**Category:** [Input | State | Boundary | Logic | Error Path]
**Severity:** CRITICAL | HIGH | MEDIUM | LOW
**Location:** [file:line]
**Attack:**
[exact input or scenario that triggers the bug]
**Expected behavior:** [what should happen]
**Actual behavior:** [what actually happens]
**Impact:** [data loss | crash | security breach | incorrect result | performance degradation]
**Proof:** [how to reproduce]
**Fix:** [specific code change to prevent this]
Step 4: Hardening Recommendations
For each vulnerability, provide a concrete fix:
| Pattern | Fix |
|---|---|
| Missing null check | Add guard clause or optional chaining |
| No input validation | Add Zod/Yup schema validation at boundary |
| Race condition | Add mutex, optimistic locking, or idempotency |
| Injection vulnerability | Parameterized queries, input sanitization |
| Missing error handling | Add try/catch with specific error types |
| Unsafe type assertion | Replace as with runtime type guard |
Step 5: Resilience Score
Rate the target on a 0-10 scale:
| Score | Rating | Meaning |
|---|---|---|
| 9-10 | Fortress | Battle-hardened, handles all attack categories |
| 7-8 | Solid | Handles most attacks, minor edge case gaps |
| 5-6 | Average | Common attacks handled, but gaps in 1-2 categories |
| 3-4 | Fragile | Multiple attack categories succeed |
| 0-2 | Vulnerable | Critical flaws, immediate fixes needed |
Formula: Start at 10, subtract points per vulnerability:
- CRITICAL: -3 points
- HIGH: -2 points
- MEDIUM: -1 point
- LOW: -0.5 points (minimum score: 0)
Verification Protocol
Before claiming the grill is complete:
- All 5 attack categories were tested (input, state, boundary, logic, error path)
- Every vulnerability has a severity level and file:line reference
- Every vulnerability has a proof (reproducible attack)
- Every vulnerability has a concrete fix recommendation
- Resilience score was calculated using the formula
- At least 10 distinct attacks were attempted across all categories
- Type assertions and unsafe patterns were specifically targeted
Anti-Rationalization
| Excuse | Reality |
|---|---|
| "The tests already cover edge cases" | Tests cover what you thought of. Grilling covers what you didn't. Testers think in happy paths; attackers think in failure modes. |
| "It's internal code, nobody will send bad input" | Internal code today becomes API tomorrow. Internal doesn't mean safe. One refactor exposes it. |
| "We validate at the API layer" | Defense in depth. If the API layer has a bug, what stops the attack downstream? |
| "Edge cases are rare in production" | Rare cases cause production incidents. Murphy's law is not optional. The rarer the case, the less likely you tested it. |
| "This is overkill for a simple function" | Simple functions in critical paths (auth, payments, data) deserve maximum scrutiny. Simplicity doesn't equal safety. |
| "TypeScript prevents these issues" | TypeScript prevents type errors at compile time. It doesn't prevent logic errors, race conditions, or injection attacks at runtime. |
| "Nobody would actually try this" | Automated scanners, fuzzing tools, and malicious actors try exactly these attacks. If you don't, they will. |
Rules
- Assume broken - Your job is to break the code, not validate it
- All 5 categories - Never skip an attack category
- Proof required - Every vulnerability needs a reproducible attack scenario
- Read-only - Never modify files during a grill (fixes come after)
- No false positives - Only report vulnerabilities you can prove
- Severity justified - Every rating needs a specific impact description
- Fix included - Every vulnerability includes a concrete remediation
- Target scope - Only grill the specified target, don't drift to other files
- Type assertions are targets - Every
as,!, and@ts-ignoreis suspicious until proven safe - Score is mechanical - Follow the formula, no subjective adjustments
Output
──── /grill ────
Target: $ARGUMENTS
Contract: [inputs → outputs]
Trust boundaries: [N identified]
Attacks Attempted: [N]
Input: [N attempted] → [N succeeded]
State: [N attempted] → [N succeeded]
Boundary: [N attempted] → [N succeeded]
Logic: [N attempted] → [N succeeded]
Error: [N attempted] → [N succeeded]
Vulnerabilities Found: [N]
CRITICAL: N
HIGH: N
MEDIUM: N
LOW: N
Top Vulnerabilities:
1. [CRITICAL] [description] - [file:line]
2. [HIGH] [description] - [file:line]
3. [HIGH] [description] - [file:line]
Resilience Score: [N]/10 ([rating])
──── Grill complete ────