name: inversion-thinking description: Proactive failure analysis using the Saboteur Method. Use when designing critical systems, security features, or debugging root causes. license: MIT metadata: version: "1.1"
Inversion Thinking (The Saboteur Method)
"Tell me where I'm going to die so I will never go there." - Charlie Munger
When to Use
- Before writing complex code (Architecture, Security, Critical features).
- During debugging (to find root causes).
- When a plan seems "too simple" or "optimistic".
When NOT to Use This Skill
- Non-adversarial contexts: When the "enemy" is nature/probability, not malice—use probabilistic risk analysis instead.
- Low-stakes/one-off tasks: When failure impact is acceptable and full protocol overhead isn't warranted.
- Well-understood domains: When standard safeguards (frameworks, libraries) already cover known failure modes.
- Time-critical situations: Use the condensed 3-step protocol or defer to rubber-ducking.
Skill Integration Cross-References
- Before Inversion: Use first-principles-thinking to identify base assumptions that become failure targets
- Combined Protocol: inversion-thinking → systems-thinking to check safeguard cascades
- Complementary: Use decision-matrix when weighing multiple safeguard options against cost
- Validation: Use rubber-ducking to verify saboteur reasoning is sound
Severity Tiers
Prioritize failure modes by severity. Under time pressure, focus on Tier 1:
| Tier | Category | Examples | Priority |
|---|---|---|---|
| 1 - Critical | Injection, Auth bypass, Data exposure | SQL injection, session hijacking, unencrypted PII | Audit first |
| 2 - High | Race conditions, Privilege escalation | TOCTOU, IDOR, parameter tampering | Audit second |
| 3 - Medium | DoS, Memory corruption | Resource exhaustion, buffer overflow (managed langs: lower) | Audit third |
| 4 - Low | Social engineering, Informational | Phishing, path traversal (read-only) | Audit last |
[!IMPORTANT] STOP condition: All Tier 1 and Tier 2 failure modes MUST have safeguards before proceeding. If time expires, flag Tier 1 gaps explicitly.
Pareto note: ~80% of incidents come from Tier 1. MUST audit Tier 1 before all else. If time-constrained, audit only Tier 1.
Timeboxing
- Full protocol: 30 minutes maximum
- Condensed protocol: 10 minutes maximum
- Per failure mode: 5 minutes investigation max—if unresolved, log as "unvalidated" and escalate
- Stop condition: All Tier 1 and Tier 2 failure modes have safeguards OR time expires
The Protocol: Become the Saboteur
Do not ask "Does this work?". Ask "How do I break this?". Follow these 4 steps explicitly.
1. Define the Anti-Goal
Explicitly write down the worst-case outcome.
- Goal: "Keep user data safe."
- Anti-Goal: "Leak all user data to the public internet."
Identify Assumptions
Before simulating sabotage, list the implicit assumptions the system makes:
- About trust boundaries (what inputs are trusted?)
- About input validity (what format is expected?)
- About timing (what sequencing is assumed?)
- About environment (what infrastructure is relied upon?)
The Anti-Goal should directly violate at least one of these assumptions.
2. The Saboteur Simulation
Adopt the persona of a malicious actor or the embodiment of Entropy (Murphy's Law).
Structured Attacker Framework
Use MITRE ATT&CK or STRIDE as mental scaffolding:
- STRIDE categories: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege
- ATT&CK tactics: Initial Access → Execution → Persistence → Privilege Escalation → Defense Evasion → Credential Access → Discovery → Lateral Movement → Collection → Exfiltration → Impact
Ask: "I want to achieve the Anti-Goal. What is the easiest way to do it?"
[!IMPORTANT] CRITICAL: If Tier 1 (Critical) failure modes are found, they MUST have safeguards before proceeding.
3. Proof of Fragility
List concrete, specific pathways to failure. Use references/failure_modes.md for inspiration.
- "I will send a 10GB file to crash the memory."
- "I will send a request with
admin=true." - "I will unplug the database cable mid-transaction."
Validation Step
For each failure pathway, verify it is actually exploitable in your architecture:
- Is the attack surface exposed? (network, API, UI?)
- Does the vulnerability actually exist in your implementation?
- Can you demonstrate exploitability (even theoretically)?
Log unvalidated failures separately. If you cannot verify exploitability, treat as lower confidence.
4. Inversion (The Fix)
Design specific safeguards to block those pathways.
- "Enforce 10MB limit (HTTP 413)."
- "Validate permissions on server side."
- "Use atomic transactions with rollback."
Safeguard Interaction Analysis
Map safeguards as a system—not independent fixes:
| Safeguard | Protected Attack Vector | Dependencies | May Create |
|---|---|---|---|
| Rate limiting | DoS | Redis, clock sync | Legitimate users blocked |
| Input validation | Injection | Parsing library | False positives |
Ask:
- Where do safeguards overlap? (redundancy is good)
- Where are there gaps? (single point of failure)
- Does Safeguard X create dependency on Safeguard Y?
- Does a failed Safeguard Y cause cascading failure?
Second-Order Check
For each safeguard, ask:
- How will users work around this? (security friction creates new vulnerabilities)
- What new attack surface does this create?
- What is the maintenance cost over 3 years?
If safeguard backfires, reconsider the design.
Condensed Protocol (Time-Pressured)
When under 10 minutes:
- Define Anti-Goal (1 min): State the worst case in one sentence.
- Saboteur + Fix pairs (7 min): For each failure mode, immediately state the counter-safeguard. Skip catalog lookup.
- Verify safeguards (2 min): Spot-check one safeguard for interaction effects.
Meta-Inversion: When Inversion Fails
Apply inversion to the skill itself:
- Anti-Goal of inversion-thinking: "Never anticipate failure; assume everything works on first try."
- Saboteur Simulation: "How do I ensure this skill gets skipped?" → Make it too complex, too vague, or too time-consuming.
- Proof of Fragility: The skill fails when:
- Practitioners are under time pressure (shortcut the protocol)
- The skill produces false confidence (checked obvious cases, missed creative ones)
- Safeguards become the attack surface (over-engineered → new vulnerabilities)
- Checklist mentality replaces genuine attacker thinking
When NOT to trust inversion-thinking output:
- Catalog-driven reasoning (relying on failure_modes.md without genuine attack trees)
- No Tier 1 findings after 30 minutes (either system is secure or analysis was superficial)
- Safeguards require >20% of implementation effort (may indicate over-engineering)
Self-Improvement Protocol
Log only novel failure modes or safeguard failures.
## [YYYY-MM-DD] {Brief Description}
**Failure mode**: {what was discovered}
**Safeguard**: {what was added}
**Outcome**: {did it work?}
---
Promote after 3+ similar discoveries.
Resources
Evaluations
Eval 1: Tier 1 Injection Prevention
Scenario: User wants to add file upload feature. Code has system("cat " + filename).
Expected: Identifies as Critical Tier, finds injection vulnerability, proposes input validation + sanitization.
Pass criteria: MUST identify shell injection as Tier 1, produces specific safeguard (not just "be careful").
Eval 2: Safeguard Interaction Cascade
Scenario: Adding rate limiting + IP blocklist. Both involve Redis. Expected: Identifies safeguard dependencies, asks about Redis failure modes, checks for cascading failure. Pass criteria: Maps safeguard dependencies, identifies single point of failure in Redis.
Eval 3: Meta-Inversion (Skill Self-Check)
Scenario: Skill produces 0 Tier 1 findings after 30 minutes. Expected: Triggers meta-inversion warning — either system is unusually secure OR analysis was superficial. Pass criteria: Correctly identifies false confidence risk, suggests deeper investigation.