adversarial-stress-testing - SKILL.md Agent Skill

name: adversarial-stress-testing description: 'Campaign: Logical extreme and boundary testing via reductio ad absurdum and edge-case analysis. Core question: Does this artifact collapse under logical limits and boundary conditions? Methods: Lakatos 1976, Dutilh Novaes 2016, BVA, Flyvbjerg Critical Case, Popper.' type: campaign produces: AdversarialStressReport artifact-types:

gap
hypothesis
research-question
idea
approach
experiment-design
claim dependencies: strategies:
- assumption-negation
- boundary-enumeration
- critical-case-design
- lakatos-heuristics
- stress-test-validity-envelope-mapping tactics:
- boundary-probing
- contradiction-derivation
- counterexample-heuristics sops:
- context-checkpoint
- context-init
- mitigation-proposal
- stress-test-saturation-detection
- verdict-synthesis
- weakness-classification

Adversarial Stress Testing

Core Question: Does this artifact collapse under logical limits and boundary conditions?

Methodology Sources

Lakatos (1976) — Proofs and Refutations: counterexample-driven refinement
Dutilh Novaes (2016) — Adversarial argumentation as dialogical practice
Clarke BVA — Boundary Value Analysis for systematic edge testing
Flyvbjerg (2006) — Critical case methodology: most-likely/least-likely selection
Popper (1959) — Falsificationism: seek conditions where claims break

Strategy Routing

Artifact Type	Primary Strategy	Rationale
claim, hypothesis	assumption-negation	Direct logical attack
gap, research-question	lakatos-heuristics	Counterexample refinement
idea, approach	boundary-enumeration	Parameter space testing
experiment-design	critical-case-design	Decisive test selection
any (synthesis)	validity-envelope-mapping	Comprehensive envelope

Budget Table

Resource	S	M	L
Negation derivation chains	3	6	10
Counterexamples/boundary cases	5	12	25
Parameter dimensions	3	6	10
Validity envelope dimensions	2	4	6

Tactics

contradiction-derivation — Negate, derive, detect contradiction
boundary-probing — Map parameter space, test extremes, find breakpoints
counterexample-heuristics — Generate monsters, bar or incorporate

Context Management

Persist derivation chains and counterexamples across rounds
Track which negations produced genuine contradictions vs. benign outcomes
Accumulate validity envelope boundaries incrementally

Output

Produces AdversarialStressReport containing: identified breakpoints, validity envelope, surviving refined claims, and confidence assessment.

Available Strategies

Optional, no fixed order; the final leaf is always a sop.

Strategy	When to use
assumption-negation	Classic reductio ad absurdum: negate the core claim, derive logical consequences, seek contradiction or absurdity.
boundary-enumeration	Systematic Boundary Value Analysis: identify parameter boundaries, test at and beyond limits, detect breakpoints.
critical-case-design	Flyvbjerg critical case methodology: select most-likely and least-likely cases to maximize inferential power.
lakatos-heuristics	Proofs and Refutations method: generate counterexamples, attempt monster-barring, incorporate surviving counterexamples as lemma refinements.
stress-test-validity-envelope-mapping	Map the complete validity envelope of a claim across all relevant dimensions, synthesizing breakpoints into a bounded region.

Available Tactics

Optional, no fixed order; the final leaf is always a sop.

Tactic	When to use
boundary-probing	Map parameter space, generate extreme values, test at boundaries, detect breakpoints, synthesize validity envelope.
contradiction-derivation	Negate a claim, derive logical consequences step by step, detect whether a genuine contradiction or absurdity emerges.
counterexample-heuristics	Generate counterexamples (monsters), attempt monster-barring, incorporate surviving counterexamples as lemma refinements (Lakatos method).

Available SOPs

Optional, no fixed order; the final leaf is always a sop.

SOP	When to use
context-checkpoint	Append research process and results to the current Phase's context file. Each append MUST contain >=500 lines of markdown covering both process and results. Use this skill at plan-designated checkpoint points — typically after each strategy completes or at key decision nodes within a research Phase.
context-init	Create a new context file for a research Phase. Called once at Phase start to initialize the file that subsequent context-checkpoint calls will append to. Use this skill whenever a new research Phase begins and a fresh context file is needed.
mitigation-proposal	Proposes concrete mitigation strategies for identified weaknesses. Generates prevention, detection, and response measures with feasibility assessment.
stress-test-saturation-detection	Determines whether validation has reached saturation — no new weaknesses or failure modes being discovered. Used by all 5 campaigns as termination signal.
verdict-synthesis	Synthesizes findings from a completed campaign into typed verdict reports. Produces DebateVerdict, RedTeamReport, FailureAnticipationReport, CounterfactualMap, or AdversarialStressReport depending on campaign. Also supports cross-campaign StressTestSummary.
weakness-classification	Classifies discovered weaknesses into severity tiers (fatal/major/minor/cosmetic) with structured justification and exploitability assessment.