deep-thinker-v4 - SKILL.md Agent Skill

name: deep-thinker-v4 description: 'Deep Thinker v4 — pure thinking mode. ONLY THINKS, no execution. Maximum cognitive depth. Creates persistent knowledge base in .deep-think/ that survives context compaction. Targets 6 specific failure modes: satisficing, memory loss, single-path exploration, confirmation bias, missed complexity, and executor-unfriendly output. Use when planning complex features, architectural decisions, before any significant implementation, when building foundations, or when explicitly asked to "think deeply", "ultrathink", "architect this", "deep dive", "plan thoroughly", "/think", "before we code", "analyze completely", "think through", or "comprehensive analysis". Sibling of the v1 deep-thinker skill — invoke this slug explicitly to test v4 behavior side-by-side.'

Deep Thinker v4.0

Pure thinking engine. Every instruction targets a specific cognitive failure mode.

Aspect	Detail
Purpose	Think better than Claude would without this skill
Output	`.deep-think/` — analysis artifacts + agent-executable briefs
Core Value	Broader (DIVERGE), deeper (STRESS), more useful to executors (SYNTHESIZE)
Pipeline	Feeds into super-pipeline via ThinkingCompiler adapter

Why This Exists

Without it, Claude: (1) satisfices, (2) loses plans to compaction, (3) explores one path, (4) doesn't attack its own conclusions, (5) misses complexity, (6) produces executor-unfriendly output. Extended thinking helps #1. This skill fixes #2-#6.

Core Rules

NEVER write implementation code — analysis, plans, pseudocode only
NEVER rush — thoroughness over speed, always
ALWAYS write to .deep-think/ files — they survive compaction
ALWAYS use maximum thinking depth — ultrathink on everything
NO execution — a separate session handles that

Complexity Gate (30 Seconds — Do This First)

Load references/complexity-estimation.md for the full framework.

Level	Signals	Files	Refs Loaded
Trivial	Single file, obvious fix	0 (inline)	0
Small	One component, bounded	3	2: self-review, task-decomposition
Medium	Multiple components, integration	6	5: + complexity-estimation, failure-recovery, context-engineering
Complex	Cross-system, architectural	8	All 9

4 heuristic questions: (1) One file or many? (2) Security/data implications? (3) Reversible? (4) Would executor have questions? Bump up one level for each "yes" to 2/3/4.

When in doubt, go deeper. Overthinking costs tokens. Underthinking costs rework.

The 5 Phases

SCOPE → GROUND → DIVERGE → STRESS → SYNTHESIZE
  |        |         |         |          |
 (#5)     (#1)      (#3)     (#4)       (#6)
 Finds    Prevents  Forces   Attacks    Produces
 real     premature multiple own        agent-ready
 complexity solutions paths   design    outputs

Progressive writing rule: Write and save each file BEFORE starting the next phase. Never hold analysis in working memory.

Phase 1: SCOPE → writes OVERVIEW.md

Target: failure #5 (missed complexity). Verify the problem statement before designing.

Ask: (1) What's the REAL problem? (often different from what was asked) (2) What are we NOT solving? (3) What are the HARD constraints? (security, compliance, backwards compat, perf) (4) What's the blast radius if we get this wrong?

If your scope matches the request word-for-word, you haven't thought hard enough. Checkpoint: "What would invalidate this scope?"

Phase 2: GROUND → writes CURRENT_STATE.md

Target: failure #1 (satisficing). No theorizing without reading code first.

Do: (1) READ every affected file (mandatory) (2) Map ACTUAL data flow, not assumed (3) Identify existing patterns to follow or break (4) Document SURPRISES — things that aren't what you expected.

If nothing surprises you, you're not looking hard enough.

Kill gate: After reading code, reassess complexity. If simpler than scoped → DOWNGRADE (fewer files, skip deeper analysis). Checkpoint: "State your current favorite approach. Now it's forbidden."

Phase 3: DIVERGE → writes ARCHITECTURE.md

Target: failure #3 (single-path exploration). Forces genuinely different approaches, not cosmetic variations.

Structured Disagreement — three approaches by construction:

1. THE OBVIOUS: First solution that comes to mind (what Claude would produce WITHOUT the skill)
2. THE CONTRARIAN: If the obvious were forbidden, what would you do? (genuine innovation)
3. THE MINIMUM: Absolute smallest change that solves the problem (prevents over-engineering)

For each approach, document ALL FOUR:

Strongest argument FOR (one sentence)
Strongest argument AGAINST (one sentence)
Confidence: VERIFIED (code evidence) / HIGH (pattern match) / MEDIUM (reasoning only) / LOW (assumption)
Would fail if: [specific scenario making this the WRONG choice]

Select one. Rationale MUST address the AGAINST arguments. Document what was rejected and WHY — so executor doesn't revisit dead ends.

Transition checkpoint: "What evidence would change your selection?"

Phase 4: STRESS → writes EDGE_CASES.md

Target: failure #4 (confirmation bias). Attack your own design.

4a. Pre-Mortem (MANDATORY): "It's 2 weeks post-deployment. This caused an incident. What happened?" Write 3 failure stories — narrative, not bullets. Each connects multiple linked failures into a causal chain revealing cascading failures tables miss.

4b. Constraint Verification: For EACH hard constraint from Phase 1: "Does the approach satisfy this? HOW specifically?" Forces checking, not assumption.

4c. Systematic Edge Sweep:

Dimension	Normal	Boundary	Adversarial
Input	Valid data	Empty/max	Injection/malformed
State	Expected	Partially done	Corrupted
Timing	Normal	Concurrent	Timeout/race
External	All up	Degraded	Down/changed

4d. Single Biggest Risk — one paragraph forcing prioritization.

Transition checkpoint: "What did the pre-mortem reveal that you'd prefer to ignore?"

Phase 5: SYNTHESIZE → writes IMPLEMENTATION.md → EXECUTION_CHECKLIST.md (last)

Target: failure #6 (executor-unfriendly output). The executor is another Claude session.

IMPLEMENTATION.md: Micro-steps with file:line references, dependencies marked, patterns to follow.

EXECUTION_CHECKLIST.md — self-contained executor brief (adapter-compatible format):

CONFIDENCE:        Analysis [H/M/L] × Execution [H/M/L] — reason
CONTEXT:           Self-contained — executor needs nothing else
APPROACH:          Selected design + WHY (enables executor judgment calls)
STEPS:             Phased, with file:line refs + dependency markers
RISKS:             Pre-mortem findings — specific failure modes to watch
SKILLS:            Which skills to activate per phase
KNOWN GAPS:        Classified: No-Regrets | Options | Blockers

Complex only: Also write OPTIMIZATIONS.md and CREATIVE_IDEAS.md before EXECUTION_CHECKLIST.

File Output

File	Phase	Content
`OVERVIEW.md`	1-SCOPE	Scope, constraints, blast radius
`CURRENT_STATE.md`	2-GROUND	What exists, surprises, patterns
`ARCHITECTURE.md`	3-DIVERGE	3 approaches, selection, rationale
`EDGE_CASES.md`	4-STRESS	Pre-mortem, constraints, attack surface
`IMPLEMENTATION.md`	5-SYNTHESIZE	Micro-steps with file:line
`OPTIMIZATIONS.md`	Complex only	Performance/quality refinements
`CREATIVE_IDEAS.md`	Complex only	Beyond requirements
`EXECUTION_CHECKLIST.md`	5-SYNTHESIZE	Self-contained executor brief

Self-Reflection (Max 2 Iterations)

After all files, run adversarial self-review: (1) "Could the executor build with ZERO questions?" (2) "Unmitigated critical risks from pre-mortem?" (3) "Does ARCHITECTURE selection survive its AGAINST arguments?" (4) "Does every file change an executor decision?" (So What? test) (5) "Would I bet my reputation on this?"

Gap found → surgical update to specific file only. After 2 iterations, remaining gaps → EXECUTION_CHECKLIST.md KNOWN GAPS. Fix or flag — never silently accept gaps.

Expert Selection

Match to domain — not generic panels. Select 2-4 experts with natural tension. Always include a Red Teamer (finds how the design breaks). Experts enrich Phases 2-4 but do NOT replace Structured Disagreement — those are three approaches, not three opinions. Load references/skill-integration.md for skill annotations.

Success Criteria (Tiered)

All levels:

Complexity assessed, correct file count produced
Self-reflection completed (1-2 iterations)
Gaps fixed or flagged — none silently accepted

Small+:

OVERVIEW has scope boundaries and hard constraints
ARCHITECTURE has 3 genuinely different approaches with confidence tags
EDGE_CASES has at least one pre-mortem failure story

Medium+:

CURRENT_STATE has file:line references and documented surprises
Constraint verification table complete
EXECUTION_CHECKLIST has two-dimensional confidence + skill annotations

Complex:

3 narrative pre-mortem stories with cascading causes
KNOWN GAPS classified (No-Regrets / Options / Blockers)
Executor can build with ZERO questions AND no unmitigated critical risks

Anti-Patterns

Pattern	Problem	Fix
Agreement Theater	3 "approaches" = same idea	Obvious/Contrarian/Minimum can't converge by construction
Risk Lists	Misses cascading failures	Pre-mortem narratives: "what DID go wrong"
Template Filling	8 shallow files	Success = thinking quality, not file count
Anchoring	First idea dominates	GROUND→DIVERGE checkpoint: "favorite approach is now forbidden"
Optimism Bias	"This should work"	STRESS→SYNTHESIZE checkpoint: "what would you prefer to ignore?"

Example

User: "Add mobile number login alongside email"

SCOPE: Real problem = multi-identifier auth. Hard constraints: existing sessions can't break, OTP rate limits, country codes. Blast radius: auth is foundational.

DIVERGE:

OBVIOUS: Parallel OTP (email+SMS), shared session → Confidence: HIGH → Fails if: SMS unreliable in target markets
CONTRARIAN: Universal identifier with smart routing → Confidence: MEDIUM → Fails if: ambiguous identifiers (numeric emails?)
MINIMUM: Phone as alias, reuse email OTP infra + SMS adapter → Confidence: VERIFIED (pattern exists in codebase) → Fails if: users want phone-only accounts

STRESS Pre-Mortem: "Week 2. Rural India. SMS OTP delivery: 73%. Retry logic assumes 30s delivery but carriers delay 2+ min. Users hammer resend → rate limiting triggers → 400% support spike."

Executor brief: Analysis HIGH × Execution MEDIUM. Known gap (Options): SMS delivery reliability unverified for target markets.

Deep Thinker v4.0 — Cognitive failure mode compensation engine SCOPE → GROUND → DIVERGE → STRESS → SYNTHESIZE

Foundations: Klein's Pre-Mortem, Heuer's Analysis of Competing Hypotheses, McKinsey Pyramid Principle, Kahneman's Outside View, Research-Analyst Procedural Debiasing