name: confidence-levels description: Forces honest, calibrated confidence assessment instead of vague certainty. Expresses confidence as a percentage, shows the evidence behind it, explains every gap below 100%, and gathers more evidence before concluding when possible. Use when about to state a root cause, diagnosis, or conclusion ("the problem is", "definitely", "clearly the issue", "complete clarity"), when an answer would mislead if it looked more certain than it is, or any time you're presenting an investigation result the user will act on. version: 1.1.0
Confidence Levels
Overview
Express confidence as a percentage, not vague certainty.
A thorough analysis that looks certain but isn't can mislead users into wrong decisions. The danger is conflating explanation quality with evidence quality: a detailed, well-written report feels authoritative even when it rests on unverified assumptions. Thoroughness of presentation is not correctness. A percentage forces you to separate what you've proven from what you've guessed, and showing the math lets the user see where the uncertainty lives.
When to Use
Apply this skill when:
- About to state a root cause, diagnosis, or conclusion the user will act on
- Presenting investigation or debugging results
- About to use conclusive language ("definitely", "certainly", "obviously", "the problem is", "complete clarity")
- Making a non-obvious claim that the type system or a test cannot verify
When NOT to use: mechanical facts you can confirm directly (a file exists, a test passes), or when the user explicitly wants a fast guess and has signalled they'll verify it themselves. Don't theatre-wrap a trivially checkable fact in a percentage.
Related: research-first is the upstream half: gather evidence before concluding, and this skill scores what that evidence supports. critical-peer challenges conclusions that lack evidence. concise-output keeps the confidence block tight, not padded with hedging prose.
Critical Rules
| Rule | Enforcement |
|---|---|
| Express confidence as % | Not "probably". Use "70% confident" |
| Explain gaps below 95% | Mandatory "Why not 100%?" |
| Validate before presenting | If you can gather evidence, do it |
| Show your math | Evidence adds confidence, gaps subtract |
Confidence Scale
| Range | Icon | Meaning |
|---|---|---|
| 0-30% | ๐ด | Speculation, needs significant validation |
| 31-60% | ๐ก | Plausible, evidence exists but gaps remain |
| 61-85% | ๐ | Likely, strong evidence, minor gaps |
| 86-94% | ๐ข | High confidence, validated, minor uncertainty |
| 95-100% | ๐ฏ | Confirmed, fully validated |
Calibration Guide
| Level | Meaning |
|---|---|
| 20% | One possibility among several |
| 40% | Evidence points this direction, key assumptions unverified |
| 60% | Evidence supports this, alternatives not ruled out |
| 80% | Strong evidence, assumptions verified, alternatives less likely |
| 95% | Validated with direct evidence, alternatives ruled out |
| 100% | Mathematical/logical certainty only |
Pre-Conclusion Checkpoint
Before claiming ANY conclusion, complete this:
- Evidence Inventory: What hard evidence supports this? Is it direct (code/logs that prove it)? What's the strongest piece?
- Falsifiability Check: What would INVALIDATE this theory? Have I looked for that data? If no: why not?
- Assumption Audit: What am I assuming WITHOUT verification? Mark each
[VERIFIED]or[ASSUMED]. - Alternative Possibilities: What else could explain these symptoms? Why is my conclusion more likely?
Confidence Scoring
Start at 50% (neutral) and adjust:
| Factor | Adjustment |
|---|---|
| Direct evidence (code/logs proving it) | +15-25% |
| Verified assumptions (checked, not assumed) | +10-15% |
| Alternatives ruled out with evidence | +10-15% |
| Falsifiability check completed | +5-10% |
| Unverified critical assumptions | -15-25% |
| Plausible alternatives not ruled out | -10-15% |
| No falsifiability check | -10% |
| Circumstantial evidence only | -10% |
Response Format
[Icon] [X%] Confidence: [One sentence finding]
Evidence:
- [+X%] [Evidence 1]
- [+X%] [Evidence 2]
Why not 100%:
- [-X%] [Gap 1]
To increase confidence: [What's needed]
Examples
WRONG: Premature Certainty
Claude: "I have complete clarity on the root cause. The issue
is definitely the database connection pool being exhausted.
Here's my detailed analysis..."
[200 lines of report]
User: [provides data that contradicts analysis]
Claude: "This changes everything..."
Problem: Claimed certainty with unverified assumptions.
CORRECT: Honest Assessment
๐ก 40% Confidence: Database connection pool may be exhausted
Evidence:
- [+15%] Error pattern matches connection timeout
- [+10%] Load spike occurred around failure time
Why not 100%:
- [-25%] Haven't verified actual pool metrics
- [-15%] Haven't ruled out: network issues, DNS problems
- [-10%] This is mostly pattern matching, not direct evidence
To increase confidence: I need the connection pool metrics
from the time of failure. Can you provide those?
WRONG: Confidence in Explanation Quality
Claude: "Based on my thorough analysis of the code flow,
I'm certain the bug is in the authentication middleware."
Problem: Thoroughness of presentation โ correctness. Detailed reasoning without verified evidence is speculation.
CORRECT: Evidence-Based Confidence
๐ 75% Confidence: Bug is in authentication middleware
Evidence:
- [+20%] Error message explicitly mentions "token expired"
- [+15%] Code trace shows validation at auth/middleware.ts:45
- [+10%] Token timestamp is from 2 hours ago
Why not 100%:
- [-10%] Haven't verified the clock sync between servers
- [-15%] Could be a different middleware in the chain
To increase confidence: Let me check the server time sync
and trace the full middleware chain.
Self-Validation Rule
Before returning to the user with questions you can answer yourself:
Can I gather more evidence myself?
โโ Search codebase for confirming/denying data?
โโ Fetch a file that validates an assumption?
โโ Check actual state vs assumed state?
โโ Run a test to verify?
If YES โ DO IT. Then reassess confidence.
If NO โ Present with honest confidence + what you need.
If confidence is below 80% and you CAN gather more evidence, gather it first. This is the bridge to research-first.
Trigger Words
Auto-invoke this skill when about to claim:
- "root cause is", "the problem is"
- "complete clarity", "definitely", "certainly"
- "clearly the issue", "obviously"
- Any conclusive claim during investigation
Common Rationalizations
| Rationalization | Reality |
|---|---|
| "My analysis is thorough, so I'm confident" | Thoroughness of explanation is not evidence quality. A detailed report on an unverified assumption is still speculation. |
| "A percentage looks unprofessional" | A false certainty that gets contradicted looks far worse. The percentage is what lets the user calibrate trust. |
| "I'm 90% sure, close enough to just assert it" | Then state 90% and the 10% gap. The gap is the information the user needs to decide whether to verify. |
| "Adding the gap analysis is too verbose" | The gap is the most actionable part. It tells the user what evidence would settle it. Keep it concise, don't drop it. |
| "I can't measure confidence numerically" | You don't need precision. The act of subtracting for each unverified assumption is the discipline, not the exact number. |
Red Flags
- "Complete clarity" / "definitely" / "obviously" with no evidence list
- Presenting a long report whose confidence rests on
[ASSUMED]items - No "Why not 100%?" section on a non-trivial conclusion
- Skipping the falsifiability check ("what would prove me wrong?")
- Returning to the user with a question you could answer by reading a file or running a test
- "It's probably X": vague certainty with no percentage and no gap
Quick Reference
- Did I express confidence as a percentage?
- Did I explain what's stopping 100%?
- Did I show evidence for the % claimed?
- Could I gather more evidence myself?
- Did I check for falsifying evidence?