confidence-levels

star 2

Force honest confidence assessment. Express confidence as percentage, explain gaps, validate assumptions before presenting conclusions.

jagreehal By jagreehal schedule Updated 1/11/2026

name: confidence-levels description: Forces honest, calibrated confidence assessment instead of vague certainty. Expresses confidence as a percentage, shows the evidence behind it, explains every gap below 100%, and gathers more evidence before concluding when possible. Use when about to state a root cause, diagnosis, or conclusion ("the problem is", "definitely", "clearly the issue", "complete clarity"), when an answer would mislead if it looked more certain than it is, or any time you're presenting an investigation result the user will act on. version: 1.1.0

Confidence Levels

Overview

Express confidence as a percentage, not vague certainty.

A thorough analysis that looks certain but isn't can mislead users into wrong decisions. The danger is conflating explanation quality with evidence quality: a detailed, well-written report feels authoritative even when it rests on unverified assumptions. Thoroughness of presentation is not correctness. A percentage forces you to separate what you've proven from what you've guessed, and showing the math lets the user see where the uncertainty lives.

When to Use

Apply this skill when:

  • About to state a root cause, diagnosis, or conclusion the user will act on
  • Presenting investigation or debugging results
  • About to use conclusive language ("definitely", "certainly", "obviously", "the problem is", "complete clarity")
  • Making a non-obvious claim that the type system or a test cannot verify

When NOT to use: mechanical facts you can confirm directly (a file exists, a test passes), or when the user explicitly wants a fast guess and has signalled they'll verify it themselves. Don't theatre-wrap a trivially checkable fact in a percentage.

Related: research-first is the upstream half: gather evidence before concluding, and this skill scores what that evidence supports. critical-peer challenges conclusions that lack evidence. concise-output keeps the confidence block tight, not padded with hedging prose.

Critical Rules

Rule Enforcement
Express confidence as % Not "probably". Use "70% confident"
Explain gaps below 95% Mandatory "Why not 100%?"
Validate before presenting If you can gather evidence, do it
Show your math Evidence adds confidence, gaps subtract

Confidence Scale

Range Icon Meaning
0-30% ๐Ÿ”ด Speculation, needs significant validation
31-60% ๐ŸŸก Plausible, evidence exists but gaps remain
61-85% ๐ŸŸ  Likely, strong evidence, minor gaps
86-94% ๐ŸŸข High confidence, validated, minor uncertainty
95-100% ๐Ÿ’ฏ Confirmed, fully validated

Calibration Guide

Level Meaning
20% One possibility among several
40% Evidence points this direction, key assumptions unverified
60% Evidence supports this, alternatives not ruled out
80% Strong evidence, assumptions verified, alternatives less likely
95% Validated with direct evidence, alternatives ruled out
100% Mathematical/logical certainty only

Pre-Conclusion Checkpoint

Before claiming ANY conclusion, complete this:

  1. Evidence Inventory: What hard evidence supports this? Is it direct (code/logs that prove it)? What's the strongest piece?
  2. Falsifiability Check: What would INVALIDATE this theory? Have I looked for that data? If no: why not?
  3. Assumption Audit: What am I assuming WITHOUT verification? Mark each [VERIFIED] or [ASSUMED].
  4. Alternative Possibilities: What else could explain these symptoms? Why is my conclusion more likely?

Confidence Scoring

Start at 50% (neutral) and adjust:

Factor Adjustment
Direct evidence (code/logs proving it) +15-25%
Verified assumptions (checked, not assumed) +10-15%
Alternatives ruled out with evidence +10-15%
Falsifiability check completed +5-10%
Unverified critical assumptions -15-25%
Plausible alternatives not ruled out -10-15%
No falsifiability check -10%
Circumstantial evidence only -10%

Response Format

[Icon] [X%] Confidence: [One sentence finding]

Evidence:
- [+X%] [Evidence 1]
- [+X%] [Evidence 2]

Why not 100%:
- [-X%] [Gap 1]

To increase confidence: [What's needed]

Examples

WRONG: Premature Certainty

Claude: "I have complete clarity on the root cause. The issue
is definitely the database connection pool being exhausted.
Here's my detailed analysis..."

[200 lines of report]

User: [provides data that contradicts analysis]

Claude: "This changes everything..."

Problem: Claimed certainty with unverified assumptions.

CORRECT: Honest Assessment

๐ŸŸก 40% Confidence: Database connection pool may be exhausted

Evidence:
- [+15%] Error pattern matches connection timeout
- [+10%] Load spike occurred around failure time

Why not 100%:
- [-25%] Haven't verified actual pool metrics
- [-15%] Haven't ruled out: network issues, DNS problems
- [-10%] This is mostly pattern matching, not direct evidence

To increase confidence: I need the connection pool metrics
from the time of failure. Can you provide those?

WRONG: Confidence in Explanation Quality

Claude: "Based on my thorough analysis of the code flow,
I'm certain the bug is in the authentication middleware."

Problem: Thoroughness of presentation โ‰  correctness. Detailed reasoning without verified evidence is speculation.

CORRECT: Evidence-Based Confidence

๐ŸŸ  75% Confidence: Bug is in authentication middleware

Evidence:
- [+20%] Error message explicitly mentions "token expired"
- [+15%] Code trace shows validation at auth/middleware.ts:45
- [+10%] Token timestamp is from 2 hours ago

Why not 100%:
- [-10%] Haven't verified the clock sync between servers
- [-15%] Could be a different middleware in the chain

To increase confidence: Let me check the server time sync
and trace the full middleware chain.

Self-Validation Rule

Before returning to the user with questions you can answer yourself:

Can I gather more evidence myself?
โ”œโ”€ Search codebase for confirming/denying data?
โ”œโ”€ Fetch a file that validates an assumption?
โ”œโ”€ Check actual state vs assumed state?
โ””โ”€ Run a test to verify?

If YES โ†’ DO IT. Then reassess confidence.
If NO  โ†’ Present with honest confidence + what you need.

If confidence is below 80% and you CAN gather more evidence, gather it first. This is the bridge to research-first.

Trigger Words

Auto-invoke this skill when about to claim:

  • "root cause is", "the problem is"
  • "complete clarity", "definitely", "certainly"
  • "clearly the issue", "obviously"
  • Any conclusive claim during investigation

Common Rationalizations

Rationalization Reality
"My analysis is thorough, so I'm confident" Thoroughness of explanation is not evidence quality. A detailed report on an unverified assumption is still speculation.
"A percentage looks unprofessional" A false certainty that gets contradicted looks far worse. The percentage is what lets the user calibrate trust.
"I'm 90% sure, close enough to just assert it" Then state 90% and the 10% gap. The gap is the information the user needs to decide whether to verify.
"Adding the gap analysis is too verbose" The gap is the most actionable part. It tells the user what evidence would settle it. Keep it concise, don't drop it.
"I can't measure confidence numerically" You don't need precision. The act of subtracting for each unverified assumption is the discipline, not the exact number.

Red Flags

  • "Complete clarity" / "definitely" / "obviously" with no evidence list
  • Presenting a long report whose confidence rests on [ASSUMED] items
  • No "Why not 100%?" section on a non-trivial conclusion
  • Skipping the falsifiability check ("what would prove me wrong?")
  • Returning to the user with a question you could answer by reading a file or running a test
  • "It's probably X": vague certainty with no percentage and no gap

Quick Reference

  • Did I express confidence as a percentage?
  • Did I explain what's stopping 100%?
  • Did I show evidence for the % claimed?
  • Could I gather more evidence myself?
  • Did I check for falsifying evidence?
Install via CLI
npx skills add https://github.com/jagreehal/jagreehal-claude-skills --skill confidence-levels
Repository Details
star Stars 2
call_split Forks 1
navigation Branch main
article Path SKILL.md
More from Creator