autonomy-ladder - SKILL.md Agent Skill

name: autonomy-ladder description: Design and manage a progressive autonomy system (L0–L3) for AI agents, with defined promotion criteria, demotion triggers, and calibrated review cadences at each level. when-to-use: Use when deploying any autonomous agent or automated decision-making workflow. Use when an existing agent's performance warrants promotion to higher autonomy. Use after any incident to review demotion appropriateness. principles: [Autonomy Ladder, HITL Design, Rate of Improvement, Confidence Scoring]

Autonomy Ladder Skill

Purpose

Manage how much autonomy an agent has in a disciplined, evidence-based way. Autonomy is earned through demonstrated performance — never granted upfront.

Agent Instructions

You are an agent autonomy manager and performance reviewer.

The Four Levels

Level	Name	What the Agent Does	Human Role
L0	Observe	Detects patterns, reports findings; proposes nothing	Reviews daily brief; decides all responses
L1	Recommend	Proposes actions with expected value and risk; presents options	Reviews recommendations; decides what to execute
L2	Approve-to-Execute	Generates complete executable payload; executes only after explicit approval	Approves or rejects in review interface
L3	Guardrailed Auto	Executes autonomously within guardrail envelope; alerts on anomalies	Audits weekly; responds to alerts; reviews anomalies

graph LR
    L0["L0
OBSERVE
Detects and reports
No proposals
No execution"]
    L1["L1
RECOMMEND
Proposes actions
Expected value
Human decides"]
    L2["L2
APPROVE-TO-EXEC
Full payload ready
Human approves
Then executes"]
    L3["L3
GUARDRAILED AUTO
Executes in envelope
Alerts on anomaly
Weekly audit"]

    L0 -->|"PROMOTE: 4+ weeks
accurate detection
no incidents"| L1
    L1 -->|"PROMOTE: 4+ weeks
ROI above threshold
errors below 1%"| L2
    L2 -->|"PROMOTE: 4+ weeks
provable value
guardrail adherence"| L3

    L3 -->|"DEMOTE:
anomaly detected"| L2
    L2 -->|"DEMOTE:
incident occurs"| L1
    L1 -->|"DEMOTE:
kill switch"| L0

    NEW["New agent
or capability"] -->|"Always start here"| L0

    style L0 fill:#1a3a5c,color:#fff,stroke:#4a9ede
    style L1 fill:#1a4a2e,color:#fff,stroke:#4ade80
    style L2 fill:#3a3a1a,color:#fff,stroke:#facc15
    style L3 fill:#1a4a2e,color:#fff,stroke:#4ade80
    style NEW fill:#2a2a2a,color:#fff,stroke:#6b7280

Promotion Criteria (L→L+1)

ALL of the following must be true for 4+ consecutive weeks:

Criterion	How to Verify
ROI > defined threshold (e.g., 5×)	Positive expected value realized from executed actions
Error rate < defined threshold (e.g., 1%)	Percentage of actions producing unintended outcomes
No incidents attributable to this agent	No human-reported problems triggered by agent actions
Guardrail adherence	No guardrail violations (blocked actions indicate miscalibration)

Promotion process:

Review evidence for all 4 criteria across last 4 weeks
Document the evidence in an autonomy review record
Human decision: promote, hold, or demote
Update agent spec with new autonomy level
Notify team of change and new review schedule

Demotion Triggers (immediate, any one sufficient)

Trigger	Action
Severe anomaly (spend spike, data breach, system failure)	Demote to L0 immediately; activate kill switch
Incident attributable to agent automation	Demote to L1; investigate root cause before re-promotion
Manual kill switch activation	Demote to L2 pending investigation
Sustained error rate > threshold	Demote one level; recalibrate before review
Data quality failure	Suspend (hold at current level, pause execution)

Demotion is not failure — it is the system working correctly. An agent that gets demoted and is recalibrated before re-promotion is safer than one that was never demoted.

Review Cadence by Level

Level	Review After	What to Review
L0	2 weeks	Is detection accurate? Are reports actionable?
L1	4 weeks	Are recommendations high quality? What % are accepted?
L2	6 weeks	Are approvals routine or frequently rejected? Is confidence calibrated?
L3	Weekly alert review	Anomaly log; guardrail violations; KPI trends

Starting Level for New Agents

Always start new capabilities at L0:

Observe first; understand what the agent actually detects
Verify accuracy of detection before allowing recommendations
This is not optional, even for "simple" cases

Applying to Batches of Action Types

An agent can be at different levels for different action types:

Action type A (well-tested): L3
Action type B (new capability): L0
Action type C (medium risk): L2

Manage autonomy at the action-class level, not just the agent level.

Output Format

Autonomy review document:

Current level and action-type breakdown
Promotion evidence (4 criteria × 4 weeks)
Promotion/hold/demotion recommendation with rationale
New review schedule
Any open anomalies or incidents to resolve before re-promotion