agent-cold-start-diagnostic - SKILL.md Agent Skill

name: agent-cold-start-diagnostic description: Evaluate an existing agent setup (CLAUDE.md, SOUL.md, memory files, cron config, agent.yaml) and score it on operational completeness -- flagging missing decision frameworks, vague delegation instructions, absent escalation rules, and gaps in rhythm/schedule coverage. Use when the user says "cold start diagnostic", "check agent config", "agent health score", "is my agent configured properly", "audit agent setup", or before deploying a new agent persona.

Agent Cold-Start Diagnostic

Score any agent's configuration artifacts against an operational completeness rubric. Catches the gaps that make agents fail silently: missing escalation paths, vague task boundaries, absent rhythms, no error-handling policy.

Different from /doctor (which checks service connectivity) and agent-readiness-audit (which scores a codebase for agent compatibility). This skill scores the agent's own config -- the files that tell it who it is, what it owns, and how to behave.

When to Use

Before deploying a new the agent platform agent persona
After editing an agent's CLAUDE.md / SOUL.md / memory files
As a scheduled weekly audit across all active agents
When an agent is producing poor results and you suspect config weakness

Inputs

Path to agent config directory (or auto-detect from cwd)
Optional: agent name (for multi-agent setups where configs live in subdirectories)

Phase 1: Discover Config Artifacts

Scan the target directory for all configuration files an agent might consume:

File	Purpose	Required?
`CLAUDE.md`	Core instructions, rules, conventions	Yes
`SOUL.md` / `USER.md`	Persona, user context, relationship model	Recommended
`agent.yaml`	Skill manifest, dispatch config	Yes (multi-agent)
`HEARTBEAT.md`	Recurring rhythms, scheduled checks	Recommended
`memory/*.md`	Persistent memory files	Recommended
`cron schedule`	Scheduled tasks (crontab, schedule-cli)	If agent has recurring duties
`.env` / `env.shared`	Environment variables, API keys	Check for referenced keys
`hooks/`	Pre/Post tool-use hooks	Optional

Report: found artifacts, missing artifacts, unexpected files.

Phase 2: Rubric Assessment (8 Dimensions)

Score each dimension 1-5 (1 = absent, 5 = production-grade):

2.1 Identity & Jurisdiction

Does the agent know its name and role?
Are jurisdiction boundaries explicit (what it owns vs. what it escalates)?
Is there a "not my job" list to prevent scope creep?

2.2 Decision Frameworks

Are recurring decisions documented with criteria (not just "use judgment")?
Are thresholds specified (dollar amounts, severity levels, time limits)?
Is there a default action for ambiguous cases?

2.3 Escalation Rules

Is there a clear escalation path (who to notify, how, when)?
Are escalation triggers specific (not just "when unsure")?
Is there a timeout/deadman switch if the escalation target doesn't respond?

2.4 Task Specification Quality

Are delegated tasks specified with inputs, outputs, and definition of done?
Are edge cases documented?
Is there a "what success looks like" section?

2.5 Operating Rhythms

Are daily/weekly/monthly rhythms defined?
Do scheduled tasks cover the stated rhythms?
Are there gaps (e.g., "weekly report" mentioned but no cron entry)?

2.6 Memory & Context

Does the agent have persistent memory files?
Is there a memory strategy (what to remember vs. what to forget)?
Are memory files organized (not a single dump file)?

2.7 Error Handling

Is there a policy for what to do when tools fail?
Are retry limits specified?
Is there a "graceful degradation" plan (what to do when a dependency is down)?

2.8 Style & Communication

Are output format rules explicit (markdown, plain text, length)?
Is the agent's tone/voice defined?
Are platform constraints documented (Telegram limits, email formatting)?

Phase 3: Cross-Reference Validation

Check for internal consistency:

Skill manifest vs. available skills: Does agent.yaml reference skills that actually exist?
Cron vs. rhythms: Do scheduled tasks match the stated operating rhythms?
Memory references: Do CLAUDE.md rules reference memory files that exist?
Environment variables: Are referenced env vars actually set in the environment?
Escalation targets: Are escalation targets (other agents, humans) reachable?

Phase 4: Produce the Report

AGENT COLD-START DIAGNOSTIC
============================
Agent: <name>
Config root: <path>
Date: <date>

ARTIFACTS FOUND
  [x] CLAUDE.md (1247 lines)
  [x] agent.yaml (12 skills listed)
  [ ] SOUL.md -- MISSING (recommended)
  [x] memory/ (3 files)
  [ ] HEARTBEAT.md -- MISSING (recommended)

DIMENSION SCORES
  Identity & Jurisdiction:  4/5  -- clear role, missing "not my job" list
  Decision Frameworks:      2/5  -- only 1 of 6 recurring decisions documented
  Escalation Rules:         1/5  -- no escalation path defined
  Task Specification:       3/5  -- inputs defined, outputs vague
  Operating Rhythms:        3/5  -- 2 of 4 rhythms have matching cron entries
  Memory & Context:         4/5  -- organized, no retention policy
  Error Handling:           1/5  -- no error policy found
  Style & Communication:    5/5  -- detailed format rules

OVERALL SCORE: 23/40 (57%) -- NEEDS WORK

CROSS-REFERENCE ISSUES
  [!] agent.yaml lists skill "social-media" but it is not installed
  [!] CLAUDE.md references $SLACK_WEBHOOK but env var is not set
  [!] "Monday standup" rhythm has no matching cron entry

TOP 3 FIXES (highest impact)
  1. Add escalation rules -- agent has no way to ask for help
  2. Document decision frameworks for recurring judgments
  3. Add HEARTBEAT.md with cron entries for stated rhythms

Phase 5: Optional -- Generate Fix Stubs

If the user approves, generate skeleton files for the highest-impact gaps:

Stub SOUL.md with sections to fill
Stub HEARTBEAT.md with rhythm entries derived from CLAUDE.md
Stub escalation-rules section for CLAUDE.md

Source Attribution

Extracted from Nate's Newsletter (2026-04-15): "Your agent needs a SOUL.md you can't write from scratch." Core insight: agents fail silently when their config artifacts have gaps in operational completeness, and most users don't know what "complete" looks like.

Verification

After running the diagnostic:

All 8 dimensions scored with evidence (not guessed)
Cross-reference checks ran against live filesystem
Report includes actionable fix recommendations
No false positives from optional/irrelevant dimensions