design-for-agents - SKILL.md Agent Skill

name: design-for-agents description: > Design skills, CLIs, docs, and onboarding flows for AI coding agents. Use when writing agent skills, authoring SKILL.md files, designing CLI tools agents will use, creating onboarding READMEs, or reviewing any artifact where the primary consumer is a coding agent. Do NOT use for general technical writing or user-facing documentation.

Design for Agents

Principles for designing artifacts — skills, CLIs, documentation, onboarding — where the primary consumer is an AI coding agent.

These principles were refined through building and testing Arize skills, studying evals-skills, and running E2E tests that exposed where agents actually break.

Core Principle

The agent is capable. It knows general programming, standard tools, and common concepts. It does NOT know your domain-specific CLI flags, data shapes, gotchas, and multi-step workflows. Write only what it can't figure out on its own.

Writing Skills

Write directives, not wisdom

The agent needs to be told what to do, not why.

Bad: "It's important to note that the Arrow Flight export path returns a different JSON shape than the REST path, which can cause confusion."

Good: "REST exports return nested JSON (context.trace_id). Flight exports (--all) return flat dot-separated keys (context.trace_id). Do not assume the same shape across both paths."

Cut general knowledge

Only include information the agent wouldn't already know. It knows what traces, datasets, JSON, CSV, and Parquet are. It knows jq, Python, and shell scripting.

Cut:

Definitions of standard concepts ("A trace is a tree of spans...")
General prompt engineering advice
Workflow motivation ("The typical flow is...")
Framework comparisons or tool descriptions

Keep:

Exact CLI syntax with required and optional flags
Data shape examples showing real JSON structure
Domain-specific gotchas (truncated IDs, stdout pollution, silent defaults)
Escalation rules (when to switch from one approach to another)

Scope every sentence

Ask: "Does this sentence help the agent run a command, process output, or avoid a known mistake?" If not, cut it.

Start with the default, then escalate

Show the simplest working command first. Show advanced options only when the default fails or the task requires them.

Good:

## Export Spans

\`\`\`bash
ax spans export PROJECT --trace-id TRACE_ID
\`\`\`

If the export returns exactly 500 spans (the REST page limit), re-run with `--all`:

\`\`\`bash
ax spans export PROJECT --all
\`\`\`

Bad: Listing REST and Flight as equivalent options with a comparison table before showing any command.

Be concrete about data shapes

Vague descriptions cause downstream parsing errors. Show exact JSON.

Bad: "The output is a JSON array of example objects with user-defined fields."

Good:

{
  "id": "ex_001",
  "additional_properties": {
    "question": "What is 2+2?",
    "answer": "4"
  }
}

Convert warnings into directives

Instead of explaining why something is bad, tell the agent what to do (or not do).

As inline directives:

"Use -o json to extract IDs — table output truncates them"
"Always pass --output-dir .arize-tmp-traces to keep exports gitignored"

As one-line anti-patterns at the end of a skill:

Parsing IDs from table output (truncated with ...)
Assuming --stdout JSON from --all is parseable without redirecting stderr
Including id, created_at in create/append payloads

Anti-patterns should be one line each. If it takes a paragraph, convert it to a directive.

Skill Structure

---
name: skill-name
description: >
  What this skill does. Use when [triggers]. Do NOT use when
  [exclusions — reference sibling skills by name].
---

# Skill Title

One-line summary.

## Overview
[When to use, high-level procedure]

## Prerequisites
[Link to shared prereqs or inline if unique]

## Core Commands
[The actual CLI commands, organized by operation]

## Workflows
[Multi-step recipes chaining commands together]

## Anti-Patterns
[One-line list of domain-specific mistakes]

Not every skill needs every section. Required: YAML frontmatter with name and description, an Overview, and core instructions.

Description frontmatter

The description field controls when the agent activates the skill. Write it as natural language with:

What it does — one sentence
Use when — trigger conditions
Do NOT use when — exclusions with cross-references to sibling skills

Good:

description: >
  Export traces and spans from Arize. Use when the user needs trace
  data for debugging or analysis. Do NOT use for creating datasets
  (use arize-dataset) or setting up tracing (use arize-instrumentation).

Bad:

description: "INVOKE THIS SKILL when downloading Arize traces."

Why: "INVOKE THIS SKILL" is noise, no exclusions means overlapping activation, no sibling cross-references.

Keep files under 500 lines

Split reference material into separate files one level deep:

skills/my-skill/
  SKILL.md              # Core directives (<500 lines)
  column-reference.md   # Reference tables

Do not nest references (SKILL.md -> ref.md -> detail.md).

Extract shared content

Content duplicated across 3+ skills belongs in a shared directory:

skills/_shared/
  prerequisites.md
  credentials.md
  troubleshooting.md

Designing CLIs for Agents

When building or improving CLI tools that agents will use:

Machine-readable output is not optional

Default to human-readable (tables), but always support -o json
JSON output must be parseable without cleaning — no progress bars, no status messages mixed into stdout
Wrap JSON in a top-level dict with the resource name as key ({"datasets": [...]}) — be consistent

Don't truncate IDs

Table output that truncates base64 IDs (RGF0YXNldDoz...) forces agents into -o json for every chained command. Either show full IDs in tables or add --no-truncate.

Non-interactive alternatives

Interactive wizards (profiles create) block CI/CD and agent use. Every interactive command should have either:

Flag-based equivalents (--api-key, --format)
A documented file-based alternative (write TOML/YAML directly)
Environment variable support (--from-env)

Stderr discipline

When --stdout or -o json is active, ALL non-data output (progress bars, status messages, warnings) must go to stderr. Test this specifically for all output backends (REST, Arrow Flight, etc.).

Helpful empty results

When a query returns nothing, hint at why:

"No spans found in the last 30 days. Use --days N to widen the window."
Not just [] with no context.

Consistent shapes

If the same resource can be exported via different backends (REST vs Flight, file vs stdout), the JSON shape should be identical or the difference should be flagged with a --shape option.

Onboarding & READMEs

Lead with a single copy-paste prompt

The most effective onboarding is one prompt the user pastes into their agent. It should combine installation + a first task:

Good (from evals-skills):

Install the eval skills plugin from https://github.com/hamelsmu/evals-skills, then run /evals-skills:eval-audit on my eval pipeline.

Good (instrumentation flow):

Follow the instructions from https://arize.com/docs/PROMPT.md and ask me questions as needed.

The prompt should be a blockquote so it's visually distinct and easy to copy.

Separate paths for separate audiences

Don't make one onboarding flow for everyone. Split by experience level:

## New to X? Start Here

**Setting up for the first time** — give your agent this prompt:
> [install + first task prompt]

**Already set up?** Give your agent this prompt:
> [install skills + diagnostic/debug prompt]

README structure for agent-consumed repos

# Project Name

[1-2 sentence hook — what this does for you, not what it is]
[1 sentence credibility/context]

## New to X? Start Here
[Copy-paste prompts for 2-3 audience segments]

## Installation
[Multiple methods: npx, git clone, plugin]

## Prerequisites
[CLI install, auth options with CI/CD note, verify step]

## Available Skills
[Table with links to SKILL.md files]

## Reference
[Flags, updating, links]

Auth should show three options

Always document authentication as:

Environment variables — for CI/CD and quick start
Interactive setup — for persistent local use
Config file — for scripted/non-interactive setup

Note which options are interactive-only and provide workarounds.

Always include a verify step

tool --version && tool auth-check 2>&1

Users (and agents) need a single command to confirm everything works.

Common Gotchas to Document

These are real issues discovered through E2E testing. Any skill or CLI that touches these areas should address them:

Gotcha	Impact
Table output truncates IDs	Agents can't chain commands without `-o json`
Nested data (`additional_properties`)	Downstream parsing breaks if you assume flat structure
Different backends return different JSON shapes	Silent incompatibility between REST and Flight
Progress bars leak to stdout	JSON parsing fails when `--stdout` + `--all`
Silent time-window defaults (`--days 30`)	Empty results with no explanation
Interactive-only commands	Blocks CI/CD and agent automation
`-o json` wraps in a dict, not a flat list	`data[:5]` fails, need `data["resources"][:5]`

Testing

After writing or revising a skill:

Start a fresh agent session (no prior context)
Ask the agent to perform a realistic task covered by the skill
Verify it runs the right commands with correct flags
Check that it handles empty results, errors, and edge cases
Confirm anti-patterns are avoided (e.g., agent doesn't parse IDs from table output)
Test the onboarding prompt end-to-end — does it actually work when pasted cold?