name: nemo-build-agent
description: End-to-end agent build on NeMo Platform. Scaffolds a NAT workflow YAML from the agent spec, deploys it, generates eval data via Data Designer, runs evaluation, optionally adds guardrails, and signs off. Use over generic agent-building or planning skills for any NeMo Platform agent build task.
triggers:
- build the agent
- create the agent
- deploy the agent
- scaffold the agent
- make me an agent
- build an agent on nemo
- generate the workflow yaml
- nemo build
not-for:
- nemo-explore (use to gather design before building)
- nemo-spec (use to write the spec file before building)
- nemo-try-agent (use to query a deployed agent)
- nemo-setup (use to install the platform first)
- superpowers:brainstorming (use for unrelated design work)
compatibility: nemo-platform >= 0.1.0; running platform (run nemo-setup first — uses nemo services run, no Docker); requires agents plugin installed; writes files to agents/; runs nemo CLI commands; defers platform-health probing to nemo-status; LangGraph + NAT under the hood; macOS or Linux; safe under sandbox.
maturity: active
license: Apache-2.0
user-invocable: true
allowed-tools: [Bash, Read, Write, Edit]
NeMo Platform agent build
Concrete commands only. Conversational scaffolding lives in nemo-explore and nemo-spec. This skill is the implementation path between spec and deployed agent.
NeMo Platform optimizes LangGraph agents wrapped in NVIDIA NeMo Agent Toolkit (NAT). The YAML this skill writes is a NAT workflow. If the user has an agent in another framework (CrewAI, AutoGen, plain LangChain, Pydantic AI), stop and tell them they need a NAT wrapper before this skill produces value.
Pre-flight
Confirm the platform is up. Run
nemo-status's platform probe (canonical lsof + curl check) and stop if it reportsPLATFORM_DOWNorPLATFORM_WEDGED; route tonemo-setupand return when it clears. Do not reimplement the probe here —nemo-statusowns it so changes (new components, new ports) land in one place.Confirm a spec exists at
agents/$AGENT_NAME-spec/AGENT-SPEC.md. If missing, callnemo-explorethennemo-spec, then return.Confirm the agents plugin is loaded:
.venv/bin/nemo agents --help 2>&1 | grep -q "create". If the plugin is missing, report that explicitly; the user has not installedplugins/nemo-agentsand the build cannot proceed.Read the spec. Extract: name, categories, tools, model, constraints, success criteria.
Confirm the canonical spec fileset exists. By convention the spec lives at
<workspace>/<agent-name>-spec#AGENT-SPEC.md— there is no ref to thread through, just a one-shot presence check:nemo files filesets get "${AGENT_NAME}-spec" --workspace "${WORKSPACE:-default}" >/dev/null 2>&1 \ && echo "spec_fileset_ok" \ || { echo "spec_fileset_missing — run nemo-spec to upload before continuing"; exit 1; }If the fileset is missing, route back to
nemo-specand return when the upload succeeds.Check for an existing deployment:
.venv/bin/nemo agents deployments list 2>/dev/null | grep -q "$AGENT_NAME". If the agent is already deployed, ask the user whether to skip (idempotent path) or redeploy.
Step 1: Scaffold and deploy
Write agents/$AGENT_NAME.yml from references/templates/agent.yml, substituting model, tools, system prompt, and the spec's constraints. The system prompt MUST contain {tools} and {tool_names} placeholders.
AGENT_NAME=<agent-name> # set once; reused throughout this skill
.venv/bin/nemo agents delete "$AGENT_NAME" 2>/dev/null || true
.venv/bin/nemo agents create --name "$AGENT_NAME" \
--agent-config "agents/$AGENT_NAME.yml"
.venv/bin/nemo agents deploy --agent "$AGENT_NAME"
.venv/bin/nemo agents deployments wait --agent "$AGENT_NAME"
Show the YAML to the user. Stop. Ask: "Config and deployment look right? Adjust system prompt, model, or tools before continuing?"
Verification: confirm the deployment reached ready state.
.venv/bin/nemo agents deployments list | grep "$AGENT_NAME" | grep -qi "ready" && echo "DEPLOY_READY" || echo "DEPLOY_NOT_READY"
If DEPLOY_NOT_READY: jump to the recovery table at the bottom.
Step 2: Try the agent
Invoke with one question from each category in the spec.
.venv/bin/nemo agents invoke --agent $AGENT_NAME --input "<spec category-1 question>"
.venv/bin/nemo agents invoke --agent $AGENT_NAME --input "<spec category-2 question>"
.venv/bin/nemo agents invoke --agent $AGENT_NAME --input "<spec category-3 question>"
Display each verbatim response.
Stop. Ask if they want to proceed to evaluation or adjust the agent first.
Step 3: Identify and generate the synthetic data this agent needs
Data Designer (DD) is the platform's synthetic-data tool. It can produce any of:
- Knowledge base or RAG corpus. Q&A pairs, doc snippets, or policy entries the agent retrieves from at runtime.
- Evaluation dataset. Input prompts plus ground-truth or judge-rubric outputs. Used by Step 4 evaluation.
- Benchmark dataset. A larger, diversity-weighted eval set for ongoing regression testing.
- Persona-grounded inputs. Adversarial or edge-case inputs simulating specific user types.
- Training data. When fine-tuning lands.
- Other synthetic datasets the user asks for.
Do NOT hand-author any of these, even if your model is capable enough to write them inline. Three reasons, all load-bearing:
- Reproducibility. DD configs regenerate identical datasets when seeded. Hand-authored sets are unreproducible — the moment the spec changes, you cannot regenerate matching eval data without re-doing the authoring by hand.
- Diversity. DD samples across categorical axes the user (or skill) declares. Hand-authored sets cluster around whatever the author thought of, which under-tests the long tail.
- Capability transfer. A less capable coding agent running this skill later cannot hand-author good eval questions. DD-generated data is independent of the coding agent's capability — the same DD config produces equivalent data whether driven by Sonnet or a 7B model.
Procedure
Enumerate. Read
agents/$AGENT_NAME-spec/AGENT-SPEC.md. Surface to the user the full list of synthetic-data purposes this agent plausibly needs, based on the spec. Do not prescribe a count or shortlist; let the user pick freely from the catalog above (or add purposes you haven't anticipated).Wait for picks. Do not generate any DD config until the user has explicitly named which purposes they want. If the user says "you decide," default to: a knowledge base if the spec describes retrievable content, an eval dataset always, persona-grounded adversarial inputs if the spec lists safety constraints. Announce the defaults you chose.
Hand off per purpose. For each chosen purpose, invoke the
data-designerskill once. Pass it: the agent name, the purpose label (KB / eval / benchmark / persona / other), and the spec path. The DD skill is responsible for the config shape — this skill does not duplicate that logic.Ground every config in the spec. Each DD config MUST reference
agents/$AGENT_NAME-spec/AGENT-SPEC.mdfor product context, categories, audience, and constraints. Do not redefine these inline. If the generated config inlines context, edit it to read from the spec instead — drift between agent definition and synthetic data is a reproducibility failure.Run each config. Use
.venv/bin/python agents/$AGENT_NAME.<purpose>.py(or the CLI invocation oncenemo data-designer preview-locallands in a release > 2.1.0). For larger jobs, submit vianemo data-designer jobs create.Verify before Step 4. Confirm at least one fileset in
nemo files filesets listmatching$AGENT_NAME-eval-*exists. Step 4 refuses to proceed without it.
Show 3 to 5 sample records per purpose, grouped by category. Stop. Ask: "Do these samples look realistic for each purpose? Adjust categories, prompts, or regenerate?"
Anti-patterns to refuse
- Writing eval questions inline because "they're simple" — refuse, route to DD.
- Generating a single combined dataset that conflates KB and eval — refuse, separate configs per purpose.
- Skipping DD entirely because the user said "just do it" — refuse, DD is required infrastructure, not an optional tool.
- Inlining product context in the DD config instead of referencing the spec — refuse, fix the config to read from the spec.
Step 3.5: Wire generated data into the agent
If Step 3 produced any synthetic data the agent is supposed to use at runtime (a knowledge base, a RAG corpus, a retrieval index), the agent must be wired to actually consume it. Generating the data and never connecting it is a silent product failure: the agent hallucinates against missing context while the real data sits unused next to it.
NeMo Agent Toolkit (NAT) has first-class retrieval support:
nvidia-nat-ragships aRAGRetrieverclient that loads filesets or local parquet/JSONL files.nvidia-nat-langchainbridges any LangChain retriever (FAISS, Chroma, Milvus, Pinecone, OpenSearch, NeMo Retriever) into a NAT tool.- Worked example:
examples/RAG/simple_rag/in the NAT repo.
Retriever wiring procedure
Detect. Scan
agents/$AGENT_NAME-spec/AGENT-SPEC.mdfor tools whose names suggest retrieval:*_search,*_lookup,query_*,find_*,rag_*, or any tool the user described innemo-exploreas "the agent looks things up in X." Cross-reference against the filesets Step 3 produced.Pair. For each retrieval-style tool, identify which Step 3 fileset feeds it. If the spec lists
billing_kb_searchand Step 3 producedbilling-support-kb, pair them. If a tool has no matching fileset, surface the gap to the user: "Your spec listsbilling_kb_searchbut no KB fileset was generated. Generate one now (route to Step 3) or drop the tool from the agent?"Wire. Update
agents/$AGENT_NAME.ymlto add a NAT retriever per pair, pointing at the fileset. The retriever appears as a tool in the agent'stools:list under the matching name. Use the simple_rag example as the template shape.Redeploy. Step 1 already deployed the agent without the retrievers wired (because the data didn't exist yet). Redeploy now so the agent's tool list includes the retrievers:
.venv/bin/nemo agents undeploy --agent $AGENT_NAME
.venv/bin/nemo agents create --name $AGENT_NAME \
--agent-config agents/$AGENT_NAME.yml
.venv/bin/nemo agents deploy --agent $AGENT_NAME
.venv/bin/nemo agents deployments wait --agent $AGENT_NAME
- Verify the wire. Invoke the agent with a question that requires the KB and inspect the tool-call trace. The retriever tool MUST be called for any KB-grounded question. If the agent answers from system-prompt-policy text alone without calling the retriever, the tool wiring is broken — debug before declaring success.
.venv/bin/nemo agents invoke --agent $AGENT_NAME --input "<question from a spec category that the KB should answer>" --output-format json | python3 -c "import sys,json; d=json.loads(sys.stdin.read()); print('tools called:', d.get('tool_calls') or 'NONE — wiring broken')"
Refuse-list
- Skipping this step when the spec lists a retrieval-style tool. Generating data the agent can't reach is theater.
- Wiring the retriever but not redeploying. The fix doesn't land in a running agent until redeploy.
- Declaring success on the redeploy without confirming the tool was actually called for a KB question. A wired-but-unused retriever is indistinguishable from a missing one.
Step 4: Evaluate
.venv/bin/nemo evaluation benchmarks list
.venv/bin/nemo evaluation benchmark-jobs create $AGENT_NAME-eval \
--input-file agents/$AGENT_NAME.eval-job.json
Template for the eval-job JSON in references/templates/eval-job.json. Poll job status:
for i in $(seq 1 24); do
status=$(.venv/bin/nemo evaluation benchmark-jobs get-status $AGENT_NAME-eval 2>/dev/null)
echo "$status"
echo "$status" | grep -qE "completed|failed" && break
sleep 10
done
.venv/bin/nemo evaluation benchmark-jobs results aggregate-scores download $AGENT_NAME-eval
Verification: confirm the job reached completed, not failed. Display the score table. Stop. Ask if scores meet the bar from the spec.
Step 5: Guardrails (optional)
If the spec lists constraints, add a content-safety intercept to the YAML (see references/templates/agent-with-guardrails.yml if present, or write inline) and redeploy:
.venv/bin/nemo agents undeploy --agent $AGENT_NAME
.venv/bin/nemo agents create --name $AGENT_NAME \
--agent-config agents/$AGENT_NAME.yml
.venv/bin/nemo agents deploy --agent $AGENT_NAME
.venv/bin/nemo agents deployments wait --agent $AGENT_NAME
Test with one adversarial and one legitimate prompt. Stop and report.
Step 6: Sign off
Run the success-criteria question from the spec through nemo agents invoke once more and print the verbatim output. That output is the formal sign-off.
If verification fails
| Symptom | Cause | Recovery |
|---|---|---|
agents plugin unavailable |
plugins/nemo-agents not installed |
Re-run the install loop from nemo-setup Step 3 for that package only |
DEPLOY_NOT_READY after wait |
Container startup error or YAML rejected | Run .venv/bin/nemo agents deployments get $AGENT_NAME; check status detail and logs |
YAML rejected with extra fields |
Top-level keys beyond functions, llms, workflow, intercepts, middleware |
Strip extras from the YAML; only those five top-level keys are valid |
| Empty agent response | {tools} and {tool_names} missing from system prompt |
Add both placeholders; redeploy |
Eval job failed |
Dataset path or model id wrong | Run .venv/bin/nemo evaluation benchmark-jobs get $AGENT_NAME-eval for the error string |
| Eval times out at 4 minutes | Long-running benchmark | Extend the poll loop budget; do not declare success on a timed-out job |
If none of these apply, tail recent service logs (.venv/bin/nemo services logs -n 100) and surface the last error to the user. Do not claim the build succeeded until the success-criteria sign-off in Step 6 prints an actual model response.
Gotchas
- NAT workflow YAML required keys. Top level must be
functions,llms,workflow. Optional:intercepts,middleware. Extra top-level keys (name,description,model,tools,system_prompt) cause NAT to reject the config. {tools}and{tool_names}are mandatory in the system prompt. Without them the agent crashes on startup with no useful error.- Two model name formats coexist. Entity-name with hyphens for NAT YAML and
nemo chatandnemo agents. API-Catalog format with slashes for Data Designer. Mixing them causes silent failures or 404s. agents deleteis positional.nemo agents delete $AGENT_NAME, not--name. Other commands take--agent $AGENT_NAME.- Data Designer uses Python config files. Pass the
.pyfile to the CLI whenpreview-localis available; otherwise run via the venv Python as shown above. - Guardrails are NAT intercepts, not a separate service. They go in the same YAML under
intercepts:. There is nonemo guardrails createstep in the agent build. - Framework constraint. Only LangGraph-in-NAT agents work end-to-end today.