---
name: evaluate-embabel
description: Sets up evaluation of Embabel agents using Dokimos. Use this skill when the user wants to evaluate, test, or benchmark an Embabel agent, its tool calls, or its execution trace. Also use when the user mentions Embabel evaluation or integrating Dokimos with an Embabel project.
---
Evaluate Embabel
Set up Dokimos evaluation for an Embabel agent. The user will describe their agent and evaluation goals via $ARGUMENTS.
The `dokimos-embabel` module requires Java 21 or later because Embabel's published artifacts are built for Java 21. The rest of Dokimos stays on Java 17.
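Because of the Java 21 requirement, running an evaluation on an older JVM fails with an `UnsupportedClassVersionError` when Embabel classes load. A test harness can fail earlier with a clearer message. This is a minimal sketch; `JavaVersionGuard` and its methods are hypothetical names, and only `Runtime.version().feature()` is a standard JDK call (available since Java 10).

```java
// Hypothetical helper: fail fast when the JVM is older than the Java 21
// that Embabel's artifacts are built for.
final class JavaVersionGuard {

    static final int REQUIRED_FEATURE_VERSION = 21;

    /** Returns true when the given Java feature release can load Embabel classes. */
    static boolean isSupported(int featureVersion) {
        return featureVersion >= REQUIRED_FEATURE_VERSION;
    }

    /** Throws with a readable message instead of a later UnsupportedClassVersionError. */
    static void requireJava21() {
        int current = Runtime.version().feature();
        if (!isSupported(current)) {
            throw new IllegalStateException(
                    "dokimos-embabel needs Java " + REQUIRED_FEATURE_VERSION
                            + " or later, but this JVM is Java " + current);
        }
    }

    public static void main(String[] args) {
        requireJava21();
        System.out.println("Java " + Runtime.version().feature() + " can run Embabel evaluations");
    }
}
```

Calling `JavaVersionGuard.requireJava21()` from a test class's static initializer or `@BeforeAll` method surfaces the problem before any Embabel type is touched.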
Where things live
- Embabel support: `dokimos-embabel/src/main/java/dev/dokimos/embabel/EmbabelSupport.java`
- Trace collector: `dokimos-embabel/src/main/java/dev/dokimos/embabel/EmbabelTraceCollector.java`
- Maven dependency: `dev.dokimos:dokimos-embabel`
Before writing code, read EmbabelTraceCollector.java to understand how events map to a trace.
How it works
Embabel reports tool calls through per-event AgenticEventListener callbacks during a run, not as a return value. EmbabelTraceCollector implements that listener and assembles an AgentTrace from the ToolCallResponseEvents it observes.
- `EmbabelSupport.attach(ProcessOptions, collector)`: registers the collector on `ProcessOptions` and returns new options to run with.
- `EmbabelSupport.attach(AgentInvocation.Builder)`: registers a fresh collector on an invocation builder and returns that collector.
- `collector.trace()`: materializes the `AgentTrace` after the run.
- `EmbabelSupport.toToolDefinitions(collector)`: synthesizes `ToolDefinition`s from the observed tool names. These carry an empty input schema, so `ToolDescriptionReliabilityEvaluator` coverage is weakened; build the definitions by hand if you need full schema coverage.
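The empty-schema caveat is easier to see side by side. The sketch below derives one definition per distinct observed tool name, as described for `toToolDefinitions`, and contrasts it with a hand-written definition that a schema-aware check can actually use. `ObservedCall`, `ToolDef`, and the tool names `searchFlights` and `bookFlight` are hypothetical stand-ins, not Dokimos types, and first-seen ordering is an assumption about the real method.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins; these are not the Dokimos ToolCall or ToolDefinition types.
final class ToolDefinitionSketch {

    /** A tool call as it might appear in a trace: a name plus raw argument values. */
    record ObservedCall(String toolName, Map<String, Object> arguments) {}

    /** A tool definition: a name plus a JSON-Schema-style description of its input. */
    record ToolDef(String name, Map<String, Object> inputSchema) {}

    /** One definition per distinct observed name, in first-seen order, each with an empty schema. */
    static List<ToolDef> fromObservedCalls(List<ObservedCall> calls) {
        Set<String> names = new LinkedHashSet<>();
        for (ObservedCall call : calls) {
            names.add(call.toolName());
        }
        List<ToolDef> defs = new ArrayList<>();
        for (String name : names) {
            // The trace holds argument values, not parameter types or required fields,
            // so there is nothing reliable to put in the schema.
            defs.add(new ToolDef(name, Map.of()));
        }
        return defs;
    }

    /** A hand-written definition that a schema-aware evaluator can check arguments against. */
    static ToolDef handWrittenSearchFlights() {
        return new ToolDef("searchFlights", Map.of(
                "type", "object",
                "properties", Map.of(
                        "from", Map.of("type", "string"),
                        "to", Map.of("type", "string"),
                        "date", Map.of("type", "string", "format", "date")),
                "required", List.of("from", "to", "date")));
    }
}
```

Two calls to `searchFlights` and one to `bookFlight` yield two synthesized definitions, both with `inputSchema()` empty, so any check that compares arguments to declared parameters has nothing to compare against.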
The collector is single-run and not thread-safe. Reuse one instance only after calling reset().
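The callback-to-trace pattern and the single-run lifecycle can be modeled without Embabel at all. In this sketch every type (`Event`, `ToolCallResponse`, `Listener`, `TraceCollector`, `runAgent`) is a hypothetical stand-in, not an Embabel or Dokimos class: the fake agent reports progress only through the listener, the collector keeps just the tool-call responses, and `reset()` is what makes reuse safe.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the listener-to-trace pattern.
// None of these names are Embabel or Dokimos types.
final class CollectorSketch {

    interface Event {}

    /** Stand-in for a tool-call response event. */
    record ToolCallResponse(String toolName, String result) implements Event {}

    /** Stand-in for any other progress event the agent emits. */
    record OtherEvent(String description) implements Event {}

    /** Stand-in for a per-event listener callback. */
    interface Listener {
        void onEvent(Event event);
    }

    record Trace(List<ToolCallResponse> toolCalls) {}

    /** Single-run and not thread-safe, like the collector it imitates. */
    static final class TraceCollector implements Listener {
        private final List<ToolCallResponse> calls = new ArrayList<>();

        @Override
        public void onEvent(Event event) {
            // Only tool-call responses contribute to the trace; other events are ignored.
            if (event instanceof ToolCallResponse response) {
                calls.add(response);
            }
        }

        /** Immutable snapshot of the calls observed so far. */
        Trace trace() {
            return new Trace(List.copyOf(calls));
        }

        /** Clears recorded calls so the instance can observe a new run. */
        void reset() {
            calls.clear();
        }
    }

    /** Fake agent run that reports its progress only through the listener. */
    static void runAgent(Listener listener, String city) {
        listener.onEvent(new OtherEvent("planning"));
        listener.onEvent(new ToolCallResponse("getWeather", "{\"city\":\"" + city + "\"}"));
        listener.onEvent(new OtherEvent("done"));
    }

    public static void main(String[] args) {
        TraceCollector collector = new TraceCollector();
        runAgent(collector, "Oslo");
        System.out.println(collector.trace());
        collector.reset(); // without this, the next run's calls would be appended
        runAgent(collector, "Lima");
        System.out.println(collector.trace());
    }
}
```

Skipping `reset()` between runs silently merges two runs into one trace, which is exactly the failure mode the single-run rule guards against.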
Evaluation pattern
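```java
// `platform` (your Embabel AgentPlatform) and `input` (the user request) come from your own setup.

// Create a collector and register it on the process options used for this run
EmbabelTraceCollector collector = new EmbabelTraceCollector();
ProcessOptions options = EmbabelSupport.attach(new ProcessOptions(), collector);

// Build and run the invocation; tool calls reach the collector as events during the run
AgentInvocation<String> invocation =
        AgentInvocation.builder(platform).options(options).build(String.class);
invocation.invoke(input);

// Materialize the trace and turn it into an evaluation test case
AgentTrace trace = collector.trace();
List<ToolDefinition> tools = EmbabelSupport.toToolDefinitions(collector);
EvalTestCase testCase = trace.toTestCase(input, tools);

// Score the test case with agent evaluators
var validity = ToolCallValidityEvaluator.builder().build().evaluate(testCase);
var correctness = ToolCorrectnessEvaluator.builder().build().evaluate(testCase);
```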
Always construct evaluators with `XEvaluator.builder()...build()`; their constructors are private.
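The builder-only style works because a private constructor makes `new XEvaluator()` a compile error outside the class, leaving `builder()` as the only entry point. This sketch shows the pattern in isolation; `ExampleEvaluator` and its `threshold` option are invented for illustration, so check each Dokimos evaluator's builder for the options it actually exposes.

```java
// Hypothetical evaluator showing builder-only construction.
// The class name, the threshold option, and its default are invented.
final class ExampleEvaluator {

    private final double threshold;

    // Private: code outside this class cannot call `new ExampleEvaluator(...)`.
    private ExampleEvaluator(Builder builder) {
        this.threshold = builder.threshold;
    }

    /** The only way to obtain an instance. */
    static Builder builder() {
        return new Builder();
    }

    double threshold() {
        return threshold;
    }

    /** Passes when the score reaches the configured threshold. */
    boolean passes(double score) {
        return score >= threshold;
    }

    static final class Builder {
        private double threshold = 1.0; // illustrative default

        private Builder() {}

        /** Validates eagerly so a bad configuration fails at build time, not during scoring. */
        Builder threshold(double threshold) {
            if (threshold < 0.0 || threshold > 1.0) {
                throw new IllegalArgumentException("threshold must be in [0, 1]");
            }
            this.threshold = threshold;
            return this;
        }

        ExampleEvaluator build() {
            return new ExampleEvaluator(this);
        }
    }
}
```

`ExampleEvaluator.builder().threshold(0.8).build()` mirrors the `XEvaluator.builder()...build()` shape; centralizing construction in the builder lets defaults and validation live in one place.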
Reading tool results and arguments back typed
A captured `ToolCall` keeps its arguments as a `Map` and its result as the string Embabel returned. To read them as typed values, use `call.argumentsAs(MyArgs.class)` and, when the result is JSON, `call.resultAs(MyResult.class)` (or an `OutputType` for generic types).
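To make "reading a `Map` back typed" concrete, this sketch maps an argument map onto a record's canonical constructor using only JDK reflection (`Class.getRecordComponents`, `getDeclaredConstructor`). It is a hypothetical illustration, not how Dokimos implements `argumentsAs`, whose mechanism is not described here; `TypedArgsSketch` and `WeatherArgs` are invented names, and unlike a JSON mapper it performs no type coercion.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.RecordComponent;
import java.util.Arrays;
import java.util.Map;

// Hypothetical illustration of a typed view over Map-based tool arguments.
// This is not the Dokimos implementation of argumentsAs.
final class TypedArgsSketch {

    /** Example argument shape for an invented weather tool. */
    record WeatherArgs(String city, int days) {}

    /** Builds a record by looking up each component name in the argument map. */
    static <T extends Record> T argumentsAs(Map<String, Object> arguments, Class<T> type) {
        RecordComponent[] components = type.getRecordComponents();
        Class<?>[] parameterTypes = Arrays.stream(components)
                .map(RecordComponent::getType)
                .toArray(Class<?>[]::new);
        Object[] values = Arrays.stream(components)
                .map(component -> arguments.get(component.getName()))
                .toArray();
        try {
            // Every record has a canonical constructor whose parameters match its components.
            Constructor<T> canonical = type.getDeclaredConstructor(parameterTypes);
            // No coercion: a missing value for a primitive, or a Long where an int is
            // expected, makes newInstance throw IllegalArgumentException.
            return canonical.newInstance(values);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot construct " + type.getSimpleName(), e);
        }
    }
}
```

For example, `argumentsAs(Map.of("city", "Oslo", "days", 3), WeatherArgs.class)` yields `WeatherArgs[city=Oslo, days=3]`, while a map missing `days` fails fast instead of producing a half-filled object.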
Dependencies
```xml
<dependency>
    <groupId>dev.dokimos</groupId>
    <artifactId>dokimos-embabel</artifactId>
    <version>${dokimos.version}</version>
</dependency>
```
Embabel itself is a `provided`-scope dependency, so the user's project must declare its own version of `com.embabel.agent:embabel-agent-api`.
Steps
- Understand from $ARGUMENTS what the Embabel agent does and which tools it calls
- Confirm the project builds on Java 21
- Attach an `EmbabelTraceCollector` to the run and capture `collector.trace()`
- Convert to an `EvalTestCase` with `trace.toTestCase(input, tools)`
- Score with the agent evaluators (prefer deterministic ones for CI)
- For the full agent evaluator set, use the `evaluate-agent` skill
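Deterministic checks suit CI because the same trace always produces the same verdict, with no model call and no flakiness. As an illustration of that kind of check, this sketch scores what fraction of expected tools the agent called at least once and lists the ones it missed. `ToolCoverageCheck` is a hypothetical helper, and its scoring rule is an assumption, not the rule `ToolCorrectnessEvaluator` uses.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical deterministic check; not the scoring rule of Dokimos's ToolCorrectnessEvaluator.
final class ToolCoverageCheck {

    /** Fraction of expected tools that appear at least once among the called tools. */
    static double coverage(List<String> expectedTools, List<String> calledTools) {
        Set<String> expected = new LinkedHashSet<>(expectedTools);
        if (expected.isEmpty()) {
            return 1.0; // nothing was required, so nothing can be missing
        }
        Set<String> called = Set.copyOf(calledTools);
        long hits = expected.stream().filter(called::contains).count();
        return (double) hits / expected.size();
    }

    /** Expected tools that were never called, in declaration order, for CI failure messages. */
    static Set<String> missing(List<String> expectedTools, List<String> calledTools) {
        Set<String> result = new LinkedHashSet<>(expectedTools);
        result.removeAll(Set.copyOf(calledTools));
        return result;
    }
}
```

With expected tools `searchFlights` and `bookFlight` and a trace that calls `searchFlights` twice, coverage is 0.5 and `missing` reports `bookFlight`; repeated calls do not inflate the score because both sides are reduced to sets.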