name: evaluate-spring-ai
description: Sets up evaluation of Spring AI applications using Dokimos. Use this skill when the user wants to evaluate, test, or benchmark a Spring AI app, ChatClient, RAG pipeline, or advisor chain. Also use when the user mentions Spring AI evaluation, Spring Boot LLM testing, or integrating Dokimos with Spring AI projects.
Evaluate Spring AI
Set up Dokimos evaluation for a Spring AI application. The user will describe their application and evaluation goals via $ARGUMENTS.
Where things live
- Spring AI support: dokimos-spring-ai/src/main/java/dev/dokimos/springai/SpringAiSupport.java
- Examples: dokimos-examples/src/main/java/dev/dokimos/examples/springai/
- Full Spring Boot example: dokimos-examples/src/main/java/dev/dokimos/examples/springai/tutorial/
- Maven dependency: dev.dokimos:dokimos-spring-ai
Before writing code, read SpringAiSupport.java to understand the available utilities.
Key utilities
SpringAiSupport provides:
- asJudge(ChatClient.Builder): wraps a Spring AI ChatClient.Builder into a JudgeLM
- asJudge(ChatModel): wraps a ChatModel directly into a JudgeLM
- toTestCase(EvaluationRequest): converts Spring AI's EvaluationRequest to a Dokimos EvalTestCase
- toEvaluationResponse(EvalResult): converts a Dokimos EvalResult back to a Spring AI EvaluationResponse
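The wrapping that asJudge performs can be pictured as a plain adapter. The sketch below is only an illustration of that idea: ChatBackend, Judge, and the generate method are hypothetical stand-ins, not the Spring AI or Dokimos types, so read SpringAiSupport.java for the real signatures.

```java
// Hypothetical stand-ins: these are NOT the real Spring AI or Dokimos types.
interface ChatBackend {            // plays the role of a ChatModel or ChatClient
    String complete(String prompt);
}

interface Judge {                  // plays the role of a JudgeLM
    String generate(String prompt);
}

final class JudgeAdapterSketch {
    // Adapts a chat backend to the judge interface by delegating each prompt.
    static Judge asJudge(ChatBackend backend) {
        return backend::complete;
    }

    public static void main(String[] args) {
        // Stub backend so the sketch runs without a model.
        ChatBackend stub = prompt -> "score: 0.9 for [" + prompt + "]";
        Judge judge = asJudge(stub);
        System.out.println(judge.generate("Is the answer helpful?"));
    }
}
```

The point of the adapter is that evaluators only depend on the judge interface, so the same application model (or a different, stronger one) can be swapped in as the judge.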
Evaluation patterns
Simple ChatClient evaluation
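import java.nio.file.Path;
import java.util.List;
import java.util.Map;

import org.junit.jupiter.api.Test;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import dev.dokimos.springai.SpringAiSupport;
// Also import the Dokimos core types used below (Task, JudgeLM, Experiment, ...);
// check the Dokimos sources for their packages.

@SpringBootTest
class MyChatEvaluationTest {

    @Autowired
    private ChatClient.Builder chatClientBuilder;

    @Test
    void evaluateChatbot() {
        ChatClient chatClient = chatClientBuilder.build();

        // Task under test: send each dataset input to the ChatClient and return its answer.
        Task task = example -> {
            String response = chatClient.prompt()
                    .user(example.input())
                    .call()
                    .content();
            return Map.of("output", response);
        };

        // Reuse the application's ChatClient.Builder as the LLM judge.
        JudgeLM judge = SpringAiSupport.asJudge(chatClientBuilder);

        // Run the task over the dataset and score each answer with the judge.
        ExperimentResult result = Experiment.builder()
                .name("Chatbot Evaluation")
                .dataset(Dataset.fromJson(Path.of("src/test/resources/datasets/qa.json")))
                .task(task)
                .evaluator(LLMJudgeEvaluator.builder()
                        .name("answer-quality")
                        .judge(judge)
                        .criteria("Is the response helpful and accurate?")
                        .evaluationParams(List.of(
                                EvalTestCaseParam.INPUT,
                                EvalTestCaseParam.ACTUAL_OUTPUT,
                                EvalTestCaseParam.EXPECTED_OUTPUT))
                        .threshold(0.7)
                        .build())
                .build()
                .run();
    }
}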
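To make the moving parts concrete (dataset, task, evaluator, threshold), here is a toy evaluation loop in plain Java. It is a sketch under stated assumptions, not how Dokimos runs an experiment: the Example record, the exact-match scorer, and the "pass when score is at or above the threshold" rule are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class ExperimentLoopSketch {
    // Hypothetical dataset row: an input plus the answer we expect.
    record Example(String input, String expected) {}

    // Toy evaluator: 1.0 on a case-insensitive exact match, otherwise 0.0.
    static double score(String actual, String expected) {
        return actual.trim().equalsIgnoreCase(expected.trim()) ? 1.0 : 0.0;
    }

    // Runs the task on every example and returns the fraction of examples
    // whose score is at or above the threshold (assumed pass rule).
    static double passRate(List<Example> dataset,
                           Function<String, Map<String, Object>> task,
                           double threshold) {
        if (dataset.isEmpty()) {
            return 0.0;
        }
        long passed = dataset.stream()
                .filter(ex -> {
                    String actual = String.valueOf(task.apply(ex.input()).get("output"));
                    return score(actual, ex.expected()) >= threshold;
                })
                .count();
        return (double) passed / dataset.size();
    }

    public static void main(String[] args) {
        List<Example> dataset = List.of(
                new Example("capital of France", "Paris"),
                new Example("capital of Norway", "Oslo"));
        // Stub task standing in for the ChatClient call; it always answers "Paris".
        Function<String, Map<String, Object>> task = input -> Map.of("output", "Paris");
        System.out.println(passRate(dataset, task, 0.7));
    }
}
```

In a real setup the exact-match scorer would be replaced by an LLM-judge evaluator, and the stub task by the ChatClient call.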
RAG evaluation with advisors
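Task task = example -> {
    String input = example.input();

    // Attach the RAG advisor so the model answers from retrieved documents.
    ChatClient.ChatClientRequestSpec request = chatClient.prompt().user(input);
    request.advisors(new QuestionAnswerAdvisor(vectorStore));
    String response = request.call().content();

    // Run a separate similarity search to capture context for the evaluators.
    // It may differ from what the advisor retrieved if the advisor uses a custom search request.
    List<Document> docs = vectorStore.similaritySearch(input);
    List<String> context = docs.stream().map(Document::getText).toList();

    return Map.of("output", response, "context", context);
};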
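The shape of a RAG task result (an "output" plus the "context" it was answered from) can be tried without a vector store or a model. The sketch below is a minimal stand-in: retrieve and ragTask are hypothetical helpers, the substring match replaces real similarity search, and the "answer" is a stub string. Unlike the snippet above, it retrieves once and reports exactly the passages it answered from.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class RagTaskSketch {
    // Hypothetical stand-in for a similarity search: keeps passages
    // that mention the (lower-cased) topic word.
    static List<String> retrieve(String topic, List<String> corpus) {
        String needle = topic.toLowerCase();
        return corpus.stream()
                .filter(passage -> passage.toLowerCase().contains(needle))
                .toList();
    }

    // Builds a task that retrieves once and returns both the answer and its context.
    static Function<String, Map<String, Object>> ragTask(List<String> corpus) {
        return topic -> {
            List<String> context = retrieve(topic, corpus);
            // Stand-in for the model call: report how many passages were available.
            String output = "Answer about " + topic + " using " + context.size() + " passage(s)";
            return Map.of("output", output, "context", context);
        };
    }

    public static void main(String[] args) {
        List<String> corpus = List.of(
                "Advisors intercept ChatClient requests.",
                "A vector store returns similar documents.",
                "Advisors can add retrieved context to the prompt.");
        Map<String, Object> result = ragTask(corpus).apply("advisors");
        System.out.println(result.get("output"));
        System.out.println(result.get("context"));
    }
}
```

Returning the context alongside the output is what lets context-aware evaluators check whether the answer is grounded in the retrieved passages.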
Converting between Spring AI and Dokimos types
EvaluationRequest request = new EvaluationRequest(userText, documents, responseContent);
EvalTestCase testCase = SpringAiSupport.toTestCase(request);
EvalResult result = evaluator.evaluate(testCase);
EvaluationResponse response = SpringAiSupport.toEvaluationResponse(result);
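The round trip above is essentially two field mappings. The following sketch mimics it with stand-in records so it runs without either library; the record names, the field mapping (user text to input, response content to actual output, documents to context), and the pass rule are assumptions for illustration, and the real classes may differ.

```java
import java.util.List;

// Hypothetical stand-ins that mirror only the fields used here; the real classes
// live in Spring AI and Dokimos and may differ.
record RequestStandIn(String userText, List<String> documents, String responseContent) {}
record TestCaseStandIn(String input, String actualOutput, List<String> context) {}
record ResultStandIn(double score, double threshold, String reason) {}
record ResponseStandIn(boolean pass, float score, String feedback) {}

final class ConversionSketch {
    // Maps the request fields onto the test-case fields (assumed mapping).
    static TestCaseStandIn toTestCase(RequestStandIn request) {
        return new TestCaseStandIn(
                request.userText(), request.responseContent(), request.documents());
    }

    // Assumed rule: a result passes when its score reaches the threshold.
    static ResponseStandIn toResponse(ResultStandIn result) {
        boolean pass = result.score() >= result.threshold();
        return new ResponseStandIn(pass, (float) result.score(), result.reason());
    }

    public static void main(String[] args) {
        RequestStandIn request = new RequestStandIn(
                "What is an advisor?",
                List.of("Advisors intercept requests."),
                "An advisor intercepts requests.");
        TestCaseStandIn testCase = toTestCase(request);
        ResultStandIn result = new ResultStandIn(0.8, 0.7, "Grounded in the retrieved passage");
        System.out.println(testCase);
        System.out.println(toResponse(result));
    }
}
```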
Dependencies
<dependency>
    <groupId>dev.dokimos</groupId>
    <artifactId>dokimos-spring-ai</artifactId>
    <version>${dokimos.version}</version>
</dependency>
Spring AI itself is a provided-scope dependency, so the user's project must declare its own Spring AI version.
Evaluating an agent, not just chat or RAG
If the Spring AI app calls tools, evaluate its tool calls with the agent evaluators. An AssistantMessage carries the tool calls the model made; the results come back in the ToolResponseMessages. SpringAiSupport.toAgentTrace(assistantMessage, toolResponseMessages) builds an AgentTrace (results matched to calls by tool-call id), and toToolDefinitions(defs) converts the tool definitions.
AgentTrace trace = SpringAiSupport.toAgentTrace(assistantMessage, toolResponseMessages);
List<ToolDefinition> tools = SpringAiSupport.toToolDefinitions(toolDefinitions);
EvalTestCase testCase = trace.toTestCase(userMessage, tools);
var validity = ToolCallValidityEvaluator.builder().build().evaluate(testCase);
For the full agent evaluator set, use the evaluate-agent skill.
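Matching results to calls by tool-call id, as described above, can be sketched in a few lines. The records below are hypothetical stand-ins rather than the Spring AI or Dokimos classes, and the choice to keep a call with a null result when no response matches is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins, not the Spring AI or Dokimos classes.
record ToolCall(String id, String name, String arguments) {}
record ToolResult(String id, String content) {}
record TraceStep(ToolCall call, String result) {}   // result is null when no response matched

final class ToolTraceSketch {
    // Pairs each call with the result carrying the same id, preserving call order.
    static List<TraceStep> match(List<ToolCall> calls, List<ToolResult> results) {
        Map<String, String> byId = new HashMap<>();
        for (ToolResult r : results) {
            byId.put(r.id(), r.content());
        }
        List<TraceStep> steps = new ArrayList<>();
        for (ToolCall call : calls) {
            steps.add(new TraceStep(call, byId.get(call.id())));
        }
        return steps;
    }

    public static void main(String[] args) {
        List<ToolCall> calls = List.of(
                new ToolCall("call-1", "getWeather", "{\"city\":\"Oslo\"}"),
                new ToolCall("call-2", "getTime", "{\"zone\":\"CET\"}"));
        // Results arrive out of order; the ids still line them up.
        List<ToolResult> results = List.of(
                new ToolResult("call-2", "14:00"),
                new ToolResult("call-1", "cloudy"));
        for (TraceStep step : match(calls, results)) {
            System.out.println(step.call().name() + " -> " + step.result());
        }
    }
}
```

Keying on the id rather than on list position keeps the trace correct when responses come back in a different order than the calls were issued.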
Steps
- Understand from $ARGUMENTS what the Spring AI application does
- Determine if it's a simple ChatClient app or uses RAG advisors
- Choose appropriate evaluators for the use case
- Create a dataset matching the application's domain
- Wire evaluation using SpringAiSupport utilities
- For Spring Boot apps, set up tests with @SpringBootTest