name: spring-ai
description: >-
  Build Spring AI application features with ChatClient, prompt templates, structured output, tool calling, advisors, chat memory, embeddings, vector stores, RAG, and MCP integration. Use when configuring chat models, wiring vector store backends, designing RAG pipelines, or integrating AI tool calling into Spring services.
Spring AI
Boundaries
Use spring-ai for model-facing application seams, retrieval flow, Spring-managed AI integration, and provider-neutral model abstractions.
- Non-AI message routing, adapters, and Enterprise Integration Patterns are outside this skill's scope.
- Keep provider SDK details at the configuration edge. Application services should depend on Spring AI abstractions such as ChatClient, EmbeddingModel, VectorStore, ImageModel, TranscriptionModel, TextToSpeechModel, or ModerationModel.
- Keep business rules outside prompts and outside tool implementations. Spring AI should orchestrate model interaction, not replace core domain logic.
Official surface map
Use this map to keep the official Spring AI surface visible without pushing the common path into references/.
| Surface | Start here when | Open a reference when |
|---|---|---|
| Chat + prompt templates | The feature reads text and returns text or structured data | Provider fit or model capability is the blocker in references/provider-selection-and-model-capability-fit.md |
| Structured output | Downstream code needs fields, records, or typed objects | Upgrade or provider behavior changes the output contract in references/upgrade-notes-and-migration-branches.md |
| Tool calling | The model may request a narrow, side-effect-safe application capability | Sequential tool choreography is the blocker in references/advanced-tool-orchestration.md, tool-set curation is the blocker in references/tool-set-curation.md, or fallback policy is the blocker in references/tool-failure-and-fallback.md |
| Advisors + chat memory | Requests need prompt decoration, recursive advisor behavior, history, token-window control, reasoning augmentation, or content safety | Advisor ordering or persistent memory is the blocker in references/advisors-memory-and-conversation-state.md |
| RAG + vector stores | The answer must use retrieved enterprise context | ETL pipeline, ingestion, chunking, embeddings, store choice, or advanced RAG flow design is the blocker in references/rag-pipeline-and-vector-store-decisions.md |
| MCP | Tools or prompts cross a process or service boundary | Client/server choice or transport setup is the blocker in references/mcp-client-server-boundaries.md |
| Vision + image generation | The feature must inspect images or generate images from prompts | Vision payload shape is the blocker in references/image-generation-and-vision-inputs.md, multiple-image comparison is the blocker in references/multiple-image-comparison.md, or image-model output is the blocker in references/image-generation.md |
| Audio transcription + speech | The feature transcribes audio or returns synthesized speech | Transcription or TTS configuration is the blocker in references/audio-transcription-and-speech-output.md |
| Moderation | The application needs input or output safety gates | Moderation placement or category thresholds are the blocker in references/moderation-and-safety-gates.md |
| Effective agents | One bounded workflow must route, chain, plan, or iteratively refine work | Routing is the blocker in references/routing-workflow.md, chaining is the blocker in references/chain-workflow.md, stepwise planning is the blocker in references/planning-and-stepwise-execution.md, or loop bounds are the blocker in references/loop-bounds-and-iteration-control.md |
| Evaluation + testing | Prompt, retrieval, or tool behavior needs repeatable checks | Evaluation harness design is the blocker in references/testing-and-evaluation-harnesses.md |
| Usage + observability | You need token accounting, latency, tracing, or production debugging | Telemetry or incident diagnosis is the blocker in references/observability-and-production-debugging.md |
| Local development infra | You need Docker Model Runner, development-time services, Testcontainers, local models, vector stores, or containerized dev services | Local model runtime is the blocker in development services and infra, local vector store provisioning is the blocker in local vector store setup, or full containerized bootstrap is the blocker in containerized development environment |
| Upgrade and migration | Version changes alter starters, APIs, defaults, or provider behavior | Upgrade mechanics are the blocker in references/upgrade-notes-and-migration-branches.md |
Common path
The ordinary Spring AI job is:
- Pin one Spring AI BOM version and add only the starters needed for the first production use case.
- Start with one provider-neutral ChatClient seam around an application service.
- Use prompt templates and structured output before adding tools, memory, or retrieval.
- Expose only narrow, side-effect-safe tools when the plain prompt path is already correct.
- Add advisors or chat memory only when the use case needs request decoration or multi-turn continuity.
- Add RAG only after the non-RAG path is testable and the retrieval boundary is explicit.
- Add image, audio, moderation, MCP, or effective-agent workflows only for concrete blockers, not by default.
- Validate prompt assembly, output mapping, tool safety, conversation scoping, retrieval behavior, token usage, and production telemetry before rollout.
Dependency baseline
Spring AI 2.0.x supports Spring Boot 4.0.x and 4.1.x.
Import the Spring AI BOM and add only the starters needed for the current model and optional retrieval path.
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>2.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-openai</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-validation</artifactId>
    </dependency>
</dependencies>
Add retrieval, image, audio, moderation, or MCP starters only when that surface is part of the current job. Open references/upgrade-notes-and-migration-branches.md when the target Spring AI version differs from the version pinned here.
First safe setup
Minimal provider properties
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        model: gpt-4o-mini
Start with one provider, one model, and one environment-backed secret. Open references/provider-selection-and-model-capability-fit.md when model family, context window, latency, cost, or provider fit is still unclear.
Provider-neutral ChatClient seam
@Configuration
class AssistantAiConfiguration {

    @Bean
    ChatClient releaseChatClient(ChatClient.Builder builder) {
        return builder
                .defaultSystem("You summarize release changes for backend engineers.")
                .build();
    }
}
Keep the application seam on ChatClient or another Spring AI abstraction. Do not let controllers or domain code depend directly on a provider SDK.
Prompt templating and structured output
Use prompt templates before introducing tools, memory, or retrieval. Keep variables explicit and keep prompt text reviewable in code.
record ReleaseSummary(String version, List<String> breakingChanges, List<String> actions) {}

@Service
class ReleaseSummaryService {

    private final ChatClient chatClient;

    ReleaseSummaryService(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    ReleaseSummary summarize(String releaseNotes) {
        return chatClient.prompt()
                .user(user -> user
                        .text("Summarize the release notes and list required migration actions. Notes: {notes}")
                        .param("notes", releaseNotes))
                .call()
                .entity(ReleaseSummary.class);
    }
}
- Keep prompt variables named and explicit.
- Put reusable system instructions on the ChatClient builder or a dedicated service seam.
- Start with Spring AI mapping such as .entity(...) so the application stays on Spring AI's portable output contract instead of binding ordinary flows to one provider's JSON mode.
- Keep provider-native JSON modes as an optimization, not the default path, and open references/upgrade-notes-and-migration-branches.md when version or provider changes threaten the output contract.
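The advice to keep prompt variables named and explicit can be checked without a live provider. The following is a minimal plain-Java sketch under stated assumptions: PromptVariables and render are hypothetical names, the class is not Spring AI's template support, and the {name} placeholder syntax only mirrors the {notes} placeholder used above. It fails fast when a placeholder has no supplied value.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical test helper; it does not use or replace Spring AI's template support.
final class PromptVariables {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    private PromptVariables() {}

    // Replaces every {name} placeholder and rejects templates with unresolved names.
    static String render(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = values.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Missing prompt variable: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

A unit test can call PromptVariables.render with the same template text the service uses and assert on the assembled prompt, so a renamed or missing variable surfaces before any model call is made.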
Tool boundary
Add tools only when the model genuinely needs a bounded application capability.
@Component
class InventoryTools {

    private final InventoryRepository inventoryRepository;

    InventoryTools(InventoryRepository inventoryRepository) {
        this.inventoryRepository = inventoryRepository;
    }

    @Tool(description = "Look up available inventory for a SKU")
    InventorySnapshot inventoryForSku(String sku) {
        return inventoryRepository.findSnapshotBySku(sku);
    }
}

@Service
class ShippingAssistantService {

    private final ChatClient chatClient;
    private final InventoryTools inventoryTools;

    ShippingAssistantService(ChatClient chatClient, InventoryTools inventoryTools) {
        this.chatClient = chatClient;
        this.inventoryTools = inventoryTools;
    }

    String answer(String question) {
        return chatClient.prompt()
                .user(question)
                .tools(inventoryTools)
                .call()
                .content();
    }
}

record InventorySnapshot(String sku, int availableQuantity) {}
- Start with read-only or otherwise side-effect-safe tools.
- Treat tool selection as an application contract.
- ToolCallingAdvisor is auto-registered when tools are configured (2.0.0+). Disable it globally with spring.ai.chat.client.tool-calling.enabled=false or per call with AdvisorParams.toolCallingAdvisorAutoRegister(false).
- Open references/advanced-tool-orchestration.md when one tool call must explicitly feed the next.
- Open references/tool-set-curation.md when the blocker is exposing only a curated tool set.
- Open references/tool-failure-and-fallback.md when the blocker is explicit fallback behavior after tool failure.
- Open references/mcp-client-server-boundaries.md when the tool boundary may need MCP instead of an in-process Spring bean.
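A read-only, side-effect-safe tool still receives model-supplied arguments, so it helps to validate them like any other external input. The sketch below is plain Java with no Spring AI dependency: SkuLookup, StockLevel, and the SKU format rule are all hypothetical and chosen only for illustration. In a real service this logic would sit behind the method annotated with @Tool.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical result type for the sketch.
record StockLevel(String sku, int availableQuantity) {}

// Hypothetical read-only lookup; it never mutates application state.
final class SkuLookup {
    private final Map<String, Integer> stockBySku;

    SkuLookup(Map<String, Integer> stockBySku) {
        // Immutable copy so later changes to the caller's map do not leak in.
        this.stockBySku = Map.copyOf(stockBySku);
    }

    // Validates the argument before touching application data.
    // The allowed SKU pattern is an assumption made for this example.
    Optional<StockLevel> find(String sku) {
        if (sku == null || !sku.matches("[A-Z0-9-]{1,32}")) {
            throw new IllegalArgumentException("Invalid SKU format");
        }
        Integer quantity = stockBySku.get(sku);
        return quantity == null
                ? Optional.empty()
                : Optional.of(new StockLevel(sku, quantity));
    }
}
```

Returning an empty Optional for an unknown SKU keeps "not found" distinct from "invalid input", which gives the calling code a clear signal about what to report back to the model.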
Memory and retrieval escalation
Use advisors when the request or response must be decorated around the model call. Use ChatMemory through an advisor instead of manually appending prior turns.
Spring AI ships built-in advisors: MessageChatMemoryAdvisor (message history), VectorStoreChatMemoryAdvisor (vector store-backed memory retrieval), QuestionAnswerAdvisor (naive RAG), RetrievalAugmentationAdvisor (modular RAG with query transformation, document post-processing, and context augmentation), ReReadingAdvisor (re-reading reasoning improvement), and SafeGuardAdvisor (content safety gate). Use in-memory MessageWindowChatMemory for demos and tests; use repository-backed memory for production multi-session workloads.
@Bean
ChatClient supportChatClient(ChatClient.Builder builder, ChatMemory chatMemory) {
    return builder
            .defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
            .build();
}

String answer(ChatClient chatClient, String conversationId, String question) {
    return chatClient.prompt()
            .advisors(advisors -> advisors.param(ChatMemory.CONVERSATION_ID, conversationId))
            .user(question)
            .call()
            .content();
}

@Service
class KnowledgeSearchService {

    private final VectorStore vectorStore;

    KnowledgeSearchService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    List<Document> search(String question) {
        return vectorStore.similaritySearch(SearchRequest.builder()
                .query(question)
                .topK(4)
                .similarityThreshold(0.75)
                .build());
    }
}
- Keep the conversation identifier explicit at the call site.
- Use in-memory chat memory only for demos, tests, or single-instance transient flows.
- Add RAG only after the non-RAG path is correct and testable.
- Keep EmbeddingModel as the portable seam for vector generation and treat the concrete VectorStore implementation as a deployment decision.
- Open references/advisors-memory-and-conversation-state.md when advisor ordering, persistent memory repositories, token buffering, or conversation isolation becomes the blocker.
- Open references/rag-pipeline-and-vector-store-decisions.md when chunking, embeddings, metadata filters, vector-store choice, or advanced retrieval tuning is the blocker.
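To make the topK and similarityThreshold settings in the search request above concrete, here is a toy plain-Java sketch of how the two limits interact. It assumes cosine similarity over small hand-written vectors; ToySimilaritySearch is a hypothetical name, and a real VectorStore backend may score, normalize, or rank differently.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not a VectorStore implementation.
final class ToySimilaritySearch {
    private ToySimilaritySearch() {}

    // Cosine similarity of two vectors of equal length.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns ids of at most topK documents scoring >= threshold, best match first.
    static List<String> search(Map<String, double[]> docs, double[] query,
                               int topK, double threshold) {
        return docs.entrySet().stream()
                .map(e -> Map.entry(e.getKey(), cosine(e.getValue(), query)))
                .filter(e -> e.getValue() >= threshold)
                .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
                .limit(topK)
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

The threshold is applied before the topK cut, so a query with no sufficiently similar document returns an empty list instead of four weak matches. That empty result is the case the calling service should handle explicitly.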
Secondary official surfaces
These surfaces are part of official Spring AI scope, but they are not on the ordinary path unless the use case requires them.
- Open references/image-generation-and-vision-inputs.md when the feature must attach single-image vision input to a chat request.
- Open references/multiple-image-comparison.md when the blocker is comparing or cross-referencing several images in one request.
- Open references/image-generation.md when the blocker is producing generated image artifacts instead of text.
- Open references/audio-transcription-and-speech-output.md when the feature must transcribe audio or synthesize speech.
- Open references/moderation-and-safety-gates.md when input or output moderation is required.
- Open references/routing-workflow.md when routing is the blocker.
- Open references/chain-workflow.md when one bounded model step must explicitly feed the next.
- Open references/planning-and-stepwise-execution.md when the task is too large for one safe pass and needs a bounded plan first.
- Open references/loop-bounds-and-iteration-control.md when iterative refinement needs an application-level bound.
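As a rough illustration of an application-level bound on iterative refinement, the plain-Java sketch below stops either when an acceptance check passes or when a fixed iteration budget is used up. BoundedRefinement and its parameters are hypothetical; in practice the refine step would wrap a model call and the accept check would be an application rule.

```java
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical helper; the bound lives in application code, not in the prompt.
final class BoundedRefinement {
    private BoundedRefinement() {}

    // Applies refine until accept passes or maxIterations refinements have run,
    // whichever comes first, and returns the last draft either way.
    static <T> T refine(T draft, UnaryOperator<T> refine, Predicate<T> accept, int maxIterations) {
        T current = draft;
        for (int i = 0; i < maxIterations && !accept.test(current); i++) {
            current = refine.apply(current);
        }
        return current;
    }
}
```

Because the loop returns the last draft even when it was never accepted, callers that need a hard guarantee should re-run the accept check on the result.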
Usage handling and observability
Treat token accounting as part of the application contract, not as an afterthought.
- Read Usage from the final ChatResponse when cost, token budgets, or provider drift matter.
- Record prompt, completion, and total token counts together with latency and tool or retrieval activity.
- Open references/observability-and-production-debugging.md when usage accounting, tracing, or production debugging becomes the blocker.
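One way to keep the counts listed above together is a small per-call record plus a running ledger. This is a plain-Java sketch, not a Spring AI type: UsageSample and UsageLedger are hypothetical, the token values would be copied from the usage data on the final response, and the total is assumed to be prompt plus completion tokens, which may not match what every provider reports.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-call sample; field names are illustrative.
record UsageSample(int promptTokens, int completionTokens, Duration latency, int toolCalls) {
    // Assumption: total is the sum of prompt and completion tokens.
    int totalTokens() {
        return promptTokens + completionTokens;
    }
}

// Hypothetical accumulator for one request, session, or tenant.
final class UsageLedger {
    private final List<UsageSample> samples = new ArrayList<>();

    void add(UsageSample sample) {
        samples.add(sample);
    }

    int totalTokens() {
        return samples.stream().mapToInt(UsageSample::totalTokens).sum();
    }

    // True once accumulated tokens exceed the budget chosen by the application.
    boolean overBudget(int tokenBudget) {
        return totalTokens() > tokenBudget;
    }
}
```

Recording latency and tool-call counts in the same sample makes it easier to correlate a token spike with a retrieval or tool-heavy request.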
Minimal validation
Verify the first Spring AI path before expanding scope.
- Verify prompt assembly without a live provider where possible.
- Verify structured-output mapping for one valid and one invalid response shape.
- Verify tool methods behave like normal application APIs, including validation and authorization boundaries.
- Verify advisor and chat-memory scoping with an explicit conversation ID.
- Verify retrieval returns the expected documents and that empty-context behavior is explicit.
- Verify token usage, latency, and tool-call identity are observable in the final path.
Open references/testing-and-evaluation-harnesses.md when the task needs repeatable evaluation datasets, regression checks, or infrastructure-backed integration tests. Open references/observability-and-production-debugging.md when adding usage accounting, tracing, or production incident diagnostics.
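The conversation-scoping check in the list above comes down to two properties: turns from one conversation ID never appear under another, and the window evicts the oldest turns. The plain-Java stand-in below shows those properties in isolation. WindowedMemory is hypothetical and is not Spring AI's ChatMemory or MessageWindowChatMemory; a real test would assert the same behavior against the configured memory bean.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in used only to illustrate the scoping check.
final class WindowedMemory {
    private final int maxMessages;
    private final Map<String, Deque<String>> byConversation = new HashMap<>();

    WindowedMemory(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    // Appends to one conversation and evicts the oldest entry beyond the window.
    void add(String conversationId, String message) {
        Deque<String> window =
                byConversation.computeIfAbsent(conversationId, id -> new ArrayDeque<>());
        window.addLast(message);
        if (window.size() > maxMessages) {
            window.removeFirst();
        }
    }

    // Returns an immutable snapshot, oldest message first; empty for unknown ids.
    List<String> get(String conversationId) {
        return List.copyOf(byConversation.getOrDefault(conversationId, new ArrayDeque<>()));
    }
}
```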
Production guardrails
- Externalize provider credentials, endpoints, model names, moderation settings, and retrieval settings.
- Keep prompts, tool contracts, retrieval settings, and structured-output types versioned and reviewable.
- Put timeouts, retries, fallback behavior, and provider switching at the provider edge.
- Log latency, token usage, retrieval count, and tool usage without leaking secrets or personal data.
- Treat image, audio, and moderation model choices as explicit configuration, not ambient defaults.
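Retries and fallback can be kept at the provider edge with a thin wrapper around the model call. The sketch below is plain Java under stated assumptions: ProviderEdge and callWithFallback are hypothetical names, timeouts and backoff are omitted for brevity, and real deployments may prefer the retry support of their HTTP client or a resilience library.

```java
import java.util.function.Supplier;

// Hypothetical provider-edge wrapper; application services call this, not the provider.
final class ProviderEdge {
    private ProviderEdge() {}

    // Tries the primary call up to maxAttempts times, then returns the fallback result.
    static <T> T callWithFallback(Supplier<T> primary, int maxAttempts, Supplier<T> fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return primary.get();
            } catch (RuntimeException e) {
                // Swallowed deliberately in this sketch; production code should log
                // the failure without leaking secrets or personal data.
            }
        }
        return fallback.get();
    }
}
```

Keeping this wrapper next to the provider configuration means a provider switch or a changed retry policy does not touch controllers or domain services.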