spring-ai

name: spring-ai
description: >-
  Build Spring AI application features with ChatClient, prompt templates, structured output, tool calling, advisors, chat memory, embeddings, vector stores, RAG, and MCP integration. Use when configuring chat models, wiring vector store backends, designing RAG pipelines, or integrating AI tool calling into Spring services.

Spring AI

Boundaries

Use spring-ai for model-facing application seams, retrieval flow, Spring-managed AI integration, and provider-neutral model abstractions.

Non-AI message routing, adapters, and Enterprise Integration Patterns are outside this skill's scope.
Keep provider SDK details at the configuration edge. Application services should depend on Spring AI abstractions such as ChatClient, EmbeddingModel, VectorStore, ImageModel, TranscriptionModel, TextToSpeechModel, or ModerationModel.
Keep business rules outside prompts and outside tool implementations. Spring AI should orchestrate model interaction, not replace core domain logic.

Official surface map

Use this map to keep the official Spring AI surface visible without pushing the common path into references/.

Surface	Start here when	Open a reference when
Chat + prompt templates	The feature reads text and returns text or structured data	Provider fit or model capability is the blocker in references/provider-selection-and-model-capability-fit.md
Structured output	Downstream code needs fields, records, or typed objects	Upgrade or provider behavior changes the output contract in references/upgrade-notes-and-migration-branches.md
Tool calling	The model may request a narrow, side-effect-safe application capability	Sequential tool choreography is the blocker in references/advanced-tool-orchestration.md, tool-set curation is the blocker in references/tool-set-curation.md, or fallback policy is the blocker in references/tool-failure-and-fallback.md
Advisors + chat memory	Requests need prompt decoration, recursive advisor behavior, history, token-window control, reasoning augmentation, or content safety	Advisor ordering or persistent memory is the blocker in references/advisors-memory-and-conversation-state.md
RAG + vector stores	The answer must use retrieved enterprise context	ETL pipeline, ingestion, chunking, embeddings, store choice, or advanced RAG flow design is the blocker in references/rag-pipeline-and-vector-store-decisions.md
MCP	Tools or prompts cross a process or service boundary	Client/server choice or transport setup is the blocker in references/mcp-client-server-boundaries.md
Vision + image generation	The feature must inspect images or generate images from prompts	Vision payload shape is the blocker in references/image-generation-and-vision-inputs.md, multiple-image comparison is the blocker in references/multiple-image-comparison.md, or image-model output is the blocker in references/image-generation.md
Audio transcription + speech	The feature transcribes audio or returns synthesized speech	Transcription or TTS configuration is the blocker in references/audio-transcription-and-speech-output.md
Moderation	The application needs input or output safety gates	Moderation placement or category thresholds are the blocker in references/moderation-and-safety-gates.md
Effective agents	One bounded workflow must route, chain, plan, or iteratively refine work	Routing is the blocker in references/routing-workflow.md, chaining is the blocker in references/chain-workflow.md, stepwise planning is the blocker in references/planning-and-stepwise-execution.md, or loop bounds are the blocker in references/loop-bounds-and-iteration-control.md
Evaluation + testing	Prompt, retrieval, or tool behavior needs repeatable checks	Evaluation harness design is the blocker in references/testing-and-evaluation-harnesses.md
Usage + observability	You need token accounting, latency, tracing, or production debugging	Telemetry or incident diagnosis is the blocker in references/observability-and-production-debugging.md
Local development infra	You need Docker Model Runner, development-time services, Testcontainers, local models, vector stores, or containerized dev services	Local model runtime is the blocker in the development services and infra reference, local vector store provisioning is the blocker in the local vector store setup reference, or full containerized bootstrap is the blocker in the containerized development environment reference
Upgrade and migration	Version changes alter starters, APIs, defaults, or provider behavior	Upgrade mechanics are the blocker in references/upgrade-notes-and-migration-branches.md

Common path

The ordinary Spring AI job is:

Pin one Spring AI BOM version and add only the starters needed for the first production use case.
Start with one provider-neutral ChatClient seam around an application service.
Use prompt templates and structured output before adding tools, memory, or retrieval.
Expose only narrow, side-effect-safe tools when the plain prompt path is already correct.
Add advisors or chat memory only when the use case needs request decoration or multi-turn continuity.
Add RAG only after the non-RAG path is testable and the retrieval boundary is explicit.
Add image, audio, moderation, MCP, or effective-agent workflows only for concrete blockers, not by default.
Validate prompt assembly, output mapping, tool safety, conversation scoping, retrieval behavior, token usage, and production telemetry before rollout.

Dependency baseline

Spring AI 2.0.x supports Spring Boot 4.0.x and 4.1.x.

Import the Spring AI BOM and add only the starters needed for the current model and optional retrieval path.

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>2.0.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-openai</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-validation</artifactId>
    </dependency>
</dependencies>

Add retrieval, image, audio, moderation, or MCP starters only when that surface is part of the current job. Open references/upgrade-notes-and-migration-branches.md when the target Spring AI version differs from the version pinned here.

First safe setup

Minimal provider properties

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        model: gpt-4o-mini

Start with one provider, one model, and one environment-backed secret. Open references/provider-selection-and-model-capability-fit.md when model family, context window, latency, cost, or provider fit is still unclear.

Provider-neutral `ChatClient` seam

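import org.springframework.ai.chat.client.ChatClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class AssistantAiConfiguration {
    // The builder is injected by Spring; only the default system prompt is set here.
    @Bean
    ChatClient releaseChatClient(ChatClient.Builder builder) {
        return builder.defaultSystem("You summarize release changes for backend engineers.").build();
    }
}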

Keep the application seam on ChatClient or another Spring AI abstraction. Do not let controllers or domain code depend directly on a provider SDK.

Prompt templating and structured output

Use prompt templates before introducing tools, memory, or retrieval. Keep variables explicit and keep prompt text reviewable in code.

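import java.util.List;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

// Target type for structured output; its field names are part of the output contract.
record ReleaseSummary(String version, List<String> breakingChanges, List<String> actions) {}

@Service
class ReleaseSummaryService {
    private final ChatClient chatClient;

    ReleaseSummaryService(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    ReleaseSummary summarize(String releaseNotes) {
        return chatClient.prompt()
            // {notes} is a named template variable bound by param(...).
            .user(user -> user.text("Summarize the release notes and list required migration actions. Notes: {notes}").param("notes", releaseNotes))
            .call()
            // Maps the model response onto the record.
            .entity(ReleaseSummary.class);
    }
}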

Keep prompt variables named and explicit.
Put reusable system instructions on the ChatClient builder or a dedicated service seam.
Start with Spring AI mapping such as .entity(...) so the application stays on Spring AI's portable output contract instead of binding ordinary flows to one provider's JSON mode.
Keep provider-native JSON modes as an optimization, not the default path, and open references/upgrade-notes-and-migration-branches.md when version or provider changes threaten the output contract.
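A mapped record is still model output and may arrive with missing or blank fields. The following is a minimal sketch, assuming plain Java checks in the record's compact constructor and assuming the mapper invokes that constructor. The name `ValidatedReleaseSummary` and the specific checks are illustrative and not part of Spring AI.

```java
import java.util.List;

// Hypothetical variant of ReleaseSummary that validates itself on construction.
// Assumption: whatever maps the model response calls the canonical constructor,
// so an invalid shape fails fast instead of reaching domain code.
record ValidatedReleaseSummary(String version, List<String> breakingChanges, List<String> actions) {
    ValidatedReleaseSummary {
        if (version == null || version.isBlank()) {
            throw new IllegalArgumentException("version must be present");
        }
        // Treat a missing list as empty and copy it so the record is immutable.
        breakingChanges = breakingChanges == null ? List.of() : List.copyOf(breakingChanges);
        actions = actions == null ? List.of() : List.copyOf(actions);
    }
}
```

Whether a missing list should be normalized to empty or rejected is a domain decision; the sketch normalizes only to keep the example short.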

Tool boundary

Add tools only when the model genuinely needs a bounded application capability.

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.tool.annotation.Tool;
import org.springframework.stereotype.Component;
import org.springframework.stereotype.Service;

// InventoryRepository is an application-owned repository and is not shown here.
@Component
class InventoryTools {
    private final InventoryRepository inventoryRepository;

    InventoryTools(InventoryRepository inventoryRepository) {
        this.inventoryRepository = inventoryRepository;
    }

    // Read-only lookup: safe for the model to request repeatedly.
    @Tool(description = "Look up available inventory for a SKU")
    InventorySnapshot inventoryForSku(String sku) {
        return inventoryRepository.findSnapshotBySku(sku);
    }
}

@Service
class ShippingAssistantService {
    private final ChatClient chatClient;
    private final InventoryTools inventoryTools;

    ShippingAssistantService(ChatClient chatClient, InventoryTools inventoryTools) {
        this.chatClient = chatClient;
        this.inventoryTools = inventoryTools;
    }

    String answer(String question) {
        return chatClient.prompt()
            .user(question)
            // Expose only this tool set for this call.
            .tools(inventoryTools)
            .call()
            .content();
    }
}

record InventorySnapshot(String sku, int availableQuantity) {}

Start with read-only or otherwise side-effect-safe tools.
Treat tool selection as an application contract.
In Spring AI 2.0.0 and later, ToolCallingAdvisor is auto-registered when tools are configured. Disable it globally with spring.ai.chat.client.tool-calling.enabled=false or per call with AdvisorParams.toolCallingAdvisorAutoRegister(false).
Open references/advanced-tool-orchestration.md when one tool call must explicitly feed the next.
Open references/tool-set-curation.md when the blocker is exposing only a curated tool set.
Open references/tool-failure-and-fallback.md when the blocker is explicit fallback behavior after tool failure.
Open references/mcp-client-server-boundaries.md when the tool boundary may need MCP instead of an in-process Spring bean.
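Tool arguments are chosen by the model, so a tool method should validate them the way an HTTP endpoint validates a request. The following is a minimal, framework-free sketch of that idea. The class `GuardedInventoryTools`, the in-memory stock map, and the SKU format are assumptions for illustration; in the application the method would carry the tool annotation and delegate to a repository.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical stand-in for a tool class, kept free of framework types so it runs anywhere.
final class GuardedInventoryTools {
    // Assumed SKU format for illustration: 3 to 12 upper-case letters, digits, or dashes.
    private static final Pattern SKU_FORMAT = Pattern.compile("[A-Z0-9-]{3,12}");

    private final Map<String, Integer> stockBySku;

    GuardedInventoryTools(Map<String, Integer> stockBySku) {
        this.stockBySku = Map.copyOf(stockBySku);
    }

    // Rejects malformed input before any lookup; an unknown SKU is an empty result, not an error.
    Optional<Integer> availableQuantity(String sku) {
        if (sku == null || !SKU_FORMAT.matcher(sku).matches()) {
            throw new IllegalArgumentException("Invalid SKU format");
        }
        return Optional.ofNullable(stockBySku.get(sku));
    }
}
```

Separating "malformed input" from "valid but unknown" gives the model a clear signal in both cases and keeps the lookup itself read-only.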

Memory and retrieval escalation

Use advisors when the request or response must be decorated around the model call. Use ChatMemory through an advisor instead of manually appending prior turns.

Spring AI ships built-in advisors:

MessageChatMemoryAdvisor: message history.
VectorStoreChatMemoryAdvisor: vector store-backed memory retrieval.
QuestionAnswerAdvisor: naive RAG.
RetrievalAugmentationAdvisor: modular RAG with query transformation, document post-processing, and context augmentation.
ReReadingAdvisor: re-reading reasoning improvement.
SafeGuardAdvisor: content safety gate.

Use in-memory MessageWindowChatMemory for demos and tests; use repository-backed memory for production multi-session workloads.

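import java.util.List;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

// Bean method; belongs inside a @Configuration class.
@Bean
ChatClient supportChatClient(ChatClient.Builder builder, ChatMemory chatMemory) {
    return builder.defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build()).build();
}

// Call site; belongs inside an application service. The conversation ID is passed per request.
String answer(ChatClient chatClient, String conversationId, String question) {
    return chatClient.prompt()
        .advisors(advisors -> advisors.param(ChatMemory.CONVERSATION_ID, conversationId))
        .user(question)
        .call()
        .content();
}

@Service
class KnowledgeSearchService {
    private final VectorStore vectorStore;

    KnowledgeSearchService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    List<Document> search(String question) {
        // Return at most 4 documents and drop matches below the similarity threshold.
        return vectorStore.similaritySearch(SearchRequest.builder()
            .query(question)
            .topK(4)
            .similarityThreshold(0.75)
            .build());
    }
}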

Keep the conversation identifier explicit at the call site.
Use in-memory chat memory only for demos, tests, or single-instance transient flows.
Add RAG only after the non-RAG path is correct and testable.
Keep EmbeddingModel as the portable seam for vector generation and treat the concrete VectorStore implementation as a deployment decision.
Open references/advisors-memory-and-conversation-state.md when advisor ordering, persistent memory repositories, token buffering, or conversation isolation becomes the blocker.
Open references/rag-pipeline-and-vector-store-decisions.md when chunking, embeddings, metadata filters, vector-store choice, or advanced retrieval tuning is the blocker.
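A similarity threshold means a search can legitimately return nothing, so the retrieval boundary should say what happens then. The following is a minimal sketch, assuming the retrieved documents have already been reduced to their text. `ContextDecision`, `ContextAssembler`, and the fixed fallback sentence are hypothetical names and wording, not Spring AI types.

```java
import java.util.List;

// Hypothetical result type: either an augmented prompt or an explicit "no context" outcome.
record ContextDecision(boolean hasContext, String text) {}

final class ContextAssembler {
    static final String NO_CONTEXT_ANSWER = "I could not find this in the knowledge base.";

    // chunks: text of the retrieved documents, already limited by topK and threshold.
    static ContextDecision assemble(String question, List<String> chunks) {
        if (chunks.isEmpty()) {
            // Explicit empty-context path: the caller can answer without invoking the model.
            return new ContextDecision(false, NO_CONTEXT_ANSWER);
        }
        String context = String.join("\n---\n", chunks);
        return new ContextDecision(true, "Context:\n" + context + "\n\nQuestion: " + question);
    }
}
```

The caller branches on `hasContext()`, which makes the empty-retrieval case a tested code path instead of an ungrounded model answer.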

Secondary official surfaces

These surfaces are part of official Spring AI scope, but they are not on the ordinary path unless the use case requires them.

Open references/image-generation-and-vision-inputs.md when the feature must attach single-image vision input to a chat request.
Open references/multiple-image-comparison.md when the blocker is comparing or cross-referencing several images in one request.
Open references/image-generation.md when the blocker is producing generated image artifacts instead of text.
Open references/audio-transcription-and-speech-output.md when the feature must transcribe audio or synthesize speech.
Open references/moderation-and-safety-gates.md when input or output moderation is required.
Open references/routing-workflow.md when routing is the blocker.
Open references/chain-workflow.md when one bounded model step must explicitly feed the next.
Open references/planning-and-stepwise-execution.md when the task is too large for one safe pass and needs a bounded plan first.
Open references/loop-bounds-and-iteration-control.md when iterative refinement needs an application-level bound.
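For the loop-bound case, the bound belongs in application code rather than in the prompt. The following is a minimal sketch, assuming one refinement step and one acceptance check per iteration. `BoundedRefinement` and `RefinementResult` are hypothetical names; the `refineStep` and `accept` parameters stand in for a model call and an evaluator.

```java
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical outcome type: the last draft, the number of steps taken, and how the loop ended.
record RefinementResult(String draft, int iterations, boolean accepted) {}

final class BoundedRefinement {
    // Applies refineStep until accept passes or maxIterations steps have run, whichever comes first.
    static RefinementResult refine(String initial, UnaryOperator<String> refineStep,
                                   Predicate<String> accept, int maxIterations) {
        if (maxIterations < 0) {
            throw new IllegalArgumentException("maxIterations must not be negative");
        }
        String draft = initial;
        int iterations = 0;
        boolean accepted = accept.test(draft);
        while (!accepted && iterations < maxIterations) {
            draft = refineStep.apply(draft);
            iterations++;
            accepted = accept.test(draft);
        }
        return new RefinementResult(draft, iterations, accepted);
    }
}
```

Returning `accepted` alongside the draft lets the caller distinguish a converged result from one that merely ran out of iterations.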

Usage handling and observability

Treat token accounting as part of the application contract, not as an afterthought.

Read Usage from the final ChatResponse when cost, token budgets, or provider drift matter.
Record prompt, completion, and total token counts together with latency and tool or retrieval activity.
Open references/observability-and-production-debugging.md when usage accounting, tracing, or production debugging becomes the blocker.
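One way to make token accounting part of the contract is to record one sample per model call and check it against a budget. The following is a minimal sketch, assuming the token counts have already been read from the response. `UsageSample`, `UsageLedger`, and the budget rule are hypothetical and not Spring AI types.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sample: one entry per completed model call.
// totalTokens is stored as reported, because a provider's total may not equal prompt + completion.
record UsageSample(String operation, int promptTokens, int completionTokens,
                   int totalTokens, Duration latency, int toolCalls) {}

// Not thread-safe; a real implementation would more likely publish to a metrics registry.
final class UsageLedger {
    private final List<UsageSample> samples = new ArrayList<>();
    private final int tokenBudget;

    UsageLedger(int tokenBudget) {
        this.tokenBudget = tokenBudget;
    }

    void record(UsageSample sample) {
        samples.add(sample);
    }

    int totalTokens() {
        return samples.stream().mapToInt(UsageSample::totalTokens).sum();
    }

    boolean overBudget() {
        return totalTokens() > tokenBudget;
    }
}
```

Keeping latency and tool-call count on the same sample as the token counts makes it possible to correlate cost with tool or retrieval activity later.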

Minimal validation

Verify the first Spring AI path before expanding scope.

Verify prompt assembly without a live provider where possible.
Verify structured-output mapping for one valid and one invalid response shape.
Verify tool methods behave like normal application APIs, including validation and authorization boundaries.
Verify advisor and chat-memory scoping with an explicit conversation ID.
Verify retrieval returns the expected documents and that empty-context behavior is explicit.
Verify token usage, latency, and tool-call identity are observable in the final path.

Open references/testing-and-evaluation-harnesses.md when the task needs repeatable evaluation datasets, regression checks, or infrastructure-backed integration tests. Open references/observability-and-production-debugging.md when adding usage accounting, tracing, or production incident diagnostics.
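The conversation-scoping check can be rehearsed without a provider by stating the property under test: history is windowed per conversation ID and never crosses IDs. The following is a minimal, framework-free model of that property. `WindowedMemory` is a hypothetical class, not Spring AI's implementation, and messages are reduced to strings.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of windowed chat memory keyed by conversation ID.
final class WindowedMemory {
    private final int maxMessages;
    private final Map<String, Deque<String>> byConversation = new HashMap<>();

    WindowedMemory(int maxMessages) {
        if (maxMessages < 1) {
            throw new IllegalArgumentException("maxMessages must be at least 1");
        }
        this.maxMessages = maxMessages;
    }

    // Appends to one conversation only and evicts the oldest messages beyond the window.
    void add(String conversationId, String message) {
        Deque<String> window = byConversation.computeIfAbsent(conversationId, id -> new ArrayDeque<>());
        window.addLast(message);
        while (window.size() > maxMessages) {
            window.removeFirst();
        }
    }

    // Returns an immutable snapshot, oldest first; unknown conversations have empty history.
    List<String> history(String conversationId) {
        Deque<String> window = byConversation.get(conversationId);
        return window == null ? List.of() : List.copyOf(window);
    }
}
```

The same three assertions (window eviction, isolation between IDs, empty history for an unknown ID) are the ones worth repeating against the real memory implementation.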

Production guardrails

Externalize provider credentials, endpoints, model names, moderation settings, and retrieval settings.
Keep prompts, tool contracts, retrieval settings, and structured-output types versioned and reviewable.
Put timeouts, retries, fallback behavior, and provider switching at the provider edge.
Log latency, token usage, retrieval count, and tool usage without leaking secrets or personal data.
Treat image, audio, and moderation model choices as explicit configuration, not ambient defaults.
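Retry and fallback at the provider edge can be kept in one small wrapper so that application services never see provider failures directly. The following is a minimal sketch, assuming a bounded number of attempts and a caller-supplied fallback. `ProviderEdge` is a hypothetical helper; timeouts, backoff, and classification of retryable errors are deliberately left out.

```java
import java.util.function.Supplier;

// Hypothetical provider-edge helper: bounded retries, then an explicit fallback.
final class ProviderEdge {
    // Calls primary up to maxAttempts times; if every attempt throws, returns fallback.get().
    static <T> T callWithFallback(Supplier<T> primary, Supplier<T> fallback, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return primary.get();
            } catch (RuntimeException e) {
                // A real edge would log the failure without secrets, wait with backoff,
                // and retry only errors known to be transient.
            }
        }
        return fallback.get();
    }
}
```

The fallback might be a cached answer, a second provider, or a fixed degraded response; which one applies is a product decision the wrapper does not make.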