name: implement-agent-tests
description: Implements tests for the LangGraph AI agent including graph routing tests with FakeChatModel, multi-turn conversation flow tests against staging with real Gemini, tenant isolation tests, WhatsApp routing tests, and LangSmith production tracing setup. Use when asked to add agent tests, test LangGraph routing, test AI conversation flows, test tenant isolation, add LangSmith tracing, or test WhatsApp agent routing. Corresponds to tasks T9, T10, T11 in docs/test/IMPLEMENTATION.md.
Implement Agent Tests
Tests for the AI agentic commerce system built on LangGraph + Gemini. Covers three concerns: graph structure correctness (fast, no real LLM), conversation quality (staging, real Gemini), and production observability (LangSmith).
Project Documentation References
- @testing-strategy.md - Testing patterns and strategies
- STRATEGY.md - AI Agent Testing section
- IMPLEMENTATION.md - Task T9, T10, T11 prompts and acceptance criteria
- ARCHITECTURE.md - Agent architecture overview
- TOOLS_REFERENCE.md - Tool definitions and contracts
- WHATSAPP_INTEGRATION.md - WhatsApp routing logic
Context
Vendin is an agentic e-commerce platform. The AI agent is a first-class product feature — it allows customers to shop and merchants to manage their store through natural language via chat or WhatsApp. The agent runs on LangGraph with Gemini, stores conversation state in Redis (keyed by {tenantId}:{role}:{threadId}), and calls MedusaJS modules as tools.
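The key format above is what keeps conversations separated per tenant, role, and thread. A minimal sketch of building such a key follows; `buildStateKey` and `AgentRole` are hypothetical names for illustration, since the source specifies only the format and not the code that produces it.

```typescript
// Hypothetical helper: only the key format {tenantId}:{role}:{threadId}
// comes from the source, not this function.
type AgentRole = "customer" | "admin";

function buildStateKey(tenantId: string, role: AgentRole, threadId: string): string {
  for (const part of [tenantId, threadId]) {
    // A ":" inside a segment could make two different triples produce the same key.
    if (part.length === 0 || part.includes(":")) {
      throw new Error(`invalid key segment: "${part}"`);
    }
  }
  return `${tenantId}:${role}:${threadId}`;
}

// Same thread id, different tenants: the keys must differ.
const keyA = buildStateKey("tenant-a", "customer", "thread-1");
const keyB = buildStateKey("tenant-b", "customer", "thread-1");
console.log(keyA); // tenant-a:customer:thread-1
console.log(keyA === keyB); // false
```

Rejecting segments that contain the separator is a design choice of this sketch, not something the source states.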
Three Test Types
T9 — Graph Routing Tests (fast, no real LLM)
Can start immediately. No staging needed.
Use FakeChatModel from @langchain/core/utils/testing — already installed, no new deps.
Files to read first:
packages/medusa/plugins/ai/agent/src/graph/index.ts
packages/medusa/plugins/ai/agent/src/graph/state.ts
packages/medusa/plugins/ai/agent/src/__tests__/service.spec.ts
Key pattern — inject fake model:
import { FakeChatModel } from "@langchain/core/utils/testing";
import { MemorySaver } from "@langchain/langgraph";
import { AIMessage, HumanMessage, ToolMessage } from "@langchain/core/messages";

// Simulates the model returning a tool call
const modelWithToolCall = new FakeChatModel({
  responses: [
    new AIMessage({
      content: "",
      tool_calls: [{ id: "t1", name: "search_products", args: { query: "red shoes" } }],
    }),
  ],
});

// Inject into the graph; requires createAgentGraph to accept an optional model param
const graph = await createAgentGraph(mockContainer, {
  model: modelWithToolCall,
  checkpointer: new MemorySaver(), // no Redis in tests
});
What to test:
- Graph routes to tools node when model returns a tool call → routes to __end__ after tool result
- Graph ends at __end__ when model returns plain text (no tool calls)
- Customer graph does not expose admin tools
- Two different thread_id configs produce independent state (no bleed)
- System prompts: customer prompt has no admin content, admin prompt is a superset
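The first two routing cases reduce to one decision: does the last model message carry tool calls? The sketch below shows that decision in isolation. It uses plain stand-in types rather than the LangChain message classes, and `routeAfterModel` is a hypothetical name; the real conditional edge in graph/index.ts may be written differently.

```typescript
// Stand-in shapes for illustration; the real graph uses LangChain message classes.
interface ToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

interface ModelMessage {
  content: string;
  tool_calls?: ToolCall[];
}

type NextNode = "tools" | "__end__";

// Hypothetical router: go to the tools node only when the model asked for a tool.
function routeAfterModel(last: ModelMessage): NextNode {
  return (last.tool_calls?.length ?? 0) > 0 ? "tools" : "__end__";
}

const withToolCall: ModelMessage = {
  content: "",
  tool_calls: [{ id: "t1", name: "search_products", args: { query: "red shoes" } }],
};
const plainText: ModelMessage = { content: "Here are some red shoes." };

console.log(routeAfterModel(withToolCall)); // tools
console.log(routeAfterModel(plainText)); // __end__
```

A T9 test would assert the same two outcomes against the compiled graph with the fake model injected, rather than against a standalone function.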
Target: < 100ms per test. Must pass with GEMINI_API_KEY unset.
T10 — Conversation Flow Tests (staging, real Gemini)
Requires T2 (demo store seeded with products).
Create the tests under tests/agent/ in the monorepo root, with a separate Vitest config that sets a 60s timeout.
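A minimal sketch of that config, assuming it lives at tests/agent/vitest.config.ts (the path used by the test:agent script in this section). Only the timeout is set here; any other options the real config needs are not covered by the source.

```typescript
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Real Gemini calls against staging are slow; allow up to 60s per test.
    testTimeout: 60_000,
  },
});
```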
What to test:
Customer shopping flow — multi-turn, verifies tool chaining:
- Turn 1: "Show me products" → assert search_products was called
- Turn 2: "Add the first one to my cart" → assert get_or_create_cart + add_item_to_cart called
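Asserting that a tool "was called" means inspecting the tool calls recorded in the returned messages. The sketch below assumes a simplified message shape and two hypothetical helpers, `calledTools` and `calledInOrder`; the hard-coded transcript stands in for what a real staging run would return.

```typescript
// Simplified message shape for illustration; real runs return LangChain messages.
interface RecordedMessage {
  role: "human" | "ai" | "tool";
  content: string;
  tool_calls?: { name: string }[];
}

// Collect the names of every tool the model requested, in order.
function calledTools(messages: RecordedMessage[]): string[] {
  const names: string[] = [];
  for (const message of messages) {
    for (const call of message.tool_calls ?? []) {
      names.push(call.name);
    }
  }
  return names;
}

// True when `expected` appears within `actual` in order; other calls may sit in between.
function calledInOrder(actual: string[], expected: string[]): boolean {
  let next = 0;
  for (const name of actual) {
    if (next < expected.length && name === expected[next]) next += 1;
  }
  return next === expected.length;
}

// Hard-coded stand-in for the messages produced by turn 2.
const turn2: RecordedMessage[] = [
  { role: "human", content: "Add the first one to my cart" },
  { role: "ai", content: "", tool_calls: [{ name: "get_or_create_cart" }] },
  { role: "tool", content: "{\"cart_id\":\"cart_1\"}" },
  { role: "ai", content: "", tool_calls: [{ name: "add_item_to_cart" }] },
  { role: "tool", content: "{\"ok\":true}" },
  { role: "ai", content: "Added it to your cart." },
];

const turn2Tools = calledTools(turn2);
console.log(calledInOrder(turn2Tools, ["get_or_create_cart", "add_item_to_cart"])); // true
```

Checking for an ordered subsequence instead of an exact list keeps the test tolerant of extra calls a real model may make between the two required ones.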
Ambiguous input — agent asks for clarification, does not hallucinate:
- "I want the blue one" with no prior context → response asks "Which product?"
Not-found graceful handling:
- Request a product SKU that does not exist → helpful error, no invented data
Tenant isolation (can use MemorySaver — no real Redis needed):
- Spin two instances with different tenantId but same threadId
- Verify messages of tenant A are absent from tenant B's state
- Verify search_products for tenant A returns only tenant A's catalogue
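The property under test is that state is scoped by tenant as well as thread, so a shared threadId never exposes another tenant's messages. The sketch below illustrates that property with a hypothetical in-memory `ScopedThreadStore`; it is not the project's checkpointer, and the real test would run two agent instances instead.

```typescript
// Hypothetical in-memory store, scoped by tenant AND thread (not the real checkpointer).
class ScopedThreadStore {
  private readonly threads = new Map<string, string[]>();

  private key(tenantId: string, threadId: string): string {
    return `${tenantId}:${threadId}`;
  }

  append(tenantId: string, threadId: string, message: string): void {
    const k = this.key(tenantId, threadId);
    const existing = this.threads.get(k) ?? [];
    this.threads.set(k, [...existing, message]);
  }

  read(tenantId: string, threadId: string): string[] {
    // Return a copy so callers cannot mutate stored state.
    return [...(this.threads.get(this.key(tenantId, threadId)) ?? [])];
  }
}

const store = new ScopedThreadStore();
const sharedThreadId = "thread-1";

store.append("tenant-a", sharedThreadId, "Show me products");
store.append("tenant-b", sharedThreadId, "Where is my order?");

const stateA = store.read("tenant-a", sharedThreadId);
const stateB = store.read("tenant-b", sharedThreadId);

// No message from tenant A may appear in tenant B's state.
const leaked = stateB.some((message) => stateA.includes(message));
console.log(leaked); // false
```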
WhatsApp routing (control-plane):
- POST mock webhook with to = demo store phone → routes to tenant with role=customer
- POST mock webhook with to = Vendin admin number, from = registered admin phone → role=admin
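The two webhook cases can be read as a lookup on the to and from numbers. The sketch below is one way to express that rule; `resolveRoute`, `RoutingTable`, the phone numbers, the mapping of admin phones to tenants, and the null result for unknown numbers are all assumptions for illustration, not the control-plane's actual logic.

```typescript
type Role = "customer" | "admin";

interface WebhookPayload {
  to: string;
  from: string;
}

// Hypothetical routing data; the real control-plane lookup is not shown in the source.
interface RoutingTable {
  adminNumber: string; // the platform's admin number
  storePhones: Map<string, string>; // store phone -> tenantId
  adminPhones: Map<string, string>; // registered admin phone -> tenantId (assumed)
}

type Route = { tenantId: string; role: Role } | null;

function resolveRoute(payload: WebhookPayload, table: RoutingTable): Route {
  if (payload.to === table.adminNumber) {
    // Admin channel: only registered admin senders are routed.
    const adminTenant = table.adminPhones.get(payload.from);
    return adminTenant ? { tenantId: adminTenant, role: "admin" } : null;
  }
  // Store channel: the destination number identifies the tenant.
  const storeTenant = table.storePhones.get(payload.to);
  return storeTenant ? { tenantId: storeTenant, role: "customer" } : null;
}

const table: RoutingTable = {
  adminNumber: "+15550000001",
  storePhones: new Map([["+15550000002", "demo-store"]]),
  adminPhones: new Map([["+15550000003", "demo-store"]]),
};

const customerRoute = resolveRoute({ to: "+15550000002", from: "+15550000099" }, table);
const adminRoute = resolveRoute({ to: "+15550000001", from: "+15550000003" }, table);
const unknownAdmin = resolveRoute({ to: "+15550000001", from: "+15550000099" }, table);
```

In the real test these payloads would be POSTed to the webhook endpoint, with the assertions made on the tenant and role the request was routed to.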
Add "test:agent": "vitest run tests/agent --config tests/agent/vitest.config.ts" to root package.json.
T11 — LangSmith Production Tracing
Can start immediately. No code changes — only config.
LangGraph auto-instruments when these env vars are present:
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=<key>
LANGSMITH_PROJECT=vendin-production
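Since tracing depends entirely on these variables, a pre-deploy check that lives outside the agent code can catch a missing value early. The sketch below is a hypothetical helper, `missingTracingVars`, that takes the environment as a plain object; treating LANGSMITH_PROJECT as required is a choice of this sketch.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical pre-deploy check: lists the variables still missing when tracing is on.
function missingTracingVars(env: Env): string[] {
  // Tracing is opt-in: when it is off, nothing else is required.
  if (env["LANGSMITH_TRACING"] !== "true") return [];
  const required = ["LANGSMITH_API_KEY", "LANGSMITH_PROJECT"];
  return required.filter((name) => !env[name]);
}

const devEnv: Env = { LANGSMITH_TRACING: "false" };
const brokenProdEnv: Env = { LANGSMITH_TRACING: "true", LANGSMITH_PROJECT: "vendin-production" };

console.log(missingTracingVars(devEnv)); // []
console.log(missingTracingVars(brokenProdEnv)); // [ 'LANGSMITH_API_KEY' ]
```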
Steps:
- Add LANGSMITH_API_KEY to GCP Secret Manager and reference it in deploy-tenant-instance.yml
- Set LANGSMITH_TRACING=false in .env.example (opt-in for devs)
- Create docs/ai-agent/LANGSMITH.md: dashboard access, what traces show, how to debug
Do not install the langsmith npm package — auto-instrumentation via env vars is sufficient.
Constraints
- T9 must pass with no real API keys (Gemini, Redis) — use mocks and MemorySaver
- T10 tests that call Gemini must run in staging only, never in unit CI
- T11 is config-only — no changes to agent business logic
- Tenant isolation test is the highest-priority test in this skill — a regression here is a data breach