ai-langchain-patterns - SKILL.md Agent Skill

name: ai-langchain-patterns
description: >
  Use this skill when working with LangChain, LlamaIndex, chains, LCEL, retrievers, agents, tools, callbacks, Runnable interface, document loaders, text splitters, vector stores, memory, or LangSmith.
  This skill enforces: LCEL composition, retriever strategy selection, agent architecture, memory management, document pipeline design, streaming configuration.
  Do NOT use for: prompt engineering, fine-tuning, model evaluation, vector database operations, non-LangChain frameworks.
version: "2.0.0"
author: "j4flmao"
license: "MIT"
type: skill
compatibility:
  claude-code: true
  cursor: true
  codex: true
  windsurf: true
tags: [ai, langchain, framework, phase-11]

LangChain Patterns Agent

Purpose

Designs production-grade LangChain architectures: LCEL pipelines, multi-strategy retrieval, tool-calling agents, memory management, streaming, and production guardrails.

Agent Protocol

Trigger

User request includes: LangChain, LlamaIndex, chains, LCEL, retriever, agent, tool, callback, Runnable, document loader, text splitter, vector store, memory, LangSmith, deep LangChain patterns.

Protocol

Clarify task type (RAG, agent, chain, tool-use) and LLM provider.
Select LCEL composition pattern (sequence, parallel, streaming, branching).
Design retriever strategy (base, multi-query, ensemble, compression, parent-document).
Choose agent type (tool-calling, ReAct, custom) and define tools with validated schemas.
Configure memory (conversation buffer, summary, window, entity, token-bounded).
Set up document pipeline (loaders, splitters, transformers).
Wire streaming and event handlers.
Add production guardrails: retry, circuit breaker, cost tracking, observability.

Output

LangChain architecture with chain/retriever/agent patterns, memory strategy, streaming setup, production config.

Response Format

## LangChain Architecture
### LCEL Pipeline
Type: {RunnableSequence/RunnableParallel/RunnableBranch}
Steps: [{step1}, {step2}, ...]
Streaming: {async/event handlers} | Batch Size: {N}

### Retriever
Strategy: {base/multi-query/ensemble/compression/parent-document}
Retrievers: [{type, top-K, params}]
Final Top-K: {N}

### Agent
Type: {tool-calling/ReAct/custom}
Tools: [{name, description, schema}]
Max Iterations: {N} | Max Execution Time: {seconds}
Early Stopping: {force/generate}

### Memory
Type: {conversation/summary/buffer/window/entity}
Window Size: {N turns} | Summary LLM: {model}
Entity Store: {enabled/disabled}

### Document Pipeline
Loaders: [{source, type}]
Splitter: {strategy} | Chunk Size: {tokens} | Overlap: {tokens}
Transformer: {type, params}

### Production
Retry: {strategy} | Cache: {type}
Tracing: {LangSmith project}
Cost Tracking: {enabled}

No preamble. No postamble. No explanations. No filler/hedging/transitions. Compress output.

Completion Criteria

LCEL pipeline uses RunnableSequence/RunnableParallel/RunnableBranch as appropriate.
Retriever strategy matches data distribution and query patterns.
Agent tools have validated schemas and error handling.
Memory strategy bounds token consumption.
Document pipeline handles source heterogeneity.
Streaming is wired for production (async generators or event handlers).
Error handling and retry logic are specified.
LangSmith tracing is configured for observability.

Decision Trees

Chain Type Decision Tree

Is it a single-turn Q&A?
  ├─ With external knowledge ──► RAG chain (retriever + prompt + llm)
  ├─ No external knowledge ────► LLM chain (prompt + llm + parser)
  └─ Multiple docs ───────────► Stuff/MapReduce/Refine chain
Is it multi-turn conversation?
  ├─ With memory ──► ConversationChain + memory
  ├─ With retrieval ──► ConversationalRetrievalChain
  └─ With tools ──► Agent with memory
Is it structured output?
  ├─ JSON ──► LCEL + JsonOutputParser
  ├─ Pydantic ──► LCEL + PydanticOutputParser
  └─ Enum/category ──► RunnableBranch or RouterChain
Is it long document processing?
  ├─ Single doc ──► Stuff chain (if fits context)
  ├─ Summarize ──► MapReduce / Refine chain
  └─ Extract ──► LCEL + structured extraction

Memory Type Decision Tree

Does the conversation need full history?
  ├─ Yes, short (< 10 turns) ──► ConversationBufferMemory
  ├─ Yes, long (10+ turns) ────► ConversationSummaryMemory
  └─ Yes, bounded budget ─────► ConversationBufferWindowMemory
Does it need entity tracking?
  ├─ Entity extraction required ──► ConversationEntityMemory
  └─ Hybrid approach ──► WindowMemory + entity extraction callback
Does it need persistence?
  ├─ Session-level ──► RedisChatMessageHistory (with TTL)
  ├─ Durable storage ───► PostgresChatMessageHistory
  ├─ Dev/local ──► SQLChatMessageHistory (SQLite)
  └─ Custom ──► Subclass BaseChatMessageHistory
Is memory shared across users/sessions?
  ├─ Shared context ──► Redis with session_id scoping
  └─ Per-user ──► Session ID isolation in all backends
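
The "bounded budget" branch above comes down to keeping only as much recent history as fits a token budget. A minimal, framework-free sketch of that idea follows; `trim_history`, `approx_tokens`, and the message shape are hypothetical names chosen for illustration, and the whitespace-based token count is a crude assumption rather than a real tokenizer.

```python
from typing import Callable

# Assumed message shape: {"role": "user" | "assistant", "content": "..."}
Message = dict[str, str]


def approx_tokens(text: str) -> int:
    """Crude estimate (assumption): one token per whitespace-separated word."""
    return len(text.split())


def trim_history(
    messages: list[Message],
    max_tokens: int,
    count_tokens: Callable[[str], int] = approx_tokens,
) -> list[Message]:
    """Keep the most recent messages whose combined size fits within max_tokens."""
    kept: list[Message] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message["content"])
        if used + cost > max_tokens:
            break  # budget exhausted: this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    {"role": "user", "content": "What is LCEL?"},
    {"role": "assistant", "content": "LCEL composes runnables with the pipe operator."},
    {"role": "user", "content": "Show me a streaming example."},
]
recent = trim_history(history, max_tokens=12)
```

In practice the `count_tokens` argument would be swapped for the tokenizer of the model in use, since word counts can differ substantially from real token counts.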

Retriever Type Decision Tree

Are queries typically well-specified?
  ├─ Yes, clear intent ──► Base retriever (dense / BM25)
  ├─ No, ambiguous ──► MultiQueryRetriever (N=3 min)
  └─ Multi-lingual ──► MultiQueryRetriever with translation variants
Do you have multiple retrieval methods?
  ├─ Dense + sparse ──► EnsembleRetriever (weighted fusion)
  └─ Use weights: semantic corpus → dense-heavy, keyword corpus → sparse-heavy
Do you need to reduce noise?
  ├─ Documents are long/noisy ──► ContextualCompressionRetriever
  ├─ LLM-based compression ──► LLMChainExtractor
  └─ Embedding-based filtering ──► EmbeddingsFilter
Do you need full document context?
  └─ Small chunks for search, big chunks for reading ──► ParentDocumentRetriever
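
To make "weighted fusion" of dense and sparse results concrete, here is a framework-free sketch of weighted reciprocal rank fusion, one common way to merge ranked lists. The function name `weighted_rrf`, the smoothing constant `c=60`, and the document IDs are assumptions for illustration; this is not presented as the exact algorithm any particular retriever class uses.

```python
from collections import defaultdict


def weighted_rrf(
    rankings: list[list[str]],
    weights: list[float],
    c: int = 60,
) -> list[str]:
    """Fuse ranked lists of document IDs: each list adds weight / (c + rank) per document."""
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")
    scores: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (c + rank)
    # Highest fused score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


dense = ["doc_a", "doc_b", "doc_c"]   # hypothetical dense (embedding) ranking
sparse = ["doc_c", "doc_a", "doc_d"]  # hypothetical sparse (BM25) ranking

dense_heavy = weighted_rrf([dense, sparse], weights=[0.7, 0.3])
sparse_heavy = weighted_rrf([dense, sparse], weights=[0.3, 0.7])
```

With the dense-heavy weights `doc_a` ranks first, while the sparse-heavy weights promote `doc_c`, which is the effect the weighting guidance above is after.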

Decision Functions

Chain Type Selector

from typing import Literal

ChainArch = Literal["rag", "llm", "conversational", "structured",
                     "map_reduce", "refine", "stuff", "agent"]

def select_chain_architecture(
    has_knowledge_base: bool,
    is_multi_turn: bool,
    needs_structured_output: bool,
    document_count: int = 0,
    document_size_tokens: int = 0,
) -> ChainArch:
    if has_knowledge_base and not is_multi_turn:
        return "rag"
    if has_knowledge_base and is_multi_turn:
        return "agent"
    if needs_structured_output:
        return "structured"
    if is_multi_turn:
        return "conversational"
    if document_count > 1:
        return "map_reduce" if document_size_tokens > 4000 else "stuff"
    return "llm"

def select_memory_type(
    expected_turns: int,
    needs_entity_tracking: bool,
    max_token_budget: int = 4000,
) -> str:
    if needs_entity_tracking:
        return "entity"
    if expected_turns > 20 and max_token_budget < 8000:
        return "summary"
    if expected_turns > 10:
        return "window"
    return "buffer"

def select_retriever_strategy(
    ambiguity_level: float,
    has_sparse_retriever: bool,
    has_dense_retriever: bool,
    noise_tolerance: float,
) -> str:
    if ambiguity_level > 0.7:
        return "multi_query"
    if has_sparse_retriever and has_dense_retriever:
        return "ensemble"
    if noise_tolerance < 0.3:
        return "compression"
    return "base"

Architectural Patterns

Pattern 1: RAG Chain

from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    return "\n\n".join(d.page_content for d in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

Pattern 2: Conversational RAG with History

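from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# `retriever`, `llm`, and `format_docs` are assumed to be defined as in Pattern 1.
prompt = ChatPromptTemplate.from_messages([
    # {context} must appear in the prompt; otherwise the retrieved text never reaches the model.
    ("system", "You are a helpful assistant. Use this context if it is relevant:\n{context}"),
    ("placeholder", "{chat_history}"),
    ("human", "{question}"),
])

def serialize_history(messages):
    # Expects stored turns shaped like {"role": "user", "input": ...} or {"role": "assistant", "output": ...}.
    return [HumanMessage(content=m["input"]) if m["role"] == "user"
            else AIMessage(content=m["output"])
            for m in messages]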

conversational_rag = (
    RunnablePassthrough.assign(
        chat_history=lambda x: serialize_history(x.get("chat_history", [])),
        context=lambda x: format_docs(retriever.invoke(x["question"])),
    )
    | prompt
    | llm
    | StrOutputParser()
)

Pattern 3: Multi-Step Reasoning Chain

decomposition_prompt = ChatPromptTemplate.from_template(
    "Break this question into sub-questions:\n{question}"
)
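answer_prompt = ChatPromptTemplate.from_template(
    # Variables must match the keys assigned upstream: context and sub_questions.
    "Context: {context}\nSub-questions:\n{sub_questions}\nAnswer each sub-question:"
)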
synthesis_prompt = ChatPromptTemplate.from_template(
    "Synthesize these answers into a final response:\n{answers}\n\nOriginal question: {question}"
)

def retrieve_for_each(sub_questions: str, retriever):
    questions = [q.strip() for q in sub_questions.split("\n") if q.strip()]
    contexts = [format_docs(retriever.invoke(q)) for q in questions]
    return "\n\n".join(contexts)

multi_step_chain = (
    RunnablePassthrough.assign(
        sub_questions=decomposition_prompt | llm | StrOutputParser()
    )
    .assign(context=lambda x: retrieve_for_each(x["sub_questions"], retriever))
    .assign(answers=answer_prompt | llm | StrOutputParser())
    .assign(final=synthesis_prompt | llm | StrOutputParser())
)

Pattern 4: Parallel Multi-Source Retrieval

from langchain_core.runnables import RunnableParallel

parallel_retrieval = RunnableParallel(
    wiki=wiki_retriever,
    docs=internal_docs_retriever,
    web=web_search_retriever,
)

def fuse_sources(sources: dict) -> str:
    parts = []
    for src, docs in sources.items():
        for d in docs:
            d.metadata["source_type"] = src
            parts.append(d)
    return format_docs(sorted(parts, key=lambda x: x.metadata.get("score", 0), reverse=True)[:10])

fused_rag = (
    {"context": parallel_retrieval | fuse_sources, "question": RunnablePassthrough()}
    | prompt | llm | StrOutputParser()
)

Pattern 5: Conditional Routing (RunnableBranch)

from langchain_core.runnables import RunnableBranch

tech_prompt = ChatPromptTemplate.from_template("Answer tech question: {query}")
general_prompt = ChatPromptTemplate.from_template("Answer: {query}")
code_prompt = ChatPromptTemplate.from_template("Write code for: {query}")

branch_chain = RunnableBranch(
    (lambda x: any(tag in x["query"].lower() for tag in ["python", "java", "code"]),
     code_prompt | llm | StrOutputParser()),
    (lambda x: any(tag in x["query"].lower() for tag in ["server", "api", "database"]),
     tech_prompt | llm | StrOutputParser()),
    general_prompt | llm | StrOutputParser(),
)

Pattern 6: Tool-Calling Agent

from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.tools import tool
from langchain_core.prompts import ChatPromptTemplate

@tool
def search_kb(query: str, top_k: int = 5) -> str:
    """Search the knowledge base. Use for factual questions."""
    results = vectorstore.similarity_search(query, k=top_k)
    return "\n\n".join(d.page_content for d in results)

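@tool
def calculator(expression: str) -> str:
    """Evaluate a math expression. Use for calculations."""
    import ast, operator
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def evaluate(node):
        # Walk the AST by hand; only numbers and the operators in `ops` are allowed (no eval()).
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            return ops[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        raise ValueError("unsupported expression")

    try:
        return str(evaluate(ast.parse(expression, mode="eval").body))
    except Exception as e:
        return f"Error: {e}"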

tools = [search_kb, calculator]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant with tools. Use them when needed."),
    ("placeholder", "{chat_history}"),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=10,
    max_execution_time=30,
    early_stopping_method="force",  # agents built with create_tool_calling_agent do not support "generate"
    handle_parsing_errors=True,
    return_intermediate_steps=True,
)

Pattern 7: Streaming with astream_events

from typing import AsyncIterator

async def stream_response(chain, input_data: dict) -> AsyncIterator[dict]:
    async for event in chain.astream_events(input_data, version="v2"):
        kind = event["event"]
        if kind == "on_chat_model_stream":
            chunk = event["data"]["chunk"]
            if hasattr(chunk, "content") and chunk.content:
                yield {"type": "token", "content": chunk.content}
        elif kind == "on_retriever_end":
            docs = event["data"]["output"]
            yield {"type": "sources", "documents": [
                {"content": d.page_content[:200], "source": d.metadata.get("source", ""),
                 "score": d.metadata.get("score", 0.0)} for d in docs
            ]}
        elif kind == "on_tool_end":
            yield {"type": "tool_result", "name": event.get("name", ""),
                   "output": str(event["data"]["output"])[:200]}
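        elif kind == "on_chain_end" and not event.get("parent_ids"):
            # Sub-chains also emit on_chain_end; in v2 events only the root run has no parents.
            yield {"type": "done"}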

Pattern 8: Callback for Cost Tracking

from langchain_core.callbacks import BaseCallbackHandler
from collections import defaultdict

class CostTrackingHandler(BaseCallbackHandler):
    # USD per token, derived from per-million-token list prices; verify against current provider pricing.
    MODEL_COSTS = {
        "gpt-4o": {"input": 2.50 / 1e6, "output": 10.00 / 1e6},
        "gpt-4o-mini": {"input": 0.15 / 1e6, "output": 0.60 / 1e6},
        "claude-sonnet-4-20250514": {"input": 3.00 / 1e6, "output": 15.00 / 1e6},
        "claude-3-5-haiku-20241022": {"input": 0.80 / 1e6, "output": 4.00 / 1e6},
    }

    def __init__(self):
        self.total_input_tokens = 0
        self.total_output_tokens = 0
        self.total_cost = 0.0
        self.model_name = "unknown"
        self.run_counts = defaultdict(int)

    def on_llm_start(self, serialized, prompts, **kwargs):
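        # `serialized` may be None, and the key holding the model name can vary by provider integration.
        model_kwargs = (serialized or {}).get("kwargs", {})
        self.model_name = model_kwargs.get("model_name") or model_kwargs.get("model") or "unknown"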
        self.run_counts[self.model_name] += 1

    def on_llm_end(self, response, **kwargs):
        usage = response.llm_output.get("token_usage", {}) if response.llm_output else {}
        inp = usage.get("prompt_tokens", 0)
        out = usage.get("completion_tokens", 0)
        self.total_input_tokens += inp
        self.total_output_tokens += out
        costs = self.MODEL_COSTS.get(self.model_name, {"input": 0, "output": 0})
        self.total_cost += inp * costs["input"] + out * costs["output"]

    def get_report(self) -> dict:
        return {
            "model": self.model_name,
            "calls": self.run_counts[self.model_name],
            "input_tokens": self.total_input_tokens,
            "output_tokens": self.total_output_tokens,
            "total_tokens": self.total_input_tokens + self.total_output_tokens,
            "estimated_cost_usd": round(self.total_cost, 6),
        }
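
As a worked check of the arithmetic in `on_llm_end`, the sketch below prices a single call using the same per-token rates listed in `MODEL_COSTS`. The helper name `estimate_cost` and the token counts are hypothetical, chosen only to make the numbers easy to follow.

```python
# Per-token USD rates copied from the MODEL_COSTS table above.
RATES_PER_TOKEN = {
    "gpt-4o": {"input": 2.50 / 1e6, "output": 10.00 / 1e6},
    "gpt-4o-mini": {"input": 0.15 / 1e6, "output": 0.60 / 1e6},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost = input_tokens * input_rate + output_tokens * output_rate."""
    # Unknown models fall back to zero rates, mirroring the handler's behaviour.
    rates = RATES_PER_TOKEN.get(model, {"input": 0.0, "output": 0.0})
    return input_tokens * rates["input"] + output_tokens * rates["output"]


# 1,200 prompt tokens and 300 completion tokens on gpt-4o:
# 1200 * 2.50e-6 = 0.003 and 300 * 10.00e-6 = 0.003, so about 0.006 USD in total.
cost = estimate_cost("gpt-4o", input_tokens=1200, output_tokens=300)
unknown = estimate_cost("some-unlisted-model", input_tokens=1200, output_tokens=300)
```

Note that the zero-rate fallback means calls to unlisted models are silently counted as free, so a production handler may prefer to log or raise when a model is missing from the table.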

Pattern 9: Production RAG with Caching and Tracing

from langchain.globals import set_llm_cache
from langchain_community.cache import RedisCache
from langchain_core.tracers import LangChainTracer
from langchain_core.runnables import RunnableConfig

set_llm_cache(RedisCache(redis_client))

tracer = LangChainTracer(project_name="my-rag-prod")

def invoke_with_observability(question: str, user_id: str) -> str:
    config = RunnableConfig(
        callbacks=[CostTrackingHandler(), tracer],
        metadata={"user_id": user_id, "env": "production"},
        tags=["rag", "production"],
    )
    # rag_chain (Pattern 1) takes the raw question string: the retriever receives the chain input directly.
    return rag_chain.invoke(question, config=config)

Production Considerations

Error Handling Strategy

Error Type	Detection	Recovery	Retry Strategy
RateLimitError	Exception from provider	Exponential backoff + jitter	3 attempts, 1s-10s
APITimeoutError	Request timeout	Circuit breaker after 5 failures	30s cooldown
ContextLengthExceeded	Token count in callback	Truncate oldest messages	Immediate retry
BadRequestError	Exception from provider	Validate input schema	No retry
ToolExecutionError	Tool raises exception	Return structured error string	No retry
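
The RateLimitError row (exponential backoff with jitter, 3 attempts, waits between 1s and 10s) can be sketched without any framework as follows. `call_with_retry` and its parameters are hypothetical names, `TimeoutError` stands in for a provider exception, and the `sleep` argument is injectable only so the example can run without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    retry_on: tuple[type[BaseException], ...] = (TimeoutError,),
    attempts: int = 3,
    base: float = 1.0,
    cap: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(cap, base * 2 ** attempt)  # 1s, 2s, 4s, ... capped at `cap`
            # Add up to `base` seconds of random jitter so clients do not retry in lockstep.
            sleep(min(cap, delay + random.uniform(0, base)))
    raise AssertionError("unreachable")


# Demo: a call that fails twice with a simulated timeout, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated provider timeout")
    return "ok"


waits: list[float] = []
result = call_with_retry(flaky, sleep=waits.append)  # record waits instead of sleeping
```

Errors outside `retry_on` propagate immediately, which matches the "No retry" rows in the table for bad requests and tool failures.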

Circuit Breaker Pattern

import time

class CircuitBreaker:
    def __init__(self, threshold: int = 5, recovery_timeout: float = 30.0):
        self.threshold = threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.last_failure = 0.0
        self.state = "closed"

    def __call__(self, func):
        async def wrapper(*args, **kwargs):
            if self.state == "open":
                if (time.time() - self.last_failure) > self.recovery_timeout:
                    self.state = "half-open"
                else:
                    return {"error": "service_unavailable",
                            "message": "Circuit breaker open. Please retry later."}
            try:
                result = await func(*args, **kwargs)
                if self.state == "half-open":
                    self.state = "closed"
                    self.failures = 0
                return result
            except Exception:
                self.failures += 1
                self.last_failure = time.time()
                if self.failures >= self.threshold:
                    self.state = "open"
                raise
        return wrapper

LangSmith Trace Configuration

from langsmith import Client
from langchain_core.tracers import LangChainTracer
from langchain_core.tracers.context import tracing_v2_enabled

client = Client()

# Option 1: Per-call tracing
tracer = LangChainTracer(
    project_name="my-project",
    client=client,
)
result = chain.invoke(input, config={"callbacks": [tracer]})

# Option 2: Context manager
with tracing_v2_enabled(project_name="my-project"):
    result = chain.invoke(input)

# Option 3: Environment variables
# LANGCHAIN_TRACING_V2=true
# LANGCHAIN_PROJECT=my-project
# LANGCHAIN_ENDPOINT=https://api.smith.langchain.com

Cost Budget Manager

import logging
import time

logger = logging.getLogger(__name__)

class BudgetManager:
    def __init__(self, monthly_budget_usd: float):
        self.monthly_budget = monthly_budget_usd
        self.spent = 0.0
        self.daily_log: list[dict] = []

    def track(self, cost: float, operation: str):
        self.spent += cost
        self.daily_log.append({
            "timestamp": time.time(),
            "cost": cost,
            "operation": operation,
        })
        if self.spent > self.monthly_budget:
            logger.warning(f"Monthly budget ${self.monthly_budget} exceeded: ${self.spent:.2f}")
            return False
        return True

    def remaining(self) -> float:
        return max(0.0, self.monthly_budget - self.spent)

    def daily_spend(self) -> float:
        # Rolling window: everything logged in the last 24 hours, not the calendar day.
        cutoff = time.time() - 86400
        return sum(e["cost"] for e in self.daily_log if e["timestamp"] > cutoff)

Anti-Patterns

Anti-Pattern 1: Over-Abstraction

# BAD: Heavy inheritance for simple transformations
class QueryProcessor(Runnable):
    def __init__(self, rules: list[Callable]):
        self.rules = rules
    def invoke(self, input: str, config: RunnableConfig | None = None) -> str:
        result = input
        for rule in self.rules:
            result = rule(result)
        return result

# GOOD: Direct RunnableLambda composition
chain = RunnableLambda(str.strip) | RunnableLambda(str.lower) | llm

Anti-Pattern 2: Ignoring Async

# BAD: Synchronous calls blocking async context
def handle_request(query: str):
    return chain.invoke(query)  # Blocks event loop

# GOOD: Async throughout
async def handle_request(query: str):
    return await chain.ainvoke(query)

Anti-Pattern 3: Memory Leaks via Unbounded History

# BAD: No bounds on memory
memory = ConversationBufferMemory(return_messages=True)  # Grows forever

# GOOD: Bounded memory
memory = ConversationBufferWindowMemory(k=10, return_messages=True)
# OR with token limit
from langchain.memory import ConversationTokenBufferMemory
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=4000)

Anti-Pattern 4: Callback Spaghetti

# BAD: Inline callbacks doing too much
chain.invoke(input, config={"callbacks": [
    handler_a, handler_b, handler_c, handler_d,
    handler_e, handler_f, handler_g,
]})

# GOOD: Composite handler
class CompositeHandler(BaseCallbackHandler):
    def __init__(self):
        self.logger = LoggingHandler()
        self.cost = CostTrackingHandler()
        self.latency = LatencyTrackingHandler()

    def on_llm_end(self, response, **kwargs):
        self.logger.on_llm_end(response, **kwargs)
        self.cost.on_llm_end(response, **kwargs)
        self.latency.on_llm_end(response, **kwargs)

chain.invoke(input, config={"callbacks": [CompositeHandler()]})

Anti-Pattern 5: No Max Iterations on Agents

# BAD: Agent can loop forever
executor = AgentExecutor(agent=agent, tools=tools)

# GOOD: Always bound iterations
executor = AgentExecutor(
    agent=agent, tools=tools,
    max_iterations=15, max_execution_time=30,
    early_stopping_method="force",  # "generate" is only supported by legacy agent classes
)

# BAD: No return_intermediate_steps for debugging
executor = AgentExecutor(agent=agent, tools=tools, max_iterations=15)

# GOOD: Capture intermediate steps
executor = AgentExecutor(
    agent=agent, tools=tools,
    max_iterations=15,
    return_intermediate_steps=True,  # Debug agent reasoning
)

Anti-Pattern 6: No Fallback Chain

# BAD: Single point of failure
chain = retriever | prompt | llm | parser

# GOOD: Fallback chain for degraded mode
fallback_chain = prompt | fallback_llm | parser
chain = retriever | prompt | llm.with_fallbacks([fallback_llm]) | parser
# Or full chain fallback
chain = primary.with_fallbacks([fallback])

Anti-Pattern 7: Missing Input Validation

# BAD: Assumes valid input
chain = prompt | llm | parser

# GOOD: Validate at entry point
from pydantic import BaseModel, Field

class QueryInput(BaseModel):
    text: str = Field(min_length=1, max_length=10000)
    user_id: str = Field(pattern=r"^user_\d+$")
    top_k: int = Field(ge=1, le=20, default=5)

def validate_input(data: dict) -> dict:
    validated = QueryInput(**data)
    return validated.model_dump()

chain = RunnableLambda(validate_input) | prompt | llm | parser

Anti-Pattern 8: Mixed Sync/Async in Pipeline

# BAD: Sync step in async pipeline
async def process(query: str):
    docs = retriever.invoke(query)  # Sync call blocking event loop
    return await llm.ainvoke(format_docs(docs))

# GOOD: Consistent async
async def process(query: str):
    docs = await retriever.ainvoke(query)
    # Chat models take text or messages, so format the documents first (format_docs from Pattern 1).
    return await llm.ainvoke(format_docs(docs))

LangChain vs. Direct LLM Calls

Aspect	LangChain (LCEL)	Direct LLM Calls
Boilerplate	Minimum for complex pipelines	Grows with each feature added
Composability	Pipe operator, Runnable interface	Manual function chaining
Streaming	Built-in via astream_events	Manual implementation
Retry/Error Handling	.with_retry(), .with_fallbacks()	Manual try/except loops
Callbacks	Full event system	Manual instrumentation
Observability	LangSmith integration built-in	Custom metric collection
Memory	Pluggable memory backends	Manual state management
Tool Calling	Agent framework with validation	Manual function calling loop
Testing	Mock LLMs, FakeListLLM	Mock HTTP calls
Caching	Global set_llm_cache()	Manual cache layer
Flexibility	Constrained by Runnable interface	Full control over requests
Dependency Weight	Heavy (many sub-packages)	Minimal (single HTTP client)
Learning Curve	Medium (abstract concepts)	Low (direct API calls)
Debugging	Opaque if not using tracers	Easy to log raw requests
Best For	Complex multi-step pipelines	Simple single calls, microservices

When to Use Direct Calls

# Simple chat completion: a direct call is fine
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": query}],
)
answer = response.choices[0].message.content

When to Use LangChain

# RAG with retrieval, memory, streaming: LangChain adds value
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt | llm | StrOutputParser()
)
# The chain takes the raw question string; run this loop inside an async function.
async for event in chain.astream_events(query, version="v2"):
    ...

Rules

Use RunnableParallel for independent branches, RunnableSequence for dependent steps.
Always use .assign() over manual dict construction in LCEL.
MultiQueryRetriever with N=3 is the minimum viable for ambiguous queries.
EnsembleRetriever weights must sum to 1.0.
Agents must have a max iteration limit to prevent infinite loops.
Memory must have a token limit to avoid context overflow.
Streaming is mandatory for chat applications.
Always wrap LLM calls with retry logic for production.
Track token usage and cost per user/session in production.
Use with_fallbacks() for critical chains to handle model degradation.
Validate all external input before it enters the pipeline.
Keep callback handlers focused — one responsibility per handler.
Use async consistently — never mix sync calls in async pipelines.
Version prompt templates alongside code for traceability.
Test chains with FakeListLLM; test agents with mock tool responses.

References

references/callback-streaming.md — Callbacks and Streaming
references/document-pipeline.md — Document Pipeline
references/langchain-memory-persistence.md — LangChain Memory Persistence
references/langchain-patterns-advanced.md — Langchain Patterns Advanced Topics
references/langchain-patterns-fundamentals.md — Langchain Patterns Fundamentals
references/langchain-production.md — LangChain Production Deployment
references/langchain-testing.md — LangChain Testing
references/lcel-patterns.md — LCEL (LangChain Expression Language) Patterns
references/retriever-agent-patterns.md — Retriever & Agent Patterns
references/tool-integration.md — Tool Integration Patterns
references/custom-components.md — Custom Component Development
references/migration-guides.md — Migration Guides
references/llm-comparison.md — LangChain vs Direct LLM Calls

Handoff

For vector database setup, hand off to ai-vector-databases. For MCP tool integration, hand off to ai-mcp-patterns. For observability, hand off to ai-ai-observability.