name: foundry-iq
description: Build enterprise RAG solutions using Foundry IQ with Azure AI Search agentic retrieval. Use when implementing policy assistants, knowledge bases with citations, or multi-hop question answering systems.
Foundry IQ Agent Framework Integration Skill
Folder Contents
| File | Type | Description |
|---|---|---|
| `SKILL.md` | Documentation | Main skill documentation with architecture, API reference, and agentic retrieval deep dive |
| `PRD.md` | Documentation | Product Requirements Document for the skill |
| `.env.sample` | Configuration | Sample environment variables for Azure OpenAI and AI Search |
| `requirements.txt` | Dependencies | Python package dependencies (azure-search-documents, azure-ai-projects, fastapi) |
| `scripts/` | | |
| `scripts/__init__.py` | Module | Package initializer with exports |
| `scripts/search_index_manager.py` | Index Manager | Creates and manages Azure AI Search indexes with vector search and HNSW configuration |
| `scripts/document_indexer.py` | Indexer | Document chunking with sentence boundary detection and batch upload to search index |
| `scripts/knowledge_agent_manager.py` | Agent Manager | Creates Knowledge Agents with configurable reasoning effort; KnowledgeAgentRetriever for multi-turn retrieval |
| `scripts/azure_openai_client.py` | LLM Client | Azure OpenAI client for chat completions; PolicyBot combining retrieval + generation |
Overview
Foundry IQ is Microsoft's enterprise-grade RAG solution that treats retrieval as a reasoning task. It uses Azure AI Search Knowledge Bases with agentic retrieval to enable multi-hop reasoning, query planning, and citation-backed responses.
Architecture
+---------------------------------------------------------------------+
|                       Foundry IQ Architecture                       |
+---------------------------------------------------------------------+
|                                                                     |
|  +----------------+    +------------------+    +-----------------+  |
|  |   Documents    |--->| Azure AI Search  |--->|    Knowledge    |  |
|  |     (Blob)     |    |      Index       |    |      Agent      |  |
|  +----------------+    +------------------+    +-----------------+  |
|                                                         |           |
|                                                         v           |
|  +----------------+    +------------------+    +-----------------+  |
|  |    FastAPI     |<-->| Agent Framework  |<-->|     Agentic     |  |
|  |    Endpoint    |    |   (ChatAgent)    |    |    Retrieval    |  |
|  +----------------+    +------------------+    +-----------------+  |
|                                 |                                   |
|                                 v                                   |
|                        +------------------+                         |
|                        |   Azure OpenAI   |                         |
|                        |  (Configurable)  |                         |
|                        +------------------+                         |
|                                                                     |
+---------------------------------------------------------------------+
Key Components
1. Azure AI Search Knowledge Agent
The Knowledge Agent provides:
- Query Planning: LLM-powered decomposition of complex queries
- Multi-hop Reasoning: Following chains of information across documents
- Answer Synthesis: Comprehensive context with citations
- Retrieval Modes: `semantic` (fast) vs `agentic` (intelligent)
2. Retrieval Modes
| Mode | Speed | Use Case |
|---|---|---|
| `semantic` | ~100-300ms | Simple Q&A, speed-critical apps |
| `agentic` | ~1-3s | Complex questions, multi-hop reasoning |
3. Reasoning Effort Levels
- `minimal`: Basic retrieval
- `low`: Light query planning
- `medium`: Full query planning and multi-hop reasoning
Project Structure
The recommended project structure for a Foundry IQ implementation:
project-root/
|
+-- .env                          # All configuration (never hardcode!)
|
+-- src/
|   +-- foundry-iq/
|       +-- app/
|       |   +-- __init__.py
|       |   +-- main.py           # FastAPI application & endpoints
|       |   +-- models.py         # Pydantic request/response models
|       |   +-- services.py       # Service layer (all business logic)
|       |
|       +-- requirements.txt      # Python dependencies
|       +-- Dockerfile            # Container configuration
|       +-- docker-compose.yml    # Docker orchestration
|
+-- notebooks/
|   +-- foundry_iq_demo.ipynb     # Interactive demonstration
|
+-- .github/
|   +-- skills/
|       +-- foundry-iq/
|           +-- SKILL.md          # This documentation
|           +-- scripts/          # Reusable building blocks
|               +-- __init__.py
|               +-- search_index_manager.py
|               +-- document_indexer.py
|               +-- knowledge_agent_manager.py
|               +-- azure_openai_client.py
|
+-- research.md                   # Training materials & micro-hack design
Environment Variables
All configuration should be externalized to .env:
# Azure OpenAI Configuration
AZURE_OPENAI_ENDPOINT=https://<resource>.openai.azure.com
AZURE_OPENAI_API_KEY=<api-key>
AZURE_OPENAI_API_VERSION=2024-12-01-preview
# Azure AI Search Configuration
AI_SEARCH_ENDPOINT=https://<service>.search.windows.net
AI_SEARCH_KEY=<admin-key>
AI_SEARCH_API_VERSION=2025-01-01-preview
# PolicyBot Configuration
POLICY_INDEX_NAME=policy-documents
POLICY_AGENT_NAME=policy-agent
POLICY_CHAT_MODEL=gpt-4.1
# Document Chunking
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
# Agentic Retrieval
REASONING_EFFORT=medium # minimal | low | medium
OUTPUT_MODE=extractive_data
# Server Configuration
API_HOST=0.0.0.0
API_PORT=8001
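A minimal sketch of how these variables might be read once at startup, using only the standard library. The `Settings` class and `load_settings` function are illustrative names, not part of the skill's scripts, and the defaults mirror the sample values above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for a subset of the variables listed above."""
    search_endpoint: str
    search_key: str
    index_name: str
    agent_name: str
    chunk_size: int
    chunk_overlap: int
    reasoning_effort: str


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Fail fast on values that have no sensible default.
    missing = [k for k in ("AI_SEARCH_ENDPOINT", "AI_SEARCH_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    settings = Settings(
        search_endpoint=env["AI_SEARCH_ENDPOINT"].rstrip("/"),
        search_key=env["AI_SEARCH_KEY"],
        index_name=env.get("POLICY_INDEX_NAME", "policy-documents"),
        agent_name=env.get("POLICY_AGENT_NAME", "policy-agent"),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        reasoning_effort=env.get("REASONING_EFFORT", "medium"),
    )
    if settings.chunk_overlap >= settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return settings
```

If python-dotenv is installed, calling `load_dotenv()` before `load_settings()` would populate `os.environ` from the `.env` file.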
Required Packages
pip install "azure-search-documents>=11.7.0b1"
pip install azure-ai-projects
pip install azure-identity
pip install openai
pip install fastapi uvicorn
pip install python-dotenv
pip install requests aiohttp
Building Block Scripts
1. search_index_manager.py
Creates and manages Azure AI Search indexes with vector search configuration.
Key Class: SearchIndexManager
- Creates indexes with semantic search configuration
- Configures vector search with HNSW algorithm
- Manages index lifecycle (create, list, delete)
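As a rough illustration of what such an index involves, the sketch below builds an index definition as a plain dictionary in the shape used by the Azure AI Search REST API. The field names, the profile and configuration names, the HNSW parameter values, and the 1536-dimension default are all assumptions for this example; the actual `SearchIndexManager` may choose differently.

```python
def build_index_definition(index_name: str, dimensions: int = 1536) -> dict:
    """Return a REST-style index definition with HNSW vector search.

    All names and parameter values here are illustrative assumptions.
    """
    return {
        "name": index_name,
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
            {"name": "title", "type": "Edm.String", "searchable": True},
            {"name": "content", "type": "Edm.String", "searchable": True},
            {
                # Vector field: bound to a vector search profile by name.
                "name": "content_vector",
                "type": "Collection(Edm.Single)",
                "searchable": True,
                "dimensions": dimensions,
                "vectorSearchProfile": "hnsw-profile",
            },
        ],
        "vectorSearch": {
            "algorithms": [
                {
                    "name": "hnsw-config",
                    "kind": "hnsw",
                    "hnswParameters": {
                        "m": 4,
                        "efConstruction": 400,
                        "efSearch": 500,
                        "metric": "cosine",
                    },
                }
            ],
            # The profile links the vector field to the algorithm above.
            "profiles": [{"name": "hnsw-profile", "algorithm": "hnsw-config"}],
        },
        "semantic": {
            "configurations": [
                {
                    "name": "default-semantic",
                    "prioritizedFields": {
                        "titleField": {"fieldName": "title"},
                        "prioritizedContentFields": [{"fieldName": "content"}],
                    },
                }
            ]
        },
    }
```

Keeping the definition as data makes it easy to check in tests that the vector field, profile, and algorithm names stay in sync before anything is sent to the service.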
2. document_indexer.py
Indexes documents into Azure AI Search with smart chunking.
Key Class: DocumentIndexer
- Chunking with configurable size and overlap
- Smart sentence boundary detection
- Batch document upload
- Sample policy documents included
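The chunking behaviour described here can be sketched as follows. This is an illustrative version, not the code in `document_indexer.py`: the function name `chunk_text` and the regular expression used to detect sentence ends are assumptions.

```python
import re

# A sentence end is ".", "!" or "?" followed by whitespace (a simplification).
_SENTENCE_END = re.compile(r"[.!?](?=\s)")


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks, preferring sentence boundaries."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Snap the end back to the last sentence end inside the window,
            # but only if it lies past the overlap region so that the next
            # start position still moves forward.
            boundaries = [m.end() for m in _SENTENCE_END.finditer(text, start, end)]
            usable = [b for b in boundaries if b - start > overlap]
            if usable:
                end = usable[-1]
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= len(text):
            break
        # Step back by the overlap so neighbouring chunks share context.
        start = end - overlap
    return chunks
```

Each chunk ends at a sentence boundary where one is available, while the next chunk begins `overlap` characters earlier, which is what preserves context across chunk edges.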
3. knowledge_agent_manager.py
Creates and manages Knowledge Agents for agentic retrieval.
Key Classes:
- `KnowledgeAgentManager`: Creates agents with configurable reasoning effort
- `KnowledgeAgentRetriever`: Performs retrieval with multi-turn history
4. azure_openai_client.py
Azure OpenAI client for chat completions and embeddings.
Key Classes:
- `AzureOpenAIClient`: Low-level chat completions
- `PolicyBot`: High-level Q&A combining retrieval + generation
Agentic Retrieval Deep Dive
What is Agentic Retrieval?
Traditional RAG follows a simple pattern:
Query -> Single Search -> Return Top K Results -> LLM Synthesizes
Agentic retrieval treats retrieval as a reasoning task:
Query -> LLM Plans Sub-queries -> Multiple Searches -> Reflection -> Synthesis
How It Works
- Query Analysis: The Knowledge Agent analyzes the user's question
- Query Planning: Decomposes complex queries into sub-queries
- Iterative Search: Executes sub-queries, following information chains
- Result Aggregation: Combines results from multiple searches
- Citation Tracking: Maintains source references throughout
Example: Multi-hop Query
User Question: "Can I work remotely from another country while using PTO?"
Traditional RAG might search once and miss the connection.
Agentic Retrieval decomposes:
- Sub-query 1: "What is the remote work policy for international work?"
- Sub-query 2: "What are the PTO policy restrictions?"
- Sub-query 3: "Are there rules about combining remote work with PTO?"
It then synthesizes an answer that spans multiple documents.
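The plan, search, and aggregate flow can be illustrated with a toy, fully in-memory sketch. Nothing here calls Azure: `plan_subqueries` stands in for the LLM planner with a hard-coded keyword table, `search` is a naive keyword match, and the corpus text is invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    source: str
    text: str


# Toy corpus standing in for the search index (invented content).
CORPUS = {
    "remote_work_policy.md": "International remote work requires manager approval.",
    "pto_policy.md": "PTO days must be requested two weeks in advance.",
}


def plan_subqueries(question: str) -> list[str]:
    """Stand-in for LLM query planning: one sub-query per topic mentioned."""
    topics = {"remote": "remote work policy", "pto": "PTO policy"}
    lowered = question.lower()
    return [sub for key, sub in topics.items() if key in lowered] or [question]


def search(subquery: str) -> list[Hit]:
    """Stand-in for a single index search: naive keyword overlap."""
    words = set(subquery.lower().split())
    return [
        Hit(source, text)
        for source, text in CORPUS.items()
        if words & set(text.lower().replace(".", "").split())
    ]


def agentic_retrieve(question: str) -> list[tuple[int, Hit]]:
    """Run every sub-query and keep (sub-query index, hit) pairs for citations."""
    results, seen = [], set()
    for idx, sub in enumerate(plan_subqueries(question)):
        for hit in search(sub):
            if hit.source not in seen:  # de-duplicate across sub-queries
                seen.add(hit.source)
                results.append((idx, hit))
    return results
```

For the question above, this sketch returns one hit from each policy document, each tagged with the sub-query that found it, which is the information a synthesis step needs to produce cited answers.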
Implementation Location
Agentic retrieval is implemented in services.py:
import requests
from typing import Any, Dict


class KnowledgeAgentService:
    # retrieve() excerpt - lines 433-455 of services.py
    def retrieve(self, query: str) -> Dict[str, Any]:
        """Perform agentic retrieval."""
        # Record the user turn so later calls carry the conversation history.
        self.messages.append({"role": "user", "content": query})
        # System messages are not sent to the retrieve endpoint.
        request_body = {
            "messages": [
                {"role": msg["role"], "content": [{"text": msg["content"]}]}
                for msg in self.messages if msg["role"] != "system"
            ]
        }
        url = f"{self.endpoint}/agents/{self.agent_name}/retrieve?api-version={self.api_version}"
        response = requests.post(url=url, headers=self.headers, json=request_body)
        # ... response handling
Configuration Options
| Parameter | Values | Description |
|---|---|---|
| `reasoningEffort` | `minimal`, `low`, `medium` | Query planning depth |
| `outputMode` | `extractive_data`, `generated_text` | How results are returned |
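Because both parameters accept a small fixed set of values, it can help to validate them before they reach the service. The helper below is a sketch: the function name is hypothetical, and where the resulting keys belong in an agent definition or request depends on the API version in use.

```python
ALLOWED_REASONING_EFFORT = ("minimal", "low", "medium")
ALLOWED_OUTPUT_MODE = ("extractive_data", "generated_text")


def validate_retrieval_options(reasoning_effort: str, output_mode: str) -> dict:
    """Normalize and validate the two options from the table above."""
    effort = reasoning_effort.strip().lower()
    mode = output_mode.strip().lower()
    if effort not in ALLOWED_REASONING_EFFORT:
        raise ValueError(
            f"reasoningEffort must be one of {ALLOWED_REASONING_EFFORT}, got {reasoning_effort!r}"
        )
    if mode not in ALLOWED_OUTPUT_MODE:
        raise ValueError(
            f"outputMode must be one of {ALLOWED_OUTPUT_MODE}, got {output_mode!r}"
        )
    return {"reasoningEffort": effort, "outputMode": mode}
```

Failing at startup with a clear message is usually easier to debug than a rejected request later on.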
API Reference
Knowledge Agent Retrieval
from azure.search.documents.agent import KnowledgeAgentRetrievalClient
from azure.search.documents.agent.models import (
    KnowledgeAgentRetrievalRequest,
    KnowledgeAgentMessage,
    KnowledgeAgentMessageTextContent,
    SearchIndexKnowledgeSourceParams,
)

agent_client = KnowledgeAgentRetrievalClient(
    endpoint=search_endpoint,
    agent_name=knowledge_agent_name,
    credential=credential,
)

req = KnowledgeAgentRetrievalRequest(
    messages=[
        KnowledgeAgentMessage(
            role="user",
            content=[KnowledgeAgentMessageTextContent(text=query)],
        )
    ],
    knowledge_source_params=[
        SearchIndexKnowledgeSourceParams(
            knowledge_source_name=index_name,
            kind="searchIndex",
        )
    ],
)

result = agent_client.retrieve(retrieval_request=req)
Direct REST API (Alternative)
import requests

# Used in KnowledgeAgentService
url = f"{endpoint}/agents/{agent_name}/retrieve?api-version=2025-01-01-preview"
headers = {"Content-Type": "application/json", "api-key": api_key}
request_body = {
    "messages": [
        {"role": "user", "content": [{"text": "What is the PTO policy?"}]}
    ]
}
response = requests.post(url, headers=headers, json=request_body)
Sample Use Cases
PolicyBot - Enterprise Policy Assistant
Answer questions about HR policies, PTO, expenses, etc. with citations.
query = "What's the approval process for expenses over $5000?"
# Returns: Cited answer from policy documents with source annotations
Multi-hop Reasoning
query = "Can I work remotely from another country while using PTO?"
# Agent decomposes into:
# 1. What is the remote work policy?
# 2. What is the PTO policy?
# 3. Are there restrictions on combining them?
Citations Format
Responses include annotations in the format:
[message_idx:search_idx+source_name]
Example: "Employees receive 15 PTO days [0:1+pto_policy.md]"
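Annotations in this format can be pulled out of an answer with a regular expression. The sketch below assumes the indices are non-negative integers and that source names contain no square brackets; the `Citation` type and function names are illustrative.

```python
import re
from typing import NamedTuple


class Citation(NamedTuple):
    message_idx: int
    search_idx: int
    source_name: str


# Matches [message_idx:search_idx+source_name], e.g. [0:1+pto_policy.md]
_CITATION = re.compile(r"\[(\d+):(\d+)\+([^\[\]]+)\]")
_CITATION_WITH_SPACE = re.compile(r"\s*" + _CITATION.pattern)


def extract_citations(answer: str) -> list[Citation]:
    """Return every citation annotation found in the answer, in order."""
    return [
        Citation(int(message), int(search), name)
        for message, search, name in _CITATION.findall(answer)
    ]


def strip_citations(answer: str) -> str:
    """Remove annotations (and the whitespace before them) for plain display."""
    return _CITATION_WITH_SPACE.sub("", answer)
```

Applied to the example above, `extract_citations` yields a single citation pointing at `pto_policy.md`, and `strip_citations` returns the sentence without the annotation.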
Lessons Learned
1. Configuration Management
- Always externalize config to environment variables
- Never hardcode model names, API versions, or endpoints
- Use sensible defaults with env var overrides
2. Chunking Strategy
- 1000 characters with 200 overlap works well for policy documents
- Smart sentence boundary detection prevents mid-sentence splits
- Overlap ensures context continuity across chunks
3. Reasoning Effort Selection
- Use `minimal` for simple factual queries
- Use `medium` for complex multi-hop questions
- Higher effort = more tokens = more cost + latency
4. Error Handling
- Knowledge Agents may not exist on first run - handle gracefully
- Index creation is idempotent - "already exists" is OK
- API version mismatches are common - use preview versions for new features
5. Service Architecture
- Separate concerns: Index management, Document indexing, Agent retrieval, LLM generation
- Use service classes for testability and reusability
- Keep FastAPI endpoints thin - delegate to services
6. Multi-turn Conversations
- Track message history for context continuity
- Allow conversation reset for fresh starts
- Store conversations in-memory or external cache
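One way to cover these three points in memory is sketched below. `ConversationStore` is a hypothetical helper, not a class from the skill's scripts; the message cap is an added assumption to keep requests bounded, and the output shape mirrors the request body used by the retrieve endpoint in this document.

```python
from collections import defaultdict


class ConversationStore:
    """In-memory message history keyed by conversation id (illustrative)."""

    def __init__(self, max_messages: int = 20):
        if max_messages < 1:
            raise ValueError("max_messages must be at least 1")
        self.max_messages = max_messages
        self._history = defaultdict(list)

    def append(self, conversation_id: str, role: str, content: str) -> None:
        history = self._history[conversation_id]
        history.append({"role": role, "content": content})
        # Drop the oldest messages once the cap is exceeded.
        del history[:-self.max_messages]

    def reset(self, conversation_id: str) -> None:
        """Forget a conversation so the next question starts fresh."""
        self._history.pop(conversation_id, None)

    def as_request_messages(self, conversation_id: str) -> list[dict]:
        """Build the messages list for a retrieve call, skipping system turns."""
        return [
            {"role": m["role"], "content": [{"text": m["content"]}]}
            for m in self._history.get(conversation_id, [])
            if m["role"] != "system"
        ]
```

Swapping the dictionary for an external cache would keep the same interface while letting history survive restarts.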
Best Practices
- Use the appropriate retrieval mode: `semantic` for simple queries, `agentic` for complex ones
- Set reasoning effort based on query complexity: `medium` for multi-hop
- Include clear agent instructions for citation format
- Handle gracefully when KB lacks relevant content
- Log all configuration at startup for debugging
- Use health checks to verify all services are operational
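Logging configuration at startup is safer when secrets are masked first. The sketch below assumes that any variable whose name contains `KEY`, `SECRET`, `TOKEN`, or `PASSWORD` is sensitive; the marker list, the logger name `foundry_iq`, and the function names are illustrative choices.

```python
import logging

SECRET_MARKERS = ("KEY", "SECRET", "TOKEN", "PASSWORD")


def redact(name: str, value: str) -> str:
    """Mask values of secret-looking settings, keeping at most a short suffix."""
    if any(marker in name.upper() for marker in SECRET_MARKERS):
        return "***" + value[-4:] if len(value) > 8 else "***"
    return value


def log_config(config: dict, logger=None) -> dict:
    """Log each setting with secrets masked and return the masked mapping."""
    logger = logger or logging.getLogger("foundry_iq")
    safe = {name: redact(name, str(value)) for name, value in sorted(config.items())}
    for name, value in safe.items():
        logger.info("config %s=%s", name, value)
    return safe
```

Returning the masked mapping also makes it usable in a health or diagnostics endpoint without exposing credentials.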
Troubleshooting
Common Errors
| Error | Cause | Solution |
|---|---|---|
| API Version mismatch | Using old version | Use 2025-01-01-preview for Knowledge Agents |
| Missing index | Index not created | Run /setup endpoint first |
| Authentication failed | Bad credentials | Check API keys in .env |
| No results | Empty index | Index sample documents first |
| Timeout | Large retrieval | Reduce reasoning effort or chunk size |
Debug Checklist
- Check environment variables are loaded
- Verify Azure services are accessible (health endpoint)
- Confirm index exists and has documents
- Test simple queries before complex ones
- Check API response codes and error messages
Extension Ideas
- Add SharePoint source: Connect document libraries as knowledge sources
- Multi-agent orchestration: Specialized agents for different domains
- Streaming responses: Real-time token streaming with Gradio UI
- Custom functions: Email escalation, ticket creation, etc.
- Caching layer: Redis for conversation history and frequent queries
- Observability: OpenTelemetry tracing for request flow visibility