
name: instructor
description: Structured LLM outputs with Instructor, using Pydantic models as response schemas for OpenAI, Anthropic, and any OpenAI-compatible API.
version: 1.0.0
author: hermes-CCC (ported from Hermes Agent by NousResearch)
license: MIT
metadata:
  hermes:
    tags: [MLOps, Instructor, Structured-Output, Pydantic, LLM, Extraction]
    related_skills: [vllm]

Instructor — Structured LLM Outputs

Get type-safe, validated Pydantic objects from OpenAI, Anthropic, or any OpenAI-compatible API instead of raw strings.

Setup

pip install instructor pydantic
pip install anthropic  # or openai

Basic Usage (Anthropic)

import anthropic
import instructor
from pydantic import BaseModel

client = instructor.from_anthropic(anthropic.Anthropic())

class UserProfile(BaseModel):
    name: str
    age: int
    skills: list[str]
    experience_years: int

profile = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Extract: John is a 32-year-old Python developer with 8 years experience in ML and DevOps."
    }],
    response_model=UserProfile,
)

print(profile.name)          # "John"
print(profile.age)           # 32
print(profile.skills)        # ["Python", "ML", "DevOps"]
print(profile.experience_years)  # 8
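To see roughly what the model is asked to fill in, the schema for `UserProfile` can be inspected locally. This is a minimal sketch assuming Pydantic v2 (`model_json_schema()`); the exact payload Instructor sends depends on the provider and mode, so treat it as an illustration rather than the wire format.

```python
from pydantic import BaseModel


class UserProfile(BaseModel):
    name: str
    age: int
    skills: list[str]
    experience_years: int


# Pydantic v2: build the JSON Schema that describes the model.
schema = UserProfile.model_json_schema()

# None of the four fields has a default, so all of them are required.
required_fields = sorted(schema["required"])
age_type = schema["properties"]["age"]["type"]

print(required_fields)
print(age_type)
```

Fields without defaults appear under `required`, which is why optional data should be given an explicit default in the model.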

With OpenAI

import openai
import instructor

# UserProfile is the model defined in the Basic Usage example above.
client = instructor.from_openai(openai.OpenAI())

result = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "..."}],
    response_model=UserProfile,
)

Nested Models

from pydantic import BaseModel, Field
from typing import Optional

class Address(BaseModel):
    street: str
    city: str
    country: str

class Company(BaseModel):
    name: str
    industry: str
    founded_year: int
    headquarters: Address
    employee_count: Optional[int] = None

class ResearchPaper(BaseModel):
    title: str
    authors: list[str]
    abstract: str
    key_findings: list[str] = Field(description="3-5 bullet points")
    methodology: str
    year: int
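Nested models are validated recursively, so an inner object such as `headquarters` comes back as a typed `Address` rather than a plain dict. The sketch below checks this offline with Pydantic v2's `model_validate_json`; the JSON string is hypothetical sample data, not real model output.

```python
from typing import Optional

from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    country: str


class Company(BaseModel):
    name: str
    industry: str
    founded_year: int
    headquarters: Address
    employee_count: Optional[int] = None


# Hypothetical JSON of the kind a model might return for Company.
raw = (
    '{"name": "Acme", "industry": "Robotics", "founded_year": 2015, '
    '"headquarters": {"street": "1 Main St", "city": "Berlin", "country": "Germany"}}'
)

# Parse and validate in one step; the nested dict becomes an Address instance.
company = Company.model_validate_json(raw)

print(type(company.headquarters).__name__)
print(company.employee_count)
```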

Validation with Pydantic

from pydantic import BaseModel, field_validator, Field

class SentimentAnalysis(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    confidence: float = Field(ge=0, le=1)
    reasoning: str

    @field_validator("sentiment")
    @classmethod
    def validate_sentiment(cls, v):
        if v not in ["positive", "negative", "neutral"]:
            raise ValueError("Must be positive, negative, or neutral")
        return v
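The validator and the `Field` constraints can be exercised without calling an LLM. The sketch below, assuming Pydantic v2, feeds one valid and one invalid payload to the model; `failing_fields` is a hypothetical helper introduced only for this illustration. Instructor can pass validation errors like these back to the model when `max_retries` is set on the call, though that behavior should be confirmed against the installed version.

```python
from pydantic import BaseModel, Field, ValidationError, field_validator


class SentimentAnalysis(BaseModel):
    sentiment: str = Field(description="positive, negative, or neutral")
    confidence: float = Field(ge=0, le=1)
    reasoning: str

    @field_validator("sentiment")
    @classmethod
    def validate_sentiment(cls, v: str) -> str:
        if v not in ["positive", "negative", "neutral"]:
            raise ValueError("Must be positive, negative, or neutral")
        return v


def failing_fields(payload: dict) -> list[str]:
    """Hypothetical helper: names of fields that fail validation ([] if valid)."""
    try:
        SentimentAnalysis.model_validate(payload)
    except ValidationError as exc:
        # Each error carries a "loc" tuple; its first element is the field name.
        return sorted(str(err["loc"][0]) for err in exc.errors())
    return []


good = {"sentiment": "positive", "confidence": 0.9, "reasoning": "Praises the product."}
bad = {"sentiment": "happy", "confidence": 1.7, "reasoning": "Praises the product."}

print(failing_fields(good))
print(failing_fields(bad))
```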

Streaming Partial Objects

from instructor import Partial

# Passing stream=True with Partial[...] yields progressively filled objects.
for partial_profile in client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "..."}],
    response_model=Partial[UserProfile],
    stream=True,
):
    print(partial_profile)  # updates as tokens arrive

Batch Extraction

from typing import Optional

from pydantic import BaseModel

class Contact(BaseModel):
    name: str
    email: str
    phone: Optional[str] = None  # not every contact lists a phone number

# Extract multiple contacts from one text
class ContactList(BaseModel):
    contacts: list[Contact]

text = """
Alice: alice@example.com, 555-1234
Bob: bob@example.com
Carol: carol@example.com, 555-5678
"""

result = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=512,
    messages=[{"role": "user", "content": f"Extract contacts:\n{text}"}],
    response_model=ContactList,
)

for contact in result.contacts:
    print(contact.name, contact.email)
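Entries like Bob's, which have no phone number, are where the default on an optional field matters. In Pydantic v2, `Optional[str]` without a default still requires the key to be present. The sketch below contrasts the two declarations offline; `StrictContact` and `LenientContact` are hypothetical names used only for this comparison.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class StrictContact(BaseModel):
    name: str
    email: str
    phone: Optional[str]  # may be None, but the key must be present


class LenientContact(BaseModel):
    name: str
    email: str
    phone: Optional[str] = None  # the key may be omitted entirely


bob = {"name": "Bob", "email": "bob@example.com"}  # no "phone" key at all

# The lenient model fills in the default.
lenient = LenientContact.model_validate(bob)

# The strict model rejects the payload because "phone" is missing.
try:
    StrictContact.model_validate(bob)
    strict_ok = True
except ValidationError:
    strict_ok = False

print(lenient.phone, strict_ok)
```

Giving optional fields an explicit `None` default keeps extraction from failing when the source text omits that detail.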

With vLLM / Local Models

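import openai
import instructor

# vLLM exposes an OpenAI-compatible server; JSON mode avoids relying on tool calling.
client = instructor.from_openai(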
    openai.OpenAI(
        base_url="http://localhost:8000/v1",
        api_key="not-needed"
    ),
    mode=instructor.Mode.JSON,
)

Use Cases

Entity extraction from documents
Structured data from unstructured text
Classification with confidence scores
RAG with typed outputs
Form filling automation
API response parsing