name: Image Understanding description: Analyzes photos to extract structured information about pets, people, or world settings using Gemini 3 multimodal capabilities. triggers: - User uploads an image - Agent needs to understand visual content keywords: - analyze - understand - what is this - look at
Image Understanding Skill
Analyzes uploaded images and extracts structured information that can be used for character creation or world building.
When to Use
- User uploads a photo of their pet → Extract features, personality traits
- User uploads a selfie → Extract style, aesthetic for anime character
- User uploads a scene photo → Extract mood, setting, visual style for world building
Inputs
| Input | Type | Required | Default | Description |
|---|---|---|---|---|
image_path |
Path | Yes | — | Path to the image file |
analysis_type |
str | No | "auto" | One of: "pet", "person", "world", "auto" |
Outputs
| Output | Type | Description |
|---|---|---|
analysis |
dict | Structured analysis with type-specific fields |
Output Schema by Type
Pet Analysis:
{
"species": "dog",
"breed_guess": "golden retriever",
"physical_features": {
"coat_color": "golden",
"eye_color": "brown",
"distinctive_features": ["fluffy ears"]
},
"personality_traits": ["playful", "friendly"],
"suggested_character_archetype": "loyal companion"
}
Person Analysis:
{
"hair": {"color": "black", "style": "short"},
"fashion_aesthetic": "casual modern",
"expression_mood": "confident",
"suggested_anime_traits": ["protagonist energy"],
"suggested_character_archetype": "determined hero"
}
World Analysis:
{
"setting_type": "cyberpunk city",
"visual_style": {
"color_palette": "neon and dark",
"lighting": "dramatic",
"evoked_style": "blade runner"
},
"atmosphere": "mysterious",
"mood_keywords": ["futuristic", "lonely", "electric"]
}
Implementation Contract
class ImageUnderstanding:
async def execute(
self,
image_path: Path,
analysis_type: str = "auto"
) -> dict:
"""
Analyze an image and return structured information.
Raises:
FileNotFoundError: If image_path doesn't exist
ValueError: If analysis_type is invalid
APIError: If Gemini API call fails
"""
...
Example Usage
from skills.understand_image import ImageUnderstanding
skill = ImageUnderstanding()
# Auto-detect type
result = await skill.execute(image_path=Path("photo.jpg"))
# Specify type for better results
result = await skill.execute(
image_path=Path("my_dog.jpg"),
analysis_type="pet"
)
Dependencies
- Gemini 3 Flash/Pro API (multimodal)
- Pillow (image loading)
Error Handling
| Error | Cause | Recovery |
|---|---|---|
FileNotFoundError |
Image path invalid | Check path exists |
ValueError |
Invalid analysis_type | Use "pet", "person", "world", or "auto" |
APIError |
Gemini API failure | Retry with backoff |