vertex-ai-image


Generate, edit, or understand images using Google Gemini models. Use when asked to "generate an image", "create a picture", "edit an image", "modify this photo", "describe an image", "what's in this image", "analyze this photo", "image to text", "text to image", "explain this screenshot", "visual question answering", or any task involving image generation, image editing, or visual understanding.

By vaayne. Updated 4/2/2026


Vertex AI Image CLI

A Python CLI for image generation, editing, and understanding via Google Gemini.

Prerequisites

Set one of these API keys:

export GEMINI_API_KEY=your-key-here        # Recommended (Gemini API)
export GOOGLE_CLOUD_API_KEY=your-key-here  # Also supported
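
A wrapper script can confirm that one of these variables is set before it invokes the CLI. The following is a minimal Python sketch, not the CLI's own logic: the helper name `resolve_api_key` is hypothetical, and checking GEMINI_API_KEY first is an assumption based only on it being marked as recommended.

```python
import os

# Variable names come from the prerequisites above. Checking GEMINI_API_KEY
# first is an assumption, based on it being marked "Recommended".
KEY_VARS = ("GEMINI_API_KEY", "GOOGLE_CLOUD_API_KEY")


def resolve_api_key(env=None):
    """Return (variable_name, value) for the first key that is set and non-empty."""
    env = os.environ if env is None else env
    for name in KEY_VARS:
        value = env.get(name, "").strip()
        if value:
            return name, value
    raise RuntimeError("Set one of: " + ", ".join(KEY_VARS))
```

Calling `resolve_api_key()` with no arguments reads the real environment; passing a dict makes the check easy to exercise in isolation.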

Models

| Task | Model ID | Description |
| --- | --- | --- |
| Generation (default) | gemini-3.1-flash-image-preview | Nano Banana 2: fast, balanced cost/quality |
| Generation (pro) | gemini-3-pro-image-preview | Nano Banana Pro: professional assets, complex instructions |
| Generation (fast) | gemini-2.5-flash-image | Nano Banana: high-volume, low-latency |
| Image reading | gemini-2.5-flash | Text understanding model |

Commands

Generate an image from text

# Output defaults to $XDG_CACHE_HOME/vertex-ai-images/generate-{ts}-a-cat-astronaut-floating-in-space.png
uv run --script scripts/vertex_ai_image.py generate \
  --prompt "A cat astronaut floating in space"

# Or specify a custom output path
uv run --script scripts/vertex_ai_image.py generate \
  --prompt "A cat astronaut floating in space" \
  --output output/cat-astronaut.png

With aspect ratio and resolution:

uv run --script scripts/vertex_ai_image.py generate \
  --prompt "A panoramic mountain landscape at sunset" \
  --output output/landscape.png \
  --aspect-ratio 16:9 \
  --size 2K

Image-only output (no accompanying text):

uv run --script scripts/vertex_ai_image.py generate \
  --prompt "A minimalist logo for a coffee shop" \
  --output output/logo.png \
  --image-only

With high thinking level for complex prompts:

uv run --script scripts/vertex_ai_image.py generate \
  --prompt "An infographic explaining photosynthesis as a recipe" \
  --output output/infographic.png \
  --thinking-level high \
  --size 4K

Using a different model:

uv run --script scripts/vertex_ai_image.py generate \
  --prompt "A product photo of a perfume bottle" \
  --model gemini-3-pro-image-preview \
  --output output/perfume.png \
  --size 4K

Options:

  • --prompt, -p — Text prompt (required)
  • --output, -o — Output file path (default: $XDG_CACHE_HOME/vertex-ai-images/generate-{ts}-{slug}.png)
  • --model, -m — Model ID (default: gemini-3.1-flash-image-preview)
  • --aspect-ratio, -a — Aspect ratio: 1:1, 1:4, 1:8, 2:3, 3:2, 3:4, 4:1, 4:3, 4:5, 5:4, 8:1, 9:16, 16:9, 21:9
  • --size, -s — Resolution: 512 (0.5K), 1K, 2K, 4K
  • --image-only — Return only image, suppress text output
  • --thinking-level, -t — Thinking level: minimal (default) or high
  • --dry-run — Preview without API call
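
The default output pattern `generate-{ts}-{slug}.png` can be approximated as follows. This is a sketch under stated assumptions rather than the script's actual code: the slug rule is inferred from the single example above ("A cat astronaut floating in space" becomes `a-cat-astronaut-floating-in-space`), the `{ts}` format is assumed to be a Unix timestamp in seconds, and the fallback to `~/.cache` follows the XDG Base Directory convention. The function names are hypothetical.

```python
import os
import re
import time
from pathlib import Path


def slugify(prompt):
    """Lowercase the prompt and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", prompt.lower()))


def default_output_path(command, prompt, ts=None, env=None):
    """Build <cache>/vertex-ai-images/<command>-<ts>-<slug>.png."""
    env = os.environ if env is None else env
    # XDG convention: use ~/.cache when XDG_CACHE_HOME is unset or empty.
    cache_home = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    # Assumption: {ts} is a Unix timestamp in seconds; the real format
    # used by the CLI is not documented in this section.
    ts = int(time.time()) if ts is None else ts
    name = f"{command}-{ts}-{slugify(prompt)}.png"
    return Path(cache_home) / "vertex-ai-images" / name
```

Passing `--output` explicitly avoids depending on any of these assumptions.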

Edit an image

Provide one or more input images with a text instruction to modify them:

uv run --script scripts/vertex_ai_image.py edit \
  --prompt "Make the cat wear a top hat and monocle" \
  --image photo.jpg \
  --output output/fancy-cat.png

With multiple reference images (up to 14):

uv run --script scripts/vertex_ai_image.py edit \
  --prompt "Create a group photo of these people at a beach" \
  --image person1.png \
  --image person2.png \
  --image person3.png \
  --output output/group.png \
  --aspect-ratio 16:9 \
  --size 2K

Style transfer with a reference:

uv run --script scripts/vertex_ai_image.py edit \
  --prompt "Redraw this photo in watercolor style" \
  --image original.jpg \
  --output output/watercolor.png

Options:

  • --prompt, -p — Edit instruction (required)
  • --image, -i — Input image path or gs:// URI (required, repeatable up to 14 times)
  • --output, -o — Output file path (default: $XDG_CACHE_HOME/vertex-ai-images/edit-{ts}-{slug}.png)
  • --model, -m — Model ID (default: gemini-3.1-flash-image-preview)
  • --aspect-ratio, -a — Output aspect ratio
  • --size, -s — Output resolution: 512, 1K, 2K, 4K
  • --image-only — Return only image, suppress text output
  • --thinking-level, -t — Thinking level: minimal or high
  • --dry-run — Preview without API call
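
When `edit` calls are assembled programmatically, the repeated `--image` flag and the 14-image limit are easy to get wrong. The sketch below builds the argument list without running it; `build_edit_command` is a hypothetical helper, and it uses only the flags listed above.

```python
MAX_IMAGES = 14  # limit stated in the options above


def build_edit_command(prompt, images, output=None, aspect_ratio=None, size=None):
    """Return the argv list for an `edit` call without executing it."""
    if not images:
        raise ValueError("at least one --image is required")
    if len(images) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} images are allowed, got {len(images)}")
    cmd = ["uv", "run", "--script", "scripts/vertex_ai_image.py",
           "edit", "--prompt", prompt]
    for image in images:
        cmd += ["--image", str(image)]  # the flag is repeated once per input
    if output:
        cmd += ["--output", str(output)]
    if aspect_ratio:
        cmd += ["--aspect-ratio", aspect_ratio]
    if size:
        cmd += ["--size", size]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Keeping it as a list rather than a single string avoids shell quoting problems with prompts that contain spaces or quotes.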

Read/describe an image

# Local file
uv run --script scripts/vertex_ai_image.py read \
  --image photo.jpg \
  --prompt "What objects are in this image?"

# GCS URI
uv run --script scripts/vertex_ai_image.py read \
  --image gs://bucket/image.jpg

Options:

  • --image, -i — Local path or gs:// URI (required)
  • --prompt, -p — Question about the image (default: "Describe this image in detail.")
  • --model, -m — Model ID (default: gemini-2.5-flash)
  • --dry-run — Preview without API call
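
Since `--image` accepts either a local path or a gs:// URI, a caller may want to tell the two apart before invoking the CLI. The following is one possible check, offered as an assumption about reasonable behavior and not as a description of how the script itself parses the argument; `classify_image_arg` is a hypothetical name.

```python
from pathlib import Path


def classify_image_arg(value):
    """Return ("gcs", uri) for gs:// URIs, otherwise ("local", Path)."""
    prefix = "gs://"
    if value.startswith(prefix):
        bucket, _, obj = value[len(prefix):].partition("/")
        if not bucket or not obj:
            raise ValueError(f"expected gs://bucket/object, got {value!r}")
        return "gcs", value
    # Anything else is treated as a filesystem path; "~" is expanded.
    return "local", Path(value).expanduser()
```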

Aspect Ratios & Resolutions

| Aspect Ratio | 512 | 1K | 2K | 4K |
| --- | --- | --- | --- | --- |
| 1:1 | 512×512 | 1024×1024 | 2048×2048 | 4096×4096 |
| 16:9 | 688×384 | 1376×768 | 2752×1536 | 5504×3072 |
| 9:16 | 384×688 | 768×1376 | 1536×2752 | 3072×5504 |
| 3:2 | 632×424 | 1264×848 | 2528×1696 | 5056×3392 |
| 4:3 | 600×448 | 1200×896 | 2400×1792 | 4800×3584 |

The 512 resolution is available only on gemini-3.1-flash-image-preview. The gemini-2.5-flash-image model outputs 1024px images only and does not support --size.
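
In the table above, every row scales linearly from its 1K entry: 512 is half, 2K is double, and 4K is four times the 1K dimensions. The sketch below encodes that observation together with the two model restrictions just stated. It covers only the five aspect ratios listed in the table; the function names are hypothetical, and dimensions for the other supported ratios (such as 21:9) are not given in this document, so they are left out.

```python
# 1K dimensions copied from the table above; other sizes scale linearly.
BASE_1K = {
    "1:1": (1024, 1024),
    "16:9": (1376, 768),
    "9:16": (768, 1376),
    "3:2": (1264, 848),
    "4:3": (1200, 896),
}
SCALE = {"512": 0.5, "1K": 1, "2K": 2, "4K": 4}


def check_size(model, size):
    """Raise ValueError for the model/size combinations ruled out above."""
    if size not in SCALE:
        raise ValueError(f"unknown size {size!r}")
    if model == "gemini-2.5-flash-image":
        raise ValueError(f"{model} does not support --size")
    if size == "512" and model != "gemini-3.1-flash-image-preview":
        raise ValueError("512 is only available on gemini-3.1-flash-image-preview")


def output_dimensions(aspect_ratio, size):
    """Return (width, height) in pixels for a ratio listed in the table."""
    width, height = BASE_1K[aspect_ratio]
    factor = SCALE[size]
    return int(width * factor), int(height * factor)
```

For example, `output_dimensions("16:9", "4K")` returns `(5504, 3072)`, matching the table.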

Tips

  • For text-heavy images (infographics, menus), generate the text first, then ask for an image containing it.
  • Use --thinking-level high for complex compositions that need layout reasoning.
  • Use gemini-3-pro-image-preview with --size 4K for professional-grade assets.
  • Use gemini-2.5-flash-image for high-volume, low-latency batch workloads.
  • Describe scenes narratively rather than listing keywords for best results.

Install via CLI

npx skills add https://github.com/vaayne/agent-kit --skill vertex-ai-image