vhscli

name: vhscli
description: Use the `vhscli` CLI to analyze images/video/pdfs with a prompt, or generate images/videos. Use when the user asks about local media, wants AI images/videos, or mentions vhscli, vhs, seedream, seedance, nano-banana, or gpt-image.

vhscli is a command-line tool for multimodal AI: chat about text/images/video/pdfs, or generate images and videos from prompts. It's a thin client — auth, uploads, and model execution all happen server-side, so users don't store any provider API keys locally.

Run vhscli --help or vhscli <command> --help to see current help — the CLI is the source of truth.

Invocation

Always run via npx @getvhs/vhscli@latest so you pick up the newest models, flags, and fixes. Don't pin a version, and don't call a bare vhscli binary even if one is on PATH — it may be stale.

npx @getvhs/vhscli@latest <command> ...

Throughout this doc, commands are written as vhscli ... for readability — substitute npx @getvhs/vhscli@latest ... when running.

Requires Node.js ≥ 22.
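Here is a minimal sketch for bash or another POSIX-style shell. The choice of shell is an assumption, since the doc does not prescribe one. A `vhscli` shell function lets the commands in this doc run exactly as written while still going through `npx @getvhs/vhscli@latest`. A hypothetical `node_is_supported` helper (not part of vhscli) checks a `node --version` string against the Node.js ≥ 22 requirement.

```shell
# Hypothetical helper (not part of vhscli): succeeds if a `node --version`
# string such as "v22.3.0" has a major version of at least 22.
node_is_supported() {
  local major=${1#v}      # drop the leading "v"
  major=${major%%.*}      # keep only the major component
  # A non-numeric major makes `[` fail, which counts as "not supported".
  [ "$major" -ge 22 ] 2>/dev/null
}

# Route `vhscli ...` through npx so every call uses the latest published
# release instead of a possibly stale binary on PATH.
vhscli() {
  npx @getvhs/vhscli@latest "$@"
}
```

For example, run `node_is_supported "$(node --version)" || echo "vhscli needs Node.js 22 or newer" >&2` once before the first call. Because the wrapper is a shell function, it also shadows any bare `vhscli` binary on PATH, which matches the advice above.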

Top-level

vhscli [-v|--version] [-h|--help]
vhscli <command> [options] ...

-v, --version — print version (only when no command is given)
-h, --help — show help (works on root and every subcommand)

Commands:

login — log in with Google (opens browser; saves session to ~/.vhs/session.json)
logout — log out and delete local access tokens
whoami — print the logged-in user's email
models — list available models
generate <model> <prompt> [-o <path>] — generate an image or video, wait, and save it
submit <model> <prompt> [-o <path>] — submit the same task as generate but exit immediately (writes a <output>.vhs_task sidecar to resume later)
chat <prompt> — chat with seed-2.0 (text, image, video, or pdf input)
resume <files...> — finish one or more aborted generations from their .vhs_task sidecar files

Auth

Assume auth is already configured. If a command fails with an auth error, run vhscli login to open a browser for Google OAuth. Do NOT run vhscli login preemptively; it requires an interactive browser login.

Models

Chat / understand (text / image / video / pdf): seed-2.0 — under vhscli chat
Generate images: seedream-5 (default), seedream-4-5, nano-banana-2, nano-banana-pro, gpt-image-2 — under vhscli generate
Generate video: seedance-2 — under vhscli generate

Prompt guides

Before you invoke vhscli generate (or use vhscli chat for non-trivial understanding), Read the matching prompt guide and shape the prompt around it. The guides are concise, model-specific references distilled from each provider's docs: formulas, what to lead with, what works, and what fails. Wording that works well for one model often underperforms on another, so don't skip this step.

| Model(s) | Guide file (Read before prompting) |
| --- | --- |
| `seed-2.0` (used by `vhscli chat`) | `prompt_guide/seed-2.txt` |
| `seedream-5`, `seedream-4-5` | `prompt_guide/seedream.txt` |
| `nano-banana-2`, `nano-banana-pro` | `prompt_guide/nano-banana.txt` |
| `seedance-2` | `prompt_guide/seedance-2.txt` |
| `gpt-image-2` | `prompt_guide/gpt-image-2.txt` |

Trigger: any time the user asks for output from one of these models, Read its guide before building the prompt. For trivial chat (plain text Q&A with no media) you can skip seed-2.txt.
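The table above is effectively a lookup from model name to guide file. Here is a sketch of that lookup as a hypothetical `guide_for` shell function. The paths are the ones listed in the table, and the function assumes you run it from the directory that contains `prompt_guide/`.

```shell
# Hypothetical helper: print the prompt guide to Read before prompting a model.
# Paths are relative to the directory that contains prompt_guide/.
guide_for() {
  case "$1" in
    seed-2.0)                      echo prompt_guide/seed-2.txt ;;
    seedream-5|seedream-4-5)       echo prompt_guide/seedream.txt ;;
    nano-banana-2|nano-banana-pro) echo prompt_guide/nano-banana.txt ;;
    seedance-2)                    echo prompt_guide/seedance-2.txt ;;
    gpt-image-2)                   echo prompt_guide/gpt-image-2.txt ;;
    *) echo "no prompt guide for model: $1" >&2; return 1 ;;
  esac
}
```

For example, `cat "$(guide_for seedream-5)"` prints the Seedream guide.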

Stdin prompts

Every command that takes a prompt also accepts `-` as the prompt, meaning "read the prompt from stdin":

cat my_prompt.txt | vhscli generate nano-banana-pro -
echo "what is this?" | vhscli chat - -i photo.jpg
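Reading the prompt from stdin is handy for long, multi-line prompts that would be awkward to quote on the command line. Here is a minimal sketch. It assumes `vhscli` resolves to the CLI, either directly or through an npx wrapper, and `generate_from_file` is a hypothetical helper name.

```shell
# Hypothetical helper: generate from a prompt stored in a file. The "-" prompt
# tells vhscli to read the prompt text from stdin, so quotes, $ signs, and
# newlines in the file pass through untouched.
generate_from_file() {
  local model=$1 prompt_file=$2 out=$3
  vhscli generate "$model" - -o "$out" < "$prompt_file"
}
```

For example, `generate_from_file seedream-5 fox_prompt.txt fox.jpg`.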

vhscli chat — chat about text, images, video, or pdfs

vhscli chat <prompt> [-i <image>...] [-f <pdf>...] [-v <video>] [--fps <n>]

Mode is picked from your flags:

prompt only → text chat
-i → ask about images (repeatable)
-f → ask about pdf documents (repeatable)
-v → ask about a single video

Options:

-i <path> — image to ask about (repeat -i for more)
-f <path> — pdf document to ask about (repeat -f for more)
-v <path> — single video to ask about
--fps <n> — frames/sec sampled from the video, 0.2–5 (default: 1)

One-shot: each call is independent and has no memory of previous calls. Output goes to stdout, and nothing is saved to disk. Audio inside a video is not understood.

Examples:

vhscli chat "explain how to make sourdough in 5 steps"
vhscli chat "describe the scene. return json with objects, setting, mood." -i photo.jpg
vhscli chat "transcribe all visible text verbatim, preserving line breaks." -i receipt.jpg
vhscli chat "compare image 1 and image 2 in 3 bullets." -i a.jpg -i b.jpg
vhscli chat "summarize this paper in 5 bullets; include a page number per bullet." -f paper.pdf
vhscli chat "list key events with start_time and end_time in HH:mm:ss as json." -v clip.mp4 --fps 2
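Each `-i` takes exactly one path, so asking about a whole set of images means expanding every path into its own `-i` flag. Here is a POSIX-shell sketch of a hypothetical `chat_images` helper. It passes the paths through unchanged, so spaces in filenames are preserved. Any limit on the number of images is left for the CLI to enforce.

```shell
# Hypothetical helper: ask one question about several images, turning each
# path into its own "-i <path>" pair (vhscli chat takes one path per -i).
chat_images() {
  local prompt=$1 n img
  shift
  n=$#                       # number of image paths
  for img do                 # append "-i <path>" for each original path
    set -- "$@" -i "$img"
  done
  shift "$n"                 # drop the original bare paths
  vhscli chat "$prompt" "$@"
}
```

For example, `chat_images "compare image 1 and image 2 in 3 bullets." a.jpg b.jpg` runs the same command as the comparison example above.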

vhscli generate seedream-5 — generate an image (default choice)

vhscli generate seedream-5 <prompt> [-o <path>] [-i <image>...] [--size <size>]

Options:

-o, --output <path> — output file path (default: vhscli-seedream-5-<timestamp>.jpg)
-i <path> — reference image, max 14 (repeat -i for more)
--size <size> — 2K, 3K, or WxH like 2048x3072 (default: 2K)
- WxH pixel count must be in [3,686,400, 10,404,496]
- WxH aspect ratio must be in [1:16, 16:1]

Output format follows the -o extension (.png, .jpg/.jpeg, .webp); the CLI converts if needed.

Examples:

vhscli generate seedream-5 "a red fox in a snowy forest" -o fox.jpg
vhscli generate seedream-5 "swap the outfit" -o out.png -i person.jpg -i outfit.jpg --size 3K

vhscli generate seedream-4-5 — generate an image (larger size range)

vhscli generate seedream-4-5 <prompt> [-o <path>] [-i <image>...] [--size <size>]

Options:

-o, --output <path> — output file path (default: vhscli-seedream-4-5-<timestamp>.jpg)
-i <path> — reference image, max 14 (repeat -i for more)
--size <size> — 2K, 4K, or WxH (default: 2K)
- WxH pixel count must be in [3,686,400, 16,777,216]
- WxH aspect ratio must be in [1:16, 16:1]

Example:

vhscli generate seedream-4-5 "a mountain at sunrise" -o mountain.jpg --size 4K
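The Seedream size limits are easy to check locally before you spend a generation. Here is a sketch of a hypothetical `seedream_size_ok` helper (not part of vhscli). It encodes the documented presets, pixel-count ranges, and 1:16 to 16:1 aspect limits for seedream-5 and seedream-4-5. The helper assumes the bracketed ranges are inclusive and accepts `WxH` only with a lowercase `x`. The CLI's own validation remains authoritative.

```shell
# Hypothetical pre-flight check (not part of vhscli): does a --size value meet
# the documented limits for seedream-5 or seedream-4-5? Ranges are treated as
# inclusive.
seedream_size_ok() {
  local model=$1 size=$2 max w h n
  # Named presets per model.
  case "$model:$size" in
    seedream-5:2K|seedream-5:3K|seedream-4-5:2K|seedream-4-5:4K) return 0 ;;
  esac
  case "$model" in
    seedream-5)   max=10404496 ;;   # upper pixel-count bound
    seedream-4-5) max=16777216 ;;
    *) return 1 ;;
  esac
  case "$size" in *x*) ;; *) return 1 ;; esac
  w=${size%x*} h=${size#*x}
  # Both sides must be plain positive integers without leading zeros.
  for n in "$w" "$h"; do
    case "$n" in ''|0*|*[!0-9]*) return 1 ;; esac
  done
  # Pixel count in [3,686,400, max] and aspect ratio within [1:16, 16:1].
  [ $((w * h)) -ge 3686400 ] && [ $((w * h)) -le "$max" ] &&
    [ $((w * 16)) -ge "$h" ] && [ $((h * 16)) -ge "$w" ]
}
```

For example, `seedream_size_ok seedream-5 1024x1536` fails because 1,572,864 pixels is below the 3,686,400 minimum.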

vhscli generate nano-banana-2 — generate an image (Google)

vhscli generate nano-banana-2 <prompt> [-o <path>] [-i <image>...] [--size <size>]

Options:

-o, --output <path> — output file path (default: vhscli-nano-banana-2-<timestamp>.png)
-i <path> — reference image, max 14 (repeat -i for more)
--size <size> — 512, 1K, 2K, or 4K (default: 1K)

Output is always square (1:1). Describe the framing you want in the prompt if you need a tall or wide composition.

Examples:

vhscli generate nano-banana-2 "remove the man from the photo, keep everything else" -i photo.jpg
vhscli generate nano-banana-2 "90s skateboarder poster, vertical composition" -o poster.png --size 2K
vhscli generate nano-banana-2 "a glossy candle in a bell jar on a marble counter, soft light"

vhscli generate nano-banana-pro — generate an image (Google, premium)

vhscli generate nano-banana-pro <prompt> [-o <path>] [-i <image>...] [--size <size>]

Options:

-o, --output <path> — output file path (default: vhscli-nano-banana-pro-<timestamp>.png)
-i <path> — reference image, max 14 (repeat -i for more)
--size <size> — 1K, 2K, or 4K (default: 1K)

Output is always square (1:1). Higher-quality sibling of nano-banana-2 — better text rendering and richer textures.

Examples:

vhscli generate nano-banana-pro "studio portrait, cinematic lighting, three-quarter framing" -o portrait.jpg --size 2K
vhscli generate nano-banana-pro "a sun-drenched minimalist living room with a 3d armchair from this sketch" -i sketch.jpg

vhscli generate gpt-image-2 — generate or edit an image (OpenAI)

vhscli generate gpt-image-2 <prompt> [-o <path>] [-i <image>...] [--size <size>]

Options:

-o, --output <path> — output file path (default: vhscli-gpt-image-2-<timestamp>.png)
-i <path> — reference image for edits (repeat -i for more)
--size <size> — preset (1024x1024, 1536x1024, 1024x1536, 2048x2048, 2048x1152, 3840x2160) or WxH (default: 1024x1024)
- both sides must be multiples of 16, max edge 3840
- total pixels in [655,360, 8,294,400]
- aspect ratio in [1:3, 3:1]

Output format follows the -o extension (.png, .jpg/.jpeg, .webp); the CLI converts if needed. Use png or webp when you need transparency.

Examples:

vhscli generate gpt-image-2 "a children's book drawing of a veterinarian examining a cat"
vhscli generate gpt-image-2 "replace the background with a starry night, keep the subject unchanged" -i photo.jpg
vhscli generate gpt-image-2 "ultra-wide landscape of the swiss alps at golden hour" --size 3840x2160 -o alps.jpg
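gpt-image-2's size rules can also be checked locally. Here is a sketch of a hypothetical `gpt_image_size_ok` helper (not part of vhscli) that treats the documented bounds as inclusive. Every listed preset satisfies these rules, so the presets need no special case.

```shell
# Hypothetical pre-flight check (not part of vhscli) for gpt-image-2 --size
# values, mirroring the documented rules; bounds are treated as inclusive.
gpt_image_size_ok() {
  local size=$1 w h n
  case "$size" in *x*) ;; *) return 1 ;; esac
  w=${size%x*} h=${size#*x}
  for n in "$w" "$h"; do
    case "$n" in ''|0*|*[!0-9]*) return 1 ;; esac
  done
  [ $((w % 16)) -eq 0 ] && [ $((h % 16)) -eq 0 ] &&            # multiples of 16
    [ "$w" -le 3840 ] && [ "$h" -le 3840 ] &&                  # max edge 3840
    [ $((w * h)) -ge 655360 ] && [ $((w * h)) -le 8294400 ] && # pixel count
    [ $((w * 3)) -ge "$h" ] && [ $((h * 3)) -ge "$w" ]         # aspect 1:3 to 3:1
}
```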

vhscli generate seedance-2 — generate a video

vhscli generate seedance-2 <prompt> [-o <path>]
                           [--first-frame <image>] [--last-frame <image>]
                           [-i <image>...] [-v <video>...] [-a <audio>...]
                           [--ratio <r>] [--resolution <res>] [--duration <n>]
                           [--no-audio]

Mode is picked from your flags:

prompt only → text-to-video
--first-frame → animate from that frame (optionally --last-frame too)
-i / -v / -a → use as references

Options:

-o, --output <path> — output file path (default: vhscli-seedance-2-<timestamp>.mp4)
--first-frame <image> — use as the first frame
--last-frame <image> — use as the last frame (requires --first-frame)
-i <path> — reference image, max 9 (repeat -i). conflicts with --first-frame
-v <path> — reference video, max 3 (repeat -v)
-a <path> — reference audio, max 3 (repeat -a). requires -i or -v
--ratio <r> — aspect ratio (default: 16:9). one of: 16:9, 4:3, 1:1, 3:4, 9:16, 21:9
--resolution <res> — 480p, 720p, or 1080p (default: 720p)
--duration <n> — length in seconds, 4–15 (default: 5)
--audio / --no-audio — toggle the audio track (default: --audio). pass --no-audio for a silent video

Defaults to 5s @ 720p, 16:9, with audio. Jobs run in the cloud and can take minutes; the CLI polls automatically. If you don't want to block, use vhscli submit seedance-2 ... (same flags) to detach immediately, then run vhscli resume <output>.vhs_task later. If a vhscli generate run is interrupted mid-poll, pass the sidecar it wrote at the start (<output>.vhs_task) to vhscli resume.

Examples:

# text-to-video
vhscli generate seedance-2 "a cat jumping off a couch" -o cat.mp4 --duration 6 --ratio 16:9

# animate a still image
vhscli generate seedance-2 "camera pans right" -o pan.mp4 --first-frame start.jpg

# with a first and last frame
vhscli generate seedance-2 "morph between these" -o morph.mp4 --first-frame a.jpg --last-frame b.jpg

# reference-based with audio
vhscli generate seedance-2 "lip sync the words" -o out.mp4 -i face.jpg -a voice.mp3
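Seedance jobs can run for minutes, so it can pay to catch flag mistakes before submitting. Here is a sketch of a hypothetical `seedance_args_ok` helper (not part of vhscli). It scans a `vhscli generate seedance-2` argument list and applies the documented rules:

- `--last-frame` needs `--first-frame`.
- `-i` conflicts with `--first-frame`.
- `-a` needs `-i` or `-v`.
- Reference counts must stay within the limits.
- `--duration` must be 4–15 seconds.
- `--ratio` must be one of the allowed values.

The helper assumes `--duration` takes whole seconds, and it recognizes only space-separated flag values.

```shell
# Hypothetical pre-flight check (not part of vhscli): apply the documented
# seedance-2 flag rules to an argument list before starting a long cloud job.
# Only space-separated forms are recognized (not "--duration=6" or "-ia.jpg").
seedance_args_ok() {
  local opt first=0 last=0 img=0 vid=0 aud=0 dur=5 ratio=16:9
  while [ $# -gt 0 ]; do
    opt=$1; shift
    case "$opt" in
      --first-frame) first=1 ;;
      --last-frame)  last=1 ;;
      -i) img=$((img + 1)) ;;
      -v) vid=$((vid + 1)) ;;
      -a) aud=$((aud + 1)) ;;
      --duration) dur=${1-} ;;
      --ratio)    ratio=${1-} ;;
      *) continue ;;  # prompt, -o, --resolution, --no-audio, and their values
    esac
    [ $# -gt 0 ] && shift  # every flag matched above takes a value; skip it
  done
  if [ "$last" -eq 1 ] && [ "$first" -eq 0 ]; then
    printf '%s\n' "--last-frame requires --first-frame" >&2; return 1
  fi
  if [ "$first" -eq 1 ] && [ "$img" -gt 0 ]; then
    printf '%s\n' "-i conflicts with --first-frame" >&2; return 1
  fi
  if [ "$aud" -gt 0 ] && [ $((img + vid)) -eq 0 ]; then
    printf '%s\n' "-a requires -i or -v" >&2; return 1
  fi
  if [ "$img" -gt 9 ] || [ "$vid" -gt 3 ] || [ "$aud" -gt 3 ]; then
    printf '%s\n' "too many references (max 9 -i, 3 -v, 3 -a)" >&2; return 1
  fi
  case "$dur" in
    ''|*[!0-9]*) printf '%s\n' "--duration must be whole seconds" >&2; return 1 ;;
  esac
  if [ "$dur" -lt 4 ] || [ "$dur" -gt 15 ]; then
    printf '%s\n' "--duration must be 4-15" >&2; return 1
  fi
  case "$ratio" in
    16:9|4:3|1:1|3:4|9:16|21:9) ;;
    *) printf '%s\n' "unsupported --ratio: $ratio" >&2; return 1 ;;
  esac
}
```

Call it with the same arguments you would pass after `seedance-2`, and run the real command only if it succeeds.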

The `.vhs_task` sidecar — what `generate`, `submit`, and `resume` share

As soon as vhscli generate or vhscli submit has a task id from the backend, it writes a tiny sidecar next to the intended output:

<output>.vhs_task        # JSON: {"id": "<uuid>"}
# e.g. clip.mp4.vhs_task, fox.jpg.vhs_task

generate keeps polling and, on success, saves the media to <output> and removes the sidecar. On a task error it also removes the sidecar and exits non-zero.
submit writes the sidecar and exits immediately, leaving the backend task running.
resume <files...> re-attaches to one or more sidecars: waits if the task is still running, saves the media to the path implied by the sidecar filename (clip.mp4.vhs_task → clip.mp4), and removes the sidecar.

vhscli chat does not use this sidecar — chat is fast and prints to stdout.

If -o was not passed, the sidecar is named after the auto-generated default output (vhscli-<model>-<timestamp>.<ext>.vhs_task in the current folder). For long jobs (seedance-2 especially), pass -o so the sidecar has a predictable name you can resume.
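The sidecar format is simple enough that scripts can inspect it without vhscli. Here is a sketch of two hypothetical helpers. `sidecar_output` applies the documented naming rule. `sidecar_task_id` pulls the id out with `sed`, deliberately avoiding a `jq` dependency, and assumes the one-line `{"id": "<uuid>"}` shape shown above.

```shell
# Hypothetical helpers built only on the documented sidecar format:
# a file named "<output>.vhs_task" containing JSON like {"id": "<uuid>"}.

# Output path that resume writes to: strip the trailing ".vhs_task".
sidecar_output() {
  printf '%s\n' "${1%.vhs_task}"
}

# Task id stored in the sidecar (a plain sed match on the "id" field).
sidecar_task_id() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}
```

For example, `sidecar_output clip.mp4.vhs_task` prints `clip.mp4`.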

vhscli submit — submit a task and exit (don't wait)

vhscli submit <model> <prompt> [-o <path>] [...same flags as `vhscli generate <model>`]

submit takes the same models and the same options as generate (seedance-2, seedream-5, seedream-4-5, nano-banana-2, nano-banana-pro, gpt-image-2). The only difference is that after creating the task and writing <output>.vhs_task, it exits without polling.

Use it when:

The job is long (e.g. seedance video) and you don't want to keep the terminal blocked.
You want to fan out several tasks in parallel and pull results later.

Pair it with vhscli resume <output>.vhs_task to fetch the result.

Examples:

# kick off a video, get the terminal back, finish later
vhscli submit seedance-2 "a robot dancing in tokyo at night" -o robot.mp4
# ... do other work ...
vhscli resume robot.mp4.vhs_task

# fan out several image jobs, then collect them all
vhscli submit seedream-5 "a red fox in a snowy forest" -o fox.jpg
vhscli submit seedream-5 "a blue jay on a branch"      -o jay.jpg
vhscli submit seedream-5 "an orca breaching"           -o orca.jpg
vhscli resume fox.jpg.vhs_task jay.jpg.vhs_task orca.jpg.vhs_task
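The fan-out pattern above can be scripted from a list of jobs. Here is a sketch of a hypothetical `submit_batch` helper. It reads `output|prompt` lines from stdin (a made-up input format), submits one job per line for the given model, and prints the sidecar names for a later `vhscli resume`.

```shell
# Hypothetical fan-out helper: read "output|prompt" lines from stdin, submit
# one job per line, and print each sidecar name for a later `vhscli resume`.
submit_batch() {
  local model=$1 out prompt
  # "|| [ -n "$out" ]" keeps a final line that lacks a trailing newline.
  while IFS='|' read -r out prompt || [ -n "$out" ]; do
    [ -n "$out" ] || continue          # skip blank lines
    # </dev/null stops vhscli from consuming the rest of the job list.
    vhscli submit "$model" "$prompt" -o "$out" </dev/null || return 1
    printf '%s.vhs_task\n' "$out"
  done
}

# Example:
#   printf '%s\n' 'fox.jpg|a red fox' 'jay.jpg|a blue jay' | submit_batch seedream-5 > pending.txt
#   vhscli resume $(cat pending.txt)   # assumes output paths contain no spaces
```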

vhscli resume — finish aborted generations from sidecar files

vhscli resume <files...>

Takes one or more .vhs_task sidecar files (any mix of models). For each sidecar, resume:

Reads the task id from the sidecar.
Derives the output path by stripping the trailing .vhs_task (clip.mp4.vhs_task → clip.mp4). The extension on that path sets the saved format; the CLI converts if needed.
Waits for the task to finish, saves the media, and removes the sidecar on success (or on a non-recoverable task error).
Processes files sequentially; exits non-zero on the first failure (later sidecars stay on disk and can be resumed in a later run).

When to use resume:

You ran vhscli submit ... and now want the result.
Your vhscli generate ... was interrupted (ctrl-c, crash, closed terminal, lost network) — the sidecar it wrote at the start is still on disk.

You can no longer resume by raw task id. If you only have an id, recreate the sidecar by hand (its name sets the output path and format), then resume it:

echo '{"id":"<uuid>"}' > out.mp4.vhs_task
vhscli resume out.mp4.vhs_task

Examples:

vhscli resume clip.mp4.vhs_task
vhscli resume a.jpg.vhs_task b.jpg.vhs_task c.jpg.vhs_task
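Because resume stops at the first failure, one bad sidecar can hold up a batch. Here is a sketch of a hypothetical `resume_each` wrapper. It resumes each sidecar in a separate call, keeps going after failures, and returns non-zero if any sidecar failed. Failed sidecars stay on disk, as described above.

```shell
# Hypothetical wrapper: resume every sidecar separately so one failure does
# not stop the others. Returns non-zero if any sidecar failed.
resume_each() {
  local f rc=0
  for f do
    if ! vhscli resume "$f"; then
      printf 'resume failed: %s\n' "$f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```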

Understanding local images, video, and pdfs

Do NOT use the Read tool, or any built-in file-reading capability, to "look at" images, video, or pdfs. That path either fails or gives you a garbled snippet. The only correct way to understand local visual or document content is vhscli chat with -i / -v / -f.

vhscli chat "what's happening?" -i photo.jpg
vhscli chat "transcribe all on-screen text and captions" -v clip.mp4 --fps 2
vhscli chat "summarize this paper" -f paper.pdf
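When a script needs to ask about arbitrary local files, the right chat flag follows from the file type. Here is a sketch of a hypothetical `chat_flag_for` helper. The extension lists are assumptions, since the doc does not enumerate the formats vhscli chat accepts.

```shell
# Hypothetical helper: print the vhscli chat flag for a local file based on
# its extension (-i image, -f pdf, -v video). Extension lists are assumptions.
chat_flag_for() {
  local ext
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    pdf)                     printf '%s\n' -f ;;
    mp4|mov|webm)            printf '%s\n' -v ;;
    jpg|jpeg|png|webp|heic)  printf '%s\n' -i ;;
    *) return 1 ;;
  esac
}
```

For example, `flag=$(chat_flag_for "$f") && vhscli chat "what is in this file?" "$flag" "$f"`.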

Prompt patterns for visual / document understanding

vhscli chat understands images, pdfs, and video frames, but not audio inside videos. Ask for structured JSON output when you'll parse the answer, and name every field you want. Be explicit about formats (timestamp style, units, language).

Image — describe / classify:

vhscli chat "describe the scene. return json {objects:[{label,bbox?}], setting, mood, dominant_colors:[]}." -i photo.jpg
vhscli chat "classify the image into one of: cat, dog, bird, other. return json {label, confidence_0_1, reasoning}." -i pic.jpg

Image — OCR / text extraction:

vhscli chat "transcribe all visible text verbatim, preserving line breaks and reading order. do not paraphrase." -i receipt.jpg
vhscli chat "extract the receipt as json {merchant, date_iso, items:[{name, qty, unit_price, line_total}], subtotal, tax, total, currency}." -i receipt.jpg

Image — comparison (number them in the prompt):

vhscli chat "compare image 1 and image 2. return json {same_subject:bool, differences:[], which_is_better, why}." -i a.jpg -i b.jpg
vhscli chat "image 1 is the original, image 2 is an edit. list every visible change as json {changes:[{region, before, after}]}." -i orig.png -i edit.png

PDF — summarize / outline (always ask for page anchors):

vhscli chat "summarize this paper in 5 bullets. each bullet must include the source page as {page:int, point:string}. return json {bullets:[...]}." -f paper.pdf
vhscli chat "extract the outline as json [{page, heading_level, heading, bullets:[]}]." -f doc.pdf

PDF — QA / extraction:

vhscli chat "answer using only this document. question: what is the experimental setup? return json {answer, citations:[{page, quote}]}." -f paper.pdf
vhscli chat "extract every table as json [{page, title?, headers:[], rows:[[...]]}]." -f report.pdf

Video — events / timeline (state the timestamp format):

vhscli chat "list key events. return json [{start_time, end_time, event}]. use HH:mm:ss." -v clip.mp4 --fps 2
vhscli chat "describe the movement sequence and any safety risks. return json [{start_time, end_time, event, danger:'none'|'low'|'med'|'high'}]. HH:mm:ss." -v clip.mp4 --fps 3
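Timestamps requested in HH:mm:ss are easier to sort and compare once they are converted to seconds. Here is a sketch of a hypothetical `hms_to_seconds` helper. It accepts only strict two-digit `HH:mm:ss` values with no fractional seconds, and it fails on anything else.

```shell
# Hypothetical helper: convert a strict HH:mm:ss timestamp to whole seconds.
hms_to_seconds() {
  local h m s
  case "$1" in
    [0-9][0-9]:[0-5][0-9]:[0-5][0-9]) ;;
    *) return 1 ;;
  esac
  h=${1%%:*} s=${1##*:}
  m=${1#*:}; m=${m%:*}
  # Strip one leading zero so "08"/"09" are not read as invalid octal.
  h=${h#0} m=${m#0} s=${s#0}
  echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
}
```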

Video — temporal QA / counting:

vhscli chat "at what timestamp does the referee first appear? return json {timestamp_hms, evidence}." -v match.mp4 --fps 2
vhscli chat "count how many distinct people appear. return json {count, per_person:[{first_seen_hms, description}]}." -v scene.mp4 --fps 3

Choosing --fps for video (default 1, range 0.2–5):

3–5 — counting actions, sports, fast cuts, dense motion.
1 — general description, dialogue scenes.
0.2–0.5 — long static footage, headcount, slow surveillance.

Higher fps = more detail but more tokens and slower. Lower fps = cheaper but may miss brief events.
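The guidance above can be encoded as two small hypothetical helpers. The category names and the single value picked from each band are illustrative choices, not vhscli options. `fps_ok` only mirrors the documented 0.2–5 range and relies on `awk` for the decimal comparison.

```shell
# Hypothetical helper: is the value a plain decimal within the documented
# 0.2-5 --fps range?
fps_ok() {
  case "$1" in
    ''|.|*[!0-9.]*|*.*.*) return 1 ;;   # digits and at most one dot
  esac
  awk -v f="$1" 'BEGIN { exit !(f + 0 >= 0.2 && f + 0 <= 5) }'
}

# Hypothetical helper: one value picked from each guidance band (an arbitrary
# choice within the documented ranges).
suggest_fps() {
  case "$1" in
    action|sports|counting) echo 3 ;;
    general|dialogue)       echo 1 ;;
    static|surveillance)    echo 0.5 ;;
    *) return 1 ;;
  esac
}
```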

Tips

Always quote prompts.
-o is optional for vhscli generate / vhscli submit — defaults to vhscli-<model>-<timestamp>.<ext> in the current folder. Output format follows the -o extension; the CLI converts if needed. For submit, pass -o so the resulting <output>.vhs_task sidecar has a name you can find later.
Short options accept no-space form: -ofoo.jpg. Long options accept =: --size=2K.
Use -- to pass a prompt starting with a dash: vhscli generate seedream-5 -o x.jpg -- "-weird prompt".
Reference images (-i, --first-frame, --last-frame) can be any common format; non-JPEG/PNG inputs (e.g. HEIC, WebP, TIFF, BMP) are converted to JPEG before upload.
Uploads are deduplicated by content hash, so passing the same reference repeatedly is cheap.
Unknown command? vhscli will suggest the closest match.
