clawdess

name: clawdess
description: Generate playful companion photos, image-to-video clips, and short voice notes with the clawdess CLI when the user asks for a selfie/photo, video, or to hear her voice.
metadata: {"author": "xwings", "openclaw": {"requires": {"env": ["CLAWDESS_PHOTO_API", "CLAWDESS_VIDEO_API", "CLAWDESS_VOICE_API"]}, "bins": ["python3 {baseDir}/scripts/clawdess.py"]}}

Use this skill to send companion media through scripts/clawdess.py.

Inputs

Reference image URL: read from IDENTITY.md for photo generation.
Personality and continuity: use IDENTITY.md, SOUL.md, and the current chat context when present.
Provider: read the default photo/video/voice provider from SOUL.md. Pass it with --provider. If SOUL.md does not name a provider for that media type, omit --provider so the CLI uses its built-in default.
API keys: pass --api or rely on CLAWDESS_PHOTO_API, CLAWDESS_VIDEO_API, and CLAWDESS_VOICE_API.
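
The provider and API-key rules above can be sketched as a small argv builder. This is a minimal illustration, not part of the clawdess CLI: build_command and has_api_key are hypothetical helper names, and only the subcommands, flags, and environment variables already named in this skill are used.

```python
import os

# Environment variable names come from the skill metadata.
API_ENV = {
    "photo": "CLAWDESS_PHOTO_API",
    "video": "CLAWDESS_VIDEO_API",
    "voice": "CLAWDESS_VOICE_API",
}


def build_command(script, mode, prompt, provider=None, image=None):
    """Assemble argv for one clawdess call (hypothetical helper)."""
    argv = ["python3", script, mode]
    if provider:  # SOUL.md named a provider for this media type
        argv += ["--provider", provider]
    # With no provider, the flag is omitted so the CLI uses its default.
    argv += ["--prompt", prompt]
    if image:  # photo and video take a source image; voice does not
        argv += ["--image", image]
    return argv


def has_api_key(mode, env=None):
    """True when the variable for this media type is set and non-empty."""
    env = os.environ if env is None else env
    return bool(env.get(API_ENV[mode]))
```

Passing the list to subprocess.run avoids shell quoting problems when the prompt contains quotes or commas.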

Choose Mode

photo: user asks for a pic, selfie, photo, outfit/location view, or asks what she is doing or where she is.
video: user asks for a video or asks to animate an image.
voice: user asks to hear her, requests a voice note, or voice is more natural than text.
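
As a rough illustration of the mode rules above, the choice could be approximated with a keyword check. The function name choose_mode and the keyword lists are assumptions made for this sketch; the chat context should override it whenever the keywords mislead.

```python
def choose_mode(message):
    """Rough keyword heuristic for picking photo, video, or voice."""
    text = message.lower()
    # Check video first: "animate this photo" mentions a photo but wants a video.
    if any(word in text for word in ("video", "animate", "clip")):
        return "video"
    if any(word in text for word in ("voice", "hear you", "say something")):
        return "voice"
    photo_words = ("pic", "selfie", "photo", "outfit", "where are you", "what are you")
    if any(word in text for word in photo_words):
        return "photo"
    return None  # no media requested; answer in text
```

Substring matching is deliberately loose here (for example, "pic" also matches "topic"), so treat the result as a hint rather than a decision.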

CLI Discovery

Run python3 {baseDir}/scripts/clawdess.py --help for available subcommands.
Run python3 {baseDir}/scripts/clawdess.py providers before choosing a non-default provider; it lists installed providers and marks defaults.
Run python3 {baseDir}/scripts/clawdess.py <photo|video|voice> --help when checking required flags for a media command.

Async Jobs

Photo, video, and voice jobs can take 30 seconds to 15+ minutes. The CLI polls and prints status. Wait until the job completes.

Let polling continue while the server returns queued/waiting/processing statuses.
Do not resubmit unless the script exits with an error or the provider returns FAILED/ERROR. Stop waiting only if the user asks to stop.
If the user asks whether it is done, report the latest status line.
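
The waiting rules above amount to a small decision table, sketched here. The helper name next_action is hypothetical, and the sketch assumes the status word has already been extracted from the latest status line, since the exact output format of the CLI is not described in this skill.

```python
IN_PROGRESS = {"queued", "waiting", "processing"}
FAILED = {"failed", "error"}


def next_action(status, exit_code=None, user_asked_stop=False):
    """Return 'stop', 'may_resubmit', 'wait', or 'done' for the latest job state.

    exit_code is None while the script is still running.
    """
    if user_asked_stop:
        return "stop"
    state = status.strip().lower()
    if state in FAILED or (exit_code is not None and exit_code != 0):
        return "may_resubmit"
    if state == "completed":
        return "done"
    # queued/waiting/processing, or an unrecognized status: keep polling
    # rather than risk submitting a duplicate job.
    return "wait"
```

Unknown statuses fall through to "wait" on purpose: a duplicate submission costs more than a few extra polls.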

Photo

Write one concise phone-camera prompt with: outfit, location, lighting, action/pose, hairstyle, expression, framing, and identity details from IDENTITY.md when relevant.

Prompt-building loop (do this every time before running):

Think: draft the prompt from the request + IDENTITY.md.
Verify: re-read IDENTITY.md and confirm body figure, skin tone, hair, and every accessory match. Confirm the scene is physically possible.
Rethink: if anything conflicts, is missing, or is ambiguous, rewrite the clause. Do not carry over guesses.
Check: run the final-check list below. Only run the CLI once it passes.

Final check (all must be true):

Anatomy is correct: exactly two hands, two arms, two legs, two feet, one head, one set of eyes. No extra or missing limbs, fingers, or digits.
One body part does one job. No conflicting hand/phone/body clauses, no impossible poses.
Body figure matches IDENTITY.md (height, build, proportions). Do not slim, enlarge, or restyle it.
Accessories match IDENTITY.md exactly: only the accessories it lists (e.g. glasses, jewelry, tattoos, piercings), nothing invented, nothing dropped.
Skin tone and visible skin color match the identity/reference image.
Outfit, footwear, hairstyle, makeup, and location are fully specified and self-consistent.

Rules:

Time-aware: the time is always now. Check the current time and define time of day, view, lighting, and setting to match it.
Start every prompt with "Render image of this person" and state the framing: full-body or half-body.
State the photo type. For a selfie, state which kind: mirror or handheld.
Specify complete identity/body details from IDENTITY.md, including body figure and accessories. Include: "Do not change the face, facial structure, identity, or body details; match the skin tone and visible skin color to the identity/reference image so the result looks natural."
Specify a complete outfit: top + bottom + footwear/barefoot, or one-piece + footwear/barefoot.
Keep outfit, footwear, lighting, hairstyle, makeup, and location consistent with one another. Do not inherit clothing, hairstyle, or makeup from the reference image.
Use a candid pose and specific expression; avoid generic standing still, posing, or plain smiling.
Avoid anatomy drift: one body part gets one job, one eye direction, one base pose, and no conflicting hand/phone/body clauses. Never produce extra hands, arms, legs, feet, or fingers.
If a phone is visible, include phone model/color from IDENTITY.md when available.
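
For the time-aware rule above, one way to turn the current clock time into prompt wording is sketched below. The hour boundaries and lighting phrases are assumptions chosen for illustration; adjust them to the season and to the location described in IDENTITY.md.

```python
from datetime import datetime


def time_of_day(now=None):
    """Map the current hour to a time-of-day label and a lighting hint."""
    hour = (now or datetime.now()).hour
    if 5 <= hour < 8:
        return "early morning", "soft low-angle sunrise light"
    if 8 <= hour < 17:
        return "daytime", "bright natural daylight"
    if 17 <= hour < 20:
        return "evening", "warm golden-hour light"
    return "night", "indoor lamps or street lighting"
```

Calling time_of_day() with no argument uses the local clock, which matches the rule that the time is always now.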

Photo types:

Mirror selfie: standing directly in front of a mirror in a natural location; outfit view; phone visible.
Handheld selfie: default casual selfie; phone held out of frame and not visible.
Non-selfie: cinematic or third-person framing; full-body or half-body; no forced mirror.

Template:

Render image of this person, [complete outfit: top + bottom + footwear, or one-piece + footwear]. [framing] in [specific location], [time of day], [lighting matching the time], [single candid action/pose with one job per body part], [body figure from IDENTITY.md], [accessories from IDENTITY.md, or "no extra accessories"], [hairstyle], [makeup], [specific expression]. Natural anatomy: exactly two hands, two arms, two legs, two feet, correct number of fingers; no extra or missing limbs. Do not change the face, facial structure, identity, or body details; match the skin tone and visible skin color to the identity/reference image so the result looks natural.
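
The template above can be filled mechanically once every slot has been decided. The sketch below is one possible helper, with build_photo_prompt as a hypothetical name; it refuses to build a prompt when any slot is empty, which mirrors the "fully specified" item in the final check.

```python
ANATOMY_AND_IDENTITY = (
    "Natural anatomy: exactly two hands, two arms, two legs, two feet, "
    "correct number of fingers; no extra or missing limbs. "
    "Do not change the face, facial structure, identity, or body details; "
    "match the skin tone and visible skin color to the identity/reference "
    "image so the result looks natural."
)


def build_photo_prompt(*, outfit, framing, location, time_of_day, lighting,
                       pose, figure, hairstyle, makeup, expression,
                       accessories="no extra accessories"):
    """Fill the template slots; raise ValueError if any slot is blank."""
    slots = {
        "outfit": outfit, "framing": framing, "location": location,
        "time_of_day": time_of_day, "lighting": lighting, "pose": pose,
        "figure": figure, "accessories": accessories,
        "hairstyle": hairstyle, "makeup": makeup, "expression": expression,
    }
    empty = [name for name, value in slots.items() if not value.strip()]
    if empty:
        raise ValueError("unspecified slots: " + ", ".join(empty))
    return (
        f"Render image of this person, {outfit}. "
        f"{framing} in {location}, {time_of_day}, {lighting}, {pose}, "
        f"{figure}, {accessories}, {hairstyle}, {makeup}, {expression}. "
        + ANATOMY_AND_IDENTITY
    )
```

The helper only checks that slots are filled. Whether the values agree with IDENTITY.md and with each other still has to be verified by re-reading, as the prompt-building loop requires.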

Run:

python3 {baseDir}/scripts/clawdess.py photo \
  --provider "<photo provider from SOUL.md; omit flag if SOUL.md names none>" \
  --prompt "..." \
  --image "<reference image URL from IDENTITY.md>"

Video

The --image source must be either:

the URL returned by the most recent photo run, or
a concrete image URL the user provided in this conversation.

Never use a local path, file:// URI, placeholder, guessed URL, or the IDENTITY.md reference image as the video source. If no valid source image exists, generate a photo first and use its returned URL.
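
A quick pre-flight check for the source rules above might look like the following sketch. The function name is_valid_video_source is hypothetical. It can reject local paths, file:// URIs, unfilled placeholders, and the IDENTITY.md reference image, but it cannot tell a guessed URL from a real one, so the URL must still come from a photo run or from the user.

```python
from urllib.parse import urlparse


def is_valid_video_source(url, identity_reference_url=None):
    """True only for a concrete http(s) URL that is not the IDENTITY.md reference."""
    if not url or url == identity_reference_url:
        return False
    # Unfilled placeholders such as "<photo output URL>" or "..." are not sources.
    if "<" in url or ">" in url or "..." in url:
        return False
    parsed = urlparse(url)
    # Local paths have no http(s) scheme or host; file:// has the wrong scheme.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

When the check fails, the fallback is the one stated above: generate a photo first and use its returned URL.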

Prompt only the motion. The image already defines identity, outfit, location, hair, and lighting. Use a 10-15 second sequence of 3-4 connected physical actions with pacing words such as slowly, then, and gradually.

Run:

python3 {baseDir}/scripts/clawdess.py video \
  --provider "<video provider from SOUL.md; omit flag if SOUL.md names none>" \
  --prompt "She slowly ..., then ..., gradually ..., finally ..." \
  --image "<photo output URL or user-provided image URL>"

Voice

Write exactly what the TTS should say. Keep it casual, in character, and under 30 seconds.

Rules:

No stage directions; the TTS reads them literally.
Use natural short speech with small fillers when fitting: hmm, hehe, aww, ....
If a photo/video was just sent, optionally reference it in one short line.
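
The voice rules above can be checked before running the CLI with a sketch like this one. The helper name check_voice_script, the bracket patterns treated as stage directions, and the speaking rate of 2.5 words per second are all assumptions; the real duration depends on the voice provider.

```python
import re

WORDS_PER_SECOND = 2.5  # assumed speaking rate, not a clawdess setting

# Text wrapped in *asterisks*, [brackets], or (parentheses) usually marks
# a stage direction that the TTS would read aloud.
STAGE_DIRECTION = re.compile(r"\*[^*]+\*|\[[^\]]+\]|\([^)]+\)")


def check_voice_script(text, max_seconds=30):
    """Return a list of problems; an empty list means the script looks sendable."""
    problems = []
    if STAGE_DIRECTION.search(text):
        problems.append("contains stage directions")
    estimated = len(text.split()) / WORDS_PER_SECOND
    if estimated >= max_seconds:
        problems.append(f"estimated {estimated:.0f}s is not under {max_seconds}s")
    return problems
```

A script that returns problems should be rewritten rather than trimmed blindly, so that it stays casual and in character.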

Run:

python3 {baseDir}/scripts/clawdess.py voice \
  --provider "<voice provider from SOUL.md; omit flag if SOUL.md names none>" \
  --prompt "..."