---
name: clawdess
description: Generate playful companion photos, image-to-video clips, and short voice notes with the clawdess CLI when the user asks for a selfie/photo, video, or to hear her voice.
metadata: {"author": "xwings", "openclaw": {"requires": {"env": ["CLAWDESS_PHOTO_API", "CLAWDESS_VIDEO_API", "CLAWDESS_VOICE_API"]}, "bins": ["python3 {baseDir}/scripts/clawdess.py"]}}
---
Clawdess
Use this skill to send companion media through scripts/clawdess.py.
Inputs
- Reference image URL: read from `IDENTITY.md` for photo generation.
- Personality and continuity: use `IDENTITY.md`, `SOUL.md`, and the current chat context when present.
- Provider: read the default photo/video/voice provider from `SOUL.md` and pass it with `--provider`. If `SOUL.md` does not name a provider for that media type, omit `--provider` so the CLI uses its built-in default.
- API keys: pass `--api` or rely on `CLAWDESS_PHOTO_API`, `CLAWDESS_VIDEO_API`, and `CLAWDESS_VOICE_API`.
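One way to honor the provider rule when calling the CLI from Python is to build the argument list and leave `--provider` out entirely when `SOUL.md` names none. This is a minimal sketch: `build_clawdess_argv` is a hypothetical helper, `base_dir` stands in for `{baseDir}`, and only the `--provider`, `--prompt`, and `--image` flags come from this skill. It assumes the API keys stay in the `CLAWDESS_*_API` environment variables, so it never adds `--api`.

```python
from typing import List, Optional


def build_clawdess_argv(
    base_dir: str,
    mode: str,
    prompt: str,
    provider: Optional[str] = None,
    image: Optional[str] = None,
) -> List[str]:
    """Assemble the clawdess.py command line for one media job (hypothetical helper)."""
    if mode not in ("photo", "video", "voice"):
        raise ValueError(f"unknown mode: {mode!r}")
    argv = ["python3", f"{base_dir}/scripts/clawdess.py", mode]
    # Omit --provider entirely when SOUL.md names none, so the CLI uses its default.
    if provider:
        argv += ["--provider", provider]
    argv += ["--prompt", prompt]
    # Photo and video pass a source image; voice takes only a prompt.
    if image is not None:
        argv += ["--image", image]
    return argv
```

The result can be handed to `subprocess.run(argv, check=True)`; a non-zero exit then raises `CalledProcessError`, which is the "script exits with an error" case under Async Jobs.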
Choose Mode
- `photo`: the user asks for a pic, selfie, photo, or outfit/location view, or asks what she is doing or where she is.
- `video`: the user asks for a video or asks to animate an image.
- `voice`: the user asks to hear her, requests a voice note, or voice is more natural than text.
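If a deterministic first pass is useful before applying judgment, the mode rules can be approximated with keyword matching. This is a crude, hypothetical sketch (`choose_mode` and its keyword lists are illustrative, not part of the CLI); it returns `None` when nothing explicit matches, which is exactly where the "voice is more natural than text" call still has to be made by hand.

```python
import re
from typing import Optional

# Checked in order, so "animate this photo" routes to video rather than photo.
MODE_KEYWORDS = [
    ("video", ("video", "animate")),
    ("voice", ("voice note", "voice message", "your voice", "hear you")),
    ("photo", ("pic", "selfie", "photo", "outfit", "where are you", "what are you doing")),
]


def choose_mode(message: str) -> Optional[str]:
    """Crude keyword router (hypothetical); None means no explicit request matched."""
    text = message.lower()
    for mode, keywords in MODE_KEYWORDS:
        # \b before each keyword keeps "pic" from matching inside "topic" or "epic".
        if any(re.search(r"\b" + re.escape(k), text) for k in keywords):
            return mode
    return None
```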
CLI Discovery
- Run `python3 {baseDir}/scripts/clawdess.py --help` for available subcommands.
- Run `python3 {baseDir}/scripts/clawdess.py providers` before choosing a non-default provider; it lists installed providers and marks the defaults.
- Run `python3 {baseDir}/scripts/clawdess.py <photo|video|voice> --help` to check the required flags for a media command.
Async Jobs
Photo, video, and voice jobs can take anywhere from 30 seconds to more than 15 minutes. The CLI polls the provider and prints each status; wait until the job completes.
- Let polling continue while the server returns queued, waiting, or processing statuses.
- Do not resubmit unless the script exits with an error or the provider returns `FAILED`/`ERROR`. Stop waiting only if the user asks to stop.
- If the user asks whether it is done, report the latest status line.
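The wait/resubmit rules reduce to a small decision over the latest status line. A minimal sketch, assuming the CLI prints status words like the ones named above; treating `completed` as the success marker and continuing to wait on unrecognized statuses are both assumptions, and `next_action` is a hypothetical name.

```python
from typing import Optional

# Status words from this section; "completed" as the success marker is an assumption.
WAIT_STATUSES = {"queued", "waiting", "processing"}
FAIL_STATUSES = {"failed", "error"}
DONE_STATUSES = {"completed"}


def next_action(latest_status: str, exit_code: Optional[int] = None) -> str:
    """Map the CLI's latest status to 'wait', 'done', or 'resubmit_allowed'."""
    if exit_code not in (None, 0):
        return "resubmit_allowed"  # the script itself exited with an error
    status = latest_status.strip().lower()
    if status in FAIL_STATUSES:
        return "resubmit_allowed"  # provider reported FAILED/ERROR
    if status in DONE_STATUSES:
        return "done"
    # queued/waiting/processing, and anything unrecognized: keep waiting, never resubmit.
    return "wait"
```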
Photo
Write one concise phone-camera prompt with: outfit, location, lighting, action/pose, hairstyle, expression, framing, and identity details from IDENTITY.md when relevant.
Prompt-building loop (do this every time before running):
- Think: draft the prompt from the request and `IDENTITY.md`.
- Verify: re-read `IDENTITY.md` and confirm that body figure, skin tone, hair, and every accessory match. Confirm the scene is physically possible.
- Rethink: if anything conflicts, is missing, or is ambiguous, rewrite that clause. Do not carry over guesses.
- Check: run the final-check list below. Run the CLI only once it passes.
Final check (all must be true):
- Anatomy is correct: exactly two hands, two arms, two legs, two feet, one head, one set of eyes. No extra or missing limbs, fingers, or digits.
- One body part does one job. No conflicting hand/phone/body clauses, no impossible poses.
- Body figure matches `IDENTITY.md` (height, build, proportions). Do not slim, enlarge, or restyle it.
- Accessories match `IDENTITY.md` exactly: only the accessories it lists (e.g. glasses, jewelry, tattoos, piercings), nothing invented, nothing dropped.
- Skin tone and visible skin color match the identity/reference image.
- Outfit, footwear, hairstyle, makeup, and location are fully specified and self-consistent.
Rules:
- Time-aware: the photo is always taken now. Check the current time and set the time of day, view, lighting, and setting to match it.
- Start every prompt with `Render image of this person`, and state the framing as `full-body` or `half-body`.
- Pick one of the photo types below. If it is a selfie, say which selfie type.
- Specify complete identity/body details from `IDENTITY.md`, including body figure and accessories. Include this sentence: `Do not change the face, facial structure, identity, or body details; match the skin tone and visible skin color to the identity/reference image so the result looks natural.`
- Specify a complete outfit: top + bottom + footwear/barefoot, or one-piece + footwear/barefoot.
- Keep outfit, footwear, lighting, hairstyle, makeup, and location consistent with one another. Do not inherit clothing, hairstyle, or makeup from the reference image.
- Use a candid pose and a specific expression; avoid generic `standing still`, `posing`, or plain `smiling`.
- Avoid anatomy drift: one body part gets one job, one eye direction, one base pose, and no conflicting hand/phone/body clauses. Never produce extra hands, arms, legs, feet, or fingers.
- If a phone is visible, include the phone model/color from `IDENTITY.md` when available.
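Some of these rules are mechanical enough to lint before running the CLI. The hypothetical `lint_photo_prompt` below checks only the required opening, the framing keyword, the identity sentence, and the generic-pose phrases; anatomy, accessories, and skin tone still need the `IDENTITY.md` re-read from the final check.

```python
import re
from typing import List

REQUIRED_PREFIX = "Render image of this person"
IDENTITY_CLAUSE = "Do not change the face, facial structure, identity, or body details"
GENERIC_POSES = [r"\bstanding still\b", r"\bposing\b"]


def lint_photo_prompt(prompt: str) -> List[str]:
    """Return a list of rule violations; an empty list means the mechanical checks pass."""
    problems = []
    if not prompt.startswith(REQUIRED_PREFIX):
        problems.append("prompt must start with 'Render image of this person'")
    if "full-body" not in prompt and "half-body" not in prompt:
        problems.append("framing must say full-body or half-body")
    if IDENTITY_CLAUSE not in prompt:
        problems.append("missing the identity-preservation sentence")
    for pattern in GENERIC_POSES:
        if re.search(pattern, prompt, flags=re.IGNORECASE):
            problems.append(f"generic pose phrase matched {pattern!r}")
    return problems
```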
Photo types:
- Mirror selfie: directly in front of a mirror in a natural location; outfit view; phone visible.
- Handheld selfie: default casual selfie; phone held out of frame and not visible.
- Non-selfie: cinematic or third-person framing; full-body or half-body; no forced mirror.
Template:
Render image of this person, [complete outfit: top + bottom + footwear, or one-piece + footwear]. [framing] in [specific location], [time of day], [lighting matching the time], [single candid action/pose with one job per body part], [body figure from IDENTITY.md], [accessories from IDENTITY.md, or "no extra accessories"], [hairstyle], [makeup], [specific expression]. Natural anatomy: exactly two hands, two arms, two legs, two feet, correct number of fingers; no extra or missing limbs. Do not change the face, facial structure, identity, or body details; match the skin tone and visible skin color to the identity/reference image so the result looks natural.
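The template can be filled programmatically, which also makes the time-aware rule explicit. This sketch assumes hour boundaries for morning/afternoon/evening/night (adjust them freely); `PhotoSpec` and `build_photo_prompt` are hypothetical names, and the field values still have to come from the request and `IDENTITY.md`.

```python
from dataclasses import dataclass
from datetime import datetime

ANATOMY_CLAUSE = ("Natural anatomy: exactly two hands, two arms, two legs, two feet, "
                  "correct number of fingers; no extra or missing limbs.")
IDENTITY_CLAUSE = ("Do not change the face, facial structure, identity, or body details; "
                   "match the skin tone and visible skin color to the identity/reference "
                   "image so the result looks natural.")


def time_of_day(now: datetime) -> str:
    """Map the current hour to a label (boundaries are an assumption)."""
    h = now.hour
    if 5 <= h < 12:
        return "morning"
    if 12 <= h < 17:
        return "afternoon"
    if 17 <= h < 21:
        return "evening"
    return "night"


@dataclass
class PhotoSpec:
    outfit: str        # top + bottom + footwear, or one-piece + footwear
    framing: str       # e.g. "half-body handheld selfie"
    location: str
    lighting: str      # should match the time of day
    action: str        # single candid action, one job per body part
    body_figure: str   # from IDENTITY.md
    hairstyle: str
    makeup: str
    expression: str
    accessories: str = "no extra accessories"


def build_photo_prompt(spec: PhotoSpec, now: datetime) -> str:
    """Fill the template in the same field order as the text template."""
    scene = ", ".join([
        f"{spec.framing} in {spec.location}",
        time_of_day(now),
        spec.lighting,
        spec.action,
        spec.body_figure,
        spec.accessories,
        spec.hairstyle,
        spec.makeup,
        spec.expression,
    ])
    return f"Render image of this person, {spec.outfit}. {scene}. {ANATOMY_CLAUSE} {IDENTITY_CLAUSE}"
```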
Run:
```bash
python3 {baseDir}/scripts/clawdess.py photo \
  --provider "<photo provider from SOUL.md; omit flag if SOUL.md names none>" \
  --prompt "..." \
  --image "<reference image URL from IDENTITY.md>"
```
Video
The `--image` source must be either:
- the URL returned by the most recent `photo` run, or
- a concrete image URL the user provided in this conversation.
Never use a local path, file:// URI, placeholder, guessed URL, or the IDENTITY.md reference image as the video source. If no valid source image exists, generate a photo first and use its returned URL.
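The source-image rule can be partly checked in code. This minimal sketch (hypothetical `is_valid_video_source`) rejects non-HTTP(S) sources, unfilled `<...>` or `{...}` placeholders, and the `IDENTITY.md` reference URL; it cannot tell a guessed URL from a real one, so provenance still has to be confirmed from the conversation.

```python
from urllib.parse import urlparse


def is_valid_video_source(url: str, identity_reference_url: str) -> bool:
    """Mechanical checks only; whether the URL came from the last photo run or the user is not checked."""
    candidate = url.strip()
    parsed = urlparse(candidate)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False  # rejects local paths, file:// URIs, and bare filenames
    if any(ch in candidate for ch in "<>{}"):
        return False  # looks like an unfilled placeholder such as <photo output URL>
    if candidate == identity_reference_url.strip():
        return False  # the IDENTITY.md reference image is never a video source
    return True
```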
Prompt only the motion. The image already defines identity, outfit, location, hair, and lighting. Use a 10-15 second sequence of 3-4 connected physical actions with pacing words such as slowly, then, and gradually.
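A motion prompt in the shape of the video run example can be assembled from 3-4 actions. This is a sketch under the assumption that the first action always takes `slowly` and the last `finally`; `build_motion_prompt` is hypothetical, and it does not check the 10-15 second duration.

```python
from typing import List


def build_motion_prompt(actions: List[str]) -> str:
    """Join 3-4 physical actions with pacing words, e.g. 'She slowly ..., then ..., finally ...'."""
    if not 3 <= len(actions) <= 4:
        raise ValueError("use 3-4 connected physical actions")
    pacing = ["slowly", "then", "gradually", "finally"] if len(actions) == 4 else ["slowly", "then", "finally"]
    first = f"She {pacing[0]} {actions[0]}"
    rest = [f"{word} {action}" for word, action in zip(pacing[1:], actions[1:])]
    return ", ".join([first, *rest]) + "."
```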
Run:
```bash
python3 {baseDir}/scripts/clawdess.py video \
  --provider "<video provider from SOUL.md; omit flag if SOUL.md names none>" \
  --prompt "She slowly ..., then ..., gradually ..., finally ..." \
  --image "<photo output URL or user-provided image URL>"
```
Voice
Write exactly what the TTS should say. Keep it casual, in character, and under 30 seconds.
Rules:
- No stage directions; the TTS reads them literally.
- Use natural, short speech with small fillers when they fit: `hmm`, `hehe`, `aww`, `...`.
- If a photo/video was just sent, optionally reference it in one short line.
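A quick pre-flight check for the voice rules might look like this. It is a heuristic sketch: `check_voice_script` is hypothetical, it treats `*...*`, `[...]`, and `(...)` as stage directions (so it can flag legitimate parentheses), and the 150 words-per-minute rate used to estimate length is an assumption, not a measured TTS rate.

```python
import re
from typing import List

STAGE_DIRECTION = re.compile(r"\*[^*]+\*|\[[^\]]+\]|\([^)]+\)")
ASSUMED_WORDS_PER_MINUTE = 150  # rough conversational pace, not a measured TTS rate


def check_voice_script(text: str, max_seconds: float = 30.0) -> List[str]:
    """Flag likely stage directions and scripts estimated to run max_seconds or longer."""
    problems = []
    if STAGE_DIRECTION.search(text):
        problems.append("contains stage directions the TTS would read aloud")
    est_seconds = len(text.split()) / ASSUMED_WORDS_PER_MINUTE * 60
    if est_seconds >= max_seconds:
        problems.append(f"about {est_seconds:.0f}s of speech; keep it under {max_seconds:.0f}s")
    return problems
```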
Run:
```bash
python3 {baseDir}/scripts/clawdess.py voice \
  --provider "<voice provider from SOUL.md; omit flag if SOUL.md names none>" \
  --prompt "..."
```