name: generating-videos
description: Generates videos using Grok. Supports text-to-video and image-to-video. AI video generation, text-to-video, image-to-video.
Video Generation
Generates videos using Grok.
Model Selection Strategy
Use grok-imagine-video for video generation.
| Scenario | Use Model | Reason |
|---|---|---|
| Video 1-15s | grok-imagine-video | Grok supports direct 1-15s generation |
| Text-to-video | grok-imagine-video | Prompt-only generation |
| Image-to-video | grok-imagine-video | Uses one reference image |
| Video >15s | Generate multiple clips, then load editing-videos to concatenate | Grok max duration is 15s |
Language Rule
Video prompts MUST follow the user's language. If the user writes in Chinese, generate Chinese prompts; if in English, generate English prompts. If the user explicitly requests a specific language, use that language.
Grok Video Model
| Model | Speed | Quality | Max Duration | Use Case |
|---|---|---|---|---|
| grok-imagine-video | Fast | Good | 15s | Default video generation |
Parameters
| Parameter | Values | Default | Description |
|---|---|---|---|
| duration | 1-15 | model default | Video duration in seconds |
| resolution | 480p, 720p | 720p | Video resolution |
| aspectRatio | 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3 | 9:16 | Video aspect ratio |
| imageUrl | URL | none | Reference image URL for image-to-video |
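
The parameter limits above can be checked before a request is made. The following is a minimal sketch, assuming the parameters are passed as a plain dictionary; `build_video_params` is a hypothetical helper name, and only the keys, allowed values, and defaults come from the table.

```python
# Allowed values copied from the Parameters table. The function name and the
# error handling are illustrative assumptions, not part of the Grok tool.
ALLOWED_RESOLUTIONS = {"480p", "720p"}
ALLOWED_ASPECT_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3"}


def build_video_params(duration=None, resolution="720p", aspect_ratio="9:16", image_url=None):
    """Return a parameter dict, raising ValueError on out-of-range values."""
    if duration is not None and not 1 <= duration <= 15:
        raise ValueError("duration must be between 1 and 15 seconds")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspectRatio: {aspect_ratio}")

    params = {"resolution": resolution, "aspectRatio": aspect_ratio}
    if duration is not None:
        params["duration"] = duration  # omitted means the model default applies
    if image_url is not None:
        params["imageUrl"] = image_url  # reference image for image-to-video
    return params
```

Leaving `duration` out of the dictionary keeps the model default, which matches the table's default column.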
Workflow
- Call generateVideoWithGrok with prompt and optional parameters.
- Poll getGrokVideoStatus every 30 seconds.
- Maximum wait: 5 minutes.
- Save completed output to draft unless the user explicitly asks for media library only.
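
The polling rule in the workflow above (every 30 seconds, at most 5 minutes) could be sketched as follows. This is an illustration under assumptions: `get_status` is a hypothetical wrapper around getGrokVideoStatus, and the `"status"` key and its `"completed"` and `"failed"` values are assumed rather than documented here.

```python
import time


def wait_for_video(get_status, task_id, interval_s=30, max_wait_s=300, sleep=time.sleep):
    """Poll get_status(task_id) until the task finishes or max_wait_s elapses.

    get_status is a hypothetical callable wrapping getGrokVideoStatus; it is
    assumed to return a dict with a "status" key.
    """
    waited = 0
    while True:
        result = get_status(task_id)
        if result.get("status") in ("completed", "failed"):  # assumed status names
            return result
        if waited >= max_wait_s:
            raise TimeoutError(f"video task {task_id} not finished after {max_wait_s}s")
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as an argument keeps the loop testable without real delays.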
Prompt Structure
[Subject & Background] + [Action] + [Style] + [Camera] + [Atmosphere] + [Audio]
Subject & Background
Specify the main focus and environmental context.
Examples:
- "A young woman with long black hair wearing a red dress"
- "White concrete apartment building with organic shapes and lush greenery"
Action
Describe what the subject is doing.
Examples:
- "walks slowly towards the camera"
- "transforms from liquid to solid"
Style
Add aesthetic direction.
Examples:
- Film noir
- Surrealism
- Cyberpunk
- 3D cartoon animation
- Cinematic documentary
Camera
Describe movement and framing.
Examples:
- "slow dolly-in"
- "handheld tracking shot"
- "close-up with shallow depth of field"
Atmosphere
Describe lighting, color, mood, and setting.
Examples:
- "warm sunset lighting"
- "misty morning atmosphere"
- "high contrast studio lighting"
Audio
Include useful audio cues when relevant.
Examples:
- "soft ambient city sounds"
- "gentle piano music"
- "footsteps on wet pavement"
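
The six components can be assembled mechanically in the documented order. The sketch below is one possible way to do it; `build_prompt` is a hypothetical helper, and joining the parts with commas is an assumption, since the structure above does not prescribe a separator.

```python
def build_prompt(subject, action, style=None, camera=None, atmosphere=None, audio=None):
    """Combine prompt components in the documented order, skipping empty ones."""
    # Subject and action read naturally as one clause.
    sentence = f"{subject.strip()} {action.strip()}"
    # Optional components follow in the order Style, Camera, Atmosphere, Audio.
    extras = [p.strip() for p in (style, camera, atmosphere, audio) if p and p.strip()]
    return ", ".join([sentence, *extras])


prompt = build_prompt(
    "A young woman with long black hair wearing a red dress",
    "walks slowly towards the camera",
    style="film noir",
    camera="slow dolly-in",
    atmosphere="misty morning atmosphere",
    audio="footsteps on wet pavement",
)
```

Per the Language Rule, the component strings themselves should be written in the user's language before they are joined.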
Aspect Ratio Guidance
| Platform / Use | Aspect Ratio |
|---|---|
| TikTok, Reels, Shorts, mobile social video | 9:16 |
| YouTube landscape, website hero, presentation | 16:9 |
| Profile or square feed asset | 1:1 |
| Portrait-style content | 3:4 or 2:3 |
| Landscape social feed | 4:3 or 3:2 |
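
The guidance table can be expressed as a simple lookup. In this sketch the use-case keys are hypothetical labels invented for illustration; where a row lists two ratios, the first one is chosen arbitrarily.

```python
# Ratios taken from the guidance table. The dictionary keys are hypothetical
# labels for this sketch, not values recognized by any tool.
ASPECT_RATIO_BY_USE = {
    "mobile_social": "9:16",    # TikTok, Reels, Shorts
    "landscape_video": "16:9",  # YouTube landscape, website hero, presentation
    "square": "1:1",            # profile or square feed asset
    "portrait": "3:4",          # 2:3 is the listed alternative
    "landscape_feed": "4:3",    # 3:2 is the listed alternative
}


def pick_aspect_ratio(use_case=None):
    """Return the ratio for a use case, falling back to the 9:16 default."""
    return ASPECT_RATIO_BY_USE.get(use_case, "9:16")
```

The fallback mirrors the aspectRatio default in the Parameters table.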
Long Video Strategy
For videos longer than 15 seconds:
- Split the video into independent 15-second-or-shorter segments.
- Generate each segment with generateVideoWithGrok.
- Poll every task until complete.
- Load editing-videos.
- Use the video editing tool to concatenate the completed segments.
Keep prompts visually consistent across segments by repeating subject, style, lighting, camera language, and environment details.
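
Planning the segments for a long video can be sketched as below. Both helpers are hypothetical: `plan_segments` only does the duration arithmetic, and `segment_prompts` shows one assumed way to repeat the shared visual details in every segment prompt.

```python
def plan_segments(total_seconds, max_clip_seconds=15):
    """Split a total duration into clip lengths of at most max_clip_seconds."""
    if total_seconds < 1:
        raise ValueError("total_seconds must be at least 1")
    full, remainder = divmod(total_seconds, max_clip_seconds)
    return [max_clip_seconds] * full + ([remainder] if remainder else [])


def segment_prompts(shared_details, beats):
    """Prefix each segment's action with the same shared visual details."""
    return [f"{shared_details}, {beat}" for beat in beats]


durations = plan_segments(40)  # [15, 15, 10]
prompts = segment_prompts(
    "A young woman in a red dress, film noir, warm sunset lighting",
    ["walks towards the camera", "stops at a doorway", "looks back over her shoulder"],
)
```

Each duration and prompt pair would then go through the normal generate-and-poll workflow before the clips are concatenated.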