Choose image, video, and audio modes, defaults, and CLI-exposed parameters.
Model & Parameter Selection
Use this page to choose the model path and settings you can actually control
from makefx today. For prompting strategy, start with
Media Playbooks.
Images
Make Effects routes image jobs through Google's Nano Banana image models and stores the exact provider model ID in each generated image recipe.
| Selection | Exact model ID | Use for | Notes |
|---|---|---|---|
| pro | gemini-3-pro-image-preview | assets you expect to reuse, compare, or hand off | default; supports up to 14 references |
| flash | gemini-2.5-flash-image | quick drafts with one reference or no reference | supports 1 reference and 1K output only |
Image parameters
| Parameter | Values | Guidance |
|---|---|---|
| Aspect ratio | 1:1, 16:9, 9:16, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 21:9 | choose for the destination; image generation defaults to square (1:1) when omitted |
| Image size | 1K, 2K, 4K | Pro supports all three; Flash supports 1K only |
| Reference images | Pro supports up to 14; Flash supports 1 | label each reference by role |
| Operation | generate, refine, derive | generate from text, edit an existing variant, or compose from references |
Use 16:9 or 21:9 for keyframes and backgrounds, 9:16 for vertical clips, 1:1 for icons and tiles, and portrait ratios for character cards.
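The limits in the two image tables can be checked before a job is submitted. The following is a minimal sketch, not part of makefx: `check_image_request` and the `IMAGE_MODELS` structure are hypothetical names, and only the model IDs, ratios, sizes, and reference limits come from the tables above.

```python
# Limits copied from the image tables; the data layout is illustrative only.
IMAGE_MODELS = {
    "pro": {
        "model_id": "gemini-3-pro-image-preview",
        "max_refs": 14,
        "sizes": {"1K", "2K", "4K"},
    },
    "flash": {
        "model_id": "gemini-2.5-flash-image",
        "max_refs": 1,
        "sizes": {"1K"},
    },
}

ASPECT_RATIOS = {
    "1:1", "16:9", "9:16", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "21:9",
}


def check_image_request(size, selection="pro", aspect="1:1", ref_count=0):
    """Return a list of problems; an empty list means the request fits the tables.

    Hypothetical helper: "pro" and "1:1" mirror the documented defaults,
    while `size` has no default because the page does not state one.
    """
    model = IMAGE_MODELS.get(selection)
    if model is None:
        return [f"unknown selection: {selection}"]

    problems = []
    if aspect not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect}")
    if size not in model["sizes"]:
        problems.append(f"{selection} does not support image size {size}")
    if ref_count > model["max_refs"]:
        problems.append(
            f"{selection} supports at most {model['max_refs']} reference(s)"
        )
    return problems
```

For example, `check_image_request("2K", selection="flash", ref_count=2)` reports two problems, because Flash is limited to 1K output and a single reference.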
Video
Make Effects routes video jobs through Google's Veo 3.1 family. The public CLI lets you set prompt, references, resolution, provider duration, model tier, audio, and production metadata.
| Tier | Exact model ID | Use for |
|---|---|---|
| generate | veo-3.1-generate-preview | clips you expect to review, place on a timeline, or ship |
| fast | veo-3.1-fast-generate-preview | cheaper iteration path |
| lite | veo-3.1-lite-generate-preview | draft path for background motion tests |
Video parameters
| Parameter | Values | Guidance |
|---|---|---|
| Aspect ratio | 16:9, 9:16 | any other value is normalized to landscape (16:9) behavior |
| Resolution | 720p, 1080p, 4k | pick 720p for tests; 4k requires the generate or fast tier |
| Provider duration | 4, 6, 8 seconds | not controlled by --duration-ms; use CLI --duration or the web duration control |
| Tier | generate, fast, lite | use generate for final clips, fast/lite for iteration |
| References | up to 3 source images/keyframes | one unstyled image uses image-to-video; two unstyled images use first/last frames; any style image or 3 images uses reference-image mode |
The CLI --duration-ms flag records where the clip fits on your production timeline. It does not set the generated clip length; use --duration 4|6|8 for provider duration.
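The video constraints above interact: 4k is limited to two tiers, provider duration has three legal values, and the timeline length is tracked separately. A minimal sketch of those checks follows. `normalize_aspect` and `check_video_request` are hypothetical helpers, not makefx functions, and the fallback to 16:9 is an interpretation of "landscape behavior".

```python
# Tier names and model IDs copied from the tier table above.
VIDEO_TIERS = {
    "generate": "veo-3.1-generate-preview",
    "fast": "veo-3.1-fast-generate-preview",
    "lite": "veo-3.1-lite-generate-preview",
}
RESOLUTIONS = {"720p", "1080p", "4k"}
PROVIDER_DURATIONS_S = {4, 6, 8}  # what --duration accepts


def normalize_aspect(aspect):
    """Keep 9:16; treat every other value as landscape (assumed to mean 16:9)."""
    return "9:16" if aspect == "9:16" else "16:9"


def check_video_request(tier, resolution, duration_s, timeline_duration_ms=None):
    """Return a list of problems; an empty list means the request fits the tables.

    `timeline_duration_ms` stands in for --duration-ms. It is accepted but
    never validated against `duration_s`, because it only records timeline
    placement and does not set the generated clip length.
    """
    problems = []
    if tier not in VIDEO_TIERS:
        problems.append(f"unknown tier: {tier}")
    if resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    if resolution == "4k" and tier not in ("generate", "fast"):
        problems.append("4k requires the generate or fast tier")
    if duration_s not in PROVIDER_DURATIONS_S:
        problems.append("provider duration must be 4, 6, or 8 seconds")
    return problems
```

Keeping the timeline value out of the validation mirrors the split between `--duration` and `--duration-ms`: a 6-second generated clip can still occupy a differently sized slot on the production timeline.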
Audio
Choose one audio mode for each request.
| Mode | Default use | CLI |
|---|---|---|
| speech | one narrator or voiceover | makefx audio speech generate |
| dialogue | multi-speaker script | makefx audio dialogue generate |
| music | bed, cue, sting, loop | makefx audio music generate |
| sfx | one-off sound effect | makefx audio sfx generate |
Speech and dialogue depend on voice selection and provider configuration. Treat
the voice as a reusable reference for identity. Production can use ElevenLabs;
music requests may opt into Lyria with --provider lyria. Stage and local
environments may use fake providers. Entitlement, quota, and rate checks can
stop image, video, or audio generation before a provider call is made.
| Mode | Exact default model |
|---|---|
| speech | eleven_multilingual_v2 |
| dialogue | eleven_v3 |
| ElevenLabs music | music_v1 |
| Lyria music | lyria-3-clip-preview |
| sfx | eleven_text_to_sound_v2 |
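The default-model table can be read as a lookup keyed by mode and provider, where only music has a second provider. The sketch below is illustrative: `default_audio_model` is a hypothetical helper, and the provider key `"elevenlabs"` is an assumed name (the page only documents `--provider lyria`).

```python
# Default model IDs copied from the table above.
# The "elevenlabs" provider key is an assumption made for this sketch.
DEFAULT_AUDIO_MODELS = {
    ("speech", "elevenlabs"): "eleven_multilingual_v2",
    ("dialogue", "elevenlabs"): "eleven_v3",
    ("music", "elevenlabs"): "music_v1",
    ("music", "lyria"): "lyria-3-clip-preview",
    ("sfx", "elevenlabs"): "eleven_text_to_sound_v2",
}


def default_audio_model(mode, provider="elevenlabs"):
    """Look up the documented default model for an audio mode and provider."""
    try:
        return DEFAULT_AUDIO_MODELS[(mode, provider)]
    except KeyError:
        # Covers unknown modes and combinations the table does not list,
        # for example Lyria with speech.
        raise ValueError(
            f"no documented default for mode={mode!r}, provider={provider!r}"
        ) from None
```

Stage and local environments may use fake providers, so a lookup like this describes production defaults only.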
Provider reference semantics
Image references must resolve to completed image variants. A request with no
image references calls Gemini text-to-image, a refine with one reference calls
Gemini edit, and a derive or a refine with multiple references calls Gemini
compose. Reference labels such as Image 1: or Style ref 1: are included in the
prompt text; they are not typed provider channels.
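The image routing rule can be written as a small decision function. This is a sketch of the rule as described here, not the service's actual code: `image_provider_call` is a hypothetical name, and checking the zero-reference case first is an assumption about precedence.

```python
def image_provider_call(operation, ref_count):
    """Map an image operation and reference count to the Gemini call used.

    Returns "text-to-image", "edit", or "compose". Combinations this page
    does not describe (such as generate with references) raise ValueError
    rather than guessing.
    """
    if ref_count < 0:
        raise ValueError("ref_count cannot be negative")
    if ref_count == 0:
        # Assumption: zero references wins regardless of operation.
        return "text-to-image"
    if operation == "refine" and ref_count == 1:
        return "edit"
    if operation == "derive" or (operation == "refine" and ref_count > 1):
        return "compose"
    raise ValueError(
        f"routing for operation={operation!r} with {ref_count} "
        "reference(s) is not documented"
    )
```

Because labels such as Image 1: travel inside the prompt text, nothing in this routing depends on them; only the operation and the number of references matter.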
Video references use Veo-specific channels. Zero images is text-to-video. One
unstyled image is sent as the top-level image input. Two unstyled images use
first/last-frame interpolation. Any style image, or three final image inputs,
uses Veo referenceImages[]; style images are typed as provider STYLE, and
the remaining references are typed as ASSET.
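The Veo channel selection can be sketched the same way. `route_video_references` is a hypothetical helper, and the mode names it returns are labels chosen for this example; only the STYLE and ASSET type names and the up-to-3 limit come from this page.

```python
def route_video_references(style_flags):
    """Choose the Veo input channel for a list of references.

    `style_flags` holds one bool per reference: True for a style image,
    False for an unstyled image. Returns (mode, reference_types), where
    reference_types is only filled in for reference-image mode.
    """
    count = len(style_flags)
    if count > 3:
        raise ValueError("video accepts up to 3 references")
    if count == 0:
        return ("text-to-video", [])
    if any(style_flags) or count == 3:
        # Style images are typed STYLE; every other reference is ASSET.
        types = ["STYLE" if is_style else "ASSET" for is_style in style_flags]
        return ("reference-images", types)
    if count == 1:
        return ("image-to-video", [])
    # Exactly two unstyled images remain at this point.
    return ("first-last-frame", [])
```

Note that a single style image already switches the request into reference-image mode, so adding a style reference to a keyframe changes the channel, not just the look.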
Audio generation does not accept image references today. Voice IDs and ordered dialogue voice selections are the reference-like controls for speech and dialogue identity.
Decision tables
Images
| Situation | Pick |
|---|---|
| final asset with several references | pro image model |
| quick draft with one reference | flash image model |
| character turnaround or tile set | pro image model |
| character plus style plus background | pro image model |
Video
| Situation | Pick |
|---|---|
| final shot from a keyframe | generate tier, 1080p or 4k, 8s |
| quick motion test | fast or lite tier, 720p, 4s or 6s |
| vertical social clip | generate tier, vertical aspect ratio, 6s or 8s |
Audio
| Situation | Pick |
|---|---|
| narration | speech |
| character conversation | dialogue |
| background bed | music |
| event sound | sfx |
Controls you can set
Start with these controls when shaping output:
- --aspect for image/video shape where supported
- --count for batches
- audio mode subcommands
- production metadata such as --production-id, --shot-id, --scene-label, --timeline-start-ms, and --duration-ms
If a control is not listed here, let Make Effects use its defaults and focus on prompt, references, aspect, count, and production metadata.