Choose image, video, and audio modes, defaults, and CLI-exposed parameters.

Model & Parameter Selection

Use this page to choose the model path and settings you can actually control from makefx today. For prompting strategy, start with Media Playbooks.

Images

Make Effects routes image jobs through Google's Nano Banana image models and stores the exact provider model ID in each generated image recipe.

| Selection | Exact model ID | Use for | Notes |
|---|---|---|---|
| pro | gemini-3-pro-image-preview | assets you expect to reuse, compare, or hand off | default; supports up to 14 references |
| flash | gemini-2.5-flash-image | quick drafts with one reference or no reference | supports 1 reference and 1K output only |

Image parameters

| Parameter | Values | Guidance |
|---|---|---|
| Aspect ratio | 1:1, 16:9, 9:16, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 21:9 | choose for destination; default image generation is square when omitted |
| Image size | 1K, 2K, 4K | Pro supports all three; Flash supports 1K only |
| Reference images | Pro supports up to 14; Flash supports 1 | label each reference by role |
| Operation | generate, refine, derive | generate from text, edit an existing variant, or compose from references |

Use 16:9 or 21:9 for keyframes and backgrounds, 9:16 for vertical clips, 1:1 for icons and tiles, and portrait ratios for character cards.
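The image limits above can be checked before a request is submitted. The following is a minimal sketch, not part of makefx: the names `IMAGE_MODELS`, `ASPECT_RATIOS`, and `check_image_request` are hypothetical, and only the limits themselves come from the tables on this page.

```python
# Hypothetical pre-flight check; the limits are copied from the tables above.
IMAGE_MODELS = {
    "pro": {
        "model_id": "gemini-3-pro-image-preview",
        "max_refs": 14,
        "sizes": {"1K", "2K", "4K"},
    },
    "flash": {
        "model_id": "gemini-2.5-flash-image",
        "max_refs": 1,
        "sizes": {"1K"},
    },
}
ASPECT_RATIOS = {"1:1", "16:9", "9:16", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "21:9"}


def check_image_request(selection, size, ref_count, aspect="1:1"):
    """Return a list of problems; an empty list means the combination is allowed."""
    model = IMAGE_MODELS.get(selection)
    if model is None:
        return [f"unknown selection: {selection}"]
    problems = []
    if size not in model["sizes"]:
        problems.append(f"{selection} does not support image size {size}")
    if ref_count > model["max_refs"]:
        problems.append(f"{selection} accepts at most {model['max_refs']} reference(s)")
    if aspect not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect}")
    return problems


print(check_image_request("pro", "4K", 14, "16:9"))  # no problems expected
print(check_image_request("flash", "2K", 3))         # size and reference problems
```

A check like this is mainly useful for catching a Flash request that asks for 2K output or several references, which is the easiest mistake to make when switching from the default Pro model.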

Video

Make Effects routes video jobs through Google's Veo 3.1 family. The public CLI lets you set prompt, references, resolution, provider duration, model tier, audio, and production metadata.

| Tier | Exact model ID | Use for |
|---|---|---|
| generate | veo-3.1-generate-preview | clips you expect to review, place on a timeline, or ship |
| fast | veo-3.1-fast-generate-preview | cheaper iteration path |
| lite | veo-3.1-lite-generate-preview | draft path for background motion tests |

Video parameters

| Parameter | Values | Guidance |
|---|---|---|
| Aspect ratio | 16:9, 9:16 | other values normalize to landscape behavior |
| Resolution | 720p, 1080p, 4k | pick 720p for tests; 4k requires the generate or fast tier |
| Provider duration | 4, 6, 8 seconds | not controlled by --duration-ms; use CLI --duration or the web duration control |
| Tier | generate, fast, lite | use generate for final clips, fast/lite for iteration |
| References | up to 3 source images/keyframes | one unstyled image uses image-to-video; two unstyled images use first/last frames; any style image or 3 images uses reference-image mode |

The CLI --duration-ms flag records where the clip fits on your production timeline. It does not set the generated clip length; use --duration 4|6|8 for provider duration.
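The video constraints and the split between the two duration settings can be sketched as follows. This is an illustration only: `check_video_request` and its argument names are hypothetical, and the timeline value is treated as plain metadata, as the paragraph above describes.

```python
# Hypothetical pre-flight check; values are copied from the video tables above.
VIDEO_TIERS = {
    "generate": "veo-3.1-generate-preview",
    "fast": "veo-3.1-fast-generate-preview",
    "lite": "veo-3.1-lite-generate-preview",
}
RESOLUTIONS = {"720p", "1080p", "4k"}
PROVIDER_DURATIONS_S = {4, 6, 8}


def check_video_request(tier, resolution, duration_s, timeline_duration_ms=None):
    """Return a list of problems; an empty list means the combination is allowed.

    duration_s mirrors --duration (the generated clip length).
    timeline_duration_ms mirrors --duration-ms (timeline metadata only), so it
    is accepted here but never used to validate or set the clip length.
    """
    problems = []
    if tier not in VIDEO_TIERS:
        problems.append(f"unknown tier: {tier}")
    if resolution not in RESOLUTIONS:
        problems.append(f"unknown resolution: {resolution}")
    if resolution == "4k" and tier not in ("generate", "fast"):
        problems.append("4k requires the generate or fast tier")
    if duration_s not in PROVIDER_DURATIONS_S:
        problems.append("provider duration must be 4, 6, or 8 seconds")
    return problems


# A 5000 ms timeline slot does not change the clip length; 6 is still valid.
print(check_video_request("fast", "720p", 6, timeline_duration_ms=5000))
print(check_video_request("lite", "4k", 5))
```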

Audio

Choose one audio mode for each request.

| Mode | Default use | CLI |
|---|---|---|
| speech | one narrator or voiceover | makefx audio speech generate |
| dialogue | multi-speaker script | makefx audio dialogue generate |
| music | bed, cue, sting, loop | makefx audio music generate |
| sfx | one-off sound effect | makefx audio sfx generate |

Speech and dialogue depend on voice selection and provider configuration. Treat a voice as a reusable identity reference: reusing the same voice keeps a narrator or character consistent across requests.

Production can use ElevenLabs; music requests may opt into Lyria with --provider lyria. Stage and local environments may use fake providers. Entitlement, quota, and rate checks can stop image, video, or audio generation before any provider call is made.

| Mode | Exact default model |
|---|---|
| speech | eleven_multilingual_v2 |
| dialogue | eleven_v3 |
| ElevenLabs music | music_v1 |
| Lyria music | lyria-3-clip-preview |
| sfx | eleven_text_to_sound_v2 |
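The default-model table can be read as a lookup keyed by mode and provider. The sketch below assumes that reading; `default_audio_model` is a hypothetical helper, and the provider key "elevenlabs" is an assumed spelling, since this page only documents the value passed to --provider for Lyria.

```python
# Hypothetical lookup; model IDs are copied from the table above.
# The "elevenlabs" key is an assumed name; only "lyria" appears as a
# documented --provider value on this page.
DEFAULT_AUDIO_MODELS = {
    ("speech", "elevenlabs"): "eleven_multilingual_v2",
    ("dialogue", "elevenlabs"): "eleven_v3",
    ("music", "elevenlabs"): "music_v1",
    ("music", "lyria"): "lyria-3-clip-preview",
    ("sfx", "elevenlabs"): "eleven_text_to_sound_v2",
}


def default_audio_model(mode, provider="elevenlabs"):
    """Return the default model ID for an audio mode and provider pair."""
    try:
        return DEFAULT_AUDIO_MODELS[(mode, provider)]
    except KeyError:
        raise ValueError(f"no default model listed for {mode} with {provider}") from None


print(default_audio_model("dialogue"))
print(default_audio_model("music", provider="lyria"))
```

Only music has two entries here, which matches the page: Lyria is an opt-in for music requests and is not listed for speech, dialogue, or sfx.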

Provider reference semantics

Every image reference must resolve to a completed image variant. A request with zero image references calls Gemini text-to-image, a refine with one reference calls Gemini edit, and a derive or a refine with multiple references calls Gemini compose. Reference labels such as Image 1: or Style ref 1: are included in the prompt text; they are not typed provider channels.
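The image routing rule above can be sketched as a small function. This is an illustration under stated assumptions, not the Make Effects implementation: `gemini_image_call` is a hypothetical name, the zero-reference check is assumed to take precedence over the operation, and a generate operation that carries references is rejected because this page does not describe that case.

```python
OPERATIONS = {"generate", "refine", "derive"}


def gemini_image_call(operation, ref_count):
    """Map an operation and reference count to the Gemini call named above."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    if ref_count < 0:
        raise ValueError("ref_count cannot be negative")
    # Assumption: zero references always means text-to-image, checked first.
    if ref_count == 0:
        return "text-to-image"
    if operation == "derive":
        return "compose"
    if operation == "refine":
        # One reference is an edit; several references are composed.
        return "edit" if ref_count == 1 else "compose"
    # generate with references is not covered by this page.
    raise ValueError("generate with references is not described here")


print(gemini_image_call("generate", 0))
print(gemini_image_call("refine", 1))
print(gemini_image_call("derive", 3))
```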

Video references use Veo-specific channels. With zero images, the request is text-to-video. One unstyled image is sent as the top-level image input. Two unstyled images use first/last-frame interpolation. Any style image, or three final image inputs, uses Veo referenceImages[]; style images are typed as provider STYLE, and the remaining references are typed as ASSET.
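The video channel selection above can be sketched in the same way. The function name `veo_reference_mode`, the role strings, and the returned mode labels are hypothetical; only the branching and the STYLE and ASSET types come from the paragraph above.

```python
def veo_reference_mode(roles):
    """Pick the Veo input channel for a list of reference roles.

    roles holds one string per reference image; the string "style" marks a
    style image and any other value is treated as an unstyled image.
    """
    if len(roles) > 3:
        raise ValueError("video accepts up to 3 reference images")
    if not roles:
        return {"mode": "text-to-video"}
    # Any style image, or three images, goes through referenceImages[].
    if "style" in roles or len(roles) == 3:
        types = ["STYLE" if role == "style" else "ASSET" for role in roles]
        return {"mode": "reference-images", "types": types}
    if len(roles) == 1:
        return {"mode": "image-to-video"}
    return {"mode": "first-last-frame"}


print(veo_reference_mode(["keyframe"]))
print(veo_reference_mode(["keyframe", "keyframe"]))
print(veo_reference_mode(["style", "keyframe"]))
```

Note that adding a single style image to a one-keyframe request changes the channel from image-to-video to reference-image mode, so the keyframe is then treated as an ASSET reference and no longer as the starting frame.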

Audio generation does not accept image references today. Voice IDs and ordered dialogue voice selections are the reference-like controls for speech and dialogue identity.

Decision tables

Images

| Situation | Pick |
|---|---|
| final asset with several references | pro image model |
| quick draft with one reference | flash image model |
| character turnaround or tile set | pro image model |
| character plus style plus background | pro image model |

Video

| Situation | Pick |
|---|---|
| final shot from a keyframe | generate tier, 1080p or 4k, 8s |
| quick motion test | fast or lite tier, 720p, 4s or 6s |
| vertical social clip | generate tier, vertical aspect ratio, 6s or 8s |

Audio

| Situation | Pick |
|---|---|
| narration | speech |
| character conversation | dialogue |
| background bed | music |
| event sound | sfx |

Controls you can set

Start with these controls when shaping output:

  • --aspect for image/video shape where supported
  • --count for batches
  • audio mode subcommands
  • production metadata such as --production-id, --shot-id, --scene-label, --timeline-start-ms, and --duration-ms

If a control is not listed here, let Make Effects use its defaults and focus on prompt, references, aspect, count, and production metadata.