Pick speech, dialogue, music, or SFX modes and brief each sound deliberately.

Audio Playbook

Audio prompts need the same staging as visuals: who is speaking, where the sound happens, and what action it belongs to.

Pick the mode

Mode      Use it for                           CLI
speech    Single-voice narration or voiceover  makefx audio speech generate
dialogue  Multi-speaker scripts                makefx audio dialogue generate
music     Beds, cues, stings, loops            makefx audio music generate
sfx       One-off sound effects                makefx audio sfx generate

makefx audio speech generate \
  "A calm host intro: Welcome back to the forge." \
  --name "Episode Intro Narration" -o audio/intro.wav

makefx audio sfx generate \
  "A crisp inventory item pickup sound effect" \
  --name "Item Pickup SFX" -o audio/item-pickup.wav
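
As a minimal sketch of the mode table, the wrapper below prints the command for a given mode instead of running it, so a mistyped mode is caught before anything is generated. The function name `audio_cmd` is hypothetical. The speech, sfx, and dialogue argument shapes are copied from the examples on this page; treating `music generate` like speech and sfx is an assumption.

```shell
# Hypothetical helper: print (do not run) the makefx command for one mode.
# Usage: audio_cmd MODE PROMPT_OR_SCRIPT NAME OUTPUT_PATH
audio_cmd() {
  local mode="$1" source="$2" name="$3" out="$4"
  case "$mode" in
    speech|sfx|music)
      # Assumption: music generate takes the same arguments as speech and sfx.
      printf 'makefx audio %s generate "%s" --name "%s" -o %s\n' \
        "$mode" "$source" "$name" "$out"
      ;;
    dialogue)
      # Dialogue reads a script file through --input rather than a prompt.
      printf 'makefx audio dialogue generate --input %s --name "%s" -o %s\n' \
        "$source" "$name" "$out"
      ;;
    *)
      echo "unknown mode: $mode (expected speech, dialogue, music, or sfx)" >&2
      return 1
      ;;
  esac
}

audio_cmd sfx "A crisp inventory item pickup sound effect" \
  "Item Pickup SFX" audio/item-pickup.wav
```

The printed line is meant for review only; prompts that contain double quotes would need escaping before being pasted into a terminal.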

Treat voice as identity

For speech and dialogue, the voice is the audio equivalent of a character sheet. Pick it once and reuse it across the production.

For dialogue, keep speaker names stable:

Host: Welcome back to the forge.
Blacksmith: Took you long enough. Grab a hammer.
Host: Easy. I only just put my coffee down.

makefx audio dialogue generate \
  --input scripts/scene-dialogue.txt \
  --name "Blacksmith Dialogue" -o audio/blacksmith-dialogue.wav
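
One way to check that speaker names stay stable is to list every label in the script before generating. The sketch below is not part of makefx: `list_speakers` is a hypothetical helper, and it assumes each script line has the `Speaker: text` shape shown above. A label that appears once by accident (say `Smith` next to `Blacksmith`) shows up as its own row.

```shell
# Hypothetical helper: count lines per speaker label.
# Reads the files given as arguments, or standard input when there are none.
list_speakers() {
  awk '
    # A label is everything before the first ": " on the line.
    match($0, /^[^:]+: /) {
      name = substr($0, 1, RLENGTH - 2)
      count[name]++
    }
    END {
      for (name in count) printf "%s\t%d\n", name, count[name]
    }
  ' "$@" | sort
}

# Prints one tab-separated row per label: Blacksmith 1, then Host 2.
list_speakers <<'EOF'
Host: Welcome back to the forge.
Blacksmith: Took you long enough. Grab a hammer.
Host: Easy. I only just put my coffee down.
EOF
```

Running it as `list_speakers scripts/scene-dialogue.txt` before the dialogue command gives the same report for the saved script.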

Do not leave the performance implied. Add pace, emotion, and situation directly to the line, as the earlier narration prompt does with "A calm host intro".

Brief music clearly

For music, include:

  • genre or era
  • tempo
  • key instruments
  • mood
  • dynamic arc
  • whether vocals are allowed

makefx audio music batch \
  "Three 20-second low-intensity fantasy workshop beds, warm strings and soft hand percussion, no vocals, gentle and unobtrusive" \
  --name "Workshop Music Bed" --count 3 --output-dir audio/music-beds

Use batch generation when the next step is choosing among candidates.
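
The six-item music checklist can be enforced with a small prompt builder. This is a sketch under stated assumptions: `music_brief` is a hypothetical helper, and the comma-separated prompt layout is an illustrative choice, not a format makefx requires. It refuses to build a prompt when any field is missing.

```shell
# Hypothetical helper: build a music prompt from the six checklist fields.
# Usage: music_brief GENRE TEMPO INSTRUMENTS MOOD DYNAMIC_ARC VOCALS
music_brief() {
  local field
  if [ "$#" -ne 6 ]; then
    echo "music_brief: expected 6 fields, got $#" >&2
    return 1
  fi
  for field in "$@"; do
    if [ -z "$field" ]; then
      echo "music_brief: fields must not be empty" >&2
      return 1
    fi
  done
  printf '%s, %s, %s, %s, %s, %s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

prompt="$(music_brief \
  "fantasy workshop bed" \
  "slow tempo" \
  "warm strings and soft hand percussion" \
  "gentle and unobtrusive" \
  "steady low intensity throughout" \
  "no vocals")"
echo "$prompt"

# The result can then be passed to the batch command shown above:
#   makefx audio music batch "$prompt" \
#     --name "Workshop Music Bed" --count 3 --output-dir audio/music-beds
```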

Tie effects to action

For SFX, describe the sound and the visible event it belongs to:

A short, bright magical pickup chime exactly as a glowing coin snaps into the inventory.

For video work, include ambience too: room tone, crowd murmur, wind, machine hum, footsteps, or intentional silence.
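
For a scene with several effects and ambience layers, a cue sheet keeps each sound paired with its event. The sketch below is a dry run that only prints commands. `sfx_commands` and the `Name|Prompt` cue format are hypothetical, the second cue is an illustrative ambience prompt, and the flags are the ones used in the sfx example earlier on this page.

```shell
# Hypothetical helper: turn "Name|Prompt" cue lines on standard input
# into makefx sfx commands, printed for review rather than executed.
# Usage: sfx_commands OUTPUT_DIR < cues.txt
sfx_commands() {
  local out_dir="$1" name prompt slug
  while IFS='|' read -r name prompt; do
    # Skip blank lines in the cue sheet.
    [ -n "$name" ] || continue
    # File name: lowercase the cue name and replace spaces with hyphens.
    slug="$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]' | tr -s ' ' '-')"
    printf 'makefx audio sfx generate "%s" --name "%s" -o %s/%s.wav\n' \
      "$prompt" "$name" "$out_dir" "$slug"
  done
}

sfx_commands audio <<'EOF'
Coin Pickup|A short, bright magical pickup chime exactly as a glowing coin snaps into the inventory.
Forge Room Tone|Low forge room tone with a steady machine hum under the whole scene.
EOF
```

Once the printed commands look right, they can be run one at a time or piped to a shell.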

Quick reference

Goal                 Do this
Consistent narrator  Reuse one voice
Multi-speaker scene  Stable Speaker: names
Music bed            Genre + tempo + instruments + dynamics
Several candidates   Batch mode
Video effect         Tie the sound to visible action

See Model & Parameter Selection for choosing speech, dialogue, music, SFX, and output settings.