Omni Flash creates from any input
Use text, images, video, audio, or a rough scene idea as creative references, then turn them into one cohesive video direction.
Create and edit videos as easily as having a conversation. Speak it. See it. Share it. Think of Gemini Omni Flash as Nano Banana for video: a natural way to create from text, images, video, audio, and step-by-step edits.
Omni Flash makes video creation feel more like a chat than a timeline. Describe what should happen, add a reference when needed, and keep shaping the result through clear instructions.
Use Omni Flash to start with one prompt, then refine the idea into a product teaser, UGC ad, cinematic story beat, lesson preview, or social short.
Prompt, preview, and refine Omni Flash-style video ideas from one focused workspace.
For stronger Omni Video prompts, include subject, motion, camera, style, text, audio, and editing instructions.
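One way to keep those seven elements in view is to draft the prompt as a small structure before pasting it into the workspace. The sketch below is only an illustration: `VideoPrompt`, `render`, and `missing` are hypothetical names invented for this example, not part of any Omni Flash or Gemini API, and the "Label: value" layout is an assumption rather than a required prompt format.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class VideoPrompt:
    """Hypothetical helper: one field per prompt element named above."""
    subject: str = ""
    motion: str = ""
    camera: str = ""
    style: str = ""
    text: str = ""
    audio: str = ""
    editing: str = ""

    def render(self) -> str:
        """Join the filled-in elements into one labeled prompt string."""
        parts = []
        for f in fields(self):
            value = getattr(self, f.name).strip()
            if value:
                parts.append(f"{f.name.capitalize()}: {value}")
        return "\n".join(parts)

    def missing(self) -> List[str]:
        """List the elements that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


if __name__ == "__main__":
    prompt = VideoPrompt(
        subject="a ceramic mug on a wooden table",
        motion="steam rises slowly",
        camera="slow push-in",
    )
    print(prompt.render())
    print("Still missing:", ", ".join(prompt.missing()))
```

Checking `missing()` before sending a prompt is a quick reminder of which elements (here style, text, audio, and editing) could still be added to make the request more specific.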
Google DeepMind describes Gemini Omni as a model for creating anything from any input, starting with video. On this page, Omni Flash focuses that idea into a practical AI video workflow: speak it, see it, and keep reshaping the clip with follow-up instructions.
Official Omni Flash demo sources
The X thread introduces Gemini Omni as Google DeepMind's first step toward a model that can create from many inputs. The Omni Flash videos below use Google's official Gemini Omni demo assets instead of fragile X video hotlinks, so they should play more reliably inside the page.
Refine the same scene step by step. Change a detail, camera angle, or environment while keeping the clip coherent.
Create motion that responds to gravity, material behavior, camera movement, and cause-and-effect instead of feeling stitched together.
Switch what happens in an existing video, from a small action change to a more dramatic visual transformation.
Replace a character or object with natural language while keeping the camera move, scene structure, and continuity readable.
Coordinate onscreen text with motion and timing, useful for explainers, product demos, and social shorts.