Four things make Omni different from every video model before it.
1. The world model. Omni has a working sense of real-world physics: gravity, kinetic energy, fluid dynamics. The “marble test” (a marble running a chain-reaction track, every collision physically correct and individually audible) is the proof. Expect better physics intuition, not a perfect simulator, but it is a real leap over the previous generation.
2. Conversational, turn-by-turn editing. Instead of rewriting a full prompt for every change, you talk to the video: “make the violin invisible,” “change the camera angle.” Characters stay consistent, physics hold, and the scene remembers what came before. Reliable up to about 4 turns before drift sets in.
3. Audio as input. Feed Omni a music track or voiceover and it reasons across the audio to generate matching visuals, with pacing, emotion, and beats lined up. Most models take one input type; Omni combines video + image + audio at once. That’s the “omni” in the name.
4. SynthID on every clip. Every output carries an imperceptible SynthID watermark, verifiable in the Gemini app, Google Search, and Chrome. Know this: it’s a provenance signal, not DRM. Standard re-encoding (color grade, grain, upscaling) tends to disrupt it.
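One practical way to work with the roughly four-turn limit from point 2 is to count your edits and start a fresh session once you pass it. The sketch below is local bookkeeping only. It is not an Omni API: `EditSession`, `add_turn`, and `consolidated_prompt` are hypothetical names made up for illustration, and the limit of 4 is just the rule of thumb quoted above.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: about 4 reliable turns, taken from the rule of thumb above.
MAX_RELIABLE_TURNS = 4


@dataclass
class EditSession:
    """Hypothetical local tracker for a conversational edit (not an Omni API)."""

    base_prompt: str
    turns: List[str] = field(default_factory=list)

    def add_turn(self, instruction: str) -> bool:
        """Record an edit; return True while still inside the reliable budget."""
        self.turns.append(instruction)
        return len(self.turns) <= MAX_RELIABLE_TURNS

    def consolidated_prompt(self) -> str:
        """Fold the base prompt and every edit into one prompt for a fresh start."""
        return " ".join([self.base_prompt, *self.turns])
```

When `add_turn` returns False, the idea is to stop stacking edits and regenerate from `consolidated_prompt()` instead, so the next round of changes starts from a clean scene rather than a drifting one.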
Held back at launch: editing what people say in a video (speech editing) was built but restricted, which reads as a deepfake hedge ahead of the 2026 US elections.
