
Veo 3.1 Ingredients to Video: More consistency, creativity and control

Google DeepMind Blog · deepmind.google · 2026/01/14 02:00


AI 3-line summary

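Google DeepMind has updated its video generation model Veo 3.1, strengthening the "Ingredients to Video" feature, which generates consistent video from multiple reference images.
Audio integration and editing features have also been improved, and creators can now control footage more precisely through Flow.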

English summary

Our latest Veo update generates lively, dynamic clips that feel natural and engaging — and supports vertical video generation.

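Google DeepMind has announced an update to its video generation model Veo 3.1. Centered on the "Ingredients to Video" feature, which combines multiple reference images into a consistent video, the update targets three areas: consistency, creativity, and control.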

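With this update, users can upload multiple "ingredient" images, such as characters, objects, and backgrounds, and generate a video that combines them naturally within a scene. The simultaneous audio generation introduced in Veo 3 carries over unchanged, while the model is said to preserve a character's appearance and mood more accurately across frames and across generated clips.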

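The update also provides a feature that interpolates between a specified first and last frame, a feature that extends an existing clip from its end, and editing features that add or remove elements within a scene. These are available through Flow, Google's interface for video production, and are expected to roll out progressively to the Gemini API and Vertex AI.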


✨ Gemini · Key points of this article

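As background, OpenAI's Sora 2, Runway Gen-4, Kling, MiniMax, and others compete in video generation, and "character consistency across multiple shots" and "native audio generation" in particular are becoming the main axes of differentiation. Veo was already ahead in generating video with audio, and the strengthened Ingredients feature appears positioned to counter features such as Runway's References on reference-image-driven control.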

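For creators, the key to practical adoption has been whether a tool can preserve the identity of characters and props in workflows that involve iterative editing, such as storyboarding and ad production. This update is a direct answer to that problem and could further raise the practical usefulness of AI video tools in commercial video production.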

Google DeepMind has rolled out an update to its video generation model Veo 3.1, with the headline change being a more capable Ingredients to Video feature designed to deliver greater consistency, creativity, and control for creators working with AI-generated footage.

The Ingredients to Video workflow lets users upload several reference images — for example a character, a prop, and a setting — and have the model weave them into a coherent shot. Veo 3.1 retains the native audio generation introduced with Veo 3, meaning dialogue, ambient sound, and effects are produced jointly with the visuals rather than dubbed in afterward. DeepMind says the update improves how faithfully characters, objects, and overall mood persist across frames and across separate clips, a long-standing weakness of diffusion-based video models.

Alongside the reference-driven generation, the update brings several editing-oriented capabilities. Creators can specify a first and last frame and let the model interpolate the motion between them, extend an existing clip from its final frame to continue a scene, or insert and remove objects within a generated shot. These tools are surfaced primarily through Flow, Google's filmmaking-oriented interface, with broader availability planned via the Gemini API and Vertex AI for developers and enterprise users.

The competitive context matters here. OpenAI's Sora 2, Runway's Gen-4, Kuaishou's Kling, and MiniMax's Hailuo have all pushed aggressively on multi-shot character consistency and controllability over the past year, while Runway's References and similar features have established image-anchored generation as a baseline expectation. Veo's earlier lead in synchronized audio generation gave it a distinctive edge, and the strengthened Ingredients pipeline appears aimed at closing the gap on reference-based control while preserving that audio advantage.

For professional users, the practical question has always been whether AI video can survive iterative production workflows — storyboarding, reshoots, ad variants — where the same character or product must appear identical across many shots. Frame interpolation and scene extension features suggest DeepMind is targeting exactly this pipeline rather than one-off social clips. If the consistency improvements hold up under real production stress, Veo 3.1 could meaningfully raise the bar for what creative teams expect from generative video tools, though independent evaluations will likely be needed to confirm how much of a step-change this represents in practice.