Google launches Gemini 3 Pro Image "Nano Banana Pro" · Build with Nano Banana Pro, our Gemini 3 Pro Image model

Google DeepMind Blog · deepmind.google · 2025/11/21 00:11

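AI 3-line summary

Google announced Nano Banana Pro, an image generation and editing model based on Gemini 3 Pro.
It supports high-accuracy text rendering, infographic generation, output up to 4K, multi-image compositing and the incorporation of real-time information, and is available through the Gemini API and Vertex AI.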

English summary

Google DeepMind launched Nano Banana Pro, a Gemini 3 Pro-based image model offering accurate text rendering, infographic creation, up to 4K output, multi-image blending and SynthID, available via the Gemini API and Vertex AI.

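Google DeepMind has announced Nano Banana Pro, a new image generation and editing model built on Gemini 3 Pro. It is a higher-tier version of the earlier hit Nano Banana (Gemini 2.5 Flash Image), and is offered to developers through the Gemini API and Google AI Studio and to enterprises through Vertex AI.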

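Its biggest selling point is much more accurate text rendering, long considered a weak spot of generated images. Because it can write text in multiple languages accurately inside an image, it is said to open up creative uses where text is the main element, such as posters, diagrams, UI mockups and infographics. Output resolution supports 2K and 4K in addition to 1K, and the aspect ratio can be specified, bringing high-resolution output for print within reach.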

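Editing has also been strengthened. The model can reportedly take in up to 14 reference images and composite or rearrange them while keeping up to five people consistent. Drawing on Gemini 3 Pro's reasoning, it can also adjust photographic parameters such as lighting direction, depth of field and color grading, either locally or globally. It also supports grounding with Google Search, and Google says it can generate visuals that reflect real-time information such as the latest sports results, weather or recipes.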

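To keep provenance transparent, outputs carry an invisible SynthID digital watermark, and images made in the Gemini app also receive C2PA-compliant metadata. With demand for detecting AI-generated content growing, verification through SynthID Detector is also expected to be offered.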

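For context, the image generation field is contested by OpenAI's GPT Image, Black Forest Labs' FLUX, Adobe Firefly and others, and in-image text accuracy and consistency in multi-image compositing are becoming the main axes of differentiation. Nano Banana Pro is positioned as an update that addresses both head-on, and adoption could spread in enterprise uses such as marketing materials and educational content production. On the other hand, 4K output and multi-person compositing are likely to be computationally expensive, so getting the pricing balance right will probably be key to adoption.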

Google DeepMind has introduced Nano Banana Pro, a new image generation and editing model built on Gemini 3 Pro. It is positioned as the higher-end sibling of the original Nano Banana (Gemini 2.5 Flash Image) that became a viral hit earlier this year, and it is rolling out to developers through the Gemini API and Google AI Studio, and to enterprises via Vertex AI.

The headline improvement is text rendering. Generative image models have historically struggled to draw legible words, especially in non-Latin scripts, which has limited their usefulness for posters, slides, UI mockups and infographics. Nano Banana Pro is tuned to produce accurate, multi-language text directly inside images, opening the door to design workflows where typography is central rather than incidental. Output resolution scales up to 4K, with explicit control over aspect ratio, making the model more viable for print and large-format use.
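To make the resolution and aspect-ratio controls concrete, here is a minimal sketch that builds a JSON request body for the Gemini API's `generateContent` method asking for a text-heavy image. It assumes the REST field names `generationConfig.responseModalities` and `generationConfig.imageConfig` (with `aspectRatio` and `imageSize`, the latter taking `"1K"`, `"2K"` or `"4K"`), the `v1beta` endpoint shape, and the model ID `gemini-3-pro-image-preview`; all of these are assumptions to check against the current API reference, and `build_image_request` is a hypothetical helper. The sketch only builds the payload and sends nothing.

```python
import json

# Assumed model ID and REST endpoint; verify against the current Gemini API docs.
MODEL_ID = "gemini-3-pro-image-preview"
ENDPOINT = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL_ID}:generateContent"

# Resolution labels from the article; the exact strings the API expects are an assumption.
ALLOWED_SIZES = {"1K", "2K", "4K"}


def build_image_request(prompt: str, aspect_ratio: str = "1:1", image_size: str = "1K") -> dict:
    """Build a generateContent body asking for an image (hypothetical helper)."""
    if image_size not in ALLOWED_SIZES:
        raise ValueError(f"image_size must be one of {sorted(ALLOWED_SIZES)}, got {image_size!r}")
    # Aspect ratios are written "W:H"; both sides must be positive integers.
    width, sep, height = aspect_ratio.partition(":")
    if not (sep and width.isdigit() and height.isdigit() and int(width) > 0 and int(height) > 0):
        raise ValueError(f"aspect_ratio must look like '16:9', got {aspect_ratio!r}")
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # Ask for an image in the response; text may accompany it.
            "responseModalities": ["TEXT", "IMAGE"],
            "imageConfig": {"aspectRatio": aspect_ratio, "imageSize": image_size},
        },
    }


if __name__ == "__main__":
    body = build_image_request(
        "A bilingual English/Japanese infographic on how tides work, "
        "with every label clearly legible",
        aspect_ratio="16:9",
        image_size="4K",
    )
    # The body would be POSTed to ENDPOINT with an API key; this sketch only prints it.
    print(ENDPOINT)
    print(json.dumps(body, indent=2, ensure_ascii=False))
```

Validating the size label and aspect ratio locally is a design choice: it catches malformed values before any network request is made, while leaving the authoritative checks to the service.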

Editing capabilities have also been expanded. The model can ingest up to 14 reference images and preserve the likeness of as many as five people across a composition, which is a notable jump for consistency-driven tasks like storyboards or branded character sheets. Drawing on Gemini 3 Pro's reasoning, users can apply photographic adjustments — lighting direction, depth of field, color grading, focus — either locally or globally via natural language. The model is also grounded with Google Search, so it can incorporate real-time information such as sports results, weather or recipes when producing explanatory visuals.
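The multi-image editing and Search grounding described above could be exercised with a request along these lines. This is a sketch under stated assumptions: reference images are sent as base64-encoded `inline_data` parts, grounding is requested with a `google_search` tool entry, and the 14-image cap is taken from the article rather than from API documentation. `build_edit_request` is a hypothetical helper, and the five-person consistency limit cannot be checked on the client, so it is not enforced here.

```python
import base64

# Cap reported in the article; the API's actual limit may differ.
MAX_REFERENCE_IMAGES = 14


def build_edit_request(instruction: str, images: list[tuple[str, bytes]], use_search: bool = False) -> dict:
    """Combine reference images and an edit instruction into one request body.

    `images` is a list of (mime_type, raw_bytes) pairs. Hypothetical helper.
    """
    if len(images) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images, got {len(images)}")
    # Each image becomes an inline_data part carrying base64-encoded bytes.
    parts = [
        {"inline_data": {"mime_type": mime, "data": base64.b64encode(raw).decode("ascii")}}
        for mime, raw in images
    ]
    # In this sketch the natural-language instruction follows the images.
    parts.append({"text": instruction})
    body = {
        "contents": [{"parts": parts}],
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }
    if use_search:
        # Assumed shape for enabling grounding with Google Search.
        body["tools"] = [{"google_search": {}}]
    return body


if __name__ == "__main__":
    placeholder = b"\x89PNG\r\n\x1a\n"  # PNG signature only, not a real image
    edit = build_edit_request(
        "Seat these three people at one dinner table, warm light from the left, shallow depth of field",
        [("image/png", placeholder)] * 3,
    )
    print(len(edit["contents"][0]["parts"]))  # three image parts plus the instruction
    grounded = build_edit_request("An infographic of this weekend's weather in Tokyo", [], use_search=True)
    print(grounded["tools"])
```

The photographic adjustments (lighting, depth of field, color grading) are expressed purely in the instruction text here, since the article describes them as natural-language controls rather than separate parameters.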

For provenance, every output carries an invisible SynthID watermark, and images generated through the Gemini app additionally receive C2PA metadata. Google says SynthID Detector can be used to verify whether an image originated from its models, an increasingly relevant feature as platforms and regulators push for clearer labeling of synthetic media.

In the broader landscape, image generation has become a crowded field with OpenAI's GPT Image, Black Forest Labs' FLUX family and Adobe Firefly all competing for creative and enterprise workloads. The frontier is shifting from raw aesthetic quality toward in-image text accuracy, multi-image compositional consistency, and controllable editing — exactly the dimensions Nano Banana Pro targets. That suggests strong appeal for marketing, e-commerce and education use cases, though 4K rendering and multi-subject composition are likely compute-intensive, so pricing and latency may ultimately determine how widely it is adopted in production pipelines.
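A rough way to see why 4K output may be costly is to compare pixel counts. The sketch below assumes square outputs of 1024, 2048 and 4096 pixels per side for the 1K, 2K and 4K labels; these dimensions are an assumption and are not stated in the article. Under that assumption each doubling of the edge quadruples the pixel count, so 4K carries 16 times the pixels of 1K. Pixel count is only a proxy: actual compute, latency and pricing depend on the model and are not given here.

```python
# Assumed square edge lengths per resolution label (not stated in the article).
ASSUMED_EDGE_PX = {"1K": 1024, "2K": 2048, "4K": 4096}


def pixel_count(label: str) -> int:
    """Total pixels for a square image at the given resolution label."""
    edge = ASSUMED_EDGE_PX[label]
    return edge * edge


def relative_pixels(label: str, baseline: str = "1K") -> float:
    """How many times more pixels `label` has than `baseline`."""
    return pixel_count(label) / pixel_count(baseline)


if __name__ == "__main__":
    for label in ASSUMED_EDGE_PX:
        print(f"{label}: {pixel_count(label):,} px, {relative_pixels(label):.0f}x the 1K baseline")
```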