#multimodal 24 total

Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents


Thu, Apr 16 · 1 entry

blog local-llm 2mo ago ·

huggingface-blog

Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers

Medium priority · technical post · Local LLM / Open Models · Published Apr 16

AI summary: Hugging Face explains how to train and fine-tune multimodal embedding and reranker models for text and images using Sentence Transformers v5.x. Building on vision-language models such as CLIP, the post walks through loss functions, data preparation, and evaluation in practical terms.

EN Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers

#huggingface #open-model #sentence-transformers +4


Thu, Apr 9 · 1 entry

blog local-llm 2mo ago ·

huggingface-blog

Multimodal Embedding & Reranker Models with Sentence Transformers

Medium priority · technical post · Local LLM / Open Models · Published Apr 9

AI summary: Sentence Transformers has been extended to support images and other multimodal inputs. Models such as CLIP and SigLIP can be used through a common API, enabling embedding and reranking across text and images and making search and RAG pipelines easier to build.

EN Multimodal Embedding & Reranker Models with Sentence Transformers

#huggingface #open-model #sentence-transformers +5


Thu, Apr 2 · 1 entry

🔥 HOT blog local-llm 2mo ago ·

huggingface-blog

Welcome Gemma 4: Frontier multimodal intelligence on device

High priority · technical post · Local LLM / Open Models · Published Apr 2

AI summary: Google has released the Gemma 4 open model family. The models are multimodal, handling images and text together, and are designed with on-device use in mind. Weights are distributed on Hugging Face, with day-0 integration in various inference frameworks.

EN Welcome Gemma 4: Frontier multimodal intelligence on device

#huggingface #open-model #gemma +4


Wed, Apr 1 · 2 entries

blog local-llm 2mo ago ·

huggingface-blog

Falcon Perception

Medium priority · technical post · Local LLM / Open Models · Published Apr 1

AI summary: The Technology Innovation Institute (TII) in Abu Dhabi has released Falcon Perception, a new multimodal model in the Falcon series. It integrates vision and language, is designed to be lightweight with edge deployment in mind, and is provided with open weights.

EN Falcon Perception

#huggingface #open-model #falcon +4


blog local-llm 2mo ago ·

huggingface-blog

Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents

Medium priority · technical post · Local LLM / Open Models · Published Apr 1

AI summary: IBM has announced Granite 4.0 3B Vision, a lightweight multimodal model specialized for enterprise document processing. Despite having only 3B parameters, it shows performance comparable to larger models in document understanding, OCR, and table and figure analysis, and it is released under Apache 2.0. Its design targets enterprise use.

EN Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents

#enterprise #huggingface #open-model +5


Wed, Mar 18 · 1 entry

blog local-llm 3mo ago ·

huggingface-blog

State of Open Source on Hugging Face: Spring 2026

Medium priority · technical post · Local LLM / Open Models · Published Mar 18

AI summary: Hugging Face summarizes the state of open-source AI as of spring 2026. The report points to Chinese labs leading in LLMs, the rise of multimodal and video generation models, a maturing inference and quantization ecosystem, and a rapidly growing community.

EN State of Open Source on Hugging Face: Spring 2026

#huggingface #open-model #hugging-face +4


Tue, Mar 17 · 1 entry

🔥 HOT blog codex 3mo ago ·

openai-blog

Introducing GPT-5.4 mini and nano

High priority · technical post · OpenAI / Codex · Published Mar 17

AI summary: OpenAI has announced mini and nano, lightweight versions of GPT-5.4. They are optimized for coding, tool use, and multimodal reasoning at low cost and low latency.

EN GPT-5.4 mini and nano are smaller, faster versions of GPT-5.4 optimized for coding, tool use, multimodal reasoning, and high-volume API and sub-agent workloads.

#agent #openai #gpt-5 +7


Wed, Jan 28 · 1 entry

blog local-llm 4mo ago ·

huggingface-blog

Architectural Choices in China's Open-Source AI Ecosystem: Building Beyond DeepSeek

Medium priority · technical post · Local LLM / Open Models · Published Jan 28

AI summary: One year after DeepSeek's debut, this post surveys the architectural choices of open-source AI models from China: MoE, long-context processing, multimodality, and reasoning.

EN Architectural Choices in China's Open-Source AI Ecosystem: Building Beyond DeepSeek

#huggingface #open-model #china +7


Wed, Dec 17 · 1 entry

🔥 HOT NEW blog gemini 6mo ago ·

google-deepmind

Gemini 3 Flash: frontier intelligence built for speed

High priority · technical post · Gemini / Gemma · Published Dec 17

AI summary: Google DeepMind has announced Gemini 3 Flash, a lightweight, fast frontier model. It maintains reasoning and multimodal performance while delivering low latency and high throughput, and is optimized for real-time use cases and large-scale deployment.

EN Gemini 3 Flash offers frontier intelligence built for speed at a fraction of the cost.

#deepmind #google #gemini-3 +4


Wed, Nov 19 · 1 entry

🔥 HOT NEW blog gemini 7mo ago ·

google-deepmind

A new era of intelligence with Gemini 3

High priority · technical post · Gemini / Gemma · Published Nov 19

AI summary: Google DeepMind has announced its latest flagship model, Gemini 3. It substantially improves reasoning, multimodal understanding, and agentic capabilities, and is rolling out simultaneously to Search, the Gemini app, and developer APIs. Gemini 3 Deep Think, an enhanced reasoning mode, is also expected to be offered.

EN A new era of intelligence with Gemini 3

#deepmind #google #gemini-3 +4


Sun, Oct 26 · 2 entries

NEW blog gemini 7mo ago ·

google-deepmind

MedGemma: Our most capable open models for health AI development

Medium priority · technical post · Gemini / Gemma · Published Oct 26

AI summary: Google has announced MedGemma, a collection of open models that handle medical images and text. Built on Gemma 3, it offers 4B and 27B multimodal models plus the MedSigLIP image encoder, which health AI developers can fine-tune and run locally.

EN We’re announcing new multimodal models in the MedGemma collection, our most capable open models for health AI development.

#deepmind #google #medgemma +4


🔥 HOT NEW blog gemini 7mo ago ·

google-deepmind

Introducing Gemma 3n: The developer guide

High priority · technical post · Gemini / Gemma · Published Oct 26

AI summary: Google has published a developer guide for Gemma 3n, its on-device multimodal model. It supports text, image, audio, and video and is designed to run with a low memory footprint.

EN Gemma 3n is designed for the developer community that helped shape Gemma.

#deepmind #google #open-model +7


Fri, Oct 24 · 1 entry

🔥 HOT NEW blog gemini 7mo ago ·

google-deepmind

Gemini Robotics 1.5 brings AI agents into the physical world

High priority · technical post · Gemini / Gemma · Published Oct 24

AI summary: Google DeepMind has announced Gemini Robotics 1.5. It integrates vision, language, and action to enable embodied AI agents in which robots autonomously plan and execute complex multi-step tasks.

EN We’re powering an era of physical agents — enabling robots to perceive, plan, think, use tools and act to better solve complex, multi-step tasks.

#deepmind #google #robotics +7


Tue, Sep 30 · 1 entry

🔥 HOT blog codex 8mo ago ·

openai-blog

Sora 2 is here

High priority · technical post · OpenAI / Codex · Published Sep 30

AI summary: OpenAI has announced its video generation model Sora 2. It improves physical accuracy and realism, supports synchronized dialogue and sound effects, and launches alongside a new app.

EN Our latest video generation model is more physically accurate, realistic, and controllable than prior systems. It also features synchronized dialogue and sound effects. Create with it in the new Sora

#openai #sora #video-generation +5


Thu, Aug 28 · 1 entry

🔥 HOT blog codex 9mo ago ·

openai-blog

Introducing gpt-realtime and Realtime API updates

High priority · technical post · OpenAI / Codex · Published Aug 28

AI summary: OpenAI has released gpt-realtime, a production-ready speech-to-speech model, along with the generally available Realtime API. Additions include remote MCP server support, image input, and SIP phone calling.

EN We’re releasing a more advanced speech-to-speech model and new API capabilities including MCP server support, image input, and SIP phone calling support.

#mcp-server #openai #realtime-api +6


Thu, Aug 7 · 1 entry

🔥 HOT blog codex 10mo ago ·

openai-blog

First look at GPT-5

High priority · technical post · OpenAI / Codex · Published Aug 7

AI summary: OpenAI gives a first look at GPT-5. Reasoning, coding, and multimodal capabilities are substantially improved, and integration into the developer API is planned.

EN See how a group of leading developers use GPT-5 for the first time.

#openai #gpt-5 #llm +4