Hugging Face Announces Modular Diffusers, Which Breaks Diffusion Pipelines into Reusable Parts (Introducing Modular Diffusers - Composable Building Blocks for Diffusion Pipelines)

Hugging Face Blog · huggingface.co · 2026/03/05 09:00

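AI three-line summary

Hugging Face has announced Modular Diffusers, a new framework for assembling diffusion model pipelines block by block.
Combining reusable PipelineBlocks makes it easier to build custom workflows and to share existing components.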

Hugging Face has announced Modular Diffusers, which brings a new design philosophy to the Diffusers library. The framework lets diffusion pipelines, which until now were monolithic, be reassembled from reusable parts.

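Diffusers has traditionally provided a dedicated pipeline class for each use case, such as StableDiffusionPipeline and FluxPipeline. This is convenient, but combining derived features such as ControlNet, IP-Adapter, and Differential Diffusion meant copying similar code again and again, which increased the maintenance burden. Modular Diffusers addresses this by decomposing the inference steps into units called PipelineBlocks and rebuilding pipelines by swapping and chaining those blocks.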

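The central concepts are PipelineBlock, ModularPipelineBlocks (a composition of multiple blocks), and ComponentsManager. Developers define processing units such as text encoding, noise prediction, and decoding, then combine them in a graph-like fashion to build new pipelines. Blocks can also be uploaded to and shared on the Hugging Face Hub individually, so new techniques published by the community are expected to be adopted faster than before.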

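Behind this is the growing diversity of image and video generation models. With FLUX, Stable Diffusion 3, HunyuanVideo, Wan, and other architectures arriving one after another, writing a dedicated pipeline for each is no longer sustainable. ComfyUI's popularity as a node-based UI reflects the same trend, and Modular Diffusers can be seen as an attempt to deliver that approach on the Python API side.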

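Hugging Face stresses compatibility and says existing Diffusers pipelines remain available. Modular Diffusers appears to be at a beta-like stage for now. As integration with LoRA, quantization, and various adapters progresses, it could become a standard foundation for both researchers and product developers.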

Hugging Face has unveiled Modular Diffusers, a new architectural layer for its popular Diffusers library that reimagines diffusion pipelines as compositions of reusable building blocks rather than monolithic classes.

Until now, Diffusers has shipped a dedicated pipeline class for nearly every model family and task — StableDiffusionPipeline, FluxPipeline, StableDiffusionControlNetImg2ImgPipeline, and so on. While this gave users a clean one-line entry point, it also led to substantial code duplication. Adding a new technique such as ControlNet, IP-Adapter, or differential diffusion typically required forking an entire pipeline file, and combining several techniques meant maintaining yet another bespoke variant. Modular Diffusers addresses this by decomposing inference into discrete PipelineBlocks that can be swapped, chained, and reused.
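The swap-and-chain idea can be illustrated with a toy sketch in plain Python. Everything here is hypothetical: `ToyBlock`, `run_chain`, and `swap` are illustrative names, the "models" are trivial lambdas, and none of it reflects the actual Modular Diffusers API. The point is only that an image-to-image workflow can be derived from a text-to-image one by replacing a single block rather than forking the whole pipeline.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]


@dataclass(frozen=True)
class ToyBlock:
    """Hypothetical stand-in for one pipeline step; not the Diffusers API."""
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]
    fn: Callable[[State], State]

    def __call__(self, state: State) -> State:
        # Fail early if an earlier block did not provide a declared input.
        missing = [key for key in self.inputs if key not in state]
        if missing:
            raise KeyError(f"{self.name} is missing inputs: {missing}")
        produced = self.fn(state)
        # Return a new state so a block never mutates its input.
        return {**state, **{key: produced[key] for key in self.outputs}}


def run_chain(blocks: List[ToyBlock], state: State) -> State:
    """Run blocks in order, threading the shared state through them."""
    for block in blocks:
        state = block(state)
    return state


def swap(blocks: List[ToyBlock], name: str, replacement: ToyBlock) -> List[ToyBlock]:
    """Return a new chain with the block called `name` replaced."""
    return [replacement if block.name == name else block for block in blocks]


# Toy steps: real blocks would call a text encoder, a denoiser, and a VAE.
encode = ToyBlock("encode", ("prompt",), ("embedding",),
                  lambda s: {"embedding": len(s["prompt"])})
noise_latents = ToyBlock("latents", (), ("latents",),
                         lambda s: {"latents": [0.0, 0.0]})
image_latents = ToyBlock("latents", ("init_image",), ("latents",),
                         lambda s: {"latents": [float(p) for p in s["init_image"]]})
denoise = ToyBlock("denoise", ("embedding", "latents"), ("latents",),
                   lambda s: {"latents": [x + s["embedding"] for x in s["latents"]]})
decode = ToyBlock("decode", ("latents",), ("image",),
                  lambda s: {"image": tuple(s["latents"])})

text2img = [encode, noise_latents, denoise, decode]
# Image-to-image reuses three of the four blocks unchanged.
img2img = swap(text2img, "latents", image_latents)

t2i = run_chain(text2img, {"prompt": "cat"})
i2i = run_chain(img2img, {"prompt": "cat", "init_image": [1, 2]})
print(t2i["image"])  # (3.0, 3.0)
print(i2i["image"])  # (4.0, 5.0)
```

Declaring inputs and outputs on each block is one possible design choice; it lets a chain report a missing input by name instead of failing somewhere inside a later step.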

The framework centers on a few key abstractions: PipelineBlock, which encapsulates a single logical step such as text encoding, denoising, or VAE decoding; ModularPipelineBlocks, which compose multiple blocks; and a ComponentsManager that handles shared models and memory across the graph. Developers can publish their own blocks to the Hugging Face Hub, meaning a new sampler or guidance trick can be distributed as a standalone unit and dropped into any compatible pipeline. The blog post walks through assembling text-to-image, image-to-image, and ControlNet workflows from the same underlying parts.
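As a rough illustration of why a shared component manager helps, the sketch below shows a registry that loads each model once and hands the same instance to every workflow that asks for it. This is a minimal sketch under stated assumptions: `ToyComponentRegistry`, `get_or_load`, `release`, and `FakeModel` are hypothetical names invented for this example, and the real ComponentsManager may work quite differently. Memory handling is reduced here to an explicit release call, which is also an assumption of the sketch.

```python
from typing import Any, Callable, Dict


class FakeModel:
    """Placeholder for a heavy model such as a VAE or text encoder."""

    def __init__(self, name: str) -> None:
        self.name = name


class ToyComponentRegistry:
    """Hypothetical registry; illustrates sharing, not the real ComponentsManager."""

    def __init__(self) -> None:
        self._components: Dict[str, Any] = {}
        self.load_count = 0

    def get_or_load(self, name: str, loader: Callable[[], Any]) -> Any:
        # Only call the (expensive) loader the first time a name is requested.
        if name not in self._components:
            self._components[name] = loader()
            self.load_count += 1
        return self._components[name]

    def release(self, name: str) -> None:
        # Drop the registry's reference so the object can be garbage collected
        # once no workflow holds it any more.
        self._components.pop(name, None)


registry = ToyComponentRegistry()

# Two workflows ask for the same VAE; the loader runs once.
text2img_vae = registry.get_or_load("vae", lambda: FakeModel("vae"))
img2img_vae = registry.get_or_load("vae", lambda: FakeModel("vae"))

shared = text2img_vae is img2img_vae
print(shared, registry.load_count)  # True 1
```

Without such a registry, each pipeline object would typically hold its own copy of every model, which is the duplication a shared manager is meant to avoid.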

The motivation is partly ecosystem pressure. The pace of new diffusion architectures — FLUX, Stable Diffusion 3.5, HunyuanVideo, Wan, and a steady stream of fine-tuned variants — has made the per-model pipeline approach increasingly hard to scale. Tools like ComfyUI have demonstrated strong community appetite for node-based, composable workflows, and Modular Diffusers can be seen as bringing a similar philosophy to the Python API surface that researchers and backend engineers prefer. It also aligns with Hugging Face's broader trend of pushing modularity into Transformers and PEFT.

Importantly, the team emphasises backward compatibility: existing Diffusers pipelines continue to work unchanged, and Modular Diffusers is positioned as an additive option rather than a replacement. The current release appears to be an initial cut, with documentation and tutorials covering the core blocks; deeper integration with LoRA loading, quantisation backends like bitsandbytes and torchao, and adapter ecosystems is likely to follow in subsequent updates.

For practitioners, the practical upshot is that experimenting with custom inference logic — say, mixing a ControlNet branch with a differential diffusion mask and a custom scheduler — should require considerably less boilerplate. For library maintainers, it offers a path to reduce the combinatorial explosion of pipeline classes. Whether the community embraces the new abstractions will depend on how cleanly third-party blocks can be authored and published, but the design direction is consistent with where much of the open-source generative stack appears to be heading.
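The "combinatorial explosion" can be made concrete with a back-of-the-envelope count. The model below is an assumption for illustration, not a description of how Diffusers is actually organised: it supposes that every base task can be combined with any subset of optional techniques, that a monolithic design needs one class per combination, and that a modular design needs one block per task plus one per technique. The numbers used (3 tasks, 3 techniques) are illustrative.

```python
def monolithic_class_count(tasks: int, techniques: int) -> int:
    """Upper bound: one class per task per subset of optional techniques."""
    return tasks * 2 ** techniques


def modular_block_count(tasks: int, techniques: int) -> int:
    """Assumed lower bound: one block per task plus one per technique."""
    return tasks + techniques


# Illustrative numbers: 3 base tasks and 3 optional techniques
# (for example ControlNet, IP-Adapter, and differential diffusion).
tasks, techniques = 3, 3
print(monolithic_class_count(tasks, techniques))  # 24
print(modular_block_count(tasks, techniques))     # 6
```

Under these assumptions, adding a fourth technique doubles the monolithic bound to 48 while adding only one block to the modular count.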