Calling OpenAI / Anthropic / Gemini through a unified interface with Fetch API + SSE, without relying on SDKs


Zenn Claude tag · zenn.dev · 2026/05/10 10:33 · 📖 2 min

AI three-line summary

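Explains how to call OpenAI, Anthropic and Gemini LLMs from a unified interface using only the Fetch API and Server-Sent Events, without depending on SDKs.
Shows an implementation that absorbs the differences between each vendor's response format and handles streaming through common code.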

English summary

A walkthrough of calling OpenAI, Anthropic and Gemini LLMs through a unified interface using only the Fetch API and Server-Sent Events, without vendor SDKs.
The post shows how to normalize each provider's streaming response format into a common abstraction.

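OpenAI, Anthropic and Google, the three major LLM providers, each offer official SDKs. In applications that switch between multiple models, however, each SDK's dependencies and differing type definitions easily turn into technical debt. This article presents an implementation approach that calls all three companies' chat completion APIs from a unified interface using only the Fetch API and Server-Sent Events (SSE), with no SDK at all.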
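A minimal sketch of what calling the three chat APIs with plain Fetch involves on the request side. The endpoints, auth headers and body fields are assumed to match each provider's public REST conventions (OpenAI Chat Completions, the Anthropic Messages API with anthropic-version 2023-06-01, and Gemini's streamGenerateContent with alt=sse so that it also answers with SSE). ChatRequest and buildRequest are hypothetical names rather than the article's code, and the 1024-token default is an arbitrary assumption.

```typescript
// Hypothetical common request shape for this sketch (not the article's types).
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

type Provider = "openai" | "anthropic" | "gemini";

interface ChatRequest {
  provider: Provider;
  model: string;
  apiKey: string;
  messages: ChatMessage[];
  maxTokens?: number; // assumption: defaults to 1024 where a limit is required
}

interface BuiltRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Translate the common request into each provider's streaming HTTP call.
function buildRequest(req: ChatRequest): BuiltRequest {
  const system = req.messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n");
  const turns = req.messages.filter((m) => m.role !== "system");
  const json = { "content-type": "application/json" };

  switch (req.provider) {
    case "openai":
      // Chat Completions keeps system messages inline in `messages`.
      return {
        url: "https://api.openai.com/v1/chat/completions",
        init: {
          method: "POST",
          headers: { ...json, authorization: `Bearer ${req.apiKey}` },
          body: JSON.stringify({ model: req.model, messages: req.messages, stream: true }),
        },
      };
    case "anthropic":
      // Messages API: top-level `system`, required `max_tokens`, versioned header.
      return {
        url: "https://api.anthropic.com/v1/messages",
        init: {
          method: "POST",
          headers: { ...json, "x-api-key": req.apiKey, "anthropic-version": "2023-06-01" },
          body: JSON.stringify({
            model: req.model,
            max_tokens: req.maxTokens ?? 1024,
            ...(system ? { system } : {}),
            messages: turns,
            stream: true,
          }),
        },
      };
    case "gemini":
      // alt=sse makes streamGenerateContent emit SSE instead of a streamed JSON array.
      return {
        url: `https://generativelanguage.googleapis.com/v1beta/models/${req.model}:streamGenerateContent?alt=sse`,
        init: {
          method: "POST",
          headers: { ...json, "x-goog-api-key": req.apiKey },
          body: JSON.stringify({
            ...(system ? { systemInstruction: { parts: [{ text: system }] } } : {}),
            // Gemini calls the assistant role "model".
            contents: turns.map((m) => ({
              role: m.role === "assistant" ? "model" : "user",
              parts: [{ text: m.content }],
            })),
            ...(req.maxTokens ? { generationConfig: { maxOutputTokens: req.maxTokens } } : {}),
          }),
        },
      };
  }
}
```

The returned url and init can be passed straight to fetch(url, init). Even before any streaming starts, the three providers already differ in the auth header, in where the system prompt goes, and in what the assistant role is called.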

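The central challenge is that the streaming response format differs subtly from vendor to vendor: OpenAI uses plain SSE whose data: lines carry JSON chunks (the format other services copy as "OpenAI-compatible"), Anthropic uses SSE with its own event types, and Gemini streams JSON. The author defines a common interface and absorbs these differences with an adapter pattern that parses each provider's responses and converts them into chunks of a unified format. The article shows the typical implementation: reading the Fetch API's ReadableStream directly, assembling the bytes into a string with TextDecoder, and splitting it on the blank lines that delimit SSE events.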
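The adapter step can be sketched as a single normalize function that maps one decoded SSE event to a hypothetical UnifiedChunk type. The field paths (choices[0].delta.content for OpenAI, content_block_delta events with a text_delta for Anthropic, candidates[0].content.parts for Gemini) are assumed to follow the providers' documented streaming payloads. The chunk type, the function name, and treating Gemini's finishReason as end-of-stream are choices of this sketch, which also assumes Gemini is called with alt=sse so that all three arrive as SSE events.

```typescript
// Hypothetical provider-independent chunk type for this sketch.
type UnifiedChunk = { type: "text"; text: string } | { type: "done" };

type Provider = "openai" | "anthropic" | "gemini";

// Map one SSE event (optional `event:` name plus `data:` payload) to unified chunks.
function normalize(provider: Provider, event: string | undefined, data: string): UnifiedChunk[] {
  if (provider === "openai") {
    // OpenAI ends the stream with a literal "[DONE]" data payload.
    if (data === "[DONE]") return [{ type: "done" }];
    const text = JSON.parse(data).choices?.[0]?.delta?.content;
    return typeof text === "string" && text !== "" ? [{ type: "text", text }] : [];
  }

  const json = JSON.parse(data);

  if (provider === "anthropic") {
    // The event name is repeated in the payload's `type` field.
    const type = event ?? json.type;
    if (type === "content_block_delta" && json.delta?.type === "text_delta") {
      return [{ type: "text", text: json.delta.text }];
    }
    if (type === "message_stop") return [{ type: "done" }];
    if (type === "error") throw new Error(`Anthropic stream error: ${json.error?.message ?? data}`);
    return []; // message_start, content_block_start, ping, tool-use deltas, ...
  }

  // Gemini (alt=sse): every event is a partial GenerateContentResponse.
  const candidate = json.candidates?.[0];
  const parts: Array<{ text?: string }> = candidate?.content?.parts ?? [];
  const text = parts.map((p) => p.text ?? "").join("");
  const chunks: UnifiedChunk[] = text !== "" ? [{ type: "text", text }] : [];
  if (candidate?.finishReason) chunks.push({ type: "done" }); // assumption: treat as end of stream
  return chunks;
}
```

Everything downstream of normalize only ever sees UnifiedChunk, which is what lets the rest of the application stay provider-agnostic.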

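The benefits of this approach are a smaller bundle, a better fit for constrained runtimes such as Edge Runtime and Cloudflare Workers, and freedom from the cost of keeping up with SDK version upgrades. Abstraction libraries such as the Vercel AI SDK and LangChain.js provide a similar unification, but a hand-written implementation may be preferred when you do not want the internals to be a black box or need to plug in custom logic.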


🧡 Claude · Key points

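One caveat: trying to cover every feature each vendor has developed on its own, such as tool calling, image input and structured output, tends to bloat the adapter layer. Anthropic's prompt caching and Gemini's multimodal extensions in particular see frequent specification changes, so supporting them is likely to be weighed against the maintenance cost. For simple text completion, the approach in this article is a perfectly practical option.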

OpenAI, Anthropic and Google all ship official SDKs for their LLM APIs, but maintaining multiple SDK dependencies in a single application that switches between providers can quickly become a liability. This article walks through an alternative: calling all three providers' chat completion endpoints through a unified interface built only on the Fetch API and Server-Sent Events, with no vendor SDK in the dependency tree.

The core difficulty lies in the differences between each provider's streaming response format. OpenAI uses plain SSE whose data: lines carry JSON chunks (the format other services copy as "OpenAI-compatible"), Anthropic emits SSE with an explicit event type attached to each chunk, and Gemini streams JSON fragments. The author defines a common interface and uses an adapter pattern, parsing each provider's raw response and normalizing it into a shared chunk type. The implementation reads the Fetch ReadableStream directly, decodes bytes with TextDecoder and splits on the blank lines that delimit SSE events; it is a fairly canonical pattern, but the article spells it out clearly for each provider.
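Reading the ReadableStream and splitting it into SSE events might look like the following sketch. SseParser, parseBlock and readSse are hypothetical names. The parser keeps only the event and data fields (ignoring id, retry and ":" comment lines), tolerates CRLF line endings even when a "\r\n" pair is split across chunks, and leniently dispatches a final event that lacks its trailing blank line, which a strict SSE client would discard.

```typescript
// One parsed SSE event: optional `event:` name and the joined `data:` lines.
interface SseEvent {
  event?: string;
  data: string;
}

// Parse the lines of one event block; only `event` and `data` fields are kept.
function parseBlock(block: string): SseEvent | null {
  let event: string | undefined;
  const data: string[] = [];
  for (const line of block.split("\n")) {
    if (line === "" || line.startsWith(":")) continue; // blank line or comment/keep-alive
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // a single leading space is dropped
    if (field === "event") event = value;
    else if (field === "data") data.push(value);
  }
  // An event with no data lines is not dispatched.
  return data.length > 0 ? { event, data: data.join("\n") } : null;
}

// Incremental parser: feed decoded text, get back every event completed so far.
class SseParser {
  private buffer = "";

  push(text: string): SseEvent[] {
    this.buffer += text;
    // Hold back a trailing "\r" in case its "\n" arrives in the next chunk.
    const heldCr = this.buffer.endsWith("\r");
    let work = (heldCr ? this.buffer.slice(0, -1) : this.buffer).replace(/\r\n?/g, "\n");
    const events: SseEvent[] = [];
    let end: number;
    // Events are separated by a blank line, i.e. "\n\n" after normalization.
    while ((end = work.indexOf("\n\n")) !== -1) {
      const ev = parseBlock(work.slice(0, end));
      if (ev) events.push(ev);
      work = work.slice(end + 2);
    }
    this.buffer = heldCr ? work + "\r" : work;
    return events;
  }
}

// Read a fetch() response body as a stream of SSE events.
async function* readSse(body: ReadableStream<Uint8Array>): AsyncGenerator<SseEvent> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  const parser = new SseParser();
  try {
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps multi-byte UTF-8 characters split across chunks intact.
      yield* parser.push(decoder.decode(value, { stream: true }));
    }
    // Flush the decoder and leniently dispatch a final event lacking its blank line.
    yield* parser.push(decoder.decode() + "\n\n");
  } finally {
    reader.releaseLock();
  }
}
```

In an adapter, each yielded SseEvent would be handed to that provider's payload parser, for example inside a loop such as for await (const ev of readSse(res.body!)). Keeping the byte-level framing separate from the JSON interpretation means only the latter differs per provider.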

The advantages of this approach are concrete. Bundle size shrinks notably, the code runs cleanly in constrained environments such as Vercel Edge Runtime or Cloudflare Workers where some SDKs struggle, and there is no need to chase SDK version bumps every time a provider tweaks its types. The trade-off is that the developer becomes responsible for tracking API changes manually, including subtle behaviors around retries, error envelopes, and rate-limit headers.
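Taking over retries and error envelopes by hand could start from something like this sketch. The policy is an assumption, not something the article or the providers prescribe: retry on 408, 429 and 5xx, honor a Retry-After header given either in seconds or as an HTTP date, and otherwise back off exponentially from 500 ms up to 8 s. retryDelayMs and fetchWithRetry are hypothetical helpers.

```typescript
// Assumed policy: retry 408, 429 and 5xx; otherwise give up immediately.
function retryDelayMs(
  status: number,
  headers: Headers,
  attempt: number, // 0 for the first retry
  now: number = Date.now(),
): number | null {
  if (!(status === 408 || status === 429 || status >= 500)) return null;
  const retryAfter = headers.get("retry-after")?.trim();
  if (retryAfter) {
    // Retry-After may be delay-seconds or an HTTP date.
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
    const at = Date.parse(retryAfter);
    if (!Number.isNaN(at)) return Math.max(0, at - now);
  }
  // Exponential backoff: 500 ms, 1 s, 2 s, ... capped at 8 s (assumed values).
  return Math.min(500 * 2 ** attempt, 8000);
}

// fetch() wrapper that retries transient failures and surfaces the error body otherwise.
async function fetchWithRetry(url: string, init: RequestInit, maxRetries = 3): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, init);
    if (res.ok) return res;
    const delay = attempt < maxRetries ? retryDelayMs(res.status, res.headers, attempt) : null;
    if (delay === null) {
      // Providers return a JSON error envelope; keep it in the message for debugging.
      throw new Error(`HTTP ${res.status}: ${(await res.text()).slice(0, 500)}`);
    }
    await res.body?.cancel(); // drop the unused error body before retrying
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```

Retries only make sense before any streamed bytes have been consumed; once a stream has started, a mid-stream failure would need separate handling.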


🧡 Claude · Key takeaway

It is worth noting that abstraction libraries such as the Vercel AI SDK and LangChain.js already provide this kind of unified surface across providers. They are convenient defaults, but teams that want to keep the internals transparent, inject custom telemetry, or avoid yet another opinionated dependency often end up writing the kind of thin adapter layer described here. The pattern also composes well with frameworks like Hono or Next.js route handlers that expose Web-standard Request and Response objects.
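Because the adapter needs only Web-standard types, re-streaming the normalized text to a browser can stay framework-neutral as well. This hypothetical toSseResponse helper wraps an async iterable of text deltas in a Response with Content-Type text/event-stream. The {text} / {done} payload shape is an assumption of the sketch, and fakeDeltas stands in for a real provider stream. A handler with the (Request) => Response shape, like handle below, matches what Next.js route handlers and Cloudflare Workers expect, and in Hono it could be called with c.req.raw.

```typescript
// Encode one SSE event; JSON keeps newlines inside the text from breaking the framing.
function encodeSseEvent(data: unknown): string {
  return `data: ${JSON.stringify(data)}\n\n`;
}

// Wrap an async iterable of text deltas in a Web-standard streaming Response.
function toSseResponse(deltas: AsyncIterable<string>): Response {
  const encoder = new TextEncoder();
  const iterator = deltas[Symbol.asyncIterator]();
  const body = new ReadableStream<Uint8Array>({
    async pull(controller) {
      const { done, value } = await iterator.next();
      if (done) {
        controller.enqueue(encoder.encode(encodeSseEvent({ done: true })));
        controller.close();
      } else {
        controller.enqueue(encoder.encode(encodeSseEvent({ text: value })));
      }
    },
    async cancel() {
      // Client disconnected: stop pulling from the upstream provider stream too.
      await iterator.return?.();
    },
  });
  return new Response(body, {
    headers: {
      "content-type": "text/event-stream; charset=utf-8",
      "cache-control": "no-cache",
    },
  });
}

// Stand-in for a real provider adapter (hypothetical).
async function* fakeDeltas(): AsyncGenerator<string> {
  yield "Hello";
  yield ", world";
}

// Framework-neutral handler: Request in, streaming Response out.
async function handle(_req: Request): Promise<Response> {
  return toSseResponse(fakeDeltas());
}
```

Using pull rather than enqueueing everything up front means the provider stream is only read as fast as the client consumes it.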

A caveat is that provider-specific features keep diverging. Anthropic has invested heavily in prompt caching and a structured tool-use protocol, OpenAI continues to evolve its Responses API and structured outputs, and Gemini pushes multimodal inputs including video. Trying to expose all of these through a single common interface tends to inflate the adapter layer and may negate the simplicity gains. For straightforward text generation and basic streaming chat use cases, however, the SDK-free approach outlined in the article looks like a reasonable and maintainable choice, particularly for projects that already lean on Web-standard APIs.
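One way to keep the adapter from bloating, sketched under assumptions: the shared options type carries only fields every provider understands, and provider-specific top-level request fields travel in a per-provider extras object that is shallow-merged into the body untouched. ProviderExtras and buildOptions are hypothetical names; response_format (OpenAI) and top_k (Anthropic) are merely examples of real request fields that would pass straight through.

```typescript
type Provider = "openai" | "anthropic" | "gemini";

// Fields every provider understands; kept deliberately small.
interface CommonOptions {
  model: string;
  temperature?: number;
}

// Untyped per-provider escape hatch for features the common type does not model.
type ProviderExtras = Partial<Record<Provider, Record<string, unknown>>>;

// Build the non-message part of the request body for one provider.
function buildOptions(
  provider: Provider,
  common: CommonOptions,
  extras: ProviderExtras = {},
): Record<string, unknown> {
  const temp = common.temperature;
  const shared: Record<string, unknown> =
    provider === "gemini"
      ? // Gemini puts the model in the URL and sampling settings under generationConfig.
        temp !== undefined
        ? { generationConfig: { temperature: temp } }
        : {}
      : { model: common.model, ...(temp !== undefined ? { temperature: temp } : {}) };
  // Shallow merge: extras win, so a Gemini `generationConfig` extra replaces the shared one.
  return { ...shared, ...(extras[provider] ?? {}) };
}
```

The trade-off is that extras are untyped, so a misspelled provider field only surfaces as an API error at runtime; in exchange, the common interface does not grow every time a provider ships a new feature.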