Streaming
Stream LLM tokens via Server-Sent Events. opengateway normalizes supported providers to the OpenAI SSE chunk shape for compatible clients.
What is streaming in opengateway?
Streaming sends each token to the client as soon as the model generates it, over a long-lived Server-Sent Events (SSE) connection. opengateway translates each supported provider's stream into OpenAI's SSE shape, so a single OpenAI-compatible client can consume all of them.
When to use streaming
Use it whenever you want the user to see tokens as soon as the model generates them. Chat UIs, agent loops, and any other interactive surface benefit from streaming.
For short responses (under about 200 tokens), the setup cost often outweighs the benefit. In that case, a normal non-streaming request is simpler.
Enable it
Set stream: true on the request. No other changes are needed.
curl https://api.opengateway.ai/v1/chat/completions \
  -H "Authorization: Bearer $OPENGATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "stream": true,
    "messages": [{"role": "user", "content": "Count to 5 slowly"}]
  }'
Response format
The response is a stream of Server-Sent Events. Each chunk is a JSON object on a data: line.
data: {"choices":[{"delta":{"content":"One"},"index":0}]}
data: {"choices":[{"delta":{"content":"."},"index":0}]}
data: {"choices":[{"delta":{"content":" Two"},"index":0}]}
...
data: {"choices":[{"delta":{},"finish_reason":"stop","index":0}]}
data: [DONE]
The shape matches OpenAI's stream output exactly, so any OpenAI-compatible streaming client works without modification.
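If you are not using an SDK, any HTTP client that supports incremental reads can consume the stream directly. Below is a minimal parsing sketch in Python; it assumes the httpx library, but the line handling is the same for any client.

import json
import os
import httpx

payload = {
    "model": "openai/gpt-4o",
    "stream": True,
    "messages": [{"role": "user", "content": "Count to 5 slowly"}],
}
headers = {"Authorization": f"Bearer {os.environ['OPENGATEWAY_API_KEY']}"}

with httpx.stream(
    "POST",
    "https://api.opengateway.ai/v1/chat/completions",
    json=payload,
    headers=headers,
    timeout=None,
) as response:
    for line in response.iter_lines():
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel, not JSON
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        print(delta.get("content") or "", end="", flush=True)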
OpenAI SDK (Python)
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENGATEWAY_API_KEY"],
    base_url="https://api.opengateway.ai/v1",
)

stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True,
)
for chunk in stream:
    content = chunk.choices[0].delta.content or ""
    print(content, end="", flush=True)
Vercel AI SDK
import { streamText } from "ai";
import { createOpenAI } from "@ai-sdk/openai";

const og = createOpenAI({
  apiKey: process.env.OPENGATEWAY_API_KEY,
  baseURL: "https://api.opengateway.ai/v1",
});

const result = await streamText({
  model: og("anthropic/claude-sonnet-4"),
  prompt: "Count to 5 slowly",
});

for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
Cross-provider normalization
Anthropic and Google stream responses in different wire formats from OpenAI. opengateway translates all of them into OpenAI's SSE shape, so your client code does not need to branch on provider.
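For example, the same consumption loop works for every provider; only the model string changes. A sketch with the OpenAI Python SDK (the Google model ID below is illustrative, so check the model list for exact names):

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENGATEWAY_API_KEY"],
    base_url="https://api.opengateway.ai/v1",
)

# Identical code path for each provider; no branching on wire format.
for model in ["openai/gpt-4o", "anthropic/claude-sonnet-4", "google/gemini-2.0-flash"]:
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Count to 5"}],
        stream=True,
    )
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    print()  # newline between models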
Timeouts
The default timeout for a streaming response is 10 minutes. Contact support if you need a longer window for long agent loops.
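If your HTTP client or SDK enforces its own read timeout, set it at least as high as the window you expect to use. A sketch with the OpenAI Python SDK, which accepts an httpx.Timeout:

import os
import httpx
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENGATEWAY_API_KEY"],
    base_url="https://api.opengateway.ai/v1",
    # Allow reads up to the gateway's 10-minute streaming window.
    timeout=httpx.Timeout(600.0, connect=10.0),
)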
Errors mid-stream
If the upstream provider fails after tokens have already started flowing, opengateway emits an error chunk and closes the stream:
data: {"error": {"type": "server_error", "message": "Upstream terminated"}}
data: [DONE]
Handle this case in your client. Fallbacks do not automatically retry mid-stream by design: the user has already seen part of the output, and silently switching models would confuse them. Plan for it at the application layer.
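A minimal sketch of that handling, written as a function you could drop into the raw parsing loop under "Response format"; the chunk and error shapes are the ones shown above.

def handle_chunk(chunk: dict) -> bool:
    """Process one parsed SSE chunk; return False when the stream should stop."""
    if "error" in chunk:
        # Partial output has already been shown; surface the failure to the
        # user and let the application layer decide whether to retry.
        print(f"\n[generation interrupted: {chunk['error']['message']}]")
        return False
    choice = chunk["choices"][0]
    print(choice["delta"].get("content") or "", end="", flush=True)
    return choice.get("finish_reason") != "stop"

Call it on each parsed data: payload and stop reading when it returns False.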