Fallbacks

Keep serving users through provider outages. Declare fallback model IDs for Chat Completions requests; opengateway tries them in order.

What are fallbacks?#

A fallback is a backup model opengateway tries when the primary model fails, times out, or is rate-limited. Pass fallback model IDs under extra.fallbacks in a Chat Completions request. opengateway tries targets in order and only surfaces an error to your application when every option is exhausted.

Why you want this#

Even the best providers have bad hours. A timeout here, a rate limit there, an upstream hiccup during a deploy. Without fallbacks, your users feel every one of those incidents.

With fallbacks enabled, opengateway quietly tries the next model in your list, and your users keep moving.

How to use it#

Pass an extra.fallbacks array alongside the primary model. The router caps the resolved target list at three entries, including the primary.

{
  "model": "anthropic/claude-sonnet-4",
  "extra": {
    "fallbacks": ["openai/gpt-4o", "google/gemini-2.5-pro"]
  },
  "messages": [{"role": "user", "content": "Hello"}]
}

Order matters. opengateway tries anthropic/claude-sonnet-4 first. If that fails, it tries openai/gpt-4o. If that also fails, it tries google/gemini-2.5-pro. Only when every entry in the list fails does your application see an error.
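The same ordering and cap can be made visible client-side. A minimal sketch in Python, assuming nothing beyond the request shape shown above; the helper name `build_request` and the client-side cap check are illustration, not part of opengateway's API:

```python
import json

MAX_TARGETS = 3  # the router's cap: primary plus at most two fallbacks

def build_request(model, fallbacks, messages):
    """Build a Chat Completions body with fallbacks under extra.fallbacks."""
    if 1 + len(fallbacks) > MAX_TARGETS:
        # Surface the three-target cap here, where it is easy to see,
        # rather than relying on the router to resolve it.
        raise ValueError(f"at most {MAX_TARGETS - 1} fallbacks allowed")
    return {
        "model": model,
        "extra": {"fallbacks": list(fallbacks)},
        "messages": messages,
    }

body = build_request(
    "anthropic/claude-sonnet-4",
    ["openai/gpt-4o", "google/gemini-2.5-pro"],
    [{"role": "user", "content": "Hello"}],
)
print(json.dumps(body))
```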

When a fallback fires#

Fallbacks trigger on upstream conditions that suggest the provider, not your request, is at fault:

  • Upstream timeout, configurable per provider (30 seconds to first byte by default).
  • HTTP 5xx from the provider.
  • HTTP 429 (rate limit) after opengateway has exhausted its own retries.
  • Authentication failure on the provider's side (their key, not yours).
  • A malformed response body from the upstream provider.
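The trigger rules above can be summarized as a predicate. This is a sketch of the rules as listed, not opengateway's actual routing code:

```python
def should_fall_back(status=None, timed_out=False, malformed_body=False,
                     retries_exhausted=False):
    """Return True when a failure looks like the provider's fault,
    mirroring the trigger list: timeouts, 5xx, exhausted 429 retries,
    provider-side auth failures, and malformed response bodies."""
    if timed_out or malformed_body:
        return True
    if status is None:
        return False
    if 500 <= status <= 599:
        return True
    if status == 429 and retries_exhausted:
        return True
    if status == 401:  # the provider's key failed, not yours
        return True
    return False
```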

When a fallback does not fire#

  • Your request is invalid. A 400 from the upstream means the body is wrong, and a different provider will not accept it either.
  • The response stream has already started. Once the first byte has been sent, opengateway cannot silently retry without confusing the user. Handle mid-stream errors in your client.
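Because opengateway cannot fall back once a stream has started, mid-stream failures belong to the client. A minimal sketch of that handling, with a stand-in `stream_completion` callable in place of your real streaming client:

```python
def stream_with_client_fallback(models, prompt, stream_completion):
    """Try each model in order; if a stream dies after it has started,
    discard the partial output and restart on the next model."""
    last_error = None
    for model in models:
        chunks = []
        try:
            for chunk in stream_completion(model, prompt):
                chunks.append(chunk)
            return "".join(chunks)  # stream finished cleanly
        except ConnectionError as exc:
            # The gateway will not retry once the first byte has been sent,
            # so the client drops the partial text and tries the next model.
            last_error = exc
    raise RuntimeError("all models failed mid-stream") from last_error

# Demo with a stand-in stream: the first model dies mid-stream.
def demo_stream(model, prompt):
    if model == "anthropic/claude-sonnet-4":
        yield "partial "
        raise ConnectionError("upstream dropped the connection")
    yield "complete answer"

result = stream_with_client_fallback(
    ["anthropic/claude-sonnet-4", "openai/gpt-4o"], "Hello", demo_stream
)
```

Whether you replay the partial text or restart the answer from scratch is a product decision; this sketch restarts.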

Cost and quality considerations#

Different models have different prices. In the chain below, the fallback costs roughly 100 times less per input token than the primary, but output quality drops with it:

anthropic/claude-opus-4      # $15 / 1M input
  → openai/gpt-4o-mini       # $0.15 / 1M input

Choose fallbacks that are close enough in capability that your application still works when they run. If quality matters more than cost, fall back to another top-tier model rather than a budget one.

Debugging#

Every Chat Completions response includes routing attempts:

{
  "extra": {
    "routing": {
      "attempts": [
        { "provider": "anthropic", "region": "default", "status": "failed" },
        { "provider": "openai", "region": "default", "status": "succeeded" }
      ]
    }
  }
}
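The attempts array can also be inspected in application code, for example to alert when the primary provider is being skipped. A small sketch against the response shape shown above; the helper name `served_by` is illustration, not part of opengateway:

```python
def served_by(response):
    """Return (provider, attempt_number) for the attempt that succeeded,
    or (None, attempts_made) if every attempt failed."""
    attempts = response.get("extra", {}).get("routing", {}).get("attempts", [])
    for number, attempt in enumerate(attempts, start=1):
        if attempt.get("status") == "succeeded":
            return attempt["provider"], number
    return None, len(attempts)

sample = {
    "extra": {
        "routing": {
            "attempts": [
                {"provider": "anthropic", "region": "default", "status": "failed"},
                {"provider": "openai", "region": "default", "status": "succeeded"},
            ]
        }
    }
}
```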

The Logs dashboard shows the full chain for every request, including the ones that hit fallbacks. It is the fastest way to understand why a given request, or a given day, looked unusual.

A production pattern#

{
  "model": "anthropic/claude-sonnet-4",
  "extra": {
    "fallbacks": [
      "openai/gpt-4o",
      "google/gemini-2.5-pro"
    ]
  },
  "messages": [...]
}

Three different providers. If Anthropic, OpenAI, and Google all have an outage at the same time, you have bigger problems than your application.

See also#