POST /v1/chat/completions

Create chat completion

Creates a model response for the given chat conversation. Compatible with the OpenAI API.

Headers

Authorization string Required

Bearer token for authentication. Format: Bearer YOUR_API_KEY

x-opengateway-user-id string Optional

User identifier for analytics and tracking.

x-opengateway-session-id string Optional

Session identifier for analytics and tracking.

Request body

model string Required

ID of the model to use. See the model list for available models.

Use the owner/model format (e.g., openai/gpt-4o).

messages array Required

A list of messages comprising the conversation so far.

role string

The role of the message author: system, user, assistant, or tool.

content string | array

The contents of the message. Can be a string or an array of content parts for multimodal input.

Content part types (when array):

{ type: "text", text: "..." } — Text content

{ type: "image_url", image_url: { url: "...", detail: "auto" } } — Image input (URL or base64). detail: auto | low | high | original (Optional, default: auto)

name string Optional

An optional name for the participant.

tool_calls array (assistant only)

Tool calls generated by the model (present in assistant messages).

tool_call_id string (tool role only)

The ID of the tool call this message is responding to.

cache_control object Optional

Prompt caching control.
Gemini: the gateway converts the marked prefix into cached content.
Anthropic: cache control markers are injected automatically by the gateway.

type: "ephemeral" (required)

ttl: e.g. "600s" — Gemini only, defaults to 10 minutes
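As a sketch, a request body that combines a cached system prompt with a multimodal user message might look like the following (the model ID, image URL, and TTL are illustrative):

{
  "model": "google/gemini-2.5-pro",
  "messages": [
    {
      "role": "system",
      "content": "You are a support agent. <long reference text to cache>",
      "cache_control": {"type": "ephemeral", "ttl": "600s"}
    },
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What does this screenshot show?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png", "detail": "low"}}
      ]
    }
  ]
}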

temperature number Optional

Sampling temperature between 0 and 2. Higher values make output more random. Defaults to 1. For reasoning models (o-series), this value is forced to 1.

top_p number Optional

Nucleus sampling. The model considers tokens with top_p probability mass. Value between 0 and 1. Defaults to 1. Not supported with reasoning models (o-series).

max_tokens integer Optional

The maximum number of tokens to generate in the chat completion.

max_completion_tokens integer Optional

An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. Alternative to max_tokens. Recommended for reasoning models.

stream boolean Optional

If set to true, partial message deltas will be sent as server-sent events. Defaults to false.

stream_options object Optional

Options for streaming response. Only set this when stream is true.

include_usage boolean

If set, an additional chunk will be streamed with usage statistics.
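As a sketch, a streaming request with usage reporting might look like this:

curl -N https://apis.opengateway.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

Each server-sent event carries one chat completion chunk with a delta; with include_usage, a final chunk carrying usage statistics arrives before the terminating data: [DONE] message (the chunk contents below are illustrative):

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":8,"completion_tokens":9,"total_tokens":17}}
data: [DONE]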

frequency_penalty number Optional

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far. Defaults to 0. Not supported with reasoning models (o-series).

presence_penalty number Optional

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far. Defaults to 0. Not supported with reasoning models (o-series).

stop string | array Optional

Up to 4 sequences where the API will stop generating further tokens. Not supported with newer reasoning models (o4-mini, gpt-5).

seed integer Optional

If specified, the system will make a best effort to sample deterministically for reproducible outputs.

logprobs boolean Optional

Whether to return log probabilities of the output tokens. Defaults to false. Not supported with reasoning models (o-series).

top_logprobs integer Optional

An integer between 0 and 20 specifying the number of most likely tokens to return at each position, each with an associated log probability. logprobs must be set to true.
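As a sketch, a request body asking for per-token log probabilities with the top three alternatives at each position:

{
  "model": "openai/gpt-4o-mini",
  "messages": [{"role": "user", "content": "Answer with one word: is the sky blue?"}],
  "logprobs": true,
  "top_logprobs": 3
}

The per-token results appear under choices[].logprobs.content in the response (see Returns below).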

tools array Optional

A list of tools the model may call. Currently, only functions are supported.

type string

The type of tool. Currently only function is supported.

function object

The function definition: name (required), description (optional), parameters (JSON Schema object, optional).

tool_choice string | object Optional

Controls which tool is called. none disables tools, auto lets the model decide, required forces a tool call. Or specify a function: { "type": "function", "function": { "name": "my_func" } }.
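As a sketch, a request body that exposes a hypothetical get_weather function and lets the model decide whether to call it:

{
  "model": "openai/gpt-4o",
  "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}

If the model responds with tool_calls, the follow-up request appends that assistant message plus a tool message whose tool_call_id matches the call being answered.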

response_format object Optional

Specifies the format of the output. Set type to json_object for JSON mode, json_schema for schema output where the provider supports it, or text (default).
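As a sketch, a request body using JSON mode (in JSON mode the prompt itself should also ask for JSON; the json_schema variant additionally takes a schema definition where the provider supports it):

{
  "model": "openai/gpt-4o-mini",
  "messages": [
    {"role": "system", "content": "Respond with a JSON object."},
    {"role": "user", "content": "Give the capital of France as {\"capital\": \"...\"}."}
  ],
  "response_format": {"type": "json_object"}
}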

reasoning_effort string Optional

Constrains effort on reasoning for reasoning models. Supported by o-series and gpt-5 models. Accepted values: none, minimal, low, medium, high, xhigh. Supported values vary by model.
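As a sketch, a reasoning-model request combining reasoning_effort with max_completion_tokens (the model ID and values are illustrative):

{
  "model": "openai/gpt-5",
  "messages": [{"role": "user", "content": "Outline a three-step plan to migrate a service from MySQL to Postgres."}],
  "reasoning_effort": "medium",
  "max_completion_tokens": 2000
}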

user string Optional

A unique identifier representing your end-user, which can help to monitor and detect abuse.

n integer Optional

How many chat completion choices to generate for each input message. Between 1 and 128. Defaults to 1. Not supported with stream: true.

parallel_tool_calls boolean Optional

Whether to enable parallel function calling during tool use. Defaults to true.

service_tier string Optional

Specifies the latency tier to use for processing the request. auto, default, or flex. When not set, the default service tier is used.

store boolean Optional

Whether to store the output of this chat completion on the provider side, for providers that support stored chat completion outputs.

metadata map Optional

Set of up to 16 key-value pairs that can be attached to the object. Keys are up to 64 characters, values up to 512 characters.

logit_bias map Optional

Modify the likelihood of specified tokens appearing in the completion. Maps token IDs to bias values from -100 to 100. Not supported with reasoning models (o-series).

extra object Optional

OpenGateway-specific extension parameters.

fallbacks array

List of fallback model IDs to try if the primary target fails. The resolved target list is capped at three targets including the primary target.
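As a sketch, a request body using the fallbacks extension (the model IDs are illustrative; at most three targets, including the primary, are tried):

{
  "model": "openai/gpt-4o",
  "messages": [{"role": "user", "content": "Hello!"}],
  "extra": {
    "fallbacks": ["anthropic/claude-sonnet-4", "google/gemini-2.5-flash"]
  }
}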

Returns

Returns a chat completion object, or a streamed sequence of chat completion chunk objects if streaming is enabled.

id string

A unique identifier for the chat completion.

object string

The object type, always chat.completion.

system_fingerprint string

This fingerprint represents the backend configuration that the model runs with.

service_tier string

The service tier used for processing the request.

choices array

A list of chat completion choices. When n > 1, each choice has a unique index.

index — The index of this choice in the list.
message — The generated message (role, content, tool_calls).
finish_reason — stop | length | tool_calls | content_filter
logprobs — Log probability information (when logprobs=true). Contains:
content[] — Per-token logprob array. Each entry: token, logprob, bytes
content[].top_logprobs[] — Top N candidates (when top_logprobs is set)
refusal — Logprob info for refusal message, if present
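For reference, a choice whose finish_reason is tool_calls might look like the sketch below (IDs and arguments are illustrative, following the OpenAI-compatible shape):

{
  "index": 0,
  "message": {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      {
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}
      }
    ]
  },
  "finish_reason": "tool_calls"
}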
usage object

Usage statistics for the completion request.

prompt_tokens, completion_tokens, total_tokens
prompt_tokens_details — cached_tokens
completion_tokens_details — reasoning_tokens

Try it

Request

curl https://apis.opengateway.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ]
  }'

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1734567890,
  "model": "openai/gpt-4o-mini",
  "system_fingerprint": "fp_abc123",
  "service_tier": "default",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 10,
    "total_tokens": 30,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0
    }
  }
}