POST /v1/chat/completions
Create chat completion
Creates a model response for the given chat conversation. Compatible with the OpenAI API.
Headers
Authorization (string, Required): Bearer token for authentication. Format: Bearer YOUR_API_KEY
x-opengateway-user-id (string, Optional): User identifier for analytics and tracking.
x-opengateway-session-id (string, Optional): Session identifier for analytics and tracking.
Request body
model (string, Required): ID of the model to use. See the model list for available models. Use the owner/model format (e.g., openai/gpt-4o).
messages (array, Required): A list of messages comprising the conversation so far.
role (string): The role of the message author: system, user, assistant, or tool.
content (string | array): The contents of the message. Can be a string or an array of content parts for multimodal input (see the example below).
Content part types (when array):
{ type: "text", text: "..." } — Text content
{ type: "image_url", image_url: { url: "...", detail: "auto" } } — Image input (URL or base64). detail: auto | low | high | original (Optional, default: auto)
name (string, Optional): An optional name for the participant.
tool_calls (array, assistant only): Tool calls generated by the model (present in assistant messages).
tool_call_id (string, tool role only): The ID of the tool call this message is responding to.
cache_control (object, Optional): Prompt caching control.
Gemini: converts the marked prefix to cached content.
Anthropic: auto-injected by the gateway.
type: "ephemeral" (required)
ttl: e.g. "600s" (Gemini only, defaults to 10 minutes)
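For illustration, the messages portion of a request body combining a cached system prompt with a multimodal user message might look like this (the system text, image URL, and TTL are placeholders; cache_control is shown at the message level as documented above):

"messages": [
  {
    "role": "system",
    "content": "You are a helpful assistant. (long shared instructions...)",
    "cache_control": { "type": "ephemeral", "ttl": "600s" }
  },
  {
    "role": "user",
    "content": [
      { "type": "text", "text": "What is in this picture?" },
      { "type": "image_url", "image_url": { "url": "https://example.com/photo.png", "detail": "auto" } }
    ]
  }
]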
temperature (number, Optional): Sampling temperature between 0 and 2. Higher values make output more random. Defaults to 1. Reasoning models (o-series) force this to 1.
top_p (number, Optional): Nucleus sampling. The model considers tokens with top_p probability mass. Value between 0 and 1. Defaults to 1. Not supported with reasoning models (o-series).
max_tokens (integer, Optional): The maximum number of tokens to generate in the chat completion.
max_completion_tokens (integer, Optional): An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. Alternative to max_tokens. Recommended for reasoning models.
stream (boolean, Optional): If set to true, partial message deltas will be sent as server-sent events. Defaults to false.
stream_options (object, Optional): Options for the streaming response. Only set this when stream is true.
include_usage (boolean): If set, an additional chunk will be streamed with usage statistics.
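A minimal streaming request, for example, reuses the call from the Request section below with streaming enabled and a final usage chunk requested (curl's -N flag simply disables output buffering):

# -N (--no-buffer) prints server-sent events as they arrive
curl -N https://apis.opengateway.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'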
frequency_penalty (number, Optional): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far. Defaults to 0. Not supported with reasoning models (o-series).
presence_penalty (number, Optional): Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far. Defaults to 0. Not supported with reasoning models (o-series).
stop (string | array, Optional): Up to 4 sequences where the API will stop generating further tokens. Not supported with newer reasoning models (o4-mini, gpt-5).
seed (integer, Optional): If specified, the system will make a best effort to sample deterministically for reproducible outputs.
logprobs (boolean, Optional): Whether to return log probabilities of the output tokens. Defaults to false. Not supported with reasoning models (o-series).
top_logprobs (integer, Optional): An integer between 0 and 20 specifying the number of most likely tokens to return at each position, each with an associated log probability. logprobs must be set to true.
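For example, to return the top 3 candidate tokens and their log probabilities at each position, the request body would include:

"logprobs": true,
"top_logprobs": 3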
tools (array, Optional): A list of tools the model may call. Currently, only functions are supported.
type (string): The type of tool. Currently only function is supported.
function (object): The function definition: name (required), description (optional), parameters (JSON Schema object, optional).
tool_choice (string | object, Optional): Controls which tool is called. none disables tools, auto lets the model decide, required forces a tool call. Or specify a function: { "type": "function", "function": { "name": "my_func" } }.
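As a sketch, a request-body fragment that exposes a single function to the model could look like this (the get_weather name and its parameter schema are hypothetical):

"tools": [
  {
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {
          "city": { "type": "string" }
        },
        "required": ["city"]
      }
    }
  }
],
"tool_choice": "auto"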
response_format (object, Optional): Specifies the format of the output. Set type to json_object for JSON mode, json_schema for schema output where the provider supports it, or text (default).
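For example, JSON mode is requested with { "response_format": { "type": "json_object" } }. Where the provider supports schema output, a json_schema request following the OpenAI-style shape might look like this (the schema name and fields are illustrative):

"response_format": {
  "type": "json_schema",
  "json_schema": {
    "name": "weather_report",
    "schema": {
      "type": "object",
      "properties": {
        "city": { "type": "string" },
        "temp_c": { "type": "number" }
      },
      "required": ["city", "temp_c"]
    }
  }
}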
reasoning_effort (string, Optional): Constrains effort on reasoning for reasoning models. Supported by o-series and gpt-5 models. Accepted values: none, minimal, low, medium, high, xhigh. Supported values vary by model.
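For a reasoning model, the effort level and the completion-token cap can be set together; a sketch of such a request body (the model ID is illustrative):

{
  "model": "openai/gpt-5",
  "messages": [{"role": "user", "content": "Prove that 17 is prime."}],
  "reasoning_effort": "medium",
  "max_completion_tokens": 4096
}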
user (string, Optional): A unique identifier representing your end-user, which can help to monitor and detect abuse.
n (integer, Optional): How many chat completion choices to generate for each input message. Between 1 and 128. Defaults to 1. Not supported with stream: true.
parallel_tool_calls (boolean, Optional): Whether to enable parallel function calling during tool use. Defaults to true.
service_tier (string, Optional): Specifies the latency tier to use for processing the request. auto, default, or flex. When not set, the default service tier is used.
store (boolean, Optional): Provider store flag for providers that support stored chat completion outputs.
metadata (map, Optional): Set of up to 16 key-value pairs that can be attached to the object. Keys are up to 64 characters, values up to 512 characters.
logit_bias (map, Optional): Modify the likelihood of specified tokens appearing in the completion. Maps token IDs to bias values from -100 to 100. Not supported with reasoning models (o-series).
extra (object, Optional): OpenGateway-specific extension parameters.
fallbacks (array): List of fallback model IDs to try if the primary target fails. The resolved target list is capped at three targets including the primary target.
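For example, a request that retries with a second model when the primary target fails (both model IDs are taken from the examples on this page; at most three targets, including the primary, are attempted):

{
  "model": "openai/gpt-4o",
  "messages": [{"role": "user", "content": "Hello!"}],
  "extra": {
    "fallbacks": ["openai/gpt-4o-mini"]
  }
}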
Returns
Returns a chat completion object, or a streamed sequence of chat completion chunk objects if streaming is enabled.
id (string): A unique identifier for the chat completion.
object (string): The object type, always chat.completion.
system_fingerprint (string): This fingerprint represents the backend configuration that the model runs with.
service_tier (string): The service tier used for processing the request.
choices (array): A list of chat completion choices. When n > 1, each choice has a unique index.
index: The index of this choice in the list.
message: The generated message (role, content, tool_calls).
finish_reason: stop | length | tool_calls | content_filter
logprobs: Log probability information (when logprobs=true). Contains:
content[]: Per-token logprob array. Each entry: token, logprob, bytes
content[].top_logprobs[]: Top N candidates (when top_logprobs is set)
refusal: Logprob info for the refusal message, if present
usage (object): Usage statistics for the completion request.
prompt_tokens, completion_tokens, total_tokens
prompt_tokens_details: cached_tokens
completion_tokens_details: reasoning_tokens
Request
curl https://apis.opengateway.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ]
  }'
Response
{"id": "chatcmpl-abc123","object": "chat.completion","created": 1734567890,"model": "openai/gpt-4o-mini","system_fingerprint": "fp_abc123","service_tier": "default","choices": [{"index": 0,"message": {"role": "assistant","content": "Hello! How can I help you today?"},"finish_reason": "stop"}],"usage": {"prompt_tokens": 20,"completion_tokens": 10,"total_tokens": 30,"prompt_tokens_details": {"cached_tokens": 0},"completion_tokens_details": {"reasoning_tokens": 0}}}