Skip to main content

Chat Completions

POST 

/v1/chat/completions

Query LLM using text and, for vision-capable models, image data. This endpoint supports multiple models (up to 5, comma-separated) in the model field. Specify the appropriate request format depending on whether the model is text-based or vision-capable.

An array of responses: will be returned for multi-model queries.

Please refer to the ChatCompletionRequest for text-based models and VisionChatCompletionRequest for vision-capable models for detailed request formats.

Request

Body

Chat Completion query which can include either text or image data based on the model capability. Refer to the appropriate schema based on the model type.

    oneOf
    messages object[]required
  • Array [
  • role string

    Possible values: [assistant, system, user]

    Message role

    content string

    Content of the message

  • ]
  • model stringrequired

    Language Model to use, comma separated, up to 5 models. Check our /models route for list of language models. Cannot use provider option with multi-model.

    provider string

    AI Service Provider to use, omit provider and we will automatically use the most responsive provider. Optionally you can include provider with the model instead such as model: provider/model

    rag_tune string

    Specify the name of the rag tune or vector collection to be used for RAG tuning. This will augment the language model query with information from the specified vector database.

    routing string

    Define how we should route your call when you do not specify a provider and multiple providers exist for that model. Options are price (cheapest), multi-tiered performance (routing based on lowest latency for prompt size) and average_latency. The default is perf (price/perf/perf_avg)

    temperature number

    Possible values: <= 2

    Influences the randomness in the selection process of the next token. Lower values make the model more deterministic, while higher values increase diversity but might reduce coherence.

    top_p number

    Possible values: <= 1

    Controls the cumulative probability distribution cutoff, selecting the smallest set of tokens whose cumulative probability exceeds the threshold p. This focuses generation on more likely tokens, enhancing creativity and coherence.

    top_k integer

    Limits the selection pool to the top k most probable tokens. The probability distribution is then reranked among these k tokens, which helps in reducing randomness by eliminating the least likely options.

    frequency_penalty number

    Possible values: >= -2 and <= 2

    Adjusts the likelihood of a token's selection based on its previous occurrences, decreasing the chances of frequently selected tokens to promote diversity in the output.

    presence_penalty number

    Possible values: >= -2 and <= 2

    Similar to frequency penalty, but it decreases the likelihood of tokens appearing again based on their presence, regardless of frequency, to encourage novel token selection.

    repetition_penalty number

    Possible values: >= 1 and <= 2

    Discourages the model from repeating the same words or phrases, enhancing the uniqueness and variety of the content generated.

    n integer

    Possible values: >= 1 and <= 5

    receive this many responses to your prompt, currently only works with OpenAI direct

    beam_size integer

    Possible values: >= 1 and <= 5

    Used in beam search, it represents the number of sequences to keep at each step of the generation. A larger beam size increases the chances of finding a more optimal sequence but at the cost of computational resources and time. Only some models support this

    max_tokens integer

    Possible values: >= 1

    Max tokens used to generate a response

    stream boolean

    Return response in event-stream

    tools object[]

    Optional tools/functions, all models support tools, serviced by an Open AI or Anthropic model of your choice. Function calls are handled in Open AI standard request/response format.

  • Array [
  • type string

    Possible values: [function]

    function object
    name string

    Name of the function to call

    description string

    Description of the function

    parameters object

    Parameters required by the function

  • ]
  • tool_choice string

    Specifies how the tools are chosen. "none" means the model will not call any tool and instead generates a message, this is default. "auto" means the model can pick between generating a message or calling one or more tools, this is how most devs use it. "required" means the model must call one or more tools.

    tools_model string

    Default value: gpt-4o-mini

    Specifies the model to use for processing tools, pass any OpenAI or Anthropic model, gpt-4o-mini is default.

    integrity integer

    Possible values: [12, 13]

    Integrity setting, can be 12 or 13, used to query and return best of two answers or the best of 3 answers.

    integrity_model string

    Default value: gpt-4o

    Specifies model to use for integrity checks, currently only supports OpenAI models.

    force_provider boolean

    Force request to be routed to the specified provider, otherwise request will be routed to the requested provider only if it is available

Responses

Result of the Query

Schema
    id stringrequired

    Unique identifier for the completion.

    object stringrequired

    Type of the returned object, usually set to chat.completion.

    created integerrequired

    UTC timestamp of when the completion was created.

    provider stringrequired

    Provider of the AI service.

    model stringrequired

    AI model used for generating the completion.

    choices object[]required

    Array of possible completion options generated by the model.

  • Array [
  • index integerrequired

    Index of the choice in the array.

    message objectrequired
    role string

    Role of the message, such as user or assistant.

    content string

    Content of the message.

    logprobs objectnullable

    Log probabilities for the completion, can be null.

    finish_reason stringrequired

    Reason why the model stopped generating text.

  • ]
  • usage objectrequired
    prompt_tokens integerrequired

    Number of tokens used in the prompt.

    completion_tokens integerrequired

    Number of tokens generated in the completion.

    total_tokens integerrequired

    Total number of tokens used in both prompt and completion.

    prompt_characters integerrequired

    Number of characters used in the prompt.

    response_characters integerrequired

    Number of characters in the response.

    cost floatrequired

    Cost associated with the computation of the completion.

    latency_ms integerrequired

    Latency in milliseconds for the completion to be generated.

    system_fingerprint stringrequired

    Unique fingerprint of the system configuration used.

    streaming object

    Object detailing streaming data chunks if stream is true.

    type string

    Type of the streamed object, usually chat.completion.chunk.

    chunks object[]

    Array of data chunks streamed.

  • Array [
  • content string

    Content of the streamed data chunk.

    index integer

    Index of the data chunk in the stream.

    finish_reason string

    Reason for the finish of this particular chunk.

    usage string

    tokens, characters, latency and cost for this query

  • ]
Loading...