Chat API

The Chat API allows you to interact with Fluence's conversational AI models. This endpoint is designed for chat-based interactions and supports features such as streaming and function calling.

Endpoint

POST /v1/chat/completions

Request Format

{
  "model": "string",
  "messages": [
    {
      "role": "string",
      "content": "string"
    }
  ],
  "temperature": number,
  "max_tokens": number,
  "stream": boolean
}

Parameters

Parameter     Type      Required  Description
model         string    Yes       The ID of the model to use
messages      array     Yes       Array of message objects
temperature   number    No        Controls randomness (0-2)
max_tokens    number    No        Maximum tokens to generate
stream        boolean   No        Whether to stream the response
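
Assembled as a Python dict, a request body using these parameters might look like the following sketch. Only model and messages are required; the other fields are optional, and the values shown are illustrative ("fluence-chat" is the model used in the examples later on this page).

import json

# Illustrative request body: required fields plus the optional sampling controls.
payload = {
    "model": "fluence-chat",
    "messages": [
        {"role": "user", "content": "Hello, how are you?"}
    ],
    "temperature": 0.7,   # optional: 0-2, higher is more random
    "max_tokens": 256,    # optional: cap on generated tokens
    "stream": False,      # optional: set to True to stream the response
}

print(json.dumps(payload, indent=2))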

Message Object

{
  "role": "string",
  "content": "string"
}

Response Format

{
  "id": "string",
  "object": "chat.completion",
  "created": number,
  "model": "string",
  "choices": [
    {
      "index": number,
      "message": {
        "role": "string",
        "content": "string"
      },
      "finish_reason": "string"
    }
  ],
  "usage": {
    "prompt_tokens": number,
    "completion_tokens": number,
    "total_tokens": number
  }
}
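
A minimal Python sketch for pulling the reply text and token counts out of a response body that has already been parsed into a dict:

def extract_reply(data: dict) -> str:
    """Return the assistant's text from the first choice."""
    return data["choices"][0]["message"]["content"]

def report_usage(data: dict) -> None:
    """Print the token accounting from the usage block."""
    usage = data["usage"]
    print(f"prompt={usage['prompt_tokens']} "
          f"completion={usage['completion_tokens']} "
          f"total={usage['total_tokens']}")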

Example Request

curl https://api.fluence.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fluence-chat",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ],
    "temperature": 0.7
  }'
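
The same request expressed in Python, as a sketch using the third-party requests library (any HTTP client works); YOUR_API_KEY is a placeholder, exactly as in the curl example above.

import requests  # third-party: pip install requests

API_KEY = "YOUR_API_KEY"  # placeholder

response = requests.post(
    "https://api.fluence.ai/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "fluence-chat",
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
        "temperature": 0.7,
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])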

Example Response

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "fluence-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'm doing well, thank you for asking! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
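
When stream is set to true, the response arrives incrementally instead of as a single object like the one above. The exact chunk format is not documented on this page; the sketch below assumes server-sent events whose data: lines carry JSON chunks with a delta field, a common convention for chat completion APIs, so treat it as a starting point rather than a specification.

import json
import requests  # third-party: pip install requests

# Assumed wire format: server-sent events, one "data: {...}" line per chunk,
# terminated by "data: [DONE]". Adjust to the actual streaming format.
with requests.post(
    "https://api.fluence.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "fluence-chat",
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
        "stream": True,
    },
    stream=True,
    timeout=30,
) as response:
    response.raise_for_status()
    for line in response.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        # The "delta" field is an assumption about the chunk schema.
        print(chunk["choices"][0].get("delta", {}).get("content", ""), end="", flush=True)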

Error Codes

Status Code  Error Code            Description
400          invalid_request       The request was invalid
401          authentication_error  Authentication failed
429          rate_limit_exceeded   Rate limit exceeded
500          server_error          Internal server error
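
A sketch of mapping these status codes to client-side behavior; the structure of the error response body is not documented here, so only the status code is inspected.

import requests  # third-party: pip install requests

def handle_error(response: requests.Response) -> None:
    """Raise a descriptive error based on the documented status codes."""
    if response.ok:
        return
    if response.status_code == 400:
        raise ValueError(f"invalid_request: check the payload ({response.text})")
    if response.status_code == 401:
        raise PermissionError("authentication_error: check your API key")
    if response.status_code == 429:
        # rate_limit_exceeded: back off and retry (see Best Practices below).
        raise RuntimeError("rate_limit_exceeded")
    if response.status_code >= 500:
        # server_error: usually transient, safe to retry.
        raise RuntimeError("server_error")
    raise RuntimeError(f"unexpected status {response.status_code}")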

Rate Limits

  • 100 requests per minute
  • 1000 requests per hour

Best Practices

  1. Always include error handling in your implementation
  2. Use streaming for real-time responses
  3. Implement retry logic with exponential backoff (see the sketch after this list)
  4. Cache responses when appropriate
  5. Monitor your token usage
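
A minimal sketch of retry logic with exponential backoff (best practice 3), retrying on the rate-limit and server-error status codes listed above; the delays and retry count are illustrative, not prescribed by the API.

import random
import time

import requests  # third-party: pip install requests

def post_with_backoff(url: str, headers: dict, payload: dict,
                      max_retries: int = 5) -> requests.Response:
    """POST with exponential backoff on 429 and 5xx responses."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=payload, timeout=30)
        if response.status_code != 429 and response.status_code < 500:
            return response
        # Exponential backoff with jitter: 1s, 2s, 4s, ... plus up to 1s of noise.
        time.sleep(2 ** attempt + random.random())
    raise RuntimeError("request kept failing after retries; giving up")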