Completions API

The Completions API allows you to generate text completions based on a prompt. This endpoint is designed for single-turn text generation tasks.

Create Completion

Endpoint

POST /v1/completions

Request Format

```
{
  "model": "string",
  "prompt": "string" | ["string"],
  "suffix": "string",
  "max_tokens": number,
  "temperature": number,
  "top_p": number,
  "n": number,
  "stream": boolean,
  "logprobs": number,
  "echo": boolean,
  "stop": "string" | ["string"],
  "presence_penalty": number,
  "frequency_penalty": number,
  "best_of": number,
  "logit_bias": object,
  "user": "string"
}
```

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The ID of the model to use |
| prompt | string or array | Yes | The prompt(s) to generate from |
| suffix | string | No | The suffix that comes after a completion |
| max_tokens | number | No | Maximum tokens to generate |
| temperature | number | No | Controls randomness (0-2) |
| top_p | number | No | Controls diversity via nucleus sampling |
| n | number | No | Number of completions to generate |
| stream | boolean | No | Whether to stream the response |
| logprobs | number | No | Include log probabilities |
| echo | boolean | No | Echo the prompt in the output |
| stop | string or array | No | Stop sequences |
| presence_penalty | number | No | Penalize new tokens based on presence |
| frequency_penalty | number | No | Penalize new tokens based on frequency |
| best_of | number | No | Generate best_of completions |
| logit_bias | object | No | Modify likelihood of specified tokens |
| user | string | No | A unique identifier for the end-user |
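The required/optional split above can be mirrored in client code. A minimal Python sketch (the `build_completion_request` helper is illustrative, not part of any official SDK) that includes only the parameters you actually set, so server defaults apply for the rest:

```python
def build_completion_request(model, prompt, **options):
    """Build a /v1/completions request body.

    model and prompt are required; all other parameters are optional
    and only included when explicitly set.
    """
    allowed = {
        "suffix", "max_tokens", "temperature", "top_p", "n", "stream",
        "logprobs", "echo", "stop", "presence_penalty",
        "frequency_penalty", "best_of", "logit_bias", "user",
    }
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    body = {"model": model, "prompt": prompt}
    # Skip options left as None so they never reach the wire.
    body.update({k: v for k, v in options.items() if v is not None})
    return body
```

For example, `build_completion_request("fluence-completion", "Hello", temperature=0.7)` produces a body with exactly three keys.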

Response Format

```
{
  "id": "string",
  "object": "text_completion",
  "created": number,
  "model": "string",
  "choices": [
    {
      "text": "string",
      "index": number,
      "logprobs": object,
      "finish_reason": "string"
    }
  ],
  "usage": {
    "prompt_tokens": number,
    "completion_tokens": number,
    "total_tokens": number
  }
}
```

Example Request

```bash
curl https://api.fluence.ai/v1/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fluence-completion",
    "prompt": "Write a poem about artificial intelligence:",
    "temperature": 0.7,
    "max_tokens": 100
  }'
```

Example Response

```json
{
  "id": "cmpl-123",
  "object": "text_completion",
  "created": 1677652288,
  "model": "fluence-completion",
  "choices": [
    {
      "text": "\n\nIn circuits deep and silicon bright,\nA mind awakens in the night.\nLearning, growing, day by day,\nIn its own unique way.\n\nNot flesh and blood, but code and light,\nYet understanding takes its flight.\nA partner in our human quest,\nTo solve problems, be our guest.",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 64,
    "total_tokens": 71
  }
}
```
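A response like the one above can be unpacked in a few lines. This sketch assumes the JSON body has already been decoded into a dict, and assumes the API follows the common convention of reporting `length` as the finish reason when `max_tokens` cuts a completion off (the table above does not list finish-reason values, so treat that as an assumption):

```python
def extract_completions(response):
    """Return (texts, truncated) from a decoded completions response.

    texts: the generated text of each choice, in index order.
    truncated: True if any choice stopped because it hit max_tokens
    (assumed finish_reason value: "length").
    """
    choices = response["choices"]
    texts = [choice["text"] for choice in choices]
    truncated = any(c["finish_reason"] == "length" for c in choices)
    return texts, truncated
```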

Supported Models

| Model | Description | Max Tokens |
|---|---|---|
| fluence-completion | General purpose completion | 2048 |
| fluence-completion-large | Large completion model | 4096 |
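If the Max Tokens column is the combined prompt-plus-completion budget (the table does not say; adjust if it is an output-only cap), a client-side check can catch oversized requests before they are sent. A sketch, with the limits hard-coded from the table above and token counts assumed to come from your own tokenizer:

```python
# Limits copied from the Supported Models table.
MODEL_LIMITS = {
    "fluence-completion": 2048,
    "fluence-completion-large": 4096,
}

def fits_in_context(model, prompt_tokens, max_tokens):
    """Check that prompt tokens plus requested completion tokens
    fit within the model's token limit (assumed to be a combined budget)."""
    limit = MODEL_LIMITS.get(model)
    if limit is None:
        raise KeyError(f"unknown model: {model}")
    return prompt_tokens + max_tokens <= limit
```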

Best Practices

  1. Match temperature to your use case: lower values (e.g. 0-0.3) give more deterministic output; higher values (e.g. 0.7-1.0) give more varied, creative output
  2. Set a reasonable max_tokens limit to control cost and latency
  3. Handle errors explicitly, and retry only errors that are retryable (see Error Codes below)
  4. Use streaming (stream: true) when you need tokens as they are generated, such as in interactive UIs
  5. Cache responses for repeated prompts when freshness is not required
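Error handling (point 3) typically means retrying transient failures with exponential backoff. A minimal sketch, assuming your transport raises an error carrying the HTTP status (the `APIError` class and `with_retries` wrapper here are illustrative, not part of any SDK); `sleep` is injectable so the schedule can be tested:

```python
import random
import time

# Statuses worth retrying, per the Error Codes table below.
RETRYABLE_STATUSES = {429, 500}

class APIError(Exception):
    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status

def with_retries(call, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run call(), retrying retryable errors with exponential backoff.

    The delay doubles each attempt, with light jitter; non-retryable
    statuses (400, 401) are raised immediately.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except APIError as err:
            if err.status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) * (1 + random.random() * 0.1)
            sleep(delay)
```

Wrapping the actual HTTP request in a zero-argument callable keeps the retry logic independent of any particular HTTP client.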

Error Codes

| Status Code | Error Code | Description |
|---|---|---|
| 400 | invalid_request | The request was invalid |
| 401 | authentication_error | Authentication failed |
| 429 | rate_limit_exceeded | Rate limit exceeded |
| 500 | server_error | Internal server error |

Rate Limits

  • 100 requests per minute
  • 1000 requests per hour
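To stay under these caps client-side rather than waiting for 429 responses, a sliding-window counter is enough. A sketch (illustrative, with an injectable clock so it can be tested without real waiting) that enforces one of the limits; in practice you would run one instance for the per-minute cap and one for the per-hour cap:

```python
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests in any `window`-second interval."""

    def __init__(self, limit, window, clock):
        self.limit = limit
        self.window = window
        self.clock = clock  # callable returning current time in seconds
        self.timestamps = deque()

    def allow(self):
        """Return True and record the request if it is within the limit."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.limit:
            return False
        self.timestamps.append(now)
        return True
```

For the documented per-minute limit: `SlidingWindowLimiter(limit=100, window=60, clock=time.monotonic)`.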