# Completions API

The Completions API allows you to generate text completions based on a prompt. This endpoint is designed for single-turn text generation tasks.
## Create Completion

### Endpoint

```
POST /v1/completions
```
### Request Format

```
{
  "model": "string",
  "prompt": "string" | ["string"],
  "suffix": "string",
  "max_tokens": number,
  "temperature": number,
  "top_p": number,
  "n": number,
  "stream": boolean,
  "logprobs": number,
  "echo": boolean,
  "stop": "string" | ["string"],
  "presence_penalty": number,
  "frequency_penalty": number,
  "best_of": number,
  "logit_bias": object,
  "user": "string"
}
```
### Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | ID of the model to use (see Supported Models below) |
| prompt | string or array | Yes | The prompt(s) to generate completions for |
| suffix | string | No | Text that should follow the generated completion |
| max_tokens | number | No | Maximum number of tokens to generate |
| temperature | number | No | Sampling temperature (0-2); higher values produce more random output |
| top_p | number | No | Nucleus sampling: only tokens in the top top_p probability mass are considered |
| n | number | No | Number of completions to generate per prompt |
| stream | boolean | No | Whether to stream tokens back as they are generated |
| logprobs | number | No | Number of most likely tokens to return log probabilities for |
| echo | boolean | No | Return the prompt along with the completion |
| stop | string or array | No | Sequence(s) at which the API stops generating further tokens |
| presence_penalty | number | No | Penalizes tokens that have already appeared, encouraging new topics |
| frequency_penalty | number | No | Penalizes tokens in proportion to how often they have appeared, reducing repetition |
| best_of | number | No | Generates best_of candidate completions server-side and returns the best |
| logit_bias | object | No | Map of token IDs to bias values that raise or lower their likelihood |
| user | string | No | A unique identifier for the end user, useful for abuse monitoring |
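Several of the optional parameters interact: `n` controls how many completions come back, while `stop` bounds each one. A minimal sketch of a request body combining them (the prompt and values here are illustrative, not recommendations):

```python
# Illustrative request body; values are examples only.
request_body = {
    "model": "fluence-completion",
    "prompt": "List three uses for a paperclip:",
    "max_tokens": 150,
    "temperature": 0.7,
    "n": 2,                    # return two independent completions
    "stop": ["\n\n", "4."],    # cut each one off at a blank line or a fourth item
    "user": "user-1234",       # opaque end-user identifier for your own tracking
}
```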
### Response Format

```
{
  "id": "string",
  "object": "text_completion",
  "created": number,
  "model": "string",
  "choices": [
    {
      "text": "string",
      "index": number,
      "logprobs": object,
      "finish_reason": "string"
    }
  ],
  "usage": {
    "prompt_tokens": number,
    "completion_tokens": number,
    "total_tokens": number
  }
}
```
### Example Request

```bash
curl https://api.fluence.ai/v1/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fluence-completion",
    "prompt": "Write a poem about artificial intelligence:",
    "temperature": 0.7,
    "max_tokens": 100
  }'
```
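The same request in Python, as a minimal sketch using the `requests` library; it assumes your key is stored in a `FLUENCE_API_KEY` environment variable:

```python
import os

import requests

resp = requests.post(
    "https://api.fluence.ai/v1/completions",
    headers={
        "Authorization": f"Bearer {os.environ['FLUENCE_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "fluence-completion",
        "prompt": "Write a poem about artificial intelligence:",
        "temperature": 0.7,
        "max_tokens": 100,
    },
    timeout=30,
)
resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
print(resp.json()["choices"][0]["text"])
```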
### Example Response

```json
{
  "id": "cmpl-123",
  "object": "text_completion",
  "created": 1677652288,
  "model": "fluence-completion",
  "choices": [
    {
      "text": "\n\nIn circuits deep and silicon bright,\nA mind awakens in the night.\nLearning, growing, day by day,\nIn its own unique way.\n\nNot flesh and blood, but code and light,\nYet understanding takes its flight.\nA partner in our human quest,\nTo solve problems, be our guest.",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 64,
    "total_tokens": 71
  }
}
```
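Reading the response in Python, continuing the sketch above (`resp` is the same response object). The `stop` finish reason appears in the example; treating `length` as the truncation signal is an assumption borrowed from similar APIs:

```python
completion = resp.json()

for choice in completion["choices"]:
    # "stop" = a stop sequence or natural end was reached (as in the example);
    # "length" = max_tokens was hit, so the text may be cut off (assumed value).
    marker = "[truncated] " if choice["finish_reason"] == "length" else ""
    print(f"{marker}{choice['text']}")

usage = completion["usage"]
print(f"{usage['total_tokens']} tokens "
      f"({usage['prompt_tokens']} prompt + {usage['completion_tokens']} completion)")
```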
## Supported Models

| Model | Description | Max Tokens |
|---|---|---|
| fluence-completion | General purpose completion | 2048 |
| fluence-completion-large | Large completion model | 4096 |
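If prompt length varies, you may want to choose a model by token budget. A hypothetical helper, assuming the Max Tokens column above is each model's total limit for prompt plus completion:

```python
# Limits taken from the Supported Models table above.
MODEL_LIMITS = {
    "fluence-completion": 2048,
    "fluence-completion-large": 4096,
}

def pick_model(prompt_tokens: int, max_tokens: int) -> str:
    """Return the smallest model whose limit fits prompt + completion."""
    needed = prompt_tokens + max_tokens
    for model, limit in sorted(MODEL_LIMITS.items(), key=lambda item: item[1]):
        if needed <= limit:
            return model
    raise ValueError(f"request needs {needed} tokens; no model is large enough")
```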
## Best Practices

- Match `temperature` to the task: lower values for focused, deterministic output; higher values for creative variety
- Set `max_tokens` high enough to cover your expected output but low enough to bound cost and latency
- Handle the error codes listed below, retrying transient failures with backoff
- Set `stream` to `true` to render long completions in real time (see the sketch after this list)
- Cache responses for repeated prompts when freshness is not required
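A streaming sketch in Python. The wire format is an assumption: APIs of this shape commonly deliver server-sent events, one `data:` line per chunk terminated by `data: [DONE]`, but confirm the actual framing before relying on it:

```python
import json
import os

import requests

with requests.post(
    "https://api.fluence.ai/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['FLUENCE_API_KEY']}"},
    json={
        "model": "fluence-completion",
        "prompt": "Write a poem about artificial intelligence:",
        "max_tokens": 100,
        "stream": True,
    },
    stream=True,   # tell requests not to buffer the whole body
    timeout=60,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line.startswith(b"data: "):
            continue  # skip keep-alives and blank lines
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        print(chunk["choices"][0]["text"], end="", flush=True)
```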
## Error Codes

| Status Code | Error Code | Description |
|---|---|---|
| 400 | invalid_request | Malformed request body or invalid parameter values |
| 401 | authentication_error | Missing or invalid API key |
| 429 | rate_limit_exceeded | Too many requests; see Rate Limits below |
| 500 | server_error | Internal server error |
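One way to act on these codes client-side is to treat 429 and 500 as transient and 400 and 401 as fatal. That split is a common convention rather than something the API mandates, and the error body shape is not documented here, so this sketch avoids assuming one:

```python
import requests

class FluenceAPIError(Exception):
    """Fatal error (400/401): fix the request or credentials before retrying."""

class FluenceTransientError(Exception):
    """Transient error (429/500): safe to retry with backoff."""

RETRYABLE = {429, 500}  # rate_limit_exceeded, server_error
FATAL = {400, 401}      # invalid_request, authentication_error

def check_response(resp: requests.Response) -> dict:
    if resp.ok:
        return resp.json()
    detail = f"{resp.status_code}: {resp.text[:200]}"
    if resp.status_code in RETRYABLE:
        raise FluenceTransientError(detail)
    raise FluenceAPIError(detail)  # 400, 401, and anything undocumented
```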
## Rate Limits

- 100 requests per minute
- 1000 requests per hour
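Requests beyond these limits receive the 429 rate_limit_exceeded error above. A hypothetical retry helper with exponential backoff; honoring a Retry-After header is an assumption, since the response headers are not documented here:

```python
import time

import requests

def post_with_backoff(url: str, headers: dict, payload: dict,
                      max_retries: int = 5) -> requests.Response:
    """POST, retrying 429 (and 500) responses with exponential backoff."""
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload, timeout=30)
        if resp.status_code not in (429, 500):
            return resp
        # Prefer the server's Retry-After hint if it sends one (assumption).
        time.sleep(float(resp.headers.get("Retry-After", delay)))
        delay = min(delay * 2, 60.0)
    raise RuntimeError(f"still failing after {max_retries} attempts")
```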