REST API — Available now

Build with Ofinis AI

A developer-first AI API with real-time web search, semantic memory, long-context reasoning, and streaming — all in one endpoint. Ship in minutes, scale to millions.

Quick start — one request away

cURL

bash

curl -X POST https://ofinis-backend-zfzch.ondigitalocean.app/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ofinis-1-fast",
    "messages": [{ "role": "user", "content": "What is the Ofinis API?" }]
  }'

Python

python

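import requests

# Ask ofinis-1 for an answer grounded in live web results.
response = requests.post(
    "https://ofinis-backend-zfzch.ondigitalocean.app/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "ofinis-1",
        "messages": [{"role": "user", "content": "Summarise today's AI news"}],
        "web_search": True,  # enable web search grounding for this request
    },
    timeout=30,  # avoid hanging indefinitely on a stalled connection
)
response.raise_for_status()  # surface HTTP 4xx/5xx errors before parsing the body
print(response.json()["choices"][0]["message"]["content"])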

JavaScript / TypeScript

javascript

const res = await fetch(
  "https://ofinis-backend-zfzch.ondigitalocean.app/v1/chat/completions",
  {
    method: "POST",
    headers: {
      Authorization: "Bearer YOUR_API_KEY",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "ofinis-1-fast",
      messages: [{ role: "user", content: "Hello!" }],
    }),
  }
);
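// Fail loudly on HTTP errors instead of reading fields from an error body.
if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
const data = await res.json();
console.log(data.choices[0].message.content);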

Models

Pick the right model for your latency / capability tradeoff. All models support web search grounding.

ofinis-1-fast (Free)

Ultra-low latency model for simple Q&A, classification, and quick completions.

Context: 2 K tokens | P50: ~0.4 s

ofinis-1 (Starter+)

Balanced model with web search grounding. Great for research assistants and chatbots.

Context: 8 K tokens | P50: ~1.2 s

ofinis-2 (Pro+)

High-capability reasoning model. Best for complex tasks, code generation, and analysis.

Context: 32 K tokens | P50: ~2.5 s

ofinis-2-reason (Enterprise)

Deep multi-step reasoning with extended context. Ideal for document analysis and research pipelines.

Context: 128 K tokens | P50: ~6 s
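
The context limits listed for each model can drive model selection in code. The following is a minimal sketch, assuming "K" means 1,000 tokens and that you estimate the prompt size yourself. `MODEL_CONTEXT` and `pick_model` are illustrative names, not part of the Ofinis API.

```python
# Context windows as listed on this page, smallest first.
# Assumption: "K" is read as 1,000 tokens.
MODEL_CONTEXT = [
    ("ofinis-1-fast", 2_000),
    ("ofinis-1", 8_000),
    ("ofinis-2", 32_000),
    ("ofinis-2-reason", 128_000),
]


def pick_model(estimated_tokens: int) -> str:
    """Return the smallest listed model whose context window fits the prompt."""
    for name, context in MODEL_CONTEXT:
        if estimated_tokens <= context:
            return name
    raise ValueError(
        f"{estimated_tokens} tokens exceeds the largest listed context window"
    )


print(pick_model(1_500))   # ofinis-1-fast
print(pick_model(20_000))  # ofinis-2
```

Because the listed P50 latency grows with context size, the smallest model that fits is also the fastest of the candidates. Plan tier is not checked here; a model above your plan would still need to be filtered out.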

What you get

Everything you need to build production AI features — no stitching together multiple services.

🌐

Real-time Web Search

Every response can be grounded in live web data. Pass `web_search: true` and the API automatically retrieves, ranks, and cites current sources before generating a response.

🧠

Semantic Memory

Store and retrieve long-term user context across sessions. The `/v1/memory` endpoints let you persist facts, preferences, and history — the model recalls them automatically.

📄

File & Document Ingestion

Upload PDFs, Word docs, and text files via `/v1/files`. Reference them in any conversation for summarisation, extraction, or Q&A against your own data.

⚡

Streaming Responses

Set `stream: true` for Server-Sent Events. Start rendering tokens in under 200 ms — perfect for chat UIs and real-time dashboards.
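
A streamed response has to be parsed line by line on the client. The sketch below assumes the stream uses the OpenAI-style SSE layout (`data: {json}` lines carrying `choices[0].delta.content`, ended by `data: [DONE]`). The page's compatibility claim suggests this layout but does not spell it out, so treat the chunk shape and the sentinel as assumptions. `iter_stream_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (format is an assumption)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


# Simulated stream so the sketch runs without network access.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # Hello
```

With `requests`, the same function could consume `(line.decode("utf-8") for line in response.iter_lines())` from a request made with `stream=True`.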

🔗

Context Injection

POST to `/v1/context` to inject structured context (JSON, markdown, raw text) into any conversation. Build RAG pipelines without a separate vector DB.

📊

Analytics & Usage

Track tokens consumed, latency, error rates, and quota usage via `/v1/dashboard`. Integrate with your own billing or monitoring stack.

🪝

Webhooks

Subscribe to events — job completion, memory updates, quota alerts — via `/v1/webhooks`. Push-based architecture means no polling.

🤖

Custom AI Personas

Define reusable system prompts and personas via `/v1/personas`. Ship product-specific assistants with consistent tone and capabilities.

Key endpoints

All endpoints follow OpenAI-compatible patterns for easy migration.

POST /v1/chat/completions: Send a message and receive a completion (streaming or batch).

POST /v1/init: Initialise a new conversation session with context and model settings.

POST /v1/context: Inject structured context into an active session.

POST /v1/brief: Generate a concise brief/summary from long content.

GET /v1/capabilities: Returns available models, features, and quota for your API key.

GET /v1/dashboard: Usage stats, token counts, and latency metrics.

POST /v1/files: Upload a document for ingestion and referencing in chat.

GET /v1/memory: Retrieve stored memories for a user session.

POST /v1/webhooks: Register a webhook endpoint for async event delivery.
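
The endpoints above share a host and, presumably, the same Bearer authentication shown in the quick start. A minimal sketch of a request builder under that assumption follows. `build_request` is a hypothetical helper, it uses only the Python standard library, and it builds the request without sending it. Request bodies for endpoints other than `/v1/chat/completions` are not documented on this page, so none are shown.

```python
import json
import urllib.request
from typing import Optional

# Host taken from the quick start examples.
BASE_URL = "https://ofinis-backend-zfzch.ondigitalocean.app"


def build_request(method: str, path: str, api_key: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for a listed endpoint."""
    # Assumption: every endpoint accepts the same Bearer token header.
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# GET /v1/capabilities is assumed to take no body.
req = build_request("GET", "/v1/capabilities", "YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

To send it, pass the result to `urllib.request.urlopen(req)` and decode the JSON response.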

View full interactive API documentation ↗

OpenAI-compatible

The Ofinis API follows the OpenAI Chat Completions schema. If you already use the OpenAI SDK or another client that speaks this schema, migration is a one-line base URL change.
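
Because requests follow the Chat Completions schema, the host can be treated as a parameter. The sketch below illustrates the idea with the standard library only. `chat_request` is a hypothetical helper, and in practice the API key and model name change along with the base URL.

```python
import json
import urllib.request

OPENAI_BASE = "https://api.openai.com/v1"
OFINIS_BASE = "https://ofinis-backend-zfzch.ondigitalocean.app/v1"


def chat_request(base_url: str, api_key: str, model: str,
                 prompt: str) -> urllib.request.Request:
    """Build (but do not send) a Chat Completions request for a compatible host."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Swapping OPENAI_BASE for OFINIS_BASE is the only structural change.
ofinis_req = chat_request(OFINIS_BASE, "YOUR_API_KEY", "ofinis-1-fast", "Hello!")
print(ofinis_req.full_url)
```

SDKs that accept a configurable base URL apply the same idea: the request shape stays fixed and only the host changes.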

Python SDK, Node.js SDK, REST, LangChain, LlamaIndex, Vercel AI SDK

Ready to build?

Get 50 free API calls — no credit card required. Your key is ready in under 30 seconds.
