Build with Ofinis AI
A developer-first AI API with real-time web search, semantic memory, long-context reasoning, and streaming — all in one endpoint. Ship in minutes, scale to millions.
Quick start — one request away
cURL
curl -X POST https://ofinis-backend-zfzch.ondigitalocean.app/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ofinis-1-fast",
    "messages": [{ "role": "user", "content": "What is the Ofinis API?" }]
  }'
Python
import requests

response = requests.post(
    "https://ofinis-backend-zfzch.ondigitalocean.app/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "ofinis-1",
        "messages": [{"role": "user", "content": "Summarise today's AI news"}],
        "web_search": True,  # ground the answer in live web results
    },
    timeout=60,  # avoid hanging indefinitely on a stalled connection
)
response.raise_for_status()  # surface HTTP errors before parsing the body
print(response.json()["choices"][0]["message"]["content"])
JavaScript / TypeScript
const res = await fetch(
  "https://ofinis-backend-zfzch.ondigitalocean.app/v1/chat/completions",
  {
    method: "POST",
    headers: {
      Authorization: "Bearer YOUR_API_KEY",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "ofinis-1-fast",
      messages: [{ role: "user", content: "Hello!" }],
    }),
  }
);
const data = await res.json();
console.log(data.choices[0].message.content);
Models
Pick the right model for your latency / capability tradeoff. All models support web search grounding.
Ultra-low latency model for simple Q&A, classification, and quick completions.
Balanced model with web search grounding. Great for research assistants and chatbots.
High-capability reasoning model. Best for complex tasks, code generation, and analysis.
Deep multi-step reasoning with extended context. Ideal for document analysis and research pipelines.
What you get
Everything you need to build production AI features — no stitching together multiple services.
Real-time Web Search
Every response can be grounded in live web data. Pass `web_search: true` and the API automatically retrieves, ranks, and cites current sources before generating a response.
Semantic Memory
Store and retrieve long-term user context across sessions. The `/v1/memory` endpoints let you persist facts, preferences, and history — the model recalls them automatically.
File & Document Ingestion
Upload PDFs, Word docs, and text files via `/v1/files`. Reference them in any conversation for summarisation, extraction, or Q&A against your own data.
Streaming Responses
Set `stream: true` for Server-Sent Events. Start rendering tokens in under 200 ms — perfect for chat UIs and real-time dashboards.
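As a rough sketch of consuming such a stream: assuming the events use the OpenAI-style chunk format (`data: {...}` lines carrying new text in `choices[0].delta.content`, ending with `data: [DONE]`), which this page implies but does not spell out, the text fragments could be extracted as follows. `iter_tokens` is a hypothetical helper name, not part of the Ofinis API, and the sample events are hand-written rather than real API output.

```python
import json
from typing import Iterable, Iterator


def iter_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from Server-Sent Event lines.

    Assumption: OpenAI-style chunks, where each event is a `data: {...}`
    line with the new text in choices[0].delta.content, and the stream
    is terminated by `data: [DONE]`.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel (assumed)
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


# Offline demonstration with hand-written sample events.
sample_events = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
streamed_text = "".join(iter_tokens(sample_events))
print(streamed_text)
```

With the `requests` library, the input lines could come from a call made with `stream=True` and then `response.iter_lines(decode_unicode=True)`, so each fragment can be rendered as soon as it arrives.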
Context Injection
POST to `/v1/context` to inject structured context (JSON, markdown, raw text) into any conversation. Build RAG pipelines without a separate vector DB.
Analytics & Usage
Track tokens consumed, latency, error rates, and quota usage via `/v1/dashboard`. Integrate with your own billing or monitoring stack.
Webhooks
Subscribe to events — job completion, memory updates, quota alerts — via `/v1/webhooks`. Push-based architecture means no polling.
Custom AI Personas
Define reusable system prompts and personas via `/v1/personas`. Ship product-specific assistants with consistent tone and capabilities.
Key endpoints
All endpoints follow OpenAI-compatible patterns for easy migration.
`/v1/chat/completions`: Send a message and receive a completion (streaming or batch).
`/v1/init`: Initialise a new conversation session with context and model settings.
`/v1/context`: Inject structured context into an active session.
`/v1/brief`: Generate a concise brief/summary from long content.
`/v1/capabilities`: Returns available models, features, and quota for your API key.
`/v1/dashboard`: Usage stats, token counts, and latency metrics.
`/v1/files`: Upload a document for ingestion and referencing in chat.
`/v1/memory`: Retrieve stored memories for a user session.
`/v1/webhooks`: Register a webhook endpoint for async event delivery.
OpenAI-compatible
The Ofinis API follows the OpenAI Chat Completions schema. If you already use the OpenAI SDK or any OpenAI-compatible client, migration is a one-line base URL change.
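To illustrate what a base URL change amounts to, here is a minimal standard-library sketch that builds a Chat Completions-style request against a configurable base URL. The helper `build_chat_request` and the constant `OFINIS_BASE_URL` are names invented for this example; the URL, headers, and body fields are taken from the quick start above. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Base URL taken from the quick-start examples; the constant name is illustrative.
OFINIS_BASE_URL = "https://ofinis-backend-zfzch.ondigitalocean.app/v1"


def build_chat_request(base_url, api_key, model, messages):
    """Build (but do not send) a Chat Completions-style POST request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Pointing the same client code at a different provider only changes base_url.
request = build_chat_request(
    OFINIS_BASE_URL,
    "YOUR_API_KEY",
    "ofinis-1-fast",
    [{"role": "user", "content": "Hello!"}],
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would perform the call; keeping the base URL as a single parameter is what lets the rest of the client code stay unchanged.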
Ready to build?
Get 50 free API calls — no credit card required. Your key is ready in under 30 seconds.