Meet Ofinis
Features
Everything you need to build production AI — without stitching together multiple services. Web search, memory, documents, streaming, and more, all in one API.
Knowledge & Search
Real-time Web Search
web_search: true
Every model call can optionally be grounded in live web data. The API retrieves, ranks, and summarises current sources before generating a response, so answers are not limited by the model's training cutoff.
Cited Responses
citations[]
Sources are surfaced as structured citation objects in the API response, so your UI can display references and users can verify claims.
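As a rough illustration of how a client might use these two features together, the sketch below builds a request body with web_search enabled and turns a citations[] array into numbered reference lines. Only web_search and citations come from the feature list above; the `input`, `title`, and `url` field names and the sample response are assumptions made for the example, not the documented schema.

```python
import json

def build_request(prompt: str, web_search: bool = True) -> str:
    """Serialise a request body. `input` is a hypothetical field name."""
    return json.dumps({"input": prompt, "web_search": web_search})

def format_citations(response: dict) -> list[str]:
    """Turn citations[] into numbered reference lines for display in a UI."""
    lines = []
    for i, citation in enumerate(response.get("citations", []), start=1):
        # `title` and `url` are assumed keys on each citation object.
        title = citation.get("title", "untitled")
        url = citation.get("url", "")
        lines.append(f"[{i}] {title} {url}".rstrip())
    return lines

# Hand-written sample response, not real API output.
sample_response = {
    "output": "...",
    "citations": [
        {"title": "Example source", "url": "https://example.com/a"},
        {"title": "Another source", "url": "https://example.com/b"},
    ],
}
references = format_citations(sample_response)
```

A response without a citations field simply yields an empty list, so the UI can skip the references section.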
News & Topic Briefing
/v1/brief
POST to /v1/brief to get a concise, cited summary of any topic, built from live search results aggregated in real time.
Memory & Context
Semantic Memory
/v1/memory
Store and retrieve long-term user context across sessions. Facts, preferences, and history are automatically recalled and injected into every conversation.
Context Injection
/v1/context
POST structured data (JSON, Markdown, or plain text) directly into a session's context window. Build RAG pipelines without maintaining a separate vector database.
128K Token Context
ofinis-2-reason
Enterprise models support up to 128,000 tokens per request, enough for entire codebases, legal documents, or research papers.
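Before sending a very large request, a client may want a quick check that the material is likely to fit. The sketch below uses a crude characters-per-token heuristic; the ratio of 4 and the output reserve are assumptions, and an actual tokenizer would give different counts. Only the 128,000-token limit comes from the text above.

```python
MAX_CONTEXT_TOKENS = 128_000  # limit stated in the feature description

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough estimate only; real token counts vary by language and content."""
    return int(len(text) / chars_per_token) + 1

def fits_in_context(parts: list[str], reserve_for_output: int = 4_000) -> bool:
    """Check whether all parts plus an assumed output reserve fit the limit."""
    used = sum(estimate_tokens(part) for part in parts)
    return used + reserve_for_output <= MAX_CONTEXT_TOKENS
```

If the check fails, the caller could split the material across several requests or upload it as a file instead.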
Documents & Files
File Ingestion
/v1/files
Upload PDFs, Word documents, Markdown, and plain text via /v1/files. Reference them in any conversation for summarisation, Q&A, or extraction.
Document Q&A
document Q&A
Ask questions against your own uploaded documents. The API handles chunking, retrieval, and citation automatically.
Export & Embeddings
/v1/embeddings
Generate vector embeddings for any text via /v1/embeddings. Use them in your own similarity search, clustering, or classification pipelines.
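Once you have embedding vectors, similarity search can be done entirely on your side. The sketch below ranks stored vectors against a query vector by cosine similarity. The tiny two-dimensional vectors are made up for illustration; real embeddings would come from /v1/embeddings and have many more dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)

def rank_by_similarity(query: list[float], corpus: dict[str, list[float]]) -> list[str]:
    """Return corpus keys ordered from most to least similar to the query."""
    return sorted(corpus, key=lambda key: cosine_similarity(query, corpus[key]), reverse=True)

# Toy vectors standing in for real embeddings.
corpus = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ranking = rank_by_similarity([1.0, 0.0], corpus)
```

For larger collections, a vector index is usually faster than this linear scan.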
Performance & Delivery
Streaming (SSE)
stream: true
Set stream: true for Server-Sent Events. First tokens arrive in under 200 ms. Ideal for chat UIs and live dashboards.
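On the client side, a Server-Sent Events stream is a sequence of text lines in which `data:` fields accumulate until a blank line ends the event. The sketch below is a minimal parser for that generic wire format; it does not assume anything about what Ofinis puts inside each data payload, and the sample stream is invented.

```python
from typing import Iterable, Iterator

def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "data":
            buffer.append(value)
    # An event not terminated by a blank line is discarded.

# Invented sample stream for illustration.
sample_stream = [": keep-alive\n", "data: Hel\n", "\n", "data: lo\n", "data: world\n", "\n"]
chunks = list(iter_sse_data(sample_stream))
```

In a chat UI, each yielded payload would be decoded and appended to the visible message as it arrives.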
Priority Queue
Starter+ tiers
Pro and Enterprise plans bypass shared queues for consistent low-latency responses even under peak load.
Async Jobs
/v1/jobs
Submit long-running tasks, such as batch completions or large file analysis, as background jobs, then poll for the result or receive a webhook when they complete.
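For the polling option, a client typically checks job status with an increasing delay between attempts. The sketch below shows that pattern with exponential backoff. The `status` field and its `completed` and `failed` values are assumptions, and `fetch_status` is a placeholder for whatever call retrieves the job from /v1/jobs.

```python
import time
from typing import Callable

def wait_for_job(
    fetch_status: Callable[[], dict],
    *,
    max_attempts: int = 8,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal state, doubling the delay each time."""
    delay = base_delay
    for _ in range(max_attempts):
        job = fetch_status()
        # Terminal status names are hypothetical.
        if job.get("status") in ("completed", "failed"):
            return job
        sleep(delay)
        delay = min(delay * 2, max_delay)
    raise TimeoutError("job did not finish within the polling budget")
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. Where a webhook is available, it avoids the polling loop altogether.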
Customisation & Personas
Custom AI Personas
/v1/personas
Define reusable system prompts, tone, and capabilities via /v1/personas. Ship product-specific assistants that stay on-brand.
Companion Agents
/v1/companions
Configure long-lived AI companions with persistent memory, character, and goals. Well suited to consumer apps, tutors, and support agents.
Verticals
/v1/verticals
Pre-configured domain packs for legal, medical, finance, and more, with specialised system prompts and retrieval strategies out of the box.
Platform & Integrations
Webhooks
/v1/webhooks
Subscribe to events such as job completion, memory updates, and quota alerts via /v1/webhooks. Push-based architecture; no polling required.
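A receiving service usually routes each incoming webhook to a handler based on its event type. The sketch below shows one way to do that. The event names (`job.completed`, `quota.alert`) and the `type`/`data` payload shape are assumptions for illustration; the actual names and schema would come from the Ofinis API reference.

```python
import json
from typing import Callable

handlers: dict[str, Callable[[dict], str]] = {}

def on(event_type: str):
    """Decorator that registers a handler for one event type."""
    def register(fn: Callable[[dict], str]):
        handlers[event_type] = fn
        return fn
    return register

@on("job.completed")  # hypothetical event name
def handle_job_completed(data: dict) -> str:
    return f"job {data['id']} finished"

@on("quota.alert")  # hypothetical event name
def handle_quota_alert(data: dict) -> str:
    return f"quota at {data['percent_used']}%"

def dispatch(raw_body: bytes) -> str:
    """Parse a webhook body and route it; unknown event types are ignored."""
    event = json.loads(raw_body)
    handler = handlers.get(event.get("type"))
    if handler is None:
        return "ignored"
    return handler(event.get("data", {}))
```

A production endpoint would also authenticate the sender before dispatching; how Ofinis signs webhook requests is not stated here, so that step is left out.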
Usage Analytics
/v1/dashboard
Track token consumption, latency percentiles, error rates, and quota burn via /v1/dashboard. Integrate with your own monitoring stack.
Community Feed
/v1/community
Access Ofinis community posts, trending topics, and shared prompts via /v1/community, useful for building social AI features.
See it in action
Try the live chat or grab your API key — 50 free calls, no credit card needed.