Ofinis

Meet Ofinis

Features

Everything you need to build production AI — without stitching together multiple services. Web search, memory, documents, streaming, and more, all in one API.

🌐

Knowledge & Search

Real-time Web Search

web_search: true

Every model call can be optionally grounded in live web data. The API retrieves, ranks, and summarises current sources before generating a response — eliminating stale training cutoffs.

Cited Responses

citations[]

Sources are surfaced as structured citation objects in the API response, so your UI can display references and users can verify claims.

News & Topic Briefing

/v1/brief

POST to /v1/brief to get a concise, cited summary on any topic — powered by live search results aggregated in real time.

🧠

Memory & Context

Semantic Memory

/v1/memory

Store and retrieve long-term user context across sessions. Facts, preferences, and history are automatically recalled and injected into every conversation.

Context Injection

/v1/context

POST structured data — JSON, markdown, plain text — directly into a session context window. Build RAG pipelines without maintaining a separate vector database.

128 K Token Context

ofinis-2-reason

Enterprise models support up to 128 000 tokens per request — enough for entire codebases, legal documents, or research papers.

📄

Documents & Files

File Ingestion

/v1/files

Upload PDFs, Word documents, Markdown, and plain text via /v1/files. Reference them in any conversation for summarisation, Q&A, or extraction.

Document Q&A

document Q&A

Ask questions against your own uploaded documents. The API handles chunking, retrieval, and citation automatically.

Export & Embeddings

/v1/embeddings

Generate vector embeddings for any text via /v1/embeddings. Use them in your own similarity search, clustering, or classification pipelines.

Performance & Delivery

Streaming (SSE)

stream: true

Set stream: true for Server-Sent Events. First tokens arrive in under 200 ms. Ideal for chat UIs and live dashboards.

Priority Queue

Starter+ tiers

Pro and Enterprise plans bypass shared queues for consistent low-latency responses even under peak load.

Async Jobs

/v1/jobs

Submit long-running tasks — batch completions, large file analysis — as background jobs and poll or receive a webhook when complete.

🤖

Customisation & Personas

Custom AI Personas

/v1/personas

Define reusable system prompts, tone, and capabilities via /v1/personas. Ship product-specific assistants that stay on-brand.

Companion Agents

/v1/companions

Configure long-lived AI companions with persistent memory, character, and goals — ideal for consumer apps, tutors, and support agents.

Verticals

/v1/verticals

Pre-configured domain packs for legal, medical, finance, and more — delivering specialised system prompts and retrieval strategies out of the box.

🔗

Platform & Integrations

Webhooks

/v1/webhooks

Subscribe to events — job completion, memory updates, quota alerts — via /v1/webhooks. Push-based architecture; no polling required.

Usage Analytics

/v1/dashboard

Track token consumption, latency percentiles, error rates, and quota burn via /v1/dashboard. Integrate with your own monitoring stack.

Community Feed

/v1/community

Access Ofinis community posts, trending topics, and shared prompts via /v1/community — great for social AI features.

See it in action

Try the live chat or grab your API key — 50 free calls, no credit card needed.