One control plane for all your inference ops
Retanu sits between your apps and the inference providers — picking the right model, enforcing budgets, and tracking every request automatically.
Automatic model selection
Every request is classified as simple or complex. Simple tasks go to cost-efficient models. Complex tasks go to stronger ones. The system picks the cheapest provider in the quality tier you set, and fails over automatically if a provider is down or slow.
- 3 quality tiers — quality-first, balanced, cost-optimized
- Fully automatic — classify, route, and failover on every request
- ~2 s median response — measured end-to-end
- 4 providers — OpenAI, Anthropic, DeepInfra, Google
Per-workspace configuration
Each workspace gets its own configuration — quality tier, provider preferences, budget cap, and data policy. Change a setting and the next request uses the new rules. No redeploy, no code change.
- Per-workspace controls — tier, providers, budget, limits, model lists
- Instant changes — new settings apply on the next request
- Isolated credentials — one endpoint and key per workspace
Budget and safety controls
Spend caps
Set a monthly budget per workspace. When it's hit, requests are blocked — not just logged. 98% budget accuracy.
Rate limits
Per-workspace request limits enforced before any inference call is made. Prevents runaway usage from a single app or team.
Fail-closed design
If the budget check can't run (database issue, network problem), the request is rejected. No silent overspend.
Usage tracking and reporting
Every request is tracked with token counts and cost. Usage rolls up into per-workspace reports you can generate with your own rates applied. You use your own provider keys — providers bill you, and you can re-bill workspaces at your own rates.
Real-time spend
See costs per workspace as requests flow through — not after the monthly bill.
Custom reports
Generate usage reports with your own rate card and markup applied.
Your keys, your pricing
You connect your own provider keys. Provider charges land on your account. You decide how to price it for your teams or clients.
Data isolation
Every organization's data is separated at the database level using row-level security. One organization cannot access another's data, keys, or logs. This isolation is verified by 16 automated tests that run continuously.
Database-enforced isolation
Security policies on every data table. The application cannot bypass them — it's enforced by the database itself.
Separate credentials
Each workspace gets its own API key and endpoint. Keys, configuration, and logs never cross organization boundaries.
Drop-in API
Retanu uses the same API format as OpenAI. If your app already calls OpenAI, it works with Retanu — just change the URL and key. No code rewrite needed.
- OpenAI-compatible — same request/response format
- Request tagging — add metadata to track usage by department or project
- Health checks — monitoring endpoints included
- Standard auth — Bearer token scoped to each workspace
See it in action
Sign in to get your organization set up. Run Retanu yourself, or let Black Gibbon Tech Pod be your ops team.