Build on the NaraRouter API
One stable, OpenAI-compatible endpoint across leading models. Point any OpenAI SDK at our base URL, authenticate with your key, and ship.
Quickstart
Three steps to your first response: create a key, set the base URL, send a request. The API speaks the OpenAI Chat Completions format, so existing OpenAI clients work by changing two lines.
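1. Create a key
Create a key from the dashboard. Copy the full secret right away; it is shown only once.
2. Set the base URL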
Send all requests to the gateway base URL. The chat endpoint lives at /v1/chat/completions.
https://router.naraya.ai/v1
3. Send your first request
Use cURL or any OpenAI-compatible SDK. Pass a model alias (see Models), your messages, and your key as a Bearer token.
curl https://router.naraya.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-nry-xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-3.2",
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'
Because the API is OpenAI-compatible, the official OpenAI SDKs work unchanged apart from the base URL and key; no provider-specific client is required.
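If you would rather not add any dependency, the same request can be sent with Python's standard library. This is a minimal sketch, not an official client: `build_chat_request` and `send` are illustrative helper names, and the key is whatever you pass in (ideally read from the environment on a server).

```python
import json
import urllib.request

BASE_URL = "https://router.naraya.ai/v1"


def build_chat_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build a POST to /v1/chat/completions with a Bearer token and a JSON body."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON chat completion object."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `reply = send(build_chat_request(key, "deepseek-3.2", [{"role": "user", "content": "Hello!"}]))`, then read `reply["choices"][0]["message"]["content"]`. Non-2xx responses raise `urllib.error.HTTPError`, whose body carries the JSON error envelope described under Errors.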
Authentication & keys
Every request to the API must carry your secret key in the Authorization header as a Bearer token.
Authorization header
Authorization: Bearer sk-nry-xxxxxxxxxxxxxxxxxxxxxxxxxxxxKey format
Keys are prefixed with sk-nry- followed by a random secret. The full secret is returned only once, at creation or rotation; afterwards only a masked form is shown.
Rotation & revocation
Manage keys from the dashboard. You can rotate a key (issue a new secret and invalidate the old one), revoke a key (it stays listed but no longer authenticates), or delete it. Rotated and revoked secrets stop working immediately.
Keep keys secret
Treat keys like passwords. Never embed them in client-side code, mobile apps, or public repositories. Use server-side environment variables and rotate any key you suspect is exposed.
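A minimal sketch of the server-side pattern: read the key from an environment variable and fail fast if it is missing or lacks the sk-nry- prefix. The variable name `NARAROUTER_API_KEY` is hypothetical; use whatever name your deployment defines.

```python
import os

KEY_PREFIX = "sk-nry-"
ENV_VAR = "NARAROUTER_API_KEY"  # hypothetical variable name


def load_api_key(env=None) -> str:
    """Return the API key from the environment, or raise if absent or malformed."""
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set")
    if not key.startswith(KEY_PREFIX) or len(key) == len(KEY_PREFIX):
        # Never echo the value itself: error messages tend to end up in logs.
        raise RuntimeError(f"{ENV_VAR} does not look like a NaraRouter key")
    return key
```

Checking only the prefix keeps the helper honest: the gateway is the sole authority on whether a key is valid, rotated, or revoked.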
Chat completions
Send a list of messages and receive a model response. This is the primary endpoint and follows the OpenAI Chat Completions schema.
Endpoint
POST https://router.naraya.ai/v1/chat/completions
Request parameters
The gateway routes on the fields below; additional OpenAI-compatible fields in the body are forwarded to the model as-is.
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | A model alias from the Models list. Determines routing and which plan tier may use it. |
| messages | array | Yes | An array of message objects, each with a role (system, user, or assistant) and content. At least one message is required. |
| stream | boolean | No | When true, the response is delivered as Server-Sent Events. When false (default), a single JSON object is returned. |
| temperature | number | No | Sampling temperature. Forwarded to the model unchanged. |
| max_tokens | integer | No | Upper bound on output tokens. Must be non-negative. Used to size the request; the model honors it as a ceiling. |
Request
{
  "model": "deepseek-3.2",
  "messages": [
    { "role": "system", "content": "You are helpful." },
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 0.7,
  "max_tokens": 256
}
Response
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "deepseek-3.2",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 18, "completion_tokens": 8, "total_tokens": 26 }
}
On success you receive a standard chat completion object. The model field in the response is the public alias you requested.
The gateway validates model, messages, and max_tokens before forwarding. Streaming requests against a model that does not support streaming are rejected.
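To catch a 400 validation_error before it costs a round trip, you can mirror the documented rules on the client. This is a sketch only: it encodes the constraints from the parameter table (required model and messages, the three listed roles, non-negative max_tokens, boolean stream) and is not the gateway's actual validator, which may check more.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_body(body: dict) -> list:
    """Return a list of problems; an empty list means the body passes these checks."""
    problems = []
    model = body.get("model")
    if not isinstance(model, str) or not model:
        problems.append("model must be a non-empty string")
    messages = body.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty array")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict):
                problems.append(f"messages[{i}] must be an object")
                continue
            if msg.get("role") not in ALLOWED_ROLES:
                problems.append(f"messages[{i}].role must be one of {sorted(ALLOWED_ROLES)}")
            if "content" not in msg:
                problems.append(f"messages[{i}] is missing content")
    if "max_tokens" in body:
        mt = body["max_tokens"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(mt, int) or isinstance(mt, bool) or mt < 0:
            problems.append("max_tokens must be a non-negative integer")
    if "stream" in body and not isinstance(body["stream"], bool):
        problems.append("stream must be a boolean")
    return problems
```

Returning every problem at once, rather than raising on the first, makes the helper convenient for logging or form feedback.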
Streaming
Set stream to true to receive the response incrementally as Server-Sent Events (SSE). Each event carries a JSON delta in the OpenAI streaming format.
Wire format
The connection uses content-type text/event-stream. Each chunk arrives as a data: line containing a JSON delta. The stream terminates with a final data: [DONE] sentinel.
data: {"choices":[{"delta":{"content":"Hel"}}]}

data: {"choices":[{"delta":{"content":"lo"}}]}

data: [DONE]
Consuming the stream
Most OpenAI SDKs expose streaming natively. With the official SDK, iterate the streamed chunks and read each delta's content.
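import os
from openai import OpenAI

# Point the official SDK at the gateway (NARAROUTER_API_KEY is a placeholder name).
client = OpenAI(base_url="https://router.naraya.ai/v1", api_key=os.environ["NARAROUTER_API_KEY"])

stream = client.chat.completions.create(
    model="deepseek-3.2",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)
for chunk in stream:
    # Guard against chunks that carry no choices.
    delta = chunk.choices[0].delta.content if chunk.choices else None
    if delta:
        print(delta, end="", flush=True)
If an error occurs mid-stream, the gateway emits a clean error event followed by [DONE]; an error before the first byte is returned as a normal JSON error response instead.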
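Without an SDK, the stream can be consumed by reading data: lines until the [DONE] sentinel. A stdlib-only sketch under two assumptions: each event fits on a single data: line, as in the wire-format example, and the mid-stream error event carries the same {"error": {...}} envelope as non-streaming errors, which this page implies but does not show.

```python
import json
from typing import Iterable, Iterator


class StreamError(Exception):
    """Raised when the gateway emits an error event mid-stream."""


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from SSE lines, stopping at the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank event separators, comments, or other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        event = json.loads(payload)
        if "error" in event:
            err = event["error"]
            raise StreamError(f"{err.get('type')}: {err.get('message')}")
        for choice in event.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content
```

With urllib, the response object iterates over raw byte lines, so `iter_deltas(line.decode("utf-8") for line in resp)` plugs straight in. Raising on the error event lets the caller keep whatever text already arrived and decide whether to retry.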
Models
Pass a model alias in the model field. The current list of models, and the plan tier that grants each, is published by the public /api/plans endpoint.
The authenticated /v1/models endpoint returns exactly the aliases your own plan entitles you to.
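A sketch of listing your entitled aliases from /v1/models with the standard library. It assumes the response follows the OpenAI list shape ({"object": "list", "data": [{"id": ...}]}), which fits the API's OpenAI compatibility but is not shown on this page; the helper names are illustrative.

```python
import json
import urllib.request

BASE_URL = "https://router.naraya.ai/v1"


def models_request(api_key: str) -> urllib.request.Request:
    """GET /v1/models, authenticated with the same Bearer key as chat requests."""
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(payload: dict) -> list:
    """Extract alias strings from an OpenAI-style model list, sorted."""
    return sorted(item["id"] for item in payload.get("data", []))


def fetch_model_ids(api_key: str, timeout: float = 30.0) -> list:
    """Network call: return the aliases your plan entitles."""
    with urllib.request.urlopen(models_request(api_key), timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

Checking a configured alias against this list at startup turns a later 403 forbidden into an early, explicit configuration error.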
Some models are reasoning models (for example kimi-k2.5 and gemini-3.1-pro) that spend part of the token budget on internal reasoning before producing an answer. If you set a very low max_tokens, the visible content may come back empty because the budget was used up while reasoning. For these models, use a higher max_tokens.
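One way to handle this in client code: treat an empty answer paired with a truncated finish reason as a budget problem and retry with a larger max_tokens. This assumes the gateway passes through the OpenAI convention of finish_reason "length" when output is cut off at max_tokens, which this page does not state; the doubling policy and the 8192 ceiling are arbitrary choices, and `send` is any callable you supply that posts a body and returns the decoded completion.

```python
def reply_text(completion: dict) -> str:
    """Visible assistant text, or "" when content is empty or null."""
    return completion["choices"][0]["message"].get("content") or ""


def needs_bigger_budget(completion: dict) -> bool:
    """True when nothing visible came back and generation stopped at max_tokens."""
    finish = completion["choices"][0].get("finish_reason")
    return not reply_text(completion).strip() and finish == "length"


def complete_with_budget(send, body: dict, ceiling: int = 8192, max_tries: int = 3) -> dict:
    """Call send(body), doubling max_tokens while the answer comes back empty
    because the token budget ran out during reasoning."""
    body = dict(body)  # don't mutate the caller's dict
    budget = body.get("max_tokens")
    if budget is None:
        return send(body)  # no explicit budget, so nothing to raise
    for attempt in range(max_tries):
        body["max_tokens"] = budget
        completion = send(body)
        out_of_tries = attempt == max_tries - 1 or budget >= ceiling
        if out_of_tries or not needs_bigger_budget(completion):
            return completion
        budget = min(budget * 2, ceiling)
    raise ValueError("max_tries must be at least 1")
```

Setting a generous max_tokens up front is usually cheaper than retrying, since each retry spends the reasoning tokens again.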
Rate limits & quotas
Limits depend on your plan. Subscription plans are governed by a per-minute request rate and a daily token quota; the free tier and per-model fair-use caps apply otherwise.
Request rate (per minute)
Each plan sets a maximum number of requests per minute. Exceeding it returns a 429 with a rate_limited error. The window resets every minute.
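To stay under the per-minute cap, a client can count its own requests in fixed one-minute windows. A sketch assuming windows align to wall-clock minutes, which the page does not specify; the limit is whatever your plan allows, and the clock is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Optional


class MinuteWindowLimiter:
    """Fixed-window limiter: at most `limit` requests per 60-second window."""

    def __init__(self, limit: int, clock: Callable[[], float] = time.time):
        self.limit = limit
        self.clock = clock
        self.window: Optional[int] = None
        self.count = 0

    def try_acquire(self) -> bool:
        """Return True if a request may be sent now, False if it should wait."""
        window = int(self.clock() // 60)  # index of the current minute
        if window != self.window:
            self.window, self.count = window, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False

    def seconds_until_reset(self) -> float:
        """Time left in the current window."""
        return 60 - (self.clock() % 60)
```

A client-side count can drift from the gateway's (clock skew, other processes sharing the key), so still handle 429 even with the limiter in place.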
Daily token quota
Subscription plans count input + output tokens against a daily quota, but the quota is tracked per model class rather than as one account-wide ceiling. Each class (base, Lite, Mocin, Pro) has its own daily cap. When a class's cap is reached, only models in that class return 429 until the next day; other classes keep working. A null cap means fair use (no hard daily ceiling).
Per-class daily token quotas
Each model belongs to a quota class — base, Lite, Mocin, or Pro — and each class has its own daily token quota. A subscription gets a separate daily quota for every class it can access, and the quotas are independent: using a model only counts against its own class quota and never reduces another. When one quota is exhausted, models in the other classes you have still work until the quotas reset at the start of the next day.
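The independence of the class quotas can be modeled as one bucket per class. This is a hypothetical client-side tracker for budgeting, not the gateway's accounting: the caps you pass in and the mapping from model to class are assumptions you would fill in from your plan, None stands for a fair-use class, and "next day" here means the local calendar date, since the page does not name a reset timezone.

```python
from datetime import date
from typing import Dict, Optional


class ClassQuotaTracker:
    """Track daily token use per quota class; each class has its own cap."""

    def __init__(self, caps: Dict[str, Optional[int]]):
        self.caps = caps  # illustrative, e.g. {"base": 2_000_000, "Pro": None}
        self.used: Dict[str, int] = {c: 0 for c in caps}
        self.day = date.today()

    def _roll_over(self, today: date) -> None:
        # All buckets reset together when the date changes.
        if today != self.day:
            self.used = {c: 0 for c in self.caps}
            self.day = today

    def record(self, cls: str, prompt_tokens: int, completion_tokens: int,
               today: Optional[date] = None) -> None:
        """Count input + output tokens against this class only."""
        self._roll_over(today or date.today())
        self.used[cls] += prompt_tokens + completion_tokens

    def exhausted(self, cls: str, today: Optional[date] = None) -> bool:
        """True once this class's cap is reached; fair-use classes never are."""
        self._roll_over(today or date.today())
        cap = self.caps[cls]
        return cap is not None and self.used[cls] >= cap
```

Feed it the usage block from each response (prompt_tokens, completion_tokens) so you can route work to a class that still has headroom.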
Concurrency
Plans also bound how many requests may run at once. Excess concurrent requests are rejected with 429; retry once an in-flight request completes.
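If you fan out many requests, capping in-flight calls on your side avoids tripping the concurrency limit in the first place. A sketch with asyncio.Semaphore; the limit you pass is a placeholder for your plan's actual value, and each job is any zero-argument coroutine function you supply (for example, one that performs a chat request).

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def run_bounded(jobs: List[Callable[[], Awaitable[T]]], limit: int) -> List[T]:
    """Run async jobs with at most `limit` in flight; results keep input order."""
    sem = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        async with sem:  # waits here while `limit` jobs are already running
            return await job()

    return await asyncio.gather(*(guarded(j) for j in jobs))
```

asyncio.gather preserves input order in its results, so callers can zip outputs back to their prompts.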
429 behavior
On any limit breach the response status is 429 with the rate_limited error type. A daily token breach applies only to the affected model class: the message states that this model tier's quota is reached, while other classes you have access to keep working. Back off and retry after the relevant window resets. Per-plan limits are listed on the pricing page.
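A minimal retry sketch for the statuses the Errors table marks as retryable (429, 500, 503), using exponential backoff with full jitter. `send` is any callable you supply that returns (status, body); the attempt count and delays are illustrative, not documented values. Because a daily-quota 429 only clears the next day, the sketch gives up after a few attempts rather than waiting indefinitely.

```python
import random
import time
from typing import Callable, Tuple

RETRYABLE = {429, 500, 503}


def call_with_backoff(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        # Full jitter: wait a random time in [0, min(max_delay, base * 2^attempt)].
        cap = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, cap))
    raise ValueError("max_attempts must be at least 1")
```

Jitter spreads retries from many clients apart so they do not all hit the gateway at the same instant when a window resets.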
Errors
Errors use a single, stable JSON envelope. Switch on the type field rather than parsing the message. The request_id helps correlate a failure with server logs.
Error shape
{
  "error": {
    "type": "rate_limited",
    "message": "Rate limit exceeded. Please retry later.",
    "request_id": "req_..."
  }
}
Status codes
| Status | Type | Meaning |
|---|---|---|
| 400 | validation_error | The request was malformed or failed validation (for example, a missing model or messages field). |
| 401 | unauthorized | Missing or invalid API key. Check the Authorization header. |
| 403 | forbidden | Authenticated, but your plan does not include the requested model, or the account is suspended. |
| 404 | not_found | The requested model alias or endpoint does not exist. |
| 413 | bad_request | The request body or input is too large. |
| 415 | unsupported_media_type | Content-Type must be application/json. |
| 429 | rate_limited | A rate limit or a per-model-tier daily token quota was exceeded. A quota breach affects only that model tier; other tiers you have access to still work. Retry after the window resets. |
| 500 | internal_error | An unexpected internal error. Retry; if it persists, contact support with the request_id. |
| 503 | service_unavailable | The model service is temporarily unavailable. Retry with backoff. |
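Switching on type rather than message can be as small as a lookup table. A sketch whose actions follow the Meaning column of the status-code table (retry on rate_limited, service_unavailable, and internal_error; fix the request, key, or plan otherwise); the action names are made up for illustration.

```python
import json

# Suggested action per error type, following the status-code table.
ACTIONS = {
    "validation_error": "fix_request",
    "bad_request": "fix_request",  # 413: body or input too large
    "unsupported_media_type": "fix_request",
    "unauthorized": "check_key",
    "forbidden": "check_plan",
    "not_found": "check_model",
    "rate_limited": "retry_later",
    "service_unavailable": "retry_with_backoff",
    "internal_error": "retry_with_backoff",
}


def classify_error(raw_body: str) -> tuple:
    """Return (error type, suggested action, request_id) from an error envelope."""
    try:
        envelope = json.loads(raw_body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return ("unknown", "inspect_response", None)
    err = envelope.get("error") if isinstance(envelope, dict) else None
    if not isinstance(err, dict):
        return ("unknown", "inspect_response", None)
    etype = err.get("type", "unknown")
    return (etype, ACTIONS.get(etype, "inspect_response"), err.get("request_id"))
```

Logging the returned request_id alongside the action gives support what it needs to trace a failure.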
Pricing
Pricing is in Rupiah, billed per day or per week. Each plan tier grants a model set, a request rate, and a daily token quota.