Developer docs

Build on the NaraRouter API

One stable, OpenAI-compatible endpoint across leading models. Point any OpenAI SDK at our base URL, authenticate with your key, and ship.

Quickstart

Three steps to your first response: create a key, set the base URL, send a request. The API speaks the OpenAI Chat Completions format, so existing OpenAI clients work by changing two lines.

1. Create an API key

Sign in and open the API keys page to mint a key. The secret is shown once at creation — copy it immediately and store it securely. Keys begin with the prefix sk-nry-.

2. Set the base URL

Send all requests to the gateway base URL. The chat endpoint lives at /v1/chat/completions.

text

https://router.naraya.ai/v1

3. Send your first request

Use cURL or any OpenAI-compatible SDK. Pass a model alias (see Models), your messages, and your key as a Bearer token.

bash

curl https://router.naraya.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-nry-xxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-3.2",
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'

Because the API is OpenAI-compatible, the official OpenAI SDKs work unchanged apart from the base URL and key — no provider-specific client required.
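If you would rather not add an SDK dependency, you can build the same request with Python's standard library. The sketch below only constructs the request (URL, Bearer header, JSON body), so you can inspect it offline. `build_chat_request` is a hypothetical helper name, not part of any SDK, and the key is the placeholder from the cURL example.

```python
import json
import urllib.request

BASE_URL = "https://router.naraya.ai/v1"

def build_chat_request(api_key, model, messages, base_url=BASE_URL):
    """Build (but do not send) a POST to the chat completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # key sent as a Bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "sk-nry-xxxxxxxx",
    "deepseek-3.2",
    [{"role": "user", "content": "Hello!"}],
)
```

Passing `req` to `urllib.request.urlopen` sends it. A 4xx or 5xx status raises `urllib.error.HTTPError`, whose body carries the error envelope described under Errors.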

Authentication & keys

Every request to the API must carry your secret key in the Authorization header as a Bearer token.

Authorization header

http

Authorization: Bearer sk-nry-xxxxxxxxxxxxxxxxxxxxxxxxxxxx

Key format

Keys are prefixed with sk-nry- followed by a random secret. The full secret is returned only once, at creation or rotation; afterwards only a masked form is shown.

Rotation & revocation

Manage keys from the dashboard. You can rotate a key (issue a new secret and invalidate the old one), revoke a key (it stays listed but no longer authenticates), or delete it. Rotated and revoked secrets stop working immediately.

Keep keys secret

Treat keys like passwords. Never embed them in client-side code, mobile apps, or public repositories. Use server-side environment variables and rotate any key you suspect is exposed.
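Here is a minimal sketch of that advice in Python. It reads the key from a server-side environment variable, checks the sk-nry- prefix, and only ever logs a masked form. The variable name NARAROUTER_API_KEY and the masking format are assumptions for illustration; the dashboard's own masked display may look different.

```python
import os

KEY_PREFIX = "sk-nry-"

def load_api_key(env_var="NARAROUTER_API_KEY"):
    """Read the key from the environment (the variable name is hypothetical)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    if not key.startswith(KEY_PREFIX):
        raise ValueError(f"expected a key starting with {KEY_PREFIX!r}")
    return key

def mask_key(key, visible=4):
    """Hypothetical log-safe form: the prefix, asterisks, then the last few characters."""
    secret = key[len(KEY_PREFIX):] if key.startswith(KEY_PREFIX) else key
    tail = secret[-visible:] if visible > 0 else ""
    return KEY_PREFIX + "*" * (len(secret) - len(tail)) + tail
```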

Chat completions

Send a list of messages and receive a model response. This is the primary endpoint and follows the OpenAI Chat Completions schema.

Endpoint

http

POST https://router.naraya.ai/v1/chat/completions

Request parameters

The gateway routes on the fields below; additional OpenAI-compatible fields in the body are forwarded to the model as-is.

Parameter	Type	Required	Description
`model`	string	Yes	A model alias from the Models list. Determines routing and which plan tier may use it.
`messages`	array	Yes	An array of message objects, each with a role (system, user, or assistant) and content. At least one message is required.
`stream`	boolean	No	When true, the response is delivered as Server-Sent Events. When false (default), a single JSON object is returned.
`temperature`	number	No	Sampling temperature. Forwarded to the model unchanged.
`max_tokens`	integer	No	Upper bound on output tokens. Must be non-negative. Used to size the request; the model honors it as a ceiling.
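To send only the fields you actually set, a small body builder can omit unset optional parameters. It also passes any other OpenAI-compatible fields through untouched, matching how the gateway forwards them. `make_chat_body` is a hypothetical helper, and `top_p` in the test is just an example of an extra field.

```python
def make_chat_body(model, messages, *, stream=None, temperature=None,
                   max_tokens=None, **extra):
    """Build a chat completions body, omitting optional fields left as None.

    Keywords in `extra` are copied through unchanged, mirroring how the
    gateway forwards additional OpenAI-compatible fields to the model.
    """
    body = {"model": model, "messages": messages}
    optional = {"stream": stream, "temperature": temperature, "max_tokens": max_tokens}
    body.update({k: v for k, v in optional.items() if v is not None})
    body.update(extra)
    return body
```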

Request

json

{
  "model": "deepseek-3.2",
  "messages": [
    { "role": "system", "content": "You are helpful." },
    { "role": "user", "content": "Hello!" }
  ],
  "temperature": 0.7,
  "max_tokens": 256
}

Response

json

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "deepseek-3.2",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 18, "completion_tokens": 8, "total_tokens": 26 }
}

On success you receive a standard chat completion object. The model field in the response is the public alias you requested.
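The following short sketch reads the fields most callers need from that object, using the example response above as input. `summarize_completion` is a hypothetical helper name.

```python
def summarize_completion(resp):
    """Extract the commonly used fields from a chat completion object."""
    choice = resp["choices"][0]
    usage = resp.get("usage") or {}
    return {
        "model": resp["model"],  # the public alias you requested
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }

example = {
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "model": "deepseek-3.2",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hello! How can I help?"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 18, "completion_tokens": 8, "total_tokens": 26},
}
summary = summarize_completion(example)
```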

The gateway validates model, messages, and max_tokens before forwarding. Streaming requests against a model that does not support streaming are rejected.
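Catching obvious mistakes before a round trip can save you a 400. The function below is a client-side approximation of the checks described above (model, messages, and max_tokens), based only on the parameter table. It is not the gateway's actual validation logic, which may check more.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}

def preflight(body):
    """Return a list of problems; an empty list means the body looks valid."""
    problems = []
    if not isinstance(body.get("model"), str) or not body["model"]:
        problems.append("model must be a non-empty string")
    msgs = body.get("messages")
    if not isinstance(msgs, list) or not msgs:
        problems.append("messages must be a non-empty array")
    else:
        for i, m in enumerate(msgs):
            if not isinstance(m, dict) or m.get("role") not in ALLOWED_ROLES or "content" not in m:
                problems.append(f"messages[{i}] needs a valid role and content")
    mt = body.get("max_tokens")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if mt is not None and (not isinstance(mt, int) or isinstance(mt, bool) or mt < 0):
        problems.append("max_tokens must be a non-negative integer")
    return problems
```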

Streaming

Set stream to true to receive the response incrementally as Server-Sent Events (SSE). Each event carries a JSON delta in the OpenAI streaming format.

Wire format

The response has Content-Type text/event-stream. Each chunk arrives as a data: line containing a JSON delta, and the stream ends with a final data: [DONE] sentinel.

text

data: {"choices":[{"delta":{"content":"Hel"}}]}

data: {"choices":[{"delta":{"content":"lo"}}]}

data: [DONE]
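Without an SDK, you can decode the stream line by line. This sketch takes already-decoded text lines (for example, from iterating an HTTP response and calling `.decode()` on each line). It parses each data: payload and stops at the [DONE] sentinel. It assumes one choice per request, and `iter_sse_deltas` is a hypothetical name.

```python
import json

def iter_sse_deltas(lines):
    """Yield content deltas from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        event = json.loads(payload)
        for choice in event.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content

wire = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    '',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    '',
    'data: [DONE]',
]
print("".join(iter_sse_deltas(wire)))  # prints: Hello
```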

Consuming the stream

Most OpenAI SDKs expose streaming natively. With the official SDK, iterate the streamed chunks and read each delta's content.

python

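import os
from openai import OpenAI

# The env var name is an example; the base URL is the gateway's.
client = OpenAI(base_url="https://router.naraya.ai/v1",
                api_key=os.environ["NARAROUTER_API_KEY"])

stream = client.chat.completions.create(
    model="deepseek-3.2",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)

for chunk in stream:
    # Some chunks may carry no choices; print only non-empty text deltas.
    delta = chunk.choices[0].delta.content if chunk.choices else None
    if delta:
        print(delta, end="", flush=True)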

If an error occurs mid-stream, the gateway emits a clean error event followed by [DONE]; an error before the first byte is returned as a normal JSON error response instead.
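Here is a sketch of handling both outcomes when you consume the stream yourself. It assumes the mid-stream error event carries the same error envelope shown under Errors, which this page does not state explicitly. Check this against a real error event before depending on it. `collect_stream` and `StreamError` are hypothetical names.

```python
import json

class StreamError(Exception):
    """An error event arrived mid-stream; `partial` holds the text received before it."""
    def __init__(self, message, partial):
        super().__init__(message)
        self.partial = partial

def collect_stream(lines):
    """Join streamed content, raising StreamError on an error event.

    Assumption: the error event uses the {"error": {...}} envelope from
    the Errors section; its exact streaming shape is not documented here.
    """
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # normal end of stream, also sent after an error event
        event = json.loads(payload)
        if "error" in event:
            err = event["error"]
            raise StreamError(
                f"{err.get('type')}: {err.get('message')} (request_id={err.get('request_id')})",
                "".join(parts),
            )
        for choice in event.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)
```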

Models

Pass a model alias in the model field. The list below is loaded live from the public plans endpoint, so it always reflects the models currently offered and which plan tier grants each.

Live from /api/plans. The authenticated /v1/models endpoint returns exactly the aliases your own plan entitles you to use.
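To fail fast instead of receiving a 403, you can check an alias against your /v1/models response first. This sketch assumes the response follows the OpenAI list shape (`{"object": "list", "data": [{"id": ...}]}`). That shape is inferred from the API's OpenAI compatibility, not confirmed on this page. The helper names are hypothetical.

```python
def entitled_aliases(models_response):
    """Collect model aliases from a /v1/models response (assumed OpenAI list shape)."""
    return {m["id"] for m in models_response.get("data", [])}

def check_model(alias, models_response):
    """Raise before sending if the alias is not in your plan."""
    allowed = entitled_aliases(models_response)
    if alias not in allowed:
        raise PermissionError(f"{alias!r} is not in your plan: {sorted(allowed)}")
    return alias
```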

Some models are reasoning models (for example kimi-k2.5 and gemini-3.1-pro) that spend part of the token budget on internal reasoning before producing an answer. If you set a very low max_tokens, the visible content may come back empty because the budget was used up while reasoning. For these models, use a higher max_tokens.
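As a sketch, you can detect this case in code by treating empty visible content with finish_reason "length" as an exhausted budget, then retrying with a larger max_tokens. Using "length" as the signal follows the OpenAI format and is an assumption here. The doubling policy and the 8192 ceiling are arbitrary placeholders, not documented limits.

```python
def content_or_none(resp):
    """Return the visible content, or None if the budget likely ran out while reasoning.

    Assumption: finish_reason is "length" when max_tokens is exhausted,
    as in the OpenAI format.
    """
    choice = resp["choices"][0]
    content = choice["message"].get("content") or ""
    if not content.strip() and choice.get("finish_reason") == "length":
        return None
    return content

def next_budget(max_tokens, factor=2, ceiling=8192):
    """Hypothetical retry policy: grow max_tokens geometrically up to a cap."""
    return min(max_tokens * factor, ceiling)
```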

Rate limits & quotas

Limits depend on your plan. Subscription plans are governed by a per-minute request rate and daily token quotas; otherwise, free-tier limits and per-model fair-use caps apply.

Request rate (per minute)

Each plan sets a maximum number of requests per minute. Exceeding it returns a 429 with a rate_limited error. The window resets every minute.
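To avoid hitting the per-minute limit in the first place, a client can count its own requests in one-minute windows. This sketch aligns windows to the clock minute, which may not match the gateway's boundaries exactly, so leave some headroom. `rpm` is your plan's limit, and `MinuteWindowLimiter` is a hypothetical name.

```python
import time

class MinuteWindowLimiter:
    """Client-side fixed-window counter for a requests-per-minute limit."""

    def __init__(self, rpm, clock=time.time):
        self.rpm = rpm
        self.clock = clock  # injectable so the limiter can be tested with a fake clock
        self.window = None
        self.count = 0

    def try_acquire(self):
        window = int(self.clock() // 60)
        if window != self.window:
            self.window, self.count = window, 0  # new minute: reset the counter
        if self.count >= self.rpm:
            return False  # sending now would likely return 429 rate_limited
        self.count += 1
        return True
```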

Daily token quota

Subscription plans count input and output tokens against a daily quota, but the quota is tracked per model class rather than as one account-wide ceiling. Each class (base, Lite, Mocin, Pro) has its own daily cap. When a class reaches its cap, only models in that class return 429 until the next day; models in other classes keep working. A null cap means fair use, with no hard daily ceiling.

Per-class daily token quotas

Each model belongs to a quota class (base, Lite, Mocin, or Pro), and each class has its own daily token quota. A subscription gets a separate daily quota for every class it can access, and the quotas are independent: using a model counts only against its own class quota and never reduces another. When one quota is exhausted, models in that class stay blocked until the quotas reset at the start of the next day, while models in the other classes you can access keep working.
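You can model the independent class buckets as a small ledger, for example to warn before a class runs out. The caps below are hypothetical placeholders, and None stands for a fair-use class. The gateway remains the source of truth for what has been counted. Output length is not known before a call, so `can_spend` is only an estimate.

```python
class ClassQuotas:
    """Track daily token use per quota class; each class is independent."""

    def __init__(self, caps):
        self.caps = dict(caps)  # class name -> daily token cap, or None for fair use
        self.used = {cls: 0 for cls in self.caps}

    def can_spend(self, cls, tokens):
        cap = self.caps[cls]
        return cap is None or self.used[cls] + tokens <= cap

    def record(self, cls, prompt_tokens, completion_tokens):
        # Input and output tokens both count, and only against this class.
        self.used[cls] += prompt_tokens + completion_tokens

    def reset_day(self):
        self.used = {cls: 0 for cls in self.caps}

quotas = ClassQuotas({"base": None, "Lite": 1000, "Mocin": 500, "Pro": 100})
quotas.record("Pro", 60, 40)
```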

Concurrency

Plans also bound how many requests may run at once. Excess concurrent requests are rejected with 429; retry once an in-flight request completes.
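A simple way to respect the concurrency bound is an asyncio.Semaphore sized to your plan's limit. In this sketch, `fake_request` stands in for a real HTTP call and records how many calls were in flight at once. The limit of 2 is a placeholder.

```python
import asyncio

async def run_bounded(factories, limit):
    """Await coroutine factories with at most `limit` running at once."""
    sem = asyncio.Semaphore(limit)

    async def guarded(factory):
        async with sem:  # waits here while `limit` calls are in flight
            return await factory()

    return await asyncio.gather(*(guarded(f) for f in factories))

state = {"in_flight": 0, "peak": 0}

async def fake_request():
    """Stand-in for an HTTP call that records peak concurrency."""
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(0.01)  # pretend network latency
    state["in_flight"] -= 1
    return "ok"

results = asyncio.run(run_bounded([fake_request] * 5, limit=2))
```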

429 behavior

On any limit breach the response status is 429 with the rate_limited error type. A daily token breach applies only to one model tier: the error message states that this model tier's quota has been reached, and other model tiers you have access to keep working. Back off and retry after the relevant window resets. Per-plan limits are shown on the pricing page and load live below.
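Below is a minimal retry loop for 429, and for the 500 and 503 cases listed under Errors, using exponential backoff with full jitter. `send` is a hypothetical callable returning `(status, body)`. The delays are placeholders to tune against your plan's limits.

```python
import random
import time

RETRYABLE = {429, 500, 503}

def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Retry send() on retryable statuses; return the last (status, body)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(0, delay))  # full jitter spreads out retries
```

Backoff only helps with per-minute and concurrency limits. A daily token quota breach will not clear within these delays. Instead, switch to a model in another class or wait for the next day.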

Errors

Errors use a single, stable JSON envelope. Switch on the type field rather than parsing the message. The request_id helps correlate a failure with server logs.

Error shape

json

{
  "error": {
    "type": "rate_limited",
    "message": "Rate limit exceeded. Please retry later.",
    "request_id": "req_..."
  }
}

Status codes

Status	Type	Meaning
`400`	`validation_error`	The request was malformed or failed validation (for example, a missing model or messages field).
`401`	`unauthorized`	Missing or invalid API key. Check the Authorization header.
`403`	`forbidden`	Authenticated, but your plan does not include the requested model, or the account is suspended.
`404`	`not_found`	The requested model alias or endpoint does not exist.
`413`	`bad_request`	The request body or input is too large.
`415`	`unsupported_media_type`	Content-Type must be application/json.
`429`	`rate_limited`	Rate limit or a per-model-tier daily token quota was exceeded. A quota breach affects only that model tier — other tiers you have access to still work. Retry after the window resets.
`500`	`internal_error`	An unexpected internal error. Retry; if it persists, contact support with the request_id.
`503`	`service_unavailable`	The model service is temporarily unavailable. Retry with backoff.
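Following the advice to switch on type rather than the message, this sketch maps the error envelope to a retry decision. The retryable set is read off the status table (rate_limited, service_unavailable, internal_error). `describe_error` is a hypothetical name.

```python
RETRYABLE_TYPES = {"rate_limited", "service_unavailable", "internal_error"}

def describe_error(payload):
    """Decide what to do from error.type; never parse the human-readable message."""
    err = payload.get("error", {})
    err_type = err.get("type", "unknown")
    return {
        "type": err_type,
        "retry": err_type in RETRYABLE_TYPES,
        "request_id": err.get("request_id"),  # quote this when contacting support
    }
```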

Plans & pricing

Pricing is in Rupiah, billed per day or per week. Each tier grants a model set, a request rate, and a daily token quota. Tiers load live below.