How rate limits work
When your administrator configures a model, they can set a maximum number of requests per minute that any single user may send to that model. If you exceed that limit, OpenOpen8 returns an HTTP 429 Too Many Requests response and rejects the request; it is not queued or retried automatically.
Rate limits are tracked independently per model. Sending many requests to gpt-4o does not count against your limit for claude-3-5-sonnet.
Rate limits protect the shared infrastructure. If your workload regularly hits the limit, contact your administrator to request a higher limit for your account or group.
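To make the behavior above concrete, here is a minimal sketch of per-user, per-model request counting with a fixed one-minute window. The class name, the injectable clock, and the fixed-window strategy are all illustrative assumptions; the actual gateway may use a different algorithm (e.g. a sliding window or token bucket).

```python
import time
from collections import defaultdict

class PerModelRateLimiter:
    """Illustrative sketch: count requests per (user, model) in
    fixed one-minute windows. Not the gateway's actual implementation."""

    def __init__(self, limits, clock=time.time):
        # limits: requests per minute per model, e.g. {"gpt-4o": 60}
        self.limits = limits
        self.clock = clock  # injectable for testing
        self.windows = defaultdict(lambda: (0, 0))  # (window_id, count)

    def allow(self, user, model):
        limit = self.limits.get(model)
        if limit is None:
            return True  # no limit configured for this model
        window = int(self.clock() // 60)  # current one-minute window
        start, count = self.windows[(user, model)]
        if start != window:
            start, count = window, 0  # new minute: reset the counter
        if count >= limit:
            return False  # over the limit -> gateway would return 429
        self.windows[(user, model)] = (start, count + 1)
        return True
```

Note that limits are tracked per model: exhausting one model's budget leaves other models unaffected, matching the behavior described above.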
Detecting rate limit errors
A rate-limited request returns:
- HTTP status: 429 Too Many Requests
- Response body: a JSON error object describing the rate limit

Check for status 429 in your error-handling logic to distinguish rate limit errors from authentication errors (401) or upstream provider errors (5xx).
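A simple way to apply this distinction is to branch on the status code before deciding whether to retry. The function below is a sketch; the category names are illustrative, not part of any official API.

```python
def classify_error(status_code):
    """Map an HTTP status from the gateway to an error category.

    Status values follow the behavior described above; category
    names are illustrative only.
    """
    if status_code == 429:
        return "rate_limited"    # back off and retry later
    if status_code == 401:
        return "auth_error"      # check your token; do not retry
    if 500 <= status_code < 600:
        return "upstream_error"  # provider-side failure; may retry
    return "other"
```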
Handling rate limits in your code
The most common strategies for dealing with rate limits are exponential backoff and request queuing.
- Exponential backoff: retry the request after a delay that increases with each failed attempt. This avoids hammering the gateway while giving it time to recover.
- Request queuing: hold outgoing requests in a local queue and release them at a pace that stays under the limit, smoothing bursts instead of failing them.
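Exponential backoff can be sketched as follows. The `send_request` callable and its `status_code` attribute are assumptions about your HTTP client, not a fixed OpenOpen8 interface; adapt them to whatever client you use. The delay uses "full jitter" (a random wait up to the exponentially growing ceiling) so that many clients retrying at once do not synchronize.

```python
import random
import time

def with_backoff(send_request, max_retries=5, base_delay=1.0):
    """Retry a rate-limited request with exponential backoff and jitter.

    `send_request` is a caller-supplied zero-argument function that
    returns a response object with a `status_code` attribute
    (hypothetical; adapt to your HTTP client).
    """
    for attempt in range(max_retries):
        response = send_request()
        if response.status_code != 429:
            return response  # success or a non-rate-limit error
        # Full jitter: wait a random time up to base_delay * 2**attempt.
        time.sleep(random.uniform(0, base_delay * 2 ** attempt))
    return send_request()  # final attempt; caller handles a lingering 429
```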
Tips for staying within limits
- Batch requests where possible — some models support sending multiple prompts in one request, which counts as a single request against the rate limit.
- Cache responses — if your application asks the same question repeatedly, cache the response instead of re-requesting it.
- Spread load across tokens — if you control multiple tokens, distributing requests across them does not bypass per-user limits (limits apply per user, not per token), but it can help with per-token quota management.
- Use less expensive models for high-volume tasks — switching from a large model to a smaller one for tasks that don’t require the larger model reduces both rate limit pressure and credit consumption.
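The caching tip can be as simple as an in-memory memoizer keyed on the model and prompt. The `request_completion` function below is a placeholder for your actual gateway call, and this approach only suits cases where a repeated answer is acceptable (e.g. deterministic or low-temperature requests).

```python
import functools

def make_cached(request_completion, maxsize=1024):
    """Wrap a gateway-call function (hypothetical signature:
    request_completion(model, prompt) -> str) with an in-memory
    cache keyed on (model, prompt).

    Repeated identical questions hit the cache instead of counting
    against the rate limit.
    """
    return functools.lru_cache(maxsize=maxsize)(request_completion)
```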
Requesting higher limits
Rate limits are set by your administrator at the model level. If your use case requires a higher limit, reach out to your admin and describe:
- Which model(s) you need higher limits for
- Your expected request volume (requests per minute)
- The nature of your workload (batch processing, interactive application, etc.)