OpenOpen8 exposes Google Gemini-compatible endpoints at the /v1beta/models/{model} path, matching the format used by the Google AI SDKs and the Gemini REST API. You authenticate with your OpenOpen8 token using the x-goog-api-key header or a key query parameter — no Google credentials needed. OpenOpen8 routes the request to whichever upstream channel is configured for the model you specify in the URL path.

Endpoints

Method  Path                                           Description
POST    /v1beta/models/{model}:generateContent         Generate a response (non-streaming)
POST    /v1beta/models/{model}:streamGenerateContent   Generate a response as a stream
The model name is part of the URL path, not the request body. For example, to use gemini-2.0-flash, send a POST to /v1beta/models/gemini-2.0-flash:generateContent.
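Because the model and method live in the path, building the URL is a simple string operation. A minimal sketch in Python, using the base URL from the examples in this document (adjust it if your deployment is hosted elsewhere):

```python
# Base URL as shown in this document's curl examples.
BASE_URL = "https://openopen8.ai"

def endpoint(model: str, stream: bool = False) -> str:
    """Return the full request URL for a model.

    The model name goes in the URL path, not the request body.
    """
    method = "streamGenerateContent" if stream else "generateContent"
    return f"{BASE_URL}/v1beta/models/{model}:{method}"

# endpoint("gemini-2.0-flash")
# -> "https://openopen8.ai/v1beta/models/gemini-2.0-flash:generateContent"
```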

Authentication

Pass your OpenOpen8 token using either method:

Header (recommended):
x-goog-api-key: YOUR_TOKEN

Query parameter:
POST /v1beta/models/gemini-2.0-flash:generateContent?key=YOUR_TOKEN

Request parameters

contents
object[]
required
The conversation history as an array of content objects. Each object has a role and a parts array.
systemInstruction
object
A system prompt. Structure is the same as a contents item: an object with a parts array. Only text parts are supported for system instructions.
{
  "systemInstruction": {
    "parts": [{"text": "You are a helpful assistant."}]
  }
}
Also accepted as system_instruction (snake_case).
generationConfig
object
Parameters that control how the model generates output. Also accepted as generation_config.
tools
object[]
Tools the model may use. Supports functionDeclarations, googleSearch, googleSearchRetrieval, codeExecution, and urlContext.
toolConfig
object
Controls how the model selects tools.
safetySettings
object[]
Override the default safety filters. Each entry specifies a category and threshold.

Response fields

candidates
object[]
An array of generated response candidates.
usageMetadata
object
Token usage for the request.
promptFeedback
object
Feedback about the prompt, including safety ratings and any block reason if the prompt was blocked.

Examples

curl "https://openopen8.ai/v1beta/models/gemini-2.0-flash:generateContent" \
  -H "x-goog-api-key: YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{"text": "Explain the difference between RAM and ROM."}]
      }
    ],
    "generationConfig": {
      "temperature": 0.7,
      "maxOutputTokens": 512
    }
  }'
Example response:
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "RAM (Random Access Memory) is volatile memory used for temporary storage while your computer is running..."
          }
        ]
      },
      "finishReason": "STOP",
      "index": 0,
      "safetyRatings": []
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 11,
    "candidatesTokenCount": 142,
    "totalTokenCount": 153
  }
}
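Pulling the answer and token counts out of a response like the one above takes a few lines; a sketch using only the fields shown in the example response:

```python
def extract_text(response: dict) -> str:
    """Concatenate the text parts of the first candidate."""
    candidate = response["candidates"][0]
    return "".join(part.get("text", "")
                   for part in candidate["content"]["parts"])

def total_tokens(response: dict) -> int:
    """Total tokens billed for the request, or 0 if absent."""
    return response.get("usageMetadata", {}).get("totalTokenCount", 0)
```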

Thinking models

OpenOpen8 supports Gemini thinking models, which perform additional reasoning before generating a response. There are three ways to enable thinking:

1. Thinking model suffix — append -thinking to any supported model name:
POST /v1beta/models/gemini-2.5-flash-thinking:generateContent
POST /v1beta/models/gemini-2.5-pro-thinking:generateContent

2. Effort suffix — append -low, -medium, or -high for fine-grained control:
POST /v1beta/models/gemini-2.5-flash-high:generateContent

3. thinkingConfig in generationConfig — pass the configuration explicitly:
{
  "contents": [...],
  "generationConfig": {
    "thinkingConfig": {
      "includeThoughts": true,
      "thinkingBudget": 8192
    }
  }
}
When thinking is active, parts with "thought": true in the response contain the model’s reasoning. These parts are not shown to end users by default — your application decides whether to display them.
If you only need text output and do not want to process thinking parts, set includeThoughts: false and let the model reason internally without including those tokens in the response body.
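If your application does process thinking parts, separating them from the answer is a matter of checking the thought flag described above; a sketch:

```python
def split_parts(response: dict) -> tuple[list[str], str]:
    """Split the first candidate's parts into (thoughts, answer_text).

    Thinking parts are marked with "thought": true in the response;
    everything else is treated as user-visible answer text.
    """
    thoughts, answer = [], []
    for part in response["candidates"][0]["content"]["parts"]:
        if part.get("thought"):
            thoughts.append(part.get("text", ""))
        else:
            answer.append(part.get("text", ""))
    return thoughts, "".join(answer)
```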