OpenOpen8 exposes Google Gemini-compatible endpoints at the /v1beta/models/{model} path, matching the format used by the Google AI SDKs and the Gemini REST API. You authenticate with your OpenOpen8 token using the x-goog-api-key header or a key query parameter — no Google credentials needed. OpenOpen8 routes the request to whichever upstream channel is configured for the model you specify in the URL path.

Endpoints

Method  Path                                           Description
POST    /v1beta/models/{model}:generateContent         Generate a response (non-streaming)
POST    /v1beta/models/{model}:streamGenerateContent   Generate a response as a stream
The model name is part of the URL path, not the request body. For example, to use gemini-2.0-flash, send a POST to /v1beta/models/gemini-2.0-flash:generateContent.
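Because the model and method live in the path, building the URL is a simple string operation. A minimal sketch in Python, using the base URL from the examples in this document (adjust it if your deployment is hosted elsewhere):

```python
# Base URL as shown in this document's curl examples.
BASE_URL = "https://openopen8.ai"

def endpoint(model: str, stream: bool = False) -> str:
    """Return the full request URL for a model.

    The model name goes in the URL path, not the request body.
    """
    method = "streamGenerateContent" if stream else "generateContent"
    return f"{BASE_URL}/v1beta/models/{model}:{method}"

# endpoint("gemini-2.0-flash")
# -> "https://openopen8.ai/v1beta/models/gemini-2.0-flash:generateContent"
```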

Authentication

Pass your OpenOpen8 token using either method:

Header (recommended):
x-goog-api-key: YOUR_TOKEN

Query parameter:
POST /v1beta/models/gemini-2.0-flash:generateContent?key=YOUR_TOKEN

Request parameters

contents
object[]
required
The conversation history as an array of content objects. Each object has a role and a parts array.
systemInstruction
object
A system prompt. Structure is the same as a contents item: an object with a parts array. Only text parts are supported for system instructions.
{
  "systemInstruction": {
    "parts": [{"text": "You are a helpful assistant."}]
  }
}
Also accepted as system_instruction (snake_case).
generationConfig
object
Parameters that control how the model generates output. Also accepted as generation_config.
tools
object[]
Tools the model may use. Supports functionDeclarations, googleSearch, googleSearchRetrieval, codeExecution, and urlContext.
toolConfig
object
Controls how the model selects tools.
safetySettings
object[]
Override the default safety filters. Each entry specifies a category and threshold.

Response fields

candidates
object[]
An array of generated response candidates.
usageMetadata
object
Token usage for the request.
promptFeedback
object
Feedback about the prompt, including safety ratings and any block reason if the prompt was blocked.

Examples

curl "https://openopen8.ai/v1beta/models/gemini-2.0-flash:generateContent" \
  -H "x-goog-api-key: YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [{"text": "Explain the difference between RAM and ROM."}]
      }
    ],
    "generationConfig": {
      "temperature": 0.7,
      "maxOutputTokens": 512
    }
  }'
Example response:
{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "RAM (Random Access Memory) is volatile memory used for temporary storage while your computer is running..."
          }
        ]
      },
      "finishReason": "STOP",
      "index": 0,
      "safetyRatings": []
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 11,
    "candidatesTokenCount": 142,
    "totalTokenCount": 153
  }
}
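Pulling the answer and token counts out of a response like the one above takes a few lines; a sketch using only the fields shown in the example response:

```python
def extract_text(response: dict) -> str:
    """Concatenate the text parts of the first candidate."""
    candidate = response["candidates"][0]
    return "".join(part.get("text", "")
                   for part in candidate["content"]["parts"])

def total_tokens(response: dict) -> int:
    """Total tokens billed for the request, or 0 if absent."""
    return response.get("usageMetadata", {}).get("totalTokenCount", 0)
```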

Thinking models

OpenOpen8 supports Gemini thinking models, which perform additional reasoning before generating a response. There are three ways to enable thinking:

1. Thinking model suffix — append -thinking to any supported model name:
POST /v1beta/models/gemini-2.5-flash-thinking:generateContent
POST /v1beta/models/gemini-2.5-pro-thinking:generateContent

2. Effort suffix — append -low, -medium, or -high for fine-grained control:
POST /v1beta/models/gemini-2.5-flash-high:generateContent

3. thinkingConfig in generationConfig — pass the configuration explicitly:
{
  "contents": [...],
  "generationConfig": {
    "thinkingConfig": {
      "includeThoughts": true,
      "thinkingBudget": 8192
    }
  }
}
When thinking is active, parts with "thought": true in the response contain the model’s reasoning. These parts are not shown to end users by default — your application decides whether to display them.
If you only need text output and do not want to process thinking parts, set includeThoughts: false and let the model reason internally without including those tokens in the response body.
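If your application does process thinking parts, separating them from the answer is a matter of checking the thought flag described above; a sketch:

```python
def split_parts(response: dict) -> tuple[list[str], str]:
    """Split the first candidate's parts into (thoughts, answer_text).

    Thinking parts are marked with "thought": true in the response;
    everything else is treated as user-visible answer text.
    """
    thoughts, answer = [], []
    for part in response["candidates"][0]["content"]["parts"]:
        if part.get("thought"):
            thoughts.append(part.get("text", ""))
        else:
            answer.append(part.get("text", ""))
    return thoughts, "".join(answer)
```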