name: openrouter-generations description: Retrieve detailed metadata and stored content for individual OpenRouter generations. Use when the user wants to inspect a specific request — its cost, latency, token usage, provider routing, or the actual prompt/completion text — or is debugging a failed or unexpected generation. version: 0.1.0
openrouter-generations
Retrieve detailed metadata and stored content for individual OpenRouter generations. Use this skill when you need to inspect a specific request — its cost, latency, token usage, provider routing, or the actual prompt/completion text.
Prerequisites
- Any valid OpenRouter API key (regular or management key). Get one at openrouter.ai/settings/keys.
- Pass it via
--api-key <key>or set theOPENROUTER_API_KEYenvironment variable - Generation IDs look like
gen-1234567890orgen-aBcDeFgHiJkLmNoPqRsT.
First-Time Setup
cd <skill-path>/scripts && npm install
Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
/api/v1/generation |
GET | Request metadata and usage (tokens, cost, latency, model, provider) |
/api/v1/generation/content |
GET | Stored prompt and completion text |
Both take a single query parameter: id (the generation ID).
Full API reference: openrouter.ai/docs/api/api-reference/generations/get-generation
Workflow
1. Get generation metadata
Retrieves everything about a generation except the actual prompt/completion text:
cd <skill-path>/scripts && npx tsx get-generation.ts gen-1234567890
npx tsx get-generation.ts --id gen-1234567890 --json
What you get back:
- Model & routing:
model,provider_name,router,service_tier - Tokens:
tokens_prompt,tokens_completion,native_tokens_reasoning,native_tokens_cached - Cost:
total_cost,usage,upstream_inference_cost,cache_discount - Performance:
latency,generation_time,moderation_latency - Status:
finish_reason,streamed,cancelled,is_byok - Context:
created_at,app_id,external_user,session_id,request_id - Provider chain:
provider_responsesarray showing fallback attempts with per-provider latency and status
2. Get generation content
Retrieves the stored prompt and completion:
cd <skill-path>/scripts && npx tsx get-generation-content.ts gen-1234567890
npx tsx get-generation-content.ts --id gen-1234567890 --json
What you get back:
- Input:
prompt(raw text) and/ormessages(array of{role, content}) - Output:
completion(the model's response) andreasoning(chain-of-thought, if applicable)
Note: Content is only available if the generation was not made with Zero Data Retention (ZDR) enabled. If ZDR was on, this endpoint returns empty/null content.
Direct API Usage (curl)
Get metadata
curl -G https://openrouter.ai/api/v1/generation \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-d id=gen-1234567890
Get content
curl -G https://openrouter.ai/api/v1/generation/content \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-d id=gen-1234567890
Response Schemas
Metadata response (/api/v1/generation)
{
"data": {
"id": "gen-3bhGkxlo4XFrqiabUM7NDtwDzWwG",
"api_type": "completions",
"model": "openai/gpt-4o",
"provider_name": "OpenAI",
"created_at": "2024-07-15T23:33:19.433273+00:00",
"tokens_prompt": 10,
"tokens_completion": 25,
"native_tokens_reasoning": 5,
"native_tokens_cached": 3,
"total_cost": 0.0015,
"usage": 0.0015,
"upstream_inference_cost": 0.0012,
"latency": 1250,
"generation_time": 1200,
"finish_reason": "stop",
"streamed": true,
"is_byok": false,
"cancelled": false,
"router": "openrouter/auto",
"service_tier": "priority",
"provider_responses": [
{
"provider_name": "OpenAI",
"model_permaslug": "openai/gpt-4o",
"status": 200,
"latency": 1200,
"is_byok": false
}
]
}
}
Content response (/api/v1/generation/content)
{
"data": {
"input": {
"prompt": "What is the meaning of life?",
"messages": [
{
"content": "What is the meaning of life?",
"role": "user"
}
]
},
"output": {
"completion": "The meaning of life is a philosophical question...",
"reasoning": null
}
}
}
Common Use Cases
Debug a failed generation
# Check what happened — look at finish_reason, provider_responses, and cancelled
cd <skill-path>/scripts && npx tsx get-generation.ts gen-abc123 --json
Look for:
finish_reason="length"means the model hit max tokensfinish_reason="content_filter"means content was filteredcancelled=truemeans the request was cancelled by the clientprovider_responseswith multiple entries means fallbacks occurred
Check cost of a specific request
cd <skill-path>/scripts && npx tsx get-generation.ts gen-abc123
Check total_cost (what you were charged) vs upstream_inference_cost (what the provider charged OpenRouter).
Review what was actually sent/received
cd <skill-path>/scripts && npx tsx get-generation-content.ts gen-abc123
Useful for debugging unexpected outputs — verify the actual prompt sent and completion received.
Trace a multi-generation session
If you have a request_id or session_id from one generation, you can find related generations via the analytics query endpoint (see openrouter-analytics skill).
Error Handling
| Status | Meaning |
|---|---|
| 401 | Invalid or missing API key |
| 403 | You don't have access to this generation (belongs to another user) |
| 404 | Generation ID not found |
| 429 | Rate limited — wait and retry |
| 500 | Server error — retry |
| 502 | Upstream failure — retry |
Key Fields Reference
Metadata fields
| Field | Type | Description |
|---|---|---|
id |
string | Generation ID (gen-...) |
model |
string | Model permaslug (e.g., openai/gpt-4o) |
provider_name |
string|null | Provider that served the request |
api_type |
string | One of: completions, embeddings, rerank, tts, stt, video |
tokens_prompt |
int|null | Prompt token count |
tokens_completion |
int|null | Completion token count |
native_tokens_reasoning |
int|null | Reasoning/thinking tokens |
native_tokens_cached |
int|null | Cached input tokens |
total_cost |
number | Total cost in USD |
usage |
number | Usage amount in USD |
upstream_inference_cost |
number|null | Provider's cost in USD |
cache_discount |
number|null | Discount from caching |
latency |
number|null | Total latency in ms |
generation_time |
number|null | Model generation time in ms |
moderation_latency |
number|null | Moderation check time in ms |
finish_reason |
string|null | Why generation stopped (stop, length, content_filter, etc.) |
native_finish_reason |
string|null | Raw finish reason from provider |
streamed |
bool|null | Whether response was streamed |
is_byok |
bool | Whether user's own provider key was used |
cancelled |
bool|null | Whether request was cancelled |
app_id |
int|null | OAuth app ID |
external_user |
string|null | External user identifier (X-External-User header) |
session_id |
string|null | Session grouping ID |
request_id |
string|null | Request grouping ID (all gens from one API call) |
router |
string|null | Router used (e.g., openrouter/auto) |
service_tier |
string|null | Provider service tier |
web_search_engine |
string|null | Search engine used (e.g., exa, firecrawl) |
num_search_results |
int|null | Number of search results included |
provider_responses |
array|null | Provider attempt chain with per-provider latency/status |
Content fields
| Field | Type | Description |
|---|---|---|
data.input.prompt |
string|null | Raw prompt text |
data.input.messages |
array|null | Messages array ([{role, content}]) |
data.output.completion |
string|null | Model's completion text |
data.output.reasoning |
string|null | Chain-of-thought reasoning |