name: llm2api
description: >
  LLM2API provides Gemma 4 E4B (gemma4:e4b) text generation via an OpenAI-compatible chat completions endpoint. Use this skill when the user needs LLM inference, text generation, summarization, translation, or question answering without calling the Anthropic or OpenAI APIs directly. This is the LLM backbone of OSS2API, developed within the URL2AI project. Payment is handled via Bankr x402 (USDC on Base) or JPYC x402 (JPYC on Polygon).
LLM2API — Gemma 4 E4B Inference
OpenAI-compatible LLM inference powered by Gemma 4 E4B (gemma4:e4b) via Ollama. The LLM backbone of OSS2API, developed within the URL2AI project.
GitHub: katsushi2441/url2ai
Endpoints
| Gateway | URL | Payment |
|---|---|---|
| Bankr x402 | https://x402.bankr.bot/0x444fadbd6e1fed0cfbf7613b6c9f91b9021eecbd/llm2api | USDC on Base ($0.01/req) |
| JPYC x402 | https://exbridge.ddns.net:8020 | JPYC on Polygon (1.5 JPYC/req) |
Use the Bankr endpoint by default.
Usage
POST {base}/v1/chat/completions
OpenAI-compatible. Drop-in replacement for openai.chat.completions.create calls.
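
As a minimal sketch of how `{base}` expands, the helper below joins each gateway's base URL with the `/v1/chat/completions` path. The base URLs are copied from the Endpoints table; the `GATEWAYS` dictionary, its keys, and the `completions_url` function are illustrative names, not part of LLM2API.

```python
# Base URLs taken from the Endpoints table; the keys are hypothetical labels.
GATEWAYS = {
    "bankr": "https://x402.bankr.bot/0x444fadbd6e1fed0cfbf7613b6c9f91b9021eecbd/llm2api",
    "jpyc": "https://exbridge.ddns.net:8020",
}


def completions_url(gateway: str = "bankr") -> str:
    """Return {base}/v1/chat/completions for the chosen gateway."""
    base = GATEWAYS[gateway].rstrip("/")  # avoid a double slash if a base ends in "/"
    return f"{base}/v1/chat/completions"


if __name__ == "__main__":
    print(completions_url())        # Bankr is the documented default
    print(completions_url("jpyc"))
```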
Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| messages | array | Yes | Array of {role, content} objects. Roles: system, user, assistant. |
| stream | boolean | No | Enable SSE streaming. Default false. |
| temperature | number | No | Sampling temperature 0.0–2.0. Default 0.7. |
| max_tokens | integer | No | Max tokens to generate. Hard cap: 2,048. |
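
When `stream` is true, the reply arrives as server-sent events. The sketch below shows one way to pull text out of such a stream. It assumes the events follow the OpenAI chunk convention (`data: {json}` lines carrying `choices[].delta.content`, terminated by `data: [DONE]`); this document does not spell out the exact stream format, so verify it against a real response. `iter_stream_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


if __name__ == "__main__":
    demo = [
        'data: {"choices":[{"delta":{"content":"Hel"}}]}',
        "",
        'data: {"choices":[{"delta":{"content":"lo"}}]}',
        "data: [DONE]",
    ]
    print("".join(iter_stream_text(demo)))
```

With httpx, the lines could come from `response.iter_lines()` inside an `httpx.stream("POST", url, json={..., "stream": True})` context.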
Limits
- Input: total text across all messages must not exceed 4,000 characters; max 20 messages per request
- Output: hard cap of 2,048 tokens per request regardless of max_tokens
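
The limits above can be checked on the client before a paid request is sent. This is a rough sketch: it assumes "characters" means Python string length (Unicode code points), which may differ from how the server counts. `check_limits` and `clamp_max_tokens` are hypothetical helper names.

```python
MAX_INPUT_CHARS = 4000    # total text across all messages
MAX_MESSAGES = 20         # messages per request
MAX_OUTPUT_TOKENS = 2048  # hard output cap


def check_limits(messages):
    """Return a list of limit violations; an empty list means the input fits."""
    problems = []
    if len(messages) > MAX_MESSAGES:
        problems.append(f"{len(messages)} messages exceeds {MAX_MESSAGES}")
    total_chars = sum(len(m.get("content", "")) for m in messages)
    if total_chars > MAX_INPUT_CHARS:
        problems.append(f"{total_chars} characters exceeds {MAX_INPUT_CHARS}")
    return problems


def clamp_max_tokens(max_tokens):
    """Mirror the documented cap: values above 2,048 have no extra effect."""
    if max_tokens is None:
        return None
    return min(max_tokens, MAX_OUTPUT_TOKENS)


if __name__ == "__main__":
    print(check_limits([{"role": "user", "content": "x" * 4001}]))
    print(clamp_max_tokens(5000))
```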
Request example
{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "日本語で自己紹介して" }
  ]
}
Response
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "gemma4:e4b",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 30,
    "completion_tokens": 120,
    "total_tokens": 150
  }
}
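
A small sketch of reading the fields shown in the response above. The `"length"` value for `finish_reason` is an assumption borrowed from the OpenAI convention for output cut off by the token cap; this document only shows `"stop"`. `summarize_response` is a hypothetical helper name.

```python
def summarize_response(body: dict) -> dict:
    """Extract the reply text, a truncation flag, and token usage."""
    choice = body["choices"][0]
    usage = body.get("usage", {})
    return {
        "text": choice["message"]["content"],
        # Assumed OpenAI-style marker for output that hit the token cap.
        "truncated": choice.get("finish_reason") == "length",
        "total_tokens": usage.get("total_tokens"),
    }


if __name__ == "__main__":
    sample = {
        "id": "chatcmpl-...",
        "object": "chat.completion",
        "model": "gemma4:e4b",
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": "..."},
                "finish_reason": "stop",
            }
        ],
        "usage": {"prompt_tokens": 30, "completion_tokens": 120, "total_tokens": 150},
    }
    print(summarize_response(sample))
```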
Code Examples
curl (Bankr x402)
bankr x402 call https://x402.bankr.bot/0x444fadbd6e1fed0cfbf7613b6c9f91b9021eecbd/llm2api/v1/chat/completions \
  -X POST -H 'content-type: application/json' \
  -d '{"messages":[{"role":"user","content":"日本語で自己紹介して"}]}'
Python (JPYC endpoint — no payment header for local/free tier)
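import httpx

# POST directly to the JPYC gateway's chat completions endpoint.
resp = httpx.post(
    "https://exbridge.ddns.net:8020/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Summarize this in one sentence."}],
        "temperature": 0.5,
    },
    timeout=60,
)
resp.raise_for_status()  # surface HTTP errors such as a 400 for oversized input
result = resp.json()
print(result["choices"][0]["message"]["content"])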
Python (OpenAI SDK — Bankr endpoint)
from openai import OpenAI

client = OpenAI(
    api_key="<bankr-wallet-api-key>",
    base_url="https://x402.bankr.bot/0x444fadbd6e1fed0cfbf7613b6c9f91b9021eecbd/llm2api/v1",
)

response = client.chat.completions.create(
    model="gemma4:e4b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
Workflow
- Build a messages array with system (optional) and user turns.
- POST to the Bankr endpoint via bankr x402 call, or use the JPYC endpoint directly.
- Read choices[0].message.content from the response.
- Keep total input under 4,000 characters to avoid a 400 error.
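
The workflow steps can be sketched end to end as follows. This is an illustration under stated assumptions, not a reference client: it targets the JPYC endpoint called directly, without x402 payment handling, and uses only the standard library. `build_messages`, `http_post_json`, and `ask` are hypothetical names, and the `post` parameter exists only so the transport can be swapped out.

```python
import json
import urllib.request

# URL from the Endpoints table plus the documented path.
JPYC_URL = "https://exbridge.ddns.net:8020/v1/chat/completions"
MAX_INPUT_CHARS = 4000  # documented input limit


def build_messages(user_text, system_text=None):
    """Step 1: an optional system turn followed by the user turn."""
    messages = []
    if system_text:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    return messages


def http_post_json(url, payload, timeout=60):
    """Step 2: POST a JSON body and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def ask(user_text, system_text=None, url=JPYC_URL, post=http_post_json):
    """Run the whole workflow and return the assistant's reply text."""
    messages = build_messages(user_text, system_text)
    # Step 4: fail early instead of paying for a request that returns 400.
    total_chars = sum(len(m["content"]) for m in messages)
    if total_chars > MAX_INPUT_CHARS:
        raise ValueError(f"input is {total_chars} characters; limit is {MAX_INPUT_CHARS}")
    body = post(url, {"messages": messages})
    # Step 3: read choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

A call such as `ask("Summarize this in one sentence.", system_text="You are a helpful assistant.")` would then return the reply text, provided the endpoint accepts the request.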
Schema discovery
bankr x402 schema https://x402.bankr.bot/0x444fadbd6e1fed0cfbf7613b6c9f91b9021eecbd/llm2api