name: api-runtime-monitor description: Monitors LTX API runtime performance, latency, error rates, and throughput. Alerts on performance degradation or errors. tags: [monitoring, api, performance, latency, errors]
API Runtime Monitor
When to use
- "Monitor API latency"
- "Alert on API errors"
- "Track API throughput"
- "Monitor inference time"
- "Alert on API performance degradation"
What it monitors
- Latency: Request processing time, inference time, queue time
- Error rates: % of failed requests, error types, error sources
- Throughput: Requests per hour/day, by endpoint/model
- Performance: P50/P95/P99 latency, success rate
- Utilization: API usage by org, model, resolution
Steps
Gather requirements from user:
- Which performance metric to monitor (latency, errors, throughput)
- Alert threshold (e.g., "P95 latency > 30s", "error rate > 5%", "throughput drops > 20%")
- Time window (hourly, daily)
- Scope (all requests, specific endpoint, specific org)
- Notification channel
Read shared files:
shared/bq-schema.md— GPU cost table (has API runtime data) and ltxvapi tablesshared/metric-standards.md— Performance metric patterns
Identify data source:
- For LTX API: Use
ltxvapi_api_requests_with_be_costsorgpu_request_attribution_and_cost - Key columns explained:
request_processing_time_ms: Total time from request submission to completionrequest_inference_time_ms: GPU processing time (actual model inference)request_queue_time_ms: Time waiting in queue before processing startsresult: Request outcome (success, failed, timeout, etc.)error_type: Classification of errors (infrastructure vs applicative)endpoint: API endpoint called (e.g., /generate, /upscale)model_type: Model used (ltxv2, retake, etc.)org_name: Customer organization making the request
- For LTX API: Use
Write monitoring SQL:
- Query relevant performance metric
- Calculate percentiles (P50, P95, P99) for latency
- Calculate error rate (failed / total requests)
- Compare against baseline
Present to user:
- Show SQL query
- Show example alert format with performance breakdown
- Confirm threshold values
Set up alert (manual for now):
- Document SQL
- Configure notification to engineering team
Reference files
| File | Read when |
|---|---|
shared/product-context.md |
LTX products and business context |
shared/bq-schema.md |
API tables and GPU cost table schema |
shared/metric-standards.md |
Performance metric patterns |
shared/event-registry.yaml |
Feature events (if analyzing event-driven metrics) |
shared/gpu-cost-query-templates.md |
GPU cost queries (if analyzing cost-related performance) |
shared/gpu-cost-analysis-patterns.md |
Cost analysis patterns (if analyzing cost-related performance) |
Rules
- DO use APPROX_QUANTILES for percentile calculations (P50, P95, P99)
- DO separate errors by error_source (infrastructure vs applicative)
- DO filter by result = 'success' for success rate calculations
- DO break down by endpoint, model, and resolution for detailed analysis
- DO compare current performance against historical baseline
- DO alert engineering team for infrastructure errors, product team for applicative errors
- DO partition by dt for performance