api-runtime-monitor - SKILL.md Agent Skill

name: api-runtime-monitor description: Monitors LTX API runtime performance, latency, error rates, and throughput. Alerts on performance degradation or errors. tags: [monitoring, api, performance, latency, errors]

API Runtime Monitor

When to use

"Monitor API latency"
"Alert on API errors"
"Track API throughput"
"Monitor inference time"
"Alert on API performance degradation"

What it monitors

Latency: Request processing time, inference time, queue time
Error rates: % of failed requests, error types, error sources
Throughput: Requests per hour/day, by endpoint/model
Performance: P50/P95/P99 latency, success rate
Utilization: API usage by org, model, resolution

Steps

Gather requirements from user:
- Which performance metric to monitor (latency, errors, throughput)
- Alert threshold (e.g., "P95 latency > 30s", "error rate > 5%", "throughput drops > 20%")
- Time window (hourly, daily)
- Scope (all requests, specific endpoint, specific org)
- Notification channel
Read shared files:
- shared/bq-schema.md — GPU cost table (has API runtime data) and ltxvapi tables
- shared/metric-standards.md — Performance metric patterns
Identify data source:
- For LTX API: Use ltxvapi_api_requests_with_be_costs or gpu_request_attribution_and_cost
- Key columns explained:
  - request_processing_time_ms: Total time from request submission to completion
  - request_inference_time_ms: GPU processing time (actual model inference)
  - request_queue_time_ms: Time waiting in queue before processing starts
  - result: Request outcome (success, failed, timeout, etc.)
  - error_type: Classification of errors (infrastructure vs applicative)
  - endpoint: API endpoint called (e.g., /generate, /upscale)
  - model_type: Model used (ltxv2, retake, etc.)
  - org_name: Customer organization making the request
Write monitoring SQL:
- Query relevant performance metric
- Calculate percentiles (P50, P95, P99) for latency
- Calculate error rate (failed / total requests)
- Compare against baseline
Present to user:
- Show SQL query
- Show example alert format with performance breakdown
- Confirm threshold values
Set up alert (manual for now):
- Document SQL
- Configure notification to engineering team

Reference files

File	Read when
`shared/product-context.md`	LTX products and business context
`shared/bq-schema.md`	API tables and GPU cost table schema
`shared/metric-standards.md`	Performance metric patterns
`shared/event-registry.yaml`	Feature events (if analyzing event-driven metrics)
`shared/gpu-cost-query-templates.md`	GPU cost queries (if analyzing cost-related performance)
`shared/gpu-cost-analysis-patterns.md`	Cost analysis patterns (if analyzing cost-related performance)

Rules

DO use APPROX_QUANTILES for percentile calculations (P50, P95, P99)
DO separate errors by error_source (infrastructure vs applicative)
DO filter by result = 'success' for success rate calculations
DO break down by endpoint, model, and resolution for detailed analysis
DO compare current performance against historical baseline
DO alert engineering team for infrastructure errors, product team for applicative errors
DO partition by dt for performance