pipecat-dashboard - SKILL.md Agent Skill

name: pipecat-dashboard
description: "Real-time latency monitoring and performance dashboard for PipeCat voice agent. Use when: (1) Monitoring active calls, (2) Debugging slow responses, (3) Analyzing latency patterns, (4) Identifying bottlenecks, (5) Reviewing historical performance"

PipeCat Dashboard

Real-time latency monitoring, bottleneck identification, and performance optimization.

Commands

/latency-report — Summary of recent call latencies
/latency-report [call_id] — Latency breakdown for specific call
/latency-report --live — Real-time latency during active call
/latency-report --daily — Daily latency summary
/latency-bottleneck — Identify current bottleneck component
/latency-tune — Get tuning recommendations

Reference Files

references/latency-thresholds.md — Alert thresholds and optimization targets

Latency Components

The voice agent pipeline has three main latency components: STT, LLM, and TTS. The remaining time (VAD and other overhead) is reported separately as "Other".

User speaks → [STT] → [LLM] → [TTS] → Bot speaks
              ↓        ↓        ↓
           ~150ms    ~500ms   ~100ms

Component	Service	Target	Warning	Fail
STT	AssemblyAI Universal-2	<150ms	<250ms	>400ms
LLM	Claude Opus 4.5	<500ms	<800ms	>1200ms
TTS	ElevenLabs Flash v2.5	<100ms	<200ms	>350ms
Total	End-to-end	<1000ms	<1500ms	>2000ms
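
The thresholds above can be turned into a status check. The following is a minimal sketch, not the skill's actual implementation: the names `Thresholds`, `THRESHOLDS`, and `classify` are hypothetical, and because the table leaves the band between the Warning and Fail columns undefined, the sketch assumes that anything from the target up to the fail limit counts as WARN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    target_ms: int   # below this value: PASS
    warning_ms: int  # "Warning" column, kept for reference only
    fail_ms: int     # above this value: FAIL


# Values copied from the threshold table; the dictionary keys are hypothetical.
THRESHOLDS = {
    "stt": Thresholds(150, 250, 400),
    "llm": Thresholds(500, 800, 1200),
    "tts": Thresholds(100, 200, 350),
    "total": Thresholds(1000, 1500, 2000),
}


def classify(component: str, latency_ms: float) -> str:
    """Return PASS, WARN, or FAIL for one latency sample or average."""
    t = THRESHOLDS[component]
    if latency_ms < t.target_ms:
        return "PASS"
    if latency_ms > t.fail_ms:
        return "FAIL"
    # Assumption: everything from the target up to the fail limit is WARN,
    # including the band between the Warning and Fail columns.
    return "WARN"
```

For example, `classify("stt", 142)` yields PASS and `classify("llm", 572)` yields WARN, matching how averages are labelled against their targets in the per-call report.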

Usage Examples

Quick Latency Summary

User: /latency-report
Claude: Fetching latency data from last 24 hours...

PipeCat Latency Report (Last 24 Hours)
═══════════════════════════════════════

Total Calls: 87
Avg Response: 847ms

Component Breakdown:
  STT (AssemblyAI)    142ms avg   [██████████░░░░░░░░░░]  PASS
  LLM (Claude)        512ms avg   [████████████████░░░░]  WARN
  TTS (ElevenLabs)     98ms avg   [████████░░░░░░░░░░░░]  PASS
  VAD/Other            95ms avg   [████████░░░░░░░░░░░░]

Percentiles:
  P50:  725ms (Excellent)
  P90: 1,102ms (Good)
  P99: 1,892ms (Warning)

No latency alerts in the last 24 hours.
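
The P50/P90/P99 lines in a report like this can be derived from the raw per-response latencies. The sketch below uses the nearest-rank method, which is an assumption: the skill does not say which percentile definition it uses, and the helper name `percentile` is hypothetical. The Excellent/Good/Warning labels are not covered.

```python
def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range 1..100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, which avoids float rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

Nearest-rank always returns a latency that was actually observed, which keeps the reported values easy to trace back to a specific turn.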

Specific Call Analysis

User: /latency-report call_abc123
Claude: Analyzing latency for call_abc123...

Call Latency Report: call_abc123
════════════════════════════════

Duration: 2:34
Turns: 12
Avg Response: 892ms

Per-Turn Latency:
  Turn 1:  1,245ms [STT: 165  LLM: 890  TTS: 108  Other: 82]  ← First response
  Turn 2:    712ms [STT: 132  LLM: 445  TTS:  89  Other: 46]
  Turn 3:    856ms [STT: 148  LLM: 567  TTS: 102  Other: 39]
  Turn 4:    698ms [STT: 128  LLM: 432  TTS:  94  Other: 44]
  Turn 5:  1,105ms [STT: 142  LLM: 812  TTS:  98  Other: 53]  ← Spike
  Turn 6:    723ms [STT: 135  LLM: 456  TTS:  91  Other: 41]
  ...

Anomalies Detected:
  ⚠ Turn 1: First response latency expected to be higher
  ⚠ Turn 5: LLM spike (812ms vs 572ms avg)

First Response: 1,245ms (Target: <1,200ms) ⚠

Component Summary:
  STT:   142ms avg (PASS - target <150ms)
  LLM:   572ms avg (WARN - target <500ms)
  TTS:    96ms avg (PASS - target <100ms)

Recommendation: LLM latency slightly elevated. Consider:
1. Reducing max_tokens from 300 to 200
2. Simplifying system prompt (currently 3,456 chars)
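
The per-turn figures in this report can be reproduced with a few lines. In the sketch below the "Other" value is the turn total minus the three component latencies, which matches every turn listed. The spike rule (LLM latency above 1.5 times the call's median LLM latency) is a hypothetical threshold chosen for illustration, not the skill's documented rule.

```python
from statistics import median

# (total, stt, llm, tts) in ms for the six turns listed in the report.
TURNS = [
    (1245, 165, 890, 108),
    (712, 132, 445, 89),
    (856, 148, 567, 102),
    (698, 128, 432, 94),
    (1105, 142, 812, 98),
    (723, 135, 456, 91),
]

SPIKE_FACTOR = 1.5  # hypothetical threshold, not taken from the skill


def other_ms(total: int, stt: int, llm: int, tts: int) -> int:
    """Overhead not attributed to STT, LLM, or TTS (VAD, transport, etc.)."""
    return total - stt - llm - tts


def llm_spikes(turns, factor: float = SPIKE_FACTOR) -> list[int]:
    """Return 1-based turn numbers whose LLM latency exceeds factor x median."""
    baseline = median(t[2] for t in turns)
    return [i for i, t in enumerate(turns, start=1) if t[2] > factor * baseline]
```

With the six turns above, this rule flags turns 1 and 5, the same two turns the report lists under anomalies.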

Live Monitoring

User: /latency-report --live
Claude: Starting live latency monitor...

🔴 LIVE - Call: call_xyz789
═══════════════════════════

Turn 1 [0:03]  Response: 1,156ms
  STT: 145ms  LLM: 865ms  TTS: 98ms

Turn 2 [0:12]  Response: 734ms
  STT: 138ms  LLM: 467ms  TTS: 91ms

Turn 3 [0:20]  Response: 812ms
  STT: 152ms  LLM: 523ms  TTS: 94ms

Avg so far: 901ms
Status: GOOD

[Press Ctrl+C to stop monitoring]
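
The "Avg so far" line can be maintained incrementally, without storing every turn. A minimal sketch, assuming one response latency arrives per turn; the class name `RunningAverage` is hypothetical.

```python
class RunningAverage:
    """Incremental mean of response latencies for a live call."""

    def __init__(self) -> None:
        self.count = 0
        self.total_ms = 0.0

    def add(self, latency_ms: float) -> float:
        """Record one turn and return the average so far."""
        self.count += 1
        self.total_ms += latency_ms
        return self.total_ms / self.count


monitor = RunningAverage()
for response_ms in (1156, 734, 812):  # the three turns shown above
    average = monitor.add(response_ms)
print(f"Avg so far: {round(average)}ms")
```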

Daily Summary

User: /latency-report --daily
Claude: Generating daily latency summary...

Daily Latency Summary: 2024-01-15
═════════════════════════════════

Calls: 127
Total Minutes: 312

Hourly Breakdown:
  08:00  23 calls  Avg: 823ms  [████████░░]
  09:00  31 calls  Avg: 856ms  [████████░░]
  10:00  28 calls  Avg: 912ms  [█████████░]
  11:00  19 calls  Avg: 798ms  [████████░░]
  12:00  12 calls  Avg: 745ms  [███████░░░]
  13:00   8 calls  Avg: 921ms  [█████████░]
  14:00   6 calls  Avg: 1,045ms [██████████] ← Peak

Peak Hour: 14:00 (1,045ms avg)
  Likely cause: Claude API load

Best Hour: 12:00 (745ms avg)

Trend: Stable (no significant change from yesterday)
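
The hourly breakdown amounts to grouping calls by start hour and averaging. The sketch below assumes each call is available as a pair of ISO-8601 start time and average response latency; that record shape and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_averages(calls) -> dict[str, float]:
    """calls: iterable of (ISO-8601 start time, avg response latency in ms).
    Returns {"HH:00": mean latency} ordered by hour."""
    buckets = defaultdict(list)
    for started_at, latency_ms in calls:
        hour = datetime.fromisoformat(started_at).strftime("%H:00")
        buckets[hour].append(latency_ms)
    return {hour: sum(v) / len(v) for hour, v in sorted(buckets.items())}


def peak_and_best(averages: dict[str, float]) -> tuple[str, str]:
    """Return (slowest hour, fastest hour) by average latency."""
    peak = max(averages, key=averages.get)
    best = min(averages, key=averages.get)
    return peak, best
```

Note that an hour with very few calls, such as a peak hour with only six, produces a noisy average, so it may be worth showing the call count next to each value as the report does.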

Bottleneck Analysis

User: /latency-bottleneck
Claude: Analyzing bottleneck patterns...

Bottleneck Analysis (Last 100 Calls)
════════════════════════════════════

Primary Bottleneck: LLM (Claude)
  Accounts for 58% of total latency
  Average: 512ms (target: <500ms)

Secondary: STT (AssemblyAI)
  Accounts for 17% of total latency
  Average: 142ms (target: <150ms)

Distribution:
  LLM:    [██████████████████████████████]  58%
  STT:    [█████████░░░░░░░░░░░░░░░░░░░░░]  17%
  Other:  [███████░░░░░░░░░░░░░░░░░░░░░░░]  13%
  TTS:    [██████░░░░░░░░░░░░░░░░░░░░░░░░]  12%

LLM Latency Breakdown:
  First token (TTFT):  380ms avg
  Token generation:    132ms avg

Optimization Priority:
1. Reduce system prompt length
2. Lower max_tokens if possible
3. Consider model downgrade for non-critical calls
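
The share-of-total figures behind a bottleneck analysis can be computed from component averages. The sketch below is illustrative and uses hypothetical function names. It assumes the bars are scaled relative to the largest share, which is inferred from the sample output rather than documented.

```python
def latency_shares(component_avgs_ms: dict[str, float]) -> dict[str, int]:
    """Percentage of total latency per component, largest first."""
    total = sum(component_avgs_ms.values())
    shares = {name: round(100 * ms / total) for name, ms in component_avgs_ms.items()}
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))


def bar(share: float, max_share: float, width: int = 30) -> str:
    """Text bar scaled so that the largest share fills the full width."""
    filled = round(width * share / max_share)
    return "█" * filled + "░" * (width - filled)


# Component averages from the 24-hour summary report.
shares = latency_shares({"LLM": 512, "STT": 142, "TTS": 98, "Other": 95})
primary = next(iter(shares))  # first key is the primary bottleneck
for name, pct in shares.items():
    print(f"{name + ':':<7} [{bar(pct, shares[primary])}] {pct:>3}%")
```

The inputs here come from the 24-hour summary, so the resulting percentages are close to, but not the same as, the 100-call analysis shown above.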

Tuning Recommendations

User: /latency-tune
Claude: Analyzing current configuration for optimization...

Tuning Recommendations
══════════════════════

Current Config:
  Model: claude-opus-4-5-20251101
  Max tokens: 300
  System prompt: 3,456 chars
  VAD stop_secs: 0.3

Recommendations:

1. SYSTEM PROMPT LENGTH (Medium Impact)
   Current: 3,456 chars → Suggested: <2,500 chars
   Expected savings: 50-100ms per turn
   Action: Remove redundant examples, condense phase guidance

2. MAX_TOKENS (Low Impact)
   Current: 300 → Suggested: 200
   Expected savings: 20-50ms per turn
   Risk: May truncate longer responses
   Action: Test with 200, monitor truncation rate

3. VAD STOP_SECS (UX Impact)
   Current: 0.3s → Keep current
   Note: Lower values cause more interruptions
   Recommendation: Keep at 0.3 for natural turn-taking

4. MODEL SELECTION (Trade-off)
   Current: Opus 4.5 → Could use Sonnet 4
   Expected savings: 200-300ms per turn
   Trade-off: Lower reasoning quality for complex scenarios
   Recommendation: A/B test with Sonnet for simple phases

Estimated Total Savings: 70-150ms per turn (recommendations 1 and 2 only)
Current: 847ms avg → Projected: 697-777ms avg
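
The savings estimate is plain range arithmetic: the 70-150ms total is the sum of the ranges for recommendations 1 and 2 (the model change is treated as a separate trade-off). A small sketch, with the hypothetical helper `project`:

```python
def project(current_ms: int, savings_ranges: list[tuple[int, int]]):
    """Sum (low, high) savings ranges and project the new average.

    Returns ((total_low, total_high), (projected_low, projected_high))."""
    low = sum(lo for lo, _ in savings_ranges)
    high = sum(hi for _, hi in savings_ranges)
    # The best case subtracts the high estimate, the worst case the low one.
    return (low, high), (current_ms - high, current_ms - low)


savings, projected = project(847, [(50, 100), (20, 50)])
print(f"Savings: {savings[0]}-{savings[1]}ms per turn")
print(f"Projected: {projected[0]}-{projected[1]}ms avg")
```

This assumes the individual savings are independent and additive, which is optimistic: a shorter prompt and a lower max_tokens may overlap in effect.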

Alerts

The dashboard monitors for latency issues and generates alerts:

Alert Type	Condition	Action
Latency Spike	3+ consecutive responses >1500ms	Notify, log
Component Failure	Service timeout >5s	Notify, failover
Degradation	P90 increases >30%	Notify, investigate
First Response	>2500ms	Log for review
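
Three of the alert conditions in the table can be expressed as simple predicates. This is a sketch with hypothetical function names. It assumes "consecutive" refers to consecutive response latencies within a call and that the degradation check compares the current P90 against a previous baseline P90. The component failure check is omitted because it depends on how service timeouts are surfaced.

```python
def latency_spike(response_ms, limit_ms: int = 1500, run_length: int = 3) -> bool:
    """True if run_length or more consecutive responses exceed limit_ms."""
    streak = 0
    for ms in response_ms:
        streak = streak + 1 if ms > limit_ms else 0
        if streak >= run_length:
            return True
    return False


def degradation(previous_p90_ms: float, current_p90_ms: float,
                max_increase: float = 0.30) -> bool:
    """True if P90 rose by more than max_increase (30% by default)."""
    return current_p90_ms > previous_p90_ms * (1 + max_increase)


def slow_first_response(first_ms: float, limit_ms: int = 2500) -> bool:
    """True if the first response of a call exceeded limit_ms."""
    return first_ms > limit_ms
```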

Script Integration

# Quick summary
python skills/pipecat-dashboard/scripts/latency_report.py

# Specific call
python skills/pipecat-dashboard/scripts/latency_report.py --call-id call_abc123

# Daily report
python skills/pipecat-dashboard/scripts/latency_report.py --daily

# JSON output
python skills/pipecat-dashboard/scripts/latency_report.py --json

# Bottleneck analysis
python skills/pipecat-dashboard/scripts/latency_report.py --bottleneck
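
The script can also be driven from Python. The sketch below rests on two assumptions that the commands above do not confirm: that `--json` can be combined with the other flags, and that the script then prints a single JSON document to stdout. The helper names `build_command` and `fetch_report` are hypothetical.

```python
import json
import subprocess
import sys

SCRIPT = "skills/pipecat-dashboard/scripts/latency_report.py"


def build_command(call_id=None, daily=False, bottleneck=False) -> list[str]:
    """Build the argument list for one report run with JSON output."""
    # Assumption: --json may be combined with the other documented flags.
    cmd = [sys.executable, SCRIPT, "--json"]
    if call_id is not None:
        cmd += ["--call-id", call_id]
    if daily:
        cmd.append("--daily")
    if bottleneck:
        cmd.append("--bottleneck")
    return cmd


def fetch_report(**options):
    """Run the script and decode its stdout as JSON.

    Raises subprocess.CalledProcessError on a non-zero exit status."""
    result = subprocess.run(
        build_command(**options), capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)
```

Passing the arguments as a list avoids shell quoting issues with call IDs.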

Metrics Storage

Latency metrics are stored in conversation_transcripts.metrics:

{
  "avg_response_latency_ms": 847,
  "first_response_latency_ms": 1245,
  "response_latencies_ms": [1245, 712, 856, ...],
  "stt_latencies_ms": [165, 132, 148, ...],
  "llm_latencies_ms": [890, 445, 567, ...],
  "tts_latencies_ms": [108, 89, 102, ...],
  "interruption_count": 2
}
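
A stored metrics record can be sanity-checked before it is used in a report. The sketch below assumes that `avg_response_latency_ms` is the rounded mean of `response_latencies_ms`, that `first_response_latency_ms` equals its first entry, and that the per-component lists have one entry per turn. None of these invariants is stated by the skill, and `check_metrics` is a hypothetical helper.

```python
def check_metrics(metrics: dict) -> list[str]:
    """Return a list of consistency problems; an empty list means none found."""
    problems = []
    responses = metrics["response_latencies_ms"]

    # Assumption: every component list has exactly one entry per turn.
    for key in ("stt_latencies_ms", "llm_latencies_ms", "tts_latencies_ms"):
        if len(metrics[key]) != len(responses):
            problems.append(
                f"{key} has {len(metrics[key])} entries, expected {len(responses)}"
            )

    if responses:
        # Assumption: the first-response field mirrors the first turn.
        if metrics["first_response_latency_ms"] != responses[0]:
            problems.append("first_response_latency_ms does not match turn 1")
        # Assumption: the average is the rounded mean of all turns.
        expected_avg = round(sum(responses) / len(responses))
        if metrics["avg_response_latency_ms"] != expected_avg:
            problems.append(
                f"avg_response_latency_ms is {metrics['avg_response_latency_ms']}, "
                f"expected {expected_avg}"
            )
    return problems
```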