name: pipecat-call-tester
description: "Automated testing for PipeCat voice agent pipeline. Use when: (1) Testing after code changes, (2) Validating new prompt versions, (3) Running pre-deployment checks, (4) Debugging call issues, (5) Measuring latency performance"
PipeCat Call Tester
Orchestrates 3-tier testing for the PipeCat voice agent: local validation, Daily.co room testing, and full end-to-end calls.
Commands
- /test-pipeline: Run Tier 1 (local) tests
- /test-pipeline tier-2: Create Daily.co room for browser testing
- /test-pipeline tier-3 [phone]: Run full end-to-end call
- /test-pipeline all: Run all tiers sequentially
- /test-pipeline scenario [name]: Run specific test scenario
Reference Files
- references/test-scenarios.md: Pre-built test cases for different call flows
- references/baseline-thresholds.md: Pass/fail criteria for latency and quality
Testing Tiers
Tier 1: Local Pipeline Test (Zero Cost)
Validates components without external API calls:
Checks:
- All imports work correctly
- Required API keys are configured
- NEPQ system prompt generates correctly
- Metrics tracking functions properly
- Phase detection logic works
- Service objects can be instantiated
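The API key check can be sketched as follows. This is a minimal illustration, not the contents of run_tests.py: the key names are taken from the output example in this section, and `check_api_keys` is a hypothetical helper. It reports only the length of each value, so secrets never reach the log.

```python
import os

# Key names taken from the Tier 1 output example; the real list may be longer.
REQUIRED_KEYS = [
    "ASSEMBLYAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "ELEVENLABS_API_KEY",
    "ELEVENLABS_VOICE_ID",
]


def check_api_keys(env=None):
    """Return (name, passed, length) per required key without exposing values."""
    env = os.environ if env is None else env
    results = []
    for name in REQUIRED_KEYS:
        value = env.get(name, "").strip()
        # A key passes when it is set to a non-empty value.
        results.append((name, bool(value), len(value)))
    return results


if __name__ == "__main__":
    for name, passed, length in check_api_keys():
        status = "PASS" if passed else "FAIL"
        print(f"[{status}] {name} (length: {length})")
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise in unit tests.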
Run:
python skills/pipecat-call-tester/scripts/run_tests.py --tier 1
Output Example:
TEST 1: Local Pipeline Validation
=================================
1.1 Testing imports...
[PASS] Core imports
1.2 Checking API keys...
[PASS] ASSEMBLYAI_API_KEY (length: 40)
[PASS] ANTHROPIC_API_KEY (length: 64)
[PASS] ELEVENLABS_API_KEY (length: 32)
[PASS] ELEVENLABS_VOICE_ID (length: 24)
1.3 Testing NEPQ system prompt generation...
[PASS] NEPQ mentioned
[PASS] Lead name included
[PASS] Property address included
[PASS] Phase guidance included
System prompt length: 3,456 characters
1.4 Testing metrics tracking...
[PASS] Turn tracking
[PASS] Micro-commitment tracking
[PASS] Phase tracking
1.5 Testing phase detection...
[PASS] CONNECTING phase triggers
[PASS] PROBLEM_AWARENESS triggers
[PASS] Objection detection
TEST 1 PASSED: All 15 checks passed
Tier 2: Daily.co Room Test (~$0.01/min)
Creates a real Daily.co room for browser-based testing:
What it does:
- Creates a temporary Daily.co room
- Starts the PipeCat bot in the room
- Provides a URL for you to join in a browser
- Records latency metrics during conversation
- Generates test report when done
- Cleans up room
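The room-creation step might look like the sketch below, which uses Daily's REST endpoint for rooms and sets an `exp` property so the room expires on its own even if cleanup is skipped. The helper names and the 10-minute default lifetime are assumptions for illustration; the actual script may create rooms differently.

```python
import json
import time
import urllib.request

DAILY_API_URL = "https://api.daily.co/v1/rooms"


def build_create_room_request(api_key, ttl_seconds=600, now=None):
    """Build (but do not send) a request for a room that expires after ttl_seconds."""
    now = int(time.time()) if now is None else int(now)
    # "exp" is a Unix timestamp after which the room can no longer be joined.
    payload = {"properties": {"exp": now + ttl_seconds}}
    return urllib.request.Request(
        DAILY_API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_temporary_room(api_key, ttl_seconds=600):
    """Send the request and return the room URL from the JSON response."""
    request = build_create_room_request(api_key, ttl_seconds)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["url"]
```

Separating request construction from the network call keeps the payload testable without spending API credits.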
Run:
python skills/pipecat-call-tester/scripts/run_tests.py --tier 2
Browser Testing Flow:
- Open the provided room URL
- Allow microphone access
- Bot says: "Hey, did I catch you at a bad time?"
- Respond naturally to test the conversation flow
- Press Enter when done to see metrics
Tier 3: Full End-to-End Call (~$0.075/min)
Makes a real outbound call through the full stack:
What it does:
- Creates Daily.co room
- Initiates Twilio call to target phone
- Bridges call audio to Daily.co room
- PipeCat bot runs full NEPQ conversation
- Records all metrics and transcript
- Generates detailed test report
Run:
python skills/pipecat-call-tester/scripts/run_tests.py --tier 3 --phone +15551234567
Output Example:
TEST 3: Full End-to-End Call
============================
Call initiated:
Call ID: call_abc123
Room URL: https://your-domain.daily.co/test-room-xyz
Phone: +15551234567
Call Metrics:
Duration: 45.2s
Turns: 8
Phase reached: COMMITMENT_CHECK
Latency Breakdown:
STT (AssemblyAI): 142ms avg
LLM (Claude): 485ms avg
TTS (ElevenLabs): 98ms avg
Total response: 725ms avg
Quality Metrics:
Micro-commitments: 3
Objections handled: 1
Pain discovered: Yes
RESULT: PASS (all thresholds met)
Test Scenarios
Pre-built scenarios for testing specific flows:
| Scenario | Description | Duration |
|---|---|---|
| happy-path | Cooperative prospect, reaches commitment | ~60s |
| objection-price | Price objection early in call | ~45s |
| objection-realtor | "I have a realtor" objection | ~45s |
| not-interested | Quick rejection, graceful exit | ~20s |
| busy-callback | "Call me back later" | ~15s |
| wrong-number | Not the property owner | ~15s |
Run scenario:
python skills/pipecat-call-tester/scripts/run_tests.py --scenario happy-path
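To run every scenario in one go, a small wrapper along these lines could loop over the names in the table. It is a sketch under one assumption the source does not confirm: that run_tests.py exits with a non-zero status when a scenario fails. `scenario_command` and `run_all_scenarios` are hypothetical names.

```python
import subprocess
import sys

RUNNER = "skills/pipecat-call-tester/scripts/run_tests.py"

# Scenario names as listed in the table above.
SCENARIOS = [
    "happy-path",
    "objection-price",
    "objection-realtor",
    "not-interested",
    "busy-callback",
    "wrong-number",
]


def scenario_command(name):
    """Return the argv list for one scenario, rejecting unknown names early."""
    if name not in SCENARIOS:
        raise ValueError(f"unknown scenario: {name}")
    return [sys.executable, RUNNER, "--scenario", name]


def run_all_scenarios():
    """Run each scenario in turn and return the names that exited non-zero."""
    failed = []
    for name in SCENARIOS:
        # Assumption: a non-zero exit code signals a failed scenario.
        completed = subprocess.run(scenario_command(name))
        if completed.returncode != 0:
            failed.append(name)
    return failed
```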
Latency Thresholds
Default pass/fail criteria (configurable):
| Metric | Target | Warning | Fail |
|---|---|---|---|
| Total response | <1000ms | <1500ms | >1500ms |
| STT latency | <200ms | <300ms | >400ms |
| LLM latency | <600ms | <900ms | >1200ms |
| TTS latency | <150ms | <250ms | >350ms |
| First response | <1500ms | <2000ms | >2500ms |
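One way to apply these thresholds in code is sketched below. The table leaves a gap between the Warning and Fail bounds for most metrics (for example, STT between 300ms and 400ms), so this sketch assumes that anything at or above the target and not above the Fail bound is reported as a warning. The function name and metric keys are illustrative only.

```python
# Bounds in milliseconds, copied from the table above: (target, warning, fail).
THRESHOLDS_MS = {
    "total_response": (1000, 1500, 1500),
    "stt": (200, 300, 400),
    "llm": (600, 900, 1200),
    "tts": (150, 250, 350),
    "first_response": (1500, 2000, 2500),
}


def classify_latency(metric, value_ms):
    """Classify one latency measurement as "pass", "warning", or "fail"."""
    target, _warning, fail = THRESHOLDS_MS[metric]
    if value_ms < target:
        return "pass"
    if value_ms > fail:
        return "fail"
    # Assumption: the gap between the Warning and Fail columns counts as a
    # warning, so the Warning bound itself is kept only for reference.
    return "warning"


if __name__ == "__main__":
    for metric, value in [("stt", 142), ("llm", 485), ("tts", 98)]:
        print(f"{metric}: {value}ms ({classify_latency(metric, value).upper()})")
```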
Usage Examples
Quick Validation After Code Change
User: /test-pipeline
Claude: Running Tier 1 local validation...
Results:
Imports: PASS
API keys: PASS (4/4)
System prompt: PASS (3,456 chars)
Metrics: PASS
Phase detection: PASS
All 15 checks passed. Ready for Tier 2 testing.
Full Pre-Deployment Test
User: /test-pipeline all
Claude: Running all test tiers...
Tier 1 (Local): PASS (15/15 checks)
Tier 2 (Daily Room): PASS (avg latency 725ms)
Tier 3 (Full Call): PASS (reached COMMITMENT_CHECK)
Deployment readiness: APPROVED
Debug Slow Response
User: /test-pipeline tier-2
Claude: Starting Daily.co room test...
Room URL: https://your-domain.daily.co/test-12345
Join in browser and speak to the bot.
[After testing]
Latency Analysis:
STT: 145ms (PASS)
LLM: 1,250ms (FAIL - threshold 1200ms)
TTS: 102ms (PASS)
BOTTLENECK IDENTIFIED: LLM latency
Recommendations:
1. Check Claude API status: https://status.anthropic.com
2. Consider reducing max_tokens from 300 to 200
3. Simplify system prompt (currently 3,456 chars)
Integration with CI/CD
Run automated tests in a CI pipeline:
# .github/workflows/test-pipeline.yml
- name: Test PipeCat Pipeline
  run: |
    python skills/pipecat-call-tester/scripts/run_tests.py --tier 1 --json
  env:
    ASSEMBLYAI_API_KEY: ${{ secrets.ASSEMBLYAI_API_KEY }}
    # ... other keys
Test Report Format
Reports are saved to test-reports/ with detailed metrics:
{
  "timestamp": "2024-01-15T10:30:00Z",
  "tier": 1,
  "duration_ms": 1234,
  "results": {
    "imports": "pass",
    "api_keys": "pass",
    "system_prompt": "pass",
    "metrics": "pass",
    "phase_detection": "pass"
  },
  "overall": "pass"
}
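A CI step could gate on a saved report with a short script like the one below. It assumes only the fields shown in the example report; `failed_checks` and `main` are hypothetical helpers, and the report file path is passed on the command line because the naming scheme inside test-reports/ is not specified here.

```python
import json
import sys
from pathlib import Path


def failed_checks(report):
    """Return the sorted names of checks whose status is not "pass"."""
    return sorted(
        name for name, status in report["results"].items() if status != "pass"
    )


def main(path):
    """Print a one-line summary and return a process exit code."""
    report = json.loads(Path(path).read_text(encoding="utf-8"))
    failures = failed_checks(report)
    if report.get("overall") != "pass" or failures:
        detail = ", ".join(failures) or "overall"
        print(f"Tier {report.get('tier')} failed: {detail}")
        return 1
    print(f"Tier {report.get('tier')} passed in {report.get('duration_ms')} ms")
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Returning the exit code from `main` rather than calling `sys.exit` inside it keeps the logic importable from other test tooling.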