pipecat-call-tester

Automated testing for the PipeCat voice agent pipeline. Use when: (1) testing after code changes, (2) validating new prompt versions, (3) running pre-deployment checks, (4) debugging call issues, (5) measuring latency performance.

By isethius. Updated 1/21/2026

PipeCat Call Tester

Orchestrates 3-tier testing for the PipeCat voice agent: local validation, Daily.co room testing, and full end-to-end calls.

Commands

  • /test-pipeline — Run Tier 1 (local) tests
  • /test-pipeline tier-2 — Create Daily.co room for browser testing
  • /test-pipeline tier-3 [phone] — Run full end-to-end call
  • /test-pipeline all — Run all tiers sequentially
  • /test-pipeline scenario [name] — Run specific test scenario
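
The commands above appear to correspond to the `run_tests.py` invocations shown later in this document. A minimal sketch of that mapping follows. The function name `command_to_invocations` is hypothetical, and the handling of `all` is an assumption, since the source does not say how Tier 3 receives a phone number in that mode.

```python
import shlex

SCRIPT = "skills/pipecat-call-tester/scripts/run_tests.py"


def command_to_invocations(command: str) -> list[list[str]]:
    """Translate a /test-pipeline command into run_tests.py argument lists."""
    parts = shlex.split(command)
    if not parts or parts[0] != "/test-pipeline":
        raise ValueError(f"not a /test-pipeline command: {command!r}")
    args = parts[1:]

    if not args:
        # The bare command runs Tier 1 (local) tests.
        return [["python", SCRIPT, "--tier", "1"]]
    if args[0] == "tier-2":
        return [["python", SCRIPT, "--tier", "2"]]
    if args[0] == "tier-3":
        argv = ["python", SCRIPT, "--tier", "3"]
        if len(args) > 1:
            # Pass the phone number through only when one was supplied.
            argv += ["--phone", args[1]]
        return [argv]
    if args[0] == "scenario":
        if len(args) < 2:
            raise ValueError("scenario requires a name")
        return [["python", SCRIPT, "--scenario", args[1]]]
    if args[0] == "all":
        # Assumption: tiers run in order 1, 2, 3 with no extra flags.
        return [["python", SCRIPT, "--tier", t] for t in ("1", "2", "3")]
    raise ValueError(f"unknown subcommand: {args[0]!r}")
```

Each returned list can be passed to `subprocess.run` as-is; returning plain lists keeps the mapping easy to test without launching anything.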

Reference Files

  • references/test-scenarios.md — Pre-built test cases for different call flows
  • references/baseline-thresholds.md — Pass/fail criteria for latency and quality

Testing Tiers

Tier 1: Local Pipeline Test (Zero Cost)

Validates components without external API calls:

Checks:

  • All imports work correctly
  • Required API keys are configured
  • NEPQ system prompt generates correctly
  • Metrics tracking functions properly
  • Phase detection logic works
  • Service objects can be instantiated
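
As an illustration of the API key check in the list above, here is a minimal sketch. The key names are taken from the output example in this document; the helpers `check_api_keys` and `format_results` are hypothetical and may not match what `run_tests.py` actually does.

```python
from typing import Mapping

# Key names as they appear in the Tier 1 output example.
REQUIRED_KEYS = (
    "ASSEMBLYAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "ELEVENLABS_API_KEY",
    "ELEVENLABS_VOICE_ID",
)


def check_api_keys(env: Mapping[str, str], required=REQUIRED_KEYS):
    """Return (name, is_set, length) for each required key."""
    results = []
    for name in required:
        value = env.get(name, "").strip()
        results.append((name, bool(value), len(value)))
    return results


def format_results(results):
    """Render results in the [PASS]/[FAIL] style of the output example."""
    lines = []
    for name, ok, length in results:
        status = "PASS" if ok else "FAIL"
        detail = f"(length: {length})" if ok else "(missing)"
        lines.append(f"  [{status}] {name} {detail}")
    return lines
```

Calling `format_results(check_api_keys(os.environ))` would check the real environment. Only the length of each key is reported, so secrets never reach the log.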

Run:

python skills/pipecat-call-tester/scripts/run_tests.py --tier 1

Output Example:

TEST 1: Local Pipeline Validation
=================================

1.1 Testing imports...
  [PASS] Core imports

1.2 Checking API keys...
  [PASS] ASSEMBLYAI_API_KEY (length: 40)
  [PASS] ANTHROPIC_API_KEY (length: 64)
  [PASS] ELEVENLABS_API_KEY (length: 32)
  [PASS] ELEVENLABS_VOICE_ID (length: 24)

1.3 Testing NEPQ system prompt generation...
  [PASS] NEPQ mentioned
  [PASS] Lead name included
  [PASS] Property address included
  [PASS] Phase guidance included
  System prompt length: 3,456 characters

1.4 Testing metrics tracking...
  [PASS] Turn tracking
  [PASS] Micro-commitment tracking
  [PASS] Phase tracking

1.5 Testing phase detection...
  [PASS] CONNECTING phase triggers
  [PASS] PROBLEM_AWARENESS triggers
  [PASS] Objection detection

TEST 1 PASSED: All 15 checks passed
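
The four checks in step 1.3 of the output above could be approximated with simple substring tests. This is a sketch under assumptions: the function name `validate_system_prompt` is hypothetical, and detecting phase guidance by the word "phase" is a guess at what the real check looks for.

```python
def validate_system_prompt(prompt: str, lead_name: str, property_address: str) -> dict:
    """Return a mapping of check label to pass/fail for a generated prompt."""
    return {
        "NEPQ mentioned": "NEPQ" in prompt,
        "Lead name included": lead_name in prompt,
        "Property address included": property_address in prompt,
        # Assumption: any mention of "phase" counts as phase guidance.
        "Phase guidance included": "phase" in prompt.lower(),
    }


def all_passed(checks: dict) -> bool:
    """True only when every individual check passed."""
    return all(checks.values())
```

Keeping the labels identical to the log lines makes it straightforward to print one `[PASS]` or `[FAIL]` row per entry.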

Tier 2: Daily.co Room Test (~$0.01/min)

Creates a real Daily.co room for browser-based testing:

What it does:

  1. Creates a temporary Daily.co room
  2. Starts the PipeCat bot in the room
  3. Provides a URL for you to join in a browser
  4. Records latency metrics during the conversation
  5. Generates a test report when done
  6. Cleans up the room
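
Steps 1 and 6 above map onto Daily's REST API, which creates rooms with `POST /v1/rooms` and removes them with `DELETE /v1/rooms/<name>`. The sketch below only builds the requests and does not send them. The helper names and the 10-minute default expiry are assumptions, not taken from the tester's source.

```python
import json
import time
import urllib.request

DAILY_ROOMS_URL = "https://api.daily.co/v1/rooms"


def build_create_room_request(api_key, ttl_seconds=600, now=None):
    """Build a request for a temporary room that expires after ttl_seconds."""
    now = time.time() if now is None else now
    # `exp` is a Unix timestamp after which the room can no longer be joined.
    body = {"properties": {"exp": int(now) + ttl_seconds}}
    return urllib.request.Request(
        DAILY_ROOMS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_delete_room_request(api_key, room_name):
    """Build the cleanup request for a room created earlier."""
    return urllib.request.Request(
        f"{DAILY_ROOMS_URL}/{room_name}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="DELETE",
    )
```

Sending either request with `urllib.request.urlopen` returns JSON; for room creation the response includes the room's `name` and `url`. Setting an expiry means a crashed test run does not leave the room open indefinitely.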

Run:

python skills/pipecat-call-tester/scripts/run_tests.py --tier 2

Browser Testing Flow:

  1. Open the provided room URL
  2. Allow microphone access
  3. Bot says: "Hey, did I catch you at a bad time?"
  4. Respond naturally to test the conversation flow
  5. Press Enter when done to see metrics

Tier 3: Full End-to-End Call (~$0.075/min)

Makes a real outbound call through the full stack:

What it does:

  1. Creates a Daily.co room
  2. Initiates a Twilio call to the target phone
  3. Bridges the call audio to the Daily.co room
  4. The PipeCat bot runs the full NEPQ conversation
  5. Records all metrics and the transcript
  6. Generates a detailed test report
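
The approximate per-minute costs in the tier headings translate into a per-run estimate with simple arithmetic. A minimal sketch, assuming the rates are exactly the rounded figures quoted in this document (Tier 1 free, Tier 2 about $0.01/min, Tier 3 about $0.075/min):

```python
# Approximate rates from the tier headings, in USD per minute.
TIER_COST_PER_MIN = {1: 0.0, 2: 0.01, 3: 0.075}


def estimate_cost(tier: int, duration_seconds: float) -> float:
    """Estimate the cost in USD of one test run of the given length."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return TIER_COST_PER_MIN[tier] * duration_seconds / 60.0
```

For example, a 45.2-second Tier 3 call comes to about 0.075 × 45.2 / 60 ≈ $0.0565. Actual billing depends on each provider's pricing and rounding rules.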

Run:

python skills/pipecat-call-tester/scripts/run_tests.py --tier 3 --phone +15551234567

Output Example:

TEST 3: Full End-to-End Call
============================

Call initiated:
  Call ID: call_abc123
  Room URL: https://your-domain.daily.co/test-room-xyz
  Phone: +15551234567

Call Metrics:
  Duration: 45.2s
  Turns: 8
  Phase reached: COMMITMENT_CHECK

Latency Breakdown:
  STT (AssemblyAI): 142ms avg
  LLM (Claude): 485ms avg
  TTS (ElevenLabs): 98ms avg
  Total response: 725ms avg

Quality Metrics:
  Micro-commitments: 3
  Objections handled: 1
  Pain discovered: Yes

RESULT: PASS (all thresholds met)
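
In the sample output above, the total response time equals the sum of the three stage averages (142 + 485 + 98 = 725 ms). The sketch below aggregates per-turn samples that way. The per-turn dictionary shape and the function name `summarize_latency` are assumptions, and the real tester may measure the total independently.

```python
from statistics import mean

STAGES = ("stt", "llm", "tts")


def summarize_latency(turns):
    """Average each stage across turns and sum the averages for the total.

    `turns` is a list of dicts such as {"stt": 140, "llm": 480, "tts": 100},
    with latencies in milliseconds.
    """
    averages = {stage: mean(turn[stage] for turn in turns) for stage in STAGES}
    # Assumption: total response time is the sum of the stage averages.
    averages["total"] = sum(averages[stage] for stage in STAGES)
    return averages
```

An empty list of turns raises `statistics.StatisticsError`, which is a reasonable signal that the call produced no measurable exchange.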

Test Scenarios

Pre-built scenarios for testing specific flows:

| Scenario | Description | Duration |
|---|---|---|
| happy-path | Cooperative prospect, reaches commitment | ~60s |
| objection-price | Price objection early in call | ~45s |
| objection-realtor | "I have a realtor" objection | ~45s |
| not-interested | Quick rejection, graceful exit | ~20s |
| busy-callback | "Call me back later" | ~15s |
| wrong-number | Not the property owner | ~15s |

Run scenario:

python skills/pipecat-call-tester/scripts/run_tests.py --scenario happy-path
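
One way to represent the scenario list is a small registry keyed by name, so that an unknown `--scenario` value fails early with a helpful message. This is a sketch only; the `Scenario` dataclass and `get_scenario` helper are hypothetical, and the real definitions live in references/test-scenarios.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    description: str
    approx_duration_s: int


# Names, descriptions, and durations copied from the scenario list above.
SCENARIOS = {
    s.name: s
    for s in (
        Scenario("happy-path", "Cooperative prospect, reaches commitment", 60),
        Scenario("objection-price", "Price objection early in call", 45),
        Scenario("objection-realtor", '"I have a realtor" objection', 45),
        Scenario("not-interested", "Quick rejection, graceful exit", 20),
        Scenario("busy-callback", '"Call me back later"', 15),
        Scenario("wrong-number", "Not the property owner", 15),
    )
}


def get_scenario(name: str) -> Scenario:
    """Look up a scenario, listing the valid names on failure."""
    try:
        return SCENARIOS[name]
    except KeyError:
        known = ", ".join(sorted(SCENARIOS))
        raise ValueError(f"unknown scenario {name!r}; known: {known}") from None
```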

Latency Thresholds

Default pass/fail criteria (configurable):

| Metric | Target | Warning | Fail |
|---|---|---|---|
| Total response | <1000ms | <1500ms | >1500ms |
| STT latency | <200ms | <300ms | >400ms |
| LLM latency | <600ms | <900ms | >1200ms |
| TTS latency | <150ms | <250ms | >350ms |
| First response | <1500ms | <2000ms | >2500ms |
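
The thresholds above can be applied with a small classifier. Because the Warning and Fail columns leave a gap for most metrics (for example, STT between 300ms and 400ms), this sketch assumes that anything at or above the target but not beyond the fail limit counts as a warning. That rule, the metric keys, and the function name `classify` are assumptions, not the tester's documented behavior.

```python
# (target_ms, fail_ms) per metric, copied from the thresholds above.
THRESHOLDS_MS = {
    "total_response": (1000, 1500),
    "stt": (200, 400),
    "llm": (600, 1200),
    "tts": (150, 350),
    "first_response": (1500, 2500),
}


def classify(metric: str, latency_ms: float) -> str:
    """Return "pass", "warning", or "fail" for a measured latency."""
    target, fail = THRESHOLDS_MS[metric]
    if latency_ms < target:
        return "pass"
    if latency_ms > fail:
        return "fail"
    # Assumption: everything between target and fail limit is a warning.
    return "warning"
```

Under this rule an LLM latency of 1,250ms is a failure, while 700ms would be reported as a warning rather than blocking a deployment.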

Usage Examples

Quick Validation After Code Change

User: /test-pipeline
Claude: Running Tier 1 local validation...

Results:
  Imports: PASS
  API keys: PASS (4/4)
  System prompt: PASS (3,456 chars)
  Metrics: PASS
  Phase detection: PASS

All 15 checks passed. Ready for Tier 2 testing.

Full Pre-Deployment Test

User: /test-pipeline all
Claude: Running all test tiers...

Tier 1 (Local): PASS (15/15 checks)
Tier 2 (Daily Room): PASS (avg latency 725ms)
Tier 3 (Full Call): PASS (reached COMMITMENT_CHECK)

Deployment readiness: APPROVED

Debug Slow Response

User: /test-pipeline tier-2
Claude: Starting Daily.co room test...

Room URL: https://your-domain.daily.co/test-12345
Join in browser and speak to the bot.

[After testing]

Latency Analysis:
  STT: 145ms (PASS)
  LLM: 1,250ms (FAIL - threshold 1200ms)
  TTS: 102ms (PASS)

BOTTLENECK IDENTIFIED: LLM latency

Recommendations:
1. Check Claude API status: https://status.anthropic.com
2. Consider reducing max_tokens from 300 to 200
3. Simplify system prompt (currently 3,456 chars)

Integration with CI/CD

Run the Tier 1 tests automatically in a CI pipeline:

# .github/workflows/test-pipeline.yml
- name: Test PipeCat Pipeline
  run: |
    python skills/pipecat-call-tester/scripts/run_tests.py --tier 1 --json
  env:
    ASSEMBLYAI_API_KEY: ${{ secrets.ASSEMBLYAI_API_KEY }}
    # ... other keys

Test Report Format

Reports are saved to test-reports/ with detailed metrics:

{
  "timestamp": "2024-01-15T10:30:00Z",
  "tier": 1,
  "duration_ms": 1234,
  "results": {
    "imports": "pass",
    "api_keys": "pass",
    "system_prompt": "pass",
    "metrics": "pass",
    "phase_detection": "pass"
  },
  "overall": "pass"
}
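
A report in the format above can be checked programmatically, for example as a gate in a CI job. This is a minimal sketch assuming the JSON shape shown here; the function name `report_passed` is hypothetical.

```python
import json


def report_passed(report_text: str):
    """Return (ok, failed_checks) for a JSON test report.

    The report passes only if "overall" is "pass" and every entry in
    "results" is also "pass".
    """
    report = json.loads(report_text)
    results = report.get("results", {})
    failed = sorted(name for name, status in results.items() if status != "pass")
    ok = report.get("overall") == "pass" and not failed
    return ok, failed
```

Checking the individual results as well as `overall` guards against a report whose summary field disagrees with its details. A CI wrapper could call `sys.exit(0 if ok else 1)` on the returned flag.
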

Install via CLI

npx skills add https://github.com/isethius/RE-App --skill pipecat-call-tester