pipecat-call-tester - SKILL.md Agent Skill

name: pipecat-call-tester
description: "Automated testing for PipeCat voice agent pipeline. Use when: (1) Testing after code changes, (2) Validating new prompt versions, (3) Running pre-deployment checks, (4) Debugging call issues, (5) Measuring latency performance"

PipeCat Call Tester

Orchestrates 3-tier testing for the PipeCat voice agent: local validation, Daily.co room testing, and full end-to-end calls.

Commands

/test-pipeline — Run Tier 1 (local) tests
/test-pipeline tier-2 — Create Daily.co room for browser testing
/test-pipeline tier-3 [phone] — Run full end-to-end call
/test-pipeline all — Run all tiers sequentially
/test-pipeline scenario [name] — Run specific test scenario

Reference Files

references/test-scenarios.md — Pre-built test cases for different call flows
references/baseline-thresholds.md — Pass/fail criteria for latency and quality

Testing Tiers

Tier 1: Local Pipeline Test (Zero Cost)

Validates components without external API calls:

Checks:

All imports work correctly
Required API keys are configured
NEPQ system prompt generates correctly
Metrics tracking functions properly
Phase detection logic works
Service objects can be instantiated

Run:

python skills/pipecat-call-tester/scripts/run_tests.py --tier 1

Output Example:

TEST 1: Local Pipeline Validation
=================================

1.1 Testing imports...
  [PASS] Core imports

1.2 Checking API keys...
  [PASS] ASSEMBLYAI_API_KEY (length: 40)
  [PASS] ANTHROPIC_API_KEY (length: 64)
  [PASS] ELEVENLABS_API_KEY (length: 32)
  [PASS] ELEVENLABS_VOICE_ID (length: 24)

1.3 Testing NEPQ system prompt generation...
  [PASS] NEPQ mentioned
  [PASS] Lead name included
  [PASS] Property address included
  [PASS] Phase guidance included
  System prompt length: 3,456 characters

1.4 Testing metrics tracking...
  [PASS] Turn tracking
  [PASS] Micro-commitment tracking
  [PASS] Phase tracking

1.5 Testing phase detection...
  [PASS] CONNECTING phase triggers
  [PASS] PROBLEM_AWARENESS triggers
  [PASS] Objection detection

TEST 1 PASSED: All 15 checks passed
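
The key check in step 1.2 can be sketched as below. This is a minimal illustration, not the contents of run_tests.py: the key names are taken from the sample output, while `check_api_keys` and `format_results` are hypothetical helpers. Only the length of each value is reported, so secrets never reach the log.

```python
import os

# Key names copied from the sample output; the real script may check more.
REQUIRED_KEYS = [
    "ASSEMBLYAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "ELEVENLABS_API_KEY",
    "ELEVENLABS_VOICE_ID",
]


def check_api_keys(env=None):
    """Return (name, ok, length) for each required key without exposing values."""
    env = os.environ if env is None else env
    results = []
    for name in REQUIRED_KEYS:
        value = env.get(name, "").strip()
        results.append((name, bool(value), len(value)))
    return results


def format_results(results):
    """Render results in the [PASS]/[FAIL] style of the sample output."""
    lines = []
    for name, ok, length in results:
        if ok:
            lines.append(f"  [PASS] {name} (length: {length})")
        else:
            lines.append(f"  [FAIL] {name} (not set)")
    return lines
```

Calling `print("\n".join(format_results(check_api_keys())))` prints one line per key from the current environment.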

Tier 2: Daily.co Room Test (~$0.01/min)

Creates a real Daily.co room for browser-based testing:

What it does:

Creates a temporary Daily.co room
Starts the PipeCat bot in the room
Provides a URL for you to join in a browser
Records latency metrics during conversation
Generates test report when done
Cleans up room

Run:

python skills/pipecat-call-tester/scripts/run_tests.py --tier 2

Browser Testing Flow:

Open the provided room URL
Allow microphone access
Bot says: "Hey, did I catch you at a bad time?"
Respond naturally to test the conversation flow
Press Enter when done to see metrics
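
The room creation and cleanup steps above can be sketched against Daily's REST API (POST and DELETE on `/v1/rooms`). This is an illustration under assumptions, not the tester's actual code: the helper names and the 10-minute default expiry are hypothetical, and the API key is assumed to be supplied by the caller.

```python
import json
import time
import urllib.request

DAILY_API_BASE = "https://api.daily.co/v1"


def build_room_payload(ttl_seconds=600, now=None):
    """Build the JSON body for a room that expires ttl_seconds from now."""
    now = time.time() if now is None else now
    return {"properties": {"exp": int(now) + ttl_seconds}}


def _request(method, path, api_key, body=None):
    """Send an authenticated request to the Daily REST API and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        f"{DAILY_API_BASE}{path}",
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def create_temp_room(api_key, ttl_seconds=600):
    """Create a room; the response includes its "name" and "url"."""
    return _request("POST", "/rooms", api_key, build_room_payload(ttl_seconds))


def delete_room(api_key, room_name):
    """Delete the room once the test finishes."""
    return _request("DELETE", f"/rooms/{room_name}", api_key)
```

Setting `exp` acts as a safety net: if the test process dies before `delete_room` runs, the room still expires on its own. Wrapping the test body in `try`/`finally` keeps cleanup reliable in the normal case.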

Tier 3: Full End-to-End Call (~$0.075/min)

Makes a real outbound call through the full stack:

What it does:

Creates Daily.co room
Initiates Twilio call to target phone
Bridges call audio to Daily.co room
PipeCat bot runs full NEPQ conversation
Records all metrics and transcript
Generates detailed test report

Run:

python skills/pipecat-call-tester/scripts/run_tests.py --tier 3 --phone +15551234567
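
Because Tier 3 places a billable call, it may be worth validating the `--phone` argument before dialing. The sketch below assumes the number is expected in E.164 form, as in the example above; `normalize_phone` is a hypothetical helper, not part of run_tests.py.

```python
import re

# E.164: a leading "+", then up to 15 digits, the first of which is non-zero.
E164_PATTERN = re.compile(r"^\+[1-9]\d{1,14}$")


def normalize_phone(raw):
    """Strip common separators and return an E.164 string, or raise ValueError."""
    cleaned = re.sub(r"[\s\-().]", "", raw)
    if not E164_PATTERN.fullmatch(cleaned):
        raise ValueError(f"not a valid E.164 number: {raw!r}")
    return cleaned
```

For example, `normalize_phone("+1 (555) 123-4567")` returns `"+15551234567"`, while a number without the leading `+` is rejected.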

Output Example:

TEST 3: Full End-to-End Call
============================

Call initiated:
  Call ID: call_abc123
  Room URL: https://your-domain.daily.co/test-room-xyz
  Phone: +15551234567

Call Metrics:
  Duration: 45.2s
  Turns: 8
  Phase reached: COMMITMENT_CHECK

Latency Breakdown:
  STT (AssemblyAI): 142ms avg
  LLM (Claude): 485ms avg
  TTS (ElevenLabs): 98ms avg
  Total response: 725ms avg

Quality Metrics:
  Micro-commitments: 3
  Objections handled: 1
  Pain discovered: Yes

RESULT: PASS (all thresholds met)
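
In the sample above, the total response figure equals the sum of the three stage averages (142 + 485 + 98 = 725). A minimal sketch of that aggregation follows; the per-turn record layout and the field names `stt_ms`, `llm_ms`, and `tts_ms` are assumptions made for illustration, and the real tester may measure the total differently.

```python
from statistics import mean

STAGES = ("stt_ms", "llm_ms", "tts_ms")


def summarize_latency(turns):
    """Average each stage across turns, then sum the averages for the total."""
    summary = {stage: round(mean(t[stage] for t in turns)) for stage in STAGES}
    summary["total_ms"] = sum(summary[stage] for stage in STAGES)
    return summary


# Two hypothetical turns whose averages match the sample report.
turns = [
    {"stt_ms": 140, "llm_ms": 480, "tts_ms": 100},
    {"stt_ms": 144, "llm_ms": 490, "tts_ms": 96},
]
summary = summarize_latency(turns)
```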

Test Scenarios

Pre-built scenarios for testing specific flows:

| Scenario | Description | Duration |
| --- | --- | --- |
| `happy-path` | Cooperative prospect, reaches commitment | ~60s |
| `objection-price` | Price objection early in call | ~45s |
| `objection-realtor` | "I have a realtor" objection | ~45s |
| `not-interested` | Quick rejection, graceful exit | ~20s |
| `busy-callback` | "Call me back later" | ~15s |
| `wrong-number` | Not the property owner | ~15s |

Run scenario:

python skills/pipecat-call-tester/scripts/run_tests.py --scenario happy-path
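
One way to back the `--scenario` flag is a small registry keyed by scenario name. The sketch below mirrors the scenario table; the `Scenario` class, its field names, and `get_scenario` are hypothetical, and the durations are the approximate values from the table rather than hard limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    description: str
    expected_duration_s: int  # approximate, taken from the scenario table


SCENARIOS = {
    s.name: s
    for s in (
        Scenario("happy-path", "Cooperative prospect, reaches commitment", 60),
        Scenario("objection-price", "Price objection early in call", 45),
        Scenario("objection-realtor", '"I have a realtor" objection', 45),
        Scenario("not-interested", "Quick rejection, graceful exit", 20),
        Scenario("busy-callback", '"Call me back later"', 15),
        Scenario("wrong-number", "Not the property owner", 15),
    )
}


def get_scenario(name):
    """Look up a scenario, listing the valid names when the lookup fails."""
    try:
        return SCENARIOS[name]
    except KeyError:
        known = ", ".join(sorted(SCENARIOS))
        raise ValueError(f"unknown scenario {name!r}; choose one of: {known}") from None
```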

Latency Thresholds

Default pass/fail criteria (configurable):

| Metric | Target | Warning | Fail |
| --- | --- | --- | --- |
| Total response | <1000ms | <1500ms | >1500ms |
| STT latency | <200ms | <300ms | >400ms |
| LLM latency | <600ms | <900ms | >1200ms |
| TTS latency | <150ms | <250ms | >350ms |
| First response | <1500ms | <2000ms | >2500ms |
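
The thresholds above can be applied with a small classifier like the following sketch. Note that the table leaves some ranges unspecified (for example, STT between 300ms and 400ms); this sketch assumes anything that is neither under the target nor over the fail bound counts as a warning. The metric keys and the `classify` function are illustrative names, not the tester's actual API.

```python
# (target, warning, fail) bounds in milliseconds, copied from the table.
THRESHOLDS_MS = {
    "total_response": (1000, 1500, 1500),
    "stt": (200, 300, 400),
    "llm": (600, 900, 1200),
    "tts": (150, 250, 350),
    "first_response": (1500, 2000, 2500),
}


def classify(metric, value_ms):
    """Return "PASS", "WARNING", or "FAIL" for one measured latency."""
    target, _warning, fail = THRESHOLDS_MS[metric]
    if value_ms > fail:
        return "FAIL"
    if value_ms < target:
        return "PASS"
    # Assumption: the gap between the warning and fail bounds is a warning.
    return "WARNING"
```

With these bounds, the 485ms LLM average from the Tier 3 sample classifies as PASS, while an LLM latency of 1,250ms classifies as FAIL.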

Usage Examples

Quick Validation After Code Change

User: /test-pipeline
Claude: Running Tier 1 local validation...

Results:
  Imports: PASS
  API keys: PASS (4/4)
  System prompt: PASS (3,456 chars)
  Metrics: PASS
  Phase detection: PASS

All 15 checks passed. Ready for Tier 2 testing.

Full Pre-Deployment Test

User: /test-pipeline all
Claude: Running all test tiers...

Tier 1 (Local): PASS (15/15 checks)
Tier 2 (Daily Room): PASS (avg latency 725ms)
Tier 3 (Full Call): PASS (reached COMMITMENT_CHECK)

Deployment readiness: APPROVED

Debug Slow Response

User: /test-pipeline tier-2
Claude: Starting Daily.co room test...

Room URL: https://your-domain.daily.co/test-12345
Join in browser and speak to the bot.

[After testing]

Latency Analysis:
  STT: 145ms (PASS)
  LLM: 1,250ms (FAIL - threshold 1200ms)
  TTS: 102ms (PASS)

BOTTLENECK IDENTIFIED: LLM latency

Recommendations:
1. Check Claude API status: https://status.anthropic.com
2. Consider reducing max_tokens from 300 to 200
3. Simplify system prompt (currently 3,456 chars)

Integration with CI/CD

Run automated tests in a CI pipeline:

# .github/workflows/test-pipeline.yml
- name: Test PipeCat Pipeline
  run: |
    python skills/pipecat-call-tester/scripts/run_tests.py --tier 1 --json
  env:
    ASSEMBLYAI_API_KEY: ${{ secrets.ASSEMBLYAI_API_KEY }}
    # ... other keys

Test Report Format

Reports are saved to test-reports/ with detailed metrics:

{
  "timestamp": "2024-01-15T10:30:00Z",
  "tier": 1,
  "duration_ms": 1234,
  "results": {
    "imports": "pass",
    "api_keys": "pass",
    "system_prompt": "pass",
    "metrics": "pass",
    "phase_detection": "pass"
  },
  "overall": "pass"
}
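
A saved report can be inspected programmatically, for instance to list which checks failed. The sketch below relies only on the `results` and `overall` fields shown above; `load_report` and `failed_checks` are hypothetical helper names, and the report path is whatever file the tester wrote under test-reports/.

```python
import json


def load_report(path):
    """Read one JSON report from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def failed_checks(report):
    """Return the names of all checks whose status is not "pass"."""
    return sorted(
        name for name, status in report["results"].items() if status != "pass"
    )


# Inline copy of the sample report's relevant fields, for illustration.
sample = {
    "tier": 1,
    "results": {
        "imports": "pass",
        "api_keys": "pass",
        "system_prompt": "pass",
        "metrics": "pass",
        "phase_detection": "pass",
    },
    "overall": "pass",
}
```

In a CI step, a non-empty result from `failed_checks` could be turned into a non-zero exit code so the build fails.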