debug-timeouts

star 10

Systematically analyze and debug timeout hierarchies across application layers to identify conflicts and root causes

Olshansk By Olshansk schedule Updated 3/5/2026

name: debug-timeouts description: Systematically analyze and debug timeout hierarchies across application layers to identify conflicts and root causes disable-model-invocation: true context: fork agent: general-purpose

Debug Timeouts Command

Overview

Systematically analyze and debug timeout configurations across application layers to identify timeout hierarchies, conflicts, and root causes of unexpected delays.

Usage

/debug-timeouts [--layer=all|http|transport|context|server] [--search-pattern=<pattern>] [--trace-path] [--suggest-fixes]

Parameters

  • --layer: Focus on specific timeout layer (default: all)
  • --search-pattern: Custom regex pattern to find timeout configurations
  • --trace-path: Trace timeout chain from entry point to network
  • --suggest-fixes: Provide specific recommendations for timeout issues

What It Does

1. Timeout Discovery

  • HTTP Client Timeouts: Timeout, ResponseHeaderTimeout, TLSHandshakeTimeout, ExpectContinueTimeout
  • Transport Timeouts: DialTimeout, KeepAlive, IdleConnTimeout
  • Context Timeouts: WithTimeout, WithDeadline calls
  • Server Timeouts: ReadTimeout, WriteTimeout, IdleTimeout
  • Application Timeouts: Custom timeout constants and configurations

2. Timeout Hierarchy Analysis

  • Chain of Command: Map timeout inheritance from request entry to network I/O
  • Conflict Detection: Identify timeouts that override each other
  • Effective Timeout: Calculate which timeout actually triggers first
  • Gap Analysis: Find layers missing timeout protection

3. Pattern Recognition

  • Common Anti-patterns: Identify configurations that cause unexpected behavior
  • Timeout Leaks: Find contexts that bypass timeout controls
  • Starvation Patterns: Detect timeouts that are too aggressive/lenient
  • Recovery Delays: Identify post-timeout connection recovery times

4. Debugging Output

Timeout Map

REQUEST TIMEOUT CHAIN
├── HTTP Server ReadTimeout: 60s
├── Gateway parallelRequestTimeout: 30s  ⚠️  LONGER THAN HTTP SERVER
├── Shannon sendRelayTimeout: 5s
├── HTTP Client earlyCtx: 1s            ⭐ EFFECTIVE TIMEOUT
│   ├── ResponseHeaderTimeout: 1s       ⭐ TRANSPORT OVERRIDE
│   ├── TLS HandshakeTimeout: 5s
│   └── Dial Timeout: 5s
└── HTTP Client Timeout: 80s            ❌ NEVER REACHED

Conflict Analysis

⚠️  TIMEOUT CONFLICTS DETECTED:
1. Gateway parallelRequestTimeout (30s) > Server ReadTimeout (60s)
   → Gateway timeout never triggers, server closes connection first

2. HTTP Client Timeout (80s) > ResponseHeaderTimeout (1s)
   → ResponseHeaderTimeout will trigger first, HTTP Client timeout unused

3. Multiple 1s timeouts competing:
   - context.WithTimeout(1s) vs ResponseHeaderTimeout(1s)
   → Both will trigger simultaneously, creating race condition

Root Cause Suggestions

🔍 LIKELY ROOT CAUSE:
Based on your 15-second delays after 0.7s timeouts:

1. ✅ 1s timeout IS working (explains 0.7s response times)
2. ❌ 15s delay is NOT from configured timeouts
3. 🔍 Check: TCP connection recovery, OS socket timeouts, DNS caching

INVESTIGATION STEPS:
1. Add connection pool monitoring: transport.CloseIdleConnections() timing
2. Check OS-level TCP settings: net.core.netdev_max_backlog
3. Verify DNS resolution delays: add DNSStart/DNSDone logging
4. Monitor connection reuse: log connectionReused in httptrace

5. Fix Suggestions

Immediate Fixes

// BEFORE: Competing 1s timeouts
earlyCtx, earlyCancel := context.WithTimeout(debugCtx, 1*time.Second)
transport.ResponseHeaderTimeout = 1*time.Second

// AFTER: Coordinated timeout strategy
earlyCtx, earlyCancel := context.WithTimeout(debugCtx, 500*time.Millisecond)
transport.ResponseHeaderTimeout = 1*time.Second  // Backup timeout

Architecture Improvements

// Add timeout tracing
type TimeoutTracer struct {
    timeouts map[string]time.Time
}

func (t *TimeoutTracer) Record(name string, timeout time.Duration) {
    t.timeouts[name] = time.Now().Add(timeout)
}

// Coordinated timeout cancellation
func NewCascadingContext(base context.Context, timeouts ...namedTimeout) context.Context {
    // Implementation that cancels contexts in proper order
}

6. Monitoring Integration

  • Metrics: Generate timeout-specific metrics for observability
  • Alerts: Suggest alerting rules for timeout anomalies
  • Dashboards: Provide Grafana/Prometheus queries for timeout monitoring

Implementation Notes

Search Patterns

# Common timeout patterns
timeout.*=.*\d+.*time\.(Second|Millisecond|Minute)
WithTimeout\(.*,.*\d+.*time\.
ReadTimeout|WriteTimeout|IdleTimeout.*=
ResponseHeaderTimeout|TLSHandshakeTimeout
Dial.*Timeout|KeepAlive.*=

Language-Specific Searches

  • Go: time.Duration, context.WithTimeout, http.Transport fields
  • JavaScript/Node: setTimeout, fetch timeout, axios timeout
  • Python: socket.settimeout, requests.timeout, asyncio.wait_for
  • Java: SocketTimeout, ConnectionTimeout, ReadTimeout

Integration Points

  • Git blame: Show who introduced problematic timeout configurations
  • Dependency analysis: Find timeout configurations in external libraries
  • Environment scanning: Check environment variables affecting timeouts
  • Configuration files: Parse YAML/JSON for timeout settings

Example Output for PATH Debugging

🔍 TIMEOUT DEBUG ANALYSIS FOR PATH

📊 DISCOVERED TIMEOUTS (8 found):
├── protocol/shannon/context.go:28    defaultShannonSendRelayTimeoutMillisec = 5_000
├── protocol/shannon/http_client.go:25 responseHeaderTimeout = 1 * time.Second
├── protocol/shannon/http_client.go:98 TLSHandshakeTimeout: 5 * time.Second
├── protocol/shannon/http_client.go:87 DialContext Timeout: 5 * time.Second
├── gateway/request_context.go:36     parallelRequestTimeout = 30 * time.Second
├── config/router.go:26              defaultHTTPServerReadTimeout = 60 * time.Second
├── config/router.go:27              defaultHTTPServerWriteTimeout = 120 * time.Second
└── protocol/shannon/http_client.go:116 HTTP Client Timeout: 80 * time.Second

⚡ EFFECTIVE TIMEOUT CHAIN:
Request → ResponseHeaderTimeout(1s) → Shannon(5s) → Parallel(30s) → Server(60s) → Client(80s)

🎯 ROOT CAUSE ANALYSIS:
✅ Your 0.7s response times confirm ResponseHeaderTimeout(1s) is working
❌ 15s delays are NOT from any configured timeout
🔍 LIKELY CAUSE: Connection pool recovery after forced connection closure

💡 INVESTIGATION COMMANDS:
1. Add connection pool monitoring:
   transport.CloseIdleConnections() // Add timing around this call

2. Check for connection reuse patterns:
   httptrace.GotConn callback // Log connectionReused field

3. Monitor DNS resolution:
   httptrace.DNSStart/DNSDone // Check for DNS caching issues

4. OS-level TCP investigation:
   ss -tulpn | grep :3069  // Check socket states
   netstat -i  // Check network interface stats

This slash command would have immediately identified that your 15-second delay was NOT from any configured timeout, pointed you toward connection pool recovery as the likely cause, and provided specific debugging steps to investigate the real root cause.

Install via CLI
npx skills add https://github.com/Olshansk/personal-agent-skills --skill debug-timeouts
Repository Details
star Stars 10
call_split Forks 1
navigation Branch main
article Path SKILL.md
More from Creator