name: implementing-go-semaphore-pools description: Use when implementing bounded concurrency in Go with semaphores and worker pools - rate limiting external APIs, high-throughput parallel processing (40K+ items/hour), variable-cost task scheduling. Complements go-errgroup-concurrency. Handles "too many goroutines", "rate limit exceeded", "out of memory", semaphore.NewWeighted, errgroup.SetLimit, runtime.NumCPU patterns allowed-tools: Read, Grep, Glob, LSP
Implementing Go Semaphore Pools
Bounded concurrency patterns using semaphores and worker pools for production Go applications.
When to Use
Use this skill when:
- Rate limiting API calls to external services
- Processing high-throughput workloads (40K+ items/hour)
- Preventing resource exhaustion (memory, goroutines, connections)
- Implementing variable-cost task scheduling (weighted semaphores)
- Pre-Go 1.20 projects needing concurrency limits
- Fine-grained control over concurrent execution
Complements: go-errgroup-concurrency skill (error aggregation + concurrency control)
Important: You MUST use TodoWrite before starting to track all steps
Quick Decision Matrix
| Scenario | Use | Concurrency Limit |
|---|---|---|
| Simple rate limiting (Go 1.20+) | errgroup.SetLimit() |
Fixed worker count |
| API rate limits + backpressure | semaphore.NewWeighted |
Dynamic, context-aware |
| Variable-cost tasks | semaphore.NewWeighted |
Weighted (acquire N) |
| Maximum performance | Worker pool pattern | Explicit resource bound |
| CPU-bound processing | Any, limit=runtime.NumCPU() |
Matches CPU cores |
| I/O-bound processing | Any, limit=10-100 | High concurrency OK |
Core API: golang.org/x/sync/semaphore
Basic Usage
import "golang.org/x/sync/semaphore"
// Create weighted semaphore with max 10 concurrent operations
sem := semaphore.NewWeighted(10)
// Acquire 1 unit (blocks if at capacity)
if err := sem.Acquire(ctx, 1); err != nil {
return err // Context canceled
}
defer sem.Release(1)
// Do work with bounded concurrency
processItem(item)
Weighted Semaphores
Use for variable-cost tasks:
sem := semaphore.NewWeighted(100) // 100 "cost units" available
// Small task: acquire 1 unit
sem.Acquire(ctx, 1)
defer sem.Release(1)
processSmallItem()
// Large task: acquire 10 units
sem.Acquire(ctx, 10)
defer sem.Release(10)
processLargeItem()
See: references/weighted-semaphore-patterns.md
Pattern 1: errgroup + Semaphore (Recommended)
Best of both worlds: Error aggregation + bounded concurrency
import (
"context"
"golang.org/x/sync/errgroup"
"golang.org/x/sync/semaphore"
)
func ProcessItems(ctx context.Context, items []Item, maxConcurrent int) error {
g, ctx := errgroup.WithContext(ctx)
sem := semaphore.NewWeighted(int64(maxConcurrent))
for _, item := range items {
item := item // Capture loop variable
// Acquire before spawning goroutine
if err := sem.Acquire(ctx, 1); err != nil {
return err
}
g.Go(func() error {
defer sem.Release(1)
return scanItem(ctx, item)
})
}
return g.Wait() // Wait for all goroutines, aggregate errors
}
Why this pattern:
- ✅ Bounded concurrency (max N goroutines)
- ✅ Error aggregation (first error stops all)
- ✅ Context cancellation (propagates to all workers)
- ✅ Clean resource management (defer release)
Pattern 2: errgroup.SetLimit (Go 1.20+)
Simpler for basic cases:
func ProcessItems(ctx context.Context, items []Item, maxConcurrent int) error {
g, ctx := errgroup.WithContext(ctx)
g.SetLimit(maxConcurrent) // Built-in rate limiting
for _, item := range items {
item := item
g.Go(func() error {
return scanItem(ctx, item)
})
}
return g.Wait()
}
When to use:
- ✅ Go 1.20+ projects
- ✅ Fixed worker count
- ✅ Uniform task costs
- ❌ NOT for weighted semaphores
- ❌ NOT for fine-grained control
Pattern 3: Worker Pool (Maximum Performance)
Explicit worker management for high-throughput:
func ProcessItems(ctx context.Context, items []Item, numWorkers int) error {
itemChan := make(chan Item, numWorkers*2) // Buffered queue
g, ctx := errgroup.WithContext(ctx)
// Start workers
for i := 0; i < numWorkers; i++ {
g.Go(func() error {
for item := range itemChan {
if err := scanItem(ctx, item); err != nil {
return err
}
}
return nil
})
}
// Feed work
g.Go(func() error {
defer close(itemChan)
for _, item := range items {
select {
case itemChan <- item:
case <-ctx.Done():
return ctx.Err()
}
}
return nil
})
return g.Wait()
}
Performance: 2.4x faster than sequential, 20-40% less memory than unbounded concurrency
See: references/worker-pool-patterns.md
Tuning Concurrency Limits
CPU-Bound Workloads
import "runtime"
maxWorkers := runtime.NumCPU() // Match CPU cores
// Example: 8-core machine → 8 workers
Rationale: CPU-bound tasks can't exceed physical core count without contention.
I/O-Bound Workloads
maxWorkers := 50 // 10-100 typical range
// Tasks spend most time waiting on network/disk
Rationale: High concurrency OK - workers block on I/O, not CPU.
Network with Rate Limits
// External API allows 150 req/sec
maxWorkers := 100
rateLimit := rate.NewLimiter(150, 10) // 150/sec, burst 10
sem := semaphore.NewWeighted(int64(maxWorkers))
for _, item := range items {
rateLimit.Wait(ctx) // Rate limiter
sem.Acquire(ctx, 1) // Concurrency limiter
g.Go(func() error {
defer sem.Release(1)
return callAPI(item)
})
}
See: references/rate-limiting-patterns.md
Performance Benchmarks
Test scenario: Process 1000 items with variable processing time
| Pattern | ns/op | Speedup | Memory |
|---|---|---|---|
| Sequential (baseline) | 111,483 | 1.0x | 100% |
sync/errgroup (unbounded) |
65,826 | 1.7x | 300% |
| errgroup + semaphore (20) | 52,104 | 2.1x | 120% |
| Worker pool (20) | 46,867 | 2.4x | 80% |
Key findings:
- Worker pools: Best performance + lowest memory
- Semaphores: Good balance of control + performance
- Unbounded: Fastest but 3x memory usage (risk OOM)
See: references/benchmark-analysis.md
Production Examples
TruffleHog (24K stars)
Multi-stage worker pools with stage-specific multipliers:
// Stage 1: Source enumeration (I/O-bound)
sourceWorkers := runtime.NumCPU() * 4
// Stage 2: Secret detection (CPU-bound)
detectorWorkers := runtime.NumCPU()
// Stage 3: Verification (network-bound)
verifyWorkers := runtime.NumCPU() * 10
Pattern: Tune each pipeline stage independently based on workload type.
Nuclei (26K stars)
Rate limiting at 150 req/sec using semaphore patterns:
rateLimiter := rate.NewLimiter(rate.Limit(rateLimit), burst)
sem := semaphore.NewWeighted(int64(maxConcurrent))
for target := range targets {
rateLimiter.Wait(ctx)
sem.Acquire(ctx, 1)
go func(t string) {
defer sem.Release(1)
executeTemplate(t)
}(target)
}
Pattern: Combine rate limiter + semaphore for external API safety.
Trivy (30K stars)
pkg/parallel/parallel.go - bounded concurrency:
func Run(workers int, fn func(context.Context, int) error) error {
g, ctx := errgroup.WithContext(ctx)
g.SetLimit(workers) // Go 1.20+ built-in
for i := 0; i < totalTasks; i++ {
i := i
g.Go(func() error {
return fn(ctx, i)
})
}
return g.Wait()
}
Pattern: Abstract worker pool into reusable utility function.
See: references/production-case-studies.md
Common Pitfalls
1. Forgetting defer Release
// ❌ BAD: Leaks semaphore slot on error
sem.Acquire(ctx, 1)
if err := process(); err != nil {
return err // Release never called!
}
sem.Release(1)
// ✅ GOOD: Always defer
sem.Acquire(ctx, 1)
defer sem.Release(1)
return process()
2. Acquiring Inside Goroutine
// ❌ BAD: Unbounded goroutine creation
for _, item := range items {
g.Go(func() error {
sem.Acquire(ctx, 1) // All goroutines created before acquire!
defer sem.Release(1)
return process(item)
})
}
// ✅ GOOD: Acquire before spawning
for _, item := range items {
sem.Acquire(ctx, 1) // Blocks when at capacity
g.Go(func() error {
defer sem.Release(1)
return process(item)
})
}
3. Ignoring Context Cancellation
// ❌ BAD: Acquire blocks forever on cancel
sem.Acquire(context.Background(), 1)
// ✅ GOOD: Respect context
if err := sem.Acquire(ctx, 1); err != nil {
return err // Context canceled
}
See: references/common-pitfalls.md
Testing Strategies
Test Concurrent Behavior
func TestBoundedConcurrency(t *testing.T) {
const maxConcurrent = 5
sem := semaphore.NewWeighted(maxConcurrent)
var current, peak atomic.Int32
g, ctx := errgroup.WithContext(context.Background())
for i := 0; i < 100; i++ {
sem.Acquire(ctx, 1)
g.Go(func() error {
defer sem.Release(1)
n := current.Add(1)
if n > peak.Load() {
peak.Store(n)
}
time.Sleep(10 * time.Millisecond)
current.Add(-1)
return nil
})
}
g.Wait()
assert.Equal(t, maxConcurrent, int(peak.Load()))
}
See: references/testing-patterns.md
Quick Reference
| Need | Pattern | Code Snippet |
|---|---|---|
| Basic bounded concurrency | errgroup + semaphore | sem.Acquire(ctx, 1); defer Release |
| Simple rate limiting (Go 1.20+) | errgroup.SetLimit |
g.SetLimit(N) |
| Variable-cost tasks | Weighted semaphore | sem.Acquire(ctx, weight) |
| Maximum performance | Worker pool | Channel + N goroutines |
| Rate limit external API | Rate limiter + semaphore | limiter.Wait() + sem.Acquire() |
| CPU-bound tuning | runtime.NumCPU() |
maxWorkers := runtime.NumCPU() |
| I/O-bound tuning | 10-100 workers | maxWorkers := 50 |
References
Go packages:
- golang.org/x/sync/semaphore - Official semaphore API
- golang.org/x/sync/errgroup - Error group with SetLimit
Articles:
- Encore.dev - Advanced Go Concurrency
- Research:
.claude/.output/research/2026-01-01-go-scanner-architecture-patterns/github.md
Related Skills:
go-errgroup-concurrency- Error aggregation and concurrent error handlingdesigning-go-interfaces- Interface design for concurrent systems
Related Skills
go-errgroup-concurrency- Error aggregation with concurrent operationsdesigning-go-interfaces- Design patterns for concurrent APIsadhering-to-dry- DRY principles for reusable concurrency abstractionsdebugging-systematically- Debug concurrent race conditions and deadlocksgateway-backend- Go backend patterns and architectures